| 1 | \documentclass[twoside,a4paper,11pt]{article} |
| 2 | |
| 3 | \usepackage[T1]{fontenc} |
| 4 | \usepackage[utf8x]{inputenc} |
| 5 | \usepackage[ps2pdf]{hyperref} |
| 6 | \usepackage{reqlist} |
| 7 | |
| 8 | \newcommand{\urlink}[1]{\texttt{<#1>}} |
| 9 | \newcommand{\unix}{\textsc{Unix}} |
| 10 | |
| 11 | \title{Dolda Connect protocol} |
| 12 | \author{Fredrik Tolf\\\texttt{<fredrik@dolda2000.com>}} |
| 13 | |
| 14 | \begin{document} |
| 15 | |
| 16 | \maketitle |
| 17 | |
| 18 | \section{Introduction} |
| 19 | Dolda Connect consists of a daemon (a.k.a. server) that runs in |
| 20 | the background and carries out all the actual work, and a number of |
| 21 | client programs (a.k.a. user interfaces) that connect to the daemon in |
| 22 | order to tell it what to do. In order for the daemon and the clients |
| 23 | to be able to talk to each other, a protocol is needed. This document |
| 24 | intends to document that protocol, so that third parties can write |
| 25 | their own client programs. |
| 26 | |
| 27 | It is worth noting that there exists a library, called |
| 28 | \texttt{libdcui}, that carries out much of the low-level work of |
| 29 | speaking the protocol, facilitating the creation of new client |
| 30 | programs. In itself, \texttt{libdcui} is written in the C programming |
| 31 | language and is intended to be used by other programs written in C, |
| 32 | but there also exist wrapper libraries for both GNU Guile (the GNU |
| 33 | project's Scheme interpreter) and for Python. The former is |
| 34 | distributed with the main Dolda Connect source tree, while the latter |
| 35 | is distributed separately (for technical reasons). To get a copy, |
| 36 | please refer to Dolda Connect's homepage at |
| 37 | \urlink{http://www.dolda2000.com}. |
| 38 | |
| 39 | \section{Transport format} |
| 40 | Note: Everything covered in this section is handled by the |
| 41 | \texttt{libdcui} library. Thus, if you read this because you just want |
| 42 | to write a client, and are using the library (or any of the wrapper |
| 43 | libraries), you can safely skip over this section. It may still be |
| 44 | interesting to read in order to understand the semantics of the |
| 45 | protocol, however. |
| 46 | |
| 47 | The protocol can be spoken over any channel that features a |
| 48 | byte-oriented, reliable virtual (or not) circuit. Usually, it is |
| 49 | spoken over a TCP connection or a byte-oriented \unix\ socket. The |
| 50 | usual port number for TCP connections is 1500, but any port could be |
| 51 | used\footnote{However, port 1500 is what the \texttt{libdcui} library |
| 52 | uses if no port is explicitly stated, so it is probably to be |
| 53 | preferred}. |
| 54 | |
| 55 | \subsection{Informal description} |
| 56 | |
| 57 | On top of the provided byte-oriented connection, the most basic level |
| 58 | of the protocol is a stream of Unicode characters, encoded with |
| 59 | UTF-8. The Unicode stream is then grouped in two levels: lines |
| 60 | consisting of words (a.k.a. tokens). Lines are separated by CRLF |
| 61 | sequences (\emph{not} just CR or LF), and words are separated by |
| 62 | whitespace. Both whitespace and CRLFs can be quoted, however, |
| 63 | overriding their normal interpretation of separators and allowing them |
| 64 | to be parts of words. NUL characters are not allowed to be transferred |
| 65 | at all, but all other Unicode codepoints are allowed. |
| 66 | |
| 67 | Lines transmitted from the daemon to the client are slightly |
| 68 | different, however. They all start with a three-digit code, followed |
| 69 | by either a space or a dash\footnote{Yes, this is inspired by FTP and |
| 70 | SMTP.}, followed by the normal sequence of words. The three-digit |
| 71 | code identifies the type of line. Overall, the protocol is a |
| 72 | lock-step protocol, where the client sends one line that is |
| 73 | interpreted as a request, and the daemon replies with one or more |
| 74 | lines. In a multi-line response, all lines except the last have the |
| 75 | three-digit code followed by a dash. The last line of a multi-line |
| 76 | response and the only line of a single-line response have the |
| 77 | three-digit code followed by a space. All lines of a multi-line |
| 78 | response have the same three-digit code. The client is not allowed to |
| 79 | send another request until the last line of the previous response has |
| 80 | been received. The exception is that the daemon might send (but only |
| 81 | if the client has requested it to do so) sporadic lines of |
| 82 | asynchronous notification messages. Notification message lines are |
| 83 | distinguished by having their three-digit codes always begin with the |
| 84 | digit 6. Otherwise, the first digit of the three-digit code indicates |
| 85 | the overall success or failure of a request. Codes beginning with 2 |
| 86 | indicate that the request to which they belong succeeded. Codes |
| 87 | beginning with 3 indicate that the request succeeded in itself, but |
| 88 | that it is considered part of a sequence of commands, and that the |
| 89 | sequence still requires additional interaction before it is considered |
| 90 | successful. Codes beginning with 5 indicate errors. The |
| 91 | remaining two digits merely distinguish between different |
| 92 | outcomes. Note that notification message lines may come at \emph{any} |
| 93 | time, even in the middle of multiline responses (though not in the |
| 94 | middle of another line). There are no multiline notifications. |
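The rules above can be sketched in Python as follows. This is only an illustration of the line format described in this section, not code taken from \texttt{libdcui}; the names \texttt{DaemonLine}, \texttt{split\_daemon\_line} and \texttt{classify} are made up for the example, and the input is assumed to be a single daemon line with its terminating CRLF already removed.

```python
from typing import NamedTuple

DIGITS = "0123456789"


class DaemonLine(NamedTuple):
    code: int              # the three-digit code
    is_last: bool          # True if the code is followed by a space
    is_notification: bool  # True if the code begins with the digit 6
    rest: str              # the words of the line, still unparsed


def split_daemon_line(line: str) -> DaemonLine:
    """Split one daemon line (CRLF already removed) into its parts."""
    if (len(line) < 4
            or any(c not in DIGITS for c in line[:3])
            or line[3] not in " -"):
        raise ValueError("malformed daemon line")
    return DaemonLine(int(line[:3]), line[3] == " ", line[0] == "6", line[4:])


def classify(code: int) -> str:
    """Map the first digit of a code to its overall meaning."""
    kinds = {2: "success", 3: "more interaction needed",
             5: "error", 6: "notification"}
    return kinds.get(code // 100, "unknown")
```

A client reading a response would keep collecting lines until it sees one whose \texttt{is\_last} field is true, setting aside any line whose \texttt{is\_notification} field is true, since notifications may arrive between the lines of a response.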
| 95 | |
| 96 | The act of connecting to the daemon is itself considered a request, |
| 97 | soliciting a success or failure response, so it is the daemon that |
| 98 | first transmits actual data. A failure response may be provoked by a |
| 99 | client connecting from a prohibited source. |
| 100 | |
| 101 | Quoting of special characters in words may be done in two ways. First, |
| 102 | the backslash character escapes any special interpretation of the |
| 103 | character that comes after it, no matter where or what the following |
| 104 | character is (it is not required even to be a special |
| 105 | character). Thus, the only way to include a backslash in a word is to |
| 106 | escape it with another backslash. Second, any interpretation of |
| 107 | whitespace may be escaped using the citation mark character (only the |
| 108 | ASCII one, U+0022 -- not any other Unicode quotes), by enclosing a |
| 109 | string containing whitespace in citation marks. (Note that the citation |
| 110 | marks need not necessarily be placed at the word boundaries, so the |
| 111 | string ``\texttt{a"b c"d}'' is parsed as a single word ``\texttt{ab |
| 112 | cd}''.) Technically, this dual layer of quoting may seem like a |
| 113 | liability when implementing the protocol, but it is quite convenient |
| 114 | when talking directly to the daemon with a program such as |
| 115 | \texttt{telnet}. |
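As an illustration of the two quoting mechanisms, the following Python sketch splits one line into words. It is a minimal reading of the rules above rather than the algorithm used by the daemon or by \texttt{libdcui}. The function name \texttt{split\_words} is hypothetical, and the sketch assumes that the line has already been separated from the stream at an unquoted CRLF, which has been removed. It is also more lenient than the formal grammar in that it accepts unescaped CR and LF characters inside citation marks.

```python
from typing import List


def split_words(line: str) -> List[str]:
    """Split a line (without its terminating CRLF) into unquoted words."""
    words = []         # finished words
    cur = []           # characters of the word being built
    in_word = False    # True once a word has started (covers the empty "")
    in_quotes = False  # True between a pair of citation marks
    i = 0
    while i < len(line):
        c = line[i]
        if c == "\\":
            # A backslash makes the next character literal, whatever it is.
            if i + 1 >= len(line):
                raise ValueError("backslash at end of line")
            cur.append(line[i + 1])
            in_word = True
            i += 2
            continue
        if c == '"':
            # Citation marks toggle quoting and may appear mid-word.
            in_quotes = not in_quotes
            in_word = True
        elif c in " \t" and not in_quotes:
            # Unquoted whitespace ends the current word, if any.
            if in_word:
                words.append("".join(cur))
                cur, in_word = [], False
        else:
            cur.append(c)
            in_word = True
        i += 1
    if in_quotes:
        raise ValueError("unterminated citation mark")
    if in_word:
        words.append("".join(cur))
    return words
```

With this sketch, the string from the example above, \texttt{a"b c"d}, comes out as the single word \texttt{ab cd}.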
| 116 | |
| 117 | \subsection{Formal description} |
| 118 | |
| 119 | Formally, the syntax of the protocol may be defined with the following |
| 120 | BNF rules. Note that they all operate on Unicode characters, not bytes. |
| 121 | |
| 122 | \begin{tabular}{lcl} |
| 123 | <session> & ::= & <SYN> <response> \\ |
| 124 | & & | <session> <transaction> \\ |
| 125 | & & | <session> <notification> \\ |
| 126 | <transaction> & ::= & <request> <response> \\ |
| 127 | <request> & ::= & <line> \\ |
| 128 | <response> & ::= & <resp-line-last> \\ |
| 129 | & & | <resp-line-not-last> <response> \\ |
| 130 | & & | <notification> <response> \\ |
| 131 | <resp-line-last> & ::= & <resp-code> <SPACE> <line> \\ |
| 132 | <resp-line-not-last> & ::= & <resp-code> <DASH> <line> \\ |
| 133 | <notification> & ::= & <notification-code> <SPACE> <line> \\ |
| 134 | <resp-code> & ::= & ``\texttt{2}'' <digit> <digit> \\ |
| 135 | & & | ``\texttt{3}'' <digit> <digit> \\ |
| 136 | & & | ``\texttt{5}'' <digit> <digit> \\ |
| 137 | <notification-code> & ::= & ``\texttt{6}'' <digit> <digit> \\ |
| 138 | <line> & ::= & <CRLF> \\ |
| 139 | & & | <word> <ws> <line> \\ |
| 140 | <word> & ::= & <COMMON-CHAR> \\ |
| 141 | & & | ``\texttt{$\backslash$}'' <CHAR> \\ |
| 142 | & & | ``\texttt{"}'' <quoted-word> ``\texttt{"}'' \\ |
| 143 | & & | <word> <word> \\ |
| 144 | <quoted-word> & ::= & ``'' \\ |
| 145 | & & | <COMMON-CHAR> <quoted-word> \\ |
| 146 | & & | <ws> <quoted-word> \\ |
| 147 | & & | ``\texttt{$\backslash$}'' <CHAR> <quoted-word> \\ |
| 148 | <ws> & ::= & <1ws> | <1ws> <ws> \\ |
| 149 | <1ws> & ::= & <SPACE> | <TAB> \\ |
| 150 | <digit> & ::= & ``\texttt{0}'' | |
| 151 | ``\texttt{1}'' | ``\texttt{2}'' | |
| 152 | ``\texttt{3}'' | ``\texttt{4}'' \\ |
| 153 | & & | ``\texttt{5}'' | ``\texttt{6}'' | |
| 154 | ``\texttt{7}'' | ``\texttt{8}'' | |
| 155 | ``\texttt{9}'' |
| 156 | \end{tabular} |
| 157 | |
| 158 | As for the terminal symbols, <SPACE> is U+0020, <TAB> is U+0009, |
| 159 | <CRLF> is the sequence of U+000D and U+000A, <DASH> is U+002D, <CHAR> |
| 160 | is any Unicode character except U+0000, <COMMON-CHAR> is any |
| 161 | Unicode character except U+0000, U+0009, U+000A, U+000D, U+0020, |
| 162 | U+0022 and U+005C, and <SYN> is the out-of-band message that |
| 163 | establishes the communication channel\footnote{This means that the |
| 164 | communication channel must support such a message. For example, raw |
| 165 | RS-232 would be hard to support.}. The following constraints also |
| 166 | apply: |
| 167 | \begin{itemize} |
| 168 | \item <SYN> and <request> must be sent from the client to the daemon. |
| 169 | \item <response> and <notification> must be sent from the daemon to |
| 170 | the client. |
| 171 | \end{itemize} |
| 172 | Note that the definition of <word> means that the only way to |
| 173 | represent an empty word is by a pair of citation marks. |
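Going in the other direction, a client has to quote the words it sends. The Python sketch below shows one way of doing so that is consistent with the grammar above; the function names \texttt{quote\_word} and \texttt{build\_request} are invented for this example. It uses backslash escapes only, except for the empty word, which is written as a pair of citation marks. It also follows the <line> rule literally and puts a space after every word, including the last one; whether the daemon also accepts a line without that trailing space is not stated here.

```python
def quote_word(word: str) -> str:
    """Quote a single word so that it survives tokenization unchanged."""
    if "\0" in word:
        raise ValueError("NUL characters cannot be transferred")
    if word == "":
        # The only representation of an empty word.
        return '""'
    out = []
    for c in word:
        # Escape every character that <COMMON-CHAR> excludes.
        if c in ' \t\r\n"\\':
            out.append("\\")
        out.append(c)
    return "".join(out)


def build_request(*words: str) -> bytes:
    """Build one request line, UTF-8 encoded and terminated by CRLF."""
    line = "".join(quote_word(w) + " " for w in words) + "\r\n"
    return line.encode("utf-8")
```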
| 174 | |
| 175 | In each request line, there should be at least one word, but it is not |
| 176 | considered a syntax error if there is not. The first word in each |
| 177 | request line is considered the name of the command to be carried out |
| 178 | by the daemon. An empty line is a valid request as such, but since no |
| 179 | command matches it, it will provoke the same kind of error response as |
| 180 | if a request with any other non-existing command were sent. Any |
| 181 | remaining words on the line are considered arguments to the command. |
| 182 | |
| 183 | \section{Requests} |
| 184 | For each arriving request, the daemon checks that the request |
| 185 | passes a number of tests before carrying it out. First, it matches the |
| 186 | name of the command against the list of known commands to see if the |
| 187 | request calls a valid command. If the command is not valid, the daemon |
| 188 | sends a response with code 500. Then, it checks that the request has |
| 189 | the minimum required number of parameters for the given command. If it |
| 190 | does not, it responds with a 501 code. Last, it checks that the |
| 191 | user account issuing the request has the necessary permissions to have |
| 192 | the request carried out. If it does not, it responds with a 502 |
| 193 | code. After that, any responses are individual to the command in |
| 194 | question. The intention of this section is to list them all. |
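The order of these checks can be summarized by the following Python sketch. It is not taken from the daemon's source; the \texttt{CommandSpec} type, the \texttt{precheck} function and the shape of the command table are assumptions made for the illustration. In particular, the sketch assumes that a command may name several permissions and that all of them must be held.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Optional, Set


class CommandSpec(NamedTuple):
    min_args: int          # minimum number of arguments after the name
    perms: FrozenSet[str]  # permissions assumed to be required


def precheck(words: List[str],
             commands: Dict[str, CommandSpec],
             held: Set[str]) -> Optional[int]:
    """Return 500, 501 or 502 if a generic check fails, else None."""
    # An empty line has no command name and so matches no command.
    spec = commands.get(words[0]) if words else None
    if spec is None:
        return 500   # unknown command
    if len(words) - 1 < spec.min_args:
        return 501   # too few parameters
    if not spec.perms <= held:
        return 502   # missing permissions
    return None      # command-specific handling takes over
```

The point of the sketch is the ordering: a request with an unknown command name is answered with 500 regardless of its arguments or of the permissions held.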
| 195 | |
| 196 | \subsection{Permissions} |
| 197 | |
| 198 | As for the permissions mentioned above, it is outside the scope of |
| 199 | this document to describe the administration of |
| 200 | permissions\footnote{Please see the \texttt{doldacond.conf(5)} man |
| 201 | page for more information on that topic.}, but since some commands require |
| 202 | certain permissions, they need at least be specified. When a connection |
| 203 | is established, it is associated with no permissions. At that point, |
| 204 | only requests that do not require any permissions can be successfully |
| 205 | issued. Normally, the first thing a client would do is to authenticate |
| 206 | to the daemon. At the end of a successful authentication, the daemon |
| 207 | associates the proper permissions with the connection over which |
| 208 | authentication took place. The possible permissions are listed in |
| 209 | table \ref{tab:perm}. |
| 210 | |
| 211 | \begin{table} |
| 212 | \begin{tabular}{rl} |
| 213 | Name & General description \\ |
| 214 | \hline |
| 215 | \texttt{admin} & Required for all commands that administer the |
| 216 | daemon. \\ |
| 217 | \texttt{fnetctl} & Required for all commands that alter the state of |
| 218 | connected hubs. \\ |
| 219 | \texttt{trans} & Required for all commands that alter the state of |
| 220 | file transfers. \\ |
| 221 | \texttt{transcu} & Required specifically for cancelling uploads. \\ |
| 222 | \texttt{chat} & Required for exchanging chat messages. \\ |
| 223 | \texttt{srch} & Required for issuing and querying searches. \\ |
| 224 | \end{tabular} |
| 225 | \caption{The list of available permissions} |
| 226 | \label{tab:perm} |
| 227 | \end{table} |
| 228 | |
| 229 | \subsection{Protocol revisions} |
| 230 | \label{rev} |
| 231 | Since Dolda Connect is developing, its command set may change |
| 232 | occasionally. Sometimes new commands are added, sometimes commands |
| 233 | change argument syntax, and sometimes commands are removed. In order |
| 234 | for clients to be able to cleanly cope with such changes, the protocol |
| 235 | is revisioned. When a client connects to the daemon, the daemon |
| 236 | indicates in the first response it sends the range of protocol |
| 237 | revisions it supports, and each command listed below specifies the |
| 238 | revision number from which its current specification is valid. A |
| 239 | client should check the revision range from the daemon so that |
| 240 | it includes the revision that incorporates all commands that it wishes |
| 241 | to use. |
| 242 | |
| 243 | Whenever the protocol changes at all, it is given a new revision |
| 244 | number. If the entire protocol is backwards compatible with the |
| 245 | previous revision, the revision range sent by the server is updated to |
| 246 | extend forward to the new revision. If the protocol in any way is not |
| 247 | compatible with the previous revision, the revision range is moved |
| 248 | entirely to the new revision. Therefore, a client can check for a |
| 249 | certain revision and be sure that everything it wants is supported by |
| 250 | the daemon. |
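On the client side, the check described above amounts to a simple range test. The following Python sketch illustrates it; the function names and the idea of keeping a table that maps each command a client uses to the revision of its specification are assumptions made for the example.

```python
from typing import Dict


def required_revision(command_revs: Dict[str, int]) -> int:
    """The revision a client needs: the newest among its commands."""
    return max(command_revs.values())


def revision_supported(lorev: int, hirev: int, wanted: int) -> bool:
    """True if the daemon's revision range includes the wanted revision."""
    return lorev <= wanted <= hirev
```

Because the lower end of the range moves forward whenever compatibility is broken, a client written against revision 2 would reject a daemon that announces a range starting at some later revision, which is the intended behaviour.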
| 251 | |
| 252 | At the time of this writing, the latest protocol revision is 2. Please |
| 253 | see the file \texttt{doc/protorev} that comes with the Dolda Connect |
| 254 | source tree for a full list of revisions and what changed between |
| 255 | them. |
| 256 | |
| 257 | \subsection{List of commands} |
| 258 | |
| 259 | What follows is a (hopefully) exhaustive listing of all commands valid |
| 260 | for a request. For each possible request, it includes the name of the |
| 261 | command for the request, the permissions required, the syntax for the |
| 262 | entire request line, and the possible responses. |
| 263 | |
| 264 | The syntax of the request and response lines is described in a format |
| 265 | like that traditionally used in \unix\ man pages, with a number of terms, |
| 266 | each corresponding to a word in the line. Each term in the syntax |
| 267 | description is either a literal string, written in lower case; an |
| 268 | argument, written in upper case and meant to be replaced by some other |
| 269 | text as described; an optional term, enclosed in brackets |
| 270 | (``\texttt{[}'' and ``\texttt{]}''); or a list of alternatives, |
| 271 | enclosed in braces (``\texttt{\{}'' and ``\texttt{\}}'') and separated |
| 272 | by pipes (``\texttt{|}''). Possible repetition of a term is indicated |
| 273 | by three dots (``\texttt{...}''), and, for the purpose of repetition, |
| 274 | terms may be grouped with parentheses (``\texttt{(}'' and |
| 275 | ``\texttt{)}''). |
| 276 | |
| 277 | Two things should be noted regarding the responses. First, in the |
| 278 | syntax description of responses, the response code is given as the |
| 279 | first term, even though it is not actually considered a word. Second, |
| 280 | more words may follow after the specified syntax, and should be |
| 281 | discarded by a client. Many responses use that to include a human |
| 282 | readable string to indicate the conclusion of the request. |
| 283 | |
| 284 | \subsubsection{Connection} |
| 285 | As mentioned above, the act of connecting to the daemon is itself |
| 286 | considered a request, soliciting a response. Such a request obviously |
| 287 | has no command name and no syntax, but needs a description |
| 288 | nonetheless. |
| 289 | |
| 290 | \revision{1} |
| 291 | |
| 292 | \noperm |
| 293 | |
| 294 | \begin{responses} |
| 295 | \response{200} |
| 296 | The old response given by daemons not yet using the revisioned |
| 297 | protocol. Clients receiving this response should consider it an |
| 298 | error. |
| 299 | \response{201 LOREV HIREV} |
| 300 | Indicates that the connection is accepted. The \param{LOREV} and |
| 301 | \param{HIREV} parameters specify the range of supported protocol |
| 302 | revisions, as described in section \ref{rev}. |
| 303 | \response{502 REASON} |
| 304 | The connection is refused by the daemon and will be closed. The |
| 305 | \param{REASON} parameter states the reason for the refusal in |
| 306 | English\footnote{So it is probably not suitable for localized |
| 307 | programs}. |
| 308 | \end{responses} |
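A client might handle the three possible responses along the lines of the Python sketch below. The function name \texttt{parse\_greeting} is hypothetical, and the sketch makes two simplifying assumptions: that the greeting is a single line with its CRLF already removed, and that \param{LOREV} and \param{HIREV} are plain decimal numbers that can be separated by splitting on whitespace. Any words after \param{HIREV} are discarded, as described above.

```python
from typing import Tuple


def parse_greeting(line: str) -> Tuple[int, int]:
    """Return (LOREV, HIREV) from the greeting, or raise ConnectionError."""
    code, rest = line[:3], line[4:]
    if code == "201":
        fields = rest.split()
        if len(fields) < 2:
            raise ConnectionError("201 greeting lacks LOREV and HIREV")
        # Anything after HIREV is ignored.
        return int(fields[0]), int(fields[1])
    if code == "200":
        # Old daemons without protocol revisions: treated as an error.
        raise ConnectionError("daemon does not use the revisioned protocol")
    if code == "502":
        # The rest of the line is passed on unparsed (still quoted).
        raise ConnectionError("connection refused: " + rest)
    raise ConnectionError("unexpected greeting: " + line)
```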
| 309 | |
| 310 | \input{commands} |
| 311 | |
| 312 | \end{document} |