Commit | Line | Data |
---|---|---|
4ae8ca60 FT |
1 | \documentclass[twoside,a4paper,11pt]{article} |
2 | ||
66e1551f FT |
3 | \usepackage[T1]{fontenc} |
4 | \usepackage[utf8x]{inputenc} | |
f6d0f511 | 5 | \usepackage[ps2pdf]{hyperref} |
66e1551f | 6 | \usepackage{reqlist} |
f7932303 | 7 | \usepackage{longtable} |
66e1551f | 8 | |
75053ec3 | 9 | \newcommand{\urlink}[1]{\texttt{<\url{#1}>}} |
4ae8ca60 FT |
10 | \newcommand{\unix}{\textsc{Unix}} |
11 | ||
12 | \title{Dolda Connect protocol} | |
13 | \author{Fredrik Tolf\\\texttt{<fredrik@dolda2000.com>}} | |
14 | ||
15 | \begin{document} | |
16 | ||
17 | \maketitle | |
18 | ||
47b71ed4 FT |
19 | \tableofcontents |
20 | ||
4ae8ca60 FT |
21 | \section{Introduction} |
22 | Dolda Connect consists partly of a daemon (a.k.a. server) that runs in | |
23 | the background and carries out all the actual work, and a number of | |
24 | client programs (a.k.a. user interfaces) that connect to the daemon in | |
25 | order to tell it what to do. In order for the daemon and the clients | |
26 | to be able to talk to each other, a protocol is needed. This document | |
27 | intends to document that protocol, so that third parties can write | |
28 | their own client programs. | |
29 | ||
30 | It is worthy of note that there exists a library, called | |
03c10091 | 31 | \texttt{libdcui}, that carries out much of the low level work of |
4ae8ca60 FT |
32 | speaking the protocol, facilitating the creation of new client |
33 | programs. In itself, \texttt{libdcui} is written in the C programming | |
34 | language and is intended to be used by other programs written in C, | |
35 | but there also exist wrapper libraries for both GNU Guile (the GNU | |
36 | project's Scheme interpreter) and for Python. The former is | |
37 | distributed with the main Dolda Connect source tree, while the latter | |
38 | is distributed separately (for technical reasons). To get a copy, | |
75053ec3 FT |
39 | please refer to Dolda Connect's homepage: |
40 | ||
41 | \urlink{http://www.dolda2000.com/~fredrik/doldaconnect/} | |
4ae8ca60 FT |
42 | |
43 | \section{Transport format} | |
66e1551f FT |
44 | Note: Everything covered in this section is handled by the |
45 | \texttt{libdcui} library. Thus, if you read this because you just want | |
46 | to write a client, and are using the library (or any of the wrapper | |
47 | libraries), you can safely skip over this section. It may still be | |
48 | interesting to read in order to understand the semantics of the | |
49 | protocol, however. | |
50 | ||
4ae8ca60 FT |
51 | The protocol can be spoken over any channel that features a |
52 | byte-oriented, reliable virtual (or not) circuit. Usually, it is | |
53 | spoken over a TCP connection or a byte-oriented \unix\ socket. The | |
54 | usual port number for TCP connections is 1500, but any port could be | |
55 | used\footnote{However, port 1500 is what the \texttt{libdcui} library | |
56 | uses if no port is explicitly stated, so it is probably to be | |
66e1551f FT |
57 | preferred}. |
58 | ||
59 | \subsection{Informal description} | |
4ae8ca60 FT |
60 | |
61 | On top of the provided byte-oriented connection, the most basic level | |
62 | of the protocol is a stream of Unicode characters, encoded with | |
63 | UTF-8. The Unicode stream is then grouped in two levels: lines | |
64 | consisting of words (a.k.a. tokens). Lines are separated by CRLF | |
65 | sequences (\emph{not} just CR or LF), and words are separated by | |
66 | whitespace. Both whitespace and CRLFs can be quoted, however, | |
67 | overriding their normal interpretation of separators and allowing them | |
68 | to be parts of words. NUL characters are not allowed to be transferred | |
69 | at all, but all other Unicode codepoints are allowed. | |
70 | ||
71 | Lines transmitted from the daemon to the client are slightly | |
72 | different, however. They all start with a three-digit code, followed | |
73 | by either a space or a dash\footnote{Yes, this is inspired by FTP and | |
74 | SMTP.}, followed by the normal sequence of words. The three-digit | |
75 | code identifies the type of line. Overall, the protocol is a | |
76 | lock-step protocol, where the client sends one line that is | |
77 | interpreted as a request, and the daemon replies with one or more | |
78 | lines. In a multi-line response, all lines except the last have the | |
79 | three-digit code followed by a dash. The last line of a multi-line | |
80 | response and the only line of a single-line response have the | |
81 | three-digit code followed by a space. All lines of a multi-line | |
82 | response have the same three-digit code. The client is not allowed to | |
83 | send another request until the last line of the previous response has | |
66e1551f FT |
84 | been received. The exception is that the daemon might send (but only |
85 | if the client has requested it to do so) sporadic lines of | |
86 | asynchronous notification messages. Notification message lines are | |
87 | distinguished by having their three-digit codes always begin with the | |
88 | digit 6. Otherwise, the first digit of the three-digit code indicates | |
89 | the overall success or failure of a request. Codes beginning with 2 | |
90 | indicate that the request to which they belong succeeded. Codes | |
91 | beginning with 3 indicate that the request succeeded in itself, but | |
92 | that it is considered part of a sequence of commands, and that the | |
93 | sequence still requires additional interaction before it is considered | |
94 | successful. Codes beginning with 5 indicate errors. The | |
95 | remaining two digits merely distinguish between different | |
96 | outcomes. Note that notification message lines may come at \emph{any} | |
97 | time, even in the middle of multiline responses (though not in the | |
98 | middle of another line). There are no multiline notifications. | |
99 | ||
100 | The act of connecting to the daemon is itself considered a request, | |
101 | soliciting a success or failure response, so it is the daemon that | |
102 | first transmits actual data. A failure response may be provoked by a | |
103 | client connecting from a prohibited source. | |
104 | ||
105 | Quoting of special characters in words may be done in two ways. First, | |
106 | the backslash character escapes any special interpretation of the | |
107 | character that comes after it, no matter where or what the following | |
108 | character is (it is not required even to be a special | |
109 | character). Thus, the only way to include a backslash in a word is to | |
110 | escape it with another backslash. Second, any interpretation of | |
111 | whitespace may be escaped using the citation mark character (only the | |
112 | ASCII one, U+0022 -- not any other Unicode quotes), by enclosing a | |
113 | string containing whitespace in citation marks. (Note that the citation | |
114 | marks need not necessarily be placed at the word boundaries, so the | |
115 | string ``\texttt{a"b c"d}'' is parsed as a single word ``\texttt{ab | |
116 | cd}''.) Technically, this dual layer of quoting may seem like a | |
117 | liability when implementing the protocol, but it is quite convenient | |
118 | when talking directly to the daemon with a program such as | |
119 | \texttt{telnet}. | |
120 | ||
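The two quoting mechanisms can be made concrete with a small Python
sketch of a word splitter. It is only an illustration under a few
assumptions: the name \texttt{split\_words} is hypothetical, the input
is one already isolated line without its terminating CRLF, and the
sketch is lenient in that it accepts any character inside citation
marks.

```python
def split_words(line):
    """Split one protocol line (terminating CRLF removed) into words."""
    words = []
    cur = []            # characters of the word being built
    in_word = False     # True once a word has started (needed for "")
    in_quotes = False   # True between a pair of citation marks
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\":
            # A backslash makes the next character literal, whatever it is.
            if i + 1 >= len(line):
                raise ValueError("backslash at end of line")
            cur.append(line[i + 1])
            in_word = True
            i += 2
            continue
        if ch == '"':
            # Citation marks toggle quoting and may appear mid-word.
            in_quotes = not in_quotes
            in_word = True
        elif ch in " \t" and not in_quotes:
            # Unquoted whitespace ends the current word, if there is one.
            if in_word:
                words.append("".join(cur))
                cur = []
                in_word = False
        else:
            cur.append(ch)
            in_word = True
        i += 1
    if in_quotes:
        raise ValueError("unterminated citation mark")
    if in_word:
        words.append("".join(cur))
    return words
```

With this sketch, the example string from the text, \texttt{a"b c"d},
comes out as the single word \texttt{ab cd}.
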
121 | \subsection{Formal description} | |
122 | ||
123 | Formally, the syntax of the protocol may be defined with the following | |
124 | BNF rules. Note that they all operate on Unicode characters, not bytes. | |
125 | ||
f7932303 | 126 | \begin{longtable}{lcl} |
66e1551f FT |
127 | <session> & ::= & <SYN> <response> \\ |
128 | & & | <session> <transaction> \\ | |
129 | & & | <session> <notification> \\ | |
130 | <transaction> & ::= & <request> <response> \\ | |
131 | <request> & ::= & <line> \\ | |
132 | <response> & ::= & <resp-line-last> \\ | |
133 | & & | <resp-line-not-last> <response> \\ | |
134 | & & | <notification> <response> \\ | |
135 | <resp-line-last> & ::= & <resp-code> <SPACE> <line> \\ | |
136 | <resp-line-not-last> & ::= & <resp-code> <DASH> <line> \\ | |
137 | <notification> & ::= & <notification-code> <SPACE> <line> \\ | |
138 | <resp-code> & ::= & ``\texttt{2}'' <digit> <digit> \\ | |
139 | & & | ``\texttt{3}'' <digit> <digit> \\ | |
140 | & & | ``\texttt{5}'' <digit> <digit> \\ | |
141 | <notification-code> & ::= & ``\texttt{6}'' <digit> <digit> \\ | |
142 | <line> & ::= & <CRLF> \\ | |
143 | & & | <word> <ws> <line> \\ | |
144 | <word> & ::= & <COMMON-CHAR> \\ | |
145 | & & | ``\texttt{$\backslash$}'' <CHAR> \\ | |
146 | & & | ``\texttt{"}'' <quoted-word> ``\texttt{"}'' \\ | |
147 | & & | <word> <word> \\ | |
148 | <quoted-word> & ::= & ``'' \\ | |
149 | & & | <COMMON-CHAR> <quoted-word> \\ | |
150 | & & | <ws> <quoted-word> \\ | |
151 | & & | ``\texttt{$\backslash$}'' <CHAR> <quoted-word> \\ | |
152 | <ws> & ::= & <1ws> | <1ws> <ws> \\ | |
153 | <1ws> & ::= & <SPACE> | <TAB> \\ | |
154 | <digit> & ::= & ``\texttt{0}'' | | |
155 | ``\texttt{1}'' | ``\texttt{2}'' | | |
156 | ``\texttt{3}'' | ``\texttt{4}'' \\ | |
157 | & & | ``\texttt{5}'' | ``\texttt{6}'' | | |
158 | ``\texttt{7}'' | ``\texttt{8}'' | | |
159 | ``\texttt{9}'' | |
f7932303 | 160 | \end{longtable} |
66e1551f FT |
161 | |
162 | As for the terminal symbols, <SPACE> is U+0020, <TAB> is U+0009, | |
163 | <CRLF> is the sequence of U+000D and U+000A, <DASH> is U+002D, <CHAR> | |
164 | is any Unicode character except U+0000, <COMMON-CHAR> is any | |
165 | Unicode character except U+0000, U+0009, U+000A, U+000D, U+0020, | |
166 | U+0022 and U+005C, and <SYN> is the out-of-band message that | |
167 | establishes the communication channel\footnote{This means that the | |
168 | communication channel must support such a message. For example, raw | |
169 | RS-232 would be hard to support.}. The following constraints also | |
170 | apply: | |
171 | \begin{itemize} | |
172 | \item <SYN> and <request> must be sent from the client to the daemon. | |
173 | \item <response> and <notification> must be sent from the daemon to | |
174 | the client. | |
175 | \end{itemize} | |
176 | Note that the definition of <word> means that the only way to | |
177 | represent an empty word is by a pair of citation marks. | |
178 | ||
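Going in the other direction, a client has to quote its words before
sending them. The following Python sketch shows one way of doing so that
stays within the rules above: special characters are escaped with a
backslash, and the empty word is written as a pair of citation marks.
The names \texttt{quote\_word} and \texttt{encode\_line} are
hypothetical. Note also that the sketch follows the informal description
and puts a single space only \emph{between} words, whereas the BNF as
written places whitespace after every word; that choice is an assumption
of this example.

```python
# Characters that would otherwise be interpreted specially in a word.
_SPECIAL = ' \t\r\n"\\'


def quote_word(word):
    """Return WORD in a form that parses back as exactly one word."""
    if "\0" in word:
        raise ValueError("NUL characters cannot be transferred")
    if word == "":
        # The only way to represent an empty word.
        return '""'
    out = []
    for ch in word:
        if ch in _SPECIAL:
            out.append("\\")
        out.append(ch)
    return "".join(out)


def encode_line(*words):
    """Join quoted words with single spaces and terminate with CRLF."""
    return " ".join(quote_word(w) for w in words) + "\r\n"
```

The resulting string would then be encoded as UTF-8 before being written
to the connection.
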
179 | In each request line, there should be at least one word, but it is not | |
180 | considered a syntax error if there is not. The first word in each | |
181 | request line is considered the name of the command to be carried out | |
182 | by the daemon. An empty line is a valid request as such, but since no | |
183 | command matches it, it will provoke the same kind of error response as | |
184 | if a request with any other non-existing command were sent. Any | |
185 | remaining words on the line are considered arguments to the command. | |
186 | ||
2466bbc8 FT |
187 | \section{Data model} |
188 | ||
189 | The main purpose of the protocol is to communicate the current state | |
190 | of the daemon to the client and keep it synchronized. Therefore, in | |
191 | order to understand the actions of the individual requests, an | |
192 | understanding of the data structures that define the current state is | |
193 | fundamental. The intent of this section is to document those structures | |
194 | in a top-down approach. | |
195 | ||
196 | \subsection{Filesharing network} | |
197 | \label{fnet} | |
198 | At the heart of the Dolda Connect daemon lies the abstraction of a | |
199 | file sharing network, often abbreviated ``filenet'' or ``fnet''. To | |
200 | the daemon, a filenet is a software module that speaks a certain | |
201 | filesharing protocol, such as the Direct Connect protocol. A client | |
202 | program will never interact directly with any filenet module, but it | |
203 | is often important to know that there are several filenet | |
204 | modules\footnote{Actually, at the time of this writing, that is false, | |
205 | as only the Direct Connect protocol is implemented. However, the | |
206 | protocol still requires it to be explicitly stated on several occasions, | |
207 | and it is nonetheless important to keep in mind that there | |
208 | \emph{could} be several filenet modules. Also, work is under way to | |
209 | implement ADC, the ``official'' successor to the Direct Connect | |
210 | protocol.}. The only detail visible to clients about a filenet is | |
211 | its name. The currently implemented filenet modules are listed in | |
212 | section \ref{fnets}, along with important information about each. | |
213 | ||
214 | \subsection{Filenet node} | |
215 | \label{fnetnode} | |
216 | The filenet node, often abbreviated ``fnetnode'', corresponds closely | |
c030a346 FT |
217 | to the Direct Connect concept of a ``hub''. In the world outside of Dolda |
218 | Connect abstractions, it is a server running software that other users | |
219 | connect to and communicate through. A fnetnode always belongs to a | |
220 | filenet, and its substructure consists of its ID number, name, | |
221 | connection state, persistent ID and user list. | |
222 | ||
223 | When a fnetnode is created, it is assigned an ID number, which is used | |
224 | to refer to it in subsequent requests. The ID number is guaranteed to | |
225 | be unique so long as the Dolda Connect daemon runs. The persistent ID, | |
226 | in contrast, is intended to be unique for as long as the server lives | |
227 | (but it is not perfect). The ``name'' of the fnetnode is the name that | |
228 | the server states. Note that the name cannot be used as a persistent | |
229 | ID at all, since server owners frequently change the name. Hopefully, | |
230 | the name means something to the end user. | |
231 | ||
232 | The connection state can take four values, referred to as | |
233 | \texttt{syn}, \texttt{hs}, \texttt{est} and \texttt{dead}, and a | |
234 | fnetnode proceeds along that order during its lifetime. It begins in | |
235 | the \texttt{syn} state, and remains there while the Dolda Connect | |
236 | daemon attempts to establish a network connection to it. When the | |
237 | network connection is established, it enters the \texttt{hs} state, | |
238 | where it remains while the initial protocol handshake is being carried | |
239 | out. It then enters the \texttt{est} state, where it remains for as | |
240 | long as it is connected. It only enters the \texttt{dead} state when | |
241 | the network connection between Dolda Connect and the server is | |
242 | severed. In essence, the fnetnode is usable while in the \texttt{est} | |
243 | state. | |
244 | ||
245 | The user list is the list of other users connected to the same | |
246 | server. It consists of a set of attribute definitions and a list of | |
247 | user objects. | |
248 | ||
249 | \subsubsection{User objects} | |
250 | A user object represents a single user connected to a file-sharing | |
251 | server. Its substructure comprises an ID, a screen name and a number | |
252 | of key-value mappings. | |
253 | ||
254 | The namespace of a user ID is the filenet which its owning fnetnode | |
255 | belongs to. The intention is that there should be a one-to-one mapping | |
256 | between (filenet, user ID) pairs and real humans. However, that ideal | |
257 | situation does not always hold true. First, real humans may choose to | |
258 | allocate several IDs for themselves (one reason to do so would be | |
259 | privacy). Second, lesser protocols, such as the Direct Connect | |
260 | protocol, cannot guarantee that a single ID cannot map to more than | |
261 | one real human. Strictly, a single ID can only be guaranteed to map to | |
262 | one real human within the scope of a fnetnode. | |
263 | ||
264 | The screen name of a user is the name that the user has chosen to be | |
265 | displayed for others to identify. It may change arbitrarily over the | |
266 | lifetime of a user ID. It is probably more human readable than the | |
267 | user ID\footnote{Although, the Direct Connect protocol implementation | |
268 | uses a user's screen name as the user ID.}. | |
269 | ||
270 | The key-value mappings represent arbitrary attributes that are | |
271 | associated with a user object. Exactly what attributes are available | |
272 | differs between different filenets and fnetnodes. | |
273 | ||
274 | \subsubsection{Attribute definitions} | |
275 | The attributes associated with a user object have a key, a value, and | |
276 | a value domain (or datatype, if you will). In order to save network | |
277 | bandwidth when transferring a user list, the value domain for | |
278 | attributes is not transferred along with the user list. Instead, a | |
279 | list of possible keys and their value domains is requested | |
280 | separately. The value domains defined as of this writing are integers, | |
281 | long integers and strings. The difference between | |
282 | an integer and a long integer is that the former must fit in a 32-bit | |
283 | variable\footnote{Yes, long integers are an ugly hack to | |
284 | facilitate C implementations.}. | |
285 | ||
286 | As mentioned above, the available attributes will differ between | |
287 | different filenets and fnetnodes, but there are a number of standard | |
288 | ones, which are listed in table \ref{tab:std-attrs}. Note that being | |
289 | standard does not mean that they will always be present -- only that | |
290 | they will have the same meaning anywhere they actually are present. | |
291 | ||
292 | \begin{table} | |
293 | \begin{tabular}{ll|p{3in}} | |
294 | Name & Domain & Description \\ | |
295 | \hline | |
296 | \texttt{descr} & String & | |
297 | A description entered by the user to | |
298 | describe herself, or, more probably, the files she is sharing. | |
299 | \\ | |
300 | \texttt{email} & String & | |
301 | The user's email address. Few users will | |
302 | probably fill this in honestly, but it is defined nonetheless. | |
303 | \\ | |
304 | \texttt{share} & Longint & | |
305 | The total number of bytes the user is sharing. | |
306 | \end{tabular} | |
307 | \caption{The standard user attributes} | |
308 | \label{tab:std-attrs} | |
309 | \end{table} | |
310 | ||
311 | \subsection{Transfer} | |
312 | \label{transfer} | |
313 | Obviously, the main purpose of the daemon is to actually transfer | |
314 | files. | |
2466bbc8 | 315 | |
66e1551f | 316 | \section{Requests} |
2466bbc8 | 317 | |
66e1551f FT |
318 | For each arriving request, the daemon checks that the request |
319 | passes a number of tests before carrying it out. First, it matches the | |
320 | name of the command against the list of known commands to see if the | |
321 | request calls a valid command. If the command is not valid, the daemon | |
322 | sends a response with code 500. Then, it checks that the request has | |
323 | the minimum required number of parameters for the given command. If it | |
324 | does not, it responds with a 501 code. Last, it checks that the | |
325 | user account issuing the request has the necessary permissions to have | |
326 | the request carried out. If it does not, it responds with a 502 | |
327 | code. After that, any responses are individual to the command in | |
328 | question. The intention of this section is to list them all. | |
329 | ||
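The order of these checks can be sketched in a few lines of Python. This
is not the daemon's actual code, only an illustration of the sequence
described above; the function name \texttt{precheck}, the layout of the
command table, and the command \texttt{examplecmd} used below are all
hypothetical, and permissions are modelled as a set of names on the
assumption that a command may require one or more of them.

```python
def precheck(words, commands, granted):
    """Return 500, 501 or 502 if a check fails, or None if all pass.

    words    -- the words of the request line, command name first
    commands -- maps command name to (minimum argument count,
                set of required permission names); hypothetical layout
    granted  -- set of permission names held by the connection
    """
    # An empty line has no command name, so it matches no command.
    name = words[0] if words else None
    spec = commands.get(name)
    if spec is None:
        return 500
    min_args, needed = spec
    if len(words) - 1 < min_args:
        return 501
    if not needed <= granted:   # subset test on the permission sets
        return 502
    return None
```

For instance, with a table containing only
\texttt{\{"examplecmd": (1, \{"admin"\})\}}, a request for
\texttt{examplecmd} with one argument from a connection that has not
authenticated yields 502, while the same request without the argument
yields 501, because the argument count is checked first.
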
330 | \subsection{Permissions} | |
331 | ||
332 | As for the permissions mentioned above, it is outside the scope of | |
333 | this document to describe the administration of | |
334 | permissions\footnote{Please see the \texttt{doldacond.conf(5)} man | |
c030a346 FT |
335 | page for more information on that topic.}, but as some commands |
336 | require certain permissions, these need at least to be specified. When a | |
337 | connection is established, it is associated with no permissions. At | |
338 | that point, only requests that do not require any permissions can be | |
339 | successfully issued. Normally, the first thing a client would do is to | |
340 | authenticate to the daemon. At the end of a successful authentication, | |
341 | the daemon associates the proper permissions with the connection over | |
342 | which authentication took place. The possible permissions are listed | |
343 | in table \ref{tab:perm}. | |
66e1551f FT |
344 | |
345 | \begin{table} | |
346 | \begin{tabular}{rl} | |
347 | Name & General description \\ | |
348 | \hline | |
349 | \texttt{admin} & Required for all commands that administer the | |
350 | daemon. \\ | |
351 | \texttt{fnetctl} & Required for all commands that alter the state of | |
352 | connected hubs. \\ | |
353 | \texttt{trans} & Required for all commands that alter the state of | |
354 | file transfers. \\ | |
355 | \texttt{transcu} & Required specifically for cancelling uploads. \\ | |
356 | \texttt{chat} & Required for exchanging chat messages. \\ | |
357 | \texttt{srch} & Required for issuing and querying searches. \\ | |
358 | \end{tabular} | |
359 | \caption{The list of available permissions} | |
360 | \label{tab:perm} | |
361 | \end{table} | |
362 | ||
363 | \subsection{Protocol revisions} | |
03ee2e4a | 364 | \label{rev} |
66e1551f FT |
365 | Since Dolda Connect is developing, its command set may change |
366 | occasionally. Sometimes new commands are added, sometimes commands | |
367 | change argument syntax, and sometimes commands are removed. In order | |
368 | for clients to be able to cleanly cope with such changes, the protocol | |
369 | is revisioned. When a client connects to the daemon, the daemon | |
370 | indicates in the first response it sends the range of protocol | |
371 | revisions it supports, and each command listed below specifies the | |
372 | revision number from which its current specification is valid. A | |
373 | client should check that the revision range sent by the daemon | |
374 | includes the revision that incorporates all commands that it wishes | |
375 | to use. | |
376 | ||
377 | Whenever the protocol changes at all, it is given a new revision | |
378 | number. If the entire protocol is backwards compatible with the | |
379 | previous version, the revision range sent by the server is updated to | |
380 | extend forward to the new revision. If the protocol in any way is not | |
381 | compatible with the previous revision, the revision range is moved | |
382 | entirely to the new revision. Therefore, a client can check for a | |
383 | certain revision and be sure that everything it wants is supported by | |
384 | the daemon. | |
385 | ||
03ee2e4a FT |
386 | At the time of this writing, the latest protocol revision is 2. Please |
387 | see the file \texttt{doc/protorev} that comes with the Dolda Connect | |
388 | source tree for a full list of revisions and what changed between | |
389 | them. | |
390 | ||
66e1551f FT |
391 | \subsection{List of commands} |
392 | ||
393 | What follows is a (hopefully) exhaustive listing of all commands valid | |
394 | for a request. For each possible request, it includes the name of the | |
03ee2e4a | 395 | command for the request, the permissions required, the syntax for the |
66e1551f FT |
396 | entire request line, and the possible responses. |
397 | ||
398 | The syntax of the request and response lines is described in a format | |
399 | like that traditionally used in \unix\ man pages, with a number of terms, | |
400 | each corresponding to a word in the line. Each term in the syntax | |
401 | description is either a literal string, written in lower case; an | |
402 | argument, written in uppercase and meant to be replaced by some other | |
403 | text as described; an optional term, enclosed in brackets | |
404 | (``\texttt{[}'' and ``\texttt{]}''); or a list of alternatives, | |
405 | enclosed in braces (``\texttt{\{}'' and ``\texttt{\}}'') and separated | |
406 | by pipes (``\texttt{|}''). Possible repetition of a term is indicated | |
407 | by three dots (``\texttt{...}''), and, for the purpose of repetition, | |
408 | terms may be grouped with parentheses (``\texttt{(}'' and | |
409 | ``\texttt{)}''). | |
410 | ||
411 | Two things should be noted regarding the responses. First, in the | |
412 | syntax description of responses, the response code is given as the | |
413 | first term, even though it is not actually considered a word. Second, | |
414 | more words may follow after the specified syntax, and should be | |
415 | discarded by a client. Many responses use that to include a human | |
416 | readable string to indicate the conclusion of the request. | |
417 | ||
418 | \subsubsection{Connection} | |
419 | As mentioned above, the act of connecting to the daemon is itself | |
420 | considered a request, soliciting a response. Such a request obviously | |
421 | has no command name and no syntax, but needs a description | |
422 | nonetheless. | |
423 | ||
03ee2e4a FT |
424 | \revision{1} |
425 | ||
66e1551f FT |
426 | \noperm |
427 | ||
428 | \begin{responses} | |
429 | \response{200} | |
430 | The old response given by daemons not yet using the revisioned | |
431 | protocol. Clients receiving this response should consider it an | |
432 | error. | |
03ee2e4a FT |
433 | \response{201 LOREV HIREV} |
434 | Indicates that the connection is accepted. The \param{LOREV} and | |
435 | \param{HIREV} parameters specify the range of supported protocol | |
436 | revisions, as described in section \ref{rev}. | |
437 | \response{502 REASON} | |
438 | The connection is refused by the daemon and will be closed. The | |
439 | \param{REASON} parameter states the reason for the refusal in | |
440 | English\footnote{So it is probably not suitable for localized | |
441 | programs}. | |
66e1551f | 442 | \end{responses} |
4ae8ca60 | 443 | |
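A client might handle the connection response along the lines of the
following Python sketch. The names \texttt{parse\_greeting},
\texttt{revision\_ok} and \texttt{ConnectionRefused} are invented for
this example. The sketch assumes that the line has already been read
and stripped of its CRLF, and it splits on plain whitespace rather than
applying the full quoting rules, which is a simplification that should
be harmless here since revision numbers need no quoting.

```python
class ConnectionRefused(Exception):
    """Raised when the daemon's first line does not accept the client."""


def parse_greeting(line):
    """Return (lorev, hirev) from the first line sent by the daemon."""
    code, sep, rest = line[:3], line[3:4], line[4:]
    if code == "201" and sep == " ":
        words = rest.split()
        if len(words) < 2:
            raise ValueError("201 response without a revision range")
        # Any words after HIREV are ignored, as the text prescribes.
        return int(words[0]), int(words[1])
    if code == "200":
        # Old daemons without protocol revisions; treated as an error.
        raise ConnectionRefused("daemon does not use the revisioned protocol")
    if code == "502":
        raise ConnectionRefused(rest)
    raise ValueError("unexpected connection response: %r" % (line,))


def revision_ok(rev_range, wanted):
    """Tell whether revision WANTED lies within the daemon's range."""
    lorev, hirev = rev_range
    return lorev <= wanted <= hirev
```

A client written against revision 2 would call
\texttt{revision\_ok(parse\_greeting(line), 2)} and give up if the
result is false.
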
f6d0f511 FT |
444 | \input{commands} |
445 | ||
2466bbc8 FT |
446 | \section{Filesharing networks} |
447 | \label{fnets} | |
448 | ||
4ae8ca60 | 449 | \end{document} |