\documentclass[twoside,a4paper,11pt]{article}

\usepackage[T1]{fontenc}
\usepackage[utf8x]{inputenc}
\usepackage{reqlist}

\newcommand{\url}[1]{\texttt{<#1>}}
\newcommand{\unix}{\textsc{Unix}}

\title{Dolda Connect protocol}
\author{Fredrik Tolf\\\texttt{<fredrik@dolda2000.com>}}

\begin{document}

\maketitle

\section{Introduction}
Dolda Connect consists of a daemon (a.k.a. server), which runs in the
background and carries out all the actual work, and a number of
client programs (a.k.a. user interfaces), which connect to the daemon
in order to tell it what to do. In order for the daemon and the
clients to be able to talk to each other, a protocol is needed. This
document describes that protocol, so that third parties can write
their own client programs.

It is worth noting that there exists a library, called
\texttt{libdcui}, that carries out much of the low-level work of
speaking the protocol, facilitating the creation of new client
programs. In itself, \texttt{libdcui} is written in the C programming
language and is intended to be used by other programs written in C,
but there also exist wrapper libraries for both GNU Guile (the GNU
project's Scheme interpreter) and for Python. The former is
distributed with the main Dolda Connect source tree, while the latter
is distributed separately (for technical reasons). To get a copy,
please refer to Dolda Connect's homepage at
\url{http://www.dolda2000.com}.

\section{Transport format}
Note: Everything covered in this section is handled by the
\texttt{libdcui} library. Thus, if you read this because you just want
to write a client, and are using the library (or any of the wrapper
libraries), you can safely skip over this section. It may still be
interesting to read in order to understand the semantics of the
protocol, however.

The protocol can be spoken over any channel that features a
byte-oriented, reliable virtual (or not) circuit. Usually, it is
spoken over a TCP connection or a byte-oriented \unix\ socket. The
usual port number for TCP connections is 1500, but any port could be
used\footnote{However, port 1500 is what the \texttt{libdcui} library
 uses if no port is explicitly stated, so it is probably to be
 preferred}.

\subsection{Informal description}

On top of the provided byte-oriented connection, the most basic level
of the protocol is a stream of Unicode characters, encoded with
UTF-8. The Unicode stream is then grouped in two levels: lines
consisting of words (a.k.a. tokens). Lines are separated by CRLF
sequences (\emph{not} just CR or LF), and words are separated by
whitespace. Both whitespace and CRLFs can be quoted, however,
overriding their normal interpretation as separators and allowing them
to be parts of words. NUL characters are not allowed to be transferred
at all, but all other Unicode codepoints are allowed.

Lines transmitted from the daemon to the client are slightly
different, however. They all start with a three-digit code, followed
by either a space or a dash\footnote{Yes, this is inspired by FTP and
 SMTP.}, followed by the normal sequence of words. The three-digit
code identifies the type of line. Overall, the protocol is a
lock-step protocol, where the client sends one line that is
interpreted as a request, and the daemon replies with one or more
lines. In a multi-line response, all lines except the last have the
three-digit code followed by a dash. The last line of a multi-line
response and the only line of a single-line response have the
three-digit code followed by a space. All lines of a multi-line
response have the same three-digit code. The client is not allowed to
send another request until the last line of the previous response has
been received. The exception to the lock-step rule is that the daemon
might send (but only if the client has requested it to do so) sporadic
lines of asynchronous notification messages. Notification message
lines are distinguished by having their three-digit codes always begin
with the digit 6. Otherwise, the first digit of the three-digit code
indicates the overall success or failure of a request. Codes beginning
with 2 indicate that the request to which they belong succeeded. Codes
beginning with 3 indicate that the request succeeded in itself, but
that it is considered part of a sequence of commands, and that the
sequence still requires additional interaction before it is considered
successful. Codes beginning with 5 indicate errors. The
remaining two digits merely distinguish between different
outcomes. Note that notification message lines may come at \emph{any}
time, even in the middle of multi-line responses (though not in the
middle of another line). There are no multi-line notifications.

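As an illustration of the response rules above, the following Python
sketch collects one complete response from a sequence of lines that
have already been decoded and stripped of their CRLF. It is not taken
from \texttt{libdcui}; the function names \texttt{split\_reply\_line}
and \texttt{read\_response} are made up for this example, and the
code 610 in the usage note is merely a placeholder for some
notification code.

```python
def split_reply_line(line):
    """Split one daemon line into (code, is_last, rest).

    Assumes the line is already decoded and has no trailing CRLF.
    """
    code, sep, rest = line[:3], line[3:4], line[4:]
    if len(code) != 3 or any(ch not in "0123456789" for ch in code):
        raise ValueError("missing three-digit code: %r" % line)
    if sep not in (" ", "-"):
        raise ValueError("code not followed by space or dash: %r" % line)
    return int(code), sep == " ", rest


def read_response(lines):
    """Consume lines until one response is complete.

    Returns (code, response_lines, notifications). Lines whose code
    begins with 6 are notifications and are set aside, since they may
    arrive in the middle of a multi-line response.
    """
    code, body, notifications = None, [], []
    for line in lines:
        c, is_last, rest = split_reply_line(line)
        if 600 <= c <= 699:
            notifications.append((c, rest))
            continue
        if code is not None and c != code:
            raise ValueError("code changed within a multi-line response")
        code = c
        body.append(rest)
        if is_last:
            return code, body, notifications
    raise EOFError("stream ended in the middle of a response")
```

For example, feeding it the lines \texttt{200-first}, \texttt{610 hub
event} and \texttt{200 done} yields the code 200, the two response
lines, and one notification. A real client would pass an iterator over
the connection, so that consecutive calls continue where the previous
one stopped.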
The act of connecting to the daemon is itself considered a request,
soliciting a success or failure response, so it is the daemon that
first transmits actual data. A failure response may be provoked by a
client connecting from a prohibited source.

Quoting of special characters in words may be done in two ways. First,
the backslash character escapes any special interpretation of the
character that comes after it, no matter where or what the following
character is (it need not even be a special
character). Thus, the only way to include a backslash in a word is to
escape it with another backslash. Second, any interpretation of
whitespace may be escaped using the citation mark character (only the
ASCII one, U+0022 -- not any other Unicode quotes), by enclosing a
string containing whitespace in citation marks. (Note that the citation
marks need not necessarily be placed at the word boundaries, so the
string ``\texttt{a"b c"d}'' is parsed as a single word ``\texttt{ab
 cd}''.) Technically, this dual layer of quoting may seem like a
liability when implementing the protocol, but it is quite convenient
when talking directly to the daemon with a program such as
\texttt{telnet}.

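The two quoting mechanisms can be sketched as a small word splitter in
Python. This is only an illustration under a few assumptions: the
function name \texttt{split\_words} is invented here, the input is one
line without its terminating CRLF (and, for daemon lines, without the
three-digit code and separator), and a dangling backslash or an
unterminated citation mark is treated as an error, which the text
above does not specify.

```python
def split_words(line):
    """Split a line into words, honouring backslash escapes and quotes."""
    words, current = [], []
    in_word = False    # True once a word has started, even if still empty
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\":
            # A backslash makes the next character literal, whatever it is.
            if i + 1 >= len(line):
                raise ValueError("dangling backslash")
            current.append(line[i + 1])
            in_word = True
            i += 2
            continue
        if ch == '"':
            # Quotes may open and close anywhere inside a word.
            in_quotes = not in_quotes
            in_word = True
        elif ch in " \t" and not in_quotes:
            # Unquoted whitespace ends the current word, if any.
            if in_word:
                words.append("".join(current))
                current, in_word = [], False
        else:
            current.append(ch)
            in_word = True
        i += 1
    if in_quotes:
        raise ValueError("unterminated citation mark")
    if in_word:
        words.append("".join(current))
    return words
```

With this sketch, the string \texttt{a"b c"d} from the text above
comes out as the single word \texttt{ab cd}, and a bare pair of
citation marks yields an empty word.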
\subsection{Formal description}

Formally, the syntax of the protocol may be defined with the following
BNF rules. Note that they all operate on Unicode characters, not bytes.

\begin{tabular}{lcl}
<session> & ::= & <SYN> <response> \\
 & & | <session> <transaction> \\
 & & | <session> <notification> \\
<transaction> & ::= & <request> <response> \\
<request> & ::= & <line> \\
<response> & ::= & <resp-line-last> \\
 & & | <resp-line-not-last> <response> \\
 & & | <notification> <response> \\
<resp-line-last> & ::= & <resp-code> <SPACE> <line> \\
<resp-line-not-last> & ::= & <resp-code> <DASH> <line> \\
<notification> & ::= & <notification-code> <SPACE> <line> \\
<resp-code> & ::= & ``\texttt{2}'' <digit> <digit> \\
 & & | ``\texttt{3}'' <digit> <digit> \\
 & & | ``\texttt{5}'' <digit> <digit> \\
<notification-code> & ::= & ``\texttt{6}'' <digit> <digit> \\
<line> & ::= & <CRLF> \\
 & & | <word> <ws> <line> \\
<word> & ::= & <COMMON-CHAR> \\
 & & | ``\texttt{$\backslash$}'' <CHAR> \\
 & & | ``\texttt{"}'' <quoted-word> ``\texttt{"}'' \\
 & & | <word> <word> \\
<quoted-word> & ::= & ``'' \\
 & & | <COMMON-CHAR> <quoted-word> \\
 & & | <ws> <quoted-word> \\
 & & | ``\texttt{$\backslash$}'' <CHAR> <quoted-word> \\
<ws> & ::= & <1ws> | <1ws> <ws> \\
<1ws> & ::= & <SPACE> | <TAB> \\
<digit> & ::= & ``\texttt{0}'' |
``\texttt{1}'' | ``\texttt{2}'' |
``\texttt{3}'' | ``\texttt{4}'' \\
& & | ``\texttt{5}'' | ``\texttt{6}'' |
``\texttt{7}'' | ``\texttt{8}'' |
``\texttt{9}''
\end{tabular}

As for the terminal symbols, <SPACE> is U+0020, <TAB> is U+0009,
<CRLF> is the sequence of U+000D and U+000A, <DASH> is U+002D, <CHAR>
is any Unicode character except U+0000, <COMMON-CHAR> is any
Unicode character except U+0000, U+0009, U+000A, U+000D, U+0020,
U+0022 and U+005C, and <SYN> is the out-of-band message that
establishes the communication channel\footnote{This means that the
 communication channel must support such a message. For example, raw
 RS-232 would be hard to support.}. The following constraints also
apply:
\begin{itemize}
\item <SYN> and <request> must be sent from the client to the daemon.
\item <response> and <notification> must be sent from the daemon to
 the client.
\end{itemize}
Note that the definition of <word> means that the only way to
represent an empty word is by a pair of citation marks.

In each request line, there should be at least one word, but it is not
considered a syntax error if there is not. The first word in each
request line is considered the name of the command to be carried out
by the daemon. An empty line is a valid request as such, but since it
names no matching command, it will provoke the same kind of error
response as if a request with any other non-existent command were
sent. Any remaining words on the line are considered arguments to the
command.

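Going in the other direction, a client has to turn a command name and
its arguments into a request line that satisfies the grammar. The
following Python sketch shows one way to do so; the names
\texttt{quote\_word} and \texttt{build\_request} are invented for
this example, as is the command name used in the usage note. The
sketch escapes every special character with a backslash, which is
only one of several encodings the grammar permits.

```python
SPECIAL = ' \t\r\n"\\'   # characters that are not <COMMON-CHAR>


def quote_word(word):
    """Encode one word so that it parses back as exactly that word."""
    if "\0" in word:
        raise ValueError("NUL characters cannot be transferred")
    if word == "":
        # The only way to represent an empty word.
        return '""'
    out = []
    for ch in word:
        if ch in SPECIAL:
            out.append("\\")
        out.append(ch)
    return "".join(out)


def build_request(command, *args):
    """Join the command name and its arguments into one CRLF-terminated line."""
    words = (command,) + args
    return " ".join(quote_word(w) for w in words) + "\r\n"
```

For instance, \texttt{build\_request("somecommand", "two words", "")}
produces the line \texttt{somecommand two$\backslash$ words ""}
followed by CRLF. The result is a character string and still has to be
encoded as UTF-8 before it is sent.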
\section{Requests}
For each arriving request, the daemon checks that the request
passes a number of tests before carrying it out. First, it matches the
name of the command against the list of known commands to see if the
request calls a valid command. If the command is not valid, the daemon
sends a response with code 500. Then, it checks that the request has
the minimum required number of parameters for the given command. If it
does not, it responds with a 501 code. Last, it checks that the
user account issuing the request has the necessary permissions to have
the request carried out. If it does not, it responds with a 502
code. After that, any responses are individual to the command in
question. The intention of this section is to list them all.

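The order of these three checks can be summarized in a short Python
sketch. It only models the behaviour described above and says nothing
about how the daemon is actually implemented; the \texttt{Command}
record, the \texttt{precheck} function and the command names in the
usage note are all hypothetical.

```python
from collections import namedtuple

# Hypothetical description of a command: the minimum number of
# parameters and the permission it requires (None if it needs none).
Command = namedtuple("Command", ["min_args", "permission"])


def precheck(words, commands, granted):
    """Return 500, 501 or 502 if a generic check fails, else None.

    words    -- the words of the request line, command name first
    commands -- mapping from command name to Command
    granted  -- set of permissions associated with the connection
    """
    name = words[0] if words else None
    command = commands.get(name)
    if command is None:
        return 500                  # unknown command (or empty request)
    if len(words) - 1 < command.min_args:
        return 501                  # too few parameters
    if command.permission is not None and command.permission not in granted:
        return 502                  # insufficient permissions
    return None                     # command-specific handling follows
```

With a table containing a made-up command \texttt{lookup} that takes
one parameter and requires \texttt{srch}, the request \texttt{lookup}
without parameters yields 501 regardless of permissions, because the
parameter count is checked before the permissions.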
\subsection{Permissions}

As for the permissions mentioned above, it is outside the scope of
this document to describe the administration of
permissions\footnote{Please see the \texttt{doldacond.conf(5)} man
 page for more information on that topic.}, but since some commands
require certain permissions, they need at least to be specified. When
a connection is established, it is associated with no permissions. At
that point, only requests that do not require any permissions can be
successfully issued. Normally, the first thing a client would do is to
authenticate to the daemon. At the end of a successful authentication,
the daemon associates the proper permissions with the connection over
which authentication took place. The possible permissions are listed
in table~\ref{tab:perm}.

\begin{table}
 \begin{tabular}{rl}
 Name & General description \\
 \hline
 \texttt{admin} & Required for all commands that administer the
 daemon. \\
 \texttt{fnetctl} & Required for all commands that alter the state of
 connected hubs. \\
 \texttt{trans} & Required for all commands that alter the state of
 file transfers. \\
 \texttt{transcu} & Required specifically for cancelling uploads. \\
 \texttt{chat} & Required for exchanging chat messages. \\
 \texttt{srch} & Required for issuing and querying searches. \\
 \end{tabular}
 \caption{The list of available permissions}
 \label{tab:perm}
\end{table}

\subsection{Protocol revisions}
\label{rev}
Since Dolda Connect is still being developed, its command set may
change occasionally. Sometimes new commands are added, sometimes
commands change argument syntax, and sometimes commands are removed.
In order for clients to be able to cleanly cope with such changes, the
protocol is revisioned. When a client connects to the daemon, the
daemon indicates in the first response it sends the range of protocol
revisions it supports, and each command listed below specifies the
revision number from which its current specification is valid. A
client should check that the revision range from the daemon
includes the revision that incorporates all commands that it wishes
to use.

Whenever the protocol changes at all, it is given a new revision
number. If the entire protocol is backwards compatible with the
previous revision, the revision range sent by the server is updated to
extend forward to the new revision. If the protocol in any way is not
compatible with the previous revision, the revision range is moved
entirely to the new revision. Therefore, a client can check for a
certain revision and be sure that everything it wants is supported by
the daemon.

At the time of this writing, the latest protocol revision is 2. Please
see the file \texttt{doc/protorev} that comes with the Dolda Connect
source tree for a full list of revisions and what changed between
them.

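The revision rules of this section amount to very little code. The
Python sketch below shows the check a client would perform and, for
illustration only, how the advertised range evolves when a new
revision is introduced. Both function names are invented for this
example, and revisions are assumed to be plain integers.

```python
def supports(lorev, hirev, needed):
    """True if the daemon's advertised range includes the needed revision."""
    return lorev <= needed <= hirev


def next_range(lorev, hirev, compatible):
    """Range advertised after introducing revision hirev + 1.

    A backwards compatible change extends the range forward; an
    incompatible one moves the range entirely to the new revision.
    """
    new = hirev + 1
    return (lorev, new) if compatible else (new, new)
```

A client written against revision 2 would thus accept a daemon
advertising the range 1 to 2, but not one advertising 3 to 3.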
\subsection{List of commands}

What follows is a (hopefully) exhaustive listing of all commands valid
for a request. For each possible request, it includes the name of the
command for the request, the permissions required, the syntax for the
entire request line, and the possible responses.

The syntax of the request and response lines is described in a format
like that traditionally used in \unix\ man pages, with a number of
terms, each corresponding to a word in the line. Each term in the
syntax description is either a literal string, written in lowercase;
an argument, written in uppercase and meant to be replaced by some
other text as described; an optional term, enclosed in brackets
(``\texttt{[}'' and ``\texttt{]}''); or a list of alternatives,
enclosed in braces (``\texttt{\{}'' and ``\texttt{\}}'') and separated
by pipes (``\texttt{|}''). Possible repetition of a term is indicated
by three dots (``\texttt{...}''), and, for the purpose of repetition,
terms may be grouped with parentheses (``\texttt{(}'' and
``\texttt{)}'').

Two things should be noted regarding the responses. First, in the
syntax description of responses, the response code is given as the
first term, even though it is not actually considered a word. Second,
more words may follow after the specified syntax, and should be
discarded by a client. Many responses use that to include a
human-readable string to indicate the conclusion of the request.

\subsubsection{Connection}
As mentioned above, the act of connecting to the daemon is itself
considered a request, soliciting a response. Such a request obviously
has no command name and no syntax, but needs a description
nonetheless.

\revision{1}

\noperm

\begin{responses}
 \response{200}
 The old response given by daemons not yet using the revisioned
 protocol. Clients receiving this response should consider it an
 error.
 \response{201 LOREV HIREV}
 Indicates that the connection is accepted. The \param{LOREV} and
 \param{HIREV} parameters specify the range of supported protocol
 revisions, as described in section \ref{rev}.
 \response{502 REASON}
 The connection is refused by the daemon and will be closed. The
 \param{REASON} parameter states the reason for the refusal in
 English\footnote{So it is probably not suitable for localized
 programs}.
\end{responses}
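
A client might interpret the three possible connection responses
along the following lines. This is a Python sketch, not the behaviour
of \texttt{libdcui}: the function name \texttt{parse\_greeting} and
the returned tuples are invented for the example, and it assumes that
\param{LOREV} and \param{HIREV} are transmitted as decimal integers
and that the response line has already been split into its code and
words.

```python
def parse_greeting(code, words):
    """Interpret the daemon's response to the connection itself.

    Returns ("accepted", lorev, hirev), ("refused", reason) or
    ("unrevisioned",). Words beyond the specified syntax are ignored.
    """
    if code == 201:
        if len(words) < 2:
            raise ValueError("201 response lacks LOREV and HIREV")
        return ("accepted", int(words[0]), int(words[1]))
    if code == 502:
        reason = words[0] if words else ""
        return ("refused", reason)
    if code == 200:
        # Daemon predates the revisioned protocol; treat as an error.
        return ("unrevisioned",)
    raise ValueError("unexpected connection response code %d" % code)
```

A caller would normally give up on anything but an \texttt{accepted}
result whose range includes the revision it was written for.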

\end{document}