\documentclass[twoside,a4paper,11pt]{article}

\usepackage[T1]{fontenc}
\usepackage[utf8x]{inputenc}
\usepackage[ps2pdf]{hyperref}
\usepackage{reqlist}

\newcommand{\urlink}[1]{\texttt{<#1>}}
\newcommand{\unix}{\textsc{Unix}}

\title{Dolda Connect protocol}
\author{Fredrik Tolf\\\texttt{<fredrik@dolda2000.com>}}

\begin{document}

\maketitle

\tableofcontents

\section{Introduction}
Dolda Connect consists partly of a daemon (a.k.a.\ server) that runs in
the background and carries out all the actual work, and partly of a
number of client programs (a.k.a.\ user interfaces) that connect to
the daemon in order to tell it what to do. For the daemon and the
clients to be able to talk to each other, a protocol is needed. This
document describes that protocol, so that third parties can write
their own client programs.

It is worth noting that there is a library, called \texttt{libdcui},
that carries out much of the low-level work of speaking the protocol,
which makes it easier to write new client programs. \texttt{libdcui}
itself is written in the C programming language and is intended to be
used by other programs written in C, but there are also wrapper
libraries for both GNU Guile (the GNU project's Scheme interpreter)
and Python. The former is distributed with the main Dolda Connect
source tree, while the latter is distributed separately (for technical
reasons). To get a copy, please refer to Dolda Connect's homepage at
\urlink{http://www.dolda2000.com}.

\section{Transport format}
Note: Everything covered in this section is handled by the
\texttt{libdcui} library. Thus, if you are reading this only because
you want to write a client, and you are using the library (or one of
the wrapper libraries), you can safely skip this section. It may still
be worth reading in order to understand the semantics of the
protocol, however.

The protocol can be spoken over any channel that provides a
byte-oriented, reliable circuit, virtual or otherwise. Usually, it is
spoken over a TCP connection or a byte-oriented (stream)
\unix\ socket. The usual port number for TCP connections is 1500, but
any port could be used\footnote{However, port 1500 is what the
  \texttt{libdcui} library uses if no port is explicitly stated, so it
  is probably to be preferred.}.
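
For readers implementing the transport themselves, the following
Python sketch shows one way to open such a circuit using only the
standard \texttt{socket} module. The helper \texttt{open\_connection}
and its parameters are hypothetical and not part of \texttt{libdcui}
or its wrappers; the only protocol facts it relies on are the default
TCP port 1500 and the alternative of a \unix\ stream socket.

```python
import socket

# Port that libdcui uses when no port is explicitly given.
DEFAULT_PORT = 1500


def open_connection(host="localhost", port=DEFAULT_PORT, unix_path=None):
    """Open a reliable, byte-oriented circuit to the daemon (sketch).

    If unix_path is given, connect to a Unix stream socket at that path;
    otherwise connect over TCP to host and port.
    """
    if unix_path is not None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(unix_path)
        except OSError:
            sock.close()
            raise
        return sock
    # create_connection tries each address that host resolves to
    # (IPv4 and IPv6) until one of them accepts the connection.
    return socket.create_connection((host, port))
```

Since connecting is itself a request, a client using such a helper
would read the daemon's first response before sending anything.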

\subsection{Informal description}

On top of the provided byte-oriented connection, the most basic level
of the protocol is a stream of Unicode characters, encoded with
UTF-8. This character stream is then grouped on two levels: into
lines, which in turn consist of words (a.k.a.\ tokens). Lines are
separated by CRLF sequences (\emph{not} just CR or LF), and words are
separated by whitespace. Both whitespace and CRLFs can be quoted,
however, overriding their normal interpretation as separators and
allowing them to be parts of words. NUL characters may not be
transferred at all, but all other Unicode code points are allowed.
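
A minimal sketch of this character layer in Python follows. It is an
illustration under our own assumptions rather than a description of
\texttt{libdcui}: the class \texttt{CharStream} is hypothetical, and it
simply decodes received bytes incrementally (since a multi-byte UTF-8
sequence may be split between two reads) and rejects NUL characters.

```python
import codecs


class CharStream:
    """Turn received byte chunks into Unicode text (hypothetical helper)."""

    def __init__(self):
        # The incremental decoder buffers an incomplete UTF-8 sequence
        # at the end of one chunk until the rest of it arrives.
        self._decoder = codecs.getincrementaldecoder("utf-8")()

    def feed(self, data: bytes) -> str:
        # Invalid UTF-8 raises UnicodeDecodeError (a ValueError).
        text = self._decoder.decode(data)
        if "\x00" in text:
            # NUL (U+0000) may not be transferred at all.
            raise ValueError("NUL character in protocol stream")
        return text
```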

Lines transmitted from the daemon to the client are slightly
different, however. They all start with a three-digit code, followed
by either a space or a dash\footnote{Yes, this is inspired by FTP and
  SMTP.}, followed by the normal sequence of words. The three-digit
code identifies the type of the line. Overall, the protocol is a
lock-step protocol, where the client sends one line that is
interpreted as a request, and the daemon replies with one or more
lines. In a multi-line response, all lines except the last have the
three-digit code followed by a dash. The last line of a multi-line
response and the only line of a single-line response have the
three-digit code followed by a space. All lines of a multi-line
response have the same three-digit code. The client is not allowed to
send another request until the last line of the previous response has
been received.

The one exception to this lock-step pattern is that the daemon may
send (but only if the client has requested it to do so) sporadic lines
of asynchronous notification messages. Notification message lines are
distinguished by their three-digit codes always beginning with the
digit 6. For all other lines, the first digit of the three-digit code
indicates the overall success or failure of a request. Codes beginning
with 2 indicate that the request to which they belong succeeded. Codes
beginning with 3 indicate that the request succeeded in itself, but
that it is part of a sequence of commands, and that the sequence still
requires additional interaction before it is considered
successful. Codes beginning with 5 indicate errors. The remaining two
digits merely distinguish between different outcomes. Note that
notification message lines may come at \emph{any} time, even in the
middle of multi-line responses (though not in the middle of another
line). There are no multi-line notifications.
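
As a sketch of how a client might interpret this prefix, the Python
functions below take the raw text of one line received from the
daemon, with its terminating CRLF removed, and split off the
three-digit code and its separator before the rest is split into
words. The names \texttt{split\_code}, \texttt{kind} and
\texttt{CodePrefix} are hypothetical; the meanings of the first digit
are the ones given above.

```python
from typing import NamedTuple

# Meaning of the first digit of a three-digit code.
KINDS = {"2": "success", "3": "more-needed", "5": "error", "6": "notification"}


class CodePrefix(NamedTuple):
    code: int    # the three-digit code
    last: bool   # True if followed by a space (last line of a response)
    rest: str    # remainder of the line, still to be split into words


def split_code(line: str) -> CodePrefix:
    """Peel the code and the space or dash off a line from the daemon."""
    digits, sep = line[:3], line[3:4]
    if len(digits) != 3 or any(c not in "0123456789" for c in digits):
        raise ValueError("line does not start with a three-digit code")
    if sep not in (" ", "-"):
        raise ValueError("code not followed by a space or a dash")
    return CodePrefix(int(digits), sep == " ", line[4:])


def kind(code: int) -> str:
    """Classify a line by the first digit of its code."""
    return KINDS.get(str(code)[0], "unknown")
```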

The act of connecting to the daemon is itself considered a request,
soliciting a success or failure response, so it is the daemon that
first transmits actual data. A failure response may be provoked by a
client connecting from a prohibited source.

Quoting of special characters in words may be done in two ways. First,
the backslash character escapes any special interpretation of the
character that follows it, no matter where it occurs or what the
following character is (it need not even be a special
character). Thus, the only way to include a backslash in a word is to
escape it with another backslash. Second, the interpretation of
whitespace as a separator may be suppressed with the citation mark
character (only the ASCII one, U+0022, not any other Unicode quotes),
by enclosing a string containing whitespace in citation marks. (Note
that the citation marks need not be placed at word boundaries, so the
string ``\texttt{a"b c"d}'' is parsed as the single word ``\texttt{ab
  cd}''.) This dual layer of quoting may seem like a liability when
implementing the protocol, but it is quite convenient when talking
directly to the daemon with a program such as \texttt{telnet}.
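
Both quoting mechanisms can be handled in a single pass over the
characters. The Python sketch below splits decoded text into lines of
words; the function \texttt{parse\_lines} is hypothetical and not part
of \texttt{libdcui}. Following the formal grammar in the next
subsection, it assumes that an unescaped CRLF always ends a line, so a
line break can only become part of a word by means of backslashes.
Incomplete input is handed back so that it can be parsed again once
more text has arrived.

```python
def parse_lines(text: str):
    """Split decoded protocol text into lines of words (sketch).

    Returns (lines, rest): lines holds the complete lines, each a list
    of words; rest is an incomplete trailing line to be parsed again
    once more text has arrived.
    """
    lines = []
    words, word = [], None    # word is None while no word is in progress
    quoted = False            # currently inside citation marks?
    start = i = 0             # start of the current line, read position
    n = len(text)
    while i < n:
        c = text[i]
        if c == "\\":
            if i + 1 >= n:
                break                     # escaped character not here yet
            word = (word or "") + text[i + 1]
            i += 2
            continue
        if c == '"':
            quoted = not quoted
            word = word or ""             # so that "" yields an empty word
        elif c in " \t":
            if quoted:
                word += c                 # whitespace is part of the word
            elif word is not None:
                words.append(word)        # whitespace ends the word
                word = None
        elif c == "\r":
            if i + 1 >= n:
                break                     # the LF may not have arrived yet
            if text[i + 1] != "\n" or quoted:
                raise ValueError("bare CR or unterminated citation mark")
            if word is not None:
                words.append(word)
            lines.append(words)
            words, word = [], None
            i += 2
            start = i
            continue
        elif c == "\n":
            raise ValueError("bare LF")
        else:
            word = (word or "") + c
        i += 1
    return lines, text[start:]
```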

\subsection{Formal description}

Formally, the syntax of the protocol may be defined with the following
BNF rules. Note that they all operate on Unicode characters, not bytes.

\begin{tabular}{lcl}
<session> & ::= & <SYN> <response> \\
 & & | <session> <transaction> \\
 & & | <session> <notification> \\
<transaction> & ::= & <request> <response> \\
<request> & ::= & <line> \\
<response> & ::= & <resp-line-last> \\
 & & | <resp-line-not-last> <response> \\
 & & | <notification> <response> \\
<resp-line-last> & ::= & <resp-code> <SPACE> <line> \\
<resp-line-not-last> & ::= & <resp-code> <DASH> <line> \\
<notification> & ::= & <notification-code> <SPACE> <line> \\
<resp-code> & ::= & ``\texttt{2}'' <digit> <digit> \\
 & & | ``\texttt{3}'' <digit> <digit> \\
 & & | ``\texttt{5}'' <digit> <digit> \\
<notification-code> & ::= & ``\texttt{6}'' <digit> <digit> \\
<line> & ::= & <CRLF> \\
 & & | <word> <CRLF> \\
 & & | <word> <ws> <line> \\
<word> & ::= & <COMMON-CHAR> \\
 & & | ``\texttt{$\backslash$}'' <CHAR> \\
 & & | ``\texttt{"}'' <quoted-word> ``\texttt{"}'' \\
 & & | <word> <word> \\
<quoted-word> & ::= & ``'' \\
 & & | <COMMON-CHAR> <quoted-word> \\
 & & | <ws> <quoted-word> \\
 & & | ``\texttt{$\backslash$}'' <CHAR> <quoted-word> \\
<ws> & ::= & <1ws> | <1ws> <ws> \\
<1ws> & ::= & <SPACE> | <TAB> \\
<digit> & ::= & ``\texttt{0}'' | ``\texttt{1}'' | ``\texttt{2}'' |
``\texttt{3}'' | ``\texttt{4}'' \\
 & & | ``\texttt{5}'' | ``\texttt{6}'' | ``\texttt{7}'' |
``\texttt{8}'' | ``\texttt{9}''
\end{tabular}

As for the terminal symbols, <SPACE> is U+0020, <TAB> is U+0009,
<CRLF> is the sequence of U+000D and U+000A, <DASH> is U+002D, <CHAR>
is any Unicode character except U+0000, <COMMON-CHAR> is any
Unicode character except U+0000, U+0009, U+000A, U+000D, U+0020,
U+0022 and U+005C, and <SYN> is the out-of-band message that
establishes the communication channel\footnote{This means that the
  communication channel must support such a message. For example, raw
  RS-232 would be hard to support.}. The following constraints also
apply:
\begin{itemize}
\item <SYN> and <request> must be sent from the client to the daemon.
\item <response> and <notification> must be sent from the daemon to
  the client.
\end{itemize}
Note that the definition of <word> means that the only way to
represent an empty word is by a pair of citation marks.
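
The <response> rule, which allows notifications before any of its
lines, maps fairly directly onto a loop. The Python sketch below
assumes that each line from the daemon has already been split into a
code, a flag telling whether the code was followed by a space, and the
remaining words; the name \texttt{read\_response} and this tuple layout
are hypothetical. Notification lines are handed to a callback wherever
they occur, and the other lines are collected up to and including the
last one.

```python
from typing import Callable, Iterator, List, Tuple

# One daemon line after splitting: (code, is_last_line, words).
Line = Tuple[int, bool, List[str]]


def read_response(lines: Iterator[Line],
                  notify: Callable[[int, List[str]], None]) -> List[Line]:
    """Collect the lines of one <response>, diverting notifications."""
    response: List[Line] = []
    for code, last, words in lines:
        if 600 <= code <= 699:
            notify(code, words)        # never part of the response itself
            continue
        if response and code != response[0][0]:
            raise ValueError("code changed within a multi-line response")
        response.append((code, last, words))
        if last:
            return response            # code was followed by a space
    raise EOFError("connection closed in the middle of a response")
```

Notifications arriving between transactions (the <session> rule) would
need similar handling while the client is otherwise idle.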

In each request line, there should be at least one word, but it is not
considered a syntax error if there is none. The first word in each
request line is the name of the command to be carried out by the
daemon, and any remaining words on the line are arguments to the
command. An empty line is a valid request as such, but since it
matches no command, it provokes the same kind of error response as a
request naming any other non-existent command.
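
Conversely, a client has to quote its words before sending a
request. The Python sketch below is one possible approach; the helper
names \texttt{quote\_word} and \texttt{request\_line} are hypothetical,
and the choice of when to use citation marks rather than backslashes is
ours. It escapes backslashes, citation marks and line-break characters
with a backslash, wraps words containing whitespace in citation marks,
and sends an empty word as a pair of citation marks.

```python
def quote_word(word: str) -> str:
    """Quote one word so that the daemon reads it back unchanged."""
    if "\x00" in word:
        raise ValueError("NUL characters cannot be transferred")
    if word == "":
        return '""'                    # the only way to send an empty word
    # A backslash escapes any character, including CR and LF.
    body = "".join("\\" + c if c in '\\"\r\n' else c for c in word)
    # Whitespace is protected by enclosing the word in citation marks.
    if any(c in " \t" for c in word):
        return '"' + body + '"'
    return body


def request_line(command: str, *args: str) -> str:
    """Build a request: the command name followed by its arguments."""
    return " ".join(quote_word(w) for w in (command,) + args) + "\r\n"
```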

\section{Requests}
For each arriving request, the daemon checks that the request passes a
number of tests before carrying it out. First, it matches the name of
the command against the list of known commands to see whether the
request calls a valid command. If the command is not valid, the daemon
sends a response with code 500. Then, it checks that the request has
the minimum required number of parameters for the given command. If it
does not, the daemon responds with a 501 code. Finally, it checks that
the user account issuing the request has the permissions necessary to
have the request carried out. If it does not, the daemon responds with
a 502 code. After that, any responses are specific to the command in
question. The intention of this section is to list them all.
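
Since these three checks are common to all commands, a client can
handle their codes in one place. The Python sketch below is one
possible approach and not part of \texttt{libdcui}: the names
\texttt{RequestError}, \texttt{check} and \texttt{GENERIC\_ERRORS} are
hypothetical, and any other code beginning with 5 is simply reported
as a command-specific error.

```python
# Codes sent when a request fails one of the generic checks,
# in the order in which the daemon performs them.
GENERIC_ERRORS = {
    500: "no such command",
    501: "too few arguments",
    502: "permission denied",
}


class RequestError(Exception):
    """Raised for a response whose code begins with 5 (hypothetical)."""

    def __init__(self, code: int, words):
        self.code = code
        self.words = list(words)
        reason = GENERIC_ERRORS.get(code, "command-specific error")
        super().__init__("%d: %s" % (code, reason))


def check(code: int, words):
    """Return the words of a successful response; raise on errors."""
    if 500 <= code <= 599:
        raise RequestError(code, words)
    return list(words)
```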

\subsection{Permissions}

It is outside the scope of this document to describe the
administration of the permissions mentioned above\footnote{Please see
  the \texttt{doldacond.conf(5)} man page for more information on that
  topic.}, but since some commands require certain permissions, the
permissions themselves at least need to be listed. When a connection
is established, it is associated with no permissions. At that point,
only requests that do not require any permissions can be successfully
issued. Normally, the first thing a client would do is to authenticate
to the daemon. At the end of a successful authentication, the daemon
associates the proper permissions with the connection over which
authentication took place. The possible permissions are listed in
table~\ref{tab:perm}.

\begin{table}
  \begin{tabular}{rl}
    Name & General description \\
    \hline
    \texttt{admin} & Required for all commands that administer the
    daemon. \\
    \texttt{fnetctl} & Required for all commands that alter the state of
    connected hubs. \\
    \texttt{trans} & Required for all commands that alter the state of
    file transfers. \\
    \texttt{transcu} & Required specifically for cancelling uploads. \\
    \texttt{chat} & Required for exchanging chat messages. \\
    \texttt{srch} & Required for issuing and querying searches. \\
  \end{tabular}
  \caption{The list of available permissions}
  \label{tab:perm}
\end{table}

\subsection{Protocol revisions}
\label{rev}
As Dolda Connect develops, its command set may change
occasionally. Sometimes new commands are added, sometimes commands
change argument syntax, and sometimes commands are removed. In order
for clients to be able to cope cleanly with such changes, the protocol
is revisioned. When a client connects to the daemon, the daemon
indicates in the first response it sends the range of protocol
revisions it supports, and each command listed below specifies the
revision number from which its current specification is valid. A
client should check that the revision range sent by the daemon
includes the revision that incorporates all commands it wishes to
use.

Whenever the protocol changes at all, it is given a new revision
number. If the entire protocol is backwards compatible with the
previous revision, the revision range sent by the daemon is extended
forward to include the new revision. If the protocol is in any way
incompatible with the previous revision, the revision range is moved
entirely to the new revision. Therefore, a client can check for a
single revision and be sure that everything it wants is supported by
the daemon.
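
The two rules above can be modelled in a few lines of Python. This is
only an illustration: the function names \texttt{next\_range} and
\texttt{supports} are hypothetical, and so are the revision numbers
used in the example.

```python
from typing import Tuple


def next_range(lorev: int, hirev: int, compatible: bool) -> Tuple[int, int]:
    """Model how the advertised range changes when a revision is added."""
    new = hirev + 1
    # A compatible change extends the range; otherwise it moves entirely.
    return (lorev, new) if compatible else (new, new)


def supports(lorev: int, hirev: int, wanted: int) -> bool:
    """Check whether the daemon's range includes the revision needed."""
    return lorev <= wanted <= hirev
```

For example, a client written for revision 2 accepts a daemon that
advertises the range 1 to 2, but refuses one advertising only 3 after
an incompatible change.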

At the time of this writing, the latest protocol revision is 2. Please
see the file \texttt{doc/protorev} that comes with the Dolda Connect
source tree for a full list of revisions and what changed between
them.

\subsection{List of commands}

What follows is a (hopefully) exhaustive listing of all commands valid
in a request. For each command, it includes the name of the command,
the protocol revision from which its current specification is valid,
the permissions required, the syntax of the entire request line, and
the possible responses.

The syntax of the request and response lines is described in a format
like the one traditionally used in \unix\ man pages, with a number of
terms, each corresponding to a word in the line. Each term in the
syntax description is one of the following:
\begin{itemize}
\item a literal string, written in lower case;
\item an argument, written in upper case and meant to be replaced by
  some other text as described;
\item an optional term, enclosed in brackets (``\texttt{[}'' and
  ``\texttt{]}''); or
\item a list of alternatives, enclosed in braces (``\texttt{\{}'' and
  ``\texttt{\}}'') and separated by pipes (``\texttt{|}'').
\end{itemize}
Possible repetition of a term is indicated by three dots
(``\texttt{...}''), and, for the purpose of repetition, terms may be
grouped with parentheses (``\texttt{(}'' and ``\texttt{)}'').

Two things should be noted regarding the responses. First, in the
syntax description of responses, the response code is given as the
first term, even though it is not actually considered a word. Second,
more words may follow the specified syntax, and a client should
discard them. Many responses use this to include a human-readable
string describing the outcome of the request.

\subsubsection{Connection}
As mentioned above, the act of connecting to the daemon is itself
considered a request, soliciting a response. Such a request obviously
has no command name and no syntax, but needs a description
nonetheless.

\revision{1}

\noperm

\begin{responses}
  \response{200}
  The old response given by daemons not yet using the revisioned
  protocol. Clients receiving this response should consider it an
  error.
  \response{201 LOREV HIREV}
  Indicates that the connection is accepted. The \param{LOREV} and
  \param{HIREV} parameters specify the range of supported protocol
  revisions, as described in section~\ref{rev}.
  \response{502 REASON}
  The connection is refused by the daemon and will be closed. The
  \param{REASON} parameter states the reason for the refusal in
  English\footnote{So it is probably not suitable for display in
    localized programs.}.
\end{responses}
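
Handling these responses on the client side might look like the
following Python sketch. It assumes that the daemon's first line has
already been split into its code and its words, and the function name
\texttt{parse\_greeting} is hypothetical.

```python
from typing import List, Tuple


def parse_greeting(code: int, words: List[str]) -> Tuple[int, int]:
    """Interpret the daemon's first response; return (LOREV, HIREV)."""
    if code == 201:
        if len(words) < 2:
            raise ConnectionError("malformed 201 response")
        # Any further words are human-readable text and are ignored.
        return int(words[0]), int(words[1])
    if code == 200:
        # The daemon predates revisioning; this is treated as an error.
        raise ConnectionError("daemon does not use the revisioned protocol")
    if code == 502:
        reason = words[0] if words else "no reason given"
        raise ConnectionError("connection refused: " + reason)
    raise ConnectionError("unexpected response code %d" % code)
```

A client would then check that the returned range includes the
revision it was written for.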

\input{commands}

\end{document}