Author:  Christopher R. Hertel
License: GNU GPL version 3 or above
         Some modules released under the LGPL v3.0 or above.

This code was developed in participation with the Protocol Freedom
Information Foundation.
See http://www.protocolfreedom.org/ for more information.

PrequelD is the Prequel PeerDist Daemon.

PeerDist is a distributed content caching system that Microsoft put
together for Windows 7 and Windows Server 2008 R2. The commercial name for
the system is "BranchCache", but if you get into the guts of their
documentation they refer to the underlying protocols as PeerDist.

Prequel is an Open Source project aimed at implementing PeerDist protocols
for Linux (and, eventually, *BSD and other Unix-like systems).

The PrequelD daemon generates PeerDist hashes from source content and
stores those hashes in a separate directory so that they can be accessed
on demand. A PeerDist client can then read the hashes, formatted as
"Content Information", instead of the actual content.

Windows 7 and Windows Server 2008 R2 systems support PeerDist Version 1
Content Information retrieval over the HTTP/1.1 and SMB 2.1 protocols.
Windows 8 systems and Windows Server 2012 servers support PeerDist
versions 1 and 2.

Requirements:

  * Linux 2.6 or above (see below for other platforms)
  * OpenSSL (again, see the notes below)

The initial release of PrequelD was written for and tested on Linux. It
should compile and run on most systems with a 2.6 or later kernel. It has
been tested on 32-bit Linux for ARM (Arch Linux) and on both 32-bit and
64-bit x86 platforms.

The long-term plan is to port PrequelD to OpenBSD, with the goal of making
it work nicely on FreeBSD and NetBSD as well.

At this point, there is no Makefile. Compile it from the command line as
shown below. The only external requirement is OpenSSL, but the code is
written to make it easy to replace OpenSSL with another hashing library.

  cc -I.. -o PrequelD PrequelD.c Gstr.c PD_peerdist1.c PD_read_config.c \
     PD_sha2_oSSL.c PD_utils.c ../ubi_sLinkList.c -lcrypto -lpthread

You can also add command-line definitions for the constants that set the
following pathnames:

  * The configuration file pathname.
    Defaults to: /etc/prequeld/pd.conf

  * The socket file pathname.
    Defaults to: /var/run/prequeld.sock

  * The key file pathname.
    Defaults to: /etc/prequeld/pd.key

The above are all default values that are compiled into the program.

When the program is run, an alternate configuration file can be specified
on the command line. The socket and key file pathnames can be changed in
the configuration file.

Configuration File Syntax and Semantics

An example configuration file, prequeld.conf, is provided. Read it.
It'll get you going more quickly than reading this mess.

The configuration file is primarily a list of key/value pairs. It may
also contain cachedir and sourcedir blocks. The blocks make it easy to
group configuration options that apply to only a select set of source
directories. So, for example, the following would be a valid
configuration file:

  # Example PrequelD config file.
  hash1 sha512;                   # "hash1" is a key, "sha512" is a value.
  verbosity 2;                    # Set the global verbosity level to 2.
  cachedir /var/prequeld/cache/   # All settings within the braces apply to
    {                             # the hash cache in /var/prequeld/cache/.
    sourcedir /var/www/;          # A single-line source directory assignment.
    sourcedir /data/share/        # A multi-line source directory assignment.
      {                           # The parameters within the braces are
      verbosity 1;                # set for the /data/share/ source dir only.
      hash1 sha256;               # We can override global settings locally.
      }
    }

In the above example, some configuration parameters are set globally and
then overridden for specific source directories. Things to note:

  * A cachedir keyword must have both a directory specification and a block,
    because a cachedir must contain at least one sourcedir (otherwise, it
    wouldn't have anything to cache).

  * A sourcedir keyword may be either a simple key/value pair or a block.
    The block contains keyword assignments that are specific to that
    sourcedir.

  * You can also put key/value pairs within a cachedir block, but remember
    that they only apply to sourcedir entries that appear *later* in the
    block.

  * Comments are introduced using a hash character ('#') and continue to the
    end of the line. Whitespace is anything recognized as such by the
    C-language isspace(3) function.
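
The ordering rule for key/value pairs inside a cachedir block is easy to
trip over, so here is a small fragment that may help. It reuses the
directories from the example above; the verbosity value is arbitrary and
chosen only for illustration.

```text
cachedir /var/prequeld/cache/
  {
  sourcedir /var/www/;      # Appears before the assignment below, so it
                            # is not affected by it.
  verbosity 3;              # Applies only to sourcedir entries that follow.
  sourcedir /data/share/;   # Appears later, so it gets verbosity 3.
  }
```

Moving the verbosity line above the first sourcedir would make it apply to
both source directories.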

Syntax:

  config     :== { ( ignored | assignment ) }
  ignored    :== <whitespace> | comment
  comment    :== '#' { <not '\0', '\n', or EOF> } ('\n' | EOF)
  assignment :== ( keyvalue | keyblock )
  keyvalue   :== keyword value ';'
  keyblock   :== keyword value block
  keyword    :== ( 'cachedir' | 'exclude' | 'hash1' | 'hash2' | 'keyfile'
               | 'logfile' | 'minblocks' | 'socket' | 'sourcedir'
               | 'verbosity' )
  value      :== ( <string> | <number> | list )
  list       :== <string> { ',' <string> }
  block      :== '{' config '}'
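
As a rough illustration of the "ignored" and "comment" productions, the
following C sketch skips whitespace and comments in a NUL-terminated
buffer. The function name skip_ignored is made up for this example and is
not taken from the PrequelD source; the real parser may well be organized
differently.

```c
#include <ctype.h>

/* Hypothetical helper (not from the PrequelD source).
 * Advance past anything the grammar calls "ignored": whitespace, as
 * defined by isspace(3), and '#' comments that run to end of line.
 * Returns a pointer to the first significant character, or to the
 * terminating NUL if nothing significant remains.
 */
const char *skip_ignored(const char *p)
{
    for (;;) {
        while (isspace((unsigned char)*p))
            p++;
        if (*p != '#')
            return p;
        /* Inside a comment: consume up to, but not past, '\n' or NUL. */
        while (*p != '\0' && *p != '\n')
            p++;
    }
}
```

A caller would invoke this before reading each keyword, value, or
punctuation character, so that comments can appear anywhere whitespace can.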

Keywords:

  cachedir  - A directory in which a cache of PeerDist hashes will be
              stored. A cachedir keyword must be given a value (a
              directory path) and must be followed by a block containing
              at least one sourcedir assignment. There is no default
              value.

  exclude   - A list of files and/or directories to be excluded from
              PeerDist hashing. Multiple exclude lines can be entered
              within the same scope. There is no default value.

  hash1     - The PeerDist v1 hash. This defaults to SHA256. Other valid
              values are SHA384, SHA512, and None. A hash type of None
              indicates that PeerDist v1 is not to be supported.

              Note: Windows Server 2008 R2 servers support SHA384 and
              SHA512 hashes, but there are no known PeerDist v1 clients
              that support those hash types.

  hash2     - The PeerDist v2 hash. This currently defaults to None
              because PeerDist v2 is not yet supported. No other values
              are currently defined.

  keyfile   - The pathname of the file containing the secret key used to
              sign the PeerDist segment hashes. This is used to ensure
              that [server, segment] pairs have unique identifiers. Note,
              however, that multiple servers may share the same key.
              Also, a different keyfile may be assigned to each
              sourcedir. The default value is /etc/prequeld.key

  logfile   - A file to which to send log messages. By default, log
              messages are sent to <stderr> when the program starts up.
              Logging is switched to syslog if the program is run as
              a daemon, or to the specified log file if there is one.
              This value may only be assigned in the global section,
              as it applies to the daemon as a whole.

  minblocks - The minimum size, in PeerDist blocks (64K), of a file that
              is to be hashed. Files smaller than this size will not be
              hashed. The default and minimum value is 1.

  socket    - The default pathname of a socket file through which clients
              may query the PrequelD server to obtain PeerDist file
              hashes. This value may only be assigned in the global
              section, as it applies to the daemon as a whole.
              The default is /var/run/prequeld.sock

  sourcedir - A directory containing files which are to be hashed. All
              subdirectories of the sourcedir will be included as well,
              unless they are excluded using the "exclude" keyword.
              This value must be assigned within a cachedir block, so
              that the daemon knows where to store the hashes once
              they are generated. There is no default sourcedir.

  verbosity - The level of diagnostic messages that are logged. The higher
              the value, the more output is generated. A value of 0
              produces only error and warning messages. Higher values
              provide more details on the personal life of the daemon.
              Values greater than 10 are ridiculous.
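
To make the minblocks rule concrete, here is a minimal C sketch of the size
check. It assumes the comparison is made against the raw byte count of the
file (no rounding up to a whole block), which is one reading of the
description above and not something confirmed against the source. The names
PD_BLOCK_SIZE and pd_big_enough are invented for this example.

```c
#include <stdint.h>

/* One PeerDist block is 64K bytes, per the minblocks description. */
#define PD_BLOCK_SIZE 65536ULL

/* Hypothetical helper (not from the PrequelD source).
 * Returns 1 if a file of fileSize bytes is large enough to be hashed,
 * 0 otherwise.  ASSUMPTION: the file size is compared directly against
 * minBlocks * 64K, without rounding the file size up to a full block.
 */
int pd_big_enough(uint64_t fileSize, uint32_t minBlocks)
{
    if (minBlocks < 1)
        minBlocks = 1;  /* 1 is both the default and the minimum value. */
    return fileSize >= (uint64_t)minBlocks * PD_BLOCK_SIZE;
}
```

Under that assumption, the default setting skips any file shorter than
65536 bytes.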

Talking to the PrequelD Server:

PrequelD maintains an internal priority queue for file processing. If
this queue is empty, PrequelD will keep itself busy by going through the
cache directories, looking for stale hash files, and deleting them. It
will also check the source directories for files that should be, but are
not yet, hashed. These activities take a back seat to hashing requests
received from actual clients (such as an HTTP server or Samba server).

Clients send requests to the Prequel Daemon via a Unix Domain socket.

If, for example, an HTTP server is looking for PeerDist hashes for file
'/var/www/bluthnr/flrbsnib/garglem.iso', but the PrequelD server has not
yet generated hashes for 'garglem.iso', the HTTP server can send a request
via the Unix Domain socket to prioritize the hashing of 'garglem.iso'.

These requests can be easily generated using the Prequel client library.
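
For anyone curious about what sits underneath the client library, the
sketch below opens a connection to the daemon's socket using only standard
POSIX calls. It is not the client library's API: pd_connect is a name
invented here, and the use of a stream socket (SOCK_STREAM) is an
assumption, since this document does not say which socket type PrequelD
listens on.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper (not part of the Prequel client library).
 * Connect to a Unix Domain socket at sockPath, for example the default
 * /var/run/prequeld.sock.  Returns a connected descriptor, or -1.
 * ASSUMPTION: the daemon listens on a SOCK_STREAM socket.
 */
int pd_connect(const char *sockPath)
{
    struct sockaddr_un addr;
    int fd;

    /* sun_path is a fixed-size array; refuse names that do not fit. */
    if (strlen(sockPath) >= sizeof(addr.sun_path))
        return -1;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, sockPath);  /* Safe: length checked above. */

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The caller owns the returned descriptor and should close(2) it when done.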

Here's the protocol used between client and server:

Each message has the following header structure:

  uchar  protoID;   # Protocol version number.
  uchar  msgType;   # Command or Error Code.
  ushort msgID;     # A client-assigned message ID.

  + The <msgID> field is used by the client if the client is multiplexing
    requests. It is opaque to, and must not be modified by, the server. It
    must be returned in the response from the server.

  + In messages sent by the client, the <msgType> field represents a
    request. The server returns simple error codes in this field. Commands
    and error codes are defined below.

  + The <protoID> is incremented whenever existing message structures are
    modified. New messages can be added without changing the value of
    <protoID>, because such additions will not break existing clients. The
    current maximum protocol level is defined in the accompanying header
    file.
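
The 4-byte header described above could be packed and unpacked along the
following lines. This is a sketch, not the project's code: the pd_header
type and both function names are invented, and the byte order of <msgID>
is an assumption (big-endian here), because this document does not specify
one.

```c
#include <stdint.h>

/* Hypothetical in-memory form of the message header. */
typedef struct {
    uint8_t  protoID;  /* Protocol version number.       */
    uint8_t  msgType;  /* Command or error code.         */
    uint16_t msgID;    /* Client-assigned message ID.    */
} pd_header;

/* Write the header into exactly 4 bytes.
 * ASSUMPTION: msgID travels most significant byte first.
 */
void pd_pack_header(uint8_t out[4], const pd_header *h)
{
    out[0] = h->protoID;
    out[1] = h->msgType;
    out[2] = (uint8_t)(h->msgID >> 8);
    out[3] = (uint8_t)(h->msgID & 0xFF);
}

/* Read the header back from 4 bytes (same byte-order assumption). */
void pd_unpack_header(pd_header *h, const uint8_t in[4])
{
    h->protoID = in[0];
    h->msgType = in[1];
    h->msgID   = (uint16_t)((in[2] << 8) | in[3]);
}
```

Packing field by field, instead of writing the struct directly to the
socket, keeps the wire layout independent of compiler padding and host
byte order.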

Commands:

  Query - Request that the server resolve a source pathname to a hash
          file pathname. If the hash file does not exist, a request to
          generate the hash file is queued and an error code is returned.

  Queue - Request that a source file be queued for hashing. The source
          pathname will be added to the queue. An error message is only
          returned if the server was unable to add the entry to the
          queue. No error is returned if (for any reason) hashing fails.
          (See the daemon log file for hashing errors.)

If the server encounters an error it will return an error message, as
shown in the next section. Clients must check the <msgType> field to
determine how to parse the response message body. If the <msgType>
value in a response is non-zero, the message should be parsed as an
error response message.
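
The check a client makes on each response header might look like the
sketch below. The enum and function names are invented for illustration.
Because the server must return <msgID> unmodified, the two ID bytes can be
compared as raw bytes, so no byte-order assumption is needed here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes for this example only. */
enum pd_reply { PD_REPLY_OK, PD_REPLY_ERROR, PD_REPLY_MISMATCH };

/* Compare a 4-byte response header against the request header that was
 * sent.  Layout: [0] protoID, [1] msgType, [2..3] msgID.
 */
enum pd_reply pd_classify_reply(const uint8_t request[4],
                                const uint8_t reply[4])
{
    /* The server echoes msgID untouched, so a byte compare suffices. */
    if (memcmp(request + 2, reply + 2, 2) != 0)
        return PD_REPLY_MISMATCH;

    /* Zero means success: parse the body as the matching response.
     * Non-zero means the body is an error response message.
     */
    return (reply[1] == 0) ? PD_REPLY_OK : PD_REPLY_ERROR;
}
```

A client that multiplexes requests would use the mismatch case to route
the response to whichever outstanding request carries that ID.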

Error codes:

  0 - No error. Success. It's okay. Nothing to see here. Move along.
      If the error code is zero, the response message will be formatted as
      a response to the matching request, and not as an error message.
  1 - <Not yet defined.>

Following the 4-byte header is the body of the message. The structure
of each message must be clearly specified since that's the only message
framing we use. Variable-length fields (e.g. strings) must be length
prefixed.

Query request:

  ushort nameLen;               # Number of bytes that follow.
  uchar  pathName[<nameLen>];   # The full source file pathname.
                                # NUL termination is optional.

Query response:

  ushort nameLen;               # The number of bytes that follow.
  uchar  pathName[<nameLen>];   # The full hash file pathname.

Queue request:

  [Identical to the query request.]

Error response:

  ulong  errno;                 # A system error code per errno(3).
  ushort msgLen;                # Length of the following error message.
  uchar  msg[<msgLen>];         # An optional error message string.
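
Putting the header and a pathname body together, a request could be
assembled as in the sketch below, with a matching helper that reads a
length-prefixed string back out of a body. Everything named here
(pd_build_request, pd_get_string, and the static helpers) is hypothetical.
Two assumptions are baked in and not confirmed by this document: 16-bit
fields are big-endian on the wire, and the pathname is sent without a
trailing NUL (which the format says is optional). The numeric <msgType>
for each command is not listed here, so the caller passes it in.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: 16-bit fields are sent most significant byte first. */
static void put_u16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
}

static uint16_t get_u16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Build header + nameLen + pathName into buf.
 * Returns the number of bytes written, or 0 if the path or buffer
 * size makes that impossible.
 */
size_t pd_build_request(uint8_t *buf, size_t bufLen, uint8_t protoID,
                        uint8_t msgType, uint16_t msgID, const char *path)
{
    size_t nameLen = strlen(path);

    if (nameLen > UINT16_MAX || bufLen < 6 + nameLen)
        return 0;
    buf[0] = protoID;
    buf[1] = msgType;
    put_u16(buf + 2, msgID);
    put_u16(buf + 4, (uint16_t)nameLen);
    memcpy(buf + 6, path, nameLen);  /* No NUL terminator is sent. */
    return 6 + nameLen;
}

/* Read one length-prefixed string (a ushort length, then that many
 * bytes) from body into out, adding a NUL terminator.
 * Returns the string length, or -1 if the data is short or out is
 * too small.
 */
int pd_get_string(const uint8_t *body, size_t bodyLen,
                  char *out, size_t outLen)
{
    uint16_t n;

    if (bodyLen < 2)
        return -1;
    n = get_u16(body);
    if (bodyLen < (size_t)n + 2 || outLen < (size_t)n + 1)
        return -1;
    memcpy(out, body + 2, n);
    out[n] = '\0';
    return (int)n;
}
```

pd_get_string fits the pathname in a query response as well as the msg
field of an error response, once the caller has stepped past the errno
field (whose wire size this sketch does not assume).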

To do:

  * Add support for PeerDist v2 Content Information.
  * Ensure that PrequelD logging plays nicely with logrotate.
  * Properly handle Unicode requests. Currently, we only handle ASCII.
  * Neither the configuration file parser nor the command parser handles any
    form of escape or quotation.
  * Properly handle SIGHUP and SIGTERM.
  * Support for the 'exclude' clause hasn't been implemented.

History:

  V1.1  Changed the communication between client and server.