1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
2 <!--Converted with LaTeX2HTML 98.1p1 release (March 2nd, 1998)
3 originally by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds
4 * revised and updated by: Marcus Hennecke, Ross Moore, Herb Swan
5 * with significant contributions from:
6 Jens Lippmann, Marek Rouchal, Martin Wilck and others -->
10 <META NAME="description" CONTENT="Results">
11 <META NAME="keywords" CONTENT="tech_report">
12 <META NAME="resource-type" CONTENT="document">
13 <META NAME="distribution" CONTENT="global">
14 <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
15 <LINK REL="STYLESHEET" HREF="tech_report.css">
16 <LINK REL="next" HREF="node7.html">
17 <LINK REL="previous" HREF="node5.html">
18 <LINK REL="up" HREF="tech_report.html">
19 <LINK REL="next" HREF="node7.html">
22 <!--Navigation Panel-->
25 <IMG WIDTH="37" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next"
28 HREF="tech_report.html">
29 <IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up"
33 <IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous"
34 SRC="previous.gif"></A>
36 <B> Next:</B> <A NAME="tex2html73"
37 HREF="node7.html">Availability</A>
38 <B> Up:</B> <A NAME="tex2html71"
39 HREF="tech_report.html">The rsync algorithm</A>
40 <B> Previous:</B> <A NAME="tex2html65"
41 HREF="node5.html">Pipelining</A>
44 <!--End of Navigation Panel-->
46 <H1><A NAME="SECTION00060000000000000000">
51 To test the algorithm, tar files were created of the Linux kernel
52 sources for two versions of the kernel. The two kernel versions were
53 1.99.10 and 2.0.0. These tar files are approximately 24MB in size and
54 are separated by 5 released patch levels.
57 Out of the 2441 files in the 1.99.10 release, 291 files had changed in
58 the 2.0.0 release, 19 files had been removed and 25 files had been
62 A ``diff'' of the two tar files using the standard GNU diff utility
63 produced over 32 thousand lines of output totalling 2.1 MB.
66 The following table shows the results for rsync between the two files
67 with a varying block size.<A NAME="tex2html2"
68 HREF="footnode.html#foot24"><SUP>2</SUP></A>
73 <TABLE CELLPADDING=3 BORDER="1">
74 <TR><TD ALIGN="LEFT"><B> block</B></TD>
75 <TD ALIGN="LEFT"><B> matches</B></TD>
76 <TD ALIGN="LEFT"><B> tag</B></TD>
77 <TD ALIGN="LEFT"><B> false</B></TD>
78 <TD ALIGN="LEFT"><B> data</B></TD>
79 <TD ALIGN="LEFT"><B> written</B></TD>
80 <TD ALIGN="LEFT"><B> read</B></TD>
82 <TR><TD ALIGN="LEFT"><B> size</B></TD>
83 <TD ALIGN="LEFT"> </TD>
84 <TD ALIGN="LEFT"><B> hits</B></TD>
85 <TD ALIGN="LEFT"><B> alarms</B></TD>
86 <TD ALIGN="LEFT"> </TD>
87 <TD ALIGN="LEFT"> </TD>
88 <TD ALIGN="LEFT"> </TD>
90 <TR><TD ALIGN="LEFT"><P>
92 <TD ALIGN="LEFT">64247</TD>
93 <TD ALIGN="LEFT">3817434</TD>
94 <TD ALIGN="LEFT">948</TD>
95 <TD ALIGN="LEFT">5312200</TD>
96 <TD ALIGN="LEFT">5629158</TD>
97 <TD ALIGN="LEFT">1632284</TD>
99 <TR><TD ALIGN="LEFT">500</TD>
100 <TD ALIGN="LEFT">46989</TD>
101 <TD ALIGN="LEFT">620013</TD>
102 <TD ALIGN="LEFT">64</TD>
103 <TD ALIGN="LEFT">1091900</TD>
104 <TD ALIGN="LEFT">1283906</TD>
105 <TD ALIGN="LEFT">979384</TD>
107 <TR><TD ALIGN="LEFT">700</TD>
108 <TD ALIGN="LEFT">33255</TD>
109 <TD ALIGN="LEFT">571970</TD>
110 <TD ALIGN="LEFT">22</TD>
111 <TD ALIGN="LEFT">1307800</TD>
112 <TD ALIGN="LEFT">1444346</TD>
113 <TD ALIGN="LEFT">699564</TD>
115 <TR><TD ALIGN="LEFT">900</TD>
116 <TD ALIGN="LEFT">25686</TD>
117 <TD ALIGN="LEFT">525058</TD>
118 <TD ALIGN="LEFT">24</TD>
119 <TD ALIGN="LEFT">1469500</TD>
120 <TD ALIGN="LEFT">1575438</TD>
121 <TD ALIGN="LEFT">544124</TD>
123 <TR><TD ALIGN="LEFT">1100</TD>
124 <TD ALIGN="LEFT">20848</TD>
125 <TD ALIGN="LEFT">496844</TD>
126 <TD ALIGN="LEFT">21</TD>
127 <TD ALIGN="LEFT">1654500</TD>
128 <TD ALIGN="LEFT">1740838</TD>
129 <TD ALIGN="LEFT">445204</TD>
136 In each case, the CPU time taken was less than the
137 time it takes to run ``diff'' on the two files.<A NAME="tex2html3"
138 HREF="footnode.html#foot40"><SUP>3</SUP></A>
141 The columns in the table are as follows:
145 <DT><STRONG>block size</STRONG>
146 <DD>The size in bytes of the checksummed blocks.
147 <DT><STRONG>matches</STRONG>
148 <DD>The number of times a block of <I>B</I> was found in <I>A</I>.
149 <DT><STRONG>tag hits</STRONG>
150 <DD>The number of times the 16 bit hash of the rolling
151 checksum matched a hash of one of the checksums from <I>B</I>.
152 <DT><STRONG>false alarms</STRONG>
153 <DD>The number of times the 32 bit rolling checksum
154 matched but the strong checksum didn't.
155 <DT><STRONG>data</STRONG>
156 <DD>The amount of file data transferred verbatim, in bytes.
157 <DT><STRONG>written</STRONG>
158 <DD>The total number of bytes written by <IMG
159 WIDTH="15" HEIGHT="13" ALIGN="BOTTOM" BORDER="0"
162 including protocol overheads. This is almost all file data.
163 <DT><STRONG>read</STRONG>
164 <DD>The total number of bytes read by <IMG
165 WIDTH="15" HEIGHT="13" ALIGN="BOTTOM" BORDER="0"
169 protocol overheads. This is almost all checksum information.
172 The results demonstrate that for block sizes above 300 bytes, only a
173 small fraction (around 5%) of the file was transferred. The amount
174 transferred was also considerably less than the size of the diff file
175 that would have been transferred if the diff/patch method of updating
176 a remote file was used.
179 The checksums themselves took up a considerable amount of space,
180 although much less than the size of the data transferred in each
181 case. Each pair of checksums consumes 20 bytes: 4 bytes for the
182 rolling checksum plus 16 bytes for the 128-bit MD4 checksum.
185 The number of false alarms was less than 1/1000 of the number of
186 true matches, indicating that the 32 bit rolling checksum is quite
187 good at screening out false matches.
190 The number of tag hits indicates that the second level of the
191 checksum search algorithm was invoked about once every 50
192 characters. This is quite high because the total number of blocks in
193 the file is a large fraction of the size of the tag hash table. For
194 smaller files we would expect the tag hit rate to be much closer to
195 the number of matches. For extremely large files, we should probably
196 increase the size of the hash table.
199 The next table shows similar results for a much smaller set of files.
200 In this case the files were not packed into a tar file first. Rather,
201 rsync was invoked with an option to recursively descend the directory
202 tree. The files used were from two source releases of another software
203 package called Samba. The total source code size is 1.7 MB and the
204 diff between the two releases is 4155 lines long totalling 120 kB.
209 <TABLE CELLPADDING=3 BORDER="1">
210 <TR><TD ALIGN="LEFT"><B> block</B></TD>
211 <TD ALIGN="LEFT"><B> matches</B></TD>
212 <TD ALIGN="LEFT"><B> tag</B></TD>
213 <TD ALIGN="LEFT"><B> false</B></TD>
214 <TD ALIGN="LEFT"><B> data</B></TD>
215 <TD ALIGN="LEFT"><B> written</B></TD>
216 <TD ALIGN="LEFT"><B> read</B></TD>
218 <TR><TD ALIGN="LEFT"><B> size</B></TD>
219 <TD ALIGN="LEFT"> </TD>
220 <TD ALIGN="LEFT"><B> hits</B></TD>
221 <TD ALIGN="LEFT"><B> alarms</B></TD>
222 <TD ALIGN="LEFT"> </TD>
223 <TD ALIGN="LEFT"> </TD>
224 <TD ALIGN="LEFT"> </TD>
226 <TR><TD ALIGN="LEFT"><P>
228 <TD ALIGN="LEFT">3727</TD>
229 <TD ALIGN="LEFT">3899</TD>
230 <TD ALIGN="LEFT">0</TD>
231 <TD ALIGN="LEFT">129775</TD>
232 <TD ALIGN="LEFT">153999</TD>
233 <TD ALIGN="LEFT">83948</TD>
235 <TR><TD ALIGN="LEFT">500</TD>
236 <TD ALIGN="LEFT">2158</TD>
237 <TD ALIGN="LEFT">2325</TD>
238 <TD ALIGN="LEFT">0</TD>
239 <TD ALIGN="LEFT">171574</TD>
240 <TD ALIGN="LEFT">189330</TD>
241 <TD ALIGN="LEFT">50908</TD>
243 <TR><TD ALIGN="LEFT">700</TD>
244 <TD ALIGN="LEFT">1517</TD>
245 <TD ALIGN="LEFT">1649</TD>
246 <TD ALIGN="LEFT">0</TD>
247 <TD ALIGN="LEFT">195024</TD>
248 <TD ALIGN="LEFT">210144</TD>
249 <TD ALIGN="LEFT">36828</TD>
251 <TR><TD ALIGN="LEFT">900</TD>
252 <TD ALIGN="LEFT">1156</TD>
253 <TD ALIGN="LEFT">1281</TD>
254 <TD ALIGN="LEFT">0</TD>
255 <TD ALIGN="LEFT">222847</TD>
256 <TD ALIGN="LEFT">236471</TD>
257 <TD ALIGN="LEFT">29048</TD>
259 <TR><TD ALIGN="LEFT">1100</TD>
260 <TD ALIGN="LEFT">921</TD>
261 <TD ALIGN="LEFT">1049</TD>
262 <TD ALIGN="LEFT">0</TD>
263 <TD ALIGN="LEFT">250073</TD>
264 <TD ALIGN="LEFT">262725</TD>
265 <TD ALIGN="LEFT">23988</TD>
273 <!--Navigation Panel-->
276 <IMG WIDTH="37" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next"
279 HREF="tech_report.html">
280 <IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up"
284 <IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous"
285 SRC="previous.gif"></A>
287 <B> Next:</B> <A NAME="tex2html73"
288 HREF="node7.html">Availability</A>
289 <B> Up:</B> <A NAME="tex2html71"
290 HREF="tech_report.html">The rsync algorithm</A>
291 <B> Previous:</B> <A NAME="tex2html65"
292 HREF="node5.html">Pipelining</A>
293 <!--End of Navigation Panel-->
295 <I>Andrew Tridgell</I>
296 <BR><I>1998-11-09</I>