1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
2 <!--Converted with LaTeX2HTML 98.1p1 release (March 2nd, 1998)
3 originally by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds
4 * revised and updated by: Marcus Hennecke, Ross Moore, Herb Swan
5 * with significant contributions from:
6 Jens Lippmann, Marek Rouchal, Martin Wilck and others -->
9 <TITLE>Rolling checksum</TITLE>
10 <META NAME="description" CONTENT="Rolling checksum">
11 <META NAME="keywords" CONTENT="tech_report">
12 <META NAME="resource-type" CONTENT="document">
13 <META NAME="distribution" CONTENT="global">
14 <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
15 <LINK REL="STYLESHEET" HREF="tech_report.css">
16 <LINK REL="next" HREF="node4.html">
17 <LINK REL="previous" HREF="node2.html">
18 <LINK REL="up" HREF="tech_report.html">
19 <LINK REL="next" HREF="node4.html">
22 <!--Navigation Panel-->
25 <IMG WIDTH="37" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next"
26 SRC="/usr/local/src/latex2html/icons.gif/next_motif.gif"></A>
28 HREF="tech_report.html">
29 <IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up"
30 SRC="/usr/local/src/latex2html/icons.gif/up_motif.gif"></A>
33 <IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous"
34 SRC="/usr/local/src/latex2html/icons.gif/previous_motif.gif"></A>
36 <B> Next:</B> <A NAME="tex2html43"
37 HREF="node4.html">Checksum searching</A>
38 <B> Up:</B> <A NAME="tex2html41"
39 HREF="tech_report.html">The rsync algorithm</A>
40 <B> Previous:</B> <A NAME="tex2html35"
41 HREF="node2.html">The rsync algorithm</A>
44 <!--End of Navigation Panel-->
46 <H1><A NAME="SECTION00030000000000000000">
51 The weak rolling checksum used in the rsync algorithm needs to have
52 the property that it is very cheap to calculate the checksum of a
54 <!-- MATH: $X_2 .. X_{n+1}$ -->
55 <I>X</I><SUB>2</SUB> .. <I>X</I><SUB><I>n</I>+1</SUB> given the checksum of buffer
56 <!-- MATH: $X_1 .. X_n$ -->
57 <I>X</I><SUB>1</SUB> .. <I>X</I><SUB><I>n</I></SUB> and
58 the values of the bytes <I>X</I><SUB>1</SUB> and <I>X</I><SUB><I>n</I>+1</SUB>.
61 The weak checksum algorithm we used in our implementation was inspired
62 by Mark Adler's adler-32 checksum. Our checksum is defined by
65 <!-- MATH: \begin{displaymath}
66 a(k,l) = (\sum_{i=k}^l X_i) \bmod M
71 WIDTH="176" HEIGHT="56"
73 ALT="\begin{displaymath}a(k,l) = (\sum_{i=k}^l X_i) \bmod M \end{displaymath}">
79 <!-- MATH: \begin{displaymath}
80 b(k,l) = (\sum_{i=k}^l (l-i+1)X_i) \bmod M
85 WIDTH="240" HEIGHT="56"
87 ALT="\begin{displaymath}b(k,l) = (\sum_{i=k}^l (l-i+1)X_i) \bmod M \end{displaymath}">
93 <!-- MATH: \begin{displaymath}
94 s(k,l) = a(k,l) + 2^{16} b(k,l)
98 <I>s</I>(<I>k</I>,<I>l</I>) = <I>a</I>(<I>k</I>,<I>l</I>) + 2<SUP>16</SUP> <I>b</I>(<I>k</I>,<I>l</I>)
103 where <I>s</I>(<I>k</I>,<I>l</I>) is the rolling checksum of the bytes
104 <!-- MATH: $X_k \ldots X_l$ -->
106 WIDTH="67" HEIGHT="29" ALIGN="MIDDLE" BORDER="0"
108 ALT="$X_k \ldots X_l$">.
109 For simplicity and speed, we use
110 <!-- MATH: $M = 2^{16}$ -->
111 <I>M</I> = 2<SUP>16</SUP>.
114 The important property of this checksum is that successive values can
115 be computed very efficiently using the recurrence relations
120 <!-- MATH: \begin{displaymath}
121 a(k+1,l+1) = (a(k,l) - X_k + X_{l+1}) \bmod M
122 \end{displaymath} -->
126 WIDTH="322" HEIGHT="28"
128 ALT="\begin{displaymath}a(k+1,l+1) = (a(k,l) - X_k + X_{l+1}) \bmod M \end{displaymath}">
134 <!-- MATH: \begin{displaymath}
135 b(k+1,l+1) = (b(k,l) - (l-k+1) X_k + a(k+1,l+1)) \bmod M
136 \end{displaymath} -->
140 WIDTH="454" HEIGHT="28"
142 ALT="\begin{displaymath}b(k+1,l+1) = (b(k,l) - (l-k+1) X_k + a(k+1,l+1)) \bmod M \end{displaymath}">
147 Thus the checksum can be calculated for blocks of length S at all
148 possible offsets within a file in a ``rolling'' fashion, with very
149 little computation at each point.
152 Despite its simplicity, this checksum was found to be quite adequate as
153 a first level check for a match of two file blocks. We have found in
154 practice that the probability of this checksum matching when the
155 blocks are not equal is quite low. This is important because the much
156 more expensive strong checksum must be calculated for each block where
157 the weak checksum matches.
161 <!--Navigation Panel-->
164 <IMG WIDTH="37" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next"
165 SRC="/usr/local/src/latex2html/icons.gif/next_motif.gif"></A>
167 HREF="tech_report.html">
168 <IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up"
169 SRC="/usr/local/src/latex2html/icons.gif/up_motif.gif"></A>
172 <IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous"
173 SRC="/usr/local/src/latex2html/icons.gif/previous_motif.gif"></A>
175 <B> Next:</B> <A NAME="tex2html43"
176 HREF="node4.html">Checksum searching</A>
177 <B> Up:</B> <A NAME="tex2html41"
178 HREF="tech_report.html">The rsync algorithm</A>
179 <B> Previous:</B> <A NAME="tex2html35"
180 HREF="node2.html">The rsync algorithm</A>
181 <!--End of Navigation Panel-->
183 <I>Andrew Tridgell</I>
184 <BR><I>1998-11-09</I>