<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
  "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>zlib Usage Example</title>
<!--  Copyright (c) 2004 Mark Adler.  -->
</head>
<body bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#00A000">
<h2 align="center"> zlib Usage Example </h2>
We often get questions about how the <tt>deflate()</tt> and <tt>inflate()</tt> functions should be used.
Users wonder when they should provide more input, when they should use more output,
what to do with a <tt>Z_BUF_ERROR</tt>, how to make sure the process terminates properly, and
so on.  So for those who have read <tt>zlib.h</tt> (a few times), and
would like further edification, below is an annotated example in C of simple routines to compress and decompress
from an input file to an output file using <tt>deflate()</tt> and <tt>inflate()</tt> respectively.  The
annotations are interspersed between lines of the code.  So please read between the lines.
We hope this helps explain some of the intricacies of <em>zlib</em>.
<p>
Without further ado, here is the program <a href="zpipe.c"><tt>zpipe.c</tt></a>:
<pre><b>
/* zpipe.c: example of proper use of zlib's inflate() and deflate()
   Not copyrighted -- provided to the public domain
   Version 1.2  9 November 2004  Mark Adler */

/* Version history:
   1.0  30 Oct 2004  First version
   1.1   8 Nov 2004  Add void casting for unused return values
                     Use switch statement for inflate() return values
   1.2   9 Nov 2004  Add assertions to document zlib guarantees
 */
</b></pre><!-- -->
We now include the header files for the required definitions.  From
<tt>stdio.h</tt> we use <tt>fopen()</tt>, <tt>fread()</tt>, <tt>fwrite()</tt>,
<tt>feof()</tt>, <tt>ferror()</tt>, and <tt>fclose()</tt> for file i/o, and
<tt>fputs()</tt> for error messages.  From <tt>string.h</tt> we use
<tt>strcmp()</tt> for command line argument processing.
From <tt>assert.h</tt> we use the <tt>assert()</tt> macro.
From <tt>zlib.h</tt>
we use the basic compression functions <tt>deflateInit()</tt>,
<tt>deflate()</tt>, and <tt>deflateEnd()</tt>, and the basic decompression
functions <tt>inflateInit()</tt>, <tt>inflate()</tt>, and
<tt>inflateEnd()</tt>.
<pre><b>
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;assert.h&gt;
#include "zlib.h"
</b></pre><!-- -->
<tt>CHUNK</tt> is simply the buffer size for feeding data to and pulling data
from the <em>zlib</em> routines.  Larger buffer sizes would be more efficient,
especially for <tt>inflate()</tt>.  If the memory is available, buffer sizes
on the order of 128K or 256K bytes should be used.
<pre><b>
#define CHUNK 16384
</b></pre><!-- -->
The <tt>def()</tt> routine compresses data from an input file to an output file.  The output data
will be in the <em>zlib</em> format, which is different from the <em>gzip</em> or <em>zip</em>
formats.  The <em>zlib</em> format has a very small header of only two bytes to identify it as
a <em>zlib</em> stream and to provide decoding information, and a four-byte trailer with a fast
check value to verify the integrity of the uncompressed data after decoding.
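<p>
To make that header and trailer a little more concrete, here is a minimal sketch, not part of
<tt>zpipe.c</tt>, that checks the two header bytes and computes the check value by hand.  It assumes
the layout given in RFC 1950: the compression method (8 for deflate) sits in the low four bits of the
first byte, the two bytes read as a big-endian number must be a multiple of 31, and the check value is
an Adler-32 of the uncompressed data.  The names <tt>zlib_header_ok()</tt> and <tt>adler32_simple()</tt>
are made up for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if the two bytes look like a zlib header
   as laid out in RFC 1950, 0 otherwise. */
int zlib_header_ok(unsigned char cmf, unsigned char flg)
{
    if ((cmf & 0x0f) != 8)          /* compression method must be deflate */
        return 0;
    if ((cmf >> 4) > 7)             /* window size is at most 32K */
        return 0;
    return ((cmf << 8) | flg) % 31 == 0;    /* header check bits */
}

/* Hypothetical helper: straightforward (slow) Adler-32 of len bytes. */
unsigned long adler32_simple(const unsigned char *buf, size_t len)
{
    unsigned long a = 1, b = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521;   /* largest prime below 65536 */
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}
```

Real code would simply call the <tt>adler32()</tt> function that <em>zlib</em> provides, which is
much faster.  The sketch is only meant to show how little there is to the wrapper around the
compressed data.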
<pre><b>
/* Compress from file source to file dest until EOF on source.
   def() returns Z_OK on success, Z_MEM_ERROR if memory could not be
   allocated for processing, Z_STREAM_ERROR if an invalid compression
   level is supplied, Z_VERSION_ERROR if the version of zlib.h and the
   version of the library linked do not match, or Z_ERRNO if there is
   an error reading or writing the files. */
int def(FILE *source, FILE *dest, int level)
{
</b></pre>
Here are the local variables for <tt>def()</tt>.  <tt>ret</tt> will be used for <em>zlib</em>
return codes.  <tt>flush</tt> will keep track of the current flushing state for <tt>deflate()</tt>,
which is either no flushing, or flush to completion after the end of the input file is reached.
<tt>have</tt> is the amount of data returned from <tt>deflate()</tt>.  The <tt>strm</tt> structure
is used to pass information to and from the <em>zlib</em> routines, and to maintain the
<tt>deflate()</tt> state.  <tt>in</tt> and <tt>out</tt> are the input and output buffers for
<tt>deflate()</tt>.
<pre><b>
    int ret, flush;
    unsigned have;
    z_stream strm;
    unsigned char in[CHUNK];
    unsigned char out[CHUNK];
</b></pre><!-- -->
The first thing we do is to initialize the <em>zlib</em> state for compression using
<tt>deflateInit()</tt>.  This must be done before the first use of <tt>deflate()</tt>.
The <tt>zalloc</tt>, <tt>zfree</tt>, and <tt>opaque</tt> fields in the <tt>strm</tt>
structure must be initialized before calling <tt>deflateInit()</tt>.  Here they are
set to the <em>zlib</em> constant <tt>Z_NULL</tt> to request that <em>zlib</em> use
the default memory allocation routines.  An application may also choose to provide
custom memory allocation routines here.  <tt>deflateInit()</tt> will allocate on the
order of 256K bytes for the internal state.
(See <a href="zlib_tech.html"><em>zlib Technical Details</em></a>.)
<p>
<tt>deflateInit()</tt> is called with a pointer to the structure to be initialized and
the compression level, which is an integer in the range of -1 to 9.  Lower compression
levels result in faster execution, but less compression.  Higher levels result in
greater compression, but slower execution.  The <em>zlib</em> constant <tt>Z_DEFAULT_COMPRESSION</tt>,
equal to -1,
provides a good compromise between compression and speed and is equivalent to level 6.
Level 0 actually does no compression at all, and in fact expands the data slightly to produce
the <em>zlib</em> format (it is not a byte-for-byte copy of the input).
More advanced applications of <em>zlib</em>
may use <tt>deflateInit2()</tt> here instead.  Such an application may want to reduce how
much memory will be used, at some price in compression.  Or it may need to request a
<em>gzip</em> header and trailer instead of a <em>zlib</em> header and trailer, or raw
encoding with no header or trailer at all.
<p>
We must check the return value of <tt>deflateInit()</tt> against the <em>zlib</em> constant
<tt>Z_OK</tt> to make sure that it was able to
allocate memory for the internal state, and that the provided arguments were valid.
<tt>deflateInit()</tt> will also check that the version of <em>zlib</em> that the <tt>zlib.h</tt>
file came from matches the version of <em>zlib</em> actually linked with the program.  This
is especially important for environments in which <em>zlib</em> is a shared library.
<p>
Note that an application can initialize multiple, independent <em>zlib</em> streams, which can
operate in parallel.  The state information maintained in the structure allows the <em>zlib</em>
routines to be reentrant.
<pre><b>
    /* allocate deflate state */
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    ret = deflateInit(&amp;strm, level);
    if (ret != Z_OK)
        return ret;
</b></pre><!-- -->
With the pleasantries out of the way, now we can get down to business.  The outer <tt>do</tt>-loop
reads all of the input file and exits at the bottom of the loop once end-of-file is reached.
This loop contains the only call of <tt>deflate()</tt>.  So we must make sure that all of the
input data has been processed and that all of the output data has been generated and consumed
before we fall out of the loop at the bottom.
<pre><b>
    /* compress until end of file */
    do {
</b></pre>
We start off by reading data from the input file.  The number of bytes read is put directly
into <tt>avail_in</tt>, and a pointer to those bytes is put into <tt>next_in</tt>.  We also
check to see if end-of-file on the input has been reached.  If we are at the end of file, then <tt>flush</tt> is set to the
<em>zlib</em> constant <tt>Z_FINISH</tt>, which is later passed to <tt>deflate()</tt> to
indicate that this is the last chunk of input data to compress.  We need to use <tt>feof()</tt>
to check for end-of-file as opposed to seeing if fewer than <tt>CHUNK</tt> bytes have been read.  The
reason is that if the input file length is an exact multiple of <tt>CHUNK</tt>, we will miss
the fact that we got to the end-of-file, and not know to tell <tt>deflate()</tt> to finish
up the compressed stream.  If we are not yet at the end of the input, then the <em>zlib</em>
constant <tt>Z_NO_FLUSH</tt> will be passed to <tt>deflate()</tt> to indicate that we are still
in the middle of the uncompressed data.
<p>
If there is an error in reading from the input file, the process is aborted with
<tt>deflateEnd()</tt> being called to free the allocated <em>zlib</em> state before returning
the error.  We wouldn't want a memory leak, now would we?  <tt>deflateEnd()</tt> can be called
at any time after the state has been initialized.  Once that's done, <tt>deflateInit()</tt> (or
<tt>deflateInit2()</tt>) would have to be called to start a new compression process.  There is
no point here in checking the <tt>deflateEnd()</tt> return code.  The deallocation can't fail.
<pre><b>
        strm.avail_in = fread(in, 1, CHUNK, source);
        if (ferror(source)) {
            (void)deflateEnd(&amp;strm);
            return Z_ERRNO;
        }
        flush = feof(source) ? Z_FINISH : Z_NO_FLUSH;
        strm.next_in = in;
</b></pre><!-- -->
The inner <tt>do</tt>-loop passes our chunk of input data to <tt>deflate()</tt>, and then
keeps calling <tt>deflate()</tt> until it is done producing output.  Once there is no more
new output, <tt>deflate()</tt> is guaranteed to have consumed all of the input, i.e.,
<tt>avail_in</tt> will be zero.
<pre><b>
        /* run deflate() on input until output buffer not full, finish
           compression if all of source has been read in */
        do {
</b></pre>
Output space is provided to <tt>deflate()</tt> by setting <tt>avail_out</tt> to the number
of available output bytes and <tt>next_out</tt> to a pointer to that space.
<pre><b>
            strm.avail_out = CHUNK;
            strm.next_out = out;
</b></pre>
Now we call the compression engine itself, <tt>deflate()</tt>.  It takes as many of the
<tt>avail_in</tt> bytes at <tt>next_in</tt> as it can process, and writes as many as
<tt>avail_out</tt> bytes to <tt>next_out</tt>.  Those counters and pointers are then
updated past the input data consumed and the output data written.  It is the amount of
output space available that may limit how much input is consumed.
Hence the inner loop to make sure that
all of the input is consumed by providing more output space each time.  Since <tt>avail_in</tt>
and <tt>next_in</tt> are updated by <tt>deflate()</tt>, we don't have to mess with those
between <tt>deflate()</tt> calls until it's all used up.
<p>
The parameters to <tt>deflate()</tt> are a pointer to the <tt>strm</tt> structure containing
the input and output information and the internal compression engine state, and a parameter
indicating whether and how to flush data to the output.  Normally <tt>deflate()</tt> will consume
several K bytes of input data before producing any output (except for the header), in order
to accumulate statistics on the data for optimum compression.  It will then put out a burst of
compressed data, and proceed to consume more input before the next burst.  Eventually,
<tt>deflate()</tt>
must be told to terminate the stream, complete the compression with provided input data, and
write out the trailer check value.  <tt>deflate()</tt> will continue to compress normally as long
as the flush parameter is <tt>Z_NO_FLUSH</tt>.  Once the <tt>Z_FINISH</tt> parameter is provided,
<tt>deflate()</tt> will begin to complete the compressed output stream.  However, depending on how
much output space is provided, <tt>deflate()</tt> may have to be called several times until it
has provided the complete compressed stream, even after it has consumed all of the input.  The flush
parameter must continue to be <tt>Z_FINISH</tt> for those subsequent calls.
<p>
There are other values of the flush parameter that are used in more advanced applications.  You can
force <tt>deflate()</tt> to produce a burst of output that encodes all of the input data provided
so far, even if it wouldn't have otherwise, for example to control data latency on a link with
compressed data.  You can also ask that <tt>deflate()</tt> do that as well as erase any history up to
that point so that what follows can be decompressed independently, for example for random access
applications.  Both requests will degrade compression by an amount depending on how often such
requests are made.
<p>
<tt>deflate()</tt> has a return value that can indicate errors, yet we do not check it here.  Why
not?  Well, it turns out that <tt>deflate()</tt> can do no wrong here.  Let's go through
<tt>deflate()</tt>'s return values and dispense with them one by one.  The possible values are
<tt>Z_OK</tt>, <tt>Z_STREAM_END</tt>, <tt>Z_STREAM_ERROR</tt>, or <tt>Z_BUF_ERROR</tt>.  <tt>Z_OK</tt>
is, well, ok.  <tt>Z_STREAM_END</tt> is also ok and will be returned for the last call of
<tt>deflate()</tt>.  This is already guaranteed by calling <tt>deflate()</tt> with <tt>Z_FINISH</tt>
until it has no more output.  <tt>Z_STREAM_ERROR</tt> is only possible if the stream is not
initialized properly, but we did initialize it properly.  There is no harm in checking for
<tt>Z_STREAM_ERROR</tt> here, for example to check for the possibility that some
other part of the application inadvertently clobbered the memory containing the <em>zlib</em> state.
<tt>Z_BUF_ERROR</tt> will be explained further below, but
suffice it to say that this is simply an indication that <tt>deflate()</tt> could not consume
more input or produce more output.  <tt>deflate()</tt> can be called again with more output space
or more available input, which it will be in this code.
<pre><b>
            ret = deflate(&amp;strm, flush);    /* no bad return value */
            assert(ret != Z_STREAM_ERROR);  /* state not clobbered */
</b></pre>
Now we compute how much output <tt>deflate()</tt> provided on the last call, which is the
difference between how much space was provided before the call, and how much output space
is still available after the call.  Then that data, if any, is written to the output file.
We can then reuse the output buffer for the next call of <tt>deflate()</tt>.  Again if there
is a file i/o error, we call <tt>deflateEnd()</tt> before returning to avoid a memory leak.
<pre><b>
            have = CHUNK - strm.avail_out;
            if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
                (void)deflateEnd(&amp;strm);
                return Z_ERRNO;
            }
</b></pre>
The inner <tt>do</tt>-loop is repeated until the last <tt>deflate()</tt> call fails to fill the
provided output buffer.  Then we know that <tt>deflate()</tt> has done as much as it can with
the provided input, and that all of that input has been consumed.  We can then fall out of this
loop and reuse the input buffer.
<p>
The way we tell that <tt>deflate()</tt> has no more output is by seeing that it did not fill
the output buffer, leaving <tt>avail_out</tt> greater than zero.  However, suppose that
<tt>deflate()</tt> has no more output, but just so happened to exactly fill the output buffer!
<tt>avail_out</tt> is zero, and we can't tell that <tt>deflate()</tt> has done all it can.
As far as we know, <tt>deflate()</tt>
has more output for us.  So we call it again.  But now <tt>deflate()</tt> produces no output
at all, and <tt>avail_out</tt> remains unchanged as <tt>CHUNK</tt>.  That <tt>deflate()</tt> call
wasn't able to do anything, either consume input or produce output, and so it returns
<tt>Z_BUF_ERROR</tt>.  (See, I told you I'd cover this later.)  However, this is not a problem at
all.  Now we finally have the desired indication that <tt>deflate()</tt> is really done,
and so we drop out of the inner loop to provide more input to <tt>deflate()</tt>.
<p>
With <tt>flush</tt> set to <tt>Z_FINISH</tt>, this final set of <tt>deflate()</tt> calls will
complete the output stream.  Once that is done, subsequent calls of <tt>deflate()</tt> would return
<tt>Z_STREAM_ERROR</tt> if the flush parameter is not <tt>Z_FINISH</tt>, and do no more processing
until the state is reinitialized.
<p>
Some applications of <em>zlib</em> have two loops that call <tt>deflate()</tt>
instead of the single inner loop we have here.  The first loop would call
without flushing and feed all of the data to <tt>deflate()</tt>.  The second loop would call
<tt>deflate()</tt> with no more
data and the <tt>Z_FINISH</tt> parameter to complete the process.  As you can see from this
example, that can be avoided by simply keeping track of the current flush state.
<pre><b>
        } while (strm.avail_out == 0);
        assert(strm.avail_in == 0);     /* all input will be used */
</b></pre><!-- -->
Now we check to see if we have already processed all of the input file.  That information was
saved in the <tt>flush</tt> variable, so we see if that was set to <tt>Z_FINISH</tt>.  If so,
then we're done and we fall out of the outer loop.  We're guaranteed to get <tt>Z_STREAM_END</tt>
from the last <tt>deflate()</tt> call, since we ran it until the last chunk of input was
consumed and all of the output was generated.
<pre><b>
        /* done when last data in file processed */
    } while (flush != Z_FINISH);
    assert(ret == Z_STREAM_END);        /* stream will be complete */
</b></pre><!-- -->
The process is complete, but we still need to deallocate the state to avoid a memory leak
(or rather more like a memory hemorrhage if you didn't do this).  Then
finally we can return with a happy return value.
<pre><b>
    /* clean up and return */
    (void)deflateEnd(&amp;strm);
    return Z_OK;
}
</b></pre><!-- -->
Now we do the same thing for decompression in the <tt>inf()</tt> routine. <tt>inf()</tt>
decompresses what is hopefully a valid <em>zlib</em> stream from the input file and writes the
uncompressed data to the output file.  Much of the discussion above for <tt>def()</tt>
applies to <tt>inf()</tt> as well, so the discussion here will focus on the differences between
the two.
<pre><b>
/* Decompress from file source to file dest until stream ends or EOF.
   inf() returns Z_OK on success, Z_MEM_ERROR if memory could not be
   allocated for processing, Z_DATA_ERROR if the deflate data is
   invalid or incomplete, Z_VERSION_ERROR if the version of zlib.h and
   the version of the library linked do not match, or Z_ERRNO if there
   is an error reading or writing the files. */
int inf(FILE *source, FILE *dest)
{
</b></pre>
The local variables have the same functionality as they do for <tt>def()</tt>.  The
only difference is that there is no <tt>flush</tt> variable, since <tt>inflate()</tt>
can tell from the <em>zlib</em> stream itself when the stream is complete.
<pre><b>
    int ret;
    unsigned have;
    z_stream strm;
    unsigned char in[CHUNK];
    unsigned char out[CHUNK];
</b></pre><!-- -->
The initialization of the state is the same, except that there is no compression level,
of course, and two more elements of the structure are initialized.  <tt>avail_in</tt>
and <tt>next_in</tt> must be initialized before calling <tt>inflateInit()</tt>.  This
is because the application has the option to provide the start of the <em>zlib</em> stream in
order for <tt>inflateInit()</tt> to have access to information about the compression
method to aid in memory allocation.  In the current implementation of <em>zlib</em>
(up through versions 1.2.x), the method-dependent memory allocations are deferred to the first call of
<tt>inflate()</tt> anyway.  However, those fields must be initialized since later versions
of <em>zlib</em> that provide more compression methods may take advantage of this interface.
In any case, no decompression is performed by <tt>inflateInit()</tt>, so the
<tt>avail_out</tt> and <tt>next_out</tt> fields do not need to be initialized before calling.
<p>
Here <tt>avail_in</tt> is set to zero and <tt>next_in</tt> is set to <tt>Z_NULL</tt> to
indicate that no input data is being provided.
<pre><b>
    /* allocate inflate state */
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    strm.avail_in = 0;
    strm.next_in = Z_NULL;
    ret = inflateInit(&amp;strm);
    if (ret != Z_OK)
        return ret;
</b></pre><!-- -->
The outer <tt>do</tt>-loop decompresses input until <tt>inflate()</tt> indicates
that it has reached the end of the compressed data and has produced all of the uncompressed
output.  This is in contrast to <tt>def()</tt> which processes all of the input file.
If end-of-file is reached before the compressed data self-terminates, then the compressed
data is incomplete and an error is returned.
<pre><b>
    /* decompress until deflate stream ends or end of file */
    do {
</b></pre>
We read input data and set the <tt>strm</tt> structure accordingly.  If we've reached the
end of the input file, then we leave the outer loop and report an error, since the
compressed data is incomplete.  Note that we may read more data than is eventually consumed
by <tt>inflate()</tt>, if the input file continues past the <em>zlib</em> stream.
For applications where <em>zlib</em> streams are embedded in other data, this routine would
need to be modified to return the unused data, or at least indicate how much of the input
data was not used, so the application would know where to pick up after the <em>zlib</em> stream.
<pre><b>
        strm.avail_in = fread(in, 1, CHUNK, source);
        if (ferror(source)) {
            (void)inflateEnd(&amp;strm);
            return Z_ERRNO;
        }
        if (strm.avail_in == 0)
            break;
        strm.next_in = in;
</b></pre><!-- -->
The inner <tt>do</tt>-loop has the same function it did in <tt>def()</tt>, which is to
keep calling <tt>inflate()</tt> until it has generated all of the output it can with the
provided input.
<pre><b>
        /* run inflate() on input until output buffer not full */
        do {
</b></pre>
Just like in <tt>def()</tt>, the same output space is provided for each call of <tt>inflate()</tt>.
<pre><b>
            strm.avail_out = CHUNK;
            strm.next_out = out;
</b></pre>
Now we run the decompression engine itself.  There is no need to adjust the flush parameter, since
the <em>zlib</em> format is self-terminating. The main difference here is that there are
return values that we need to pay attention to.  <tt>Z_DATA_ERROR</tt>
indicates that <tt>inflate()</tt> detected an error in the <em>zlib</em> compressed data format,
which means that either the data is not a <em>zlib</em> stream to begin with, or that the data was
corrupted somewhere along the way since it was compressed.  The other error to be processed is
<tt>Z_MEM_ERROR</tt>, which can occur since memory allocation is deferred until <tt>inflate()</tt>
needs it, unlike <tt>deflate()</tt>, whose memory is allocated at the start by <tt>deflateInit()</tt>.
<p>
Advanced applications may use
<tt>deflateSetDictionary()</tt> to prime <tt>deflate()</tt> with a set of likely data to improve the
first 32K or so of compression.  This is noted in the <em>zlib</em> header, so <tt>inflate()</tt>
requests that that dictionary be provided before it can start to decompress.  Without the dictionary,
correct decompression is not possible.  For this routine, we have no idea what the dictionary is,
so the <tt>Z_NEED_DICT</tt> indication is converted to a <tt>Z_DATA_ERROR</tt>.
<p>
<tt>inflate()</tt> can also return <tt>Z_STREAM_ERROR</tt>, which should not be possible here,
but could be checked for as noted above for <tt>def()</tt>.  <tt>Z_BUF_ERROR</tt> does not need to be
checked for here, for the same reasons noted for <tt>def()</tt>.  <tt>Z_STREAM_END</tt> will be
checked for later.
<pre><b>
            ret = inflate(&amp;strm, Z_NO_FLUSH);
            assert(ret != Z_STREAM_ERROR);  /* state not clobbered */
            switch (ret) {
            case Z_NEED_DICT:
                ret = Z_DATA_ERROR;     /* and fall through */
            case Z_DATA_ERROR:
            case Z_MEM_ERROR:
                (void)inflateEnd(&amp;strm);
                return ret;
            }
</b></pre>
The output of <tt>inflate()</tt> is handled identically to that of <tt>deflate()</tt>.
<pre><b>
            have = CHUNK - strm.avail_out;
            if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
                (void)inflateEnd(&amp;strm);
                return Z_ERRNO;
            }
</b></pre>
The inner <tt>do</tt>-loop ends when <tt>inflate()</tt> has no more output as indicated
by not filling the output buffer, just as for <tt>deflate()</tt>.  In this case, we cannot
assert that <tt>strm.avail_in</tt> will be zero, since the deflate stream may end before the file
does.
<pre><b>
        } while (strm.avail_out == 0);
</b></pre><!-- -->
The outer <tt>do</tt>-loop ends when <tt>inflate()</tt> reports that it has reached the
end of the input <em>zlib</em> stream, has completed the decompression and integrity
check, and has provided all of the output.  This is indicated by the <tt>inflate()</tt>
return value <tt>Z_STREAM_END</tt>.  The inner loop is guaranteed to leave <tt>ret</tt>
equal to <tt>Z_STREAM_END</tt> if the last chunk of the input file read contained the end
of the <em>zlib</em> stream.  So if the return value is not <tt>Z_STREAM_END</tt>, the
loop continues to read more input.
<pre><b>
        /* done when inflate() says it's done */
    } while (ret != Z_STREAM_END);
</b></pre><!-- -->
At this point, decompression successfully completed, or we broke out of the loop due to no
more data being available from the input file.  If the last <tt>inflate()</tt> return value
is not <tt>Z_STREAM_END</tt>, then the <em>zlib</em> stream was incomplete and a data error
is returned.  Otherwise, we return with a happy return value.  Of course, <tt>inflateEnd()</tt>
is called first to avoid a memory leak.
<pre><b>
    /* clean up and return */
    (void)inflateEnd(&amp;strm);
    return ret == Z_STREAM_END ? Z_OK : Z_DATA_ERROR;
}
</b></pre><!-- -->
That ends the routines that directly use <em>zlib</em>.  The following routines make this
a command-line program by running data through the above routines from <tt>stdin</tt> to
<tt>stdout</tt>, and handling any errors reported by <tt>def()</tt> or <tt>inf()</tt>.
<p>
<tt>zerr()</tt> is used to interpret the possible error codes from <tt>def()</tt>
and <tt>inf()</tt>, as detailed in their comments above, and print out an error message.
Note that these are only a subset of the possible return values from <tt>deflate()</tt>
and <tt>inflate()</tt>.
<pre><b>
/* report a zlib or i/o error */
void zerr(int ret)
{
    fputs("zpipe: ", stderr);
    switch (ret) {
    case Z_ERRNO:
        if (ferror(stdin))
            fputs("error reading stdin\n", stderr);
        if (ferror(stdout))
            fputs("error writing stdout\n", stderr);
        break;
    case Z_STREAM_ERROR:
        fputs("invalid compression level\n", stderr);
        break;
    case Z_DATA_ERROR:
        fputs("invalid or incomplete deflate data\n", stderr);
        break;
    case Z_MEM_ERROR:
        fputs("out of memory\n", stderr);
        break;
    case Z_VERSION_ERROR:
        fputs("zlib version mismatch!\n", stderr);
    }
}
</b></pre><!-- -->
Here is the <tt>main()</tt> routine used to test <tt>def()</tt> and <tt>inf()</tt>.  The
<tt>zpipe</tt> command is simply a compression pipe from <tt>stdin</tt> to <tt>stdout</tt>, if
no arguments are given, or it is a decompression pipe if <tt>zpipe -d</tt> is used.  If any other
arguments are provided, no compression or decompression is performed.  Instead a usage
message is displayed.  Examples are <tt>zpipe &lt; foo.txt &gt; foo.txt.z</tt> to compress, and
<tt>zpipe -d &lt; foo.txt.z &gt; foo.txt</tt> to decompress.
<pre><b>
/* compress or decompress from stdin to stdout */
int main(int argc, char **argv)
{
    int ret;

    /* do compression if no arguments */
    if (argc == 1) {
        ret = def(stdin, stdout, Z_DEFAULT_COMPRESSION);
        if (ret != Z_OK)
            zerr(ret);
        return ret;
    }

    /* do decompression if -d specified */
    else if (argc == 2 &amp;&amp; strcmp(argv[1], "-d") == 0) {
        ret = inf(stdin, stdout);
        if (ret != Z_OK)
            zerr(ret);
        return ret;
    }

    /* otherwise, report usage */
    else {
        fputs("zpipe usage: zpipe [-d] &lt; source &gt; dest\n", stderr);
        return 1;
    }
}
</b></pre>
<hr>
<i>Copyright (c) 2004 by Mark Adler<br>Last modified 13 November 2004</i>
</body>
</html>