From 2efa3d9ba7e0856248b871bb8d7eb62b1712af58 Mon Sep 17 00:00:00 2001
From: "dveditz%netscape.com" <dveditz%netscape.com>
Date: Tue, 5 Nov 2002 02:14:10 +0000
Subject: [PATCH] Useful documentation of the PKZIP archive format that we've
 relied on. Should have checked this in years ago.

---
 modules/libjar/appnote.txt | 1192 ++++++++++++++++++++++++++++++++++++
 1 file changed, 1192 insertions(+)
 create mode 100644 modules/libjar/appnote.txt

diff --git a/modules/libjar/appnote.txt b/modules/libjar/appnote.txt
new file mode 100644
index 00000000000..7b96643cad7
--- /dev/null
+++ b/modules/libjar/appnote.txt
@@ -0,0 +1,1192 @@
+Revised: 03/01/1999
+
+Disclaimer
+----------
+
+Although PKWARE will attempt to supply current and accurate
+information relating to its file formats, algorithms, and the
+subject programs, the possibility of error can not be eliminated.
+PKWARE therefore expressly disclaims any warranty that the
+information contained in the associated materials relating to the
+subject programs and/or the format of the files created or
+accessed by the subject programs and/or the algorithms used by
+the subject programs, or any other matter, is current, correct or
+accurate as delivered.  Any risk of damage due to any possible
+inaccurate information is assumed by the user of the information.
+Furthermore, the information relating to the subject programs
+and/or the file formats created or accessed by the subject
+programs and/or the algorithms used by the subject programs is
+subject to change without notice.
+
+General Format of a ZIP file
+----------------------------
+
+  Files stored in arbitrary order.  Large zipfiles can span multiple
+  diskette media.
+
+  Overall zipfile format:
+
+    [local file header + file data + data_descriptor] . . .
+    [central directory] end of central directory record
+
+
+  A.  Local file header:
+
+        local file header signature     4 bytes  (0x04034b50)
+        version needed to extract       2 bytes
+        general purpose bit flag        2 bytes
+        compression method              2 bytes
+        last mod file time              2 bytes
+        last mod file date              2 bytes
+        crc-32                          4 bytes
+        compressed size                 4 bytes
+        uncompressed size               4 bytes
+        filename length                 2 bytes
+        extra field length              2 bytes
+
+        filename (variable size)
+        extra field (variable size)
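+
+      A minimal Python sketch of reading the fixed-size fields
+      listed above, assuming the whole zipfile is already in memory
+      as a bytes object.  The helper name parse_local_header and
+      the dictionary keys are illustrative only and are not part of
+      this specification.
+
+```python
+import struct
+
+# Fixed-size part of the local file header: 30 bytes, little-endian.
+LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
+LOCAL_SIGNATURE = 0x04034B50
+
+
+def parse_local_header(buf, offset=0):
+    """Return the local file header found at `offset` as a dict."""
+    (sig, version, flags, method, mtime, mdate, crc,
+     csize, usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, offset)
+    if sig != LOCAL_SIGNATURE:
+        raise ValueError("not a local file header")
+    name_start = offset + LOCAL_HEADER.size
+    extra_start = name_start + name_len
+    return {
+        "version_needed": version,
+        "flags": flags,
+        "method": method,
+        "mod_time": mtime,
+        "mod_date": mdate,
+        "crc32": crc,
+        "compressed_size": csize,
+        "uncompressed_size": usize,
+        "filename": bytes(buf[name_start:extra_start]),
+        "extra": bytes(buf[extra_start:extra_start + extra_len]),
+        # File data starts right after the two variable-size fields.
+        "data_offset": extra_start + extra_len,
+    }
+```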
+
+  B.  Data descriptor:
+
+        crc-32                          4 bytes
+        compressed size                 4 bytes
+        uncompressed size               4 bytes
+
+      This descriptor exists only if bit 3 of the general
+      purpose bit flag is set (see below).  It is byte aligned
+      and immediately follows the last byte of compressed data.
+      This descriptor is used only when it was not possible to
+      seek in the output zip file, e.g., when the output zip file
+      was standard output or a non-seekable device.
+
+  C.  Central directory structure:
+
+      [file header] . . .  end of central dir record
+
+      File header:
+
+        central file header signature   4 bytes  (0x02014b50)
+        version made by                 2 bytes
+        version needed to extract       2 bytes
+        general purpose bit flag        2 bytes
+        compression method              2 bytes
+        last mod file time              2 bytes
+        last mod file date              2 bytes
+        crc-32                          4 bytes
+        compressed size                 4 bytes
+        uncompressed size               4 bytes
+        filename length                 2 bytes
+        extra field length              2 bytes
+        file comment length             2 bytes
+        disk number start               2 bytes
+        internal file attributes        2 bytes
+        external file attributes        4 bytes
+        relative offset of local header 4 bytes
+
+        filename (variable size)
+        extra field (variable size)
+        file comment (variable size)
+
+      End of central dir record:
+
+        end of central dir signature    4 bytes  (0x06054b50)
+        number of this disk             2 bytes
+        number of the disk with the
+        start of the central directory  2 bytes
+        total number of entries in
+        the central dir on this disk    2 bytes
+        total number of entries in
+        the central dir                 2 bytes
+        size of the central directory   4 bytes
+        offset of start of central
+        directory with respect to
+        the starting disk number        4 bytes
+        zipfile comment length          2 bytes
+        zipfile comment (variable size)
+
+  D.  Explanation of fields:
+
+      version made by (2 bytes)
+
+          The upper byte indicates the compatibility of the file
+          attribute information.  If the external file attributes 
+          are compatible with MS-DOS and can be read by PKZIP for 
+          DOS version 2.04g then this value will be zero.  If these 
+          attributes are not compatible, then this value will 
+          identify the host system on which the attributes are 
+          compatible.  Software can use this information to determine
+          the line record format for text files etc.  The current
+          mappings are:
+
+          0 - MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems)
+          1 - Amiga                     2 - VAX/VMS
+          3 - Unix                      4 - VM/CMS
+          5 - Atari ST                  6 - OS/2 H.P.F.S.
+          7 - Macintosh                 8 - Z-System
+          9 - CP/M                     10 - Windows NTFS
+         11 thru 255 - unused
+
+          The lower byte indicates the version number of the
+          software used to encode the file.  The value/10
+          indicates the major version number, and the value
+          mod 10 is the minor version number.
+
+      version needed to extract (2 bytes)
+
+          The minimum software version needed to extract the
+          file, mapped as above.
+
+      general purpose bit flag: (2 bytes)
+
+          Bit 0: If set, indicates that the file is encrypted.
+
+          (For Method 6 - Imploding)
+          Bit 1: If the compression method used was type 6,
+                 Imploding, then this bit, if set, indicates
+                 an 8K sliding dictionary was used.  If clear,
+                 then a 4K sliding dictionary was used.
+          Bit 2: If the compression method used was type 6,
+                 Imploding, then this bit, if set, indicates
+                 3 Shannon-Fano trees were used to encode the
+                 sliding dictionary output.  If clear, then 2
+                 Shannon-Fano trees were used.
+
+          (For Method 8 - Deflating)
+          Bit 2  Bit 1
+            0      0    Normal (-en) compression option was used.
+            0      1    Maximum (-ex) compression option was used.
+            1      0    Fast (-ef) compression option was used.
+            1      1    Super Fast (-es) compression option was used.
+
+          Note:  Bits 1 and 2 are undefined for any other
+                 compression method.
+
+          Bit 3: If this bit is set, the fields crc-32, compressed 
+                 size and uncompressed size are set to zero in the 
+                 local header.  The correct values are put in the 
+                 data descriptor immediately following the compressed
+                 data.  (Note: PKZIP version 2.04g for DOS only 
+                 recognizes this bit for method 8 compression; newer 
+                 versions of PKZIP recognize this bit for any 
+                 compression method.)
+
+          Bit 4: Reserved for use with method 8, for enhanced
+                 deflating. 
+
+          Bit 5: If this bit is set, this indicates that the file is 
+                 compressed patched data.  (Note: Requires PKZIP 
+                 version 2.70 or greater)
+
+          Bit 6: Currently unused.
+
+          Bit 7: Currently unused.
+
+          Bit 8: Currently unused.
+
+          Bit 9: Currently unused.
+
+          Bit 10: Currently unused.
+
+          Bit 11: Currently unused.
+
+          Bit 12: Reserved by PKWARE for enhanced compression.
+
+          Bit 13: Reserved by PKWARE.
+
+          Bit 14: Reserved by PKWARE.
+
+          Bit 15: Reserved by PKWARE.
+
+      compression method: (2 bytes)
+
+          (see accompanying documentation for algorithm
+          descriptions)
+
+          0 - The file is stored (no compression)
+          1 - The file is Shrunk
+          2 - The file is Reduced with compression factor 1
+          3 - The file is Reduced with compression factor 2
+          4 - The file is Reduced with compression factor 3
+          5 - The file is Reduced with compression factor 4
+          6 - The file is Imploded
+          7 - Reserved for Tokenizing compression algorithm
+          8 - The file is Deflated
+          9 - Reserved for enhanced Deflating
+         10 - PKWARE Data Compression Library Imploding
+
+      date and time fields: (2 bytes each)
+
+          The date and time are encoded in standard MS-DOS format.
+          If input came from standard input, the date and time are
+          those at which compression was started for this data.
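+
+          The bit layout is not spelled out in this text.  The
+          sketch below assumes the conventional MS-DOS packing
+          (date: 7 bits of year since 1980, 4 bits of month, 5 bits
+          of day; time: 5 bits of hours, 6 bits of minutes, 5 bits
+          of seconds divided by two).  The function name is
+          illustrative.
+
+```python
+def decode_dos_datetime(dos_date, dos_time):
+    """Return (year, month, day, hour, minute, second) from MS-DOS fields."""
+    year = ((dos_date >> 9) & 0x7F) + 1980   # years are counted from 1980
+    month = (dos_date >> 5) & 0x0F
+    day = dos_date & 0x1F
+    hour = (dos_time >> 11) & 0x1F
+    minute = (dos_time >> 5) & 0x3F
+    second = (dos_time & 0x1F) * 2           # stored with 2-second resolution
+    return year, month, day, hour, minute, second
+```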
+
+      CRC-32: (4 bytes)
+
+          The CRC-32 algorithm was generously contributed by
+          David Schwaderer and can be found in his excellent
+          book "C Programmers Guide to NetBIOS" published by
+          Howard W. Sams & Co. Inc.  The 'magic number' for
+          the CRC is 0xdebb20e3.  The proper CRC pre and post
+          conditioning is used, meaning that the CRC register
+          is pre-conditioned with all ones (a starting value
+          of 0xffffffff) and the value is post-conditioned by
+          taking the one's complement of the CRC residual.
+          If bit 3 of the general purpose flag is set, this
+          field is set to zero in the local header and the correct
+          value is put in the data descriptor and in the central
+          directory.
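+
+          The pre- and post-conditioning described above can be
+          sketched in Python as follows.  The reflected polynomial
+          0xEDB88320 is an assumption here (this text does not
+          state it), and both function names are illustrative.
+
+```python
+import zlib
+
+
+def crc32_bitwise(data):
+    """Bit-at-a-time CRC-32 with all-ones pre- and post-conditioning."""
+    crc = 0xFFFFFFFF                      # pre-condition with all ones
+    for byte in data:
+        crc ^= byte
+        for _ in range(8):
+            # Assumed polynomial: 0xEDB88320, processed low bit first.
+            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
+    return crc ^ 0xFFFFFFFF               # post-condition: one's complement
+
+
+def agrees_with_zlib(data):
+    """Cross-check the sketch against the standard library's CRC-32."""
+    return crc32_bitwise(data) == zlib.crc32(data)
+```
+
+          Under these assumptions, appending the CRC (low byte
+          first) to the data and running the register over the
+          result leaves 0xdebb20e3 before the final complement.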
+
+      compressed size: (4 bytes)
+      uncompressed size: (4 bytes)
+
+          The size of the file compressed and uncompressed,
+          respectively.  If bit 3 of the general purpose bit flag
+          is set, these fields are set to zero in the local header
+          and the correct values are put in the data descriptor and
+          in the central directory.
+
+      filename length: (2 bytes)
+      extra field length: (2 bytes)
+      file comment length: (2 bytes)
+
+          The length of the filename, extra field, and comment
+          fields respectively.  The combined length of any
+          directory record and these three fields should not
+          generally exceed 65,535 bytes.  If input came from standard
+          input, the filename length is set to zero.
+
+      disk number start: (2 bytes)
+
+          The number of the disk on which this file begins.
+
+      internal file attributes: (2 bytes)
+
+          The lowest bit of this field indicates, if set, that
+          the file is apparently an ASCII or text file.  If not
+          set, the file apparently contains binary data.
+          The remaining bits are unused in version 1.0.
+
+          Bits 1 and 2 are reserved for use by PKWARE.
+
+      external file attributes: (4 bytes)
+
+          The mapping of the external attributes is
+          host-system dependent (see 'version made by').  For
+          MS-DOS, the low order byte is the MS-DOS directory
+          attribute byte.  If input came from standard input, this
+          field is set to zero.
+
+      relative offset of local header: (4 bytes)
+
+          This is the offset from the start of the first disk on
+          which this file appears, to where the local header should
+          be found.
+
+      filename: (Variable)
+
+          The name of the file, with optional relative path.
+          The path stored should not contain a drive or
+          device letter, or a leading slash.  All slashes
+          should be forward slashes '/' as opposed to
+          backwards slashes '\' for compatibility with Amiga
+          and Unix file systems etc.  If input came from standard
+          input, there is no filename field.
+
+      extra field: (Variable)
+
+          This is for future expansion.  If additional information
+          needs to be stored in the future, it should be stored
+          here.  Earlier versions of the software can then safely
+          skip this field, and find the next file or header.  This
+          field will be 0 length in version 1.0.
+
+          In order to allow different programs and different types
+          of information to be stored in the 'extra' field in .ZIP
+          files, the following structure should be used for all
+          programs storing data in this field:
+
+          header1+data1 + header2+data2 . . .
+
+          Each header should consist of:
+
+            Header ID - 2 bytes
+            Data Size - 2 bytes
+
+          Note: all fields stored in Intel low-byte/high-byte order.
+
+          The Header ID field indicates the type of data that is in
+          the following data block.
+
+          Header ID's of 0 thru 31 are reserved for use by PKWARE.
+          The remaining ID's can be used by third party vendors for
+          proprietary usage.
+
+          The current Header ID mappings defined by PKWARE are:
+
+          0x0007        AV Info
+          0x0009        OS/2
+          0x000a        NTFS 
+          0x000c        VAX/VMS
+          0x000d        Unix
+          0x000f        Patch Descriptor
+
+          Several third party mappings commonly used are:
+
+          0x4b46        FWKCS MD5 (see below)
+          0x07c8        Macintosh
+          0x4341        Acorn/SparkFS 
+          0x4453        Windows NT security descriptor (binary ACL)
+          0x4704        VM/CMS
+          0x470f        MVS
+          0x4c41        OS/2 access control list (text ACL)
+          0x4d49        Info-ZIP VMS (VAX or Alpha)
+          0x5455        extended timestamp
+          0x5855        Info-ZIP Unix (original, also OS/2, NT, etc)
+          0x6542        BeOS/BeBox
+          0x756e        ASi Unix
+          0x7855        Info-ZIP Unix (new)
+          0xfd4a        SMS/QDOS
+
+          The Data Size field indicates the size of the following
+          data block. Programs can use this value to skip to the
+          next header block, passing over any data blocks that are
+          not of interest.
+
+          Note: As stated above, the size of the entire .ZIP file
+                header, including the filename, comment, and extra
+                field should not exceed 64K in size.
+
+          In case two different programs should appropriate the same
+          Header ID value, it is strongly recommended that each
+          program place a unique signature of at least two bytes in
+          size (and preferably 4 bytes or bigger) at the start of
+          each data area.  Every program should verify that its
+          unique signature is present, in addition to the Header ID
+          value being correct, before assuming that it is a block of
+          known type.
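+
+          A minimal Python sketch of walking the header/data pairs
+          described above, skipping over blocks by their Data Size.
+          The function name iter_extra_blocks is illustrative, and
+          raising an error on a truncated block is a design choice
+          of this sketch, not a requirement of the format.
+
+```python
+import struct
+
+
+def iter_extra_blocks(extra):
+    """Yield (header_id, data) for each block in an extra field."""
+    pos = 0
+    while pos + 4 <= len(extra):
+        # Header ID and Data Size, both 2 bytes, low byte first.
+        header_id, size = struct.unpack_from("<HH", extra, pos)
+        pos += 4
+        if pos + size > len(extra):
+            raise ValueError("extra block overruns the field")
+        yield header_id, extra[pos:pos + size]
+        pos += size
+```
+
+          A reader interested only in, say, Header ID 0x000d can
+          loop over the pairs and ignore every other ID.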
+
+         -OS/2 Extra Field:
+
+          The following is the layout of the OS/2 attributes "extra" 
+          block.  (Last Revision  09/05/95)
+
+          Note: all fields stored in Intel low-byte/high-byte order.
+
+          Value       Size          Description
+          -----       ----          -----------
+  (OS/2)  0x0009      2 bytes       Tag for this "extra" block type
+          TSize       2 bytes       Size for the following data block
+          BSize       4 bytes       Uncompressed Block Size
+          CType       2 bytes       Compression type
+          EACRC       4 bytes       CRC value for uncompressed block
+          (var)       variable      Compressed block
+
+        The OS/2 extended attribute structure (FEA2LIST) is 
+        compressed and then stored in its entirety within this 
+        structure.  There will only ever be one "block" of data in 
+        VarFields[].
+
+         -UNIX Extra Field:
+
+          The following is the layout of the Unix "extra" block.
+          Note: all fields are stored in Intel low-byte/high-byte 
+          order.
+
+          Value       Size          Description
+          -----       ----          -----------
+  (UNIX)  0x000d      2 bytes       Tag for this "extra" block type
+          TSize       2 bytes       Size for the following data block
+          Atime       4 bytes       File last access time
+          Mtime       4 bytes       File last modification time
+          Uid         2 bytes       File user ID
+          Gid         2 bytes       File group ID
+          (var)       variable      Variable length data field
+
+          The variable length data field will contain file type 
+          specific data.  Currently the only values allowed are
+          the original "linked to" file names for hard or symbolic 
+          links.
+
+         -VAX/VMS Extra Field:
+
+          The following is the layout of the VAX/VMS attributes 
+          "extra" block.
+
+          Note: all fields stored in Intel low-byte/high-byte order.
+
+          Value      Size       Description
+          -----      ----       -----------
+  (VMS)   0x000c     2 bytes    Tag for this "extra" block type
+          TSize      2 bytes    Size of the total "extra" block
+          CRC        4 bytes    32-bit CRC for remainder of the block
+          Tag1       2 bytes    VMS attribute tag value #1
+          Size1      2 bytes    Size of attribute #1, in bytes
+          (var.)     Size1      Attribute #1 data
+          .
+          .
+          .
+          TagN       2 bytes    VMS attribute tag value #N
+          SizeN      2 bytes    Size of attribute #N, in bytes
+          (var.)     SizeN      Attribute #N data
+
+          Rules:
+
+          1. There will be one or more attributes present, which 
+             will each be preceded by the above TagX & SizeX values.  
+             These values are identical to the ATR$C_XXXX and 
+             ATR$S_XXXX constants which are defined in ATR.H under 
+             VMS C.  Neither of these values will ever be zero.
+
+          2. No word alignment or padding is performed.
+
+          3. A well-behaved PKZIP/VMS program should never produce
+             more than one sub-block with the same TagX value.  Also,
+             there will never be more than one "extra" block of type
+             0x000c in a particular directory record.
+
+         -NTFS Extra Field:
+
+          The following is the layout of the NTFS attributes 
+          "extra" block.
+
+          Note: all fields stored in Intel low-byte/high-byte order.
+
+          Value      Size       Description
+          -----      ----       -----------
+  (NTFS)  0x000a     2 bytes    Tag for this "extra" block type
+          TSize      2 bytes    Size of the total "extra" block
+          Reserved   4 bytes    Reserved for future use
+          Tag1       2 bytes    NTFS attribute tag value #1
+          Size1      2 bytes    Size of attribute #1, in bytes
+          (var.)     Size1      Attribute #1 data
+          .
+          .
+          .
+          TagN       2 bytes    NTFS attribute tag value #N
+          SizeN      2 bytes    Size of attribute #N, in bytes
+          (var.)     SizeN      Attribute #N data
+
+          For NTFS, values for Tag1 through TagN are as follows:
+          (currently only one set of attributes is defined for NTFS)
+
+          Tag        Size       Description
+          -----      ----       -----------
+          0x0001     2 bytes    Tag for attribute #1 
+          Size1      2 bytes    Size of attribute #1, in bytes
+          Mtime      8 bytes    File last modification time
+          Atime      8 bytes    File last access time
+          Ctime      8 bytes    File creation time
+          
+         -PATCH Descriptor Extra Field:
+
+          The following is the layout of the Patch Descriptor "extra"
+          block.
+
+          Note: all fields stored in Intel low-byte/high-byte order.
+
+          Value     Size     Description
+          -----     ----     -----------
+  (Patch) 0x000f    2 bytes  Tag for this "extra" block type
+          TSize     2 bytes  Size of the total "extra" block
+          Version   2 bytes  Version of the descriptor
+          Flags     4 bytes  Actions and reactions (see below) 
+          OldSize   4 bytes  Size of the file about to be patched 
+          OldCRC    4 bytes  32-bit CRC of the file to be patched 
+          NewSize   4 bytes  Size of the resulting file 
+          NewCRC    4 bytes  32-bit CRC of the resulting file 
+
+          Actions and reactions
+
+          Bits          Description
+          ----          ----------------
+          0             Use for autodetection
+          1             Treat as selfpatch
+          2-3           RESERVED
+          4-5           Action (see below)
+          6-7           RESERVED
+          8-9           Reaction (see below) to absent file 
+          10-11         Reaction (see below) to newer file
+          12-13         Reaction (see below) to unknown file
+          14-15         RESERVED
+          16-31         RESERVED
+
+          Actions
+
+          Action       Value
+          ------       ----- 
+          none         0
+          add          1
+          delete       2
+          patch        3
+
+          Reactions
+ 
+          Reaction     Value
+          --------     -----
+          ask          0
+          skip         1
+          ignore       2
+          fail         3
+
+          - FWKCS MD5 Extra Field:
+
+          The FWKCS Contents_Signature System, used in
+          automatically identifying files independent of filename,
+          optionally adds and uses an extra field to support the
+          rapid creation of an enhanced contents_signature:
+
+              Header ID = 0x4b46
+              Data Size = 0x0013
+              Preface   = 'M','D','5'
+              followed by 16 bytes containing the uncompressed file's
+              128_bit MD5 hash(1), low byte first.
+
+          When FWKCS revises a zipfile central directory to add
+          this extra field for a file, it also replaces the
+          central directory entry for that file's uncompressed
+          filelength with a measured value.
+
+          FWKCS provides an option to strip this extra field, if
+          present, from a zipfile central directory. In adding
+          this extra field, FWKCS preserves Zipfile Authenticity
+          Verification; if stripping this extra field, FWKCS
+          preserves all versions of AV through PKZIP version 2.04g.
+
+          FWKCS, and FWKCS Contents_Signature System, are
+          trademarks of Frederick W. Kantor.
+
+          (1) R. Rivest, RFC1321.TXT, MIT Laboratory for Computer
+              Science and RSA Data Security, Inc., April 1992.
+              ll.76-77: "The MD5 algorithm is being placed in the
+              public domain for review and possible adoption as a
+              standard."
+
+      file comment: (Variable)
+
+          The comment for this file.
+
+      number of this disk: (2 bytes)
+
+          The number of this disk, which contains the central
+          directory end record.
+
+      number of the disk with the start of the central
+      directory: (2 bytes)
+
+          The number of the disk on which the central
+          directory starts.
+
+      total number of entries in the central dir on 
+      this disk: (2 bytes)
+
+          The number of central directory entries on this disk.
+
+      total number of entries in the central dir: (2 bytes)
+
+          The total number of files in the zipfile.
+
+      size of the central directory: (4 bytes)
+
+          The size (in bytes) of the entire central directory.
+
+      offset of start of central directory with respect to
+      the starting disk number:  (4 bytes)
+
+          Offset of the start of the central directory on the
+          disk on which the central directory starts.
+
+      zipfile comment length: (2 bytes)
+
+          The length of the comment for this zipfile.
+
+      zipfile comment: (Variable)
+
+          The comment for this zipfile.
+
+  E.  General notes:
+
+      1)  All fields unless otherwise noted are unsigned and stored
+          in Intel low-byte:high-byte, low-word:high-word order.
+
+      2)  String fields are not null terminated, since the
+          length is given explicitly.
+
+      3)  Local headers should not span disk boundaries.  Also, even
+          though the central directory can span disk boundaries, no
+          single record in the central directory should be split
+          across disks.
+
+      4)  The entries in the central directory may not necessarily
+          be in the same order that files appear in the zipfile.
+
+UnShrinking - Method 1
+----------------------
+
+Shrinking is a Dynamic Ziv-Lempel-Welch compression algorithm
+with partial clearing.  The initial code size is 9 bits, and
+the maximum code size is 13 bits.  Shrinking differs from
+conventional Dynamic Ziv-Lempel-Welch implementations in several
+respects:
+
+1)  The code size is controlled by the compressor, and is not
+    automatically increased when codes larger than the current
+    code size are created (but not necessarily used).  When
+    the decompressor encounters the code sequence 256
+    (decimal) followed by 1, it should increase the code size
+    read from the input stream to the next bit size.  No
+    blocking of the codes is performed, so the next code at
+    the increased size should be read from the input stream
+    immediately after where the previous code at the smaller
+    bit size was read.  Again, the decompressor should not
+    increase the code size used until the sequence 256,1 is
+    encountered.
+
+2)  When the table becomes full, total clearing is not
+    performed.  Rather, when the compressor emits the code
+    sequence 256,2 (decimal), the decompressor should clear
+    all leaf nodes from the Ziv-Lempel tree, and continue to
+    use the current code size.  The nodes that are cleared
+    from the Ziv-Lempel tree are then re-used, with the lowest
+    code value re-used first, and the highest code value
+    re-used last.  The compressor can emit the sequence 256,2
+    at any time.
+
+Expanding - Methods 2-5
+-----------------------
+
+The Reducing algorithm is actually a combination of two
+distinct algorithms.  The first algorithm compresses repeated
+byte sequences, and the second algorithm takes the compressed
+stream from the first algorithm and applies a probabilistic
+compression method.
+
+The probabilistic compression stores an array of 'follower
+sets' S(j), for j=0 to 255, corresponding to each possible
+ASCII character.  Each set contains between 0 and 32
+characters, to be denoted as S(j)[0],...,S(j)[m], where m<32.
+The sets are stored at the beginning of the data area for a
+Reduced file, in reverse order, with S(255) first, and S(0)
+last.
+
+The sets are encoded as { N(j), S(j)[0],...,S(j)[N(j)-1] },
+where N(j) is the size of set S(j).  N(j) can be 0, in which
+case the follower set for S(j) is empty.  Each N(j) value is
+encoded in 6 bits, followed by N(j) eight bit character values
+corresponding to S(j)[0] to S(j)[N(j)-1] respectively.  If
+N(j) is 0, then no values for S(j) are stored, and the value
+for N(j-1) immediately follows.
+
+The compressed data stream immediately follows the follower
+sets.  It can be interpreted for the probabilistic
+decompression as follows:
+
+let Last-Character <- 0.
+loop until done
+    if the follower set S(Last-Character) is empty then
+        read 8 bits from the input stream, and copy this
+        value to the output stream.
+    otherwise if the follower set S(Last-Character) is non-empty then
+        read 1 bit from the input stream.
+        if this bit is not zero then
+            read 8 bits from the input stream, and copy this
+            value to the output stream.
+        otherwise if this bit is zero then
+            read B(N(Last-Character)) bits from the input
+            stream, and assign this value to I.
+            Copy the value of S(Last-Character)[I] to the
+            output stream.
+
+    assign the last value placed on the output stream to
+    Last-Character.
+end loop
+
+B(N(j)) is defined as the minimal number of bits required to
+encode the value N(j)-1.
+
+The decompressed stream from above can then be expanded to
+re-create the original file as follows:
+
+let State <- 0.
+
+loop until done
+    read 8 bits from the input stream into C.
+    case State of
+        0:  if C is not equal to DLE (144 decimal) then
+                copy C to the output stream.
+            otherwise if C is equal to DLE then
+                let State <- 1.
+
+        1:  if C is non-zero then
+                let V <- C.
+                let Len <- L(V)
+                let State <- F(Len).
+            otherwise if C is zero then
+                copy the value 144 (decimal) to the output stream.
+                let State <- 0
+
+        2:  let Len <- Len + C
+            let State <- 3.
+
+        3:  move backwards D(V,C) bytes in the output stream
+            (if this position is before the start of the output
+            stream, then assume that all the data before the
+            start of the output stream is filled with zeros).
+            copy Len+3 bytes from this position to the output stream.
+            let State <- 0.
+    end case
+end loop
+
+The functions F, L, and D are dependent on the 'compression
+factor', 1 through 4, and are defined as follows:
+
+For compression factor 1:
+    L(X) equals the lower 7 bits of X.
+    F(X) equals 2 if X equals 127 otherwise F(X) equals 3.
+    D(X,Y) equals the (upper 1 bit of X) * 256 + Y + 1.
+For compression factor 2:
+    L(X) equals the lower 6 bits of X.
+    F(X) equals 2 if X equals 63 otherwise F(X) equals 3.
+    D(X,Y) equals the (upper 2 bits of X) * 256 + Y + 1.
+For compression factor 3:
+    L(X) equals the lower 5 bits of X.
+    F(X) equals 2 if X equals 31 otherwise F(X) equals 3.
+    D(X,Y) equals the (upper 3 bits of X) * 256 + Y + 1.
+For compression factor 4:
+    L(X) equals the lower 4 bits of X.
+    F(X) equals 2 if X equals 15 otherwise F(X) equals 3.
+    D(X,Y) equals the (upper 4 bits of X) * 256 + Y + 1.
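+
+The four cases above differ only in how many low bits of X carry
+the length.  A minimal Python sketch, assuming X and Y are byte
+values in the range 0-255; the name reduce_functions is
+illustrative:
+
+```python
+def reduce_functions(factor):
+    """Return the functions (L, F, D) for compression factor 1 through 4."""
+    if factor not in (1, 2, 3, 4):
+        raise ValueError("compression factor must be 1 through 4")
+    low_bits = 8 - factor              # 7, 6, 5 or 4 low bits hold the length
+    mask = (1 << low_bits) - 1         # 127, 63, 31 or 15
+
+    def L(x):
+        return x & mask                # lower bits of X
+
+    def F(x):
+        return 2 if x == mask else 3   # a maximal length needs one more byte
+
+    def D(x, y):
+        return (x >> low_bits) * 256 + y + 1   # upper bits of X extend Y
+
+    return L, F, D
+```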
+
+Imploding - Method 6
+--------------------
+
+The Imploding algorithm is actually a combination of two distinct
+algorithms.  The first algorithm compresses repeated byte
+sequences using a sliding dictionary.  The second algorithm is
+used to compress the encoding of the sliding dictionary output,
+using multiple Shannon-Fano trees.
+
+The Imploding algorithm can use a 4K or 8K sliding dictionary
+size. The dictionary size used can be determined by bit 1 in the
+general purpose flag word; a 0 bit indicates a 4K dictionary
+while a 1 bit indicates an 8K dictionary.
+
+The Shannon-Fano trees are stored at the start of the compressed
+file. The number of trees stored is defined by bit 2 in the
+general purpose flag word; a 0 bit indicates two trees stored, a
+1 bit indicates three trees are stored.  If 3 trees are stored,
+the first Shannon-Fano tree represents the encoding of the
+Literal characters, the second tree represents the encoding of
+the Length information, the third represents the encoding of the
+Distance information.  When 2 Shannon-Fano trees are stored, the
+Length tree is stored first, followed by the Distance tree.
+
+The Literal Shannon-Fano tree, if present, is used to represent
+the entire ASCII character set, and contains 256 values.  This
+tree is used to compress any data not compressed by the sliding
+dictionary algorithm.  When this tree is present, the Minimum
+Match Length for the sliding dictionary is 3.  If this tree is
+not present, the Minimum Match Length is 2.
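+
+The flag tests described above might be sketched in Python as
+follows.  The function name implode_parameters is hypothetical, and
+the sketch assumes that bit 0 is the least significant bit of the
+general purpose flag word.
+
+```python
+def implode_parameters(flags):
+    """Return (dictionary_size, tree_count, minimum_match_length)."""
+    # Bit 1: 0 selects a 4K dictionary, 1 selects an 8K dictionary.
+    dictionary_size = 8192 if flags & 0x0002 else 4096
+    # Bit 2: 0 means two trees are stored, 1 means three trees.
+    has_literal_tree = bool(flags & 0x0004)
+    tree_count = 3 if has_literal_tree else 2
+    # The Minimum Match Length depends on the Literal tree.
+    minimum_match_length = 3 if has_literal_tree else 2
+    return dictionary_size, tree_count, minimum_match_length
+```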
+
+The Length Shannon-Fano tree is used to compress the Length part
+of the (length,distance) pairs from the sliding dictionary
+output.  The Length tree contains 64 values, ranging from the
+Minimum Match Length, to 63 plus the Minimum Match Length.
+
+The Distance Shannon-Fano tree is used to compress the Distance
+part of the (length,distance) pairs from the sliding dictionary
+output. The Distance tree contains 64 values, ranging from 0 to
+63, representing the upper 6 bits of the distance value.  The
+distance values themselves will be between 0 and the sliding
+dictionary size, either 4K or 8K.
+
+The Shannon-Fano trees themselves are stored in a compressed
+format. The first byte of the tree data represents the number of
+bytes of data representing the (compressed) Shannon-Fano tree
+minus 1.  The remaining bytes represent the Shannon-Fano tree
+data encoded as:
+
+    High 4 bits: Number of values at this bit length + 1. (1 - 16)
+    Low  4 bits: Bit Length needed to represent value + 1. (1 - 16)
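+
+A minimal Python sketch of this unpacking step is given here for
+illustration only.  The name read_bit_lengths is hypothetical; data
+is assumed to be a bytes object positioned at the first byte of the
+stored tree.
+
+```python
+def read_bit_lengths(data):
+    """Return (bit_lengths, bytes_consumed) for one stored tree."""
+    # The first byte holds the number of tree data bytes minus 1.
+    byte_count = data[0] + 1
+    bit_lengths = []
+    for packed in data[1:1 + byte_count]:
+        count = (packed >> 4) + 1          # high 4 bits, 1 - 16
+        bit_length = (packed & 0x0F) + 1   # low 4 bits, 1 - 16
+        bit_lengths.extend([bit_length] * count)
+    return bit_lengths, 1 + byte_count
+```
+
+Applied to the bytes 0x02, 0x42, 0x01, 0x13 this yields the bit
+length array (3, 3, 3, 3, 3, 2, 4, 4).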
+
+The Shannon-Fano codes can be constructed from the bit lengths
+using the following algorithm:
+
+1)  Sort the Bit Lengths in ascending order, while retaining the
+    order of the original lengths stored in the file.
+
+2)  Generate the Shannon-Fano trees:
+
+    Code <- 0
+    CodeIncrement <- 0
+    LastBitLength <- 0
+    i <- number of Shannon-Fano codes - 1   (either 255 or 63)
+
+    loop while i >= 0
+        Code <- Code + CodeIncrement
+        if BitLength(i) <> LastBitLength then
+            LastBitLength <- BitLength(i)
+            CodeIncrement <- 1 shifted left (16 - LastBitLength)
+        end if
+        ShannonCode(i) <- Code
+        i <- i - 1
+    end loop
+
+3)  Reverse the order of all the bits in the above ShannonCode()
+    vector, so that the most significant bit becomes the least
+    significant bit.  For example, the value 0x1234 (hex) would
+    become 0x2C48 (hex).
+
+4)  Restore the order of Shannon-Fano codes as originally stored
+    within the file.
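+
+The four steps might be sketched in Python as shown here.  This is
+an illustration under stated assumptions rather than a reference
+implementation: the names reverse16 and shannon_fano_codes are
+hypothetical, and the sort is assumed to be stable so that equal
+bit lengths keep their file order.
+
+```python
+def reverse16(value):
+    """Reverse the order of the 16 bits in value."""
+    result = 0
+    for _ in range(16):
+        result = (result << 1) | (value & 1)
+        value >>= 1
+    return result
+
+
+def shannon_fano_codes(bit_lengths):
+    """Return one (code, bit_length) pair per value, in file order."""
+    count = len(bit_lengths)
+    # Step 1: stable sort, remembering where each slot came from.
+    order = sorted(range(count), key=lambda value: bit_lengths[value])
+    # Step 2: assign codes from the last sorted slot to the first.
+    code = 0
+    code_increment = 0
+    last_bit_length = 0
+    sorted_codes = [0] * count
+    for i in range(count - 1, -1, -1):
+        code += code_increment
+        bit_length = bit_lengths[order[i]]
+        if bit_length != last_bit_length:
+            last_bit_length = bit_length
+            code_increment = 1 << (16 - last_bit_length)
+        sorted_codes[i] = code
+    # Steps 3 and 4: reverse the bits and restore the file order.
+    codes = [None] * count
+    for i, value in enumerate(order):
+        codes[value] = (reverse16(sorted_codes[i]), bit_lengths[value])
+    return codes
+```
+
+After the reversal each code occupies the low bit_length bits of
+the returned integer.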
+
+Example:
+
+    This example will show the encoding of a Shannon-Fano tree
+    of size 8.  Notice that the actual Shannon-Fano trees used
+    for Imploding are either 64 or 256 entries in size.
+
+Example data:   0x02, 0x42, 0x01, 0x13
+
+    The first byte indicates that 3 bytes of tree data follow.
+    Decoding those bytes:
+            0x42 = 5 codes of 3 bits long
+            0x01 = 1 code  of 2 bits long
+            0x13 = 2 codes of 4 bits long
+
+    This would generate the original bit length array of:
+    (3, 3, 3, 3, 3, 2, 4, 4)
+
+    There are 8 codes in this table for the values 0 thru 7.  Using 
+    the algorithm to obtain the Shannon-Fano codes produces:
+
+                                  Reversed     Order     Original
+Val  Sorted   Constructed Code      Value     Restored    Length
+---  ------   -----------------   --------    --------    ------
+0:     2      1100000000000000        11       101          3
+1:     3      1010000000000000       101       001          3
+2:     3      1000000000000000       001       110          3
+3:     3      0110000000000000       110       010          3
+4:     3      0100000000000000       010       100          3
+5:     3      0010000000000000       100        11          2
+6:     4      0001000000000000      1000      1000          4
+7:     4      0000000000000000      0000      0000          4
+
+The values in the Val, Order Restored and Original Length columns
+now represent the Shannon-Fano encoding tree that can be used for
+decoding the Shannon-Fano encoded data.  How to parse the
+variable length Shannon-Fano values from the data stream is beyond
+the scope of this document.  (See the references listed at the end of
+this document for more information.)  However, traditional decoding
+schemes used for Huffman variable length decoding, such as the
+Greenlaw algorithm, can be successfully applied.
+
+The compressed data stream begins immediately after the
+compressed Shannon-Fano data.  The compressed data stream can be
+interpreted as follows:
+
+loop until done
+    read 1 bit from input stream.
+
+    if this bit is non-zero then       (encoded data is literal data)
+        if Literal Shannon-Fano tree is present
+            read and decode character using Literal Shannon-Fano tree.
+        otherwise
+            read 8 bits from input stream.
+        copy character to the output stream.
+    otherwise              (encoded data is sliding dictionary match)
+        if 8K dictionary size
+            read 7 bits for offset Distance (lower 7 bits of offset).
+        otherwise
+            read 6 bits for offset Distance (lower 6 bits of offset).
+
+        using the Distance Shannon-Fano tree, read and decode the
+          upper 6 bits of the Distance value.
+
+        using the Length Shannon-Fano tree, read and decode
+          the Length value.
+
+        Length <- Length + Minimum Match Length
+
+        if Length = 63 + Minimum Match Length
+            read 8 bits from the input stream,
+            add this value to Length.
+
+        move backwards Distance+1 bytes in the output stream, and
+        copy Length characters from this position to the output
+        stream.  (if this position is before the start of the output
+        stream, then assume that all the data before the start of
+        the output stream is filled with zeros).
+end loop
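+
+The final copy step of the loop above, including the zero fill for
+positions before the start of the output, could look like this in
+Python.  The name copy_match is hypothetical, and output is assumed
+to be a bytearray holding everything decoded so far.
+
+```python
+def copy_match(output, distance, length):
+    """Append length bytes taken from distance+1 bytes back."""
+    start = len(output) - (distance + 1)
+    for offset in range(length):
+        position = start + offset
+        # Data before the start of the output counts as zeros.
+        # Copying one byte at a time lets a match overlap the
+        # bytes that this same call has just appended.
+        output.append(output[position] if position >= 0 else 0)
+```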
+
+Tokenizing - Method 7
+---------------------
+
+This method is not used by PKZIP.
+
+Deflating - Method 8
+--------------------
+
+The Deflate algorithm is similar to the Implode algorithm using
+a sliding dictionary of up to 32K with secondary compression
+from Huffman/Shannon-Fano codes.
+
+The compressed data is stored in blocks with a header describing
+the block and the Huffman codes used in the data block.  The header
+format is as follows:
+
+   Bit 0: Last Block bit     This bit is set to 1 if this is the last
+                             compressed block in the data.
+   Bits 1-2: Block type
+      00 (0) - Block is stored - All stored data is byte aligned.
+               Skip bits until next byte, then next word = block
+               length, followed by the one's complement of the block
+               length word. Remaining data in block is the stored
+               data.
+
+      01 (1) - Use fixed Huffman codes for literal and distance codes.
+               Lit Code    Bits             Dist Code   Bits
+               ---------   ----             ---------   ----
+                 0 - 143    8                 0 - 31      5
+               144 - 255    9
+               256 - 279    7
+               280 - 287    8
+
+               Literal codes 286-287 and distance codes 30-31 are
+               never used but participate in the Huffman construction.
+
+      10 (2) - Dynamic Huffman codes.  (See expanding Huffman codes)
+
+      11 (3) - Reserved - Flag an "Error in compressed data" if seen.
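+
+For illustration, the header fields and the fixed code lengths of
+block type 1 might be expressed in Python as follows.  The function
+names are hypothetical.  The sketch assumes that bits holds at
+least the next 3 bits of the stream, with bit 0 of the header in
+the least significant position.
+
+```python
+def parse_block_header(bits):
+    """Return (is_last_block, block_type) from the 3 header bits."""
+    is_last_block = bool(bits & 1)   # bit 0
+    block_type = (bits >> 1) & 3     # bits 1-2
+    if block_type == 3:
+        raise ValueError("Error in compressed data")
+    return is_last_block, block_type
+
+
+def fixed_literal_lengths():
+    """Bit lengths of literal codes 0 - 287 for block type 1."""
+    return [8] * 144 + [9] * 112 + [7] * 24 + [8] * 8
+
+
+def fixed_distance_lengths():
+    """Bit lengths of distance codes 0 - 31 for block type 1."""
+    return [5] * 32
+```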
+
+Expanding Huffman Codes
+-----------------------
+If the data block is stored with dynamic Huffman codes, the Huffman
+codes are sent in the following compressed format:
+
+   5 Bits: # of Literal codes sent - 257 (257 - 286)
+           All other codes are never sent.
+   5 Bits: # of Dist codes - 1           (1 - 32)
+   4 Bits: # of Bit Length codes - 4     (4 - 19)
+
+The Huffman codes are sent as bit lengths and the codes are built as
+described in the implode algorithm.  The bit lengths themselves are
+compressed with Huffman codes.  There are 19 bit length codes:
+
+   0 - 15: Represent bit lengths of 0 - 15
+       16: Copy the previous bit length 3 - 6 times.
+           The next 2 bits indicate repeat length (0 = 3, ... ,3 = 6)
+              Example:  Codes 8, 16 (+2 bits 11), 16 (+2 bits 10) will
+                        expand to 12 bit lengths of 8 (1 + 6 + 5)
+       17: Repeat a bit length of 0 for 3 - 10 times. (3 bits of length)
+       18: Repeat a bit length of 0 for 11 - 138 times (7 bits of length)
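+
+A small Python sketch of this run-length expansion follows.  It is
+illustrative only.  The name expand_bit_lengths is hypothetical,
+and each input item is assumed to be a pair of an already decoded
+bit length code and the value of its extra bits (0 for codes
+0 - 15).
+
+```python
+def expand_bit_lengths(symbols):
+    """Expand (code, extra) pairs into a list of bit lengths."""
+    lengths = []
+    for code, extra in symbols:
+        if 0 <= code <= 15:
+            lengths.append(code)
+        elif code == 16:
+            if not lengths:
+                raise ValueError("code 16 has no previous length")
+            lengths.extend([lengths[-1]] * (3 + extra))   # 3 - 6
+        elif code == 17:
+            lengths.extend([0] * (3 + extra))             # 3 - 10
+        elif code == 18:
+            lengths.extend([0] * (11 + extra))            # 11 - 138
+        else:
+            raise ValueError("invalid bit length code")
+    return lengths
+```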
+
+The lengths of the bit length codes are sent packed 3 bits per value
+(0 - 7) in the following order:
+
+   16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
+
+The Huffman codes should be built as described in the Implode algorithm
+except codes are assigned starting at the shortest bit length, i.e. the
+shortest code should be all 0's rather than all 1's.  Also, codes with
+a bit length of zero do not participate in the tree construction.  The
+codes are then used to decode the bit lengths for the literal and 
+distance tables.
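+
+One way to assign such codes is sketched below in Python.  It uses
+the counting formulation commonly seen in Deflate decoders rather
+than a literal transcription of the Implode steps, so treat it as
+an assumption about an equivalent construction.  The name
+deflate_codes is hypothetical.
+
+```python
+def deflate_codes(bit_lengths):
+    """Return one code per value, or None for a bit length of 0."""
+    max_length = max(bit_lengths, default=0)
+    # Count how many codes exist for each non-zero bit length.
+    length_count = [0] * (max_length + 1)
+    for bit_length in bit_lengths:
+        if bit_length:
+            length_count[bit_length] += 1
+    # The first code of each length follows all shorter codes, so
+    # the shortest code is all 0's.
+    next_code = [0] * (max_length + 1)
+    code = 0
+    for bits in range(1, max_length + 1):
+        code = (code + length_count[bits - 1]) << 1
+        next_code[bits] = code
+    # Values sharing a bit length receive consecutive codes.
+    codes = []
+    for bit_length in bit_lengths:
+        if bit_length == 0:
+            codes.append(None)
+        else:
+            codes.append(next_code[bit_length])
+            next_code[bit_length] += 1
+    return codes
+```
+
+For the bit lengths (3, 3, 3, 3, 3, 2, 4, 4) this produces the
+codes 010, 011, 100, 101, 110, 00, 1110 and 1111, written most
+significant bit first.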
+
+The bit lengths for the literal tables are sent first with the number
+of entries sent described by the 5 bits sent earlier.  There are up
+to 286 literal characters; the first 256 represent the respective 8
+bit character, code 256 represents the End-Of-Block code, the remaining
+29 codes represent copy lengths of 3 thru 258.  There are up to 30
+distance codes representing distances from 1 thru 32k as described
+below.
+
+                             Length Codes
+                             ------------
+      Extra             Extra              Extra              Extra
+ Code Bits Length  Code Bits Lengths  Code Bits Lengths  Code Bits Length(s)
+ ---- ---- ------  ---- ---- -------  ---- ---- -------  ---- ---- ---------
+  257   0     3     265   1   11,12    273   3   35-42    281   5  131-162
+  258   0     4     266   1   13,14    274   3   43-50    282   5  163-194
+  259   0     5     267   1   15,16    275   3   51-58    283   5  195-226
+  260   0     6     268   1   17,18    276   3   59-66    284   5  227-257
+  261   0     7     269   2   19-22    277   4   67-82    285   0    258
+  262   0     8     270   2   23-26    278   4   83-98
+  263   0     9     271   2   27-30    279   4   99-114
+  264   0    10     272   2   31-34    280   4  115-130
+
+                            Distance Codes
+                            --------------
+      Extra           Extra             Extra               Extra
+ Code Bits Dist  Code Bits  Dist   Code Bits Distance  Code Bits Distance
+ ---- ---- ----  ---- ---- ------  ---- ---- --------  ---- ---- --------
+   0   0    1      8   3   17-24    16    7  257-384    24   11  4097-6144
+   1   0    2      9   3   25-32    17    7  385-512    25   11  6145-8192
+   2   0    3     10   4   33-48    18    8  513-768    26   12  8193-12288
+   3   0    4     11   4   49-64    19    8  769-1024   27   12 12289-16384
+   4   1   5,6    12   5   65-96    20    9 1025-1536   28   13 16385-24576
+   5   1   7,8    13   5   97-128   21    9 1537-2048   29   13 24577-32768
+   6   2   9-12   14   6  129-192   22   10 2049-3072
+   7   2  13-16   15   6  193-256   23   10 3073-4096
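+
+The two tables are regular enough to be generated rather than
+typed in.  The following Python sketch does so.  All names are
+hypothetical, and extra_value is assumed to be the already read
+value of the extra bits.
+
+```python
+# Length codes 257 - 284: eight codes with no extra bits, then
+# four codes each for 1, 2, 3, 4 and 5 extra bits.
+LENGTH_BASE, LENGTH_EXTRA = [], []
+_base = 3
+for _i in range(28):
+    _extra = 0 if _i < 8 else (_i - 4) // 4
+    LENGTH_BASE.append(_base)
+    LENGTH_EXTRA.append(_extra)
+    _base += 1 << _extra
+# Code 285 stands for length 258 and has no extra bits.
+LENGTH_BASE.append(258)
+LENGTH_EXTRA.append(0)
+
+# Distance codes 0 - 29: four codes with no extra bits, then two
+# codes each for 1 through 13 extra bits.
+DIST_BASE, DIST_EXTRA = [], []
+_base = 1
+for _i in range(30):
+    _extra = 0 if _i < 2 else (_i - 2) // 2
+    DIST_BASE.append(_base)
+    DIST_EXTRA.append(_extra)
+    _base += 1 << _extra
+
+
+def match_length(code, extra_value):
+    """Copy length for length code 257 - 285."""
+    return LENGTH_BASE[code - 257] + extra_value
+
+
+def match_distance(code, extra_value):
+    """Distance for distance code 0 - 29."""
+    return DIST_BASE[code] + extra_value
+```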
+
+The compressed data stream begins immediately after the
+compressed header data.  The compressed data stream can be
+interpreted as follows:
+
+do
+   read header from input stream.
+
+   if stored block
+      skip bits until byte aligned
+      read count and 1's complement of count
+      copy count bytes from the data block to the output stream
+   otherwise
+      loop until end of block code sent
+         decode literal character from input stream
+         if literal < 256
+            copy character to the output stream
+         otherwise
+            if literal = end of block
+               break from loop
+            otherwise
+               determine length from the literal code and its extra bits
+               decode distance from input stream
+
+               move backwards distance bytes in the output stream, and
+               copy length characters from this position to the output
+               stream.
+      end loop
+while not last block
+
+if data descriptor exists
+   skip bits until byte aligned
+   read crc and sizes
+endif
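+
+The stored block branch might be sketched in Python as follows.
+The name read_stored_block is hypothetical.  The sketch assumes
+that position already points at the first byte after the byte
+alignment, and that count and its complement are 2-byte words in
+Intel low-byte/high-byte order.
+
+```python
+def read_stored_block(data, position):
+    """Return (stored_bytes, next_position) for a stored block."""
+    count = data[position] | (data[position + 1] << 8)
+    complement = data[position + 2] | (data[position + 3] << 8)
+    # The second word must be the 1's complement of the first.
+    if (count ^ complement) != 0xFFFF:
+        raise ValueError("Error in compressed data")
+    start = position + 4
+    end = start + count
+    if end > len(data):
+        raise ValueError("stored block is truncated")
+    return data[start:end], end
+```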
+
+Decryption
+----------
+
+The encryption used in PKZIP was generously supplied by Roger
+Schlafly.  PKWARE is grateful to Mr. Schlafly for his expert
+help and advice in the field of data encryption.
+
+PKZIP encrypts the compressed data stream.  Encrypted files must
+be decrypted before they can be extracted.
+
+Each encrypted file has an extra 12 bytes stored at the start of
+the data area defining the encryption header for that file.  The
+encryption header is originally set to random values, and then
+itself encrypted, using three 32-bit keys.  The key values are
+initialized using the supplied encryption password.  After each byte
+is encrypted, the keys are then updated using pseudo-random number
+generation techniques in combination with the same CRC-32 algorithm
+used in PKZIP and described elsewhere in this document.
+
+The following are the basic steps required to decrypt a file:
+
+1) Initialize the three 32-bit keys with the password.
+2) Read and decrypt the 12-byte encryption header, further
+   initializing the encryption keys.
+3) Read and decrypt the compressed data stream using the
+   encryption keys.
+
+Step 1 - Initializing the encryption keys
+-----------------------------------------
+
+Key(0) <- 305419896
+Key(1) <- 591751049
+Key(2) <- 878082192
+
+loop for i <- 0 to length(password)-1
+    update_keys(password(i))
+end loop
+
+Where update_keys() is defined as:
+
+update_keys(char):
+  Key(0) <- crc32(Key(0),char)
+  Key(1) <- Key(1) + (Key(0) & 000000ffH)
+  Key(1) <- Key(1) * 134775813 + 1
+  Key(2) <- crc32(Key(2),Key(1) >> 24)
+end update_keys
+
+Where crc32(old_crc,char) is a routine that given a CRC value and a
+character, returns an updated CRC value after applying the CRC-32
+algorithm described elsewhere in this document.
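+
+Step 1 might be written in Python roughly as follows.  This is a
+sketch under stated assumptions: the one-byte crc32 update is
+taken to be the usual table-driven CRC-32 step with the reflected
+polynomial 0xEDB88320, the keys are held in a list, and the
+password is a bytes object.
+
+```python
+def _make_crc_table():
+    table = []
+    for n in range(256):
+        c = n
+        for _ in range(8):
+            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
+        table.append(c)
+    return table
+
+
+CRC_TABLE = _make_crc_table()
+
+
+def crc32(old_crc, char):
+    """Update a CRC value with one byte, with no final inversion."""
+    return CRC_TABLE[(old_crc ^ char) & 0xFF] ^ (old_crc >> 8)
+
+
+def update_keys(keys, char):
+    """Update the three 32-bit keys in place with one byte."""
+    keys[0] = crc32(keys[0], char)
+    keys[1] = (keys[1] + (keys[0] & 0xFF)) & 0xFFFFFFFF
+    keys[1] = (keys[1] * 134775813 + 1) & 0xFFFFFFFF
+    keys[2] = crc32(keys[2], keys[1] >> 24)
+
+
+def init_keys(password):
+    """Return the keys after processing every password byte."""
+    keys = [305419896, 591751049, 878082192]
+    for char in password:
+        update_keys(keys, char)
+    return keys
+```
+
+The masks with 0xFFFFFFFF keep Key(1) within 32 bits, which the
+pseudocode above leaves implicit.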
+
+Step 2 - Decrypting the encryption header
+-----------------------------------------
+
+The purpose of this step is to further initialize the encryption
+keys, based on random data, to render a plaintext attack on the
+data ineffective.
+
+Read the 12-byte encryption header into Buffer, in locations
+Buffer(0) thru Buffer(11).
+
+loop for i <- 0 to 11
+    C <- Buffer(i) ^ decrypt_byte()
+    update_keys(C)
+    Buffer(i) <- C
+end loop
+
+Where decrypt_byte() is defined as:
+
+unsigned char decrypt_byte()
+    local unsigned short temp
+    temp <- Key(2) | 2
+    decrypt_byte <- (temp * (temp ^ 1)) >> 8
+end decrypt_byte
+
+After the header is decrypted, the last 1 or 2 bytes in Buffer
+should be the high-order word/byte of the CRC for the file being
+decrypted, stored in Intel low-byte/high-byte order.  Versions of
+PKZIP prior to 2.0 used a 2 byte CRC check; a 1 byte CRC check is
+used on versions after 2.0.  This can be used to test if the password
+supplied is correct or not.
+
+Step 3 - Decrypting the compressed data stream
+----------------------------------------------
+
+The compressed data stream can be decrypted as follows:
+
+loop until done
+    read a character into C
+    Temp <- C ^ decrypt_byte()
+    update_keys(Temp)
+    output Temp
+end loop
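+
+Steps 1 through 3 can be combined into one small Python class, as
+sketched here.  The class name ZipCrypto is hypothetical, and the
+CRC-32 step is assumed to be the usual table-driven update with the
+reflected polynomial 0xEDB88320.  The encrypt method is not
+described in this document.  It is simply the inverse inferred from
+the decryption loop and is included only so the sketch can be
+checked with a round trip.
+
+```python
+def _make_crc_table():
+    table = []
+    for n in range(256):
+        c = n
+        for _ in range(8):
+            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
+        table.append(c)
+    return table
+
+
+CRC_TABLE = _make_crc_table()
+
+
+class ZipCrypto:
+    def __init__(self, password):
+        # Step 1: initialize the keys with the password bytes.
+        self.keys = [305419896, 591751049, 878082192]
+        for char in password:
+            self._update_keys(char)
+
+    @staticmethod
+    def _crc32(old_crc, char):
+        return CRC_TABLE[(old_crc ^ char) & 0xFF] ^ (old_crc >> 8)
+
+    def _update_keys(self, char):
+        k = self.keys
+        k[0] = self._crc32(k[0], char)
+        k[1] = (k[1] + (k[0] & 0xFF)) & 0xFFFFFFFF
+        k[1] = (k[1] * 134775813 + 1) & 0xFFFFFFFF
+        k[2] = self._crc32(k[2], k[1] >> 24)
+
+    def _decrypt_byte(self):
+        # temp is an unsigned short, so only 16 bits are kept.
+        temp = (self.keys[2] | 2) & 0xFFFF
+        return ((temp * (temp ^ 1)) >> 8) & 0xFF
+
+    def decrypt(self, data):
+        # Steps 2 and 3: the keys are updated with plaintext bytes.
+        output = bytearray()
+        for c in data:
+            plain = c ^ self._decrypt_byte()
+            self._update_keys(plain)
+            output.append(plain)
+        return bytes(output)
+
+    def encrypt(self, data):
+        # Assumed inverse of decrypt, used for testing only.
+        output = bytearray()
+        for plain in data:
+            output.append(plain ^ self._decrypt_byte())
+            self._update_keys(plain)
+        return bytes(output)
+```
+
+A caller would pass the 12-byte encryption header to decrypt
+first, then continue with the compressed data on the same object
+so that the key state carries over.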
+
+In addition to the above mentioned contributors to PKZIP and PKUNZIP,
+I would like to extend special thanks to Robert Mahoney for suggesting
+the extension .ZIP for this software.
+
+References:
+
+    Fiala, Edward R., and Greene, Daniel H., "Data compression with
+       finite windows",  Communications of the ACM, Volume 32, Number 4,
+       April 1989, pages 490-505.
+
+    Held, Gilbert, "Data Compression, Techniques and Applications,
+       Hardware and Software Considerations", John Wiley & Sons, 1987.
+
+    Huffman, D.A., "A method for the construction of minimum-redundancy
+       codes", Proceedings of the IRE, Volume 40, Number 9, September 1952,
+       pages 1098-1101.
+
+    Nelson, Mark, "LZW Data Compression", Dr. Dobb's Journal, Volume 14,
+       Number 10, October 1989, pages 29-37.
+
+    Nelson, Mark, "The Data Compression Book",  M&T Books, 1991.
+
+    Storer, James A., "Data Compression, Methods and Theory",
+       Computer Science Press, 1988.
+
+    Welch, Terry, "A Technique for High-Performance Data Compression",
+       IEEE Computer, Volume 17, Number 6, June 1984, pages 8-19.
+
+    Ziv, J. and Lempel, A., "A universal algorithm for sequential data
+       compression", IEEE Transactions on Information Theory,
+       Volume 23, Number 3, May 1977, pages 337-343.
+
+    Ziv, J. and Lempel, A., "Compression of individual sequences via
+       variable-rate coding", IEEE Transactions on Information Theory,
+       Volume 24, Number 5, September 1978, pages 530-536.