.zip file format

alanhaggai on 2008-07-06T17:57:43

My study of the .zip file format:

Magic bytes

Magic numbers are common in programs across many operating systems. Magic numbers implement strongly typed data and are a form of in-band signaling to the controlling program that reads the data type(s) at program run-time. Many files have such constants that identify the contained data. Detecting such constants in files is a simple and effective way of distinguishing between many file formats and can yield further run-time information.


That is Wikipedia's definition of magic bytes / numbers.

.zip magic bytes are: 0x50 0x4B at offset 0x00. 0x50 = P, and 0x4B = K in ASCII.

PK stands for Phil Katz, the creator of the .zip file format.
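
A minimal sketch of the magic-byte check in Python, assuming we only want to test the first two bytes for "PK". The name looks_like_zip is made up for this example, and the archive is built in memory with the standard zipfile module so the snippet runs on its own. A match on two bytes is only a hint, not proof that the data is a valid archive.

```python
import io
import zipfile

# The two signature bytes discussed above: 0x50 = 'P', 0x4B = 'K'.
ZIP_MAGIC = bytes([0x50, 0x4B])


def looks_like_zip(data: bytes) -> bool:
    """Hypothetical helper: True if data starts with the 'PK' signature."""
    return data[:2] == ZIP_MAGIC


# Build a tiny archive in memory so the example is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello")

archive_bytes = buf.getvalue()

print(looks_like_zip(archive_bytes))   # data produced by zipfile
print(looks_like_zip(b"plain text"))   # data with no signature
```

For real files, Python's zipfile.is_zipfile() does a more thorough check than this two-byte comparison.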

Metadata

  • Optional archive comment
  • Optional comment per entry
  • All kinds of system-specific data (like file attributes) can be added per file using so-called extra fields
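
The three kinds of metadata listed above can be sketched with Python's standard zipfile module. This is only an illustration: the file name, the comment strings, and the extra-field header ID 0xCAFE are all made up for the example and are not registered or meaningful values. An extra field is a sequence of records, each with a 2-byte ID, a 2-byte length, and then the data, all little-endian.

```python
import io
import struct
import zipfile

# Hypothetical extra-field record: ID 0xCAFE, 4 bytes of payload.
custom_extra = struct.pack("<HH", 0xCAFE, 4) + b"demo"

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    info = zipfile.ZipInfo("notes.txt")
    info.comment = b"per-entry comment"   # optional comment per entry
    info.extra = custom_extra             # extra field for this entry
    zf.writestr(info, "some text")
    zf.comment = b"archive comment"       # optional archive comment

# Read the metadata back from the in-memory archive.
with zipfile.ZipFile(buf) as zf:
    archive_comment = zf.comment
    entry = zf.getinfo("notes.txt")
    entry_comment = entry.comment
    entry_extra = entry.extra

print(archive_comment)
print(entry_comment)
print(entry_extra)
```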


Limitations

  • Internal offset values are 32 bits wide, so only files up to 4 GB can be stored (the later ZIP64 extension lifts this limit)
  • No support for extended character sets in file names
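
The 4 GB figure follows directly from the width of the field: an unsigned 32-bit value tops out at 2^32 - 1. A small sketch, assuming the fields are little-endian unsigned integers as packed by struct's "<I" format; fits_in_uint32 is a name made up for this example.

```python
import struct

# Largest value an unsigned 32-bit field can hold.
MAX_UINT32 = 2**32 - 1


def fits_in_uint32(value: int) -> bool:
    """Hypothetical helper: can value be stored in a 32-bit unsigned field?"""
    try:
        struct.pack("<I", value)  # little-endian unsigned 32-bit
        return True
    except struct.error:
        return False


print(MAX_UINT32)                      # limit in bytes, just under 4 GiB
print(fits_in_uint32(MAX_UINT32))      # the largest representable offset
print(fits_in_uint32(MAX_UINT32 + 1))  # one byte past the limit
```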