A lossless compressed morning
by Danny Willems
While working on the zkVM project for Optimism with O(1) Labs, I noticed that zlib was used to decompress the memory, encoded in base64.
I opened a PR to decode the state.json that Cannon load-elf command produces,
and I was a bit lost in the different library in Rust implementing zlib
(called zeta-lib!). There
are flate2 and libflate.
Oh, wait. What is zlib? What are these libraries?
Ok. Let’s learn about compression. I’ve never been into the theory of it,
neither implement any compression algorithm.
Also, I ran into an error while trying to use flate2, and I did not know what
was happening. And I don’t like when I don’t understand.
Therefore, I woke up the next day (today…), and while I was waiting for my first
coffee, I started YouTube-ing: zlib compression
.
Ok, found Elegant Compression in Text (The LZ 77 Method) - Computerphile.
Nice introduction, better than the sum of all naturals are -1/12… Anyway.
Coffee is now ready.
Let’s get my breakfast.
Let’s watch another one while eating.
How Computers Compress Text: Huffman Coding and Huffman
Trees is the next one.
So, Huffman Coding and Huffman trees. Fascinating! Very smart about the encoding of the letters. Never saw that before. As said in the video, instead of working with single letter, let’s work with words.
Lempel-Ziv 77 algorithm is a good reading, starting with the Wikipedia article
It drives me to the DEFLATE algorithm. Ah, nice! I know where flate
comes from now. Ok.
Flate is sometimes used to call the DEFLATE algorithm.
It seems to be the basics of compression.
Ok, but what about zlib now? zlib is an implementation of the DEFLATE algorithm, and it is used in a lot of different softwares. The wikipedia article mentions a bunch of them.
What about zip, gzip, etc?
- gzip (GNU zip) is a file format, and also the related software implementing it. Good
reading is the wikipedia article. The filename extension is
gz
- zip is another file format. Good reading is the [wikipedia article](https://en.wikipedia.org/wiki/ZIP_(file_format).
After quickly reading this, came the question of archiving formats, like tar. So, tar means Tape Archive. Interesting… Sounds like something that can go on savoir-inutile.
Therefore, for a structured, in-order, first crash course on compression:
- Huffman coding
- LZ-77 & LZ-78
- DEFLATE algorithm and the corresponding RFC 1951.
- zlib, the most used implementation of DEFLATE
- File formats: gzip, zip.
- Archive formats and other compression formats.