Danny Willems -- Work In Progress

A mathematician fighting for privacy and security on the Internet, while dreaming about describing the Universe with equations and symbols.

Research Publications Public Talks Open source software contributions CV Education Blog PGP public key Recommended softwares Contact Proton calendar for cryptography and cybersecurity events
25 October 2023

A lossless compressed morning

by Danny Willems

While working on the zkVM project for Optimism with O(1) Labs, I noticed that zlib was used to decompress the memory, encoded in base64. I opened a PR to decode the state.json that Cannon load-elf command produces, and I was a bit lost in the different library in Rust implementing zlib (called zeta-lib!). There are flate2 and libflate. Oh, wait. What is zlib? What are these libraries? Ok. Let’s learn about compression. I’ve never been into the theory of it, neither implement any compression algorithm. Also, I ran into an error while trying to use flate2, and I did not know what was happening. And I don’t like when I don’t understand.

Therefore, I woke up the next day (today…), and while I was waiting for my first coffee, I started YouTube-ing: zlib compression. Ok, found Elegant Compression in Text (The LZ 77 Method) - Computerphile. Nice introduction, better than the sum of all naturals are -1/12… Anyway. Coffee is now ready. Let’s get my breakfast. Let’s watch another one while eating. How Computers Compress Text: Huffman Coding and Huffman Trees is the next one.

So, Huffman Coding and Huffman trees. Fascinating! Very smart about the encoding of the letters. Never saw that before. As said in the video, instead of working with single letter, let’s work with words.

Lempel-Ziv 77 algorithm is a good reading, starting with the Wikipedia article

It drives me to the DEFLATE algorithm. Ah, nice! I know where flate comes from now. Ok. Flate is sometimes used to call the DEFLATE algorithm.

It seems to be the basics of compression.

Ok, but what about zlib now? zlib is an implementation of the DEFLATE algorithm, and it is used in a lot of different softwares. The wikipedia article mentions a bunch of them.

What about zip, gzip, etc?

After quickly reading this, came the question of archiving formats, like tar. So, tar means Tape Archive. Interesting… Sounds like something that can go on savoir-inutile.

Therefore, for a structured, in-order, first crash course on compression:

tags: compression - lossless - lz77 - huffman coding - huffman trees