Thursday, November 24, 2011

Hashing LZ

I post this here because I’d enjoy a discussion and have the errors of my naive ways pointed out to me; I’ve already promised myself I’ll not be sucked into implementing a proof of concept! Who knows, maybe in here there’s a nugget that will inspire someone else, and that will be nice.

Imagine a hashtable-only-based history for LZ. Imagine it like this:

You pick some order for the key. In this example, I’ll pick order 3.

Think of the stream as being in multiples of this key-size, even though you’re sliding through it one char at a time.

So, for example, our compressor is mid-stream:

input:    ...I had no love of world of work, with its ...
current:                                  ^
blocks:                       ...[of ][wor][k, ][wit]... 


We look up the previous key in the table, [wor]

This will point out to a *list* of historic offsets for the [wor] prefix

We also look up where we are now, [k, ], which is another list of historic offsets

We take these two lists and slide along them, looking for a continuation:

[wor] -> hash lookup -> list of offsets -> 56, 68, 122, 157 …
[k, ] -> hash lookup -> list of offsets -> 35, 125, 220 …

In the sliding-along we spot that [wor]122 and [k, ]125 are adjacent. We know that historicly there was a [work, ] at 122 and the match was 6 long.

Here’s an augmentory idea, that might be useful elsewhere. We’re doing a *greedy* LZ coder. Look up the prefix - [of ][wor] - what is it’s history? There will like this for sure: [of ][wor][ld ]. Now we we *know* that our match is not going to be [ld ] because, if it was, we’d have previously greedily encoded our LZ! We can use the historic offsets to cull candidates because we know that we’re greedy.

Notes

  1. williamedwardscoder posted this

 ↓ click the "share" button below!