My favourite small hash table
Posted by speckx 16 hours ago
Comments
Comment by Aardwolf 14 hours ago
This constraint allows making a linear array of all 4 billion values, with the key as the array index, which fits in 16 GiB. Another 512 MiB is enough to hold one bit per value indicating present or not.
Perhaps text strings as keys and values would give a more interesting example...
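Concretely, a minimal sketch of that layout in C (assuming a 64-bit build; the names direct_init, direct_put and direct_get are made up for illustration):

#include <stdint.h>
#include <stdlib.h>

static uint32_t *values;  /* 2^32 entries x 4 bytes = 16 GiB */
static uint8_t *present;  /* 2^32 bits = 512 MiB, one bit per key */

int direct_init(void) {
    values = malloc((size_t)1 << 34);     /* 16 GiB of virtual memory */
    present = calloc((size_t)1 << 29, 1); /* 512 MiB, zeroed = all absent */
    return (values && present) ? 0 : -1;
}

void direct_put(uint32_t key, uint32_t value) {
    values[key] = value;
    present[key >> 3] |= (uint8_t)(1u << (key & 7));
}

/* Returns 1 and fills *out if key is present, 0 otherwise. */
int direct_get(uint32_t key, uint32_t *out) {
    if (!(present[key >> 3] & (1u << (key & 7))))
        return 0;
    *out = values[key];
    return 1;
}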
Comment by re 9 hours ago
The hash table has the significant advantage of having a much smaller minimum size.
> Perhaps text strings as keys and values would give a more interesting example
Keep reading to the section "If keys and values are larger than 32 bits".
Comment by kazinator 6 hours ago
If you actually only have a handful of entries in the table, it is measurable in bytes.
A linear array of all 4 billion possible values will occupy 16 GiB (of virtual memory) upfront. We have then essentially replaced the hash table with a radix tree: the one made up of the page directories and tables of the VM system. If only the highest and lowest values are present in the table, then only the highest and lowest pages (4 kB or whatever) need to be mapped. It's not very compact for small sets, though; storing N numbers at random locations could require as many as N virtual memory pages to be committed.
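On Linux that reservation can be made explicit; a sketch (assuming mmap with MAP_NORESERVE, illustrative only):

#include <stdint.h>
#include <sys/mman.h>

/* Reserve 16 GiB of address space without committing physical pages;
 * pages are faulted in only when a slot is first touched. */
static uint32_t *reserve_values(void) {
    void *p = mmap(NULL, (size_t)1 << 34, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return (p == MAP_FAILED) ? NULL : (uint32_t *)p;
}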
Comment by artur44 14 hours ago
It’s also a good reminder that clarity of layout often beats more “clever” designs, especially when the dataset fits comfortably in memory.
Comment by hinkley 11 hours ago
My best-documented case was a 10x speedup from removing a double lookup that was killing caches.
Comment by hinkley 8 hours ago
If you had asked me to bet on which one would have the bigger impact, I would have split the difference.
My second favorite was similar. Two functions making a call instead of sharing the answer. Profiler said 10% cumulative. I removed half. Instead of 5% I got 20%. Which just demonstrates how much data a profiler cannot show you.
Comment by hinkley 8 hours ago
Profilers lie, and some lie more than others. I've gotten 3x from code with a perfectly rectangular profile output.
Part of it comes down to a trick game devs steal from finance: give each task a budget and hold it to that budget even if it's not the tall tent pole.
You should not spend 10% of your response time on telemetry and logging combined. Yet I pulled 10% of TTFB out of just the logging and telemetry code on one project. It was a frog-boiling situation: every new epic used the new code, and determining the cumulative cost wasn't easy.
Comment by jasonwatkinspdx 6 hours ago
You can use more elaborate probe and relocation schemes, but just choosing the less full bucket and resizing if both choices are full gets you surprisingly far.
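A hedged sketch of that insert path (fixed-capacity buckets and a power-of-two bucket count assumed; the bucket size and hash mixer here are made up):

#include <stdint.h>

#define BUCKET_CAP 8

struct bucket {
    uint32_t len;
    uint32_t keys[BUCKET_CAP];
    uint32_t vals[BUCKET_CAP];
};

static uint32_t mix(uint32_t x, uint32_t seed) {
    x ^= seed;
    x *= 0x9e3779b1u;  /* arbitrary multiplicative mixer */
    x ^= x >> 16;
    return x;
}

/* Insert into the less full of the two candidate buckets.
 * Returns 0 on success, -1 if both are full (caller resizes). */
int insert2(struct bucket *tab, uint32_t nbuckets, uint32_t key, uint32_t val) {
    struct bucket *a = &tab[mix(key, 0x1234u) & (nbuckets - 1)];
    struct bucket *b = &tab[mix(key, 0xabcdu) & (nbuckets - 1)];
    struct bucket *dst = (a->len <= b->len) ? a : b;
    if (dst->len == BUCKET_CAP)
        return -1;  /* the emptier bucket is full, so both are */
    dst->keys[dst->len] = key;
    dst->vals[dst->len] = val;
    dst->len++;
    return 0;
}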
Comment by saltcured 11 hours ago
I've nearly always had a variable-length string or other complex structure that was being hashed, not its handle.
Back in my early career in C, this would be a generic API to hash and store void pointers, but the pointers themselves were not what was being hashed. The domain-specific hash function needed to downcast and perform the appropriate remote memory access to fetch the variable-length material that was actually being hashed.
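An illustrative reconstruction of that pattern (not the original API): the table stores opaque handles, and a caller-supplied callback casts a handle back and hashes the variable-length bytes it refers to (FNV-1a here, chosen arbitrarily).

#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*hash_fn)(const void *handle);  /* supplied per domain */

struct record {     /* example domain type behind the void pointer */
    size_t len;
    char text[];    /* the variable-length material actually hashed */
};

static uint32_t record_hash(const void *handle) {
    const struct record *r = handle;  /* the "downcast" */
    uint32_t h = 2166136261u;         /* 32-bit FNV-1a */
    for (size_t i = 0; i < r->len; i++) {
        h ^= (uint8_t)r->text[i];
        h *= 16777619u;
    }
    return h;
}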
Comment by attractivechaos 5 hours ago
> To meet property (3) [if the key 0 is present, its value is not 0] ... "array index plus one" can be stored rather than "array index".
If the hash code can take any value in [0, 2^32), how do you define a special value for empty buckets? The more common solution is to have a special key, not a special hash code, for empty slots, which is easier to achieve. In addition, as the author points out, supporting generic keys requires storing the 32-bit hash values. With the extra 4 bytes per bucket, it is not clear that this implementation is better than plain linear probing (my favorite). The fastest hash table implementations, like boost's and abseil's, tend not to use Robin Hood hashing.
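For contrast, a minimal sketch of the special-key convention with plain linear probing (reserving UINT32_MAX as the empty marker is an arbitrary choice for illustration):

#include <stdint.h>

#define EMPTY_KEY UINT32_MAX  /* this one key can never be stored */

struct lp_slot {
    uint32_t key;
    uint32_t value;
};

/* Returns the slot holding `key`, or the empty slot where it would go.
 * Assumes nslots is a power of two and the table is never completely full. */
static struct lp_slot *lp_find(struct lp_slot *slots, uint32_t nslots, uint32_t key) {
    uint32_t i = (key * 0x9e3779b1u) & (nslots - 1);  /* arbitrary mixer */
    while (slots[i].key != key && slots[i].key != EMPTY_KEY)
        i = (i + 1) & (nslots - 1);
    return &slots[i];
}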
Comment by Joker_vD 6 hours ago
Yay, we've got an equivalent of the SIB byte, but as three (six?) separate opcodes. Well, sub-opcodes.
It's a shame, though, that the Zcmp extension didn't make it into RVA23 even as an optional extension.
Comment by judofyr 14 hours ago
struct slot {
    uint32_t key;
    uint32_t value;
};
Comment by nitnelave 14 hours ago
You could work around that with a union or casts with explicit alignment constraints, but this is the shortest way to express that.
Comment by Asooka 12 hours ago
union slot {
    uint64_t keyvalue;
    struct {
        uint64_t key : 32;
        uint64_t value : 32;
    };
};
Since both members of the union are effectively the exact same type, there is no issue. C99: "If the member used to access the contents of a union is not the same as the member last used to store a value, the object representation of the value that was stored is reinterpreted as an object representation of the new type". Meaning, you can initialise keyvalue and that will initialise both key and value, so writing "union slot s = {0}" initialises everything to 0.
One issue is that the exact layout of bit fields is implementation-defined, so if you absolutely need to know where key and value are in memory, you will have to read GCC's manual (or just experiment). Another is that you cannot take the address of key or value individually, but if your code was already using uint64_t, you probably don't need to.
Edit: Note also that you can cast a pointer to slot to a pointer to uint64_t, and that does not break strict aliasing rules.
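A small self-contained usage sketch of the above (note the anonymous struct member is C11, or a widely supported extension in earlier modes):

#include <stdint.h>

union slot {
    uint64_t keyvalue;
    struct {
        uint64_t key : 32;
        uint64_t value : 32;
    };
};

int main(void) {
    union slot s = {0};            /* zero-initialises the whole slot */
    s.key = 42;
    s.value = 7;
    uint64_t *word = &s.keyvalue;  /* or cast &s, per the aliasing note */
    return (*word == 0);           /* slot is non-zero, so returns 0 */
}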