ULID: Universally Unique Lexicographically Sortable Identifier
Posted by der_gopher 8 days ago
Comments
Comment by rdtsc 12 hours ago
Yeah, I would go with UUID v7 at this point given that it's part of the UUID RFC https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-vers...
Comment by codys 8 hours ago
Comment by sedatk 12 hours ago
It's best to stick to UUIDv7 because of such quirks of ULID.
Comment by cpburns2009 11 hours ago
Can you expand on how this can actually cause a problem? My understanding is different processes and hosts should never conflict because of the 80 bits of random data. The only way I can conceive of a conflict is multiple threads using the same non-thread-safe generator during the same millisecond.
Comment by sedatk 11 hours ago
Comment by 0x457 10 hours ago
I've switched to using UUIDv7 tho. It made sense to use ULID before v7, but now ULID only has one thing going on - smaller string representation. That doesn't matter if your storage can store UUIDs natively (i.e. as 128 bit integer)
If your goal is to have global order intact, then neither ULID nor UUIDv7 is going to work for you.
Comment by sedatk 9 hours ago
Yes, and that's when sequences are only used. I guess that's to avoid hogging the CPU or emptying the OS entropy pool during high loads.
However, that "optimization" is a failure mode if you're not aware how ULID internals work. It's easy to shoot yourself in the foot by blindly trusting ULID will always generate a unique ID across threads without blocking your thread. That's a sneaky footgun.
> That generally only matters when you create a batch of IDs at once
No, any web service instance can receive requests at arbitrary times, and sometimes in the same millisecond zone. The probability is proportional to the number of concurrent users and requests.
> If your goal is to have global order intact, then neither ULID nor UUIDv7 is going to work for you.
Agreed.
Comment by jasonwatkinspdx 9 hours ago
Just a heads up that's not really a thing. If the CSPRNG is initialized correctly you're done. There's nothing being depleted. I know for ages the linux docs said different, they were just wrong and a maintainer was keeping a weird little fiefdom over it.
Comment by sedatk 8 hours ago
Comment by vbezhenar 10 hours ago
I've implemented this thing, though not called it ULID. I've dedicated some bits for timestamp, some bits for counter within millisecond and rest for randomness. So they always ordered and always unpredictable.
Another approach is to keep latest generated UUID and if new UUID requested within the same timestamp - generate random part until it's greater than previous one. I think that's pretty good approach as well.
Comment by sedatk 9 hours ago
It's literally incrementing it by one:
https://github.com/ulid/javascript/blob/11c2067821ee19e4dc78...
https://github.com/ulid/javascript/blob/11c2067821ee19e4dc78...
Comment by vbezhenar 4 hours ago
But that's easy to fix, so just implementation quirk for this particular library, the idea is sound.
Comment by sedatk 2 hours ago
It's in ULID spec.
Comment by jasonwatkinspdx 9 hours ago
It is and it does.
Also the ULID spec suggests you use a CSPRNG, but doesn't mandate that or provide specific advice on appropriate algorithms. So in practice people may reach for whatever hash function is convenient in their project, which may just be FNV or similar with considerably weaker randomness too.
Comment by cpburns2009 10 hours ago
Comment by sedatk 9 hours ago
Comment by unscaled 8 hours ago
But I don't think UUIDv7 solves the issue by "having less quirks". Just like you'd have to be careful to use the non-monotonic version of ULID, you'd have to be careful to use the right version of UUID. You also have to hope that all of your UUID consumers (which would almost invariably try to parse or validate the UUID, even if they do nothing with it) support UUIDv7 or don't throw on an unknown version.
Comment by sedatk 7 hours ago
I agree that picking UUID variant requires caution, but when someone has already picked ULID, UUIDv7 is easily a superior alternative.
Comment by skeledrew 7 hours ago
Comment by marifjeren 10 hours ago
Are monotonic/sequential ULIDs as easily enumerated as integers? It's the ease of enumerability that keeps a lot of folks away from using sequential integers as IDs
Comment by sedatk 9 hours ago
Comment by marifjeren 9 hours ago
Oh and yeah, I guess I do think lots of script / AI kiddies would be discouraged by, or fail to see an opportunity when presented with, something that does not look like the numbers they saw in school.
Comment by N_Lens 9 hours ago
Comment by sedatk 9 hours ago
Comment by N_Lens 9 hours ago
Comment by sedatk 8 hours ago
Either it’s collision-prone or locking. Both are problematic in their own way.
It’s footguns all over while UUIDv7 simply exists.
Comment by N_Lens 2 hours ago
Comment by listenallyall 11 hours ago
Comment by jasonwatkinspdx 10 hours ago
I think this "sort of monotonic but not really" is the worst of both to be honest. It tempts you to treat it like an invariant when it isn't.
If you want monotonicity with independent generation, just use a composite key that's a lamport clock and a random nonce. Or if you want to be even more snazzy use Hybrid Logical Clocks or similar.
Comment by Dylan16807 9 hours ago
But the chance of the initial random positions being near each other is very very low.
If you pick a billion random numbers in an 80 bit space, the chance you have a collision is one in a million. (2^80 / (2^30)^2)
If you pick a thousand random starting points and generate a million sequential numbers each, the chance your starting points are sufficiently close to each other to cause an overlap is one in a trillion. ((2^80 / 2^20) / (2^10)^2)
In that one in a trillion case, you'll likely end up with half a million collisions, which might matter to you. But if you care about 0 collisions versus 1+ collisions, pick the monotonic version.
Comment by jasonwatkinspdx 9 hours ago
Or if what you want is monotonic distributed timestamps, again, HLC is how you do that properly.
So you're embracing this weird limitations for no real benefit.
And as you can see in the rest of this comment thread, a lot of people simply do not even know this behavior and are assuming the 80 bit portion is always random. Which is my whole point about having a not really an invariant invariant just being a bad way to go fundamentally.
Edit: just to reply to the below since I can't do so directly, I understand the arithmetic here. What I'm saying is there's zero reason to choose this weird scheme vs something that's just the full birthday bound and you never think about it again.
As another comment points out: just consider neighbor guessing attacks. This 80 bit random but not random field is a foot gun to anyone that just assumes they can treat it as truly random.
Comment by Dylan16807 8 hours ago
> Or if what you want is monotonic distributed timestamps, again, HLC is how you do that properly.
Why not just 64 bit timestamps stapled to a random number? You can be collision proof and monotonic without doing anything fancy.
> this weird scheme vs something that's just the full birthday bound and you never think about it again
But the weird scheme gives you better odds than the birthday bound.
Comment by listenallyall 9 hours ago
Comment by listenallyall 9 hours ago
I concede I'm no mathematician and I could be wrong here, but your analysis feels similar to assuming 10-11-12-13-14-15 is less likely to be a winning lottery ticket because the odds against consecutive numbers are so massive.
Comment by jasonwatkinspdx 9 hours ago
My basic point is the probability of collision is lower than the birthday bound, there's no need for this, and as comments in this thread make clear people are not understanding this limitation even exists with the specification.
Comment by listenallyall 8 hours ago
Ok then, make it easy - your requirement is to independently pick 4 numbers from the range 0 to 9, without resulting in any duplicates. Which is more likely to be successful:
- pick 4 random digits independently
- pick a random digit, which will be appended by the next digit as pick #2 (i.e. if you pick 5, then 6 will automatically be your second digit, if you pick 9, 0 will be your second digit). Then pick once more on the same terms.
The math here is easy: scenario 1 you have 0.9 x 0.8 x 0.7 = 0.504 likelihood of success. Scenario 2 it's simply 0.7.
Comment by sedatk 11 hours ago
Comment by nighthawk454 13 hours ago
> Why not use UUID7?
> "ULID is much older than UUID v7 though and looks nicer"
For those unfamiliar, UUIDv7 has pretty much the same properties – sortable, has timestamp, etc.
ULID: 01ARZ3NDEKTSV4RRFFQ69G5FAV
UUIDv7: 019b04ff-09e3-7abe-907f-d67ef9384f4f
Comment by wood_spirit 12 hours ago
Comment by nvader 12 hours ago
Comment by ChymeraXYZ 12 hours ago
Comment by unscaled 8 hours ago
Comment by andy_ppp 12 hours ago
Comment by elias1233 10 hours ago
Comment by ivan_gammel 11 minutes ago
Comment by verandaguy 7 hours ago
Comment by wafflestomp 5 hours ago
Comment by sblom 13 hours ago
Comment by jalk 12 hours ago
UUIDv7 has 62 bits of random data, ULID uses 80 bits, so if anything ULID is "stronger" (meaning less chances of generating the same id within the same millisecond)
Comment by 0x457 10 hours ago
Comment by swyx 7 hours ago
for those also learning
Comment by catlover76 8 days ago
Comment by pklausler 11 hours ago
Comment by N_Lens 9 hours ago