Distributed ID Formats Are Architectural Commitments, Not Just Data Types

Posted by mnahkies 3 days ago


Comments

Comment by mrkeen 9 minutes ago

> The old auto-increment IDs were totally fine—until suddenly they weren’t, because multiple shards couldn’t share the same global counter anymore.

> Their workaround was simple and surprisingly effective: they offset new IDs by a huge constant—roughly a billion. Old IDs stayed below the threshold, new IDs lived above it, and nothing collided. It worked surprisingly well, but it also taught me something.

So what was the fix? The new numbers are bigger? I need a little more detail.
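For what it's worth, here is one possible reading of the quoted workaround, as a minimal sketch. The quote only gives the offset (roughly a billion) and the old/new split. The shard interleaving, the shard count, and every name below are assumptions added for illustration, not something the article states.

```python
# Sketch of an "offset" ID scheme. Only the threshold idea comes from the
# quoted text; the interleaving across shards is a hypothetical detail.
LEGACY_THRESHOLD = 1_000_000_000  # "roughly a billion", per the quote
NUM_SHARDS = 4                    # assumed shard count


def is_legacy(id_: int) -> bool:
    """Old auto-increment IDs stay below the threshold."""
    return id_ < LEGACY_THRESHOLD


def new_id(shard: int, local_seq: int) -> int:
    """Build a post-migration ID from a shard number and a per-shard counter.

    Each shard only needs its own local counter; interleaving by shard
    number keeps IDs from different shards from colliding.
    """
    if not 0 <= shard < NUM_SHARDS:
        raise ValueError("shard out of range")
    if local_seq < 0:
        raise ValueError("local_seq must be non-negative")
    return LEGACY_THRESHOLD + local_seq * NUM_SHARDS + shard
```

Under these assumptions the bigger numbers are only a marker for which generator produced the ID. What removes the shared global counter is the per-shard sequence.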

> If your system is running on a single database with moderate traffic, auto-increment is still probably the best answer. Don’t overthink it.

If autoincrement is the simplest way to do things, but breaks if you evolve the system in any conceivable way, maybe autoincrement isn't the simplest way to do things.

Isn't that the point of the article?

Comment by alwa 31 minutes ago

> Clock drift is the other issue. If system time moves backward, ULID ordering breaks. I’ve seen this create really confusing bugs where yesterday’s events suddenly sort after today’s in an analytics pipeline.

> It still depends on system clocks like every other timestamp-based format, so clock drift affects it the same way.

…your systems’ clocks drift by days, this motivates you to homebrew a distributed ID format, and your distributed ID format is susceptible to the same problems?

There must be a reason you can’t use NTP or GPS (or, you know, GLONASS or whatever)… but you’re sure a new ID format is the solution?
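The failure mode in the quoted passage is easy to reproduce with any timestamp-prefixed ID. The sketch below uses a simplified layout (48-bit millisecond timestamp plus 80 random bits, hex-encoded rather than ULID's Crockford base32). `make_id` and `ClampedClock` are hypothetical names. Clamping is shown as one common mitigation, not as what the article's author did.

```python
import os


def make_id(ts_ms: int) -> str:
    """Timestamp-prefixed ID: 12 hex chars of time, then 20 hex chars of randomness."""
    return f"{ts_ms:012x}{os.urandom(10).hex()}"


class ClampedClock:
    """Never returns a timestamp earlier than one it has already returned."""

    def __init__(self) -> None:
        self.last_ms = 0

    def now(self, wall_ms: int) -> int:
        # If the wall clock stepped backward, reuse the last value seen.
        self.last_ms = max(self.last_ms, wall_ms)
        return self.last_ms


today_ms = 1_700_000_000_000
stepped_back_ms = today_ms - 86_400_000  # wall clock jumps back one day

# Naive generation: the later event gets the smaller ID.
first = make_id(today_ms)
second = make_id(stepped_back_ms)
naive_in_order = first < second

# Clamped generation: the timestamp prefix never decreases.
clock = ClampedClock()
a = make_id(clock.now(today_ms))
b = make_id(clock.now(stepped_back_ms))
clamped_prefix_in_order = a[:12] <= b[:12]
```

Clamping only keeps the timestamp prefix non-decreasing on a single generator. IDs that share a millisecond still sort by their random bits, and clocks on different machines can still disagree, so this does not replace keeping the clocks in sync in the first place.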

Comment by CGamesPlay 1 hour ago

The checksum idea is interesting, but why make it a tack-on at the end? Taking 20 random bits to use for a mandatory checksum seems like an interesting trade-off.

Comment by theoli 2 hours ago

Shifting the epoch of a 48-bit timestamp that already has >12,000 years of range, just to gain another 50 years, is an amusing choice.
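A quick back-of-the-envelope check of that range, assuming millisecond ticks as in ULID (the tick size is an assumption here, and the exact figure depends on it). With millisecond ticks, 48 bits cover roughly 8,900 years; a coarser tick would give more.

```python
TIMESTAMP_BITS = 48
# Average Gregorian year in milliseconds (assumes millisecond ticks).
MS_PER_YEAR = 365.2425 * 24 * 60 * 60 * 1000


def range_years(bits: int = TIMESTAMP_BITS) -> float:
    """Years representable by a `bits`-wide millisecond counter."""
    return 2**bits / MS_PER_YEAR


def rollover_year(epoch_year: int, bits: int = TIMESTAMP_BITS) -> int:
    """Approximate calendar year in which the counter overflows."""
    return epoch_year + int(range_years(bits))


unix_epoch_rollover = rollover_year(1970)
shifted_epoch_rollover = rollover_year(2020)  # hypothetical 50-year shift
gain_years = shifted_epoch_rollover - unix_epoch_rollover
```

Either way, moving the epoch forward by 50 years moves the overflow date by the same 50 years, which is negligible against a range measured in millennia.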

Comment by frutiger 3 hours ago

> ID formats aren’t just formats. They’re commitments.

Reading direct LLM output is highly cringeworthy.