Show HN: Walrus – a Kafka alternative written in Rust
Posted by janicerk 8 days ago
Comments
Comment by Barathkanna 5 days ago
Comment by alexmorley 5 days ago
Comment by lionkor 6 days ago
We called it `tuberculosis`, or `tube` for short; of course, that is what killed Kafka.
Comment by kylecazar 5 days ago
Assuming topics are consumed in your version, a la Kafka.
Comment by sgt 6 days ago
Comment by dbacar 4 days ago
Comment by enether 5 days ago
It’s nice to try to out-innovate Kafka, but I fear the network effect can’t be beaten unless the alternative is 10x better.
Something like WarpStream’s architecture[2] had a shot at dethroning Kafka, but critically, even they adopted the Kafka API. Sure enough, Apache Kafka introduced a competing feature[3] within two years of WarpStream’s launch too.
[1] - https://github.com/tansu-io/tansu [2] - https://www.warpstream.com/ [3] - https://topicpartition.io/blog/kip-1150-diskless-topics-in-a...
Comment by roncohen 6 days ago
Wasn't immediately clear to me if the data-plane level replication also happens through Raft or something home-rolled? Getting consistency and reliability right with something home-rolled is challenging.
Notes:
- Would love to see it in an S3-backed mode, either entirely diskless like WarpStream or as tiered storage.
- Love the simplified API. If possible, adding a Kafka compatible API interface is probably worth it to connect to the broader ecosystem.
Best of luck!
Comment by seanhunter 6 days ago
"It provides fault-tolerant streaming with automatic leadership rotation, segment-based partitioning, and Raft consensus for metadata coordination."
So I guess that's a "yes" to Raft?
Comment by zbentley 6 days ago
Comment by EdwardDiego 6 days ago
Comment by nubskr 6 days ago
Also, about the Kafka API: I tried implementing it earlier with a sort of `translation` layer, but that gets pretty complicated to maintain because Kafka is offset-based, while Walrus is message-based.
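To illustrate why such a layer is awkward, here is a minimal sketch. It assumes that "message-based" means the log only hands out the next unread entry and exposes no numeric position; that reading, and the names `MessageLog` and `OffsetAdapter`, are hypothetical and not taken from Walrus. The adapter has to invent offsets and retain every message it has seen so that a Kafka-style "fetch from offset N" can be answered.

```python
from collections import deque


class MessageLog:
    """Hypothetical message-based topic: each read returns the next unread entry."""

    def __init__(self, messages):
        self._pending = deque(messages)

    def read_next(self):
        # Returns None when nothing is left; there is no way to ask for entry N.
        return self._pending.popleft() if self._pending else None


class OffsetAdapter:
    """Hypothetical translation layer that fakes Kafka-style offsets."""

    def __init__(self, log):
        self._log = log
        self._seen = []  # position in this list == synthetic offset

    def fetch(self, offset, max_records=10):
        # Pull from the underlying log until the requested range is cached.
        while len(self._seen) < offset + max_records:
            msg = self._log.read_next()
            if msg is None:
                break
            self._seen.append(msg)
        # Return (offset, message) pairs, as an offset-based client expects.
        return list(enumerate(self._seen))[offset:offset + max_records]


adapter = OffsetAdapter(MessageLog([b"a", b"b", b"c", b"d"]))
first = adapter.fetch(0, max_records=2)
replay = adapter.fetch(1, max_records=2)  # re-reading only works via the cache
```

The cache in `_seen` is the maintenance burden: in a real system it would need to be durable, bounded, and consistent across brokers, which is roughly the complexity the comment alludes to.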
Comment by EdwardDiego 6 days ago
Comment by zellyn 5 days ago
tl;dr they write to S3 once every 250 ms to save costs. IIRC, they contend that when you keep things organized by writing to different files for each topic, it's the Linux disk cache being clever that turns the tangle of disk block arrangement into a clean view per file. They wrote their own version of that, so they can cheaply checkpoint heavily interleaved chunks of data while their in-memory cache provides a clean per-topic view. I think maybe they clean up later async, but my memory fails me.
I don't know how BufStream works.
The thing that really stuck with me from that interview is the 10x cost reduction you can get if you're willing and able to tolerate higher latency and increased complexity and use S3. Apparently they implemented that inside Datadog ("Labrador" I think?), and then did it again with WarpStream.
I highly recommend the whole episode (and the whole podcast, really).
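The batching idea recalled above can be sketched roughly as follows. This is not WarpStream's implementation: a plain dict stands in for the object store, and `BatchingWriter`, `read_topic`, and the segment naming are hypothetical. Only the 250 ms interval comes from the comment. Records from all topics are buffered in arrival order and written as one interleaved object, with a small index that lets a reader recover a clean per-topic view.

```python
FLUSH_INTERVAL_S = 0.250  # the 250 ms figure mentioned in the comment


class BatchingWriter:
    def __init__(self, store):
        self.store = store      # dict standing in for an object store (assumption)
        self.buffer = []        # (topic, payload) pairs in arrival order
        self.last_flush = 0.0
        self.next_id = 0

    def append(self, topic, payload, now):
        self.buffer.append((topic, payload))
        if now - self.last_flush >= FLUSH_INTERVAL_S:
            self.flush(now)

    def flush(self, now):
        self.last_flush = now
        if not self.buffer:
            return
        blob = bytearray()
        index = {}  # topic -> list of (start, length) byte ranges in the blob
        for topic, payload in self.buffer:
            index.setdefault(topic, []).append((len(blob), len(payload)))
            blob += payload
        # One write for every topic in this interval, instead of one per topic.
        self.store[f"segment-{self.next_id:06d}"] = (bytes(blob), index)
        self.next_id += 1
        self.buffer.clear()


def read_topic(store, topic):
    """Rebuild the per-topic view from the interleaved segments."""
    out = []
    for key in sorted(store):
        blob, index = store[key]
        for start, length in index.get(topic, []):
            out.append(blob[start:start + length])
    return out


store = {}
writer = BatchingWriter(store)
writer.append("orders", b"o1", now=0.00)
writer.append("clicks", b"c1", now=0.10)
writer.append("orders", b"o2", now=0.26)  # interval elapsed, triggers a flush
writer.append("clicks", b"c2", now=0.30)
writer.flush(now=0.51)
```

Four records across two topics end up in two object writes, which is where the cost saving would come from; the trade-off is the added latency before a record becomes durable.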
Comment by EdwardDiego 2 days ago
Comment by k_bx 6 days ago
Never tried it, but looks promising
Comment by spetz 5 days ago
Comment by tormeh 6 days ago
Comment by carverauto 5 days ago
Comment by yencabulator 18 hours ago
Comment by teleforce 6 days ago
Redpanda claims better performance, but benchmarks showed no clear winner [3].
It will be interesting to test them together on the performance benchmarks.
I have a feeling the difference is not down to the implementation language: Scala/Java (Kafka), C++ (Redpanda), or Rust (Walrus).
It's the very architecture of Kafka itself, due to the notorious head-of-line blocking problem (check the topmost comments in [4]).
[1] Redpanda – A Kafka-compatible streaming platform for mission-critical workloads (120 comments):
https://news.ycombinator.com/item?id=25075739
[2] Redpanda website:
[3] Kafka vs. Redpanda performance – do the claims add up? (141 comments):
https://news.ycombinator.com/item?id=35949771
[4] What If We Could Rebuild Kafka from Scratch? (220 comments):
Comment by nubskr 6 days ago
Comment by chaotic-good 5 days ago
Comment by EdwardDiego 6 days ago
Except a consumer can discard an unprocessable record? I'm not certain I understand how HOL applies to Kafka, but keen to learn more :)
Comment by thinkharderdev 5 days ago
It's not the unprocessable records that are the problem; it's the records that are very slow to process (for whatever reason).
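A toy model of that effect, assuming a single consumer that handles one partition strictly in order (the function name and the numbers are made up for illustration): one slow record delays every record queued behind it, even though those records are cheap to process.

```python
def completion_times(processing_times):
    """Finish time of each record when one consumer handles a partition in order."""
    clock = 0.0
    finished = []
    for cost in processing_times:
        clock += cost          # the next record cannot start until this one ends
        finished.append(clock)
    return finished


fast = completion_times([1, 1, 1, 1])
blocked = completion_times([1, 30, 1, 1])  # second record is slow
```

In the second run the last two records each need only one time unit of work but finish at 32 and 33, because they sit behind the slow record in the same partition.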
Comment by pixelpoet 5 days ago
Comment by selkin 4 days ago
Comment by pixelpoet 4 days ago
Comment by selkin 4 days ago
Comment by pixelpoet 4 days ago
There's also Fil-C BTW, and in normal C++ there are GCC or Clang (I forget which) extensions for detecting threading issues. Even good old Valgrind is under-appreciated and under-used. In general one wants to adopt best practices and be proactive, rather than relying on the language to solve all problems (of course).
Comment by throw10920 5 days ago
Comment by gethly 5 days ago
Comment by mrkeen 5 days ago
It's popular because it didn't have any competition while it built up its ecosystem. And even though there are competitors now, I haven't had time to check them out, and they still brand themselves as "Kafka alternatives".
Comment by sumtechguy 5 days ago
Comment by lucyjojo 4 days ago
Comment by optician_owl 3 days ago
- Let's create a new Kafka in rust. Yeah!
- Let's create a Kafka client that's ready to use. Blah.
Comment by WD-42 5 days ago
Comment by oulipo2 6 days ago
Comment by Zambyte 5 days ago
Comment by Natfan 5 days ago
like how postgrest works with postgres
Comment by ertucetin 5 days ago
Comment by YouAreWRONGtoo 5 days ago
Comment by throwfaraway135 5 days ago
Comment by deeznuttynutz 4 days ago
Comment by fareesh 5 days ago
Comment by ulttlbtch 5 days ago
Comment by hexo 5 days ago
Comment by arschficknigger 6 days ago