Replacing Protobuf with Rust
Posted by whiteros_e 1 day ago
Comments
Comment by GuB-42 1 day ago
The problem: they have software written in Rust, and they need to use the libpg_query library, which is written in C. Because they can't use the C library directly, they had to use a Rust-to-C binding library that uses Protobuf for portability reasons. The problem is that it is slow.
So what they did is write their own non-portable but much more optimized Rust-to-C bindings, with the help of an LLM.
But had they written their software in C, they wouldn't have needed to do any conversion at all. That means they could have titled the article "How we lowered the performance penalty of using Rust".
I don't know much about Rust or libpg_query, but they probably could have gone even faster by getting rid of the conversion entirely. It would most likely have involved major adaptations and some unsafe Rust though. Writing a converter has many advantages: portability, convenience, security, etc... but it has a cost, and ultimately, I think it is a big reason why computers are so fast and apps are so slow. Our machines keep copying, converting, serializing and deserializing things.
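For a sense of what "getting rid of the conversion entirely" could look like, here is a rough sketch of calling a C parser directly and reading the struct it returns, with no serialization step in between. The C names (c_parse, c_free_result) are invented for illustration, not libpg_query's actual API:

    use std::ffi::CStr;
    use std::os::raw::c_char;

    #[repr(C)]
    struct CParseResult {
        tree: *const c_char, // owned by the C library
        error_code: i32,
    }

    extern "C" {
        fn c_parse(query: *const c_char) -> CParseResult;
        fn c_free_result(result: CParseResult);
    }

    fn parse(query: &CStr) -> Result<String, i32> {
        unsafe {
            let result = c_parse(query.as_ptr());
            if result.error_code != 0 {
                let code = result.error_code;
                c_free_result(result);
                return Err(code);
            }
            // One copy into a Rust String; a Protobuf round trip would add an
            // encode and a decode on top of this.
            let tree = CStr::from_ptr(result.tree).to_string_lossy().into_owned();
            c_free_result(result);
            Ok(tree)
        }
    }

The unsafe block is the price: the compiler can no longer check that the C side keeps its promises about pointer validity and ownership.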
Note: I have nothing against what they did, quite the opposite, I always appreciate those who care about performance, and what they did is reasonable and effective, good job!
Comment by Aurornis 23 hours ago
Rust didn't slow them down. The inefficient design of the external library did.
Calling into C libraries from Rust is extremely easy. It takes some work to create a safer wrapper around C libraries, but it's been done for many popular libraries.
This is the first and only time I've seen an external library connected via a Rube Goldberg-like contraption with protobufs in the middle. That's the problem.
Sadly they went with the "rewrite to Rust" meme in the headline for more clickability.
Comment by GuB-42 22 hours ago
Calling the C function is not the problem here. It is dealing with the big data structure this function returns in a Rust-friendly manner.
This is something Protobuf does very well, at the cost of performance.
Comment by wizzwizz4 20 hours ago
Comment by driftwood4537 21 hours ago
Would love to hear if I could've come up with a better design.
Comment by phkahler 1 day ago
That's not really fair. The library was doing serialization/deserialization, which was a poor design choice from a performance perspective. They just made a more sane API that doesn't do all that extra work. It might best be titled "Replacing Protobuf with a normal API to go 5 times faster."
BTW what makes you think writing their end in C would yield even higher performance?
Comment by GuB-42 1 day ago
C is not inherently faster, you are right about that.
But what I understand is that the library they use works with data structures that are designed to be used in a C-like language, and are presumably full of raw pointers. These are not ideal to work with in Rust, so, presumably, they wrote their own data model in Rust fashion, which means that they now need to make a conversion, which is obviously slower than doing nothing.
They probably could have worked with the C structures directly, resulting in code that could be as fast as C, but that wouldn't make for great Rust code. In the end, they chose the compromise of speeding up conversion.
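Roughly, the conversion step looks like this (all types and names invented here, not libpg_query's real ones): every node of the C parse tree gets copied into an owned, idiomatic Rust value.

    use std::ffi::CStr;
    use std::os::raw::c_char;

    // C-style node: raw pointers, NUL-terminated strings, manual lengths.
    #[repr(C)]
    struct CNode {
        name: *const c_char,
        child_count: usize,
        children: *const CNode,
    }

    // Rust-style model: owned strings and vectors, no lifetimes to juggle.
    struct Node {
        name: String,
        children: Vec<Node>,
    }

    // Safety: the caller must guarantee `node` and everything it points to is valid.
    unsafe fn convert(node: &CNode) -> Node {
        // Allocates a String and a Vec per node; working with the C structures
        // directly would skip all of this, at the cost of unsafe code everywhere.
        let name = CStr::from_ptr(node.name).to_string_lossy().into_owned();
        let mut children = Vec::with_capacity(node.child_count);
        for i in 0..node.child_count {
            children.push(convert(&*node.children.add(i)));
        }
        Node { name, children }
    }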
Also, the use of Protobuf may be a poor choice from a performance perspective, but it is a good choice for portability: it lets them support plenty of languages more cheaply, and Rust was just one among others. The PgDog team gave Rust and their specific application special treatment.
Comment by timschmidt 19 hours ago
One would think. But caches have grown so large, while memory speed and latency haven't scaled with compute, so as long as the conversion fits in the cache and operates on data already in the cache from previous operations (which admittedly takes some care), there's often an embarrassing amount of compute sitting idle waiting for the next response from memory. So if your workload is memory, disk, or network bound, conversions can often be "free" in terms of wall-clock time, at the cost of slightly more wattage burnt by the CPU(s). Much depends on the size and complexity of the data structure.
Comment by hn_go_brrrrr 1 day ago
Comment by the__alchemist 1 day ago
I write most of my applications and libraries in Rust, and lament that most of the libraries I wish I could call via FFI are in C++ or Python, which are more difficult to bind to.
Protobuf sounds like the wrong tool. It has applications for wire serialization and similar, but is still kind of a mess there. I would not apply it to something that stays in memory.
Comment by vlovich123 1 day ago
Comment by kleton 1 day ago
Comment by kccqzy 1 day ago
It’s however somewhat common to pass in-memory protobuf objects between code, because the author didn’t want to define a custom struct but preferred to use an existing protobuf definition.
Comment by hn_go_brrrrr 1 day ago
Comment by 1718627440 23 hours ago
Comment by dchuk 1 day ago
Comment by cfors 1 day ago
Comment by vineyardmike 22 hours ago
I agree that LLMs will make clients/interfaces in every language combination much more common, but I wonder what impact it'll have on these big software projects if more people stop learning C.
Comment by logicchains 1 day ago
That sounds like a performance nightmare, putting Protobuf of all things between the language and Postgres. I'm surprised such a library ever got popular.
Comment by formerly_proven 1 day ago
Because it is not popular.
pg_query (TFA) has ~1 million downloads, the postgres crate has 11 million downloads and the related tokio-postgres crate has over 33 million downloads. The two postgres crates currently see around 50x as much traffic as the (special-purpose) crate from the article.
edit: There is also pq-sys with over 12 million downloads, used by diesel, and sqlx-postgres with over 16 million downloads, used by sqlx.
Comment by cranx 1 day ago
Comment by bluGill 1 day ago
Protobuf also handles a bunch of languages for you. The other team wants to write in a "stupid language"? You don't have to have a political fight to prove your preferred language is best for everything. You just let that team do what they want, and they can learn the hard way that it was a bad language. Either it isn't really that bad and so the fight was pointless, or it is, but management can find other metrics to prove it, and it becomes their problem to decide if it is bad enough to be worth fixing.
Comment by vlovich123 1 day ago
Comment by bluGill 1 day ago
Comment by squirrellous 15 hours ago
Examples that can noticeably slow things down even for “normal” web apps: map types and deeply nested messages.
Comment by MrDarcy 1 day ago
Comment by dietr1ch 1 day ago
Protobuf is likely really close to optimally fast for what it is designed to be, and the flaws and performance losses left are most likely all in the design space, which is why alternatives are a dime a dozen.
Comment by satvikpendem 20 hours ago
Comment by infogulch 18 hours ago
> Protobuf performs up to 6 times faster than JSON. - https://auth0.com/blog/beating-json-performance-with-protobu... (2017)
That's ~30x over JSON just by switching to a zero-copy data format that's suitable for both in-memory use and the network. JSON services spend 20-90% of their compute on serde; a zero-copy data format would essentially eliminate that.
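To illustrate what "zero-copy" means here (layout and names invented for this sketch, not any particular format): fields are read straight out of the received bytes, with no object graph built first.

    // Fixed-offset fields are read directly from the wire buffer;
    // nothing is allocated or converted up front.
    fn read_u32_le(buf: &[u8], offset: usize) -> Option<u32> {
        let bytes = buf.get(offset..offset + 4)?;
        Some(u32::from_le_bytes(bytes.try_into().ok()?))
    }

    fn main() {
        // Pretend this arrived off the network: [id: u32][flags: u32][payload...]
        let wire: Vec<u8> = [42u32.to_le_bytes(), 7u32.to_le_bytes()].concat();

        // "Deserialization" is just offset math over the existing buffer.
        let id = read_u32_le(&wire, 0).unwrap();
        let flags = read_u32_le(&wire, 4).unwrap();
        println!("id={id} flags={flags}");
    }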
Comment by nicman23 1 day ago
Comment by jeffbee 21 hours ago
Comment by cmrdporcupine 1 day ago
Going around doing this kind of pointless thing because "it's only 5x slower" is a bad assumption to make.
Comment by miroljub 1 day ago
Just doing memcpy or mmap would be even faster. But the same Rust advocates bragging about Rust speed frown upon such insecure practices in C/C++.
Comment by infogulch 1 day ago
Comment by mrlongroots 1 day ago
And the reason is ABI compatibility. Reasoning about ABI compatibility across different C++ versions and optimization levels and architectures can be a nightmare, let alone different programming languages.
The reason it works at all for Arrow is that the leaf levels of the data model are large contiguous columnar arrays, so reconstructing the higher layers still gets you a lot of value. The other domains where it works are tensors/DLPack and scientific arrays (Zarr etc). For arbitrary struct layouts across languages/architectures/versions, serdes is way more reliable than a universal ABI.
Comment by lenkite 1 day ago
Comment by nottorp 1 day ago
They changed the persistence system completely. Looks like they went from a generic solution to something specific to what they're carrying across the wire.
They could have done it in Lua and it would have been 3x faster.
Comment by consp 1 day ago
Comment by desiderantes 1 day ago
Comment by hu3 1 day ago
Comment by timeon 1 day ago
Comment by izacus 1 day ago
Comment by DangitBobby 20 hours ago
Comment by satvikpendem 20 hours ago
Comment by embedding-shape 1 day ago
Comment by alias_neo 1 day ago
I wonder if it's just poorly worded and they meant to say something like "Replacing Protobuf with some native calls [in Rust]".
Comment by misja111 1 day ago
Comment by mkoubaa 1 day ago
Comment by win311fwg 1 day ago
Comment by locknitpicker 1 day ago
> Protobuf is fast, but not using Protobuf is faster.
The blog post reads like an unserious attempt to repeat a Rust meme.
Comment by lfittl 21 hours ago
The initial motivation for developing pg_query was for pganalyze, where we use it to parse queries extracted from Postgres, to find the referenced tables, and these days also rewrite and format queries. That use case runs in the background, and as such is much less performance critical.
pg_query actually initially used a JSON format for the parse output (AST), but we changed that to Protobuf a few major releases ago, because Protobuf makes it easy to have typed bindings in the different languages we support (Ruby, Go, Rust, Python, etc). Alternatives (e.g. using FFI directly) make sense for Rust, but would require a lot of maintained glue code for other languages.
All that said, I'm supportive of Lev's effort here, and we'll add some additional functions (see [0]) in the libpg_query library to make using it directly (i.e. via FFI) easier. But I don't see Protobuf going away, because in non-performance critical cases, it is more ergonomic across the different bindings.
Comment by rozenmd 1 day ago
Comment by 7777332215 1 day ago
Comment by IshKebab 1 day ago
Comment by gf000 1 day ago
Comment by Sesse__ 1 day ago
Notably, Protobuf 2, a rewrite of Protobuf 1. Protobuf 1 was created by Sanjay Ghemawat, I believe.
Comment by 7e 1 day ago
Comment by notyourwork 1 day ago
Comment by 7e 19 hours ago
Comment by kentonv 19 hours ago
Comment by 7e 17 hours ago
Comment by gorset 9 hours ago
For high performance and critical stuff, SBE is much more suitable, but it doesn't have as good of a schema evolution story as protobuf.
Comment by sph 11 hours ago
By all means, keep using it, but it might be worth figuring out why other people don’t. Hint: it’s not because they’re more stupid than you or are looking to get promoted by big G.
(Personally, I like the ideas and binary encoding behind Capn Proto more than all the alternatives)
Comment by kentonv 19 hours ago
Comment by jpalepu33 20 hours ago
This is a common pattern: "We switched to X and got 5x faster" often really means "We fixed our terrible implementation and happened to rewrite it in X."
Key lessons from this:
1. Serialization/deserialization is often a hidden bottleneck, especially in microservices where you're doing it constantly
2. The default implementation of any library is rarely optimal for your specific use case
3. Benchmarking before optimization is critical - they identified the actual bottleneck instead of guessing
For anyone dealing with Protobuf performance issues, before rewriting:
- Use arena allocation to reduce memory allocations
- Pool your message objects
- Consider if you actually need all the fields you're serializing
- Profile the actual hot path
Rust FFI has overhead too. The real win here was probably rethinking their data flow and doing the optimization work, not just the language choice.
Comment by yodacola 1 day ago
Comment by nindalf 1 day ago
[1] - https://github.com/protocolbuffers/protobuf: Google's data interchange format
[2] - https://github.com/google/flatbuffers: Also maintained by Google
Comment by rafaelmn 1 day ago
AFAIK they have a bunch of production infra on protobuf/gRPC - not so sure about flatbuffers, which came out of the game dev side - that's the difference maker to me: which infrastructure the project is actually rooted in.
Comment by dewey 1 day ago
If you worked on Go projects that import Google protobuf / grpc / Kubernetes client libraries you are often reminded of that fact.
Comment by dmoy 23 hours ago
Stubby, not gRPC. Stubby is used for almost everything internally. gRPC is a similar-ish looking thing that is open sourced, but not used nearly as much as stubby internally.
Stubby predates gRPC by like 15 years or something.
> not so sure about flatbufferrs which came out of the game dev side
I wouldn't know. I'll be honest, I always forget that Google made flatbuffers. I guess if you're doing a lot of IPC?
Comment by whoevercares 1 day ago
Comment by cmrdporcupine 22 hours ago
It just means a person working at Google used that avenue to open source them.
Google offers a few legal avenues to allow you to open source your stuff while working there, but one of the easiest is just to assign copyright to Google and shove it under their GitHub.
It just means a Googler published it, not that Google itself is maintaining it.
I don't know what the status of flatbuffers is specifically, but I can say I never encountered it in use in the 10 years I worked there. (I use it a lot now on my own things post-Google)
Comment by rurban 21 hours ago
Comment by secondcoming 1 day ago
Comment by ruicraveiro 3 hours ago
Comment by eliasdejong 23 hours ago
Comment by tucnak 23 hours ago
Comment by suriya-ganesh 1 day ago
Using a transport serialization/deserialization protocol for IPC - it is obvious why there was overhead, because it was an architectural decision to manage the communication.
I guess the old adage is true here: if something goes 20% faster, something was improved; if it goes 10x faster, it was just built wrong.
Comment by nemothekid 22 hours ago
Comment by ordu 19 hours ago
I had experience with writing safe bindings to structures created in a C library, and it is a real pain. You spend a lot of time reverse engineering C code to get an idea of the intent of those who wrote the code. You need to know which pointers can address the same memory. You need to know which pointers can be NULL or just plain invalid. You need to know which pointers you get from C code or pass to it along with ownership, and which are just borrowed. It may be (and often is) unclear from the documentation, so you are going to read a lot of C code, trying to guess what the authors were thinking when writing it. Generating hypotheses about the library behavior (like 'the library never does THIS with the pointer') and trying to prove them by finding all the code dealing with the pointer.
It can be easy in easy situations, or it can be really tricky and time consuming. So it can make sense to just insert serialization/deserialization to avoid dealing with C code.
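For illustration, a rough sketch of the kind of wrapper being described, against a hypothetical C API (thing_new/thing_free are invented names); every fact reverse-engineered from the C code becomes an invariant the wrapper has to uphold:

    use std::ptr::NonNull;

    // Opaque C type; Rust never looks inside it.
    #[repr(C)]
    struct CThing {
        _private: [u8; 0],
    }

    extern "C" {
        fn thing_new() -> *mut CThing;     // may return NULL - learned by reading the C code
        fn thing_free(thing: *mut CThing); // must be called exactly once per handle
    }

    // Safe owner of a CThing. Invariant: `ptr` is non-null, valid, and owned
    // exclusively by this value.
    pub struct Thing {
        ptr: NonNull<CThing>,
    }

    impl Thing {
        pub fn new() -> Option<Thing> {
            NonNull::new(unsafe { thing_new() }).map(|ptr| Thing { ptr })
        }
    }

    impl Drop for Thing {
        fn drop(&mut self) {
            // Ownership assumption baked in here: the Rust side frees the handle.
            unsafe { thing_free(self.ptr.as_ptr()) }
        }
    }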
Comment by maherbeg 1 day ago
At the scale we were using PGDog, enabling the previous form of the query parser was extremely expensive (we would have had to 16x our pgdog fleet size).
Comment by levkk 23 hours ago
Thank you so much for the kind words!
Comment by t-writescode 1 day ago
Comment by izacus 1 day ago
Do you want to maintain that and debug that? Do you want to do all of that without help of a compiler enforcing the schema and failing compiles/CI when someone accidentally changes the schema?
Because you get all of that with protobuf if you use them appropriately.
You can of course build all of this yourself... and maybe it'll even be as efficient, performant and supported. Maybe.
Comment by t-writescode 23 hours ago
And it works in a browser, too!
Comment by nicman23 1 day ago
Comment by eklavya 1 day ago
Comment by 9rx 1 day ago
Comment by eklavya 1 day ago
Comment by 9rx 1 day ago
Assuming you were using Protobufs as they are usually used, meaning under generated code, I saw no difference between using it in Javascript and any other language in my experience. The wire format is beyond your concern. At least it is no more of your concern than it is in any other environment.
There are a number of different generator implementations for Javascript/Typescript. Some of them have some peculiar design choices. Is that where you found issue? I would certainly agree with that, but others aren't so bad. That doesn't really have anything to do with the browser, though. You'd have the same problem using protobufs under Node.js.
Comment by tcfhgj 1 day ago
Comment by tuetuopay 1 day ago
Having a way to describe your whole API and generate bindings is a godsend. Yes, it can be done with JSON and OpenApi, yet it’s not mandatory.
Comment by 9rx 1 day ago
It is not mandatory for Protobuf either. You can construct a protobuf message with an implied structure just as you can with JSON. It does not violate the spec.
Protobuf ultimately gets the nod because it has better tooling (which isn't to be taken as praise towards Protobuf's tooling, but OpenAPI is worse).
Comment by vouwfietsman 1 day ago
It sounds weird, and it's totally dependent on your use case, but binary serialization can make a giant difference.
For me, I work with 3D data which is primarily (but not only) tightly packed arrays of floats & ints. I have a bunch of options available:
1. JSON/XML, readable, easy to work with, relatively bulky (but not as bad as people think if you compress) but no random access, and slow floating point parsing, great extensibility.
2. JSON/XML + base64, OK to work with, quite bulky, no random access, faster parsing, but no structure, extensible.
3. Manual binary serialization: hard to work with, OK size (esp compressed), random access if you put in the effort, optimal parsing, not extensible unless you put in a lot of effort.
4. Flatbuffers/protobuf/capn-proto/etc: easy to work with, great size (esp compressed), random access, close-to-optimal parsing, extensible.
Basically if you care about performance, you would really like to just have control of the binary layout of your data, but you generally don't want to design extensibility and random access yourself, so you end up sacrificing explicit layout (and so some performance) by choosing a convenient lib.
We are a very regular-sized company, but our 3D data spans hundreds of terabytes.
(also, no, there is no general purpose 3D format available to do this work, gltf and friends are great but have a small range of usecases)
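To make options 1 and 3 concrete, a small sketch with invented data: the same array of floats as a packed little-endian byte buffer versus as text. The packed form has a fixed 4-bytes-per-value layout (so it is seekable), while the text form is larger and has to be parsed float by float.

    fn to_packed_le(points: &[f32]) -> Vec<u8> {
        // 4 bytes per float, fixed layout, trivially seekable for random access
        points.iter().flat_map(|p| p.to_le_bytes()).collect()
    }

    fn from_packed_le(bytes: &[u8]) -> Vec<f32> {
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes(c.try_into().unwrap()))
            .collect()
    }

    fn main() {
        let points: Vec<f32> = (0..1_000).map(|i| i as f32 * 0.25).collect();

        let packed = to_packed_le(&points); // exactly 4,000 bytes, no parsing needed
        let text = format!("{:?}", points); // readable, but bigger and slow to parse

        assert_eq!(from_packed_le(&packed), points);
        println!("packed: {} bytes, text: {} bytes", packed.len(), text.len());
    }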
Comment by physicsguy 1 day ago
Comment by t-writescode 23 hours ago
Comment by tucnak 23 hours ago
Comment by t-writescode 21 hours ago
Statistically, a lot of people who post on HN and cling to new or advanced tech *do* just write CRUD apps with a little special sauce, it’s part of what makes vibe coding and many of the frameworks we use so appealing.
I’m not ignoring that other things exist and are even very common; and I was agreeing with the person that’s a useful case.
I’ve also worked for various companies where protobuf has been suggested as a way to solve a political/organizational issue, not a code or platform issue.
Comment by bluGill 1 day ago
Comment by t-writescode 23 hours ago
Not having that functionality is a weakness of a language or its support tools at this point, to me.
Comment by Chiron1991 1 day ago
Comment by jonathanstrange 1 day ago
Comment by speed_spread 1 day ago
Well I agree. Contract-first is great. You provide your clients with the specs and let them generate their own bindings. And as a client they're great too because I can also easily generate a mock server implementation that I can use in tests.
Comment by pjmlp 1 day ago
Comment by lowdownbutter 1 day ago
Comment by chuckadams 1 day ago
Comment by 0x457 22 hours ago
Comment by ajross 20 hours ago
Well, yeah. If there's a feature you don't need, you'll see value by coding around it. Some features turn out not to be needed by anyone, maybe this is one. But some people need serialization, and that's what protobufs are for[1]. Those people are very (!) poorly served by headlines telling them to use Rust (!!) instead of serialization.
[1] Though as always the standard litany applies: you actually want JSON, and not protobufs or ASN.1 or anything else. If you like some other technology better, you're wrong and you actually want JSON. If you think you need something faster, you probably don't and JSON would suit your needs better. If you really, 100%, know for sure that you need it faster than JSON, then you're probably isomorphic to the folks in the linked article, shouldn't have been serializing at all, and should get to work open coding your own hooks on the raw backend.
Comment by rgovostes 20 hours ago
Comment by linuxftw 1 day ago
Comment by spwa4 1 day ago
Comment by levkk 23 hours ago
The output is machine-verifiable, which makes this uniquely possible in today's vibe-coded world!
Comment by sylware 1 day ago
I wrote memory-mapping-oriented protobuf software... in assembly, then what? Am I allowed to say I am going 1000 times faster than Rust now???
Comment by IshKebab 1 day ago
But I would just increase the stack size limit if it ever becomes a problem. As far as I know the only reason it is so small is because of address space exhaustion which only affects 32-bit systems.
Comment by jeroenhd 1 day ago
The `become` keyword has already been reserved and work continues to happen (https://github.com/rust-lang/rust/issues/112788). If you enable #![feature(explicit_tail_calls)] you can already use the feature in the nightly compiler: https://play.rust-lang.org/?version=nightly&mode=debug&editi...
(Note that enabling release mode on that link will have the compiler pre-calculate the result so you need to put it to debug mode if you want to see the assembly this generates)
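For context, a rough sketch of how the nightly feature is used (based on the tracking issue above; details may shift before stabilization):

    #![feature(explicit_tail_calls)]

    fn sum_to(n: u64, acc: u64) -> u64 {
        if n == 0 {
            return acc;
        }
        // `become` guarantees the call reuses the current stack frame,
        // so deep recursion doesn't grow the stack.
        become sum_to(n - 1, acc + n)
    }

    fn main() {
        println!("{}", sum_to(10_000_000, 0));
    }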
Comment by embedding-shape 1 day ago
Isn't that just TCO or similar? Usually a part of the compiler/core of the language itself, AFAIK.
Comment by koverstreet 1 day ago
So I think there's value in providing it as an explicit opt-in; that way when you're reading the code, you know to account for it when you're looking at backtraces.
Additionally, if you're relying on TCO it might be a major bug if the compiler isn't able to apply it - and optimizations that aren't applied are normally invisible. This might mean you could get an error if you're expecting TCO and you or the compiler screwed something up.
Comment by tialaramex 1 day ago
Suppose I have a recursive function f(n: u8) where f(0) is 0 and otherwise f(n) is n * bar(n) + f(n-1)
I might well write that with a local temporary to calculate bar(n) and then we do the sum, but this would inhibit TCO because that temporary should exist after we did the recursive calculation, even though it doesn't matter in practice.
A compiler could try to cleverly figure out whether it matters and destroy that local temporary earlier then apply TCO, but now your TCO is fragile because a seemingly minor code change might fool that "clever" logic, by ensuring it isn't correct to make this change and breaking your optimisation.
The `become` keyword is a claim by the programmer that we can drop all these locals and do TCO. So because the programmer claimed this should work they're giving the compiler permission to attempt the early drop and if it doesn't work and can't be TCO then complain that the program is wrong.
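A concrete (invented) illustration of the locals problem: a value with a destructor that lives across the recursive call keeps the current frame alive, which is exactly what `become` would force you to resolve explicitly.

    struct Guard(u8);

    impl Drop for Guard {
        fn drop(&mut self) {
            println!("dropping guard {}", self.0);
        }
    }

    fn count_down(n: u8) {
        if n == 0 {
            return;
        }
        let _guard = Guard(n);
        // `_guard` is dropped when this frame exits, i.e. *after* the inner
        // call returns, so this call cannot simply reuse the frame. With
        // `become`, locals like this would have to be dropped before the
        // call, which is the early drop described above.
        count_down(n - 1);
    }

    fn main() {
        count_down(3);
    }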
Comment by unnouinceput 1 day ago
So it's C actually, not Rust. But hey! We used Rust somewhere, so let's post it on HN and farm internet points.
Comment by steeve 1 day ago
Comment by ahartmetz 1 day ago
Comment by Terretta 22 hours ago
Comment by Xunjin 1 day ago
Comment by xxs 1 day ago