How fast can browsers process base64 data?

Posted by mfiguiere 11 days ago


Comments

Comment by adzm 4 days ago

https://chromium-review.googlesource.com/c/v8/v8/+/7208125 V8 landed a change yesterday that avoids the temp buffer, which will likely double the base64 decode speed.

Looks like this was brought up there as a result of this article too, which is neat! And helpful, since I was just messing with a Node script that decodes a lot of base64.

Comment by Neywiny 4 days ago

I like commits like this. It removes unnecessary work, the optimization is easy to understand, and with 40 lines gone and 32 added the codebase is net smaller, which usually means easier to maintain. It has ample comments and even uses one of my favorite tricks to do a ceiling function.
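For readers who have not seen this kind of ceiling trick: a common integer idiom is to add `d - 1` before a truncating division by `d`. Whether the V8 change uses this exact form is an assumption on my part; the sketch below only shows how the idiom applies to sizing a base64 output buffer. The names `ceilDiv` and `encodedLength` are made up for illustration.

```typescript
// Integer ceiling division without a floating-point Math.ceil.
// Assumes n >= 0 and d > 0.
function ceilDiv(n: number, d: number): number {
  return Math.floor((n + d - 1) / d);
}

// Padded base64 emits 4 characters for every group of 3 input bytes,
// with the last partial group rounded up to a full one.
function encodedLength(byteCount: number): number {
  return 4 * ceilDiv(byteCount, 3);
}
```

The point of the idiom is that the output size is known exactly before encoding starts, so the result can be written straight into a correctly sized buffer.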

Comment by Retr0id 4 days ago

> Browsers recently added convenient and safe functions to process base 64 functions Uint8Array.toBase64() and Uint8Array.fromBase64()

Wow, finally! I've had to work around this so many times in the past (btoa/atob do not play nicely with raw binary data - although there are workarounds on the decode path involving generating data URIs)
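A minimal sketch of the usual encode-side workaround, for context: `btoa` takes a string with one character per byte, so raw bytes have to be turned into a "binary string" first. The helper name `bytesToBase64` is hypothetical, and the feature check assumes the engine may or may not ship `Uint8Array.prototype.toBase64` yet.

```typescript
// btoa is looked up on globalThis so the sketch type-checks without DOM typings.
const btoaFn: (binary: string) => string = (globalThis as any).btoa;

function bytesToBase64(bytes: Uint8Array): string {
  // Prefer the native method where the engine ships it.
  const native = (bytes as any).toBase64;
  if (typeof native === "function") {
    return native.call(bytes);
  }
  // Fallback: build a string with one char (code 0..255) per byte,
  // which is the only kind of input btoa accepts.
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoaFn(binary);
}
```

The fallback is slow for large inputs because of the intermediate string, which is part of why the native methods are welcome.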

Comment by rezmason 4 days ago

base64 is embarrassingly parallel. So just pipe it to the GPU:

  precision highp float;
  uniform vec2 size;
  uniform sampler2D src,tab;
  void main(){
    vec4 a=(gl_FragCoord-.5)*3.,i=vec4(0,1,2,0)+a.y*size.x+a.x,y=floor(i/size.x),x=i-y*size.x;
    #define s(n)texture2D(src,vec2(x[n],y[n])/size)[0]
    #define e(n)texture2D(tab,vec2(a[n],0))[0]
    a=vec4(s(0),s(1),s(2),0)*255.*pow(vec4(2),-vec4(2,4,6,0)),a=fract(a).wxyz+floor(a)/64.,gl_FragColor=vec4(e(0),e(1),e(2),e(3));
  }
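As I read the shader, each fragment takes three source bytes, splits them into four 6-bit values with `fract()` and `floor()`, and looks each one up in the `tab` texture. A CPU-side sketch of the same split, using bit operations instead of the float tricks, might look like this. The function names are mine, and padding for inputs that are not a multiple of three bytes is not handled.

```typescript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Maps one 3-byte group to four 6-bit alphabet indices.
function sextets(b0: number, b1: number, b2: number): number[] {
  return [
    b0 >> 2,                          // top 6 bits of byte 0
    ((b0 & 0x03) << 4) | (b1 >> 4),   // low 2 bits of byte 0, top 4 of byte 1
    ((b1 & 0x0f) << 2) | (b2 >> 6),   // low 4 bits of byte 1, top 2 of byte 2
    b2 & 0x3f,                        // low 6 bits of byte 2
  ];
}

// Encodes one group; groups are independent, which is what makes
// base64 encoding easy to parallelize.
function encodeGroup(b0: number, b1: number, b2: number): string {
  return sextets(b0, b1, b2)
    .map((index) => ALPHABET.charAt(index))
    .join("");
}
```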

Comment by lioeters 4 days ago

HN user: Ah yes let me casually scribble down a tweet-sized base64 encoder that runs parallel on GPU.

Bravo, that is a thing of beauty.

Comment by pixelpoet 3 days ago

Uhhh no, it's a huge net loss because the cost of sending it to the GPU and back greatly exceeds the cost of just doing it then and there in CPU; even on iGPU the kernel launch latency etc will kill it, and that's assuming the kernel build is free. Not to mention this is doing pow calls (!!), which is so ridiculous it makes me wonder if this was a kneejerk AI prompt.

Another post in this thread mentioned V8 sped this up by removing a buffer copy; this is adding two buffer copies, each about an order of magnitude slower.

Come on guys...

Comment by rezmason 1 day ago

Don't make me upload my web-browser-in-a-GLSL-shader snippet

Comment by pixelpoet 22 hours ago

Uhhh, go for it? You're welcome to link anything you like of course, but do you maybe want to address my actual points if you have any objections? Let's do some measurements, it sounds like you might be surprised by the outcome.

Web browser in a shader also sounds extremely inefficient, for obvious fundamental reasons.

Comment by rezmason 10 hours ago

Sorry, I was cracking a joke about the browser in a shader.

The GLSL I originally posted is from the "cursed mode" of my side project, and I use it to produce a data URI of every frame, 15 times per second, as a twisted homage to old hardware. (No, I didn't use AI :P )

https://github.com/Rezmason/excel_97_egg

That said, is `pow(vec4(2),-vec4(2,4,6,0))` really so bad? I figured it'd be replaced with `vec4(0.25, 0.0625, 0.015625, 1.0)`.
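A quick check of the arithmetic, as a sketch: the four powers of two are exactly the constants quoted above, so a compiler that folds constant expressions has nothing left to compute at run time. Whether a given GLSL compiler actually does that folding is not something this snippet shows.

```typescript
// Exponents taken from the shader: pow(vec4(2), -vec4(2, 4, 6, 0)).
const exponents = [2, 4, 6, 0];

// Negative powers of two are exactly representable in binary floating point.
const scales = exponents.map((e) => Math.pow(2, -e));
```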

Comment by alain94040 4 days ago

That blog post left me hungry for more. I was expecting Daniel Lemire to provide a crazy SIMD-optimized version that shows the default browser implementations are sub-optimal. But it's not in this article. Does anyone know?

Comment by conradfr 4 days ago

I remember in the early days of Phoenix LiveView, on an intranet app using HTTP/1, I noticed it was faster to base64-encode an image, put it in an img tag, and send the diff through the Channel websocket than to make a regular HTTP request through Cowboy.

Comment by skylurk 4 days ago

Huh. How many frames per second could it hit, do you think?

Comment by conradfr 4 days ago

It was for a turn-based game and I didn't benchmark for that, but it was noticeably faster for my use case.

Now that I think of it I should have cached the base64 in ETS to be even faster :)

Comment by tasn 4 days ago

Does anyone know why Firefox/Servo are so slow compared to the rest?

Comment by adzm 4 days ago

A few big things and lots of small things.

Big performance wins recently optimizing some core operations:

https://bugzilla.mozilla.org/show_bug.cgi?id=1994067 https://bugzilla.mozilla.org/show_bug.cgi?id=1995626

which brings it near chrome performance without the new v8 optimizations.

Still more work to do, including avoiding extra copies just like v8, and exploring more simd etc. Generic slow items for toBase64 and fromBase64: https://bugzilla.mozilla.org/show_bug.cgi?id=2003299 https://bugzilla.mozilla.org/show_bug.cgi?id=2003305

extra copying of results: https://bugzilla.mozilla.org/show_bug.cgi?id=2003461 https://bugzilla.mozilla.org/show_bug.cgi?id=1996197

There's no reason all browsers couldn't reach similar performance eventually. Pleased this was noticed and is being worked on by both the V8 and Firefox teams.

Comment by tasn 4 days ago

Thanks for sharing!

Incredible that FF is even slower than a JS only implementation running in FF.

Comment by jeffbee 4 days ago

Mozilla's "privacy" image prevents them from knowing what their browser actually does in the wild, while Google collects CPU time profiles from user devices, comprehensively, and hammers down the hotspots they find, and that refinement has been going on for many years.

Comment by tasn 4 days ago

Even if true (and I agree with sibling that I don't think that it is), base64 encoding/decoding feels like one of those things you'd have a micro benchmark for regardless. It's also shocking that the gap is so wide, as I feel like people working on such things would start with a fairly optimized v1.

I wonder if this is why Firefox feels so sluggish with some more complex SPAs.
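For what such a micro benchmark could look like, here is a rough sketch. It is not how any browser team measures this; `measureThroughput` is a hypothetical helper, and `Date.now()` is used only because it is available everywhere, at the cost of millisecond resolution.

```typescript
// Runs `fn` repeatedly over `input` and reports throughput in MB/s.
function measureThroughput(
  fn: (input: Uint8Array) => unknown,
  input: Uint8Array,
  iterations: number
): number {
  // Warm-up so the JIT has a chance to optimize before timing starts.
  for (let i = 0; i < 10; i++) fn(input);

  const start = Date.now();
  for (let i = 0; i < iterations; i++) fn(input);
  // Clamp to 1 ms so a very fast run does not divide by zero.
  const elapsedMs = Math.max(Date.now() - start, 1);

  const totalBytes = input.length * iterations;
  return totalBytes / 1e6 / (elapsedMs / 1000);
}
```

In an engine that ships the new methods, passing something like `(b) => (b as any).toBase64()` as `fn` with a buffer of a few hundred kilobytes would give a first-order number to compare across browsers.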

Comment by zenethian 4 days ago

That’s nonsense. Firefox has telemetry built in, it’s just that you can opt out of it. Your answer doesn’t explain why at all but instead just takes a wild guess at what might have happened. You don’t know if this was discovered in Chrome or in some other use of V8. Or maybe it was always fast in Chrome! What a weird non-answer.


Comment by danhau 11 days ago

> However, when decoding, we must handle errors and skip spaces.

This had me scratching my head. Why would a base64 decoder need to skip spaces? But indeed, MDN documents this behavior:

> Note that: The whitespace in the space is ignored.

JS never ceases to surprise. Also, check out that typo :D

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

Comment by wvbdmp 4 days ago

Probably so you can put in line breaks? They seem common in base64 data, such as armored PGP keys or email attachments. HTML attributes allow line breaks, although I haven’t seen it done for base64 images.

Comment by layer8 4 days ago

This might be for compatibility with XML Schema base64Binary, which collapses all whitespace (such as line breaks) to single spaces.
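To make the whitespace-skipping behavior concrete, here is a simplified decoder sketch. It assumes the skipped set is ASCII whitespace only (tab, line feed, form feed, carriage return, space) and treats anything else outside the alphabet as an error. It is not the spec algorithm: padding and trailing bits are not validated, and `decodeSkippingWhitespace` is a made-up name.

```typescript
const B64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
const ASCII_WHITESPACE = " \t\n\f\r";

function decodeSkippingWhitespace(text: string): number[] {
  const out: number[] = [];
  let acc = 0;  // bit accumulator
  let bits = 0; // number of valid bits currently held in acc
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (ASCII_WHITESPACE.indexOf(ch) !== -1) continue; // ignore line breaks etc.
    if (ch === "=") break; // simplified: padding ends the data
    const value = B64.indexOf(ch);
    if (value === -1) {
      throw new Error("invalid base64 character at index " + i);
    }
    acc = (acc << 6) | value;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out.push((acc >> bits) & 0xff);
      acc &= (1 << bits) - 1; // keep only the leftover low bits
    }
  }
  return out;
}
```

The per-character branch for whitespace and errors is the extra work on the decode path that the encode path does not have.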

Comment by cluckindan 10 days ago

So technically it’s now possible to hide a payload in somewhat human-readable text, as long as it base64-decodes.

Comment by recursive 4 days ago

Now? There's no change. Also human readable text substantially consists of letters. But that's most of the base64 alphabet too. So this isn't like steganography. All the letters in the human-readable words are valid base64 characters too. The only thing about this is that you get to choose where to put the spaces and newlines. You can't exactly construct arbitrary payloads starting from arbitrary messages.

Comment by sigseg1v 4 days ago

Maybe he means invisible whitespace characters that don't render? I haven't verified this but depending on the definition of whitespace it's possible you can pass a base64 string and insert an arbitrary number of them. When decoded per spec they do nothing so nobody notices them. But if you can pass the base64 string through you can receive or verify the hidden message. Lots of reasons you might want to hide data in plain sight.

Comment by moomoo11 4 days ago

What do you mean hide a payload?

Base64 isn’t obfuscation or encryption.