Reverse-engineering the RK3588 NPU: Hacking limits to run vision transformers

Posted by rcarmo 8 hours ago


Comments

Comment by Neywiny 6 hours ago

This is good work. I would say that there was very little reverse engineering, but that's fine. It's interesting to see some companies view ARM's Ethos line as holding them back and others view it as pulling them forward. I'm not sure ARM is the best solution, but all these different NPUs feel a bit like the early days of CPU architectures and compilers. Hopefully we make it through unscathed and at least get better error messages, or maybe even compilers that know these idiosyncrasies well enough to avoid such problems.

Comment by kvuj 6 hours ago

Awesome! Finally putting back "Hacker" in "Hacker News".

Comment by PunchyHamster 5 hours ago

We need a RISC-V equivalent for NPUs; it's become a royal mess over the last few years.

Comment by Neywiny 4 hours ago

It's starting. Some designs are moving towards very wide vector lengths (1k, maybe even 2k?) on RVV cores. So less a giant matrix multiplication unit (I think TI has some parts with what they literally call MMUs, great work guys) and more a bunch of DSP-heavy CPUs. In the age of x86 splitting on AVX-512, it's interesting.

Comment by jauntywundrkind 6 hours ago

Epic hacker work!

For what it's worth, it seems like there's a bunch of open source NPU work in progress too. There's a layer, Teflon, for Gallium3D that most of these drivers share and that TensorFlow can use. Then there are hardware drivers for Rockchip (via the Rocket driver) and Vivante (with their Etnaviv drivers). It'd be extra interesting now to see how (or if?) they've dealt with the system constraints (small scratchpad size) here.
https://www.phoronix.com/news/Gallium3D-Teflon-Merged
https://www.phoronix.com/news/Rockchip-NPU-Linux-Mesa
https://www.phoronix.com/news/Two-NPU-Accel-Drivers-2026

Comment by doctorpangloss 5 hours ago

Hacker News needs a reprieve from "Problem. The fix? Vibe coding session. Here's the ChatGPT report"