Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon
Posted by sanchitmonga22 3 hours ago
Hi HN, we're Sanchit and Shubham (YC W26). We built a fast inference engine for Apple Silicon. LLMs, speech-to-text, text-to-speech – MetalRT beats llama.cpp, Apple's MLX, Ollama, and sherpa-onnx on every modality we tested. Custom Metal shaders, no framework overhead.
Also, we've open-sourced RCLI, the fastest end-to-end voice AI pipeline on Apple Silicon. Mic to spoken response, entirely on-device. No cloud, no API keys.
To get started:
brew tap RunanywhereAI/rcli https://github.com/RunanywhereAI/RCLI.git
brew install rcli
rcli setup # downloads ~1 GB of models
rcli # interactive mode with push-to-talk
Or: curl -fsSL https://raw.githubusercontent.com/RunanywhereAI/RCLI/main/install.sh | bash
The numbers (M4 Max, 64 GB, reproducible via `rcli bench`):
LLM decode – 1.67x faster than llama.cpp, 1.19x faster than Apple MLX (same model files):
- Qwen3-0.6B: 658 tok/s (vs mlx-lm 552, llama.cpp 295)
- Qwen3-4B: 186 tok/s (vs mlx-lm 170, llama.cpp 87)
- LFM2.5-1.2B: 570 tok/s (vs mlx-lm 509, llama.cpp 372)
- Time-to-first-token: 6.6 ms
STT – 70 seconds of audio transcribed in *101 ms*. That's 714x real-time. 4.6x faster than mlx-whisper.
TTS – 178 ms synthesis. 2.8x faster than mlx-audio and sherpa-onnx.
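As a rough check, the per-model decode ratios implied by the tok/s figures above can be recomputed in a few lines of Python. This is only a sketch: the dictionary is copied from the numbers quoted in this post, the `speedups` helper and the "MetalRT" key are illustrative names, and how the headline 1.67x and 1.19x figures are aggregated across models is not stated here, so the script reports per-model ratios only.

```python
# Decode throughput in tokens/second, copied from the figures quoted in the post.
DECODE_TOK_S = {
    "Qwen3-0.6B": {"MetalRT": 658, "mlx-lm": 552, "llama.cpp": 295},
    "Qwen3-4B": {"MetalRT": 186, "mlx-lm": 170, "llama.cpp": 87},
    "LFM2.5-1.2B": {"MetalRT": 570, "mlx-lm": 509, "llama.cpp": 372},
}


def speedups(results, engine="MetalRT"):
    """Return {model: {baseline: engine_tok_s / baseline_tok_s}}."""
    out = {}
    for model, tok_s in results.items():
        ours = tok_s[engine]
        out[model] = {
            name: ours / value for name, value in tok_s.items() if name != engine
        }
    return out


if __name__ == "__main__":
    for model, ratios in speedups(DECODE_TOK_S).items():
        line = ", ".join(f"{r:.2f}x vs {name}" for name, r in ratios.items())
        print(f"{model}: {line}")
```

Running `rcli bench` on your own machine and substituting the measured values is the more meaningful comparison; the table above is specific to an M4 Max with 64 GB.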
We built this because demoing on-device AI is easy but shipping it is brutal. Voice is the hardest test: you're chaining STT, LLM, and TTS sequentially, and if any stage is slow, the user feels it. Most teams fall back to cloud APIs not because local models are bad, but because local inference infrastructure is.
The thing that's hard to solve is latency compounding. In a voice pipeline, you're stacking three models in sequence. If each adds 200ms, you're at 600ms before the user hears a word, and that feels broken. You can't optimize one stage and call it done. Every stage needs to be fast, on one device, with no network round-trip to hide behind.
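The compounding argument is plain addition, but it helps to see it written out. A minimal sketch, assuming three strictly sequential stages with the hypothetical 200 ms latencies from the example above (the function name and stage keys are illustrative, not part of RCLI):

```python
def time_to_first_audio_ms(stage_latency_ms):
    """Stages run back to back, so the user waits for the sum of all of them."""
    return sum(stage_latency_ms.values())


# Hypothetical per-stage latencies matching the 200 ms example in the text.
pipeline = {"stt": 200, "llm": 200, "tts": 200}

if __name__ == "__main__":
    print(time_to_first_audio_ms(pipeline))  # 600
    # Halving only one stage still leaves 500 ms before the user hears anything.
    print(time_to_first_audio_ms({**pipeline, "stt": 100}))  # 500
```

Because the total is a sum, speeding up a single stage only removes that stage's share, which is why every stage has to be fast.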
We went straight to Metal. Custom GPU compute shaders, all memory pre-allocated at init (zero allocations during inference), and one unified engine for all three modalities instead of stitching separate runtimes together.
MetalRT is the first engine to handle all three modalities natively on Apple Silicon. Full methodology:
LLM benchmarks: https://www.runanywhere.ai/blog/metalrt-fastest-llm-decode-e...
Speech benchmarks: https://www.runanywhere.ai/blog/metalrt-speech-fastest-stt-t...
How: Most inference engines add layers between you and the GPU: graph schedulers, runtime dispatchers, memory managers. MetalRT skips all of it. Custom Metal compute shaders for quantized matmul, attention, and activation - compiled ahead of time, dispatched directly.
Voice pipeline optimization details: https://www.runanywhere.ai/blog/fastvoice-on-device-voice-ai...
RAG optimizations: https://www.runanywhere.ai/blog/fastvoice-rag-on-device-retr...
RCLI is the open-source voice pipeline (MIT) built on MetalRT: three concurrent threads with lock-free ring buffers, double-buffered TTS, 38 macOS actions by voice, local RAG (~4 ms over 5K+ chunks), 20 hot-swappable models, and a full-screen TUI with per-op latency readouts. Falls back to llama.cpp when MetalRT isn't installed.
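For readers unfamiliar with the ring-buffer pattern mentioned above, here is a minimal single-producer/single-consumer sketch in Python. It is an illustration of the index arithmetic only, not RCLI's implementation: the class and method names are hypothetical, and a real lock-free version in C++ or Swift would use atomic loads and stores with acquire/release ordering, which plain Python attributes do not provide.

```python
class RingBuffer:
    """Fixed-capacity single-producer/single-consumer ring buffer (illustrative)."""

    def __init__(self, capacity):
        # One slot stays empty so that head == tail unambiguously means "empty".
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to read; advanced only by the consumer
        self._tail = 0  # next slot to write; advanced only by the producer

    def try_push(self, item):
        """Producer side: store item and return True, or return False if full."""
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False
        self._slots[self._tail] = item
        self._tail = nxt  # publish the slot only after it has been written
        return True

    def try_pop(self):
        """Consumer side: return (True, item), or (False, None) if empty."""
        if self._head == self._tail:
            return False, None
        item = self._slots[self._head]
        self._slots[self._head] = None  # drop the reference to the consumed item
        self._head = (self._head + 1) % len(self._slots)
        return True, item


if __name__ == "__main__":
    audio_chunks = RingBuffer(capacity=2)
    audio_chunks.try_push("chunk-0")
    audio_chunks.try_push("chunk-1")
    print(audio_chunks.try_push("chunk-2"))  # False: buffer is full
    print(audio_chunks.try_pop())            # (True, 'chunk-0')
```

The appeal of this shape for a voice pipeline is that each index has exactly one writer, so neighboring stages can hand off data without blocking each other on a mutex.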
Source: https://github.com/RunanywhereAI/RCLI (MIT)
Demo: https://www.youtube.com/watch?v=eTYwkgNoaKg
What would you build if on-device AI were genuinely as fast as cloud?
Comments
Comment by vessenes 3 hours ago
Quick request: unsloth quants; bit per bit usually better. Or more generally UI for huggingface model selections. I understand you won't be able to serve everything, but I want to mix and match!
Also - grounding:
"open safari" (safari opens, voice says: "I opened safari")
"navigate to google.com in safari" (nothing happens, voice says: "I navigated to google.com")
Anyway, really fun.
Comment by Tacite 2 hours ago
Comment by wlesieutre 1 hour ago
Comment by Tacite 1 hour ago
Comment by vessenes 50 minutes ago
Comment by stingraycharles 3 hours ago
How does the RAG fit in? Voice-to-RAG seems a bit random as a feature.
I don’t mean to come across as dismissive, I’m genuinely confused as to what you’re offering.
Comment by glitchc 3 hours ago
Seems pretty clear. You can supply documents to the model as input and then verbally ask questions about them.
Comment by drcongo 3 hours ago
Comment by jonhohle 2 hours ago
Comment by halostatue 44 minutes ago
{macports.halostatue.ca:austin @halostatue}
I maintain https://github.com/macports/macports-ports/blob/master/sysut... amongst other things regularly.
Comment by AmanSwar 2 hours ago
Comment by rushingcreek 1 hour ago
Either way, this is a tremendous achievement and it's extremely relevant in the OpenClaw world where I might not want to have sensitive information leave my computer.
Comment by tiku 2 hours ago
Comment by RationPhantoms 1 hour ago
Comment by shubham2802 55 minutes ago
Comment by DetroitThrow 3 hours ago
Before I install, is there any telemetry enabled here or is this entirely local by default?
Comment by shubham2802 3 hours ago
Comment by bigyabai 2 hours ago
Comment by alfanick 3 hours ago
Comment by coder543 2 hours ago
What...? It is terrible, even compared to Whisper Tiny, which was released years ago under an Apache 2.0 license so Apple could have adopted it instantly and integrated it into their devices. The bigger Whisper models are far better, and Parakeet TDT V2 (English) / V3 (Multilingual) are quite impressive and very fast.
I have no idea what would make someone say that iOS dictation is good at understanding speech... it is so bad.
For a company that talks so much about accessibility, it is baffling to me that Apple continues to ship such poor quality speech to text with their devices.
Comment by derefr 2 hours ago
Comment by fragmede 52 minutes ago
Comment by coder543 50 minutes ago
Maybe you just don’t know what you’re missing? Google’s default speech to text is still bad compared to Whisper and Parakeet, but even Google’s is markedly better than Apple’s.
I cannot think of a single speech to text system that I’ve run into in the past 5 years that is less accurate than the one Apple ships.
Sure, Apple’s speech to text is incredible compared to what was on the flip phone I had 20 years ago. Terrible is relative. Much better options exist today, and they’re under very permissive licenses. Apple’s refusal to offer a better, more accessible experience to their users is frustrating when they wouldn’t even have to pay a licensing fee to ship something better. Whisper was released under a permissive license nearly 4 years ago.
Apple also restricts third party keyboards to an absurdly tiny amount of memory, so it isn’t even possible to ship a third party keyboard that provides more accurate on-device speech to text without janky workarounds (requiring the user to open the keyboard's own app first each time).
Comment by CamJN 52 seconds ago
Comment by swindmill 3 hours ago
Comment by fragmede 55 minutes ago
Umm, ah, wait no, uhh yes you are. Unless, hang on, you are possessed with greater umm speech capabilities than most, wait nevermind start over. Unless you never make a mistake while talking, you want AI to take out the "three, wait no four" and just leave the output with "four" from what you actually spoke. Depending on your use case.
Comment by computerex 2 hours ago
Comment by jawns 1 hour ago
Comment by AmanSwar 54 minutes ago
Comment by shubham2802 51 minutes ago
Comment by Tacite 3 hours ago
Comment by focusgroup0 2 hours ago
Comment by tristor 3 hours ago
I think this has to be the future for AI tools to be truly useful. The things that are truly powerful are not general purpose models that have to run in the cloud, but specialized models that can run locally and on constrained hardware, so they can be embedded.
I'd love to see this able to be added in-path as an audio passthrough device so you can add on-device native transcription to any application that does audio, such as video conferencing applications.
Comment by j45 2 hours ago
Comment by Tacite 1 hour ago
Comment by shubham2802 20 minutes ago
Comment by john_strinlai 2 hours ago
they are a company that registers domains similar to their main one, and then uses those domains to spam people they scrape off of github without affecting their main domain reputation.
edit: here is the post https://news.ycombinator.com/item?id=47163885
----
edit2: it appears that RunAnywhere is getting damage-control help by dang or tom.
this comment, at this time, has 23 upvotes yet is below 2 grey comments (i.e. <=0 upvotes) that were posted at roughly the same time (1 before, 1 after) -- strong evidence of artificial ordering by the moderators. gross.
Comment by Imustaskforhelp 2 hours ago
Maybe it's just (n=2) that only the two of us remember this fiasco, but I don't agree with that. I don't really understand how this got so many upvotes in such a short time frame, especially given its history of not doing good things, to say the very least... I am especially skeptical of it.
Thoughts?
Edit: I looked deeper into Sanchit's Hacker News account and found that 3 days ago they posted the same thing, as far as I can tell. The only difference is that it used the runanywhere.ai domain rather than github.com/runanywhere, which may well be because Hacker News doesn't allow the same link to be posted twice in a short period, so they are definitely skirting that rule by posting the GitHub link.
Another point: that post (https://news.ycombinator.com/item?id=47283498) has been stuck at 5 points until right now (at time of writing).
So this got a lot crazier now, which is actually wild.
Comment by john_strinlai 2 hours ago
what i do know is that their name is etched into my mind under the category of "shady, never do business with them".
Comment by Imustaskforhelp 2 hours ago
I was writing the comment when it had 18 upvotes, and then it suddenly jumped to 24, which made me suspicious.
see at 2026-03-10T17:38-39:00Z timeframe within this particular graph(0)
Comment by pzo 1 hour ago
Not sure why they decided to reinvent the wheel and write yet another ML engine (MetalRT), which is proprietary. I would most likely bet on CoreML, since it has support for the ANE (Apple's NPU), or on MLX.
Other popular repos for such tasks I would recommend:
https://github.com/FluidInference/FluidAudio
https://github.com/DePasqualeOrg/mlx-swift-audio
Comment by shubham2802 57 minutes ago
Comment by antipaul 1 hour ago
What about for on-device RAG use cases?
Comment by AmanSwar 58 minutes ago
Comment by david_shaw 2 hours ago
Comment by Imustaskforhelp 2 hours ago
Edit: just reloaded, it's fixed now.
Comment by dang 26 minutes ago
Comment by Imustaskforhelp 2 hours ago
I was curious, so I did some more research into the company and found more shady stuff going on, like intentionally buying new domains a month beforehand to send that spam, so that their main website's mail reputation wouldn't suffer. You can read my comment here[2]
Just to be on the safe side here, @dang (yes, pinging doesn't work, but still), can you give us some average stats on the accounts that upvoted this, and run an internal investigation into whether botting was done? I could be wrong about it and I never mean to harm any company, but I can't in good faith understand this.
The stats I would want are: average karma, words written, and account creation date of the accounts that upvoted this post. I'd also like to know the conclusion of the internal investigation, if one takes place.
[There is a bit of a conflict of interest with this being a YC product, but I trust the Hacker News moderators and dang to do what's right.]
I am just skeptical, that's all, and this is my opinion. I just want to provide some historical context into this company and I hope that I am not extrapolating too much.
It's just really strange to me, that's all.
[0]: https://news.social-protocols.org/stats?id=47326101 (see the expected upvotes vs real upvotes and the context of this app and negative reception and everything combined)
[1]: Tell HN: YC companies scrape GitHub activity, send spam emails to users: https://news.ycombinator.com/item?id=47163885
Comment by dang 2 hours ago
In other words, your perception wasn't wrong, but the interpretation was off. I've put "Launch HN" and "YC W26" back in the title to make that clearer - I edited them out earlier, which was my mistake.
As for the booster comments, those are pretty common on launch threads and often pretty innocent - most people who aren't active HN users have no idea that it's against the rules. We do our best to communicate about that, but it's not a cardinal sin—there are far worse offenses.
Comment by john_strinlai 2 hours ago
https://news.ycombinator.com/item?id=47326953 is grey (i.e <=0 karma). my top-level comment is at 14 karma. we posted within 15 minutes of each other. their comment is higher up the page. ive never seen something like that before.
the two posts calling out unethical behavior have been living at the bottom of this post the entire time, until a couple of actually [flagged] comments ended up under them.
i do not care about the karma itself, at all. but i do care to know if launch/show posts have comment sections with cherry-picked ordering or organic ordering.
edit 2: i am at 19 points, and now below two grey (<=0 karma) comments (https://news.ycombinator.com/item?id=47326455). whats up dang?
edit 3 (~1 hour later): you've responded to a handful of other comments and ignored this one as it becomes more and more evident that someone has artificially ordered the comments to ensure that critical comments are at the bottom of the page. it has shattered my perception of show/launch posts to know that you manually curate the comments to form a specific narrative. i really (naively) thought you guys were much more neutral about that sort of thing.
Comment by dang 24 minutes ago
I hadn't seen this until 30 seconds ago. The assumption of moderator omniscience leads to a lot of mistaken conclusions!
Sure, we marked the offtopic comments offtopic, which lowers them on the page. This is standard HN moderation. If we didn't do this, then nearly every thread would be choked with something offtopic at the top.
At the same time, we haven't killed the posts or put them in a "stub for offtopicness" [1] like we otherwise would. They're still here for people who want to read them, while at the same time the main discussion can be about the main topic, which is the startup launch.
HN is actively moderated and always has been. Downweighting offtopic/generic comments is one of the biggest things we've ever discovered for improving the quality of the threads. For us it's about the quality of the site as a whole, not specific narratives, but of course everyone can (and will) make up their own mind about this. What I can tell you is (a) the way we do these things has been stable for a long time (HN time is measured in decades, not years), and (b) we're always willing to answer questions about it.
Oh, and (3) - when YC or a YC-funded startup is part of a story, then we moderate less than we otherwise would [2]. We do still moderate, though—we just do it less.
[1] https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
[2] https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
Comment by Imustaskforhelp 5 seconds ago
But if I may ask, doesn't the policy in your point (3), to moderate less rather than more, contradict what you said about offtopic comments, at least as I perceive it?
> Sure, we marked the offtopic comments offtopic, which lowers them on the page. This is standard HN moderation. If we didn't do this, then nearly every thread would be choked with something offtopic at the top.
>Oh, and (3) - when YC or a YC-funded startup is part of a story, then we moderate less than we otherwise would [2]. We do still moderate, though—we just do it less.
I would suggest that the minor disagreement we have arises because these two points seem contradictory to me. I would suggest (if possible) moderating less, as you mention, not more, and letting the ranking be natural, which in this case might mean that john's comments come first, for example. Downweighting them is itself moderation, and that's one of the concerns we sort of have.
Thoughts?
Comment by john_strinlai 18 minutes ago
especially when that company wants you to curl | bash their code onto your machine -- potential users deserve to know that despite being a YC-backed company (which would typically be a positive indicator, people may reduce their scrutiny) they have been caught scraping data they shouldnt be, and then using that data for marketing, and refuse to respond to anyone who brings it up.
but it is your world and i am just living in it, so i will carry on. i appreciate that you did not collapse them.
Comment by Imustaskforhelp 1 hour ago
Comment by Imustaskforhelp 2 hours ago
Clearly I am not the only one here as john_strinlai here seems to have had somewhat of the same conclusion as me.
Dang, I know you care about this community, so can you please say more about what you think of this in particular as well?
I understand that YC companies get preferential treatment. Fine by me. But this feels like something larger to me.
I have written up everything I could find in this thread, from the same post being submitted here 3 days ago with the runanywhere.ai link to it now being changed to GitHub to skirt the HN rule that the same link can't be posted twice in a short period.
This feels somewhat intentional, just like the spam issue. I hope you understand what I mean.
(If you also feel suspicious, can you please do a basic analysis/investigation with all of these suspicious points in mind, and upload the results in an anonymous way if possible?)
I wish you a nice day and look forward to your thoughts on all of this.
Comment by dang 14 minutes ago
If https://news.ycombinator.com/item?id=47327129 and https://news.ycombinator.com/item?id=47328465 don't answer your questions, can you maybe try picking the most important question and making it as specific as you can? Then I can take a crack at that and we can go from there.
Comment by dsalzman 2 hours ago
Comment by josuediaz 2 hours ago
Comment by john_strinlai 2 hours ago
iharnoor 1 karma, 1 comment, in this thread.
two posts pointing out their extremely unethical spam behavior both shot down to the very bottom of the post. apparently suspicious voting behavior.
what the hell is going on?
Comment by Imustaskforhelp 2 hours ago
I was gonna comment about this guy and iharnoor, a 7-month-old account that literally only said "lets go" here.
This sort of makes me even more suspicious, john, especially iharnoor.
I wasn't responding because I was making an archive link of all of this, so that even deleted messages can have some basis of confirmation.
Comment by iharnoor 2 hours ago
Comment by Imustaskforhelp 2 hours ago
And sorry to say, but I don't think that "Lets go!!" is a valid comment; this makes me even more suspicious.
Especially given the history and suspicions I already had.