Show HN: We put voice agent on our website, learned retrieval isn't bottleneck
Posted by srimalireddi 2 days ago
Comments
Comment by anantm 2 days ago
Comment by srimalireddi 2 days ago
Comment by philosopherr 1 day ago
Comment by srimalireddi 1 day ago
STT -> Ambient Retrieval(Moss) -> LLM [+ Tool calls -> On-Demand Retrieval(Moss)] -> TTS
STT, TTS, and LLM output generation are fixed costs, independent of data scale. In practice, landing-page and public-facing website content ranges from hundreds of docs (for startups) to hundreds of thousands of docs (for enterprises).
Moss's retrieval stack runs in under 10 ms. Our internal benchmarks:
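To make the stage ordering concrete, here is a rough Python sketch of one voice turn through that pipeline, with per-stage timing. Every function in it (`stt`, `retrieve`, `llm`, `tts`, `run_turn`) is a hypothetical stand-in invented for illustration; none of it is Moss's API, and the toy keyword ranking is not how Moss retrieves.

```python
import time
from typing import Dict, List, Tuple

# All stages are hypothetical stubs, not real STT/LLM/TTS or Moss calls.
def stt(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio is already a transcript

def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    # Toy keyword-overlap ranking; a real system would query an index.
    terms = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: -len(terms & set(d.lower().split())))
    return ranked[:k]

def llm(query: str, context: List[str]) -> Tuple[str, List[str]]:
    # Returns an answer plus tool-call queries (only when context is empty).
    tool_calls = [query] if not context else []
    return f"Answer to '{query}' using {len(context)} docs", tool_calls

def tts(text: str) -> bytes:
    return text.encode("utf-8")

def run_turn(audio: bytes, docs: List[str]) -> Tuple[bytes, Dict[str, float]]:
    timings: Dict[str, float] = {}

    def timed(name, fn, *args):
        # Accumulate wall-clock milliseconds per stage.
        start = time.perf_counter()
        out = fn(*args)
        timings[name] = timings.get(name, 0.0) + (time.perf_counter() - start) * 1000
        return out

    query = timed("stt", stt, audio)
    context = timed("ambient_retrieval", retrieve, query, docs)
    answer, tool_calls = timed("llm", llm, query, context)
    for call in tool_calls:  # optional on-demand retrieval via tool calls
        context = context + timed("on_demand_retrieval", retrieve, call, docs)
    speech = timed("tts", tts, answer)
    return speech, timings
```

Calling `run_turn(b"what is pricing", docs)` returns the synthesized bytes plus a dict of per-stage milliseconds, which is the kind of breakdown you would look at to see where a turn actually spends its time.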
- P99 of ~5.4 ms for 100K docs in a shared container
- P99 of ~4 ms for 1M docs in a dedicated VM
Our R&D team is pushing this to 200M+ docs while keeping the sub-10 ms promise, and we see no hard ceiling on scale.
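For anyone who wants to sanity-check tail-latency numbers like these on their own setup, a minimal sketch of a P99 measurement is below. It assumes a nearest-rank percentile and a caller-supplied `search` function; both `percentile` and `measure_ms` are hypothetical helpers, and this is not the harness behind the figures above.

```python
import math
import time
from typing import Callable, List, Sequence

def percentile(samples: Sequence[float], p: int) -> float:
    # Nearest-rank percentile: smallest value with at least p% of samples at or below it.
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def measure_ms(search: Callable[[str], object], queries: List[str]) -> List[float]:
    # Time each query individually and return latencies in milliseconds.
    latencies = []
    for q in queries:
        start = time.perf_counter()
        search(q)
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies
```

Usage would look like `percentile(measure_ms(my_search, queries), 99)`. A meaningful P99 needs well over a hundred samples, plus a warm-up pass so cold caches do not dominate the tail.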
Comment by kowshikchills 2 days ago
Comment by srimalireddi 2 days ago
Comment by ipotapov 1 day ago
Comment by killamdiaz 2 days ago