I Reduced 5 Hours of Testing My Agentic AI Application to 10 Minutes

Posted by ashmil 7 hours ago


Comments

Comment by ashmil 7 hours ago

Hi HN,

I was spending over 5 hours manually testing my Agentic AI application before every patch and release. While automating my API and backend tests was straightforward, testing the actual chat UI was a massive bottleneck. I had to sit there, type out prompts, wait for the AI to respond, read the output, and ask follow-up questions. As the app grew, releases started taking longer just because of manual QA.

To solve this, I built Mantis. It’s an automated UI testing tool designed specifically to evaluate LLM and Agentic AI applications right from the browser.

Here is how it works under the hood:

Define Cases: You define the use cases and specific test cases you want to evaluate for your LLM app.

Browser Automation: A Chrome agent takes control of your application's UI in a tab.

Execution: It simulates a real user by typing the test questions into the chat UI and clicking send.

Evaluation: It waits for the response, analyzes the LLM's output, and can even ask context-aware follow-up questions if the test case requires it.

Reporting: Once a sequence is complete, it moves to the next test case. Everything is logged and aggregated into a dashboard report.
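For readers curious what that loop looks like in code, here is a minimal Python sketch of the run-case-then-report flow described above. All names here (`TestCase`, `send_prompt`, `run_suite`) are hypothetical illustrations, not Mantis's actual API, and the browser-automation layer (typing into the chat UI, clicking send) is stubbed out as a canned echo; in a real run it would be a tool like Playwright driving the page.

```python
# Hypothetical sketch of a Mantis-style test loop (not the real API).
from dataclasses import dataclass, field

@dataclass
class TestCase:
    name: str
    prompt: str
    expect: str                        # substring the response should contain
    follow_ups: list = field(default_factory=list)

def send_prompt(prompt: str) -> str:
    """Stub for the browser step: type `prompt` into the chat UI,
    click send, and wait for the LLM's reply. Stubbed as an echo."""
    return f"echo: {prompt}"

def run_case(case: TestCase) -> dict:
    response = send_prompt(case.prompt)
    passed = case.expect in response   # naive evaluation for illustration
    transcript = [(case.prompt, response)]
    # Context-aware follow-ups: only asked if the first turn passed.
    for fu in case.follow_ups:
        if not passed:
            break
        transcript.append((fu, send_prompt(fu)))
    return {"name": case.name, "passed": passed, "transcript": transcript}

def run_suite(cases: list) -> dict:
    """Run every case in sequence and aggregate a dashboard-style report."""
    results = [run_case(c) for c in cases]
    return {
        "total": len(results),
        "passed": sum(r["passed"] for r in results),
        "results": results,
    }

report = run_suite([
    TestCase("greeting", "Say hello", expect="hello"),
    TestCase("refund", "How do I get a refund?", expect="refund",
             follow_ups=["What if it's past 30 days?"]),
])
print(f'{report["passed"]}/{report["total"]} cases passed')
```

In the real tool the substring check would presumably be replaced by an LLM-based evaluation of the response, but the overall shape (iterate cases, drive the UI, judge output, aggregate a report) is the same.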

The biggest win for me is that I can now just kick off a test run in a background Chrome tab and get back to writing code while Mantis handles the tedious chat testing.

I’d love to hear your thoughts. How are you all handling end-to-end UI testing for your chat apps and AI agents? Any feedback or questions on the approach are welcome!