Show HN: TheAuditor v2.0 – A “Flight Computer” for AI Coding Agents
Posted by ThailandJohn 18 hours ago
I’m a former Systems Architect (Cisco/VMware) turned builder in Thailand. TheAuditor v2.0 is a complete architectural rewrite (800+ commits) of the prototype I posted three months ago.
The "A-ha" moment for me didn't come from a success; it came from a massive failure. I was trying to use AI to execute a complex schema change (a foundational shift from "Products" to "ProductsVariants"), and given the scope, it failed spectacularly. I realized two things:
* Context Collapse: The AI couldn't keep enough files in its context window to understand the full scope of the refactor, so it started hallucinating and "fixing" superficial issues. If I kept pressing it, it would literally panic and invent problems "so it could fix them," which sent the session into a death spiral. That's the villain origin story of this tool. :D
* Stale Knowledge: It kept trying to implement Node 16 patterns in a Node 22 project, or defaulting to obsolete libraries (glob v7 instead of v11), because its training data was stale.
I realized that AI agents are phenomenal at producing code that runs, but they have zero understanding of it. They optimize for "making it run at any cost," often introducing security holes or technical debt just to bypass an error. The funny paradox is that when cornered into using cutting-edge versions, syntax, and best practices, they have zero trouble executing. But they are so hilariously unaware of their surroundings that they will do anything else unless explicitly babysat.
I built v2 to be the "Sanity Check" that solves many of these issues, and it aims to keep solving more of the same kind. Instead of letting the AI guess, TheAuditor indexes the entire codebase into a local SQLite graph database. This gives the AI a queryable map of reality, letting it verify dependencies and imports without needing to load every file into context.
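To make the "queryable map" idea concrete, here is a toy sketch of the pattern: a symbol/import graph stored in SQLite, walked with a recursive CTE instead of loading files into an AI's context. The schema and table names here are invented for illustration and are not TheAuditor's actual layout.

```python
import sqlite3

# Hypothetical, simplified schema -- TheAuditor's real tables differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (path TEXT PRIMARY KEY);
CREATE TABLE imports (importer TEXT, imported TEXT);
""")
conn.executemany("INSERT INTO files VALUES (?)",
                 [("app.py",), ("models.py",), ("db.py",)])
conn.executemany("INSERT INTO imports VALUES (?, ?)",
                 [("app.py", "models.py"), ("models.py", "db.py")])

# "Who is affected if db.py changes?" -- walk the import graph with a
# recursive CTE instead of grepping or reading every file.
rows = conn.execute("""
WITH RECURSIVE affected(path) AS (
    SELECT importer FROM imports WHERE imported = 'db.py'
    UNION
    SELECT i.importer FROM imports i JOIN affected a ON i.imported = a.path
)
SELECT path FROM affected
""").fetchall()
print(sorted(r[0] for r in rows))  # ['app.py', 'models.py']
```

The point is that a question like "what transitively depends on this file" becomes one cheap query rather than an N-file context-window problem.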
A/B Demo: https://www.youtube.com/watch?v=512uqMaZlTg As seen in the demo video, instead of reading 10+ full files and/or grepping to compensate for hallucinations, the agent can now run "aud explain" and get 500 lines of deterministic, facts-only information. It gets just what it needs, versus reading 10+ files, trying to hold them in context, finding what it was looking for, and then trying to remember why it was looking in the first place.
I also learned that regex/string heuristics don't scale and are painfully slow (hours vs. minutes). I tried the regex-based rules/parsers approach, but it kept failing silently on complex files and suffered constant limitations (the worst offender: re-reading every file for each set of rules). I scrapped that approach and built a "Triple-Entry Fidelity" system. Now the tool acts like a ledger: the parser emits a manifest, the DB emits a receipt, and if they don't match, the system crashes intentionally.
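The ledger idea can be sketched in a few lines. Everything here is illustrative (the `FidelityError` class, the record shapes, the counts); the point is only the mechanism: two independent tallies of the same data, compared at the end, with a hard failure on mismatch instead of silent loss.

```python
class FidelityError(RuntimeError):
    """Raised when the parser's manifest and the store's receipt disagree."""

def parse(source_files):
    # Parser side: emit records plus a manifest (symbol count per file).
    records = [(f, sym) for f in source_files for sym in ("foo", "bar")]
    manifest = {f: 2 for f in source_files}
    return records, manifest

def store(records):
    # Storage side: persist, then report a receipt of what actually landed.
    db = {}
    for f, sym in records:
        db.setdefault(f, []).append(sym)
    receipt = {f: len(syms) for f, syms in db.items()}
    return db, receipt

records, manifest = parse(["a.py", "b.py"])
db, receipt = store(records)
if manifest != receipt:
    # Crash loudly rather than let a dropped record poison later queries.
    raise FidelityError(f"silent data loss: {manifest} != {receipt}")
print("fidelity check passed")
```

A dropped row anywhere in the pipeline makes the receipt fall short of the manifest, so the failure surfaces immediately instead of as a subtly wrong graph later.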
It’s no longer just a scanner; it’s a guardrail. In my daily workflow, I don't let the AI write a line of code until the agent (my choice happens to be CC/Codex) has run a pre-investigation for whatever problem statement I'm facing. This ensures it's anchored in facts rather than assumptions or, worse, hallucinations.
With that said, my tool isn't perfect. To support all of this, I had to build a pseudo-compiler for Python/JS/TS, and that means writing extractors for every framework and every syntax, really everything. Sometimes I don't get it right, and sometimes I simply haven't had enough time to build out support for everything.
So, my recommendation is to integrate the tool WITH your AI agent of choice rather than seeing it as a tool for you, the human. I like to use the tool as a "confirm or deny," where the AI runs the tool, verifies in source code, and presents a pre-implementation audit. Based on that audit, I will create an "aud planning."
Some of the major milestones in v2.0:
* Hybrid Taint: I extended the Oracle Labs IFDS research to track data flow across microservice boundaries (e.g., React fetch → Express middleware → Controller).
* Triple-Entry Fidelity: This works across every layer (Indexer -> Extractor -> Parser -> Storage). Every step has fidelity checks working in unison. If there is silent data loss anywhere in the pipeline, the tool crashes intentionally.
* Graph DB: Moved from file-based parsing to a SQLite Graph Database to handle complex relationships that regex missed.
* Scope: Added support for Rust, Go, Bash, AWS CDK, and Terraform (v1 was Python/JS only).
* Agent Capabilities: Added Planning and Refactor engines, allowing AI agents to not just scan code but safely plan and execute architectural changes.
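The "Hybrid Taint" milestone above amounts to stitching two separate graphs together at the network boundary. Here is a minimal sketch of that stitching step: a frontend fetch() site is matched to the backend route that serves the same method and path, producing an edge a taint engine could follow from a React source into an Express controller. All names, shapes, and fields here are hypothetical, not TheAuditor's internals.

```python
# Hypothetical extracted facts: fetch() call sites from the frontend
# and route registrations from the backend.
frontend_calls = [
    {"file": "Cart.tsx", "method": "POST", "url": "/api/checkout",
     "tainted_args": ["couponCode"]},
]
backend_routes = [
    {"method": "POST", "path": "/api/checkout",
     "handler": "checkoutController", "sinks": ["db.query"]},
]

def cross_boundary_edges(calls, routes):
    """Join frontend calls to backend handlers on (method, path),
    turning two per-service graphs into one cross-service graph."""
    edges = []
    for call in calls:
        for route in routes:
            if (call["method"], call["url"]) == (route["method"], route["path"]):
                edges.append((call["file"], route["handler"]))
    return edges

print(cross_boundary_edges(frontend_calls, backend_routes))
# [('Cart.tsx', 'checkoutController')]
```

With that edge in place, intra-service analysis (IFDS-style on each side) can report a single path: React input → fetch → Express middleware → controller → db.query, instead of two disconnected findings.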
Comments
Comment by jbellis 5 hours ago
Quick comparison: Auditor does framework-specific stuff that Brokk does not, but Brokk is significantly faster (~1M loc per minute).
Comment by ThailandJohn 3 hours ago
My speed really depends on language and what needs indexing. On pure Python projects I get around 220k loc/min, but for deeper data flow in Node apps (TypeScript compiler overhead + framework extraction) it's roughly 50k loc/min.
Curious what your stack is and what depth you're extracting to reach 1M/min - those are seriously impressive numbers! :D
Comment by doganugurlu 5 hours ago
Did you consider using treesitter instead of the pseudo compiler?
Comment by ThailandJohn 4 hours ago
It starts with symbols: you get the basic starter kit, but it quickly becomes "this proves it exists" rather than "this is what it does." That meant taint couldn't work properly, because you want to track assignments, function call arguments, etc., to see how the data actually flows. Same thing with the rules engine: without tracking object literals, XSS detection becomes very shallow with tons of false positives, because tree-sitter can't tell you about property assignments or method calls.
And it felt like it would keep going like that forever, across all the things I wanted to know and track. So, all in all, moving away from tree-sitter and taking on the "mountain" allowed me (after losing weeks of sanity lol) to incrementally build out virtually anything I wanted to extract or check. It does sadly leave some money on the table for other languages. Take Rust as an example: because it still goes through tree-sitter, the taint engine there has no cross-module resolution or type checking. So that's why :)
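The "exists" vs. "does" distinction in this reply can be shown with Python's own stdlib ast module (this is not TheAuditor's extractor, just an illustration). A symbol-level pass only tells you which names appear; a flow-level pass records assignment edges, which is what lets taint follow a value from source to sink.

```python
import ast

code = """
user_input = request.args["q"]
query = "SELECT * FROM t WHERE c = " + user_input
db.execute(query)
"""
tree = ast.parse(code)

# Symbol-level view: which names exist -- "this proves it exists".
names = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}

# Flow-level view: assignment edges -- "what it does". For each simple
# assignment, record which names its right-hand side reads from.
assigns = {}
for node in ast.walk(tree):
    if isinstance(node, ast.Assign) and isinstance(node.targets[0], ast.Name):
        deps = {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
        assigns[node.targets[0].id] = deps

# user_input flows into query, and query reaches db.execute() -- the
# edge a taint engine needs, which a bare symbol table never records.
print(assigns["query"])  # contains 'user_input'
```

The symbol set alone would list user_input, query, and db with no relationship between them; the assignment map is what makes the SQL-injection path visible.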