Show HN: TheAuditor v2.0 – A “Flight Computer” for AI Coding Agents
Posted by ThailandJohn 18 hours ago
I’m a former Systems Architect (Cisco/VMware) turned builder in Thailand. TheAuditor v2.0 is a complete architectural rewrite (800+ commits) of the prototype I posted three months ago.
The "A-ha" moment for me didn't come from a success; it came from a massive failure. I was trying to use AI to execute a complex schema change (a foundational shift from "Products" to "ProductsVariants"), and given the scope, it failed spectacularly. I realized two things:
* Context Collapse: The AI couldn't keep enough files in its context window to understand the full scope of the refactor, so it started hallucinating and "fixing" superficial issues. If I kept pressing it, it would literally panic and invent problems "so it could fix them," which sent the session into a death spiral. That's the villain origin story of this tool. :D
* Stale Knowledge: It kept trying to implement Node 16 patterns in a Node 22 project, or defaulting to obsolete libraries (glob v7 instead of v11), because its training data was stale.
I realized that AI agents are phenomenal at producing code that runs, but they have zero understanding of it. They optimize for "making it run at any cost," often introducing security holes or technical debt just to bypass an error. The funny paradox is that when cornered into using cutting-edge versions, syntax, and best practices, they have zero trouble executing. But they are so hilariously unaware of their surroundings that they will do anything else unless explicitly babysat.
I built v2 to be the "Sanity Check" that solves many of these issues, and it aims to keep solving more of the same kind. Instead of letting the AI guess, TheAuditor indexes the entire codebase into a local SQLite graph database. This gives the AI a queryable map of reality, letting it verify dependencies and imports without needing to load every file into context.
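To make the "queryable map" idea concrete, here is a toy sketch of the pattern: a symbol/import graph stored in SQLite, walked with a recursive CTE instead of loading files into an AI's context. The schema and table names here are invented for illustration and are not TheAuditor's actual layout.

```python
import sqlite3

# Hypothetical, simplified schema -- TheAuditor's real tables differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (path TEXT PRIMARY KEY);
CREATE TABLE imports (importer TEXT, imported TEXT);
""")
conn.executemany("INSERT INTO files VALUES (?)",
                 [("app.py",), ("models.py",), ("db.py",)])
conn.executemany("INSERT INTO imports VALUES (?, ?)",
                 [("app.py", "models.py"), ("models.py", "db.py")])

# "Who is affected if db.py changes?" -- walk the import graph with a
# recursive CTE instead of grepping or reading every file.
rows = conn.execute("""
WITH RECURSIVE affected(path) AS (
    SELECT importer FROM imports WHERE imported = 'db.py'
    UNION
    SELECT i.importer FROM imports i JOIN affected a ON i.imported = a.path
)
SELECT path FROM affected
""").fetchall()
print(sorted(r[0] for r in rows))  # ['app.py', 'models.py']
```

The point is that a question like "what transitively depends on this file" becomes one cheap query rather than an N-file context-window problem.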
A/B Demo: https://www.youtube.com/watch?v=512uqMaZlTg As seen in the demo video, instead of reading 10+ full files and/or grepping to compensate for hallucinations, the agent can now run "aud explain" and get 500 lines of deterministic, facts-only information. It gets just what it needs, versus reading 10+ files, trying to hold them in context, finding what it was looking for, and then trying to remember why it was looking in the first place.
I also learned that regex/string heuristics don't scale and are painfully slow (hours vs. minutes). I tried the regex-based rules/parsers approach, but it kept failing silently on complex files and suffered constant limitations (the worst offender: re-reading every file for each set of rules). I scrapped that approach and built a "Triple-Entry Fidelity" system. Now the tool acts like a ledger: the parser emits a manifest, the DB emits a receipt, and if they don't match, the system crashes intentionally.
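The ledger idea can be sketched in a few lines. Everything here is illustrative (the `FidelityError` class, the record shapes, the counts); the point is only the mechanism: two independent tallies of the same data, compared at the end, with a hard failure on mismatch instead of silent loss.

```python
class FidelityError(RuntimeError):
    """Raised when the parser's manifest and the store's receipt disagree."""

def parse(source_files):
    # Parser side: emit records plus a manifest (symbol count per file).
    records = [(f, sym) for f in source_files for sym in ("foo", "bar")]
    manifest = {f: 2 for f in source_files}
    return records, manifest

def store(records):
    # Storage side: persist, then report a receipt of what actually landed.
    db = {}
    for f, sym in records:
        db.setdefault(f, []).append(sym)
    receipt = {f: len(syms) for f, syms in db.items()}
    return db, receipt

records, manifest = parse(["a.py", "b.py"])
db, receipt = store(records)
if manifest != receipt:
    # Crash loudly rather than let a dropped record poison later queries.
    raise FidelityError(f"silent data loss: {manifest} != {receipt}")
print("fidelity check passed")
```

A dropped row anywhere in the pipeline makes the receipt fall short of the manifest, so the failure surfaces immediately instead of as a subtly wrong graph later.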
It’s no longer just a scanner; it’s a guardrail. In my daily workflow, I don't let the AI write a line of code until the agent (my choice happens to be CC/Codex) has run a pre-investigation for whatever problem statement I'm facing. This ensures it's anchored in facts rather than assumptions or, worse, hallucinations.
With that said, my tool isn't perfect. To support all of this, I had to build a pseudo-compiler for Python/JS/TS, and that means writing extractors for every framework and every syntax, really everything. Sometimes I don't get it right, and sometimes I simply haven't had enough time to build out support for everything.
So, my recommendation is to integrate the tool WITH your AI agent of choice rather than seeing it as a tool for you, the human. I like to use the tool as a "confirm or deny," where the AI runs the tool, verifies in source code, and presents a pre-implementation audit. Based on that audit, I will create an "aud planning."
Some of the major milestones in v2.0:
* Hybrid Taint: I extended the Oracle Labs IFDS research to track data flow across microservice boundaries (e.g., React fetch → Express middleware → Controller).
* Triple-Entry Fidelity: This works across every layer (Indexer -> Extractor -> Parser -> Storage). Every step has fidelity checks working in unison. If there is silent data loss anywhere in the pipeline, the tool crashes intentionally.
* Graph DB: Moved from file-based parsing to a SQLite Graph Database to handle complex relationships that regex missed.
* Scope: Added support for Rust, Go, Bash, AWS CDK, and Terraform (v1 was Python/JS only).
* Agent Capabilities: Added Planning and Refactor engines, allowing AI agents to not just scan code but safely plan and execute architectural changes.
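The "Hybrid Taint" milestone above amounts to stitching two separate graphs together at the network boundary. Here is a minimal sketch of that stitching step: a frontend fetch() site is matched to the backend route that serves the same method and path, producing an edge a taint engine could follow from a React source into an Express controller. All names, shapes, and fields here are hypothetical, not TheAuditor's internals.

```python
# Hypothetical extracted facts: fetch() call sites from the frontend
# and route registrations from the backend.
frontend_calls = [
    {"file": "Cart.tsx", "method": "POST", "url": "/api/checkout",
     "tainted_args": ["couponCode"]},
]
backend_routes = [
    {"method": "POST", "path": "/api/checkout",
     "handler": "checkoutController", "sinks": ["db.query"]},
]

def cross_boundary_edges(calls, routes):
    """Join frontend calls to backend handlers on (method, path),
    turning two per-service graphs into one cross-service graph."""
    edges = []
    for call in calls:
        for route in routes:
            if (call["method"], call["url"]) == (route["method"], route["path"]):
                edges.append((call["file"], route["handler"]))
    return edges

print(cross_boundary_edges(frontend_calls, backend_routes))
# [('Cart.tsx', 'checkoutController')]
```

With that edge in place, intra-service analysis (IFDS-style on each side) can report a single path: React input → fetch → Express middleware → controller → db.query, instead of two disconnected findings.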
Comments
Comment by jbellis 5 hours ago
Quick comparison: Auditor does framework-specific stuff that Brokk does not, but Brokk is significantly faster (~1M loc per minute).
Comment by ThailandJohn 3 hours ago
My speed really depends on language and what needs indexing. On pure Python projects I get around 220k loc/min, but for deeper data flow in Node apps (TypeScript compiler overhead + framework extraction) it's roughly 50k loc/min.
Curious what your stack is and what depth you're extracting to reach 1M/min - those are seriously impressive numbers! :D
Comment by doganugurlu 5 hours ago
Did you consider using treesitter instead of the pseudo compiler?
Comment by ThailandJohn 4 hours ago
It starts with symbols: you get the basic starter kit, but it quickly becomes "this proves it exists" rather than "this is what it does." That meant taint couldn't work properly, because you want to track assignments, function call arguments, etc., to see how the data actually flows. Same thing with the rules engine: without tracking object literals, XSS detection becomes very shallow with tons of false positives, because tree-sitter can't tell you about property assignments or method calls.
And it felt like it would keep going like that forever, across all the things I wanted to know and track. So, all in all, moving away from tree-sitter and taking on the "mountain" allowed me (after losing weeks of sanity lol) to incrementally build out virtually anything I wanted to extract or check. It does sadly leave some money on the table for other languages. Take Rust as an example: because it still goes through tree-sitter, the taint engine there has no cross-module resolution or type checking. So that's why :)
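The "exists" vs. "does" distinction in this reply can be shown with Python's own stdlib ast module (this is not TheAuditor's extractor, just an illustration). A symbol-level pass only tells you which names appear; a flow-level pass records assignment edges, which is what lets taint follow a value from source to sink.

```python
import ast

code = """
user_input = request.args["q"]
query = "SELECT * FROM t WHERE c = " + user_input
db.execute(query)
"""
tree = ast.parse(code)

# Symbol-level view: which names exist -- "this proves it exists".
names = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}

# Flow-level view: assignment edges -- "what it does". For each simple
# assignment, record which names its right-hand side reads from.
assigns = {}
for node in ast.walk(tree):
    if isinstance(node, ast.Assign) and isinstance(node.targets[0], ast.Name):
        deps = {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
        assigns[node.targets[0].id] = deps

# user_input flows into query, and query reaches db.execute() -- the
# edge a taint engine needs, which a bare symbol table never records.
print(assigns["query"])  # contains 'user_input'
```

The symbol set alone would list user_input, query, and db with no relationship between them; the assignment map is what makes the SQL-injection path visible.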