I built a programming language using Claude Code
Posted by GeneralMaximus 8 hours ago
Comments
Comment by jc-myths 12 minutes ago
One thing I'd add: even with good specs, the agent still cuts corners in ways that are hard to catch. It'll implement a feature but quietly add a fallback that returns mock data when the real path fails. Your app looks like it works. It doesn't. You find out in production.
Or it'll say "done" and what it did was add a placeholder component with a TODO. So now I have trust issues and I review everything, which kind of defeats the "walk away from the computer" part.
The "just one more prompt" loop is so true lol.
Comment by andsoitis 8 hours ago
Impressive. As a practical matter, one wonders what th point would be in creating a new programming languages if the programmer no longer has to write or read code.
Programming languages are after all the interface that a human uses to give instructions to a computer. If you’re not writing or reading it, the language, by definition doesn’t matter.
Comment by marssaxman 7 hours ago
There may actually be more value in creating specialized languages now, not less. Most new languages historically go nowhere because convincing human programmers to spend the time it would take to learn them is difficult, but every AI coding bot will learn your new language as a matter of course after its next update includes the contents of your website.
Comment by raincole 7 hours ago
If there are millions of lines on github in your language.
Otherwise the 'teaching AI to write your language' part will occupy so much context and make it far less efficient that just using typescript.
Comment by Maxatar 4 hours ago
The vast majority of tokens are not used for documentation or reference material but rather are for reasoning/thinking. Unless you somehow design a programming language that is just so drastically different than anything that currently exists, you can safely bet that LLMs will pick them up with relative ease.
Comment by joshstrange 3 hours ago
You can do it today if you are willing to pay (API or on top of your subscription) [0]
> The 1M context window is currently in beta. Features, pricing, and availability may change.
> Extended context is available for:
> API and pay-as-you-go users: full access to 1M context
> Pro, Max, Teams, and Enterprise subscribers: available with extra usage enabled
> Selecting a 1M model does not immediately change billing. Your session uses standard rates until it exceeds 200K tokens of context. Beyond 200K tokens, requests are charged at long-context pricing with dedicated rate limits. For subscribers, tokens beyond 200K are billed as extra usage rather than through the subscription.
[0] https://code.claude.com/docs/en/model-config#extended-contex...
Comment by rebolek 4 hours ago
Comment by calvinmorrison 6 hours ago
Comment by jonfw 4 hours ago
Comment by calvinmorrison 3 hours ago
Comment by vrighter 6 hours ago
Comment by UncleOxidant 7 hours ago
That's assuming that your new, very unknown language gets slurped up in the next training session which seems unlikely. Couldn't you use RAG or have an LLM read the docs for your language?
Comment by clickety_clack 7 hours ago
Comment by almog 6 hours ago
Comment by fcatalan 5 hours ago
Comment by marssaxman 5 hours ago
Comment by danielvaughn 7 hours ago
Comment by Insanity 6 hours ago
There are languages that are already pretty sparse with keywords. e.g in Go you can write 'func main() string', no need to define that it's public, or static etc. So combining a less verbose language with 'codegolfing' the variables might be enough.
Comment by coderenegade 2 hours ago
Comment by danielvaughn 5 hours ago
Comment by gf000 6 hours ago
Comment by giancarlostoro 5 hours ago
Comment by gf000 5 hours ago
Comment by giancarlostoro 5 hours ago
Comment by gf000 4 hours ago
Code readability is another, correlating one, but this is more subjective. To me go scores pretty low here - code flow would be readable were it not for the huge amount of noise you get from error "handling" (it is mostly just syntactic ceremony, often failing to properly handle the error case, and people are desensitized to these blocks so code review are more likely to miss these).
For function signatures, they made it terser - in my subjective opinion - at the expense of readability. There were two very mainstream schools of thought with relation to type signature syntax, `type ident` and `ident : type`. Go opted for a third one that is unfamiliar to both bases, while not even having the benefits of the second syntax (e.g. easy type syntax, subjective but that : helps the eye "pattern match" these expressions).
Comment by giancarlostoro 3 hours ago
Comment by thunky 2 hours ago
Comment by politician 2 hours ago
Comment by Insanity 5 hours ago
Comment by gf000 5 hours ago
In go every third line is a noisy if err check.
Comment by LtWorf 6 hours ago
Comment by nineteen999 4 hours ago
Claude seems more consistently _concise_ to me, both in web and cli versions. But who knows, after 12 months of stuff it could be me who is hallucinating...
Comment by idiotsecant 6 hours ago
Comment by thomasmg 6 hours ago
Comment by quotemstr 6 hours ago
Programming languages function in large parts as inductive biases for humans. They expose certain domain symmetries and guide the programmer towards certain patterns. They do the same for LLMs, but with current AI tech, unless you're standing up your own RL pipeline, you're not going to be able to get it to grok your new language as well as an existing one. Your chances are better asking it to understand a library.
Comment by imiric 6 hours ago
How will it "learn" anything if the only available training data is on a single website?
LLMs struggle with following instructions when their training set is massive. The idea that they will be able to produce working software from just a language spec and a few examples is delusional. It's a fundamental misunderstanding of how these tools work. They don't understand anything. They generate patterns based on probabilities and fine tuning. Without massive amounts of data to skew the output towards a potentially correct result they're not much more useful than a lookup table.
Comment by Zak 6 hours ago
I'm using Claude Code to work on something involving a declarative UI DSL that wraps a very imperative API. Its first pass at adding a new component required imperative management of that component's state. Without that implementation in context, I told Claude the imperative pattern "sucks" and asked for an improvement just to see how far that would get me.
A human developer familiar with the codebase would easily understand the problem and add some basic state management to the DSL's support for that component. I won't pretend Claude understood, but it matched the pattern and generated the result I wanted.
This does suggest to me that a language spec and a handful of samples is enough to get it to produce useful results.
Comment by dmd 5 hours ago
I have done exactly the above with great success. I work with a weird proprietary esolang sometimes that I like, and the only documentation - or code - that exists for it is on my computer. I load that documentation in, and it works just fine and writes pretty decent code in my esolang.
"But that can't possibly work [based on my misunderstanding of how LLMs work]!" you say.
Well, it does, so clearly you misunderstand how they work.
Comment by ModernMech 5 hours ago
Probably if you’re trying to be esoteric and arcane then yeah, you might have trouble, but that’s not normally how languages evolve.
Comment by dmd 5 hours ago
Comment by wizzwizz4 4 hours ago
Comment by dmd 4 hours ago
Comment by imiric 4 hours ago
The impact that lack of training data has on the quality of the results is easily observable. Try getting them to maintain a Python codebase vs. e.g. an Elixir one. Not just generate short snippets of code, but actually assist in maintaining it. You'll constantly run into basic issues like invalid syntax, missing references, use of nonexistent APIs, etc., not to mention more functional problems like dead, useless, or unnecessarily complicated code. I run into these things with mainstream languages (Go, Python, Clojure), so I don't see how an esolang could possibly fair any better.
But then again, the definitions of "just fine" and "decent" are subjective, and these tools are inherently unreliable, which is where I suspect the large disconnect in our experiences comes from.
Comment by voxleone 6 hours ago
Roughly: machine code --> assembly --> C --> high-level languages --> frameworks --> visual tools --> LLM-assisted coding. Most of those transitions were controversial at the time, but in retrospect they mostly expanded the toolbox rather than replacing the lower layers.
One workflow I’ve found useful with LLMs is to treat them more like a code generator after the design phase. I first define the constraints, objects, actors, and flows of the system, then use structured prompts to generate or refine pieces of the implementation.
Comment by abraxas 6 hours ago
I'm being slightly facetious of course, I still use sequence diagrams and find them useful. The rest of its legacy though, not so much.
Comment by spelunker 7 hours ago
Comment by Fnoord 1 hour ago
Comment by idiotsecant 6 hours ago
Comment by tartoran 5 hours ago
Comment by idiotsecant 3 hours ago
Comment by _aavaa_ 7 hours ago
Comment by phn 7 hours ago
On a different but related note, it's almost the same as pairing django or rails with an LLM. The framework allows you to trust that things like authentication and a passable code organization are being correctly handled.
Comment by jetbalsa 7 hours ago
Comment by onlyrealcuzzo 7 hours ago
I'm working on a language as well (hoping to debut by end of month), but the premise of the language is that it's designed like so:
1) It maximizes local reasoning and minimizes global complexity
2) It makes the vast majority of bugs / illegal states impossible to represent
3) It makes writing correct, concurrent code as maximally expressive as possible (where LLMs excel)
4) It maximizes optionality for performance increases (it's always just flipping option switches - mostly at the class and function input level, occassionaly at the instruction level)
The idea is that it should be as easy as possible for an LLM to write it (especially convert other languages to), and as easy as possible for you to understand it, while being almost as fast as absolutely perfect C code, and by virtue of the design of the language - at the human review phase you have minimal concerns of hidden gotcha bugs.
Comment by idiotsecant 6 hours ago
Comment by onlyrealcuzzo 5 hours ago
My language is a step ahead of Rust, but not as strict as Ada, while being easier to read than Swift (especially where concurrency is involved).
Comment by gf000 6 hours ago
Comment by johnfn 8 hours ago
By what definition? It still matters if I write my app in Rust vs say Python because the Rust version still have better performance characteristics.
Comment by koolala 7 hours ago
Comment by johnbender 8 hours ago
Comment by andyfilms1 7 hours ago
Comment by entropie 7 hours ago
Comment by eatsyourtacos 5 hours ago
So yeah for some things we are already at the point of "I am not longer the coder, I am the architect".. and it's scary.
Comment by nineteen999 4 hours ago
Comment by gopalv 5 hours ago
That is the part of the post that stuck with me, because I've also picked up impossible challenges and tried to get Claude to dig me out of a mess without giving up from very vague instructions[1].
The effect feels like the Loss-Disguised-As-Win feeling of the video-games I used to work on at Zynga.
Sure it made a mistake, but it is right there, you could go again.
Pull the lever, doesn't matter if the kids have Karate at 8 AM.
Comment by asciimov 5 hours ago
It’s missing all the heart, the soul, of deciding and trading off options to get something to work just for you. It’s like you bought a rat bike from your local junkyard and are trying to pass it off as your own handmade cafe racer.
Comment by fcatalan 3 hours ago
Also you decide how much in control you are. Want to provide a hand made grammar? go ahead, want the agent to come up with it just from chatting and pointing it to other languages, ok too. Want to program just the first arithmetic operator yourself and then save the tedium of typing all the others so you can go to the next step? fine...
So you can have a huge toy language in mere days and experiment with stuff you'd have to build for months by hand to be able to play with.
Comment by NuclearPM 4 hours ago
Mine is an Io and Rebol inspired language that uses SQlite and Luajit as a runtime.
1.to 10 .map[n | n * n].each[n | n.say!]
Comment by bobjordan 6 hours ago
That said, the core value of the software wouldn't exist without a human at the helm. It requires someone to expend the energy to guide it, explore the problem space, and weave hundreds of micro-plans into a coherent, usable system. It's a symbiotic relationship, but the ownership is clear. It’s like building a house: I could build one with a butter knife given enough time, but I'd rather use power tools. The tools don't own the house.
At this point, LLMs aren't going to autonomously architect a 400+ table schema, network 100+ services together, and build the UI/UX/CLI to interface with it all. Maybe we'll get there one day, but right now, building software at this scale still requires us to drive. I believe the author owns the language.
Comment by wcarss 5 hours ago
Going into the vault!
Comment by heavyset_go 4 hours ago
Not according to the US Copyright Office. It is 100% LLM output, so it is not copyrighted, thus it's free for anyone to do anything with it and no claimed ownership or license can stop them.
Comment by wild_egg 4 hours ago
Comment by heavyset_go 4 hours ago
It's possible to use AI output in human created content, and it can be copyrightable, and substantiative, transformative human-creative alteration of AI output is also copyrightable.
100% machine generated code is not copyrightable.
[1] https://newsroom.loc.gov/news/copyright-office-releases-part...
Comment by wild_egg 2 hours ago
Comment by heavyset_go 1 hour ago
Comment by kccqzy 3 hours ago
Comment by wild_egg 2 hours ago
This seems the opposite of the cut and dry "cannot be copyrighted" stance I was replying to.
Comment by kccqzy 30 minutes ago
> As the Office described in its March guidance, “when an AI technology receives solely a prompt from a human and produces complex written, visual, or musical works in response, the ‘traditional elements of authorship’ are determined and executed by the technology—not the human user.”
Comment by anonnon 3 hours ago
I have yet to see a study showing something like a 2x or better boost in programmer productivity through LLMs. Usually it's something like 10-30%, depending on what metrics you use (which I don't doubt). Maybe it's 50% with frontier models, but seeing these comments on HN where people act like they're 10x more productive with these tools is strange.
Comment by thunky 1 hour ago
I guess you're just not going to believe what anyone says.
Comment by anonnon 1 hour ago
How? They claimed LLMs somehow enabled them to write more code in the span of 3.5 years (assuming they started with ChatGPT's introduction) than they would be able to write in the span of decades. No studies have shown this. But at least one study did show that LLM devs overestimate how productive these systems make them.
Comment by pluc 7 hours ago
Comment by kreek 2 hours ago
In all seriousness, this is great, and why not? As the post said, what once took months now takes weeks. You can experiment and see what works. For me, I started off building a web/API framework with certain correctness built in, and kept hitting the same wall: the guarantees I wanted (structured error handling, API contracts, making invalid states unrepresentable) really belonged at the language level, not bolted onto a framework. A few Claude Code sessions later, I had a spec, then a tree-sitter implementation, then a VM/JIT... something that, given my sandwich-generation-ness, I never would have done a few months ago.
Comment by bfivyvysj 1 hour ago
Comment by emh68 2 hours ago
Comment by ramon156 8 hours ago
That said, it's a lot of words to say not a lot of things. Still a cool post, though!
Comment by ivanjermakov 7 hours ago
I believe we're at a point where it's not possible to accurately decide whether text is completely written by human, by computer, or something in between.
Comment by wavemode 7 hours ago
If this blog post is unedited LLM output, the blog owner needs to sell whatever model, setup and/or prompt he used for a million dollars, since it's clearly far beyond the state-of-the-art in terms of natural-sounding tone.
Comment by craigmart 6 hours ago
Comment by exitb 6 hours ago
I’ve never seen LLM being able to produce these kind of absurdist jokes. Or any jokes, really.
Comment by craigmart 4 hours ago
Comment by wavemode 2 hours ago
By all means, go read the post and then try to do so.
Comment by Bnjoroge 6 hours ago
Comment by aleksiy123 6 hours ago
I've been trying a new approach I call CLI first. I realized CLI tools are designed to be used both by humans (command line) and machines (scripting), and are perfect for llms as they are text only interface.
Essentially instead of trying to get llm to generate a fully functioning UI app. You focus on building a local CLI tool first.
CLI tool is cheaper, simpler, but still has a real human UX that pure APIs don't.
You can get the llm to actually walk through the flows, and journeys like a real user end to end, and it will actually see the awkwardness or gaps in design.
Your commands structure will very roughly map to your resources or pages.
Once you are satisfied with the capability of the cli tool. (Which may actually be enough, or just local ui)
You can get it to build the remote storage, then the apis, finally the frontend.
All the while you can still tell it to use the cli to test through the flows and journeys, against real tasks that you have, and iterate on it.
I did recently for pulling some of my personal financial data and reporting it. And now I'm doing this for another TTS automation I've wanted for a while.
Comment by tines 8 hours ago
Comment by ajay-b 7 hours ago
Comment by g3f32r 7 hours ago
Black Mirror did it first https://en.wikipedia.org/wiki/Hang_the_DJ
Comment by theblazehen 7 hours ago
Comment by monster_truck 2 hours ago
Comment by jetbalsa 7 hours ago
Comment by knicholes 6 hours ago
Comment by monster_truck 2 hours ago
Comment by Bnjoroge 6 hours ago
Comment by jaggederest 7 hours ago
Comment by marginalia_nu 7 hours ago
Comment by Copyrightest 2 hours ago
Comment by soperj 7 hours ago
Comment by laweijfmvo 7 hours ago
Comment by matthews3 7 hours ago
Comment by monster_truck 2 hours ago
It has not had any issues at all writing objc3 code
Comment by Copyrightest 2 hours ago
Comment by ractive 3 hours ago
I really liked that part - the house always wins.
Comment by dybber 5 hours ago
However, I fear that agents will always work better on programming languages they have been heavily trained on, so for an agent-based development inventing a new domain specific language (e.g. for use internally in a company) might not be as efficient as using a generic programming language that models are already trained on and then just live with the extra boilerplate necessary.
Comment by singularity2001 2 hours ago
Comment by p0w3n3d 6 hours ago
Comment by NuclearPM 4 hours ago
Comment by randallsquared 5 hours ago
I haven't read any farther than this, yet, but this made me stutter in my reading. Isn't a comparison just a function that takes two arguments and returns a third? How is that different from "+"?
Comment by amelius 8 hours ago
Comment by geon 7 hours ago
Comment by amelius 7 hours ago
Comment by beepbooptheory 5 hours ago
Comment by amelius 2 hours ago
Comment by scottmf 7 hours ago
Comment by koolala 7 hours ago
Comment by jackby03 6 hours ago
Comment by shadeslayer 5 hours ago
Congratulations on getting to the front page ;)
Comment by jcranmer 7 hours ago
fn read_float_literal(&mut self) -> &'a str {
let start = self.pos;
while let Some(ch) = self.peek_char() {
if ch.is_ascii_alphanumeric() || ch == '.' || ch == '+' || ch == '-' {
self.advance_char();
} else {
break;
}
}
&self.source[start..self.pos]
}
Admittedly, I do have a very idiosyncratic definition of floating-point literal for my language (I have a variety of syntaxes for NaNs with payloads), but... that is not a usable definition of float literal.At the end of the day, I threw out all of the code the AI generated and wrote it myself, because the AI struggled to produce code that was functional to spec, much less code that would allow me to easily extend it to other kinds of future operators that I knew I would need in the future.
Comment by dboreham 6 hours ago
Comment by jcranmer 5 hours ago
Comment by righthand 8 hours ago
This is such an interesting statement to me in the context of leftpad.
Comment by rpowers 7 hours ago
Comment by righthand 6 hours ago
Comment by nefarious_ends 7 hours ago
Comment by craigmcnamara 7 hours ago
Comment by nz 7 hours ago
This latest fever for LLMs simply confirms that people would rather do _anything_ other than program in a (not necessarily purely) functional language that has meta-programming facilities. I personally blame functional fixedness (psychological concept). In my experience, when someone learns to program in a particular paradigm or language, they are rarely able or willing to migrate to a different one (I know many people who refused to code in anything that did not look and feel like Java, until forced to by their growling bellies). The AI/LLM companies are basically (and perhaps unintentionally) treating that mental inertia as a business opportunity (which, in one way or another, it was for many decades and still is -- and will probably continue to be well into a post-AGI future).
Comment by zahirbmirza 6 hours ago
Comment by ractive 3 hours ago
Comment by esafak 2 hours ago
Comment by grumpyprole 6 hours ago
Comment by dwedge 6 hours ago
I mean, they may be right but there is also a big opportunity for this being Gell-Mann amnesia : "The phenomenon of a person trusting newspapers for topics which that person is not knowledgeable about, despite recognizing the newspaper as being extremely inaccurate on certain topics which that person is knowledgeable about."
Comment by mrsmrtss 6 hours ago
Comment by shevy-java 6 hours ago
Step #2 is: get real people to use it!
Comment by mriet 7 hours ago
Who the hell is going to use it then? You certainly won't, because you're dependent on AI.
Comment by logicprog 7 hours ago
Comment by Bnjoroge 6 hours ago
Comment by dankwizard 2 hours ago
Comment by koolala 7 hours ago
Comment by atoav 6 hours ago
Comment by kerkeslager 7 hours ago
The "more on that later" was unit tests (also generated by Claude Code) and sample inputs and outputs (which is basically just unit tests by a different name).
This is... horrifically bad. It's stupidly easy to make unit tests pass with broken code, and even more stupidly easy when the test is also broken.
These "guardrails" are made of silly putty.
EDIT: Would downvoters care to share an explanation? Preferably one they thought of?
Comment by octoclaw 7 hours ago
Comment by aplomb1026 7 hours ago
Comment by dehkopolis 7 hours ago
Comment by sabinbir 1 hour ago
Comment by iberator 6 hours ago
Comment by cptroot 3 hours ago
While I agree "AI is bad", well-written posts like this one can provide real insight into the process of using them, and reveal more about _why_ AI is bad.