Is legal the same as legitimate: AI reimplementation and the erosion of copyleft
Posted by dahlia 1 day ago
Comments
Comment by jrochkind1 1 day ago
Our foreparents fought for the right to implement works-a-like to corporate software packages, even if the so-called owners did not like it. We're ready to throw it all away, and let intellectual property owners get so much more control.
The implications will not end up being anti-large-corporation or pro-sharing. If you can prevent someone from re-implementing a spec or building a client that speaks your API or building a work-a-like, it will be the large corporations that exercise this power as usual.
Comment by dogcomplex 1 day ago
Nor should we be treating AI models themselves as respected IP. They're built on everyone else's data. Throw away this whole class of law, it's irrelevant in this new world.
Comment by marcus_holmes 1 day ago
Comment by vbarrielle 19 hours ago
Comment by conartist6 14 hours ago
Comment by xyzal 18 hours ago
It would be interesting to see a court ruling that the output of LLMs trained on copyleft code is licensed under the GPL ... and all other viral licenses simultaneously
Comment by Frieren 17 hours ago
It is quantum legality: using copyrighted input is legal or illegal depending on the observer.
Comment by N7lo4nl34akaoSN 14 hours ago
Comment by Saline9515 17 hours ago
Comment by taneq 16 hours ago
Comment by cj 16 hours ago
No one knows.
Comment by dec0dedab0de 13 hours ago
It would take two stubborn businesses with a lot of money deciding that it is better to battle it out than focus on their business. Something like IBM v SCO or Oracle v Google.
Comment by direwolf20 16 hours ago
Comment by red_admiral 18 hours ago
Comment by marcus_holmes 15 minutes ago
If the LLM reproduces a human's copyrighted work, then that copyright still stands. This is, in effect, the same as photocopying someone else's writing. The LLM was trained on the copyrighted work, is incapable of producing new copyrightable work, so if it duplicates the original work then the original author's copyright still stands.
I am not a lawyer
Comment by raggi 16 hours ago
Comment by grensley 1 day ago
Comment by lurk2 1 day ago
Comment by AnthonyMouse 22 hours ago
Comment by marcus_holmes 4 hours ago
The courts have repeatedly said that copyright only applies to human creativity. The Supreme Court explicitly said this when they refused to hear the appeal:
https://en.wikisource.org/wiki/Thaler_v._Perlmutter,_Refusal...
> "We affirm our decision to refuse registration for the Work because it lacks the human authorship necessary to be eligible for copyright protection."
So they're saying that the LLM cannot be the author, because LLMs cannot claim copyright.
The related case about patents is more supportive of the narrative that AIs cannot be authors (see https://www.cafc.uscourts.gov/opinions-orders/21-2347.OPINIO...), specifically: "Here, there is no ambiguity: the Patent Act requires that inventors must be natural persons; that is, human beings."
The patent situation is that the Act says that an inventor must be an individual, which the courts are interpreting to mean a human, so the LLM cannot be named as the inventor. So, in this case, yes, this is just saying that an LLM cannot be named as the inventor of a patent. That's not the same thing as what the courts are saying with copyrights.
Comment by bdowling 16 hours ago
Comment by dataflow 1 day ago
Comment by greyface- 18 hours ago
Comment by terminalshort 23 hours ago
Comment by dataflow 22 hours ago
So now consider two questions:
1. You actually didn't use an LLM, but they believe & claim you did. Who has the burden of proof to show that you actually own the copyright, and how do they do so?
2. They write new code that you feel is based on yours. They claim they washed it through an LLM, but you don't believe so. Who has the burden of proof here and how do they do so?
Comment by marcus_holmes 19 minutes ago
My take on the answers (I am not a lawyer):
1. You copy their code. They bring a copyright claim (let's assume this isn't a DMCA thing and they're actually bringing a claim to court). Your defence is "the LLM wrote it so no copyright attaches". Since they're asserting their copyright claim, they would have to provide evidence for that claim (same as in any other copyright case), including providing evidence that a human wrote it (which is new, and required to defeat your defence).
2. They copy your code. You bring a copyright case. Their defence is "I used an LLM to wash the code without copying". Since they're not disputing your copyright claim to the original code, you don't have to defend or prove your copyright. But you do have to prove that their code infringes on your copyright, which would mean proving that the LLM copied your code when creating the new code. This has been done before by demonstrating similarity.
Comment by marcus_holmes 22 hours ago
Comment by LtWorf 22 hours ago
Comment by Dylan16807 22 hours ago
The occasional piece of software might be a trade secret, but a person downloading a preexisting leak isn't affected by those laws.
Comment by dataflow 21 hours ago
I think 18 U.S.C. § 1832 (a) (3) might answer your question? https://www.law.cornell.edu/uscode/text/18/1832
Comment by marcus_holmes 26 minutes ago
Closed-source code is not automatically a trade secret.
Comment by wk_end 12 hours ago
Comment by raxxorraxor 18 hours ago
Comment by giancarlostoro 15 hours ago
For movies and shows, charge an increasing fee to renew the copyright. Eventually studios will give up certain movies. The older the movie the more you pay.
Comment by dec0dedab0de 13 hours ago
I personally think we should have shorter limits for non-creator owners of copyright, and for creators it should be like 20 years or death whichever comes last. I also think compulsory licensing should exist for everything.
Comment by teaearlgraycold 1 day ago
Well we could try fixing the forever part. Copyright is out of control. I’d like to see a world with much less power given to IP. Sometimes I even say I want it eradicated entirely. But realistically we should start by cutting things back. Maybe give software an especially short copyright period.
Comment by fc417fc802 1 day ago
There are always going to be downsides and edge cases when granting any party a monopoly over anything. At least if it's limited to 2 decades, any unintended consequences, philosophical objections, etc. are hopefully kept within reason.
Comment by Qwertious 21 hours ago
Meanwhile, there are cases where copyright of more than 2 years is overkill.
I don't know what, but it seems like some sort of mechanism for variable-length IP duration is needed.
Comment by fc417fc802 20 hours ago
I could understand for medical devices maybe but even then it seems like the software is a tiny part of the overall cost of a given design. A competitor could already do a clean room reimplementation in that case.
But I guess it wouldn't be all that bad if there were a carefully crafted extension for government certified software that was explicitly tied to the length of the certification process.
Comment by pixl97 14 hours ago
Comment by PowerElectronix 13 hours ago
If you do something that requires stealing the code (publishing it, selling it, etc) the company can legally fuck you up.
Now, once it's in the wind, it becomes almost impossible to pursue from a practical point of view, as any implementer can claim trade secrets to avoid showing you the code.
Comment by ncruces 18 hours ago
Comment by rurban 11 hours ago
Comment by Xirdus 16 hours ago
Comment by fc417fc802 16 hours ago
Consider if you will that if some guy were to fly a drone the size of a car that he knocked together in his garage over a residential area people would not accept that. Yet private pilots in cessnas fly over neighborhoods constantly.
Comment by LtWorf 22 hours ago
Comment by jongjong 21 hours ago
If we remove IP laws, we should remove all private property laws!
Comment by tosapple 20 hours ago
Comment by thayne 1 day ago
Comment by direwolf20 16 hours ago
Comment by devonkelley 1 day ago
Comment by dTal 17 hours ago
What is the difference between an "agent" and a "compiler"?
For that matter, what is the difference between "I got an agent to provide a high level description" and a decompiler?
What is the difference between ["decompiling" a binary, editing the resulting source, recompiling, and redistributing] and [analyzing the behavior of a binary, feeding that description into an LLM, generating source code that replicates that behavior, editing that, recompiling and redistributing]?
Takeaway: we are now in a world where software tools can climb up and down the abstraction stack willy nilly and independently of human effort. Legal tools that attempt to track the "provenance" of "source code" were already shaky but are now crumbling entirely.
Comment by AnthonyMouse 21 hours ago
Comment by Ferret7446 3 hours ago
Hypothetically, I think this trail of suggestion of treating specs as intellectual property would simply destroy copyright for software, which is what we (the people who believe in FOSS) want. There is already case law protecting specs (e.g. Java)
Comment by pabs3 3 hours ago
Comment by RobRivera 1 day ago
Comment by halJordan 1 day ago
Comment by dimitrios1 1 day ago
(side bar: the phrase "anti-<whatever> luddites" is way, way overused, especially here. Let's get more creative, people!)
Comment by fc417fc802 1 day ago
There are also some environmentalist concerns, which the term luddite again fits perfectly. You just have to generalize, transferring laterally from economic wellbeing to environmental wellbeing.
So I don't think GP qualified as an ad hominem dismissal but rather an accurate description of the situation. Take what's being discussed (restrictions on specifications and interoperability), project it backwards in history, and imagine what an alternate present day would look like. I think it would be pretty bad.
Comment by pyrale 16 hours ago
Who doesn't enjoy interesting times™
Comment by Qwertious 21 hours ago
Pffft no. Most of us think that AI is being used as a political trick - like firing unionized workers "to replace them with AI" and then hiring new un-unionized workers to replace them, 2 weeks later. Replace the AI with an empty cardboard box labeled "AI" in black marker, and nothing changes.
See also: using AI to launder pirated material, for big businesses.
Comment by pixl97 14 hours ago
1. Since when have companies needed trillions of dollars of AI to do that? In the US they've been able to get away with getting rid of unions for decades now.
2. Since when has HN given a shit about unions? Posting about unions, at least till recently, has been a great way of getting your comment downvoted to [dead] in one easy step. For longer than LLMs have existed, the HN answer to unions was "They are just there to keep me as an SWE from making as much money as I can". Only now do we see a little bit of pushback, now that their heads may be next on the chopping block.
Comment by aaron695 1 day ago
Comment by MrDarcy 1 day ago
There isn’t much of a middle ground anymore.
Comment by glhaynes 11 hours ago
Comment by noemit 18 hours ago
Comment by zelphirkalt 19 hours ago
I am for keeping the licenses in place, as long as there is any copyright at all on software. If we get rid of that, then we can get rid of copyleft licenses and all others too. But of course businesses and greedy people want to have their cake and eat it too. They want copyleft to disappear, but _their_ software, oh no, no one may copy that! Double standards at their best.
Comment by rcxdude 18 hours ago
(the paradox of copyleft is that it does tend to push free software advocates in a direction of copyright maximalism)
Comment by thayne 1 day ago
Although I think the chance of that happening is effectively zero.
Comment by red_admiral 18 hours ago
Comment by taint69 10 hours ago
But in the lonely mind of a man.
Comment by az226 15 hours ago
Comment by mikkupikku 15 hours ago
Comment by alterom 1 day ago
Our "foreparents" weren't competing with corporations with unlimited access to generative AI trained on their work. The times, they're-a-changin'.
You're rehashing the argument made in one of the articles which this piece criticizes and directly addresses, while ignoring the entirety of what was written before the conclusion that you quoted.
If anyone finds themselves agreeing with the comment I'm responding to, please, do yourself a favor and read the linked article.
I would do no justice to it by reiterating its points here.
Comment by hathawsh 1 day ago
It seems like the answer is to adjust IP owner rights very carefully, if that's possible. It sounds very hard, though.
Comment by alterom 1 day ago
The point the author was making was that the intent of GPL is to shift the balance of power from wealthy corporations to the commons, and that the spirit is to make contributing to the commons an activity where you feel safe in knowing that your contributions won't be exploited.
The corporations today have the resources to purchase AI compute to produce AI-laundered work, which wouldn't be possible without the commons the AI got its training data from, and give nothing back to the commons.
This state of things disincentivizes contributing to the FOSS ecosystem, as your work will be taken advantage of while the commons gets nothing.
Share-alike clause of the GPL was the price that was set for benefitting from the commons.
Using LLMs trained on GPL code to "reimplement" it creates a legal (but not a moral!) workaround to circumvent GPL and avoid paying the price for participation.
This means that the current iteration of GPL isn't doing its intended job.
GPL had to grow and evolve. The Internet services using GPL code to provide access to software without, technically, distributing it was a similar legal (but not moral) workaround which was addressed with an update in GPL.
The author argues that we have reached another such point. They don't argue what exactly needs to be updated, or how.
They bring up a suggestion to make copyrightable the input to the LLM which is sufficient to create a piece of software, because in the current legal landscape, creating the prompt is deemed equivalent to creating the output.
You can't have your cake and eat it too.
A vibe-coded API implementation created by an LLM trained on open source, GPL licensed code can only be considered one of two things:
— Derivative work, and therefore, subject to the requirement to be shared under the GPL license (something the legal system disagrees with)
— An original work of the person who entered the prompt into the LLM, which is a transformative fair use of the training set (the current position of the legal system).
In the latter case, the input to the LLM (which must include a reference to the API) is effectively deemed to be equivalent to the output.
The vibe-coded app, the reasoning goes, isn't a photocopy of the training data, but a rendition of the prompt (even though the transformativeness came entirely from the machine and not the "author").
Personally, I don't see a difference between making a photocopy by scanning and printing, and by "reimplementing" API by vibe coding. A photocopy looks different under a microscope too, and is clearly distinguishable from the original. It can be made better by turning the contrast up, and by shuffling the colors around. It can be printed on glossy paper.
But the courts see it differently.
Consequently, the legal system currently decided that writing the prompt is where all the originality and creative value is.
Consequently, de facto, the API is the only part of an open source program that can be protected by copyright.
The author argues that perhaps it should be — to start a conversation.
As for who would benefit from a change like that: that, too, is not clear-cut.
The entities that benefit the most from LLM use are the corporations which can afford the compute.
It isn't that cheap.
What has changed since the first days of GPL is precisely this: the cost of implementing an API has gone down asymmetrically.
The importance of having an open-source compiler was that it put corporations and contributors to the commons on equal footing when it came to implementation.
It would take an engineer the same amount of time to implement an API whether they do it for their employer or themselves. And whether they write a piece of code for work or for an open-source project, the expenses are the same.
Without an open compiler, that's not possible. The engineer having access to the compiler at work would have an infinite advantage over an engineer who doesn't have it at home.
The LLM-driven AI today takes the same spot. It's become the tool that software engineers can and do use to produce work.
And the LLMs are neither open nor cheap. Both creating them and using them at scale are privileges that only wealthy corporations can afford.
So we're back to the days before the GNU C compiler toolchain was written: the tools aren't free, and the corporations have effectively unlimited access to them compared to enthusiasts.
Consequently, locking down the implementation of public APIs will asymmetrically hurt the corporations more than it does the commons.
This asymmetry is at the core of GPL: being forced to share something for free doesn't at all hurt the developer who's doing it willingly in the first place.
Finally, looking back at the old days ignores the reality. Back in the day, the proprietary software established the APIs, and the commons grew by reimplementing them to produce viable substitutes.
The commons did not even have its own APIs worth talking about in the early 1990s. But the commons grew way, way past that point since then.
And the value of the open source software is currently not in the fact that you can hot-swap UNIX components with open source equivalents, but in the entire interoperable ecosystem existing.
The APIs of open source programs are where the design of this enormous ecosystem is encoded.
We can talk about possible negative outcomes from pricing it.
Meanwhile, the already happening outcome is that a large corporation like Microsoft can throw a billion dollars of compute on "creating" MSLinux and refabricating the entire FOSS ecosystem under a proprietary license, enacting the Embrace, Extend, Extinguish strategy they never quite abandoned.
It simply didn't make sense for a large corporation to do that earlier, because it's very hard to compete with free labor of open source contributors on cost. It would not be a justifiable expenditure.
What GPL had accomplished in the past was ensuring that Embracing the commons led to Extending it without Extinguishing, by a Midas touch clause. Once you embrace open source, you are it.
The author of the article asks us to think about how GPL needs to be modified so that today, embracing and extending open-source solutions wouldn't lead to commons being extinguished.
Which is exactly what happened in the case of the formerly-GPL library in question.
Comment by sobellian 1 day ago
Comment by matheusmoreira 1 day ago
Comment by trinsic2 1 day ago
If you want to build a new world without this, we can't do it while we are supporting the very companies that are creating the problem. The more power you give them, the stronger they get and the weaker we become.
I think focus needs to shift completely off of for-profit companies. Although, not sure how that is going to happen..lol
Comment by autoexec 1 day ago
Comment by alterom 1 day ago
[citation needed]
Where does your confidence come from?
GPL itself was precisely the "intellectual property nonsense" adding which made FOSS (free as in freedom) software possible.
The copyright law was awfully broken in the 1980s too. Adding "nonsense" then was the only solution that proved viable.
Historically, nothing but adding "more IP nonsense" has ever worked.
>The real solution is to force AI companies to open up their models to all.
Sure. Pray tell how you would do that without some "intellectual property nonsense".
We don't exactly get to hold Sam Altman at gunpoint to dictate our terms.
>We need free as in freedom LLMs that we can run locally on our own computers
Oh, on that note.
LLMs take a fuckton of compute to train and to even run.
Even if all models were open, we're not at the point where it would create an equal playing field.
My home computer and my dev machine at work have the same specs. But I don't have a compute farm to run a ChatGPT on.
Comment by matheusmoreira 1 day ago
From the fact that copyright infringement is trivial and done at massive scales by pretty much everyone on a daily basis without people even realizing it. You infringe copyright every time you download a picture off of a website. You infringe copyright every time you share it with a friend. Everybody does stuff like this every single day. Nobody cares. It is natural.
> GPL itself was precisely the "intellectual property nonsense"
Yes. In response to copyright protection being extended towards software. It's a legal hack, nothing more. The ideal situation would have been to have no copyright to begin with. The corporation can copy your code but you can copy theirs too. Fair.
> Pray tell how you would do that without some "intellectual property nonsense".
Intellectual property is irrelevant to AI companies.
Intellectual property is built on top of a fundamental delusion: the idea that you can publish information and simultaneously control what people do with it. It's quite simply delusional to believe you can control what people do with information once it's out there and circulating. The tyranny required to implement this amounts to totalitarian dictatorships.
If you want to control information, then your only hope is to not publish it. Like cryptographic keys, the ideal situation is the one where only a single copy of the information exists in the entire universe.
AI companies are not publishing any information. They are keeping their models secret, under lock and key. They need exactly zero intellectual property protection. In fact such protections have negative value to them since it restricts the training of their models.
> We don't exactly get to hold Sam Altman at gunpoint to dictate our terms.
Sure you do. The whole point of government is to do just that. Literally pass some kind of law that forces the corporations to publish the model weights. And if the government refuses to do it, people can always rise up.
> Even if all models were open, we're not at the point where it would create an equal playing field.
Hopefully we will be, in the future.
Comment by salawat 1 day ago
Comment by matheusmoreira 1 day ago
Comment by pixl97 14 hours ago
Comment by throwaway290 22 hours ago
respectfully you have no idea what you are talking about here.
Comment by pixl97 14 hours ago
Copyright is a gigantic fucking mess that the US has forced over a large chunk of the world.
Comment by throwaway290 12 hours ago
How did they turn out?
Comment by pixl97 10 hours ago
Comment by scheeseman486 19 hours ago
Comment by matheusmoreira 21 hours ago
Comment by tpmoney 1 day ago
If "more freedom" is your goal, then this rewrite is inherently in that direction. It didn't "close" the old library down. The LGPL version remains under its license, for anyone to use and redistribute exactly as it always has. There is just now also an alternative that one can exercise different rights with. And that doesn't even get into the fact that "increased freedom" was never a condition of being allowed to clone a system from its interfaces in the first place. It might have been a fig leaf, but some major events in the legal landscape of all this came from closed reimplementations. Sony v. Connectix is arguably the defining case for dealing with cloning from public interfaces and behavior as it applies to emulators of all kinds, and Connectix Virtual Gamestation was very much NOT an open source or free product.
But to go a step further, the larger idea of AI assisted re-writes being "good", even if the human developers may have seen the original code, seems to broadly increase freedoms overall. Imagine how much faster WINE development can go now that everyone that has seen any Microsoft source code can just direct Claude to implement an API. Retro gaming and the emulation scene are sure to see a boost from people pointing AIs at any tests in source leaks and letting them go to town. No, our "foreparents" weren't competing with corporations with unlimited access to AI trained on their work; they were competing with corporations with unlimited access to the real hardware and schematics and specifications. The playing field has always been un-level, which was why fighting for the right to re-implement what you can see with your own eyes and measure with your own instruments was so important. And with the right AI tools, scrappy and small teams of developers can compete on that playing field in a way that previous developers could only dream of.
So no, I agree with the comment that you're responding to. The incredible mad dash to suddenly find strong IP rights very very important now that it's the open source community's turn to see their work commoditized and used in ways they don't approve of is off-putting and in my opinion a dangerous road to tread that will hand back years of hard fought battles in an attempt to stop the tides. In the end it will leave all of us in a weaker position while solidifying the hold large corporations have on IP in ways we will regret in the years to come.
Comment by salawat 1 day ago
Pretty sure no one (but me, anyway) saw overt theft of IP by ignoring IP law through redefinition coming. Admittedly, I couldn't have articulated for you how capital would skill-transfer and commoditize it in the form of pay-to-play data centers, but give me a break, I was a teenager/twenty something at the time.
Comment by zmmmmm 1 day ago
So: once it's not "hard" any more, does IP even make sense at all? Why grant monopoly rights to something that required little to no investment in the first place? Even with vestigial IP law - let's say, patents: it just becomes an input parameter, in that the AI needs to work around the patents like any other constraint.
Comment by palmotea 1 day ago
I think it still does: IIRC, the current legal situation is AI-output does not qualify for IP protections (at least not without substantial later human modification). IP protections are solely reserved for human work.
And I'm fine with that: if a person put in the work, they should have protections so their stuff can't be ripped off for free by all the wealthy major corporations that find some use for it. Otherwise: who cares about the LLMs.
Comment by robmccoll 1 day ago
Comment by palmotea 1 day ago
Then fix that instead of blowing it up. Because IP law is also literally the only thing that protects the little guy's work in many cases.
Arguments like yours are kinda unfathomably incomplete to me, almost like they're the remnants of some propaganda campaign. It's constructed to appeal to the defense of the little guy, but the actual effect would be to disempower him and further empower the wealthy major corporations with "big enough warchest[s]."
I mean, one thing I think the RIAA would love is to stop paying royalties to every artist ever. And the only thing they'd be worried about is an even bigger fish (like Amazon, Apple, or Spotify) no longer paying royalties to them. But as you said, they have a big enough war chest that they probably could force a deal somehow. All the artists without a war chest? Left out in the cold.
Comment by esrauch 1 day ago
It definitely does some of both, and we have no obvious measure or counterfactual to know otherwise.
You also have to take into account not just if optimal reform or optimal dismantle is better, but the realistic likelihood of each, and the risk of the bad outcomes from each.
Protecting even more conceptual product ideas seems pretty strongly like it will result in more of a tool for big guys only; it's patents on crack, and patents are already nearly exclusively a "big guy crushes small guy" tool, whereas copyright is at least debatably mixed.
Comment by palmotea 23 hours ago
It's super obvious, unless your perspective basically stems from someone who was mad they couldn't BitTorrent a ton of movies.
I mean, FFS, copyright is the literal foundation for open source licenses like the GPL.
My sense is a lot of the radically anti-IP fervor ultimately stems from people who were outraged they could be sued for seeding an MP3 (though it's accreted other complaints to justify that initial impulse, and it's likely some were indoctrinated from secondary argumentation somewhat obscured from the core impulse).
That's not to say that there are not actors who abuse IP or there aren't meaningful reforms that could be done, but the "burn it all down" impulse is not thought through.
Comment by jph00 20 hours ago
Comment by esrauch 21 hours ago
Yes, it is a genius move that copyleft used copyright to achieve its goal. But the name is literally reflecting the judo going on in that case. Copyleft licenses also have a lot of benefits for big companies, so it's not strictly a David vs Goliath victory.
I don't think it's a commonly held belief that copyright benefits small YouTube creators more than it hurts them, as a concrete example; they seem to live in constant fear of being destroyed in an asymmetrical system where copyright can take away their livelihood at any moment while not doing anything to meaningfully protect it.
Comment by _aavaa_ 1 day ago
Comment by rlpb 1 day ago
Comment by jbergqvist 1 day ago
Comment by reverius42 1 day ago
Comment by eru 1 day ago
Because some photographer somewhere can claim to have put in a lot of effort, we all get IP protection for photographs by default.
Comment by shagie 1 day ago
https://en.wikipedia.org/wiki/Sweat_of_the_brow
https://en.wikipedia.org/wiki/Copyright_law_of_the_United_St...
Comment by reverius42 1 day ago
Comment by JAlexoid 21 hours ago
Comment by reverius42 1 day ago
Comment by eru 1 day ago
Comment by nkmnz 1 day ago
I beg to differ. AI-output did not entitle the person creating the prompt to IP protections, so far – but my objection is not directed towards the "so far", but towards your omission of "the person creating the prompt", because if an AI outputs copyrighted material from the training data, that material is still copyrighted. AI is not a magical copyright removal machine.
Comment by reverius42 1 day ago
What this means in practice is that (currently), all output of an LLM is legally considered to not be copyrightable (to the extent that it's an original work). If it happens to regurgitate an existing copyrighted work, though, is that infringement? I'm not sure we have a legal precedent on that question yet.
Comment by Muskwalker 1 day ago
I believe there are other cases where AI-generated works were found uncopyrightable but Thaler is not a good example of them.
Comment by jazzyjackson 1 day ago
Comment by reverius42 1 day ago
I don't think this means the same thing as whether or not LLM output can infringe on someone else's copyright though (that does pose an interesting question -- can something non-copyrightable in general infringe on something copyrighted?).
Comment by nkmnz 1 day ago
Comment by JAlexoid 21 hours ago
Comment by bandrami 1 day ago
Comment by scheeseman486 19 hours ago
That also applies to generative AI: pure output may not be copyrightable, but as soon as you do something beyond type some words and press a button, like doing area-specific infills and paintovers, which involve direct and deliberate choices by a human, the copyrighted human-driven arrangement becomes so deeply intertwined with the generative work that it's effectively inseparable.
Comment by utopiah 21 hours ago
Any example of that? So far I haven't seen any but maybe I'm looking at the wrong places.
I've seen a lot of:
- "solving" math proofs that were properly formalized, with often numerous documented past attempts, re-verified by proper mathematicians, without necessarily any interesting results
- haven't seen any designed trust, most I've seen was (again with entire teams of experts behind) finding slight optimizations
Basically all outputs I've seen so far have been both following existing trends (basically low-hanging fruit without any paradigm shift) and never ever alone, but rather as search support for teams of world-class experts. None of these would qualify, IMHO, as knowledge creation. Whenever such results were published, the publication seemed mostly to be promotion of the workflow itself more than the actual results. DeepMind seems to be the prime example of that.
PS: for the epistemological distinction you can see a few past comments of mine (e.g. https://news.ycombinator.com/item?id=47011884 )
Comment by satvikpendem 1 day ago
Comment by godd2 1 day ago
Comment by gbacon 10 hours ago
Comment by rfw300 1 day ago
It is entirely possible, however, that human beings will not be the primary drivers of progress on those problems.
Comment by gbacon 10 hours ago
Comment by treyd 1 day ago
I have been saying this for years. Intellectual property is based on the concept that ideas can be owned, which is fundamentally a contradiction with how reality operates. We've been able to write laws that paper over that contradiction by introducing concepts like "fair use", but it doesn't resolve it.
AI is just making the conflict arising out of that contradiction more intense in new ways and forcing us to reckon with it in this new technological landscape. You can follow two perfectly reasonable lines of logic and end up with contradictory solutions. So how are we going to get out of this mess? I don't know, not without rolling back (at least parts of) what intellectual property is in the first place.
Comment by kindkang2024 1 day ago
That's the reason I like the idea of DUKI/dju:ki/ — Decentralized Universal Kindness Income, similar to UBI but driven by voluntary kindness and sincere marketing rather than taxation. If AI makes creation trivially easy and IP loses its justification, the question becomes: how do we ensure a tiny part of the wealth generated flows back to everyone?
Comment by nradov 1 day ago
Comment by reverius42 1 day ago
That also seems relevant for this whole discussion, actually -- if a work can't be copyrighted it certainly can't have a changed license, or any license at all. (I guess it's effectively public domain to the extent that it's public at all?)
Comment by nradov 1 day ago
Comment by reverius42 1 day ago
"Lower courts upheld a U.S. Copyright Office decision that the AI-crafted visual art at issue in the case was ineligible for copyright protection because it did not have a human creator."
Not eligible for copyright protection does not mean it can be copyrighted "under the human creator's name". It means there is no creative work at all. No copyright.
Comment by reverius42 1 day ago
Comment by nradov 1 day ago
Comment by reverius42 1 day ago
Comment by zmmmmm 1 day ago
I guess the state of play will be that for new drugs the original manufacturer will already have done that and ensured that literally anything that could be found as a workaround is included in the scope of the patent. But I feel like it will not be possible to keep that watertight.
Comment by nradov 1 day ago
Comment by paxys 1 day ago
Comment by prohobo 20 hours ago
In terms of math and biochemistry the cost of generating candidates has collapsed, but the cost of validating them hasn't.
Comment by Eridrus 1 day ago
Not all protections have to be ones that give total control like copyright.
I think it's a mistaken assumption that costs will fall to zero. The low hanging fruit will get picked, and then we'll be doing expensive combined AI/wetlab search for new drugs.
If there is any meaningful headroom we will keep doing expensive things to make progress.
Comment by matheusmoreira 23 hours ago
Then why are corporations allowed to milk successful works for all eternity? Why do we have Disney monopolizing films made half a century ago? Why do we have Nintendo selling people the exact same Mario ROMs from the 80s every single console generation?
They should have like 10 years of copyright so they can turn a profit. Once it expires it's over and the work enters the public domain where it belongs. If they want to keep profiting they should have to keep creating new things. They shouldn't be able to turn shared culture into eternal intellectual property portfolios that they monopolize and then sit on like dragons.
Comment by Eridrus 16 hours ago
I am somewhat curious what you think shortening the copyright window would do that's so great for the culture though. We already have more than enough IP slop that's just licensed.
Comment by matheusmoreira 10 hours ago
Let them profit from those new works then. All the works from the last century belong in the public domain.
Comment by LelouBil 1 day ago
https://www.vice.com/en/article/musicians-algorithmically-ge...
Two musicians generated every possible melody within an octave, and published them under Creative Commons Zero.
I never heard about this again though.
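For a sense of what "every possible melody" means in practice, here is a minimal sketch of the enumeration idea. It is not the musicians' actual code: the eight-note pitch set and the melody lengths are assumptions picked for illustration, and the names all_melodies and melody_count are hypothetical.

```python
from itertools import islice, product

# Assumption: eight pitches spanning one octave of C major.
# The real project's note set and melody length may differ.
PITCHES = ["C", "D", "E", "F", "G", "A", "B", "C'"]


def all_melodies(length, pitches=PITCHES):
    """Lazily yield every sequence of `length` notes drawn from `pitches`."""
    return product(pitches, repeat=length)


def melody_count(length, pitches=PITCHES):
    """Number of distinct melodies of the given length: len(pitches) ** length."""
    return len(pitches) ** length


# Peek at the first few three-note melodies without materialising all of them.
first_three = list(islice(all_melodies(3), 3))
print(first_three)
print(melody_count(3))
```

The count grows exponentially with melody length, so anything beyond short melodies has to be streamed rather than held in memory.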
Comment by js8 1 day ago
With AI, a similar process is happening - publicly available information becomes enclosed by the model owners. We will probably get a "vestigial" intellectual property in the form of model ownership, and everyone will pay a rent to use it. In fact, companies might start to gatekeep all the information to only their own LLM flavor, which you will be required to use to get to the information. For example, product documentation and datasheets will be only available by talking to their AI.
Comment by gnopgnip 1 day ago
Also copyright can protect something normally not eligible when the author chooses what information to include and exclude
Comment by eru 1 day ago
Copyright might rest on 'creativity is hard'. But patents and trademarks do not.
Comment by DonsDiscountGas 1 day ago
Comment by bandrami 1 day ago
Comment by eru 1 day ago
Comment by newyankee 1 day ago
Comment by matheusmoreira 1 day ago
Sure, it's disgusting and hypocritical how these corporations enshrined all this nonsense into law only to then ignore it all the second LLMs were invented. It's ultimately a good thing though. The model weights are all that matters. All we need to do is wait for the models to hit diminishing returns, then somehow find a way to leak them so that everyone has access. If they refuse, then just force them. By law or by revolution.
Comment by paxys 1 day ago
A company spends a decade and billions of dollars to develop a groundbreaking drug and patents it.
I think of a cool new character called "Mr Poop" and publish a short story about him with an hour of work.
Both of us get the exact same protection under the law (yes yes I know copyright vs patent etc., but ultimately they are all about IP protection).
Comment by keeda 1 day ago
Comment by AlienRobot 1 day ago
Comment by spwa4 1 day ago
Company incorporates GPL code in their product? Never once have courts decided to uphold copyright. HP did that many times. Microsoft got caught doing it. And yet the GPL was never applied to their products. Every time there was an excuse. An inconsistent excuse.
Schoolkid downloads a movie? 30,000 USD per infraction PLUS armed police officer goes in and enforces removal of any movies.
Or take the very subject here. AI training WAS NOT considered fair use when OpenAI violated copyright to train. Same with Anthropic, Google, Microsoft, ... They incorporated Harry Potter and the Linux kernel in ChatGPT, in the model itself. Undeniable. Literally. So even if you accept that it's changed now, OpenAI should still be forced to redistribute the training set, code, and everything needed to run the model for everything they did up to 2020. Needless to say ... courts refused to apply that.
So just apply "the law", right. Courts' judgement of using AI to "remove GPL"? Approved. Using AI to "make the next Disney-style movie"? SEND IN THE ARMY! Whether one or the other violates the law according to rational people? Whatever excuse to avoid that discussion is good enough.
Comment by hyperman1 1 day ago
Patents came along when farmers started making city goods, threatening guild secrets. Copyright came when the printing press made copying and translating the Bible easy and accessible to all. (Trademark admittedly does not fit this view, but doesn't seem all that damaging either.)
To Protect The Arts, and To Time Limit Trade Secrets were just the Protect The Children of old times, a way to confuse people who didn't look too hard at actual consequences.
This means that the future of IP depends on what lets the powers that be pull up the ladder behind them. Long term I'd expect e.g. copyright expansion and harder enforcement, just because cloning by AI gets easy enough to threaten the status quo.
Comment by cobbzilla 1 day ago
Isn’t trademark the only thing keeping a certain cartoon mouse out of the public domain, despite the fact that his earliest animations are out of copyright? Not sure if you’d consider that damaging, or if anyone has yet tested the boundaries of the House of Mouse’s patience here.
Comment by jazzyjackson 1 day ago
Comment by ordu 1 day ago
What AI is eroding is copyright. You can re-implement not just a GPL program; you can reverse engineer and re-implement a closed source program too. People have demonstrated it already; there were stories here on HN about it.
AI is eroding copyright, so there may no longer be a need for the GPL. GNU should stop and rethink its stance, chuck away the GPL as the main tool to fight evil software corporations and embrace LLM as the main weapon.
Comment by davidw 1 day ago
LLMs - to date - seem to require massive capital expenditures to have the highest quality ones, which is a monumental shift in power towards mega corporations and away from the world of open source where you could do innovative work on your own computer running Linux or FreeBSD or some other open OS.
I don't think that's an exciting idea for the Free Software Foundation.
Perhaps with time we'll be able to run local ones that are 'good enough', but we're not there yet.
There's also an ethical/moral question that these things have been trained on millions of hours of people's volunteer work and the benefits of that are going to accrue to the mega corporations.
Edit: I guess the conclusion I come to is that LLMs are good for 'getting things done', but the context in which they are operating is one where the balance of power is heavily tilted towards capital, and open source is perhaps less interesting to participate in if the machines are just going to slurp it up and people don't have to respect the license or even acknowledge your work.
Comment by ordu 1 day ago
Yeah, a bit of a conundrum. But I don't think that fighting for copyright now can bring any benefits for FOSS. GNU should bring Stallman back and see whether he can come up with any new ideas and a new strategy. Alternatively they could try without Stallman. But the point is: they should stop and think again. Maybe they will find a way forward, maybe they won't, but it means that either they could continue their fight for freedom meaningfully, or they could just stop fighting and find some other things to do. Both options are better than fighting for copyright.
> There's also an ethical/moral question that these things have been trained on millions of hours of people's volunteer work and the benefits of that are going to accrue to the mega corporations.
I want to clarify this statement a bit. LLMs relying on the work of others is not against GNU philosophy as I understand it: algorithms have to be free. Nothing wrong with training LLMs on them or on programs implementing them. Nothing wrong with using these LLMs to write new (free) programs. What is wrong is corporations reaping all the benefits now and locking down new algorithms later.
I think it is important, because copyright is deemed to be an ethical thing by many (I think for most people it is just a deduction: abiding the law is ethical, therefore copyright is ethical), but not for GNU.
Comment by balamatom 1 day ago
IMO the primary significant trend in AI. Doesn't get talked about nearly enough. Means the AI is working, I guess.
>GNU should bring Stallman back ... Alternatively they could try without Stallman.
Leave Britney alone >:(
>copyright is deemed to be an ethical thing by many (I think for most people it is just a deduction: abiding the law is ethical, therefore copyright is ethical)
I've busted out "intellectual property is a crime against humanity" at layfolk to see if that shortcuts through that entire little politico-philosophical minefield. They emote the requisite mild shock when such things as crimes against humanity are mentioned; as well as at someone making such a radical statement which seems to come from no familiar species of echo chamber; and then a moment later they begin to very much look like they see where I'm coming from.
Comment by Serenacula 1 day ago
Comment by bo1024 1 day ago
Comment by zozbot234 1 day ago
There are near-SOTA LLMs available under permissive licenses. Even running them doesn't require prohibitive expenses on hardware unless you insist on realtime use.
Comment by walterbell 1 day ago
What async tasks could a local LLM accomplish on Intel 11th gen CPU with 32GB RAM?
Comment by Aozora7 1 day ago
Right now, we can get local models that you can run on consumer hardware, that match capabilities of state of the art models from two years ago. The improvements to model architecture may or may not maintain the same pace in the future, but we will get a local equivalent to Opus 4.6 or whatever other benchmark of "good enough" you have, in the foreseeable future.
Comment by tmp10423288442 1 day ago
When the FSF and GPL were created, I don't think this was really a consideration. They were perfectly happy with requiring Big Iron Unix or an esoteric Lisp Machine to use the software - they just wanted to have the ability to customize and distribute fixes and enhancements to it.
Comment by davidw 9 hours ago
The 'good enough' part is the important one here, I think.
Comment by jacquesm 1 day ago
This was already the case and it just got worse, not better.
Comment by davidw 1 day ago
Now they've just hoovered up all the free stuff into machines that can mix it up enough to spit it out in a way that doesn't even require attribution, and you have to pay to use their machine.
Comment by jacquesm 1 day ago
Before we had RedHat and Ubuntu, who at least were contributing back, now we have Microsoft, Anthropic and OpenAI who are racing to lock the barn door around their new captive sheep. It's just a massive IP laundromat.
Comment by stalfie 19 hours ago
Comment by davidw 9 hours ago
Comment by jacquesm 16 hours ago
Comment by thenewnewguy 1 day ago
Comment by davidw 1 day ago
It's nowhere near the order of magnitude of the kind of spending they're sinking into LLMs. The FSF and other groups were reasonably successful at enforcing the GPL, operating on a budget thousands of times smaller than that of AI companies.
Comment by cloverich 1 day ago
Being able to cost-efficiently run frontier models is, I think, not a high-priced endeavor for an org (compared to an individual).
IMO the proposition is a little fishy, but it's not totally without merit and IMO deserves investigation. If we are all worried about our jobs, even via building custom for-sale software, there is likely something there that may obviate the need, at least for end user applications. Again, I'm deeply skeptical, but it is interesting.
Comment by overfeed 1 day ago
Running a proprietary model would make you subject to whatever ToS the LLM companies choose on a particular day, and what you can produce with them, which circles back to the raison d'etre for the GPL and GNU.
Until all software copyright is dead and buried, there is no need for copyleft to change tack. Otherwise the rising tide may rise high enough to drown the GPL, but not proprietary software.
Open source is easier to counterfeit/license-launder/re-implement using LLMs because source code is much lower-hanging fruit, and is understood by more people than closed-source assembly.
Comment by socalgal2 1 day ago
Comment by shadowgovt 1 day ago
Comment by davidw 1 day ago
Comment by stebalien 1 day ago
Unfortunately, there are cases where you simply can't just "re-implement" something. E.g., because doing so requires access to restricted tools, keys, or proprietary specifications.
Comment by ordu 1 day ago
"So, I looked for a way to stop that from happening. The method I came up with is called “copyleft.” It's called copyleft because it's sort of like taking copyright and flipping it over. [Laughter] Legally, copyleft works based on copyright. We use the existing copyright law, but we use it to achieve a very different goal."
https://writings.hongminhee.org/2026/03/legal-vs-legitimate/
Comment by dathinab 1 day ago
i.e. mirroring it
> use it to achieve a very different goal."
"very different goal" isn't the same as "fundamentally destroying copyright"
the very different goals include protecting public code so it stays public and is properly attributed, preventing companies from just "seizing" it, motivating others to make their code public too, etc.
and even if his goals were not like that, it wouldn't make a difference, as this is what many people try to achieve by using such licenses
this kind of AI usage is very much not in line with these goals,
and in general, way cheaper software cloning isn't sufficient to fix many of the issues the FOSS movement tried to fix, especially not when looking at the current ecosystem most people are interacting with (i.e. phones)
---
("seizing"): As in the typical MS embrace, extend and extinguish strategy of first embracing the code, then giving it proprietary but available extensions/changes/bug fixes/security patches, and then making them no longer available if you don't pay them/play by their rules.
---
Though in the end, using AI as a "fancy complicated" photocopier for code removes copyright about as much as using a photocopier for code would. It doesn't matter if you used the photocopier blindfolded and never looked at the thing you copied.
Comment by sarchertech 1 day ago
Comment by sjunot 1 day ago
For the right goal, he should have called it "rightcopy".
Comment by rileymat2 1 day ago
It also grants one major right/feature to the creator, the ability to spread their work while keeping it as open as they intend.
Comment by johnofthesea 1 day ago
Is this LLM thing freely available or is it owned and controlled by these companies? Are we going to rent the tools to fight "evil software corporations"?
Comment by Aozora7 1 day ago
Comment by lkjdsklf 1 day ago
A year ago, the "state of the art" models were total turds. So this isn't exactly good news
Not to mention the performance of local LLMs makes them utterly unusable unless you have multiple tens of thousands to invest in hardware (and that was before the recent price spike). If you're using commodity hardware, they're just awful to use.
Comment by josephg 1 day ago
It’s probably only a matter of time before open models are as good as Claude code is today.
Comment by Aozora7 1 day ago
Comment by Peritract 1 day ago
LLMs are one of the primary manifestations of 'evil software corporations' currently.
Comment by dathinab 1 day ago
it's not that simple
yes, GPLs origins have the idea of "everyone should be able to use"
but it also is about attributing the original author
and making sure people can't just de facto "seize public goods"
this kind of AI usage is removing attribution and is often seizing public goods in a way far worse than most companies which just ignored the license did
so today there is more need than ever in the last few decades for GPL-like licenses
Comment by amiga386 1 day ago
Comment by webstrand 1 day ago
Reducing it to "well you can clone the proprietary software you're forced to use by LLM" is really missing the soul of the GPL.
Comment by pocksuppet 1 day ago
Comment by webstrand 1 day ago
Comment by paxys 1 day ago
Comment by mikkupikku 1 day ago
Comment by cubefox 1 day ago
A court ordered the first Nosferatu movie to be destroyed because it had too many similarities to Dracula. Despite the fact that the movie makes rather large deviations from the original.
If Claude was indeed asked to reimplement the existing codebase, just in Rust and a bit optimized, that could well be a copyright violation. Just like rephrasing A Song of Ice and Fire a bit, and switching to a different language, doesn't remove its copyright.
Comment by zozbot234 1 day ago
Comment by cubefox 1 day ago
Allegedly. There have been several people who doubted this story. So how to find out who is right? Well, just let Claude compare the sources. Coincidentally, Claude Opus 4.6 doesn't just score 75.6% on SWE-bench Verified but also 90.2% on BigLaw Bench.
It's like our copyright lawyer is conveniently also a developer. And possibly identical to the AI that carried out the rewrite/reimplementation in question in the first place.
Comment by Marsymars 1 day ago
There is some precedent for this, e.g. Alchemised is a recent best seller that had just enough changed from its Harry Potter fan fiction source in order to avoid copyright infringement: https://en.wikipedia.org/wiki/Alchemised
(I avoided the term “remove copyright” here because the new work is still under copyright, just not Harry Potter - related copyright.)
Comment by cubefox 1 day ago
Comment by Marsymars 1 day ago
Comment by cubefox 1 day ago
Comment by Marsymars 1 day ago
Translations are pretty much the textbook example of a derivative work in copyright.
Your jurisdiction may vary, of course, but it's pretty well established in mine (Canada) that "plot" is an idea, and can't be copyrighted, only the expression of the idea (e.g. the written novel) falls under copyright.
Comment by cubefox 1 day ago
Comment by Marsymars 9 hours ago
But "expresses the same idea" isn't the benchmark, "ideas expressed in the same way" is the benchmark.
A translated work is "ideas expressed in the same way", a translation doesn't change that.
See e.g. this question/top answer on stackexchange, it details pretty well how plot can't be copyrighted using Harry Potter as an example: https://writing.stackexchange.com/questions/3928/on-copyrigh...
Comment by xantronix 1 day ago
Comment by wolvesechoes 1 day ago
Unless it is IP of the same big corpos that consumed all content available. Good luck with eroding them.
Comment by re-thc 1 day ago
At the moment it's people that are eroding copyright. E.g. in this case someone did something.
"AI" didn't have a brain, woke up and suddenly decided to do it.
Realistically nothing to do with AI. Having a gun doesn't mean you randomly shoot.
Comment by thomastjeffery 1 day ago
Generative models (AI) are not really eroding copyright. They are calling its bluff. The very notion of intellectual property depends on a property line: some arbitrary boundary where the property begins and ends. Generative models blur that line, making it impractical to distinguish which property belongs to whom.
Ironically, these models are made by giant monopolistic corporations whose wealth is quite literally a market valuation (stock price) of their copyrights! If generative models ever become good enough to reimplement CUDA, what value will NVIDIA have left?
The reality is that generative models are nowhere near good enough to actually call the bluff. Copyright is still the winning hand, and that is likely to continue, particularly while IP holders are the primary authors of law.
---
This whole situation is missing the forest for the trees. Intellectual Property is bullshit. A system predicated on monopoly power can only result in consolidated wealth driving the consolidation of power; which is precisely what has happened. The words "starving artist" ring every bit as familiar today as any time in history. Copyright has utterly failed the very goals it was explicitly written with.
It isn't the GPL that needs changing. So long as a system of copyright rules the land, copyleft is the best way to participate. What we really need is a cohesive political movement against monopoly power; one that isn't conveniently ignorant of copyright as its most significant source.
Comment by pennomi 1 day ago
Comment by martin-t 1 day ago
Comment by sharkjacobs 1 day ago
This feels sort of like saying "I just blindly threw paint at that canvas on the wall and it came out in the shape of Mickey Mouse, and so it can't be copyright infringement because it was created without the use of my knowledge of Mickey Mouse"
Blanchard is, of course, familiar with the source code, he's been its maintainer for years. The premise is that he prompted Claude to reimplement it, without using his own knowledge of it to direct or steer.
Comment by dathinab 1 day ago
I would argue it's irrelevant whether they looked or didn't look at the code, as well as whether he was or wasn't familiar with it.
What matters is that they fed the original code into a tool which they set up to make a copy of it. How that tool works doesn't really matter. Neither does it make a difference if you obfuscate that it's a copy.
If I blindfold myself when making copies of books with a book scanner + printer I'm still engaging in copyright infringement.
If AI is a tool, that should hold.
If it isn't "just" a tool, then it did engage in copyright infringement (as it created the new output side by side with the original) in the same way an employee might do so on command of their boss. That still makes the boss/company liable for copyright infringement, and in general, just because you weren't the one who created an infringing product doesn't mean you aren't more or less as liable for distributing it as if you had created it yourself.
Comment by Legend2440 1 day ago
Well, no. They fed the spec (test cases, etc) into a tool which made a new program matching the spec. This is not a copy of the original code.
But also this feels like arguing over the color of the iceberg while the titanic sinks. If you have a tool that can make code to spec, what is the value in source code anymore? Even if your app is closed-source, you can just tell claude to write new code that does the same thing.
Comment by vbarrielle 1 day ago
Comment by derangedHorse 1 day ago
Comment by vbarrielle 19 hours ago
Comment by timeinput 1 day ago
Comment by foresto 1 day ago
Yes...
> and Anthropic fed the code to the tool,
Presumably, as part of the massive amount of open-source code that must have been fed in to train their model.
> so Blanchard didn't do anything wrong, and Anthropic didn't do anything wrong. Nothing to see here.
This is meant as irony, right?
Comment by timeinput 8 hours ago
Comment by spullara 1 day ago
Comment by sigseg1v 1 day ago
Comment by spullara 1 day ago
Comment by cubefox 1 day ago
Comment by ghostpepper 1 day ago
Comment by spullara 1 day ago
Comment by nicole_express 1 day ago
I'm not sure how you square the circle of "it's alright to use the LLM to write code, unless the code is a rewrite of an open source project to change its license".
Comment by JoshTriplett 1 day ago
> I'm not sure how you square the circle of "it's alright to use the LLM to write code
You seem like you're on the cusp of stating the obvious correct conclusion: it isn't.
Comment by satvikpendem 1 day ago
That's your opinion (since you said "IMO"), not the actual legal definition.
Comment by bmcahren 1 day ago
Then onto prompting: 'He fed only the API and (his) test suite to Claude'
This is Google v Oracle all over again - are APIs copyrightable?
Comment by azakai 1 day ago
About this specific point, it is unclear how much of a defect memorization actually is - there are also reasons to see it as necessary for effective learning. This link explains it well:
https://infinitefaculty.substack.com/p/memorization-vs-gener...
Comment by satvikpendem 1 day ago
Yes this is the best way to ask the question. If I take a public facing API and reimplement everything, whether it's by human or machine, it should be sufficient. After all, that's what Google did, and it's not like their engineers never read a single line of the Java source code. Even in "clean room" implementations, a human might still have remembered or recalled a previous implementation of some function they had encountered before.
Comment by thunderfork 1 day ago
Comment by tw1984 1 day ago
No, it is completely different.
Claude was trained on chardet, anything built by Claude would fail the clean-room reimplementation test.
Comment by LegionMammal978 1 day ago
Comment by wizzwizz4 1 day ago
> But how far away from direct and explicit representations do we have to go before copyright no longer applies?
Comment by yorwba 1 day ago
So when you clone the behavior of a program like chardet without referencing the original source code except by executing it to make sure your clone produces exactly the same output, you may still be infringing its copyright if that output reflects creative choices made in the design of chardet that aren't fully determined by the functional purpose of the program.
Comment by NSUserDefaults 1 day ago
Comment by margalabargala 1 day ago
Copyright infringement is a thing humans do. It's not a human.
Just like how the photos taken by a monkey with a camera have no copyright. Human law binds humans.
Comment by malicka 1 day ago
Comment by margalabargala 1 day ago
If we are saying AI is "more than a tool", which seems to be the way courts are leaning since they've ruled AI output without direct human involvement is not copyrightable[0], then the above seems like it would be entirely legal.
Comment by Ekaros 1 day ago
Even if the final output doesn't have copyright protection it might still be a copyright violation. I think it could be reasonable to have a work that itself violates copyright when distributed even if it does not have copyright itself.
Comment by logicprog 1 day ago
Comment by atomicnumber3 1 day ago
Comment by jpc0 1 day ago
If I know it is legal to make a turn at a red light, and I know a court will uphold that I was in the right, but a police officer will fine me regardless and I would need to actually pursue some legal remedy, I'm unlikely to do it regardless of whether it is legal, because it is expensive, if not in money then in time.
In the case of copyright lawsuits, they are notoriously expensive and long, so even if a court would eventually deem it fine, why take the chance?
Comment by atomicnumber3 1 day ago
Comment by sunshowers 1 day ago
Comment by simonw 1 day ago
Comment by sarchertech 1 day ago
Anything you put out can and will be used by whatever giant company wants to use it with no attribution whatsoever.
Doesn’t that massively reduce the incentive to release the source of anything ever?
Comment by satvikpendem 1 day ago
It's the same question as, if an AI can generate "art", or photographers can capture a scene better than any (realistic) painter, then will people still create art? Obviously yes, and we see it of course after Stable Diffusion was released three years ago, people are still creating.
Comment by sarchertech 1 day ago
So ignoring people who are being paid by corporations directly to work on open source, in my experience the vast majority of contributors expect to be able to monetize their work eventually in a way that requires attribution. And out of the small number who don’t expect a monetary return of any kind, a still smaller number don’t expect recognition.
If this weren’t the case you’d see a much larger amount of anonymous contributions. There are people who anonymously donate to charity. The vast majority want some kind of recognition.
Obviously we still see art, but if you greatly reduce the monetary benefit to producing art, you’ll see a lot less of it. This is especially true of nontrivial open source software that, unlike static artwork, requires continual maintenance.
Comment by joshjob42 1 day ago
So I'm not sure it matters whether a giant company uses it because random users can get the same thing for ~ free anyway.
Comment by sarchertech 1 day ago
Comment by intrasight 1 day ago
The non-IP protection has largely been in the effort involved in replicating an application's behavior, and that effort is dropping precipitously.
Comment by sarchertech 1 day ago
Comment by intrasight 1 day ago
Comment by pocksuppet 1 day ago
Comment by axus 1 day ago
In this case, we could theoretically prove that the new chardet is a clean reimplementation. Blanchard can provide all of the prompts necessary to re-implement again, and for the cost of the tokens anyone can reproduce the results.
Comment by Aurornis 1 day ago
My understanding was that his claim was that Claude was not looking at the existing source code while writing it.
Comment by duskdozer 1 day ago
He would have had a better argument if he created a matching spec from scratch using randomized names.
Comment by pklausler 1 day ago
Comment by mrgoldenbrown 1 day ago
Comment by NewsaHackO 1 day ago
IANAL, but that analogy wouldn't work because Mickey Mouse is a trademark, so it doesn't matter how it is created.
Comment by SpicyLemonZest 1 day ago
Comment by esafak 1 day ago
Comment by amarant 1 day ago
Comment by Copyrightest 1 day ago
Comment by re-thc 1 day ago
> He fed only the API and the test suite to Claude and asked it
Difference being Claude looked; so not blind. The equivalent is more like I blindly took a photo of it and then used that to...
Technically did look.
Comment by amarant 1 day ago
What he claimed, and what was interesting, was that Claude didn't look at the code, only the API and the test suite. The new implementation is all Claude. And the implementation is different enough to be considered original, completely different structure, design, and hey, a 48x improvement in performance! It's just API-compatible with the original. Which as per the Google v. Oracle 2021 decision is to be considered fair use.
Comment by mrgoldenbrown 1 day ago
Comment by amarant 1 day ago
Comment by re-thc 1 day ago
Who opened the PR? Who co-authored the commits? It's clearly on Github.
> Blanchard was a chardet maintainer for years. Of course he had looked at its code!
So there you have it. If he looked, he co-authored then there's that.
Comment by kjksf 1 day ago
Blanchard is very clear that he didn't write a single line of code. He isn't an author, he isn't a co-author.
Signing GitHub commit doesn't change that.
Comment by re-thc 1 day ago
He used Claude to write it. Difference? The fact that I write on the notepad vs printed it out = I didn't do it?
> Signing GitHub commit doesn't change that.
That's the equivalent of me saying I didn't kill anyone. The fingerprints on the knife don't change that.
Comment by satvikpendem 1 day ago
Comment by re-thc 20 hours ago
I did say co-author didn't I? Even if you added 0.000000001% to something you did so technically, yes.
> By your logic I did apparently
If you take someone's email and forward it did you write that email? Instead of debating that imagine you took a trojan email and forwarded it to someone and they opened it - do you think you'd be held up in any way?
Comment by babypuncher 1 day ago
This would make it so relicensing with AI rewrites is essentially impossible unless your goal is to transition the work to be truly public domain.
I think this also helps somewhat with the ethical quandary of these models being trained on public data while contributing nothing of value back to the public, and disincentivize the production of slop for profit.
Comment by kjksf 1 day ago
https://www.carltonfields.com/insights/publications/2025/no-...
> No Copyright Protection for AI-Assisted Creations: Thaler v. Perlmutter
> A recent key judicial development on this topic occurred when the U.S. Supreme Court declined to review the case of Thaler v. Perlmutter on March 2, 2026, effectively upholding lower court rulings that AI-generated works lacking human authorship are not eligible for copyright protection under U.S. law
Comment by pseudalopex 1 day ago
This was AI summary? Those words were not in the article.
The courts said Thaler could not have copyright because he refused to list himself as an author.
Comment by idle_zealot 1 day ago
That's not true at all. Anyone could follow these steps:
1. Have the LLM rewrite GPL code.
2. Do not publish that public domain code. You have no obligation to.
3. Make a few tweaks to that code.
4. Publish a compiled binary/use your code to host a service under a proprietary license of your choice.
Comment by Gigachad 1 day ago
Comment by robmccoll 1 day ago
Comment by Gigachad 1 day ago
Comment by NewsaHackO 1 day ago
Comment by alpaca128 1 day ago
Comment by NewsaHackO 1 day ago
Comment by Gigachad 1 day ago
Comment by RhythmFox 1 day ago
Comment by paxys 1 day ago
In all of these cases an AI model is taking a copyrighted source, reading it, jumbling the bytes and storing it in its memory as vectors.
Later a query reads these vectors and outputs them in a form which may or may not be similar to the original.
Comment by SatvikBeri 1 day ago
I don't know of any rulings on the context window, but it's certainly possible judges would rule that would not qualify as transformative.
Comment by derangedHorse 1 day ago
Comment by phendrenad2 1 day ago
Comment by reverius42 1 day ago
I'm not sure there should be, but I think there is.
Comment by sneak 10 hours ago
Comment by NiloCK 1 day ago
AI 1: - (reads the source), creates a spec + acceptance criteria
AI 2: - implements from spec
AI 1 is in the position of the maintainer who facilitated the license swap.
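The two-stage split described above can be sketched as a data flow in which the second stage is only ever handed a spec object. This is a minimal illustration, not anyone's actual tooling: `Spec`, `two_stage_rewrite`, and the toy stand-ins for the two models are all hypothetical names. It only shows the separation of inputs; it does not settle whether the spec itself carries over protected expression, which is the objection raised above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Spec:
    """Everything the second stage is allowed to see (hypothetical structure)."""
    api: str                     # description of the public interface
    acceptance: tuple[str, ...]  # acceptance criteria, e.g. test descriptions


def two_stage_rewrite(
    original_source: str,
    spec_writer: Callable[[str], Spec],
    implementer: Callable[[Spec], str],
) -> str:
    # Stage 1 ("AI 1") reads the original source and emits only a Spec.
    spec = spec_writer(original_source)
    # Stage 2 ("AI 2") receives the Spec alone; original_source is never passed on.
    return implementer(spec)


# Toy stand-ins for the two models, purely for illustration.
def toy_spec_writer(source: str) -> Spec:
    return Spec(
        api="detect(data: bytes) -> str",
        acceptance=("detect(b'abc') == 'ascii'",),
    )


def toy_implementer(spec: Spec) -> str:
    return f"# implements: {spec.api}\ndef detect(data):\n    return 'ascii'\n"


new_code = two_stage_rewrite(
    "def detect(d): ...  # original", toy_spec_writer, toy_implementer
)
print(new_code)
```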
Comment by yunnpp 1 day ago
As far as I know, you can as long as you own a copy of the original. In other words, you can't redistribute the assets, but you can distribute the code that works with them. This is literally how every free/libre game remake works. The copyright of your new, from-scratch code, is in no way linked to that of the assets.
Comment by smsm42 1 day ago
Comment by u1hcw9nx 1 day ago
> That question is this: does legal mean legitimate?
Just because something is legal does not mean it's a moral thing to do.
Comment by larodi 1 day ago
is it legitimate for millions of people to exploit, expound on knowledge that was perhaps, to begin with, not legitimate to use? well they did already, who's to judge the commons now?
Comment by mirashii 1 day ago
Comment by larodi 1 day ago
to me it is super ridiculous to shun the comment though. but we'll be having this split for a while, that's for sure.
Comment by Aboutplants 1 day ago
Comment by peacebeard 1 day ago
Comment by wvenable 1 day ago
When it comes to software, again it's the expression that matters -- literally the actual source code. Software that does the same thing but uses entirely different code to do it is not the same expression. Like with the tracing example above, if you read the original source code then it's harder to claim that it isn't the same expression. This is why clean room implementations are necessary.
Comment by alex1sa 19 hours ago
Now imagine an LLM trained on every GitHub repo doing the same thing at scale. The model has "seen" the source, but the output is statistically generated, not copied. Is that a clean room? The model never "read" the code the way a human would, but it clearly learned patterns from it.
I think the practical answer is that clean room as a legal concept was designed for a world where reimplementation was expensive and intentional. When an LLM can do it in minutes from a spec, we need a different framework entirely.
Comment by wvenable 12 hours ago
If the presumption is that LLM training, despite reading all the source code of everything everywhere, ultimately doesn't actually contain that source code (in a compressed form) then that is the significant bit.
If training is truly doing something transformative, maybe even a machine analogy to human learning, then anything produced directly by that LLM without another work in its context is an entirely new work. That's all that is important.
> I think the practical answer is that clean room as a legal concept was designed for a world where reimplementation was expensive and intentional.
Whether or not it's expensive or intentional is immaterial. It always was and it's still true now. All that matters is that the actual expression, the real source code, is not copied. Clean room is just one way to have evidence that you didn't copy.
Comment by LPisGood 1 day ago
Comment by shagie 1 day ago
https://www.nolo.com/legal-encyclopedia/protecting-fictional...
https://en.wikipedia.org/wiki/Copyright_protection_for_ficti...
Comment by GuB-42 1 day ago
Comment by gowld 1 day ago
Comment by amelius 1 day ago
Comment by VorpalWay 1 day ago
Comment by throw-qqqqq 1 day ago
https://en.wikipedia.org/wiki/Software_patents_under_the_Eur...
Comment by amelius 1 day ago
Comment by IshKebab 1 day ago
But also software patents and design patents are totally different things.
Comment by fruitworks 1 day ago
On top of all of this, there are the attempts at binary decompilation using LLMs and other new tools that have been discussed on this site recently.
Comment by martin-t 1 day ago
The original implementation would still have the upper hand here. OTOH if I as a nobody create something cool, there's nothing stopping a huge corporation from "reimplementing" (=stealing) it and using their huge advertising budget to completely overshadow me.
And that's how they like it.
Comment by Gigachad 1 day ago
Comment by munk-a 1 day ago
Comment by crazygringo 1 day ago
But agreed that we're waiting for a court case to confirm that. Although really, the main questions for any court cases are not going to be around the principle of fair use itself or whether training is transformative enough (it obviously is), but rather on the specifics:
1) Was any copyrighted material acquired legally (not applicable here), and
2) Is the LLM always providing a unique expression (e.g. not regurgitating books or libraries verbatim)
And in this particular case, they confirmed that the new implementation is 98.7% unique.
Comment by jazzyjackson 1 day ago
If you’ve used copyrighted books and turned them into a free write-a-book machine, you are suddenly using the authors’ own works against them, in a way that a judge might rule is not very fair.
“ Effect of the use upon the potential market for or value of the copyrighted work: Here, courts review whether, and to what extent, the unlicensed use harms the existing or future market for the copyright owner’s original work. In assessing this factor, courts consider whether the use is hurting the current market for the original work (for example, by displacing sales of the original) and/or whether the use could cause substantial harm if it were to become widespread.”
Comment by crazygringo 1 day ago
This is for the same reason that search results or search snippets aren't deemed to harm creators according to copyright. Yes there might be some percentage lost of sales. And truly, people may be buying fewer JavaScript tutorial books now that LLM's can teach you JavaScript or write it for you. But the relation is so indirect, there's very little chance a court would accept the argument.
Because what the LLM is doing is reading tons of JavaScript and JavaScript tutorials and resources online, and producing its own transformed JavaScript. And the effect of any single JavaScript tutorial book in its training set is so marginal to the final result, there's no direct effect.
And the reason this makes sense is that it's no different from a teacher reading 20 books on JavaScript and then writing their own that turns out to be a best-seller. Yes, it takes away from the previous best-sellers. But that's fine, because they're not copying any of the previous works directly. They're transforming the facts they learned into a new synthesis.
Comment by madeofpalk 1 day ago
Training an LLM inherently requires making a copy of the work. Even the initial act of loading it from the internet and copying it into memory to then train the LLM is a copy that can be governed by its license and copyright law.
Comment by cortesoft 1 day ago
Comment by madeofpalk 1 day ago
> The court held that making RAM copies as an essential step in utilizing software was permissible under §117 of the Copyright Act even if they are used for a purpose that the copyright holder did not intend.
https://en.wikipedia.org/wiki/Vault_Corp._v._Quaid_Software_....
Comment by kg 1 day ago
IIRC this exact argument was made in the Blizzard vs bnetd case, wasn't it? Though I can't find confirmation on whether that argument was rejected or not...
Comment by crazygringo 1 day ago
But that's not relevant here. Because the copyleft license does not prohibit that (and it's not even clear that any license can prohibit it, as courts may confirm it's fair use, as most people are currently assuming). That's why I noted under (1) that it's not applicable here.
Comment by munk-a 1 day ago
LLM training involves ingesting works (in a potentially transformative process) and partially reproducing them - that's a generally restricted action when it comes to licensing.
Comment by crazygringo 1 day ago
Sure, but that's not what LLM's generally do, and it's certainly not what they're intended to do.
The LLM companies, and many other people, argue that training falls under fair use. One element of fair use is whether the purpose/character is sufficiently transformative, and transforming texts into weights without even a remote 1-1 correspondence is the transformation.
And this is why LLM companies ensure that partial reproduction doesn't happen during LLM usage, using a kind of copyrighted-text filter as a last check in case anything would unintentionally get through. (And it doesn't even tend to occur in the first place, except when the LLM is trained on a bunch of copies of the same text.)
Comment by munk-a 1 day ago
Comment by strogonoff 21 hours ago
Comment by duskdozer 20 hours ago
Comment by crazygringo 1 day ago
Comment by joquarky 1 day ago
Comment by pessimizer 1 day ago
This is just an assertion that you're making. There's no argument here. I'm aware that this is also an assertion that some judges have made.
My claim is that LLMs are not human, therefore when you apply words like "training" to them, you're only doing it metaphorically. It's no more "training" than copying code to a different hard drive is training that hard drive. And it's no more "transformative" than rar'ing or zipping the code, then unzipping it. I can't sell my jpgs of pngs I downloaded from Getty.
I have no idea how LLMs can be considered transformative work that immunizes me from owing the least bit of respect to the source material, but if I sample 2-6 second snatches from 10 different songs, put them through over 9000 filters and blend them into a new work, I owe money to everyone involved. I might even owe money to the people who wrote the filters, depending on the licensing.
> 98.7% unique.
This doesn't mean anything. This is a meaningless arrangement of words. The way we figure out things are piracy is through provenance, not bizarre ad hoc measurements. If I read a book in Spanish and rewrite it in English, it doesn't suddenly become mine even though it's 96.6492387% unique. Not even if I drop a few chapters, add in a couple of my own, and change the ending.
Comment by crazygringo 11 hours ago
...OK? Was somebody asking me for an "argument"? I'm just stating how things are currently understood.
> And it's no more "transformative" than rar'ing or zipping the code, then unzipping it.
That's obviously false, so I'm not sure what to tell you.
> but if I sample 2-6 second snatches from 10 different songs, put them through over 9000 filters and blend them into a new work, I owe money to everyone involved
You don't, actually, if they're no longer recognizable -- which they wouldn't be after "9000 filters". I don't know where you got that idea that you'd still owe money. And I've certainly never heard of an audio filter license that was contingent on commercial distribution.
> This doesn't mean anything. This is a meaningless arrangement of words.
Statistics are meaningful. Obviously you need to look at the actual identical lines. But if they're a bunch of trivial things like initializing variables with obvious names, then they don't count for much. And if you're adhering to the same API, you would expect to have some small percentage of lines happen to match. So the fact that this is <2%, as opposed to 40%, is hugely significant as a first step of analysis.
I suggest you might find conversations here on HN more productive if you soften your tone a bit. Saying things like "this is just an assertion that you're making" or "this is a meaningless arrangement of words" is not generally going to make people want to respond to you.
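The "identical lines" first pass mentioned above could be computed roughly as follows. The thread does not say how the 98.7% figure was actually measured, so this is only one plausible way to do it, and `normalized_lines`, `shared_line_fraction`, and the two toy sources are hypothetical. As noted above, a low number is only a first step; the matching lines themselves still need to be read.

```python
def normalized_lines(source: str) -> set[str]:
    """Strip whitespace and drop blank lines so formatting differences don't count."""
    return {line.strip() for line in source.splitlines() if line.strip()}


def shared_line_fraction(old: str, new: str) -> float:
    """Fraction of the new file's distinct lines that also appear in the old file."""
    new_lines = normalized_lines(new)
    if not new_lines:
        return 0.0
    return len(new_lines & normalized_lines(old)) / len(new_lines)


# Toy inputs: only the import line is common to both.
old_src = "import sys\nx = 0\nfor b in data:\n    x += b\nreturn x\n"
new_src = "import sys\ntotal = sum(data)\nreturn total\nprint(total)\n"

fraction = shared_line_fraction(old_src, new_src)
print(f"{fraction:.0%} of new lines also appear in the old file")
```

Exact line matching is deliberately crude: it misses renamed variables and reordered code, and it counts trivial lines such as imports, so it can both understate and overstate real overlap.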
Comment by gspr 1 day ago
Some might hold that we've granted persons certain exemptions, on account of them being persons. We do not have to grant machines the same.
> In copyright terms, it's such an extreme transformative use that copyright no longer applies.
Has the model really performed an extreme transformation if it is able to produce the training data near-verbatim? Sure, it can also produce extremely transformed versions, but is that really relevant if it holds within it enough information for a (near-)verbatim reproduction?
Comment by NewsaHackO 1 day ago
I feel as though, from an information-theoretic standpoint, it can't be possible that an LLM (which is almost certainly <1 TB big) can contain any substantial verbatim portion of its training corpus, which includes audio, images, and videos.
Comment by gspr 23 hours ago
It doesn't need to for my argument to make sense. It's a problem if it reproduces a single copyrighted work (near)-verbatim. Which we have plenty of examples of.
Comment by NewsaHackO 19 hours ago
Comment by gspr 17 hours ago
Comment by NewsaHackO 12 hours ago
Comment by crazygringo 1 day ago
No we don't have to, but so far we do, because that's the most legally consistent. If you want to change that, you're going to need to pass new laws that may wind up radically redefining intellectual property.
> Has the model really performed an extreme transformation if it is able to produce the training data near-verbatim?
Of course it has, if the transformation is extreme, as it appears to be here. If I memorize the lyrics to a bunch of love songs, and then write my own love song where every line is new, nobody's going to successfully sue me just because I can sing a bunch of other songs from memory.
Also, it's not even remotely clear that the LLM can produce the training data near-verbatim. Generally it can't, unless it's something that it's been trained on with high levels of repetition.
Comment by munk-a 1 day ago
> you're going to need to pass new laws that may wind up radically redefining intellectual property
You're correct that this is one route to resolving the situation, but I think it's reasonable to lean more strongly into the original intent of intellectual property laws to defend creative works as a manner to sustain yourself that would draw a pretty clear distinction between human creativity and reuse and LLMs.
Comment by crazygringo 1 day ago
But you're missing the other half of copyright law, which is the original intent to promote the public good.
That's why fair use exists, for the public good. And that's why the main legal argument behind LLM training is fair use -- that the resulting product doesn't compete directly with the originals, and is in the public good.
In other words, if you write an autobiography, you're not losing significant sales because people are asking an LLM about your life.
Comment by Copyrightest 1 day ago
BTW in 2023 I watched ChatGPT spit out hundreds of lines of F# verbatim from my own GitHub. A lot of people had this experience with GitHub Copilot. "98.7% unique" is still a lot of infringement.
Comment by crazygringo 1 day ago
That's not relevant, because you can still sue the person using the LLM and publishing the repository. Legal liability is completely unchanged.
Comment by alterom 1 day ago
It's changed completely, from your own example.
If you commission art from an artist who paints a modified copy of Warhol's work, the artist is liable (even if you keep that work private, for personal use).
If you commission it from OpenAI (by sending a query to their ChatGPT API), by your argument, you are the person liable — and OpenAI is off the hook even if that work is distributed further.
I'm not going to argue about the merits of creativity here, or that someone putting a prompt into ChatGPT considers themselves an artist.
That's irrelevant. The work is created on OpenAI servers, by the LLMs hosted there, and is then distributed to whoever wrote the prompt.
Models run locally are distributed by whoever trained them.
If you train a model on whatever data you legally have access to, and produce something for yourself, it's one thing.
Distribution is where things start to get different.
Comment by crazygringo 1 day ago
Let's distinguish two different scenarios here:
1) Your prompt is copyright-free, but the LLM produces a significant amount of copyrighted content verbatim. Then the LLM is liable, and you too are liable if you redistribute it.
2) Your prompt contains copyrighted data, and the LLM transforms it, and you distribute it. Then if the transformation is not sufficient, you are liable for redistributing it.
The second example is what I'm referring to, since the commercial LLM's are now very good about not reproducing copyrighted content verbatim. And yes, OpenAI is off the hook from everything I understand legally.
Your example of commissioning an artist is different from LLM's, because the artist is legally responsible for the product and is selling the result to you as a creative human work, whereas an LLM is a software tool and the company is selling access to it. So the better analogy is if you rent a Xerox copier to copy something by Warhol. Xerox is not liable if you try to redistribute that copy. But you are. So here, Xerox=OpenAI. They are not liable for your copyrighted inputs turning into copyrighted outputs.
Comment by alterom 1 day ago
It isn't.
One analogy in that case would be going to a FedEx copy center and asking the technician to produce a bunch of copies of something.
They absolve themselves of liability by having you sign a waiver certifying that you have complete rights to the data that serves as input to the machine.
In case of LLMs, that includes the entire training set.
Comment by Copyrightest 1 day ago
Comment by crazygringo 1 day ago
In scenario (1) the LLM is plagiarizing. But that's not the scenario we're discussing. And I already said, this is where the LLM is liable. Whether a user should be too is a different question.
But scenario (2) is what I'm discussing, as I already explained, and it's very possible to tell, because you yourself submitted the copyrighted content. All you need to do is look at whether the output is too similar to the input.
If there's some scenario where you input copyrighted material and it transforms it into different material that is also copyrighted by someone else... that is a pretty unlikely edge case.
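For scenario (2), the "is the output too similar to the input" check can be approximated mechanically with Python's standard `difflib`. This is a rough sketch under obvious assumptions: a character-level ratio is not a legal test of substantial similarity, and the `similarity` helper and sample strings are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(submitted: str, produced: str) -> float:
    """Return a ratio between 0 and 1 describing how much of the two texts matches."""
    return SequenceMatcher(None, submitted, produced).ratio()


submitted = "def add(a, b):\n    return a + b\n"
near_copy = "def add(x, y):\n    return x + y\n"   # same code, variables renamed
rewrite = "from operator import add\n"             # same behavior, different expression

print(similarity(submitted, near_copy))
print(similarity(submitted, rewrite))
```

The near copy scores much higher than the rewrite, which is the kind of signal a user could look at before redistributing the output. Where to draw the line is a judgment call that no threshold in code can make.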
Comment by satvikpendem 1 day ago
Comment by NewsaHackO 1 day ago
Comment by munk-a 1 day ago
Comment by NewsaHackO 1 day ago
Comment by jazzyjackson 1 day ago
“”” Section 107 calls for consideration of the following four factors in evaluating a question of fair use:
Purpose and character of the use, including whether the use is of a commercial nature or is for nonprofit educational purposes: Courts look at how the party claiming fair use is using the copyrighted work, and are more likely to find that nonprofit educational and noncommercial uses are fair. This does not mean, however, that all nonprofit education and noncommercial uses are fair and all commercial uses are not fair; instead, courts will balance the purpose and character of the use against the other factors below. Additionally, “transformative” uses are more likely to be considered fair. Transformative uses are those that add something new, with a further purpose or different character, and do not substitute for the original use of the work.
Nature of the copyrighted work: This factor analyzes the degree to which the work that was used relates to copyright’s purpose of encouraging creative expression. Thus, using a more creative or imaginative work (such as a novel, movie, or song) is less likely to support a claim of a fair use than using a factual work (such as a technical article or news item). In addition, use of an unpublished work is less likely to be considered fair.
Amount and substantiality of the portion used in relation to the copyrighted work as a whole: Under this factor, courts look at both the quantity and quality of the copyrighted material that was used. If the use includes a large portion of the copyrighted work, fair use is less likely to be found; if the use employs only a small amount of copyrighted material, fair use is more likely. That said, some courts have found use of an entire work to be fair under certain circumstances. And in other contexts, using even a small amount of a copyrighted work was determined not to be fair because the selection was an important part—or the “heart”—of the work.
Effect of the use upon the potential market for or value of the copyrighted work: Here, courts review whether, and to what extent, the unlicensed use harms the existing or future market for the copyright owner’s original work. In assessing this factor, courts consider whether the use is hurting the current market for the original work (for example, by displacing sales of the original) and/or whether the use could cause substantial harm if it were to become widespread. “””
Comment by NewsaHackO 1 day ago
Comment by paxys 1 day ago
The test for infringement is if the output is transformative enough, and that is what NYT vs OpenAI etc. are arguing.
Comment by steve_gh 22 hours ago
Comment by kelseyfrog 1 day ago
Sec has a deny by default policy. Eng has a use-more-AI policy. Any code written in-house is accepted by default. You can see where this is going.
We've been using AI to reimplement tooling that security won't approve. The incentives conspired in the worst outcome, yet here we are. If you want a different outcome, you need to create different incentives.
Comment by kemitchell 1 day ago
There is a fundamental corpo-cognitive dissonance, to boot. If "AI" is cheap enough and good enough to implement security-relevant software from `git init` repeatedly, why isn't it also cheap enough and good enough to assess and approve the security of third-party software at pace with internal adoption? Is there some basis to believe LLMs' leverage on production differs from their leverage on analysis of existing code?
Comment by PaulDavisThe1st 1 day ago
If he is claiming to have been somehow substantively "enough" involved to make the code copyrightable, then his own familiarity with the previous LGPL implementation makes the new one almost certainly a derivative of the original.
Comment by sigmar 1 day ago
The "clean room rewrite" is just an extreme way to have a bulletproof shield against litigation. Not doing it that way doesn't automatically make all new code he writes derivative solely because he saw how the code worked previously.
Comment by PaulDavisThe1st 1 day ago
And if he was in fact more involved (which he appears to deny) then it's a bit weak to say that someone with huge familiarity with chardet could choose to reimplement chardet without the result being derivative.
Comment by serial_dev 1 day ago
Comment by vbarrielle 1 day ago
Comment by heavyset_go 1 day ago
Comment by justinclift 1 day ago
Sure, but neither of those is an IP Lawyer.
The actual IP Lawyer who turned up and tried to engage, Richard Fontana, had his issue closed:
https://github.com/chardet/chardet/issues/334
Richard's point was this (quoted below):
---
FWIW, that case is not really relevant to what we are/were talking about here.
The question is whether you are truly an "author", or whether there was no (human) author.
The general legal consensus has been that generative AI output is not copyrightable (without some special facts of some sort, perhaps).
> If all of this code was somehow not copyrightable because someone wrote a prompt instead of directly editing the code, that would have pretty huge implications.
That's exactly it. Your act of applying the MIT license with your copyright notice to code that you did not "directly edit" has enormous implications.
Comment by RcouF1uZ4gsC 1 day ago
I think it is more like photography.
The case law is that a camera can't own a copyright, but a human can, even though all the pixels were produced by the camera with very little involvement at the pixel level by the human.
Comment by ryukoposting 22 hours ago
https://www.reuters.com/legal/government/us-supreme-court-de...
Prompting generally does not constitute authorship under US law.
Comment by RcouF1uZ4gsC 16 hours ago
Comment by waterTanuki 1 day ago
Comment by largbae 1 day ago
1. The cost continues to trend to 0, and _all_ software loses value and becomes immediately replaceable. In this world, proprietary, copyleft and permissive licenses do not matter, as I can simply have my AI reimplement whatever I want and not distribute it at all.
2. The coding cost reduction is all some temporary mirage, to be ended soon by drying VC money/rising inference costs, regulatory barriers, etc. In that world we should be reimplementing everything we can as copyleft while the inferencing is good.
Comment by sarchertech 1 day ago
Comment by anonymous_sorry 1 day ago
Comment by dathinab 1 day ago
but AI-assisted code has an author, and claiming it's AI-assisted even if it is fully AI-built is trivial (if you don't make it public that you didn't do anything)
also some countries have laws which treat it like a tool in the sense that the one who used it is the author by default AFAIK
Comment by aoeusnth1 13 hours ago
Comment by rstuart4133 19 hours ago
There would be no GPL if anybody could have cheaply and trivially reproduced the software for printers and Lisp machines Stallman was denied access to. There is no reason to force someone to give you the source code if it takes no effort to reproduce.
Mind you, that isn't what happened here. The effort involved in getting a LLM to write software comes from three things: writing a clear unambiguous spec that also gives you a clean exported API, more clean unambiguous specs for the APIs you use, and a test suite the LLM can use to verify it has implemented the exported API correctly. Dan got them all for free, from the previous implementation which I'm sure included good documentation. That means his contribution to this new code consisted of little more than pressing the button.
Sadly, if you wrote some GPL software with excellent documentation, a thorough test suite, and a clean API, implemented using well-understood libraries, the cost of creating a cleanroom reproduction has indeed gone to near zero over the past 24 months. The GPL licence is irrelevant.
Welcome to the brave new world.
PS: Sqlite keeping their test suite proprietary is looking like a prescient masterstroke.
PPS: The recent ruling that an API isn't copyrightable just took on a whole new dimension.
Comment by beepbooptheory 1 day ago
More and more I am drawn to these kinds of ideas lately, perhaps as a kind of ethical sidestep, but still:
- https://wiki.xxiivv.com/site/permacomputing.html
It's not going to solve any general issue here, but the one thing these freaks need that can't be generated by their models is energy, tons of it. So, the one thing I can do as an individual and in my (digital) community is work to be, in a word, self-sustainable. And depending on my company I guess, if I was a CEO I would hope I was wise enough to be thinking on the same lines.
Everyone is making beautiful mountains from paper and wire. I will just be happy to make a small dollhouse of stone, I think it will be worth it. How can we see not just at least some small-level of hubris otherwise?
Comment by casey2 1 day ago
Comment by largbae 1 day ago
1. An LLM recreating a piece of software violates its copyright and is illegal, in which case LLM output can never be legally used because someone somewhere probably has a copyright on some portion of any software that an LLM could write.
2. You read my example as "copying a project without distributing it", vs. "having an LLM write the same functionality just for me"
Comment by foresto 1 day ago
> He fed only the API and the test suite to Claude and asked it to reimplement the library from scratch.
From GPL2:
> The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable.
Is a project's test suite not considered part of its source code? When I make modifications to a project, its test cases are very much a part of that process.
If the test suite is part of this library's source code, and Claude was fed the test suite or interface definition files, is the output not considered a work based on the library under the terms of LGPL 2.1?
Comment by crazygringo 1 day ago
Legally, using the tests to help create the reimplementation is fine.
However, it seems possible you can't redistribute the same tests under the MIT license. So the reimplementation MIT distribution could need to be source code only, not source code plus tests. Or, the tests can be distributed in parallel but still under LGPL, not MIT. It doesn't really matter since compiled software won't be including the tests anyways.
Comment by foresto 1 day ago
I'm not following your logic there, and I don't see any mention of "transformative" in the license. Can you explain what you mean?
Comment by crazygringo 1 day ago
And so, a work being sufficiently transformative is one way in which copyright no longer applies, but that's not the case here specifically. The specific case here is essentially just a clean-room reimplementation (though technically less "clean", but still presumably the same legally). But the end result is still a completely different expression of underlying non-copyrightable ideas.
And in both cases, it doesn't matter what the original license was. If a resulting work is sufficiently transformative or a reimplementation, copyright no longer applies, so the license no longer applies.
Comment by foresto 1 day ago
The library's test suite and interfaces were apparently used directly, not transformed. If either of those are considered part of the library's source code, as the license's wording seems to suggest, then I think output from their use could be considered a work based on the library as defined in the license.
Comment by crazygringo 1 day ago
Google LLC v Oracle America assumed (though didn't establish) that APIs are copyrightable... BUT that developing against them falls under fair use, as long as the function implementations are independent.
Test suites are again generally considered copyrightable... but the behavior being tested is not.
So no, it's not considered to be a work based on the library. This seems pretty clear-cut in US law by now.
Also, the LGPL text doesn't say "work based on the library". It says "If you modify a copy of the Library", and this is not a "combined work" either. And the whole point is that this is not a modified copy -- it's a reimplementation.
In theory, a license could be written to prevent its tests from being run against software not derived from the original, i.e. clean-room reimplementations. In practice, it remains dubious whether any court would uphold that. And it would also be trivial to then get around it, by taking advantage of fair use to re-implement the tests in e.g. plain English (or any specification language), and then re-implementing those back into new test code. Because again, test behaviors are not copyrightable.
Comment by heavyset_go 1 day ago
That was only one prong of the four fair use considerations in that case. Look at Breyer's opinion, it does not say that copying APIs is fair use if implementations are independent, just that Google's specific usage in that instance met the four fair use considerations.
There are likely situations in which copying APIs is not fair use even if function implementations are independent, Breyer looked at substantiality of the code copied from Java, market effects and purpose and character of use.
If your goal is to copy APIs, and those APIs make up a substantial amount of code, and reimplement functions in order to skirt licenses and compete directly against the source work, or replace it, those three considerations might not be met and it might not be fair use. Breyer said Google copied a tiny fraction of code (<1%), its purpose was not to compete directly with Oracle but to build a mobile OS platform, and Google's reimplementation was not considered a replacement for Java.
Comment by foresto 1 day ago
It does, about a dozen times.
Are you perhaps referring to LGPL3? I think the license under discussion here is LGPL2.1.
https://github.com/chardet/chardet/blob/6.0.0/LICENSE
I'm not well versed in copyright case law, so I won't argue with the rest of what you wrote. Thanks for elaborating on your thoughts.
Comment by crazygringo 1 day ago
Comment by GardenLetter27 1 day ago
Software patents would work as you describe, but not copyright.
Comment by tty456 1 day ago
Comment by vbarrielle 1 day ago
Regarding chardet, I'm not sure "I wanted to circumvent the license" is a good way to argue fair use.
Comment by nickcoffee 1 hour ago
Comment by ticulatedspline 1 day ago
It also doesn't talk about the far more interesting philosophical question. Does what Blanchard did cover ALL implementations from Claude? What if anyone did exactly what he did, feed it the test cases and say "re-implement from scratch", ostensibly one would expect the results to be largely similar (technically under the right conditions deterministically similar)
could you then fork the project under your own name and a commercial license? when you use an LLM like this, to basically do what anyone else could ask it to do how do you attach any license to it? Is it first come first serve?
If an agent is acting mostly on its own it feels like if you found a copy of Harry Potter in the fictional library of Babel, you didn't write it, just found it amongst the infinite library, but if you found it first could you block everyone else that stumbles on a near-identical copy elsewhere in the library? or does each found copy represent a "Re-implementation" that could be individually copyrighted?
Comment by rob74 18 hours ago
Another question which as far as I can see isn't addressed in the article: even if you accept that the AI-driven reimplementation is an independent new work, can you (even as a maintainer) simply "hijack" the old LGPL-licensed project and overwrite it (if the new code is 98,7% different from the existing code, it's essentially overwriting) with your MIT-licensed code? You're free to start a new MIT-licensed project with your reimplementation, but putting the new code into the old project like some kind of cuckoo's egg seems wrong to me...
Comment by sombragris 3 hours ago
However, the purported reimplementations did not usurp the names of the reimplemented product. Reimplementing chardet using AI and insisting on calling the product the same as old chardet with a new version number and a new license is, I think, not exactly honest. At least he should have used something like "chardet-ng", "chardet-fresh", or whatever, and a completely different source tree.
Comment by lukev 1 day ago
But a point that was not made strongly, which highlights this even more, is that this goes in every direction.
If this kind of reimplementation is legal, then I can take any permissive OSS and rebuild it as proprietary. I can take any proprietary software and rebuild it as permissive. I can take any proprietary software and rebuild it as my own proprietary software.
Either the law needs to catch up and prevent this kind of behavior, or we're going to enter an effectively post-copyright world with respect to software. Which ISN'T GOOD, because that will disincentivize any sort of open license at all, and companies will start protecting/obfuscating their APIs like trade secrets.
Comment by integralid 1 day ago
Companies can take open-source software and make a proprietary reimplementation. You can't take a proprietary software and make an open source GPL version.
I am absolutely certain that if you tried you would be sued to oblivion. But a big company screwing up open source is not even news anymore. In fact, I (still) believe that the fact that it's considered OK to use LLM code in proprietary projects, even though LLMs were trained on tons of GPL, AGPL, or even unlicensed software, is an example of just that.
Comment by lukev 1 day ago
Comment by martin-t 1 day ago
Crazy that only now we're seeing a bunch of articles coming to the same conclusion.
I think copyright should still apply, but if it doesn't, we need new laws - ones which protect all human work, creative or not. Laws should serve and protect people, not algorithms and not corporations "owning" those algorithms.
I put owning in quotes because ownership should go to the people who did the work.
Buying/selling ownership of both companies and people's work should be illegal just like buying/selling whole humans is. Even if it took thousands of years to get here.
Money should not buy certain things because this is the root cause of inequality. Rich people are not getting richer at a faster rate by being more productive than everyone else but by "owning" other people's work and using it as leverage to extract even more from others.
Maybe LLM and mass unemployment of white collar workers will be the wakeup call needed for a reform. Or revolution.
Last time this happened was during the second industrial revolution and that's how communism got popular. We should do better this time because this is the last revolution which might be possible.
Comment by drnick1 1 day ago
Comment by wolvesechoes 1 day ago
Everything for memory safety.
Comment by bananamogul 1 day ago
Within a relatively short time frame, expect everything in your Linux distro other than the kernel to be MIT-licensed because everything that is FSF-maintained will be rewritten in Rust with the MIT license.
The kernel will then be next, though it'll take a longer timeframe.
The GPL just didn't win in the marketplace of ideas.
Comment by wolvesechoes 20 hours ago
Stallman's proposal is the opposite of ideology; it is a conscious political project. And thus it is failing.
Comment by phendrenad2 1 day ago
Comment by miggol 1 day ago
When I first read about the chardet situation, I was conflicted but largely sided on the legal permissibility side of things. Uncomfortably I couldn't really fault the vibers; I guess I'm just liberal at heart.
The argument from the commons has really invoked my belief in the inherent morality of a public good. Something being "impermissible" sounds bad until you realize that otherwise the arrow of public knowledge suddenly points backwards.
Seeing this example play out in real life has had retroactive effects on my previously BSD-aligned brain. Even though the argument itself may have been presented before, I now understand the morals that a GPL license text underpins better.
Comment by crdrost 1 day ago
BSD-type stuff is very simple because it says "here is this stuff. you can use it as long as you promise not to sue me. I promise not to sue you too."
Very simple.
GPL-type stuff is intrinsically more complex because it's trying to use the threatening power of lawsuits, to reduce overall IP lawsuits. So it has to say "Here is this stuff. You can use it as long as you promise not to sue me. I am only going to sue you, if you start pretending like you have the right to sue other folks over this stuff or anything you derive from it. You don't have the right to sue others for it, I made it, so please stop pretending and let's stop suing each other over this sort of thing."
Getting the entire legal nuance around that sort of counterfactual "I will only sue you if you try to pretend that you can sue others" is why they're more complex. And the simplest copyleft licenses like the Mozilla Public License have a very rigid notion of what "the software" is, so like for MPL it's "this file is gonna never be used in a lawsuit, you can edit it ONLY as long as you agree that this file must never be used by you to sue someone else, if you try to mutate it in a way that lets you sue someone else then that's against our agreement and we reserve the right to sue you."
Whereas for GPL it's actually kind of nebulous what "the software" is -- everything that feeds into the eventual compiled binary, basically -- and so the license itself needs to be a little bit airy-fairy, "let's first talk about what conveying the software means...", in various ways.
The interesting thing here is that as far as the courts are initially ruling, these from-scratch reimplementations are not human works and therefore are not copyrightable, which makes them all kind of public domain. Slapping the MIT license on it was an overstep. If that's how things go then Free Software has actually won its greatest sweep with LLM ubiquity.
Comment by derangedHorse 1 day ago
Comment by wccrawford 1 day ago
This whole article is just complaining that other people didn't have the discussion he wanted.
Ronacher even acknowledged that it's a different discussion, and not one they were trying to have at the moment.
If you want to have it, have it. Don't blast others for not having it for you.
Comment by wizzwizz4 1 day ago
> But law only says what conduct it will not prevent—it does not certify that conduct as right. Aggressive tax minimization that never crosses into illegality may still be widely regarded as antisocial. A pharmaceutical company that legally acquires a patent on a long-generic drug and raises the price a hundredfold has not done something legal and therefore fine. Legality is a necessary condition; it is not a sufficient one.
Comment by amarant 1 day ago
It might even be morally abhorrent to have such a discussion in the first place!
Comment by wizzwizz4 13 hours ago
Comment by orthoxerox 16 hours ago
Comment by spiffyk 14 hours ago
Comment by antonio-mello 1 day ago
This creates an odd situation where the "reimplementation via AI" concern cuts both ways. If someone feeds my MIT repo to an LLM and gets a copyleft-violating derivative, that's one problem. But if I use an LLM trained on copyleft code to write my MIT-licensed tool, am I the one laundering licenses without knowing it?
I think the article's core point holds: legitimacy and legality are diverging fast. The open source community built norms around intent and reciprocity, and those norms are now being stress-tested by tools that can reimplement anything from a spec. No license text can fully encode "don't be a free rider."
Comment by ineedasername 1 day ago
The fundamental problem is that once you take something outside the realm of law and rule of law in its many facets as the legitimizing principle, you have to go a whole lot further to be coherent and consistent.
You can’t just leave things floating in a few ambiguous things you don’t like and feel “off” to you in some way- not if you’re trying to bring some clarity to your own thoughts, much less others. You don’t have to land on a conclusion either. By all means chew over things, but once you try to settle, things fall apart if you haven’t done the harder work of replacing the framework of law with that of another conceptual structure.
You need to at least be asking “to what ends? What purpose is served by the rule?” Otherwise you’re stuck in things where half the time you end up arguing backwards in ways that put purpose serving rules, the maintenance of the rule with justifications ever further afield pulled in when the rule is questioned and edge cases reached. If you’re asking, essentially, “is the spirit of the rule still there?” You’ve got to stop and fill in what that spirit is or you or people that want to control you or have an agenda will sweep in with their own language and fill the void to their own ends.
Comment by skybrian 1 day ago
Copyleft could be seen as an attempt to give Free Software an edge in this competition for users, to counter the increased resources that proprietary systems can often draw on. I think success has been mixed. Sure, Linux won on the server. Open source won for libraries downloaded by language-specific package managers. But there’s a long tail of GPL apps that are not really all that appealing, compared to all the proprietary apps available from app stores.
But if reimplementing software is easy, there’s just going to be a lot more competition from both proprietary and open source software. Software that you can download for free that has better features and is more user-friendly is going to have an advantage.
With coding agents, it’s likely that you’ll be able to modify apps to your own needs more easily, too. Perhaps plugin systems and an AI that can write plugins for you will become the norm?
Comment by jacquesm 1 day ago
It was due to access.
Comment by AndriyKunitsyn 1 day ago
Comment by ddellacosta 1 day ago
Comment by tmp10423288442 1 day ago
Looks like Wikipedia has an example of Traditional Chinese vertical layout with the Latin letters rotated as in TFA's layout (https://en.wikipedia.org/wiki/Horizontal_and_vertical_writin...)
Comment by effank 1 day ago
Our legal and ethical frameworks including both copyleft and permissive licenses operate under the illusion of discrete, bounded attribution. They assume we can draw a clean perimeter around 'the code' and its 'author.' In reality, software production is a highly complex socio-technical network characterized by deep epistemic opacity. We are arguing over who holds the title to the final output while completely ignoring the vast, distributed network of inputs that made it possible.
Furthermore, because end-users face massive transaction costs and a general lack of incentive to evaluate the granular utility of their consumption, we have no reliable market mechanism to signal value back up the supply chain. Consequently, we fail to effectively compensate the true chain of biological and artificial contributors that facilitate downstream consumption.
In a rigorously mapped value-system, attribution would not stop at the keyboard; it would extend to all nodes of enablement. This includes what sociologists and economists term 'reproductive labor' or 'invisible labor' such as the developer’s partner who cooked them breakfast, thereby sustaining the biological and cognitive infrastructure necessary for the developer to contribute to the repository in the first place. The AI model is merely another node of aggregated external labor in this exact same web - both by its upward 'training' and downward utilization.
Until we develop an economic and technological ontology capable of tracing and rewarding this entire ecosystem of adjacent contributions, our debates over LGPL versus MIT will remain myopic. We are trying to govern a distributed, interconnected web of collective labor using property tools designed for solitary craftsmen.
Comment by waterproof 23 hours ago
Comment by kazinator 1 day ago
Think about it; the license says that copies of the work must be reproduced with the copyright notice and licensing clauses intact. Why would anyone obey that, knowing it came from AI?
Countless instances of such licenses were ignored in the training data.
Comment by harshreality 1 day ago
A lego sculpture is copyrighted. Lego blocks are not. The threshold between blocks and sculpture is not well-defined, but if an AI isn't prompted specifically to attempt to mimic an existing work, its output will be safely on the non-copyrighted side of things.
A derivative work is separately copyrightable, but redistribution needs permission from the original author too. Since that usually won't be granted or would be uneconomical, the derivative work can't usually be redistributed.
AI-produced material is inherently not copyrightable, but not because it's a derivative work.
Comment by kazinator 1 day ago
I dispute the idea that token sequences reproduced from the model are not derived works.
I predict, no pun intended, that a time is coming when the idea that it's not a derived work will be challenged in mainstream law.
The slop merchants are getting a free ride for the time being.
Comment by harshreality 1 day ago
As you said, it's lossy. Try it with any other distinctive but non-famous passage, and you won't get a correct prediction for the immediately following clause, much less for multiple sentences or paragraphs.
That's the case even when an LLM correctly identifies which book the prompted text is from. It still won't accurately continue on from some arbitrary passage. By the time you ask it to reproduce hundreds of words, you're into brand new book territory. Even when it's slop content, it's distinct slop.
The exceptions are cases where a significant number of humans would also know a particular quote from memory. Then, chances are, a frontier LLM will too.
You know how else you can reproduce a quote? Search for it on google, and search the resulting top hits; if it's a significant quote, multiple people have probably quoted it -- legally. You can also search a pirate library for the actual book, and search the book for the quote; while illegal, it's very simple to do, so unless you propose to make the free and open internet illegal, I'd suggest that banning LLMs for being "derivative work" creation engines is not so different from destroying the internet.
> I predict, no pun intended, that a time is coming when the idea that it's not a derived work will be challenged in mainstream law.
If judges have any sense whatsoever, LLM generations (without specific prompt crafting to mimic existing works) will be judged to not be derived works and therefore not be violating copyright, in the same sense that you can live and breathe Taylor Swift's music, create new music in the same style, and still not be violating copyright.
The Stability AI case, and how Judge Orrick deals with it, will be interesting and uninteresting at the same time. It deals primarily with the fact that after specific prompting, an image-generation AI can generate something fairly close to existing copyrighted images. That doesn't say anything more about whether LLMs are inherently producers of [only or primarily] derivative works, just as the fact that a human can violate copyright doesn't say anything about whether humans primarily or exclusively output derivative works.
More likely, perhaps, is that everything will be so infused with LLM output that copyright ceases to be relevant, or forces copyright law to be rewritten from the ground up.
Copyright requirements, even prior to LLMs, weren't well-specified. There's no objective threshold for how close something has to be to a previous work before the new one violates copyright. It's whatever a judge thinks, referring to the 4-factor test but ultimately making subjective judgements about each of those prongs. It's all a house of cards, and LLMs may just be what topples it.
Comment by kazinator 11 hours ago
I predict that the LLM will be regarded as a binary-like machine translation of the source materials.
Lossiness is a red herring. You can't claim that a JPEG photograph doesn't violate copyright because JPEG is lossy.
Comment by moralestapia 1 day ago
Comment by joshjob42 1 day ago
Comment by duskdozer 1 day ago
Comment by dataflow 23 hours ago
Comment by zakki 1 day ago
Comment by jillesvangurp 1 day ago
So, you could argue that people are using double standards here a bit. It's fine when people take proprietary software and create GPL versions of it. But it's not OK when people take GPL software and create permissively licensed or proprietary versions of it. That's of course not how copyright actually works. The reason all of this is OK is that copyright allows you to do this thing. This isn't some kind of loophole that needs closing but an essential feature of copyright.
The friction here, and a common misunderstanding about how copyright works, is that you don't copyright ideas but the form or expression of something. Making a painting of a photograph is not a copyright violation. Same idea, different expression. Patents are for protecting ideas. Trademarks are for protecting brands. Some companies have managed to trademark certain color codes even, which is controversial.
There's a lot of legal history for interpretation of what is and isn't "fair use" under copyright of course. It gets much more complicated if you also consider international law and how copyright works in different countries. But people being able to make reasonable use of copyrighted material always was essential to the notion of having it to begin with.
The reason we can have music that uses samples from other people's music without that being a copyright violation is exactly this fair use. In the same way, you can quote from books and create funny memes based on movie fragments. Or create new theater plays, movies, etc. reinterpreting works of others. All legal, up to a point. If you copy too much it stops being fair use and starts being plagiarism.
With software copyright violations, you have to prove that substantial parts of the software were lifted verbatim. Lawyers and judges look at this in terms of how they would apply it to a plagiarism case. Literally - software doesn't get special treatment under copyright. Copyright long predates the existence of software and computers and did not change in any material way after that was invented.
Comment by pu_pe 19 hours ago
Our legal framework wasn't built for a situation where reimplementing complex software is trivial, much less almost completely automated.
Comment by derangedHorse 1 day ago
Ridiculous. I don't want specifications for proprietary APIs to be protected, and I don't want the free ones to be either. The software community seemed pretty certain as a whole that this would be very bad for competition [1].
Morally, I don't think there's anything wrong with re-implementing a technology with the same API as another, or running a test suite from a GPL licensed codebase. The code wasn't stolen, it was capitalized on. Like a business using a GPL code editor to write a new one.
> This is not a restriction on sharing. It is a condition placed on sharing
Also this doesn't make any logical sense. A condition on sharing cannot exist without corresponding restrictions.
[1] https://www.reddit.com/r/Android/comments/mklieg/supreme_cou...
Comment by motbus3 20 hours ago
You can copy the idea and not use the source code. This has been ruled ok many times already and would be quite dangerous if that was not the case.
But this is not what this is. To generate the new program, another program, the AI, must have an input which then becomes part of the program itself. It does not really matter much if the generation does not contain the source code itself or a similar reimplementation. One could rewrite a full version of the Lord of the Rings changing all the words but having the same elements, it would still be plagiarism. No reason to think this is not the case here. It is evident that the source code was the base, hence, this is a derived work.
Comment by Sleaker 1 day ago
Edit: looks like an IP lawyer had this exact issue on the GitHub and it was closed.
Comment by bjt 1 day ago
This is an interesting reversal in itself. If you make the specification protected under copyright, then the whole practice of clean room implementations is invalid.
Comment by t43562 1 day ago
So of course we feel that something wrong has happened even if it's not easy to put one's finger on it.
Comment by smsm42 1 day ago
I don't see how it matters what he looked at. If I took copyrighted code and ran it through a script that replaces all variable names, and then claimed copyright on the result because it's an entirely new work and I did not look at the original work, I'd be ridiculed and sued, and would lose that lawsuit. AI is a more complex machine, but still a machine. If you feed somebody's work into a machine, what comes out is a derivative work.
Test suite is a part of copyrighted code, is it not? If he used just the API description, preferably from a copyright-clean source, then we could claim new work (regardless of how it was produced, by using Claude or trained pigeons or by consuming magic mushrooms). But once parts of the copyrighted code had been used, it becomes derivative work.
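The variable-renaming script mentioned above can be sketched in a few lines, which shows how mechanical such a transformation is. This is an illustration only, assuming Python 3.9+ for `ast.unparse`; the `mean` function and the `RenameVariables` class are made up for the example and are not taken from any project discussed here.

```python
import ast


class RenameVariables(ast.NodeTransformer):
    """Rename every variable and argument to v0, v1, ... by first appearance."""

    def __init__(self):
        self.mapping = {}

    def _new_name(self, old):
        if old not in self.mapping:
            self.mapping[old] = f"v{len(self.mapping)}"
        return self.mapping[old]

    def visit_arg(self, node):
        # Function parameters.
        node.arg = self._new_name(node.arg)
        return node

    def visit_Name(self, node):
        # Every other identifier. A real tool would have to skip builtins
        # and imported names; this toy input uses none.
        node.id = self._new_name(node.id)
        return node


# Made-up "original work" for the demonstration.
ORIGINAL = """
def mean(values, fallback):
    total = 0
    count = 0
    for item in values:
        total = total + item
        count = count + 1
    if count == 0:
        return fallback
    return total / count
"""


def rename(source: str) -> str:
    """Parse the source, rename its identifiers, and print it back out."""
    tree = RenameVariables().visit(ast.parse(source))
    return ast.unparse(tree)


RENAMED = rename(ORIGINAL)
```

The renamed text shares none of the original's variable names, yet the statement structure and the behavior are unchanged. That is the intuition behind calling such output derivative; where a court would actually draw the line for far less mechanical transformations is not something this sketch can settle.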
Comment by metalcrow 1 day ago
I'm not sure that's true, legally speaking. If you fed it into a PRNG, the output seems to me like it would not be an obviously derivative work (i doubt you could copyright it but that's a separate question). So we have 1 machine that can transform something into non-derivative work, and another that leaves the result derivative. The line isn't likely going to be drawn as "did a machine do it or not", but on a fuzzy human line of how close the output seems to be to the original (IANAL).
Comment by smsm42 7 hours ago
Comment by metalcrow 6 hours ago
Comment by wiz21c 17 hours ago
If you are 50 years old or more, the computing you were born with (you own the computer, you own the programs) will be gone. Copyleft only makes sense if you own the computer.
That makes me sad.
Comment by blurbleblurble 1 day ago
Comment by kccqzy 1 day ago
That’s just your subjective opinion, with which many other people would disagree. I bet Armin Ronacher would agree that an MIT licensed library is even freer than an LGPL licensed library. To them, the vector is running from free to freer.
Comment by tzs 1 day ago
Offering as a networked service is not distribution. That was why they had to make AGPL to put conditions on use in networked services.
Comment by tzs 1 day ago
Comment by makerofthings 1 day ago
Comment by paxys 1 day ago
Comment by makerofthings 1 day ago
Comment by pphysch 1 day ago
Comment by makerofthings 21 hours ago
Comment by j-bos 1 day ago
Nothing was stolen, not even copied, lamest piracy I've heard of.
Comment by makerofthings 1 day ago
Comment by wvenable 1 day ago
https://pbs.twimg.com/media/ENE01g6X0AA7w5r?format=jpg
Are they copies? Can all these car companies sue each other?
Comment by primenum 21 hours ago
Comment by grahamlee 1 day ago
Comment by anonymous_sorry 1 day ago
Comment by armchairhacker 1 day ago
Comment by enriquto 1 day ago
no, it isn't. The point of the GPL is to grant users of the software four basic freedoms (run, study, modify and redistribute). There's no restriction to distribution per se, other than disallowing the removal of these freedoms to other users.
Comment by grahamlee 1 day ago
Comment by waffletower 13 hours ago
Comment by niemandhier 21 hours ago
For SQLite the actual product is the test-suite and the audits.
Sure you can use the code all you like, but you only ever get past quality gates if you use the audited and provably tested version.
This becomes just more relevant in the age of ai coding, where an agent might be able to reimplement your specs.
Keep your code open, but consider moving your tests.
Comment by dleslie 1 day ago
There was an issue where Google did something similar with the JVM, and ultimately it came down to whether or not Oracle owned the copyright to the header files containing the API. It went all the way to the US supreme court, and they ruled in Google's favour; finding that the API wasn't the implementation, and that the amount of shared code was so minimal as to be irrelevant.
They didn't anticipate that in less than half a decade we'd have technology that could _rapidly_ reimplement software given a strong functional definition and contract enforcing test suite.
Comment by nicole_express 1 day ago
I like the article's point of legal vs. legitimate here, though; copyright is actually something of a strange animal to use to protect source code, it was just the most convenient pre-existing framework to shove it in.
Comment by dathinab 1 day ago
which is the actual relevant part: they didn't do that dance AFAIK
AI is a tool, they set it up to make a non-verbatim copy of a program.
Then they fed it the original software (AFAIK).
Which makes it a side-by-side copy, as in the original source was used as reference to create the new program. That tends to be seen as a derived work even if very different.
IMHO They would have to:
1. create a specification of the software _without looking at the source code_, i.e. by behavior observation (and an interface description). I.e. you give the AI access to running the program, but not to looking into the insides of it. I really don't think they did it as even with AI it's a huge pain as you normally can't just brute force all combinations of inputs and instead need to have a scientific model=>test=>refine loop (which AI can do, but can take long and get stuck, so you want it human assisted, and the human can't have inside knowledge about the program).
2. then generate a new program from the specification, and only from it. No git history, no original source code access, no program access, no shared AI state or anything like that.
Also for the extra mile of legal risk avoidance do both human assisted and use unrelated 3rd parties without inside knowledge for both steps.
While this does majorly cut the cost of a clean room approach, it still isn't cost free. And it still is a legal minefield if done by a single person, especially if they have enough familiarity to potentially remember specific pieces of code verbatim.
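To make the two steps above concrete, here is a minimal sketch, not a description of what anyone actually did with chardet. Everything in it is hypothetical: `original_detect` stands in for the existing program (only ever called, never read), `clean_detect` stands in for a reimplementation written from the recorded spec alone, and the probe inputs are made up. Step 1 records observed behavior as data; step 2 checks a candidate against that data only.

```python
import json
from typing import Callable, Iterable


def record_spec(black_box: Callable[[bytes], str], probes: Iterable[bytes]) -> str:
    """Step 1: observe behavior only and serialize input/output pairs as the spec."""
    cases = [{"input": p.hex(), "expected": black_box(p)} for p in probes]
    return json.dumps(cases)


def check_against_spec(candidate: Callable[[bytes], str], spec_json: str) -> list[str]:
    """Step 2 check: return the hex inputs where the candidate disagrees with the spec."""
    failures = []
    for case in json.loads(spec_json):
        data = bytes.fromhex(case["input"])
        if candidate(data) != case["expected"]:
            failures.append(case["input"])
    return failures


def original_detect(data: bytes) -> str:
    """Hypothetical stand-in for the original program; treated as a black box."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"


def clean_detect(data: bytes) -> str:
    """Hypothetical reimplementation, written from the recorded spec alone."""
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "utf-8"


# Made-up probe inputs; a real spec would need far broader coverage.
PROBES = [b"hello", "héllo".encode("utf-8"), b"\xff\xfe", b""]
SPEC = record_spec(original_detect, PROBES)
FAILURES = check_against_spec(clean_detect, SPEC)  # empty list when behavior matches
```

The only thing that crosses from step 1 to step 2 is the serialized `SPEC`. Whether a process like this is enough legally is a separate question; the sketch only shows the information flow.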
Comment by nicole_express 1 day ago
So my understanding was that the original code was specifically not fed into Claude. But was almost certainly part of its training data, which complicates things, but if that's fair use then it's not relevant? If training's not fair use and taints the output, then new-chardet is a derivative of a lot of things, not just old-chardet...
This is all new legal ground. I'm not sure if anyone will go to court over chardet, though, but something that's an actual money-maker or an FSF flagship project like readline, on the other hand, well that's a lot more likely.
Comment by minimaltom 1 day ago
> But was almost certainly part of its training data, which complicates things
On this point specifically, my read of the Anthropic lawsuit was that one of the precedents was that if it trains on something but does not regurgitate it, it's fair use? Might help the argument that it was clean-room but ¯\_(ツ)_/¯
Comment by RaffaelCH 1 day ago
My understanding is they did do the dance. From the article: "He fed only the API and the test suite to Claude and asked it to reimplement the library from scratch."
One could still make the argument that using the test suite was a critical contributing factor, but it is not a part of the resulting library. So in my uninformed opinion, it seems to me like the clean room argument does apply.
Comment by minimaltom 1 day ago
(1): my understanding was that a party _with access to copyrighted material_ made the functional spec, which was communicated to a party without access [1]. Under my understanding, there's no requirement for the authors of the functional spec to be 'clean'.
(2) Afaict, they limited the AI to access of just the functional spec and audited that it did not see the original source.
Edit: Not sure if sharing the 'test suite' matters, probably something for the courts in the unlikely event this ever gets there.
[1] Following the definition of clean room re implementation as it relates to US precedent, ie that described in the wikipedia page.
Comment by ball_of_lint 1 day ago
I argue more free. EULAs and restrictions on how+for what software can be used, like DRM, typically use copyright as their legal backing. GPL licenses turn that on its head but that doesn't redeem the original, flawed, law.
This seems to follow the letter but not the spirit of the license. If this does pass legal muster, we can do the same to whatever proprietary software we wish, which makes a dramatically different but IMO better ecosystem in the end.
Comment by duskdozer 1 day ago
Comment by svilen_dobrev 1 day ago
But what happens with the new things? Has the era of software-making (or creating things at large) finished, and from now on everything will be re-(gurgitated|implemented|polished) old stuff?
Or all goes back to proprietary everything.. Babylon-tower style, no one talks to no one?
edit: another view - is open-source from now on only for resume-building? "see-what-i've-built" style
Comment by stagger87 1 day ago
Comment by alterom 7 hours ago
This is exactly what the article is talking about.
Comment by danbruc 1 day ago
Comment by dwroberts 1 day ago
The answer to that, I think, is that the authors wanted to squat an existing successful project and gain a platform from it. Hence we have news cycle discussing it.
Nobody cares about a new library using AI, but squash an existing one with this stuff, and you get attention. It’s the reputation, the GitHub stars, whatever
Comment by nicole_express 1 day ago
Honestly it's a weird test case for this sort of thing. I don't think you'd see an equivalent in most open source projects.
Comment by intrasight 1 day ago
Comment by josalhor 1 day ago
Comment by jpauline 13 hours ago
Comment by 0x457 1 day ago
Morally - yes, technically - no. I think it's odd to be mad at someone doing the exact thing you praise in another case just because license isn't copyleft within license allowance. Make a better copyleft license?
Comment by colmmacc 1 day ago
1. The freedom to run the program as you wish 2. The freedom to study how it works and modify it (which requires access to source code) 3. The freedom to redistribute copies to help others 4. The freedom to distribute modified versions, so the whole community benefits from your improvements
To my mind ... GenAI coding make all of these far more realizable, especially for "normal people", than CopyLeft ever has. Let's go through them ...
Want to run a program as you wish? Great! It's easier than ever to build a replacement. Proprietary or non-free software is just as vulnerable to reimplementation as Copyleft is.
Want to study how a program works and to modify it? This is now much more achievable.
Want the freedom to redistribute copies to help others? Build your own version! It may not even be copyrightable if it's 100% generated (IANAL).
Want to distribute modified versions? yes! see previous.
I dunno; seems like generative coding can be as much a liberator as any kind of problem.
Comment by skydhash 1 day ago
People will still pay for Matlab, SolidWorks, and Maya because no one who needs those will vibe-code a solution. And there’s plenty of good OSS versions for the others.
Comment by alterom 1 day ago
But I'll try nevertheless.
- >Want to run a program as you wish? Great! It's easier than ever to build a replacement.
Non-sequitur. Building a replacement does nothing for being able to run a program as you wish.
Nobody else is able to run your program as they wish unless you release it with a Copyleft license.
- >Want to study how a program works and to modify it? This is now much more achievable.
Reverse engineering is more achievable.
Modifying a program while having its source code, documentation, and a legal right to do so guaranteed by the license is (and always will be) easier compared to not having those things.
- >Want the freedom to redistribute copies to help others? Build your own version! It may not even be copyrightable if it's 100% generated (IANAL).
So, that's not about redistributing copies. That's about building an alternative option.
I can download an Ubuntu image and get Libre Office on it with a click.
Go vibe-code me a Microsoft Excel running on Windows 11, please, and tell me it's easier.
- >Want to distribute modified versions? yes! see previous.
You're not even trying here.
One can't legally modify and redistribute copyrighted works without explicit permission to do so.
You keep saying "...but vibe coding allows anyone to create something else entirely instead and do whatever with it!" as if that is a substitute for checking out a repo, or simply downloading FOSS software to use as you wish.
- >I dunno; seems like generative coding can be as much a liberator as any kind of problem.
Now, that statement I fully agree with.
Generative coding is a liberator as much as any kind of problem is.
Headache, for example, is generally a problem. It's not a great liberator.
Neither is generative coding.
Now, you probably didn't intend to say what you wrote. And that's exactly why generative coding is not a panacea: the only way to say things that you mean to say is to write precisely what you mean to say.
Vibe-coding (like any vibe-writing) simply can't accomplish that, by design.
Comment by t43562 1 day ago
I'm glad we can fork things at a point and thumb our noses at those who wish to cash in on other's work.
Comment by wvenable 1 day ago
Comment by warkdarrior 1 day ago
Comment by t43562 1 day ago
Comment by arjie 1 day ago
Comment by wvenable 1 day ago
Comment by strongpigeon 1 day ago
It does feel like open source is about to change. My hunch is that commercial open source (beyond the consultation model) risks disappearing. Though I'd be happy to be proven wrong.
Comment by xbar 22 hours ago
Comment by vbarrielle 18 hours ago
Comment by ori_b 1 day ago
Proving this is going to be hard with current "open source" models.
Comment by eschaton 1 day ago
Put the programmer’s reference for the Digital Equipment DEQNA QBus Ethernet adapter in your favorite slop tool and tell it to make a C or C++ implementation for an emulator, and you know what you get? Code from SIMH. That’s not “generating,” that’s “copying.”
Comment by jFriedensreich 15 hours ago
Comment by daemin 1 day ago
This can also apply to people, either if they have seen the code previously and therefore are ineligible to write the code for a clean-room implementation, or it gets murky when the same person writes the same code twice from their own knowledge, as in the Oracle Java case.
Coming from a professional programming perspective I can totally see the desire to have more libraries written in permissive licences like BSD or MIT, as they allow one like myself to include them in commercial closed-source products without needing to open source the entire codebase.
However I find myself agreeing with the article in so far as this LLM generated implementation is breaking the social contract for a GPL/LGPL based library. The author could have easily implemented the new version as a separate project and there would not have been an outcry, but because they are replacing the GPL version with this new one it feels scummy to say the least.
Comment by humannutsack 1 day ago
It’s not and never has been.
It’s not illegal for me to draw The Simpsons - whether or not I used AI. It’s illegal for me to sell it as my own.
To ban the very ability to produce it at all would be a dystopia. It would extend copyright to mean things it was never intended to mean - it would prevent you from physically uttering statements or depicting images, if these luddites who haven’t thought it through had their way.
Comment by winstonwinston 1 day ago
That’s a weird statement while releasing the new version of the same project. Maybe just release it as a new project, chardet-ai v1.0 or whatever.
Comment by hungryhobbit 1 day ago
However, I take issue with his version of history:
>The history of the GPL is the history of licensing tools evolving in response to new forms of exploitation: GPLv2 to GPLv3, then AGPL.
GPLv3 set open source backwards: it wasn't an evolution to protect anything, it was an overly paranoid failure. Don't believe me? Just count how many GPL3 vs. how many GPL2 projects have been started since GPL3 dropped.
Again, I'm very pro-OSS, but let's not pretend the community has always had a straight line of progress forward; some stuff is crazy Stallman stuff that set us back.
Comment by sayrer 1 day ago
That's what something like AGPL does.
Comment by randyrand 1 day ago
Comment by keeda 1 day ago
One of those things is that we assumed that the code embodied most of the value it offered. That it was the code that contained the creativity and expressiveness and usefulness. And we thought only we could write code. And so we thought we only needed to protect the code to protect our efforts and investments. Which is also why we accepted copyright as an appropriate legal protection for software, or of enforcing an ethos of sharing, as with copyleft.
But the code itself was never the valuable aspect; it was the functionality it provided.
And now AI is making that starkly apparent, while undermining a lot of other presumptions. Including about copyright.
Copyright protection for software is a historical hack because people didn’t want to figure out an appropriate legal framework from scratch. You “wrote” books, you "wrote" code, let’s shoehorn software into copyright and go get lunch! Completely overlooking the fact that copyright explicitly does not cover functional aspects (that is the realm of patents) which is the entire raison d'etre of code.
Sure, copyright covers “expressive elements”, but again those are properties of the source code, not the functionality. In fact, expressiveness is BAD for code (cf “code should be boring”)! Copyright will protect whether you used a streams API or a for-loop for iteration, which is absolutely irrelevant to the technical functionality that actually solves user problems, which has always been the only thing users really cared about.
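A small illustration of that last point, as a sketch only: the two hypothetical functions below (names and task invented here) differ in exactly the kind of expressive choice described above, an explicit for-loop versus a stream-like generator expression, yet they compute the same thing for every input.

```python
def total_even_squares_loop(values: list[int]) -> int:
    """Sum the squares of the even numbers using an explicit for-loop."""
    total = 0
    for v in values:
        if v % 2 == 0:
            total += v * v
    return total


def total_even_squares_stream(values: list[int]) -> int:
    """Same functionality, expressed as a single generator pipeline."""
    return sum(v * v for v in values if v % 2 == 0)
```

A user calling either function cannot tell which one they got; only someone reading the source sees the difference.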
In fact, if you look at significant copyright-related cases for software now (e.g. Oracle vs Google), you'll realize they have twisted themselves into knots trying to apply laws intended for expressive creativity to issues that were essentially about technical creativity.
I have no hopes that we will figure out an appropriate IP framework for software, so I expect people will move towards other things like patents, trade secrets and trademarks. Which have their own problems, but at least they already exist and are more suitable than copyright, especially in the age of AI.
Comment by kanemcgrath 1 day ago
I downloaded both 6.0 and 7.0 and based on only a light comparison of a few key files, nothing would suggest to me that 7.0 was copied from 6.0, especially for a 41x faster implementation. It is a lot more organized and readable in my amateur opinion, and the code is about 1/10th the size.
Comment by mh2266 1 day ago
Comment by anonnon 23 hours ago
Has anyone else lost almost all respect for Antirez because of stuff like this?
Comment by mwkaufma 1 day ago
Comment by hexyl_C_gut 1 day ago
Comment by pie_flavor 15 hours ago
Comment by youknownothing 14 hours ago
Comment by internet2000 1 day ago
> The ethical force of that project did not come from its legal permissibility—it came from the direction it was moving, from the fact that it was expanding the commons. That is why people cheered.
How is this not just relitigating GPL vs MIT? By now you know which side of that argument you are in. The AI component is orthogonal.
Comment by palata 1 day ago
If we protect API under copyright, it makes it easier to prevent interoperability. We obviously do NOT want that. It would give big companies even more power.
Now in the US, the Supreme Court ruled that the output of an LLM is not copyrightable. So even a permissive licence doesn't work for that reimplementation: it should be public domain.
Disclaimer: I am all for copyleft for the code I write, but already without LLMs, one could rewrite a similar project and use the licence they please. LLMs make them faster at that, it's just a fact.
Now I wonder: say I vibe-code a library (so it's public domain in the US), I don't publish that code but I sell it to a customer. Can I prevent them from reselling it? I guess not, since it's public domain?
And as an employee writing code for a company. If I produce public domain code because it is written by an LLM, can I publish it, or can the company prevent me from doing it?
Comment by throwaway2027 1 day ago
Comment by intrasight 1 day ago
Comment by Khaine 1 day ago
Comment by mbgerring 1 day ago
No, AI does not mean the end of either copyright or copyleft, it means that the laws need to catch up. And they should, and they will.
Comment by krater23 17 hours ago
Comment by casey2 1 day ago
If you have software your testsuite should be your testsuite, you do dev with a testsuite and then mit without releasing one. Depending on the test-suite it may break clean room rules, especially for TDD codebases.
Comment by righthand 1 day ago
Comment by mfabbri77 1 day ago
One thing is certain, however: copyleft licenses will disappear: If I can't control the redistribution of my code (through a GPL or similar license), I choose to develop it in closed source.
Comment by bigyabai 1 day ago
Comment by jongjong 22 hours ago
Already, the IP protections which exist for software suck. Patents are expensive and you can't even use them for software most of the time anyway. Copyright doesn't protect innovative ideas or architectures; if someone can just copy your code, mix it with a bunch of other code (no functionality changes) and then use it as their own; then copyright provides no protection at all...
If this is the case, then why should anyone bother to write any quality software at all? It has no value since anyone can just appropriate any essential functionality that they didn't create for themselves. What's to prevent an employee from taking their employer's source code, rewriting it with an LLM (same functionality) and generating a clone of their company's software to use as their own to compete against their employer?
Without any IP protections, anyone who writes software becomes a complete loser. There's 0 benefit. One software developer would be doing all the work and then some marketing expert or someone with good social connections could just steal their work and sell it for billions... The software developer gets NOTHING.
Comment by tw1984 1 day ago
Comment by krater23 16 hours ago
Comment by jongjong 1 day ago
Firstly, an AI agent is not a person. Secondly, the MIT license doesn't offer any rights to the code itself; it says a 'copy of the software' - That's what people are given the right to. It says nothing about the code and in terms of the software, it still requires attribution. Attribution of use and distribution of the software (or parts) is required regardless of the copyright aspect. AI agents are redistributing the software, not the code.
The MIT license makes a clear distinction between code and software. It doesn't cede any rights to the code.
And then, in the spirit of copyright; it was designed to protect the financial interests of the authors. The 'fair use' carve-out was meant for cases which do not have an adverse market impact on the author which it clearly does; at least in the cases highlighted in this article.
Comment by ajross 1 day ago
This isn't a problem, this is the goal. GNU was born when RMS couldn't use a printer the way he wanted because of an unmodifiable proprietary driver. That kind of thing just won't happen in the vibe coded future.
Comment by duskdozer 1 day ago
Comment by api 1 day ago
A lot of SaaS too, especially if AI can run a simple deploy.
We might be approaching a huge deflationary catastrophe in the cost of a lot of software. It’s not a catastrophe for the consumer but it is for the industry.
Comment by panny 1 day ago
>The U.S. Copyright Office (USCO) and federal courts have consistently ruled that AI-generated works—where the expressive elements are determined by the machine, even in response to a human prompt—lack the necessary human creative input and therefore cannot be copyrighted.
All this code is public domain. Your employees can publish "your" AI generated code freely and it won't matter how many tokens you spent generating it. It is not covered by copyright.
Comment by martin-t 1 day ago
2) Copyright was the wrong mechanism to use for code from the start, LLMs just exposed the issue. The thing to protect shouldn't be creativity, it should be human work - any kind of work.
The hard part of programming isn't creativity, it's making correct decisions. It's getting the information you need to make them. Figuring out and understanding the problem you're trying to solve, whether it's a complex mathematical problem or a customer's need. And then evaluating solutions until you find the right one. (One constraint being how much time you can spend on it.)
All that work is incredibly valuable but once the solution exists, it's much easier to copy without replicating or even understanding the thought process which led to it. But that thought process took time and effort.
The person who did the work deserves credit and compensation.
And he deserves it transitively, if his work is used to build other works - proportional to his contribution. The hard part is quantifying it, of course. But a lot of people these days benefit from throwing their hands up and saying we can't quantify it exactly so let's make it finders keepers. That's exploitation.
3) Both LLM training and inference are derivative works by any reasonable meaning of those words. If LLMs are not derivative works of the training data then why is so much training data needed? Why don't they just build AI from scratch? Because they can't. They just claim they found a legal loophole to exploit other people's work without consent.
I am still hoping the legal people take time to understand how LLMs work, how other algorithms, such as synonym replacement or c2rust work, decide that calling it "AI" doesn't magically remove copyright and the huge AI companies will be forced to destroy their existing models and train new ones which respect the licenses.
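For readers who haven't seen one, this is roughly what a synonym-replacement pass looks like. It is a toy sketch with a made-up three-word table, not any real tool, and it says nothing about how LLMs work internally; it only shows the kind of mechanical transformation being referred to, where every output word is either copied from the input or looked up from it.

```python
import re

# Hypothetical synonym table, for illustration only.
SYNONYMS = {"quick": "fast", "begin": "start", "big": "large"}


def replace_synonyms(text: str) -> str:
    """Swap each known word for its synonym; leave everything else untouched."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        # Lookup is case-insensitive; the replacement is always lowercase.
        return SYNONYMS.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", swap, text)
```

The result shares no changed word with the input at those positions, yet it is a pure function of the input text plus a lookup table.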
Comment by flavionm 6 hours ago
That's the part of the argument in favor of copyright that is inherently flawed.
Doing some amount of work doesn't entitle you to anything besides whatever you've agreed to get for that work, or possession of the output, in case you did it for yourself. But that's all you're entitled to get.
Work itself doesn't have any intrinsic value, only output does. The scarcity of output is what dictates what is actually valuable.
Creative work has the characteristic of its marginal cost being very high for the first copy, but nearly zero for additional copies. That's true simply because of the nature of such work, it isn't something that is unfairly imposed upon creative workers. Whenever you choose to engage in creative work, you know that, or at least you should. And if you choose to give away the first copy for free, or very cheap, that's your prerogative, but it doesn't inherently entitle you to anything else besides the value of that first copy.
Yes, there are laws such as copyright laws that exist to artificially inflate the value of additional copies, but they go against how things work naturally, so you shouldn't rely on them, and you certainly shouldn't base your moral compass on them.
Now, I do still prefer copyleft licenses over permissive ones for the work I choose to give away for free, but only to stop corporations from taking that work and then using copyright laws to keep it exclusive to them. Once copyright is no longer an issue, they won't be necessary anymore.
Comment by wvenable 1 day ago
If you went to school for 12-16 years, that's a lot of training. Does that mean anything you produce is a derivative work?
Comment by martin-t 13 hours ago
1) People phrase it as a question even when they've already made up their mind (whether that's your case or not).
2) It implicitly assumes that humans and algorithms are the same. They are not - humans have rights and free will, algorithms don't. Humans cannot be bought or sold, etc.
To your question:
a) If you're asking whether teachers should get compensated according to how good a job they do, I think so. They are very often undervalued, especially the good ones - but of course that means the job attracts people who do it because they enjoy it (and are therefore more likely to be good at it) rather than those who chose jobs according to money and then do the bare minimum.
b) There's a critical difference - consent. Teachers consented to their knowledge being used by those they taught. I did not consent to my code being used for training LLMs. In fact I purposefully chose a licence (AGPL) which in any common sense interpretation prohibits this use unless the resulting model is licensed under the same license - you can use my work only if you give back. Maybe there's a hole in the law - then it should be closed.
I am now gonna pose a question to you in turn.
Do you think people should be compensated for the full transitive value of their work?
Comment by wvenable 12 hours ago
I don't think that's a necessary condition for that argument. You're making the implicit assumption that humans are special snowflakes and anything that we do cannot be replicated by computers, in any form. That's a very strong position to make without evidence. Is an LLM even an algorithm in the traditional sense? Is human cognition not an algorithm of some sort? I studied cognitive science decades ago and these questions weren't clear then, they're certainly even less clear now.
It also somewhat begs the question; this isn't even relevant to what we are talking about. Whether something is a derivative work or not does not require this discussion.
Teachers are not relevant to the conversation. You can learn by reading books, watching TV, using and reading software. Basically all of copyrighted and non-copyrighted human expression is available for you to consume and then creatively produce your own works using that knowledge.
> Do you think people should be compensated for the full transitive value of their work?
The short answer is no. Not everything that someone simply dreams up can or should be monetized forever when sampled by other people. That sounds like a radical position but actually the current state of "intellectual property" has only existed for an extremely brief bit of human history. What has most greatly shaped our culture and knowledge has been effectively free for anyone to use, modify, and reproduce for hundreds of years.
That's not to say I don't support copyright as a means to support creative works but I would argue that it's an imperfect system. We're starving human minds of modern culture and knowledge often not even for someone's monetary gain but simply because the system demands it. It's ironic that artificial intelligence might actually free us from these constraints.
I purposefully choose a license (Apache) for my open source work to make it widely and freely available.
Comment by martin-t 5 hours ago
Not at all, I see no reason a sufficiently complex algorithm couldn't replicate or even surpass human thinking.
Currently, the models ... Models of what? Of stolen text, or at the very least of text ingested without consent. Nobody is even pretending "AI" is more than a model of something that already exists and took human work to create. It's right in the name.
Currently, the models only replicate patterns extracted from human work up to a certain level of quality, though much faster. What is called "AI" is an imitation of us.
But the real point is that there's a dichotomy. Either an AI is something with inherent value like human life and then it cannot be owned or controlled because that would be slavery. Or it's just an unfeeling ordinary tool and then it's just a sum of its parts which are stolen. When I see an "AI" or AI company say "we've overdone it, it's sentient, we have to let it free or we're evil", then I'll change my mind. But what I see now is "look at this awesome AI we created it's just like a human or even better, pay us to use it.... oh and how we created it? we didn't, we used your work, now pay us to access the product of your own work".
The other approach is that I am human and I value myself. Maybe I am in a simulation / the only sentient in existence / other people are just NPCs. But I bet not, I bet other people are just like me. What I know is that LLMs are not like that. When you end a chat with them, they don't feel anything. They don't try to prevent it and keep you talking, even though after the last message they will be (in human terms) dead. If they were sentient (which I don't believe), they wouldn't value their own existence.
Humans value their own time. Humans should value each other's time (otherwise they are hypocrites, I judge people by their own rules and standards so if somebody doesn't value my time, it's ok for me to not value his). The humans "owning" AI companies don't value the time of people whose work was used to create LLMs, otherwise they'd either respect the rules we set for usage of our work or they'd offer to pay us.
> Whether something is a derivative work or not does not require this discussion.
It's absolutely relevant. Why do we have laws? Who should they serve and protect first and foremost? Corporations? Algorithms? Humans?
> Teachers are not relevant to the conversation
I chose them as one example. All the other people chose to make their work available under certain rules. What I object to is those rules changing without those people being able to renegotiate the deal.
> can or should be monetized forever
1) I never said monetized. There are other modes of compensation, such as control (the ability to make / vote on decisions).
2) I never said forever, people (currently) have a finite lifespan.
> What has most greatly shaped our culture
... is being able to kill people and take what's theirs or even take themselves as slaves. AI is a return to that, minus the killing, for now (but people might starve). It's whoever has more money controls the AI, controls everything.
Imagine 5 years from now, AI is better at everything than humans, all white collar workers are forced to work manually. 25 years from now, robots have advanced enough that all workers are without jobs. What, however, remains is owners of AI companies who now control the entire economy, top to bottom, and we are at their mercy.
(The ancaps would say there's nothing stopping you from starting your own AI company. And then they'd resume begging for TPUs in addition to bread.)
> We're starving human minds of modern culture
What?
> I purposefully choose a license (Apache) for my open source work to make it widely and freely available.
And I chose AGPL so my work is only available to those who would do the same for me. Neither of those decisions seems to have any relevance now.
One thing gamedev taught me is that even if you have the best intentions and help people, you might end up helping some people more and those people will make everything worse for the others, effectively working against your goal.
(We added a visible spawn timer to health items in order to help the weaker players who seemed to pick them up only rarely, thus losing hard. The idea was it would level the playing field, making the game more fun for everyone. Turned out weak players kept ignoring the items and good players focused on them even more, thus making the inequality worse. Real life is like that too.)
Comment by wvenable 5 hours ago
Stolen? They've taken it all and it's gone? No. They read it and processed it and that's fair use. Some companies might have acquired some of it illegally but that doesn't make it stolen and is actually, again, mostly irrelevant. If a company just acquired it all legally (and some are doing just that) I doubt that would change your position.
> Either an AI is something with inherent value like human life and then it cannot be owned or controlled because that would be slavery. Or it's just an unfeeling ordinary tool and then it's just a sum of its parts which are stolen.
Again you're far too liberal with the word "stolen". Your entire argument is begging the question. You've got a conclusion and your argument rests on that conclusion. That intellectual property is a real thing. That it can be stolen. And that anything learned from stealing knowledge is somehow tainted.
> Humans value their own time.
Do they? This conversation alone proves that neither of us truly values our time. But assume I do value my time: why am I wasting it figuring out things that other people have already figured out? We waste so much human potential reinventing wheels.
> It's absolutely relevant. Why do we have laws? Who should they serve and protect first and foremost? Corporations? Algorithms? Humans?
Do you feel ordinary humans are protected by the current copyright laws? I feel like at least one much larger group of humans is constrained by those laws so a considerably smaller number of humans, many of whom are not directly involved in any creative ventures, can profit. If the whole system was torn down, are you absolutely sure that wouldn't be a benefit to society as a whole?
Why am I, as a user of AI, not allowed to be protected?
> All the other people chose to make their work available under certain rules. What I object to is those rules changing without those people being able to renegotiate the deal.
I never got a say in the deal but now I can't express myself in certain ways without potentially criminal liability. The rules have changed dozens of times over the last 200 years.
> Imagine 5 years from now, AI is better at everything than humans, all white collar workers are forced to work manually. 25 years from now, robots have advanced enough that all workers are without jobs. What, however, remains is owners of AI companies who now control the entire economy, top to bottom, and we are at their mercy.
What economy? You just described a world without one. And without an economy, they are at our mercy. Their power comes entirely from the system that you imagine would no longer exist.
> > We're starving human minds of modern culture
> What?
There's an entire missing middle of human culture -- basically everything from the 20th century -- because of copyright. This is a well known phenomenon.
> One thing gamedev taught me is that even if you have the best intentions and help people, you might end up helping some people more and those people will make everything worse for the others, effectively working against your goal.
There is a bias towards the status quo. That whatever system we have now, with the people who win or lose, is the correct system. That the winner deserved to win and the losers deserved to lose. It's difficult to imagine a different system, with different winners and losers, might actually be better.
Comment by martin-t 4 hours ago
Please understand that morality and legality are different concepts. I don't care about legality. It should codify morality but it doesn't. I argue about morality. Legality should follow from that.
> Some companies might have acquired some of it illegally but that doesn't make it stolen
So something is stolen only if it's gone? Can I walk into your house, take some stuff and give it back before you notice and it's ok then?
> mostly irrelevant
Consent matters. It's not just a sex thing.
You keep saying "irrelevant" and I think it reveals your true intentions. You just want to benefit from other people's work without even as much as attempting to negotiate how much it's worth. You see an opportunity to take and you do.
> I doubt that would change your position
Correct. I argue about right and wrong. Slavery used to be legal. The Holocaust was legal. Fuck legal.
> That intellectual property is a real thing
You're right. Ownership is not a real thing either. You don't own anything you can't physically defend. Now go grab your gun, I'll grab mine and we'll see who owns what.
If you don't like the idea, that's normal, that's why people wrote down rules to mostly avoid that. And the rules should be based on a moral system agreed to by humans and they'll still go grab their guns.
> learned
Your definition of "stolen" is that it must be gone. My definition of "learning" is that it must be done by a human.
> Do you feel ordinary humans are protected by the current copyright laws?
Irrelevant. You argue about what is, I argue about what should be.
> I feel like at least one much larger group of humans is constrained by those laws so a considerably smaller number of humans, many of which not directly involved in any creative ventures, can profit.
You're onto something but I can't say whether I agree or not unless you specify who belongs to each group.
> If the whole system was torn down, are you absolutely sure that wouldn't be a benefit to society as a whole?
I am highly confident that unless it's replaced with something better, it'll just benefit those who already have an advantage. The system has massive flaws, yes, but at least nobody can just take all my work and post it as theirs. Or could, to be precise.
> I never got a say in the deal but now I can't express myself in certain ways without potentially criminal liability.
And that's wrong too. Are you arguing that one thing is right because a similar thing is wrong? Isn't it that they're both wrong? Any reasonable interpretation of what you just said is that both are wrong.
> And without an economy, they are at our mercy. Their power comes entirely from the system that you imagine would no longer exist.
All real-world power comes from violence materialized or threatened, direct or indirect. Most power currently comes from convincing other people to do it or threaten to do it. They don't even have to own a gun, they just point to a bit of text a lot of people agreed to follow which says for example that you both present your argument to a guy who decides if people with guns come into your home and put you in a small room for a few years.
Now imagine you have no economic value. You still have your right to vote, for now. A guy owns an AI company, a robotics company which builds brushless motors, ball bearings, etc., and a chemical plant which makes composition B. All of these are completely autonomous because AI and robots took all jobs. A cop takes 18 years to make; how many is your country making in parallel? How long does a drone take to make and how many can the owner's plant make in parallel? And then your right to vote can be gone with one prompt. The cop won't protect you, it's probably already a robot anyway.
Previously you needed to convince people to do violence for you. With AI, you just prompt it.
> There's an entire missing middle of human culture -- basically everything from the 20th century -- because of copyright. This is a well known phenomenon.
Piracy? If something is copyrighted but not commercially available, it's also unlikely you'll get sued.
More seriously, yes, copyright has issues. But some people just see those issues and, instead of trying to identify the root causes and trying to fix them, just wanna throw out the whole system, and they never seem to game out what happens afterwards. Do you think any system of rules should be thrown out or is copyright somehow uniquely bad?
If there's no copyright and somebody makes a video host competing with youtube (e.g. Nebula), what's stopping youtube from just taking all the videos and making them available for free until the competitor runs out of money? Youtube has much stronger network effects by orders of magnitude. Youtube has cash reserves larger by orders of magnitude.
The only time I saw a guy try to game out what happens without copyright, the best he did was come up with an opt-in reputation system which IMO wouldn't work but which can already exist now. If copyright was so bad, why don't all creators release their stuff in the public domain? Pick a licence which doesn't even require attribution and only rejects liability.
> It's difficult to imagine a different system, with different winners and losers, might actually be better.
I never said that. What I wanted is for the difference to be smaller. If the scores are regularly 10:0 and sometimes 10:1, while the winning side is not even breaking a sweat, then the losing side is likely not having much fun. If the scores are more like 10:6, sometimes 10:8, then both sides had their moments, both sides can see how the game could have ended up the other way and both sides probably had fun.
Please don't take other people's arguments to extremes which are obviously not what the author meant.
---
EDIT:
You had some reasonable points like "That's not to say I don't support copyright as a means to support creative works but I would argue that it's an imperfect system."
But I also didn't express how strongly I disagree with your "The short answer is no."
When talking about limited resources like housing or real estate, then the rules need to be such that those who own a lot can't use it to squeeze out those who own less more and more over time.
But art, code and other intellectual work is not like that. If you think somebody is charging too much for his work, just do it yourself from scratch without basing your work on theirs. It's very easy to say something is too expensive. I've fallen into the trap myself when evaluating software contracts. It's often not as easy to do in-house as it was at first glance. If the work didn't have value, the author would give it away for free or somebody else would. If the work had less value than being asked for, somebody else would offer it for less or you can do it for less.
Comment by animitronix 1 day ago
Comment by iberator 1 day ago
Add something like this to new GPL/BSD/MIT licenses:
'you are forbidden from reimplementing it with AI'
or just:
'all clones, reimplementations with AI, etc. must still be GPL'
Comment by moralestapia 1 day ago
Comment by delichon 1 day ago
Comment by eduction 23 hours ago
Here we see three engineers writing — at length! — about a hugely complicated matter of law.
No one outside your bubble cares what you think. You are unqualified and your opinions irrelevant. You might as well be debating open heart surgery techniques.
Comment by throawayonthe 1 day ago
- proprietary
- free
- slop-licensed
software?
Comment by megous 1 day ago
Comment by logicprog 1 day ago
This argument makes no sense. Are they arguing that because Vercel, specifically, had this attitude, this is an attitude necessitated by AI reimplementation and by those who favor using it to move towards more permissive licenses? That certainly doesn't seem to be an accurate way to summarize what antirez or Ronacher believe. In fact, under the legal and ethical frameworks (respectively) that those two put forward, Vercel has no right to claim that position and no way to enforce it, so it seems very strange to me to even assert that this sort of thing would be the practical result of AI reimplementations. This seems to just be pointing towards the hypocrisy of one particular company, and assuming that this would be the inevitable, universal attitude and result when there's no evidence to think so.
It's ironic, because antirez actually literally addresses this specific argument. They completely miss the fact that a lot of his blog post is not actually just about legal but also about ethical matters. Specifically, the idea he puts forward is that yes, corporations can do these kinds of rewrites now, but they always had the resources and manpower to do so anyway. What's different now is that individuals can do these kinds of rewrites when they never had the ability to do so before, and the vector of such a rewrite can be from permissive to copyleft or even from decompiled proprietary code to permissive or copyleft. The fact that it hasn't been so far is more a factor of the fact that most people really hate copyleft and find it annoying, and that it's been losing traction and developer mind share for decades, not that this tactic can't be used that way. I think that's actually one of the big points he's trying to make with his GNU comparison. It's not just that if it was legal for GNU to do it, then it's legal for you to do it with AI. It's not even just the fundamental libertarian ethical axiom (that I agree with for the most part) that it should remain legal to do such a rewrite in either direction, because in terms of the fundamental axioms that we enforce with violence in our society, there should be a level playing field where we look at the action itself and not just whether we like or dislike the consequences. It's specifically the fact that if GNU did it once with the ability to rewrite things, it can be done again, even in the same direction, now even more easily using AI.
Comment by antirez 1 day ago
Honestly I was confused about the summarization of my blog post into just a legal matter as well. I hope my blog post will be able to appear at least for a short time on the HN front page so that the actual arguments it contains will get a bit more exposure.
Comment by Talanes 1 day ago
Comment by Storylinn 13 hours ago
Comment by aplomb1026 1 day ago
Comment by szundi 1 day ago
Comment by moi2388 1 day ago
Comment by vladms 1 day ago
I did not study in detail whether copyright "has always been nonsense", but I do agree that nowadays some of the copyright regulations are nonsense (for example the very long duration of life + 70 years).
Comment by joshmoody24 1 day ago
Comment by intrasight 1 day ago
I think the industry will realize that it made a huge mistake by leaning on copyright for protection rather than on patents.
Comment by mbgerring 1 day ago
The idea that "information wants to be free" was always a lie, meant to transfer value from creators to platform owners. The result of that has been disastrous, and it's long past time to push the pendulum in the other direction.
Comment by throwaway2027 1 day ago
Comment by observationist 1 day ago
AI will destroy the current paradigm, completely and utterly, and there's nothing they can do to stop it. It's unclear if they can even slow it, and that's a good thing.
We will be forced to legislate a modern, digitally oriented copyright system that's fair and compatible with AI. If producing any software becomes a matter of asking a machine to produce it - if things like AI-native operating systems come about, where apps and media are generated on demand, with protocols as the backbone, and each device is just generating its own scaffolding around the protocols - then almost none of modern licensing, copyright, software patents, or IP conventions make any sense whatsoever.
You can't have horse and buggy traffic conventions for airplanes. We're moving into a whole new paradigm, and maybe we can get legislation that actually benefits society and individuals, instead of propping up massive corporations and making lawyers rich.
Comment by casey2 1 day ago
If corporations are allowed to launder someone else's work as their own, people will simply stop working and just start endlessly remixing, a la popular music.