Show HN: Sweep, Open-weights 1.5B model for next-edit autocomplete
Posted by williamzeng0 2 days ago
Hey HN, we trained and open-sourced a 1.5B model that predicts your next edits, similar to Cursor. You can download the weights here (https://huggingface.co/sweepai/sweep-next-edit-1.5b) or try it in our JetBrains plugin (https://plugins.jetbrains.com/plugin/26860-sweep-ai-autocomp...).
Next-edit autocomplete differs from standard autocomplete by using your recent edits as context when predicting completions. The model is small enough to run locally while outperforming models 4x its size on both speed and accuracy.
We tested against Mercury (Inception), Zeta (Zed), and Instinct (Continue) across five benchmarks: next-edit above/below cursor, tab-to-jump for distant changes, standard FIM, and noisiness. We found exact-match accuracy correlates best with real usability because code is fairly precise and the solution space is small.
Prompt format turned out to matter more than we expected. We ran a genetic algorithm over 30+ diff formats and found simple `original`/`updated` blocks beat unified diffs. The verbose format is just easier for smaller models to understand.
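As a rough illustration (the markers below are simplified, not our exact prompt format), here is the same edit in the two styles:

```
# unified diff (terse, harder for a small model to track):
@@ -1,2 +1,2 @@
 def add(a, b):
-    return a - b
+    return a + b

# original/updated blocks (verbose, but everything is spelled out):
<original>
def add(a, b):
    return a - b
</original>
<updated>
def add(a, b):
    return a + b
</updated>
```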
Training was SFT on ~100k examples from permissively-licensed repos (4hrs on 8xH100), then RL for 2000 steps with tree-sitter parse checking and size regularization. The RL step fixes edge cases SFT can't, like generating code that doesn't parse or producing overly verbose outputs.
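Conceptually, the parse-check-plus-size-penalty reward looks something like the sketch below. This is a simplified illustration rather than our actual training code: the weights and penalty values are made up, and it assumes a recent py-tree-sitter with the tree-sitter-python grammar installed.

```python
# Simplified sketch of a parse-check + size-regularization reward (illustrative only).
import tree_sitter_python as tspython
from tree_sitter import Language, Parser

PY_LANGUAGE = Language(tspython.language())
parser = Parser(PY_LANGUAGE)

def reward(completion: str, reference: str, size_penalty: float = 0.001) -> float:
    tree = parser.parse(completion.encode("utf8"))
    parses = 0.0 if tree.root_node.has_error else 1.0                   # does the edit parse?
    exact = 1.0 if completion.strip() == reference.strip() else 0.0      # exact-match bonus
    verbosity = size_penalty * max(0, len(completion) - len(reference))  # size regularization
    return parses + exact - verbosity
```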
We're open-sourcing the weights so the community can build fast, privacy-preserving autocomplete for any editor. If you're building for VSCode, Neovim, or something else, we'd love to see what you make with it!
Comments
Comment by leonardcser 2 days ago
I am the author of this Neovim plugin for edit completions. I was able to integrate it with the Sweep Edit model.
For anyone who is interested: https://github.com/leonardcser/cursortab.nvim
Comment by lasgawe 1 day ago
Comment by treyd 1 day ago
Comment by leonardcser 1 day ago
Comment by 9999gold 1 day ago
Comment by kevinlu1248 1 day ago
Comment by KronisLV 2 days ago
People posting stuff like this is really cool, because otherwise it kinda feels like nobody gives a crap. For example, even with Cline/RooCode/KiloCode there's no good way for me to hook up an autocomplete model that either runs in Ollama or maybe a remote Cerebras Code model - KiloCode doesn't have a proper model configuration option for autocomplete even though it has one for the chat or regular agentic stuff. I don't get why autocomplete is such a special case.
I guess what I'm saying is that I'm glad someone's at least trying, so I don't have to keep a Copilot subscription just because I genuinely like their autocomplete while the rest of it is basically wasted: Claude Code and Codex and others are better for the actual chat/agentic stuff, and KiloCode and others are really nice IDE plugins.
Comment by lostmsu 2 days ago
Comment by vanillameow 2 days ago
This is a really good plugin. I'm a diehard JetBrains user; I tried switching to VSCode and its various forks many times because of AI, but muscle memory from years of use is hard to override. And for a lot of languages JetBrains is just much better, especially out of the box. But they dropped the ball so hard on AI it's unbelievable. Claude Code pulled it back a bit, because at least now the cutting-edge tools aren't just VSCode plugins, but I was still missing a solid autocomplete tool. Glad this is here to fill that niche. Very likely will be switching my GitHub Copilot subscription to this.
I also really appreciate publishing open weights and allowing a privacy mode for anonymous trial users, even if it's opt-in. Usually these things seem to be reserved for paying tiers these days...
Comment by zarzavat 2 days ago
I'm starting to understand that there are two cultures.
Developers who are mostly writing new code get the most benefit from autocomplete and comparatively less from Claude Code. CC is neat but when it attempts to create something from nothing the code is often low quality and needs substantial work. It's kind of like playing a slot machine. Autocomplete, on the other hand, allows a developer to write the code they were going to write, but faster. It's always a productivity improvement.
Developers who are mostly doing maintenance experience the opposite. If your workflow is mostly based around an issue tracker rather than Figma, CC is incredible, autocomplete less so.
Comment by norir 1 day ago
Comment by genghisjahn 1 day ago
Comment by zarzavat 1 day ago
The main "issue" I have with Claude is that it is not good at noticing when code can be simplified with an abstraction. It will keep piling on lines until the file is 3000 lines long. You have to intervene and suggest abstractions and refactorings. I'm not saying that this is a bad thing. I don't want Claude refactoring my code (GPT-5 does this and it's very annoying). Claude is a junior developer that thinks it's a junior. GPT-5 is a junior developer that thinks it's a senior.
[0]: https://www.folklore.org/Negative_2000_Lines_Of_Code.html
Comment by kevinlu1248 1 day ago
Comment by mark_l_watson 2 days ago
I will definitely try the 1.5B model, but I usually use LLMs by taking the time to edit a large one-shot prompt and feeding it either to one of the new 8B or 30B local models or to Gemini 3 Flash via the app, web interface, or API.
Small purpose-built models are largely under-appreciated. I believe it is too easy to fall into the trap of defaulting to the strongest models and over-relying on them. Shameless plug: it is still incomplete, but I have released an early version of my book ‘Winning Big With Small AI’ - so, I admit my opinions are a little biased!
Comment by pdyc 2 days ago
Comment by cmrdporcupine 2 days ago
I think we're still in the early days of these systems. The models could be capable of a lot more than this "chat log" methodology.
Agree about JetBrains dropping the ball. Saddens me because I've also been a diehard user of their products since 2004.
Comment by qorrect 2 days ago
Comment by sitkack 2 days ago
You are still using it but no longer getting updates?
Comment by kevinlu1248 1 day ago
Comment by cmrdporcupine 1 day ago
Mainly, they're pushing Junie and it just isn't that good or compelling, when faced off against the competition.
The key thing for me is that I think they had an opportunity here to really rethink how LLMs could interact with an editor since they potentially controlled both the editor and the LLM interaction. But they just implemented another chat-based interaction model with some bells and whistles, and also were late getting it out really, and what they delivered seemed a bit meh.
I was hoping for something that worked more closely inside the editing process, inline in the code, not just completions and then an agentic log alongside.
I also don't like that I can't seem to get it to work with 3rd party LLM providers, really. It seems to allow specifying an OpenAI API compatible endpoint, but it's janky and doesn't seem to allow me to refresh and manage the list of models properly?
It just still seems half-baked.
I love Opus and I am a heavy CC user now, but I don't like that Claude Code is taking me out of my IDE, away from hands-on work with the code, and out of my editing process. And I don't like how it tries to take over and how weak its review flow is. I almost always end up with surprises during my review process, despite finding the quality of its code and analysis quite good. To me there was a real chance here for a company like JetBrains to show its worth by applying AI in a more sensible way than Anthropic has.
VSCode and Zed have no appeal to me though. I've mostly gone back to emacs.
In the meantime, their IDEs themselves feel a bit stalled in terms of advancement. And they've always suffered from performance problems since I started using them over 20 years ago.
Comment by KronisLV 1 day ago
I still buy a personal Ultimate license because I want to see them succeed, even if like 80% of my time is spent either in a CLI or Visual Studio Code (for quicker startup and edits). A bit unfortunate that Fleet never got to be really good, but oh well.
Comment by kevinlu1248 1 day ago
Comment by cmrdporcupine 1 day ago
I dislike VSCode very much, but I do think the foundational pieces of the JetBrains IDEs are starting to show their age.
Comment by wwfn 2 days ago
I see most would-be-boilerplate code refactored so the redundant bit becomes a small utility or library. But most of what I write is for research/analysis pipelines, so I'm likely missing an important insight. Like more verbose configuration over terse convention?
For code structure, snippet templating[1] ("iff[tab]" => "if(...){...}") handles the bare conditional/loop completions in a more predictable way, and offline/without an LLM eating into RAM.
[1] https://github.com/joaotavora/yasnippet; https://github.com/SirVer/ultisnips; https://code.visualstudio.com/docs/editing/userdefinedsnippe...
Comment by djfdat 1 day ago
You bring up a good point with snippets though, and I wonder if that would be good information to feed into the LLM for autocomplete. That snippet is helpful if you want to write one condition at a time, but say you have a dozen if statements to write with that snippet. After writing one, the LLM could generate a suggestion for the other 11 conditions using that same snippet, while also taking into consideration the different types of values and what you might be checking against.
As for RAM/processing, you're not wrong there, but with specialized models, specialized hardware, and improvements in model design, the number of people working in environments so constrained that resource use is a concern will decrease over time, and the utility of these tools will increase. Sure, a lower-tech solution works just fine, and it'll continue to work fine, but at some point the higher-tech solution will have similar levels of friction and resource use for much better utility.
Comment by esafak 1 day ago
Comment by norir 1 day ago
Comment by notsylver 2 days ago
I threw together a vscode extension to run it and while the extension is rough, the model seems decent. I'm trying to keep my expectations contained, in the past local models have been absolutely terrible for inline completion, this seems much better already. I hope this kicks off more competition.
Comment by kevinlu1248 1 day ago
Comment by dainiusse 1 day ago
Comment by kleiba 2 days ago
I understand that the 1.5B model is small enough to run locally... but does it actually run locally in the Sweep AI JetBrains plugin? That is, if I install the plugin, will it download the model automatically and not phone home?
Comment by bjarteaarmolund 2 days ago
Comment by NewsaHackO 2 days ago
Comment by rkagerer 1 day ago
Can someone make a better plugin?
Comment by kevinlu1248 1 day ago
Comment by esquire_900 2 days ago
This looks really neat, interesting technical writeup as well!
Comment by kevinlu1248 1 day ago
Comment by martianlantern 2 days ago
Again, amazing work! Waiting to see what you guys cook next.
Comment by knowaveragejoe 1 day ago
Comment by kevinlu1248 1 day ago
https://blog.sweep.dev/posts/next-edit-jetbrains#building-au...
Comment by _ache_ 2 days ago
Comment by evanreichard 2 days ago
Which I've been using with Qwen3 Coder. As long as infill is supported, that should work. I'll try later today.
Comment by jmanandchanny 8 hours ago
Comment by mromanuk 2 days ago
Comment by kevinlu1248 1 day ago
Comment by WanderlingSmurf 2 days ago
Comment by kamranjon 2 days ago
Comment by evolving-silica 1 day ago
Comment by kevinlu1248 21 hours ago
But basically suggesting changes away from your cursor position
Comment by sheepscreek 2 days ago
I know there are the original autocomplete models that simply complete the ending of what you're typing. Then there are Cursor-like models capable of editing/filling text between blocks of code. In essence, they look at both the text before the insertion point and after it - then find the best-fitting completion in the middle. My guess is FIM is the latter.
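Roughly, a FIM prompt looks like the sketch below. The sentinel tokens are the ones Qwen-style coder models use; other FIM-trained models use different token names.

```python
# Sketch of a fill-in-the-middle (FIM) prompt: the model sees the code before and
# after the cursor and is asked to produce only the missing middle.
prefix = "def is_even(n):\n    "
suffix = "\n\nprint(is_even(4))"
fim_prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"
# A good completion here would be something like: "return n % 2 == 0"
```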
Comment by aidos 2 days ago
Comment by logicallee 2 days ago
>We ran a genetic algorithm over 30+ diff formats
Can you give more information about your genetic algorithm? Did you do crossover over the trained models (for example, ranking by fitness, taking the 20% most elite, and creating children by mixing their weights randomly)? Did you have a 'population size' (number of instances) for the genetic algorithm, and if so, what was it?
Comment by zoobab 2 days ago
We can't keep calling these models "open source" if we have a black box and don't know precisely how they were made.
"Open weights" are the new binary.
Comment by kevinlu1248 1 day ago
Comment by jrop 1 day ago
Comment by magnat 2 days ago
Comment by denysvitali 2 days ago
> We’re open sourcing the model weights so the community can build fast, privacy-preserving autocomplete for every IDE - VSCode, Neovim, Emacs, and beyond.
Comment by magnat 2 days ago
Comment by KeplerBoy 2 days ago
Comment by ttoinou 2 days ago
Comment by pezgrande 2 days ago
Comment by woile 2 days ago
ollama pull hf.co/sweepai/sweep-next-edit-1.5B
Comment by woile 2 days ago
This kind of AI is what I like and am looking to run on my workstation.
Comment by kevinlu1248 1 day ago
Example here: https://huggingface.co/sweepai/sweep-next-edit-1.5B/blob/mai...
Comment by theophaniel 2 days ago
Comment by ihales 23 hours ago
If you have llama.cpp installed, you can start the model with `llama-server -hf sweepai/sweep-next-edit-1.5B --port 11434`
Add the following to your settings.json:
```
"features": {
"edit_prediction_provider": { "experimental": "sweep-local" },
},
"edit_predictions": {
"sweep_local": {
"api_url": "http://localhost:11434/v1/completions",
},
}
```
Other settings you can add in `edit_predictions.sweep_local` include:
- `model` - defaults to "sweepai/sweep-next-edit-1.5B"
- `max_tokens` - defaults to 2048
- `max_editable_tokens` - defaults to 600
- `max_context_tokens` - defaults to 1200
I haven't had time to dive into Zed edit predictions and do a thorough review of Claude's code (it's not much, but my rust is... rusty, and I'm short on free time right now), and there hasn't been much discussion of the feature, so I don't feel comfortable submitting a PR yet, but if someone else wants to take it from here, feel free!
Comment by oakesm9 22 hours ago
Comment by ihales 16 hours ago
Comment by woile 1 day ago
```
{
  "agent": {
    "inline_assistant_model": {
      "model": "hf.co/sweepai/sweep-next-edit-1.5B:latest",
      "provider": "ollama",
    },
  }
}
```
Comment by Imustaskforhelp 2 days ago
Comment by Imustaskforhelp 1 day ago
{ "agent": { "default_model": { "model": "hf.co/sweepai/sweep-next-edit-1.5B:latest" } }, "inline_completion": { "default_provider": { "model": "hf.co/sweepai/sweep-next-edit-1.5B" } }, "chat_panel": { "default_provider": { "model": "hf.co/sweepai/sweep-next-edit-1.5B" } } }
Then click the AI button at the bottom (the Gemini-like logo) and select the Sweep model. You're also expected to pull the model with ollama and run/serve it:
ollama pull hf.co/sweepai/sweep-next-edit-1.5B
ollama run hf.co/sweepai/sweep-next-edit-1.5B
I did ask ChatGPT about some parts of it, though, and had to add this setting to my other settings too, so YMMV, but it's working for me.
It's an interesting model for sure, but I am unable to get tab autocompletion/inline suggestions in Zed. I can use it in a summary and agentic mode of sorts, and there's a button at the top that can generate code in the file itself (which I found to be what I preferred in all this).
But I asked it to generate a simple hello world on localhost:8080 in Golang, and in the end it was able to, though it took me like 10 minutes. Some other things, like a plain hello world, were one-shot for the most part.
It's definitely an interesting model, that's for sure. We need stronger models like these; I can't imagine how strong it might be at 7B or 8B, as IIRC someone mentioned this already has one, or something similar.
A lot of new developments are happening here to make things smaller, and I am all for it, man!
Comment by mika6996 1 day ago
Comment by Imustaskforhelp 23 hours ago
I then pasted that whole convo into AI Studio (Gemini Flash) to summarize it and give you the correct settings, as my settings also included some servers and their IPs from the Zed remote feature.
Sorry that it didn't work. I again asked ChatGPT about my working configuration and here's what I got (this may also not work, so YMMV):
{ "agent": { "default_model": { "provider": "ollama", "model": "hf.co/sweepai/sweep-next-edit-1.5B:latest" }, "model_parameters": [] },
"ui_font_size": 16,
"buffer_font_size": 15,
"theme": {
"mode": "system",
"light": "One Light",
"dark": "One Dark"
},
// --- OLLAMA / SWEEP CONFIG ---
"openai": {
"api_url": "http://localhost:11434/v1",
"low_latency_mode": true
},
// TAB AUTOCOMPLETE (THIS IS THE IMPORTANT PART)
"inline_completion": {
"default_provider": {
"name": "openai",
"model": "hf.co/sweepai/sweep-next-edit-1.5B"
}
},
// CHAT SIDEBAR
"chat_panel": {
"default_provider": {
"name": "openai",
"model": "hf.co/sweepai/sweep-next-edit-1.5B"
}
}
}Comment by kevinlu1248 1 day ago
Comment by h33t-l4x0r 2 days ago
Comment by BoredPositron 2 days ago
Comment by gunalx 2 days ago
Comment by smusamashah 2 days ago
Comment by mgz 2 days ago
Comment by smusamashah 2 days ago
Comment by 8n4vidtmkvmk 2 days ago
I did buy their $100/yr AI but it's about to run out.
Comment by keyle 2 days ago
It's really impressive so far, so quick to respond on a Mac mini M2. And it appears to be accurate, at least for the obvious questions.
I couldn't get it to work as autocomplete in Zed, unfortunately. It looks like it's hardwired to work with certain providers, and LM Studio is not included in the prediction engines list. Has anyone got a workaround?
Comment by kevinlu1248 1 day ago
Comment by syntaxing 2 days ago
Comment by bangaladore 2 days ago
What about SFT?
Presumably basing this off Qwen is the reason it can be done so cheaply?
Comment by andruby 2 days ago
Comment by bradfa 1 day ago
Comment by ajayarama 2 days ago
Comment by kevinlu1248 1 day ago
Comment by Semaphor 1 day ago
Comment by kevinlu1248 21 hours ago
Comment by keepamovin 2 days ago
I am thinking that one effect is:
- it will become normal for meta-models to train a model specific to a particular task/product.
Also, differently, I'm quite sure that AGI is not available on this current path (useful tho it is), but that some algo improvements might crack ubiquitous trainable AGI. Probably including some kind of embodiment to provide world-models and emotions (which are essential to embodied survival and success).
Comment by kevinlu1248 1 day ago
Comment by keepamovin 1 day ago
I guess a nice advantage of backwardness here is that economic opportunities exist for those who can solve pain points in the use of existing intel. Older models often do almost as well at agentic tasks in reality, and can probably go further.
Still, AGI should remove a lot of this making it redundant, and it will then be more about the intel than the tooling. But an opportunity exists now. We may not have widespread AGI until 8 - 10 years later, so plenty of money to be made in the meantime.
Comment by kevinlu1248 15 hours ago
Comment by pdyc 2 days ago
Comment by dainiusse 2 days ago
Comment by k9294 19 hours ago
Comment by sim04ful 2 days ago
Instead of the RL step, would constrained decoding, say via something like xgrammar, fix the syntax generation issue?
Comment by NitpickLawyer 2 days ago
It can, but you have to consider two things here:
a) constrained decoding ensures adherence to syntax, not semantics. Say you're editing a field in an enum in Rust. You can write syntactically correct Rust code that doesn't address the new field further in the code (say in a match). You'd get syntactically correct code, but the compiler will scream at you. RL works on both.
b) if your goal is to further train the model, so it works on many tasks, RL helps with exploring new paths and training the model further. Constrained grammars help with inference, but the model doesn't "learn" anything. With RL you can also have many reward functions at the same time. Say one that rewards good syntax, one that rewards "closing" all the functions so tree-sitter doesn't complain, and one that rewards 0 errors from the compiler. The model gets to train on all 3 at the same time.
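To illustrate (a) with a toy Python example instead of Rust (my own example, not from the thread): the snippet below parses cleanly, so a syntax-only check or grammar-constrained decoder is satisfied, yet the code is still broken semantically.

```python
import ast

code = """
def describe(status):
    if status == "ok":
        return "all good"
    elif status == "warn":
        return colour   # undefined name, and the new "error" status is never handled
"""
ast.parse(code)  # no SyntaxError: a parse check passes, but the semantics are still wrong
```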
Comment by kevinlu1248 1 day ago
The other one is that constrained decoding only works on CFGs (simpler grammars like JSON schemas), since only those can produce automata that can be used for constrained decoding. Programming languages like Python and C++ aren't context-free, so it doesn't work.
Also, constrained decoding generally worsens model quality, since the model would be generating off-policy. So RL helps push the corrected syntax back on-policy.
Comment by deepsquirrelnet 2 days ago
This seems like an ideal case for trying DFT as well. I’m not sure if you’re using trl, but I’d suggest checking that out.
Comment by kevinlu1248 1 day ago
Comment by ttoinou 2 days ago
Comment by kevinlu1248 1 day ago
Also that's crazy, M4 Mac?
Comment by ttoinou 1 day ago
Comment by vichle 2 days ago
Comment by bodegajed 2 days ago
Comment by moffkalast 2 days ago
Comment by bradfa 1 day ago
Comment by moffkalast 1 day ago
Comment by kevinlu1248 1 day ago
Comment by jychang 2 days ago
Comment by BoredomIsFun 2 days ago
Comment by gunalx 2 days ago
Comment by whimsicalism 2 days ago
I wonder whether we are perhaps past the point of usefulness of 'next edit' code development in 2026, though.
Comment by _boffin_ 2 days ago
Comment by jedisct1 2 days ago
But how do you use it instead of Copilot in VSCode?
Comment by flanked-evergl 2 days ago
Comment by replete 2 days ago
Comment by BoredomIsFun 2 days ago
Comment by mika6996 1 day ago
Comment by BoredomIsFun 1 day ago
Comment by cmrdporcupine 1 day ago
I'd love to see them make a larger model, in the 10-20B range maybe? I know most people wouldn't be able to run that on their machines, but some could.
Running on ollama locally on NVIDIA Spark GB10. Tried it also with vLLM. Pretty fast.
Comment by kevinlu1248 1 day ago
Comment by cmrdporcupine 1 day ago
Comment by mijoharas 1 day ago
Comment by cmrdporcupine 1 day ago
Comment by bberenberg 2 days ago
Comment by kevinlu1248 1 day ago
Comment by ragchronos 2 days ago
Comment by moelf 2 days ago
Comment by dajonker 2 days ago
Comment by rationably 2 days ago
Comment by _ache_ 2 days ago
Comment by kevinlu1248 1 day ago
Comment by _mugencode 2 days ago
This is a great resource to explore a similar approach: https://blog.sweep.dev/posts/oss-next-edit
My notes so far https://kapilreddy.me/notes/2024/11/17/building-clojure-slm-...
Comment by rw_panic0_0 2 days ago
Comment by kevinlu1248 1 day ago
Comment by ing33k 2 days ago
Comment by wepaean 2 days ago
Comment by asyncze 2 days ago
Comment by plutodev 2 days ago
Comment by oefrha 2 days ago
Comment by subscribed 2 days ago
Comment by lelanthran 2 days ago
I agree that green accounts could be regarded as suspicious and, if it were me, I'd disclose each time I mention it.
Comment by kouteiheika 2 days ago
It's hard to compare without more details about the training process and the dataset, but is it? Genuine question, because I had the opposite impression. For example, I recently did a full finetuning run on a 3B model, chewing through a 146k-entry dataset (with 116k entries having reasoning traces, so they're not short) in 7 hours on a single RTX 6000.
Comment by kevinlu1248 1 day ago
Comment by dcreater 2 days ago
Comment by dang 2 days ago
Comment by kevinlu1248 1 day ago