Mistral releases Devstral 2 and Mistral Vibe CLI
Posted by pember 16 hours ago
Comments
Comment by simonw 14 hours ago
llm install llm-mistral
llm mistral refresh
llm -m mistral/devstral-2512 "Generate an SVG of a pelican riding a bicycle"
https://tools.simonwillison.net/svg-render#%3Csvg%20xmlns%3D...
Pretty good for a 123B model!
(That said I'm not 100% certain I guessed the correct model ID, I asked Mistral here: https://x.com/simonw/status/1998435424847675429)
Comment by Jimmc414 12 hours ago
Comment by simonw 12 hours ago
Comment by th0ma5 11 hours ago
Comment by vanschelven 10 hours ago
Comment by dugidugout 11 hours ago
Comment by bravetraveler 10 hours ago
It's perfectly fine to link for convenience, but it does feel a little disrespectful/SEO-y to not 'continue the conversation'. A summary at the very least, explaining how exactly it pertains. Sell us.
In a sense, link-dropping [alone] is saying: "go read this and establish my rhetorical/social position, I'm done here"
Imagine meeting an author/producer/whatever you liked. You'd want to talk about their work, how they created it, the impact it had, and so on. Now imagine if they did that... or if they waved their hand vaguely at a catalog.
Comment by simonw 8 hours ago
Comment by bravetraveler 8 hours ago
You could be done, nothing is making you defend this (sorry) asinine benchmark across the internet. Not trying to (m|y)uck your yum, or whatever.
Remember, I did say linking for convenience is fine. We're belaboring the worst reading in comments. Inconsequential, unnecessary heartburn. Link the blog posts together and call it good enough.
Comment by Barbing 6 hours ago
I hadn’t seen the post. It was relevant. I just read it. Lucky Ten Thousand can read it next time even though I won’t.
Simon has never seemed annoying, so unlike other comments that might worry me (even “Opus made this”, which is cool, though I'm concerned someone astroturfed), that comment would never have raised my eyebrows. He's also dedicated, and I love that he devotes his time to a new field like this, where it's great to have attempts at benchmarks, folks cutting through chaff, etc.
Comment by bravetraveler 6 hours ago
Yes, the LLM people will train on this. They will train on absolutely everything [as they have]. The comments/links prioritize engagement over awareness. My point, I suppose, if I had one, is that this blogosphere can add to the chaff. I'm glad to see Simon here often/interested.
Aside: all this concern about over-fitting just reinforces my belief these things won't take the profession any time soon. Maybe the job.
Comment by simonw 8 hours ago
Comment by bravetraveler 8 hours ago
You brought the benchmark and anticipated their... cheesing, with a promise to catch them on it. Cool announcement of an announcement. Just do that [or don't]. In a hippy sense, this is no longer yours. It's out there. Like everything else anyone wrote.
Let the LLM people train on your test. Catch them as claimed. Publish again. Huzzah, industry without overtime in the comments. It makes sense/cents to position yourself this way :)
Obviously they're going to train on anything they can get. They did. Mouse, meet cat. Some of us in the house would love it if y'all would keep it down! This is 90s rap beef all over again
Comment by charcircuit 7 hours ago
Comment by bravetraveler 7 hours ago
Comment by tomrod 9 hours ago
Comment by bravetraveler 9 hours ago
No, no, remember? Points to the blog you were already reading! Working diligently to build a brand: podcast, paid newsletter, the works.
Comment by tomrod 3 hours ago
Comment by th0ma5 11 hours ago
Comment by dugidugout 11 hours ago
You asserted a pattern of conduct on the user simonw:
> I think constantly replying to everybody with some link which doesn't address their concerns
Then claimed that conduct was:
> condescending and disrespectful.
I am asking you to elaborate to whom simonw is condescending and disrespecting. I don't see how it follows.
Comment by Workaccount2 6 hours ago
So far though, the models good at bike pelican are also good at kayak bumblebee, or whatever other strange combo you can come up with.
So if they are trying to benchmaxx by making SVG generation stronger, that's not really a miss, is it?
Comment by majormajor 5 hours ago
Comment by 0cf8612b2e1e 7 hours ago
Comment by thatwasunusual 7 hours ago
I may be stupid, but _why_ is this prompt used as a benchmark? I mean, pelicans _can't_ ride a bicycle, so why is it important for "AI" to show that they can (at least visually)?
The "wine glass problem"[0] - and probably others - seems to me to be a lot more relevant...?
[0] https://medium.com/@joe.richardson.iii/the-curious-case-of-t...
Comment by simonw 7 hours ago
Honestly though, the benchmark was originally meant to be a stupid joke.
I only started taking it slightly more seriously about six months ago, when I noticed that the quality of the pelican drawings really did correspond quite closely to how generally good the underlying models were.
If a model draws a really good picture of a pelican riding a bicycle there's a solid chance it will be great at all sorts of other things. I wish I could explain why that was!
If you start here and scroll through and look at the progression of pelican on bicycle images it's honestly spooky how well they match the vibes of the models they represent: https://simonwillison.net/2025/Jun/6/six-months-in-llms/#ai-...
So ever since then I've continued to get models to draw pelicans. I certainly wouldn't suggest anyone take serious decisions on model usage based on my stupid benchmark, but it's a fun first-day initial impression thing and it appears to be a useful signal for which models are worth diving into in more detail.
Comment by thatwasunusual 5 hours ago
Why?
If I hired a worker that was really good at drawing pelicans riding a bike, it wouldn't tell me anything about his/her other qualities?!
Comment by suspended_state 5 minutes ago
Comment by vikramkr 2 hours ago
It's not a human intelligence - it's a totally different thing, so why would the same test that you use to evaluate human abilities apply here?
Also, more directly, the "all sorts of other things" we want LLMs to be good at often involve writing code/spatial reasoning/world understanding, which creating an SVG of a pelican riding a bicycle very, very directly evaluates, so it's not even that surprising?
Comment by simonw 4 hours ago
Comment by jtbaker 4 hours ago
Comment by wisty 7 hours ago
Yes, it's like the wine glass thing.
Also it's kind of got depth. Does it draw the pelican and the bicycle? Can the pelican reach the pedals? How?
I can imagine a really good AI finding a funny or creative or realistic way for the pelican to reach the pedals.
A slightly worse AI will do an OK job, maybe just making the bike small or the legs too long.
An OK AI will draw a pelican on top of a bicycle and just call it a day.
It's not as binary as the wine glass example.
Comment by thatwasunusual 5 hours ago
> Yes it's like the wine glass thing.
No, it's not!
That's part of my point; the wine glass scenario is a _realistic_ scenario. The pelican riding a bike is not. It's a _huge_ difference. Why should we measure intelligence (...) against something unrealistic rather than something realistic?
I just don't get it.
Comment by Fnoord 1 hour ago
It is unrealistic because if you go to a restaurant, you don't get served a glass like that. It is frowned upon (alcohol is a drug, after all) and impractical (wine stains are annoying) to fill a wine glass that full.
A pelican riding a bike, on the other hand, is a realistic scenario because of children's TV. Example from a 1950s animation/comic involving a pelican [1].
[1] https://en.wikipedia.org/wiki/The_Adventures_of_Paddy_the_Pe...
Comment by vikramkr 2 hours ago
Comment by baq 13 hours ago
Comment by aschobel 12 hours ago
Comment by lagniappe 12 hours ago
Comment by tarsinge 12 hours ago
Comment by lagniappe 12 hours ago
Comment by tarsinge 11 hours ago
I may have missed something, but where are we saying the website should be recreated with 1996 tech or specs? The model is free to use any modern CSS; there are no technical limitations. So yes, I genuinely think it is a good generalization test, because it is indeed not in the training set, and yet it is an easy task for a human developer.
Comment by locallost 12 hours ago
Comment by lagniappe 12 hours ago
Comment by locallost 1 hour ago
Browsers are able to parse a webpage from 1996. I don't know what the argument in the linked comment is about, but in this one we discuss the relevance of creating a 1996 page vs a pelican on a bicycle in SVG.
Here is Gemini when asked how to build a webpage from 1996. Seems pretty correct. In general I dislike grand statements that are difficult to back up; in your case, that models have only a cursory knowledge of something (what does that even mean in the context of LLMs?), what exactly they were trained on, etc.
The shortened Gemini answer, the detailed version you can ask for yourself (a markup sketch follows the list):
Layout via Tables: Without modern CSS, layouts were created using complex, nested HTML tables and invisible "spacer GIFs" to control white space.
Framesets: Windows were often split into independent sections (like a static sidebar and a scrolling content window) using Frames.
Inline Styling: Formatting was not centralized; fonts and colors were hard-coded individually on every element using the <font> tag.
Low-Bandwidth Design: Visuals relied on tiny tiled background images, animated GIFs, and the limited "Web Safe" color palette.
CGI & Java: Backend processing was handled by Perl/CGI scripts, while advanced interactivity used slow-loading Java Applets.
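To make that concrete, here is a small hand-written sketch of the kind of markup Gemini is describing (illustrative period HTML from memory of the era, not Gemini's actual output):

<html>
  <head><title>Welcome to My Homepage</title></head>
  <body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
    <!-- layout via a borderless table instead of CSS -->
    <table width="600" border="0" cellspacing="0" cellpadding="0">
      <tr>
        <td width="140" bgcolor="#C0C0C0" valign="top">
          <font face="Arial" size="2"><b>Links</b></font><br>
          <a href="guestbook.html">Guestbook</a>
        </td>
        <!-- spacer GIF forces the gutter width -->
        <td width="10"><img src="spacer.gif" width="10" height="1"></td>
        <td valign="top">
          <font face="Times New Roman" size="3" color="#000080">
            Welcome! This page is best viewed in Netscape Navigator.
          </font>
        </td>
      </tr>
    </table>
  </body>
</html>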
Comment by utopiah 10 hours ago
I'd be curious about that actually, feel like W3C specifications (I don't mean browser support of them) rarely deprecate and precisely try to keep the Web running.
Comment by baq 12 hours ago
Comment by tomashubelbauer 12 hours ago
Comment by lagniappe 12 hours ago
Comment by willahmad 14 hours ago
Yes, SVG is code, but not in the sense of being executable with verifiable inputs and outputs.
Comment by jstummbillig 12 hours ago
Comment by andrepd 10 hours ago
Comment by hdjrudni 4 hours ago
Comment by iberator 11 hours ago
Comment by fauigerzigerk 11 hours ago
Comment by techsystems 8 hours ago
Comment by simonw 8 hours ago
Comment by cpursley 14 hours ago
Comment by aorth 14 hours ago
Comment by BudaDude 13 hours ago
Comment by lubujackson 10 hours ago
Comment by troyvit 10 hours ago
Comment by felixg3 13 hours ago
Comment by joombaga 12 hours ago
Comment by breedmesmn 12 hours ago
Comment by esafak 14 hours ago
Comment by kevin061 12 hours ago
(Surely they won't release it like that, right..?)
Comment by esafak 12 hours ago
That looks like the next flagship rather than the fast distillation, but thanks for sharing.
Comment by kevin061 11 hours ago
Comment by BoorishBears 8 hours ago
Google should be punishing these sites but presumably it's too narrow of a problem for them to care.
Comment by kevin061 8 hours ago
Comment by dmix 5 hours ago
Or at least a profit model. I don't see either on that page but maybe I'm missing something
Comment by ewoodrich 4 hours ago
Comment by ttul 5 hours ago
Comment by YetAnotherNick 12 hours ago
Comment by esafak 11 hours ago
edit: Mea culpa. I missed the active vs dense difference.
Comment by NitpickLawyer 10 hours ago
Devstral 2 is 123B dense. DeepSeek is 37B active. It will be slower and more expensive to run inference on this than dsv3, especially considering that dsv3.2 has some goodies that make inference at higher context more efficient than their previous gen.
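Back-of-the-envelope (my rough numbers; this ignores attention, KV cache, batching, and memory-bandwidth effects, which matter a lot in practice):

# Per-token forward-pass FLOPs scale roughly with 2 * active parameters
# (illustrative assumption only; real throughput depends on much more).
dense_active = 123e9   # Devstral 2: dense, all parameters active per token
moe_active = 37e9      # DeepSeek V3: ~37B parameters active per token

flops_dense = 2 * dense_active
flops_moe = 2 * moe_active
print(f"Devstral 2 per-token compute: ~{flops_dense / flops_moe:.1f}x DeepSeek V3")
# -> ~3.3x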
Comment by syntaxing 8 hours ago
Comment by aimanbenbaha 7 hours ago
Comment by esafak 5 hours ago
Comment by InsideOutSanta 11 hours ago
It spent about half an hour, correctly identified what the program did, found two small bugs, fixed them, made some minor improvements, and added two new, small but nice features.
It introduced one new bug, but then fixed it on the first try when I pointed it out.
The changes it made to the code were minimal and localized; unlike some more "creative" models, it didn't randomly rewrite stuff it didn't have to.
It's too early to form a conclusion, but so far, it's looking quite competent.
Comment by MLgulabio 11 hours ago
Comment by syntaxing 8 hours ago
Comment by seaal 4 hours ago
Comment by embedding-shape 15 hours ago
I'm a bit saddened by the name of the CLI tool, which to me implies the intended usage. "Vibe-coding" is a fun exercise for seeing where models go wrong, but for professional work where you need tight control over the quality, you obviously cannot vibe your way to excellence; hard reviews are required. So not "vibe coding", which is all about unreviewed code and just going with whatever the LLM outputs.
But regardless of that, it seems like everyone and their mother is aiming to fuel the vibe-coding frenzy. Where are the professional tools, meant for people who don't want to vibe-code but do want to be heavily assisted by LLMs? Something meant to augment the human intellect, not replace it? All the agents seem to focus on offloading work to vibe-coding agents, while what I want is something even more tightly integrated with my tools, so I can keep delivering the high-quality code I know and control. Where are those tools? None of the existing coding agents apparently aim for this...
Comment by williamstein 15 hours ago
Comment by esafak 15 hours ago
Comment by 4b11b4 14 hours ago
Comment by embedding-shape 13 hours ago
This is exactly the CLI I'm referring to, whose name implies it's for playing around with "vibe-coding", instead of helping professional developers produce high quality code. It's the opposite of what I and many others are looking for.
Comment by chrsw 13 hours ago
Comment by hadlock 11 hours ago
A surprising amount of programming is building cardboard services or apps that only need to last six months to a year and then thrown away when temporary business needs change. Execs are constantly clamoring for semi-persistent dashboards and ETL visualized data that lasts just long enough to rein in the problem and move on to the next fire. Agentic coding is good enough for cardboard services that collapse when they get wet. I wouldn't build an industrial data lake service with it, but you can certainly build cardboard consumers of the data lake.
Comment by bigiain 6 hours ago
But there is nothing more permanent than a quickly hacked together prototype or personal productivity hack that works. There are so many Python (or Perl or Visual Basic) scripts or Excel spreadsheets - created by people who have never been "developers" - which solve in-the-trenches pain points and become indispensable in exactly the way _that_ xkcd shows.
Comment by pdntspa 14 hours ago
Claude Code not good enough for ya?
Comment by embedding-shape 13 hours ago
Still, I do use Claude Code and Codex daily as there is nothing better out there currently. But they still feel tailored towards vibe-coding instead of professional development.
Comment by vidarh 13 hours ago
Comment by Havoc 6 hours ago
Comment by jbs789 2 minutes ago
Comment by embedding-shape 12 hours ago
Comment by pdntspa 2 hours ago
Comment by johnfn 12 hours ago
Err, doesn’t it have /review?
Comment by victorbjorklund 12 hours ago
Comment by embedding-shape 10 hours ago
Imagine a GUI built around git branches + agents working in those branches + tooling to manage the orchestration and small review points, rather than "here's a chat and tool calling, glhf".
Comment by jbellis 12 hours ago
This is what we're building at Brokk: https://brokk.ai/
Quick intro: https://blog.brokk.ai/introducing-lutz-mode/
Comment by johanvts 15 hours ago
Comment by embedding-shape 13 hours ago
Comment by vidarh 13 hours ago
If you babysit every interaction, rather than reviewing a completed unit of work of some size, you're wasting your time second-guessing whether the model will "recover" from stupid mistakes. Sometimes that's warranted, but more often than not it corrects itself faster than you can.
And so it's far more effective to interact with it far more async, where the UI is more for figuring out what it did if something doesn't seem right, than for working live. I have Claude writing a game engine in another window right now, while writing this, and I have no interest in reviewing every little change, because I know the finished change will look nothing like the initial draft (it did just start the demo game right now, though, and it's getting there). So I review no smaller units of change than 30m-1h, often it will be hours, sometimes days, between each time I review the output, when working on something well specified.
Comment by johanvts 12 hours ago
Comment by macNchz 11 hours ago
Comment by reachtarunhere 12 hours ago
Comment by zmmmmm 10 hours ago
The chat interface is optimal to me because you often are asking questions and seeking guidance or proposals as you are making actual code changes. One reason I do like it is that its default mode of operation is to make a commit for each change it makes. So it is extremely clear what the AI did vs what you did vs what is a hodgepodge of both.
As others have mentioned, you can integrate with your IDE through the watch mode. It's a somewhat crude but still useful way. But I find myself more often than not just running Aider in a terminal under the code editor window and chatting with it about what's in the window.
Comment by embedding-shape 10 hours ago
> The chat interface
Seems very much not, if it's still a chat interface :) Figuring out a chat UX is easy compared to something designed from the beginning around letting the LLM fill in some parts. I guess I'm searching for something with a different paradigm than just "chat + $Something".
Comment by zmmmmm 8 hours ago
Comment by embedding-shape 8 hours ago
It's all very fluffy and theoretical of course.
Comment by xmcqdpt2 1 hour ago
Comment by zmmmmm 7 hours ago
Comment by embedding-shape 6 hours ago
Comment by mhast 6 hours ago
"I want you to do feature X. Analyse the code for me and make suggestions how to implement this feature."
Then it will go off and work for a while and typically come back after a bit with some suggestions. Then iterate on those if needed and end with:
"Ok. Now take these decided upon ideas and create a plan for how to implement. And create new tests where appropriate."
Then it will go off and come back with a plan for what to do. And then you send it off with:
"Ok, start implementing."
So sure. You probably can work on this to make it easier to use than with a CLI chat. It would likely be less like an IDE and more like a planning tool you'd use with human colleagues though.
Comment by troyvit 9 hours ago
So you'd write a function name and then tell it to flesh it out.
function factorial(n) // Implement this. AI!
Becomes:
function factorial(n) {
  if (n === 0 || n === 1) {
    return 1;
  } else {
    return n * factorial(n - 1);
  }
}
Last I looked, Aider's maintainer has had to focus on other things recently, but aider-ce is a fantastic fork. I'm really curious to try Mistral's Vibe, but even though I'm a big fanboi I don't want to be tied to just one model. Aider lets you tier your models such that your big, expensive model can do all the thinking and then stuff like code reviews can run through a smaller model. It's a pretty capable tool.
Edit: Fix formatting
Comment by zmmmmm 9 hours ago
Very much this for me - I really don't get why, given that new models are popping out every month from different providers, people are so happy to sink themselves into provider ecosystems when there are open source alternatives that work with any model.
The main problem with Aider is it isn't agentic enough for a lot of people but to me that's a benefit.
Comment by andai 14 hours ago
While True:
0. Context injected automatically. (My repos are small.)
1. I describe a change.
2. LLM proposes a code edit. (Can edit multiple files simultaneously. Only one LLM call required :)
3. I accept/reject the edit.
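A minimal sketch of that loop, assuming an OpenAI-compatible endpoint and a unified-diff edit convention (the model ID and prompts here are placeholders; real edit application would need to be more robust):

import pathlib
import subprocess
from openai import OpenAI

client = OpenAI()  # or OpenAI(base_url=..., api_key=...) for a local server

def repo_context(root="."):
    # Step 0: small repos only - inject every tracked file into the prompt.
    files = subprocess.run(["git", "ls-files"], capture_output=True,
                           text=True, cwd=root).stdout.splitlines()
    return "\n\n".join(f"--- {p} ---\n{pathlib.Path(root, p).read_text()}"
                       for p in files)

while True:
    change = input("Describe a change (or 'q' to quit): ")  # step 1
    if change.strip() == "q":
        break
    # Step 2: a single LLM call proposes edits, possibly across many files.
    resp = client.chat.completions.create(
        model="devstral-2512",  # placeholder model ID
        messages=[
            {"role": "system",
             "content": "Reply with a unified diff against the files below."},
            {"role": "user",
             "content": repo_context() + "\n\nChange: " + change},
        ],
    )
    diff = resp.choices[0].message.content
    print(diff)
    # Step 3: accept or reject the proposed edit.
    if input("Apply? [y/N] ").strip().lower() == "y":
        subprocess.run(["git", "apply", "-"], input=diff, text=True)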
Comment by true2octave 8 hours ago
What matters is high quality specifications including test cases
Comment by embedding-shape 8 hours ago
Says the person who will find themselves unable to change the software in even the slightest way without large refactors across everything at the same time.
High quality code matters more than ever, would be my argument. The second you let the LLM sneak in some quick hack/patch instead of correctly solving the problem, is the second you invite it to continue doing that always.
Comment by bigiain 6 hours ago
I have a feeling this will only supercharge the long established industry practice of new devs or engineering leadership getting recruited and immediately criticising the entire existing tech stack, and pushing for (and often succeeding) a ground up rewrite in language/framework du jour. This is hilariously common in web work, particularly front end web work. I suspect there are industry sectors that're well protected from this, I doubt people writing firmware for fuel injection and engine management systems suffer too much from this, the Javascript/Nodejs/NPM scourge _probably_ hasn't hit the PowerPC or 68K embedded device programming workflow. Yet...
Comment by bigiain 6 hours ago
In my mind, it's somewhat orthogonal to code quality.
Waterfall has always been about "high quality specifications" written by people who never see any code, much less write it. Agile makes specs and code quality somewhat related, but in at least some ways probably drives lower quality code in the pursuit of meeting sprint deadlines and producing testable artefacts at the expense of thoroughness/correctness/quality.
Comment by chrsw 13 hours ago
What kind of hardware do you have to be able to run a performant GPT-OSS-120b locally?
Comment by embedding-shape 12 hours ago
Comment by kristianp 9 hours ago
Comment by fgonzag 12 hours ago
There are many platforms out there that can run it decently.
AMD Strix Halo, Mac platforms. Two (or three without extra RAM) of the new AMD AI Pro R9700 (32GB of RAM, $1200), multi-consumer-GPU setups, etc.
Comment by FuckButtons 10 hours ago
Comment by freakynit 4 hours ago
Here is what I think about the bigger model: it sits between Sonnet 4 and Sonnet 4.5. Something like "Sonnet 4.3". The response speed was pretty good.
Overall, I can see myself shifting to this for regular day-to-day coding if they can offer it at competitive pricing.
I'll still use sonnet 4.5 or gemini 3 for complex queries, but, for everything else code related, this seems to be pretty good.
Congrats Mistral. You most probably have caught up to the big guys. Not there yet exactly, but, not far now.
Comment by pluralmonad 15 hours ago
Comment by tormeh 14 hours ago
Comment by klysm 15 hours ago
Comment by kilpikaarna 2 hours ago
Even the Gemini 3 announcement page had some bit like "best model for vibe coding".
Comment by jimmydoe 15 hours ago
Comment by isodev 14 hours ago
Comment by andai 14 hours ago
If you're actually making sure it's legit, it's not vibe coding anymore. It's just... Backseat Coding? ;)
There's a level below that I call Power Coding (like power armor) where you're using a very fast model interactively to make many very small edits. So you're still doing the conceptual work of programming, but outsourcing the plumbing (LLM handles details of syntax and stdlib).
Comment by HarHarVeryFunny 11 hours ago
Maybe common usage is shifting, but Karpathy's "vibe coding" was definitely meant to be a never look at the code, just feel the AI vibes thing.
Comment by isodev 10 hours ago
Also, we’re both “people in tech”, we know LLMs can’t conceptualise beyond finding the closest collection of tokens rhyming with your prompt/code. Doesn’t mean it’s good or even correct. So that’s why it’s vibe coding.
Comment by brazukadev 13 hours ago
sorry to disappoint you but that has also been considered vibecoding. It is just not pejorative.
Comment by theLiminator 12 hours ago
Imo, if you read the code, it's no longer vibecoding.
Comment by NitpickLawyer 14 hours ago
Comment by sunaookami 1 hour ago
Comment by tomashubelbauer 13 hours ago
Comment by giancarlostoro 9 hours ago
Comment by princehonest 13 hours ago
Comment by clusterhacks 11 hours ago
I've personally decided to just rent systems with GPUs from a cloud provider and set up SSH tunnels to my local system. I mean, if I was doing some more HPC/numerical programming (say, similarity search on GPUs :-) ), I could see just taking the hit and spending $15,000 on a workstation with an RTX Pro 6000.
For grins:
Max t/s for this and smaller models? RTX 5090 system. Barely squeezing in for $5,000 today and given ram prices, maybe not actually possible tomorrow.
Max CUDA compatibility, slower t/s? DGX Spark.
Ok with slower t/s, don't care so much about CUDA, and want to run larger models? Strix Halo system with 128gb unified memory, order a framework desktop.
Prefer Macs, might run larger models? M3 Ultra with memory maxed out. Better memory bandwidth speed, mac users seem to be quite happy running locally for just messing around.
You'll probably find better answers heading off to https://www.reddit.com/r/LocalLLaMA/ for actual benchmarks.
Comment by kpw94 9 hours ago
That's a good idea!
Curious about this, if you don't mind sharing:
- what's the stack ? (Do you run like llama.cpp on that rented machine?)
- what model(s) do you run there?
- what's your rough monthly cost? (Does it come up much cheaper than if you called the equivalent paid APIs)
Comment by clusterhacks 8 hours ago
I am usually just running gpt-oss-120b or one of the Qwen models. Sometimes Gemma? These are mostly "medium" sized in terms of memory requirements - I'm usually trying unquantized models that will easily run on a single 80-ish GB GPU because those are cheap.
I tend to spend $10-$20 a week. But I am almost always prototyping or testing an idea for a specific project that doesn't require me to run 8 hrs/day. I don't use the paid APIs for several reasons but cost-effectiveness is not one of those reasons.
Comment by Juminuvi 4 hours ago
Comment by bigiain 6 hours ago
Comment by clusterhacks 5 hours ago
Here are my lazy notes + a snippet of the history file from the remote instance for a recent setup where I used the web chat interface built into llama.cpp.
I created an instance gpu_1x_gh200 (96 GB on ARM) at lambda.ai.
connected from a terminal on my box at home and set up the SSH tunnel:
ssh -L 22434:127.0.0.1:11434 ubuntu@<ip address of rented machine - can see it on lambda.ai console or dashboard>
Started building llama.cpp from source, history:
21 git clone https://github.com/ggml-org/llama.cpp
22 cd llama.cpp
23 which cmake
24 sudo apt list | grep libcurl
25 sudo apt-get install libcurl4-openssl-dev
26 cmake -B build -DGGML_CUDA=ON
27 cmake --build build --config Release
MISTAKE on 27, SINGLE-THREADED and slow to build; see -j 16 below for faster build
28 cmake --build build --config Release -j 16
29 ls
30 ls build
31 find . -name "llama.server"
32 find . -name "llama"
33 ls build/bin/
34 cd build/bin/
35 ls
36 ./llama-server -hf ggml-org/gpt-oss-120b-GGUF -c 0 --jinja
MISTAKE, didn't specify the port number for the llama-server
37 clear;history
38 ./llama-server -hf Qwen/Qwen3-VL-30B-A3B-Thinking -c 0 --jinja --port 11434
39 ./llama-server -hf Qwen/Qwen3-VL-30B-A3B-Thinking.gguf -c 0 --jinja --port 11434
40 ./llama-server -hf Qwen/Qwen3-VL-30B-A3B-Thinking-GGUF -c 0 --jinja --port 11434
41 clear;history
I switched to Qwen3 VL because I needed a multimodal model for that day's experiment. Lines 38 and 39 show me not using the right name for the model. I like how llama.cpp can download and run models directly off of Hugging Face.
Then I pointed my browser at http://localhost:22434 on my local box and had the normal browser window where I could upload files and use the chat interface with the model. That also gives you an OpenAI API-compatible endpoint. It was all I needed for what I was doing that day. I spent a grand total of $4 that day doing the setup and running some NLP-oriented prompts for a few hours.
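If you'd rather script against it than use the browser UI, the same tunnel exposes that OpenAI-compatible API; a minimal sketch (llama-server serves whatever model is loaded, so the model name is mostly informational):

from openai import OpenAI

# The SSH tunnel above maps local port 22434 to the server's 11434,
# where llama-server exposes OpenAI-compatible /v1 routes.
client = OpenAI(base_url="http://localhost:22434/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="gpt-oss-120b",  # informational; the loaded model responds
    messages=[{"role": "user", "content": "Summarize this paragraph: ..."}],
)
print(resp.choices[0].message.content)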
Comment by bigiain 10 minutes ago
Comment by tgtweak 8 hours ago
48GB of VRAM and lots of CUDA cores; hard to beat this value atm.
If you want to go even further, you can get an 8x V100 32GB server complete with 512GB ram and nvlink switching for $7000 USD from unixsurplus (ebay.com/itm/146589457908) which can run even bigger models and with healthy throughput. You would need 240V power to run that in a home lab environment though.
Comment by lostmsu 8 hours ago
Comment by monster_truck 12 hours ago
Fuck nvidia
Comment by clusterhacks 11 hours ago
How is it? I'd guess a bunch of the MoE models actually run well?
Comment by stusmall 8 hours ago
Comment by androiddrew 11 hours ago
Comment by eavan0 11 hours ago
Comment by kristianp 9 hours ago
Comment by zimbatm 13 hours ago
nix run github:numtide/llm-agents.nix#mistral-vibe
The repo is updated daily.
Comment by jquaint 12 hours ago
Comment by pzmarzly 15 hours ago
As long as it doesn't mean 10x worse performance, that's a good selling point.
Comment by Macha 14 hours ago
In work, where my employer pays for it, Haiku tends to be the workhorse with Sonnet or Opus when I see it flailing. On my own budget I’m a lot more cost conscious, so Haiku actually ends up being “the fancy model” and minimax m2 the “dumb model”.
Comment by phildougherty 14 hours ago
Comment by amarcheschi 14 hours ago
Comment by fastball 14 hours ago
Comment by gunalx 11 hours ago
Comment by fastball 10 hours ago
> this model is worse (but cheaper)
> use it to output 10x the amount of trashier trash
You've lost me.
Comment by gunalx 10 hours ago
Comment by rubin55 10 hours ago
Comment by alexmorley 15 hours ago
Comment by rsolva 11 hours ago
Comment by SyneRyder 13 hours ago
I'm team Anthropic with Claude Max & Claude Code, but I'm still excited to see Mistral trying this. Mistral has occasionally saved the day for me when Claude refused an innocuous request, and it's good to have alternatives... even if Mistral / Devstral seems to be far behind the quality of Claude.
Comment by tomashubelbauer 13 hours ago
Comment by SyneRyder 12 hours ago
That was very helpful, thanks!
Comment by joostdevries 14 hours ago
Comment by mentalgear 9 hours ago
Comment by pshirshov 10 hours ago
The competition is much smoother. Where are the subscriptions that would give users the coding agent and the chat for a flat fee, working out of the box?..
Comment by weitendorf 12 hours ago
Going to start hacking on this ASAP
Comment by syntaxing 11 hours ago
Comment by kristianp 9 hours ago
Comment by syntaxing 6 hours ago
[1] https://openhands.dev/blog/devstral-a-new-state-of-the-art-o...
Comment by tucnak 15 hours ago
Comment by ismailmaj 15 hours ago
Comment by maelito 9 hours ago
Comment by poszlem 15 hours ago
This tech is simply too critical to pretend the military won’t use it. That’s clearer now than ever, especially after the (so far flop-ish) launch of the U.S. military’s own genAI platform.
Comment by programLyrique 14 hours ago
- https://helsing.ai/newsroom/helsing-and-mistral-announce-str...
- https://sifted.eu/articles/mistral-helsing-defence-ai-action...
- Luxembourg army chose Mistral: https://www.forcesoperations.com/la-pepite-francaise-mistral...
- French army: https://www.defense.gouv.fr/actualites/ia-defense-sebastien-...
Comment by embedding-shape 13 hours ago
Not sure you've kept up to date; the US has turned its back on most allies so far, including Europe and the EU, and now welcomes previous enemies with open arms.
Comment by breedmesmn 12 hours ago
Comment by hobofan 14 hours ago
Comment by maelito 9 hours ago
They did.
Comment by simonw 11 hours ago
core/prompts/cli.md https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
core/prompts/compact.md https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
.../prompts/bash.md https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
.../prompts/grep.md https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
.../prompts/read_file.md https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
.../prompts/write_file.md https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
.../prompts/search_replace.md https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
.../prompts/todo.md https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
Comment by giancarlostoro 9 hours ago
Comment by simonw 8 hours ago
Here's an example of the kinds of things I do with Claude Code now: https://gistpreview.github.io/?b64d5ee40439877eee7c224539452... - that one involved several from-scratch rewrites of the history of an entire Git repo just because I felt like it.
Comment by therealmarv 14 hours ago
The only surprising and good part: everything, including the graphics, gets fixed when clicking my "speedreader" button in Brave. So they are doing that "cool look" via CSS.
Comment by netghost 10 hours ago
There's a scan-lines effect they apply to everything. It's "cool", but gets old after a minute.
Comment by rwky 12 hours ago
Comment by maelito 11 hours ago
Comment by badsectoracula 15 hours ago
Uh, the "Modified MIT license" here[0] for Devstral 2 doesn't look particularly permissively licensed (or open-source):
> 2. You are not authorized to exercise any rights under this license if the global consolidated monthly revenue of your company (or that of your employer) exceeds $20 million (or its equivalent in another currency) for the preceding month. This restriction in (b) applies to the Model and any derivatives, modifications, or combined works based on it, whether provided by Mistral AI or by a third party. You may contact Mistral AI (sales@mistral.ai) to request a commercial license, which Mistral AI may grant you at its sole discretion, or choose to use the Model on Mistral AI's hosted services available at https://mistral.ai/.
[0] https://huggingface.co/mistralai/Devstral-2-123B-Instruct-25...
Comment by Arcuru 14 hours ago
If you want to use something, and your company makes $240,000,000 in annual revenue, you should probably pay for it.
Comment by badsectoracula 13 hours ago
I do not mind having a license like that, my gripe is with using the terms "permissive" and "open source" like that because such use dilutes them. I cannot think of any reason to do that aside from trying to dilute the term (especially when some laws, like the EU AI Act, are less restrictive when it comes to open source AIs specifically).
Comment by kouteiheika 12 hours ago
Good. In this case, let it be diluted! These extra "restrictions" don't affect normal people at all, and won't even affect any small/medium businesses. I couldn't care less that the term is "diluted" and that makes it harder for those poor, poor megacorporations. They swim in money already, they can deal with it.
We can discuss the exact threshold, but as long as these "restrictions" are so extreme that they only affect huge megacorporations, this is still "permissive" in my book. I will gladly die on this hill.
Comment by dragonwriter 6 hours ago
Yes, they do, and the only reason for using the term “open source” for things whose licensing terms flagrantly defy the Open Source Definition is to falsely sell the idea that using the code carries the benefits tied to the combination of features in the definition, benefits which are lost with only a subset of those features. The freedom to use the software in commercial services is particularly important to end-users who are not interested in running their own services: it is a guarantee against lock-in, and of whatever longevity they are able to pay to have provided, even if the original creator later has interests that conflict with offering the software as a commercial service.
If this deception wasn't important, there would be no incentive not to use the more honest “source available for limited uses” description.
Comment by JoshTriplett 6 hours ago
It also makes life harder for individuals and small companies, because this is not Open Source. It's incompatible with Open Source, it can't be reused in other Open Source projects.
Terms have meanings. This is not Open Source, and it will never be Open Source.
Comment by kouteiheika 1 hour ago
I'm amazed at the social engineering that the megacorps have done with the whole Open Source (TM) thing. They engineered a whole generation of engineers to advocate not in their own self-interest, nor for the interest of the little people, but instead for the interest of the megacorps.
As soon as there is even the tiniest of restrictions, one which doesn't affect anyone besides a bunch of the richest corporations in the world, a bunch of people immediately come out of the woodwork, shout "but it's not open source!" and start bullying everyone else to change their language. Because if you even so much as inconvenience a megacorporation even a little bit, it's not Open Source (TM) anymore.
If we're talking about ideals then this is something I find unsettling and dystopian.
I hard disagree with your "It also makes life harder for individuals and small companies" statement. It's the opposite. It gives them a competitive advantage vs megacorps, however small it may be.
Comment by whimsicalism 14 hours ago
Comment by joseda-hg 14 hours ago
Whatever name they come up with for a new license will be less useful, because I'll have to figure out that this is what that is
Comment by jrm4 14 hours ago
"Open Source" is nebulous. It reasonably works here, for better or worse.
Comment by stonemetal12 10 hours ago
No, it isn't; it is well defined. The only people who find it "nebulous" are people who want the benefits without upholding the obligations.
Comment by whimsicalism 14 hours ago
Open source has a well understood meaning, including licenses like MIT and Apache - but not including "MIT, but only if you make less than $500 million", "MIT, unless you were born on a Wednesday", etc.
Comment by whimblepop 11 hours ago
Comment by fastball 14 hours ago
And honestly it wasn't a good hill to begin with: if what you are talking about is the license, call it "open license". The source code is out in the open, so it is "open source". This is why the purists have lost ground to practical usage.
Comment by embedding-shape 13 hours ago
As someone who was born and raised on FOSS, and still mostly employed to work on FOSS, I disagree.
Open source is what it is today because it's built by people with a spine who stand tall for their ideals even if it means less money, less industry recognition, lots of unglamorous work and lots of other negatives.
It's not purist to believe that what built open source so far should remain open source, and not wanting to dilute that ecosystem with things that aren't open source, yet call themselves open source.
Comment by kouteiheika 12 hours ago
With all due respect, don't you see the irony in saying "people with a spine who stand tall for their ideals", and then arguing that attaching "restrictions" which only affect the richest megacorporations in the world somehow makes the license not permissive anymore?
What ideals are those exactly? So that megacorporations have the right to use the software without restrictions? And why should we care about that?
Comment by embedding-shape 10 hours ago
Anyone can use the code for whatever purpose they want, in any way they want. I've never been a "rich megacorporation", but I have gone from having zero money to having enough money, and I still think the very same thing about the code I myself release as I did from the beginning, it should be free to be used by anyone, for any purpose.
Comment by fastball 12 hours ago
Because instead of making the point "this license isn't as permissive as it could/should be" (easy to understand), instead the point being made is "this isn't real open source", which comes across to most people as just some weird gate-keeping / No True Scotsman kinda thing.
Comment by JoshTriplett 6 hours ago
Comment by whimsicalism 12 hours ago
Comment by fastball 12 hours ago
Though given the stance you are taking in this conversation, I'm not surprised you want to quibble over that.
¯\_(ツ)_/¯
Comment by whimsicalism 12 hours ago
Comment by fastball 11 hours ago
Comment by JoshTriplett 6 hours ago
> if what you are talking about is the license, call it "open license".
If you want to build something proprietary, call it something else. "Open Source" is taken.
Comment by whimsicalism 14 hours ago
well we don't really want to open that can of worms though, do we?
I don't agree with ceding technical terms to the rest of the world. I'm increasingly told we need to stop calling cancer detection AI "AI" or "ML" because it is not the 'bad AI' and confuses people.
I guess I'm okay with being intransigent.
Comment by fastball 12 hours ago
Who gives a shit what we call "cancer AI", what matters is the result.
Comment by jsnell 11 hours ago
Comment by mkmk3 14 hours ago
Comment by JimDabell 14 hours ago
Comment by fastball 14 hours ago
Comment by JimDabell 13 hours ago
Whenever anybody tries to claim that a non-commercial license is open-source, it always gets complaints that it is not open-source. This particular word hasn't been watered down by misuse like so many others.
There is no commonly-accepted definition of open-source that allows commercial restrictions. You do not get to make up your own meaning for words that differs from how other people use it. Open-source does not have commercial restrictions by definition.
Comment by fastball 12 hours ago
Looking up open-source in the dictionary does include definitions that would allow for commercial restrictions, depending on how you define "free" (a matter that is most certainly up for debate).
Comment by whimblepop 11 hours ago
The term "open-source" exists for the purposes of a particular movement. If you are "for" the misuse and abuse of the term, you not only aren't part of that movement, but you are ignorant about it and fail to understand it— which means you frankly have no place speaking about the meanings of its terminology.
Comment by fastball 10 hours ago
Unless this authority has some ownership over the term and can prevent its misuse (e.g. with lawsuits or similar), it is not actually the authority of the term, and people will continue to use it how they see fit.
Indeed, I am not part of a movement (nor would I want to be) which focuses more on what words are used rather than what actions are taken.
Comment by JoshTriplett 6 hours ago
People can also say 2+2=5, and they're wrong. And people will continue to call them out on it. And we will keep doing so, because stopping lets people move the Overton window and try to get away with even more.
Comment by fastball 6 hours ago
The same is not true for "open source", which is a purely linguistic construct.
Comment by JimDabell 3 hours ago
And whenever they do so, this pointless argument will happen. Again, and again, and again. Because that’s not what the word means and your desired redefinition has been consistently and continuously rejected over and over again for decades.
What do you gain from misusing this term? The only thing it does is make you look dishonest and start arguments.
Comment by JoshTriplett 6 hours ago
This kind of thing is how people try to shift the Overton window. No.
Comment by udev4096 13 hours ago
Comment by fastball 12 hours ago
Comment by pxc 10 hours ago
Comment by fastball 8 hours ago
Comment by badsectoracula 14 hours ago
Comment by simonw 14 hours ago
Comment by jrm4 14 hours ago
Comment by squigz 14 hours ago
Comment by lillecarl 12 hours ago
Comment by tigranbs 11 hours ago
Comment by whimsicalism 14 hours ago
How is that a measure of model size? It should either be parameter size, activated parameters, or cost per output token.
Looks like a typo because the models line up with reported param sizes.
Comment by Poudlardo 13 hours ago
Comment by qwertox 13 hours ago
If Mistral is so permissive they could be the first ones, provided that hardware is then fast/cheap/efficient enough to create a small box that can be placed in an office.
Maybe in 5 years.
Comment by giancarlostoro 9 hours ago
Comment by bakies 12 hours ago
Comment by sosodev 10 hours ago
The Apple offerings are interesting, but the lack of x86, Linux, and general compatibility makes them a hard sell imo.
Comment by brazukadev 13 hours ago
Comment by baq 13 hours ago
...so it won't ever happen, it'll require wifi and will only be accessible via the cloud, and you'll have to pay a subscription fee to access the hardware you bought. obviously.
Comment by tgtweak 8 hours ago
Comment by kevin061 15 hours ago
The only thing I found is a pay-as-you-go API, but I wonder if it is any good (and cost-effective) vs Claude et al.
Comment by pzo 13 hours ago
With pricing so low, I don't see any reason why someone would buy a sub for 200 EUR. These days those subs are much more limited in Claude Code or Cursor than they used to be (or used to be unlimited). Better to pay as you go, especially when there are days when you probably use AI less or not at all (weekends/holidays etc.), as long as those credits don't expire.
Comment by kevin061 13 hours ago
Comment by esafak 14 hours ago
Comment by cyp0633 15 hours ago
Comment by abuson 12 hours ago
After querying the model about .NET, it seems that its knowledge comes from around June 2024.
Comment by huqedato 5 hours ago
Comment by moffkalast 12 hours ago
Comment by da_grift_shift 14 hours ago
Comment by andai 14 hours ago
Comment by jedisct1 14 hours ago
Why does every AI provider need to have its own tool, instead of contributing to existing tools like Roo Code or Opencode?
Comment by villgax 12 hours ago
Just call it Mistral License & flush it down