VibeThinker-3B achieves 80.2 on LCBv6

Comment by SwellJoe 9 hours ago

I added this to my benchmark of models looking for Mythos-reported security bugs. Unsurprisingly, it found 0. There is, after all, a lower bound on how small a model can be and still find security bugs. https://swelljoe.com/post/will-it-mythos/

It does seem to write working Python code reliably, though, which is impressive for such a little guy.

Comment by dadoum 18 hours ago

So I downloaded the model and tried a few math prompts. The simple addition was a little tedious because it checked multiple times that the calculation was right. I then gave it a fairly long integral, one that is straightforward if you know the techniques, and it solved it in 5 minutes on my MacBook Pro (M4 Pro, 24 GB); I just had to increase the context window. Finally, I tried giving it a full math exam, but there it wouldn't score many points: it takes so many shortcuts that it writes wrong steps in its answers. Still pretty good, as it generally identifies what it should do, but I haven't tried anything else in that weight class, so I can't really say whether that's impressive in the bigger picture.

Comment by moondistance 21 hours ago

Paper: https://huggingface.co/papers/2606.16140

