Making a vintage LLM from scratch
Posted by croqaz 6 days ago
Comments
Comment by mg794613 5 days ago
I appreciate the honesty, but now there's no journey, and that's what I'm interested in. I can ask an LLM myself.
Comment by abetusk 4 days ago
There's a lot of pre-processing, experimentation and validation that went into this project. The training data collection and sanitization alone is a big undertaking.
As for the blog post itself, from the article:
> Note: This blog post is 100% written by me. No AI has been used whatsoever.
Put another way: You can ask the LLM yourself to do this project? Please do, share your prompt, I'd like to see it.
Comment by JayNitram 4 days ago
Comment by croqaz 4 days ago
Comment by skerit 4 days ago
Comment by tancop 5 days ago
I'm pretty sure it's a real text in Welsh. There might be typos from OCR, but yeah, that's what the language really looks like. I don't speak it, but it's easy to recognize.
Comment by croqaz 5 days ago
Comment by throw310822 5 days ago
"It will be easy for the knowledgeable to fix the few errors that remain [in the text]." ("Bydd yn rwydd iawn i'r cyfarwydd ddiwygio'r ychydig.")
Which is exactly what the OP is doing.
Comment by noman-land 3 days ago
Comment by HexPhantom 5 days ago
Comment by dennysora-main 4 days ago
I've spent a ton of time reading up on math, ML, and DL through books, open courses, and papers, while also studying all the major open-source LLM architectures.
Since I only have one DGX Spark machine to run experiments, I can't train a massive LLM from the get-go. Instead, I'm experimenting with an auto-scaling parameter mechanism, which has led me to create a pretty unconventional and fun architecture!
Why go through all this effort when modern LLMs can basically write simple LLMs themselves, and I clearly can't out-compute the big tech giants?
Honestly, it's because I'm obsessed with the core mechanics of LLMs. I want to build something exclusively for myself and hopefully discover some completely undiscovered mechanisms along the way.
Just keeping a record and sharing my progress—having fun with it is truly the biggest reward!
I'll share it when I get a chance!
Comment by charcircuit 4 days ago
Comment by dennysora-main 4 days ago
Cloud rentals are usually billed hourly. Since I constantly tweak the architecture and run it again, having a local rig completely kills any cost anxiety—it's just a one-off payment.
Plus, regular users can't even get access to H100s anyway. I applied on AWS and GCP before and couldn't get them.
Comment by croqaz 4 days ago
Comment by dennysora-main 4 days ago
Comment by croqaz 6 days ago
Comment by giancarlostoro 4 days ago
Comment by LoganDark 4 days ago
Comment by croqaz 4 days ago
Comment by skerit 4 days ago
And anyway, I think the most important thing is dataset quality. Dumping in whatever dataset you find on Hugging Face is a recipe for mediocrity, so I'm also spending a lot of time on that.
Comment by giancarlostoro 4 days ago
Comment by cyberge99 5 days ago
Thanks for the writeup. A more granular followup would be cool too.
Comment by charcircuit 4 days ago
Comment by HexPhantom 5 days ago
Comment by croqaz 4 days ago
Do you mind expanding on this question? More granular in what way? What would you like to know that is missing from the post?
Comment by breezybottom 5 days ago
Comment by rxm 5 days ago
Comment by macwhisperer 4 days ago
Comment by croqaz 4 days ago
Comment by HexPhantom 5 days ago
Comment by nnnnnmnnnnnn 5 days ago