I got an Nvidia GH200 server for €7.5k on Reddit and converted it to a desktop
Posted by dnhkng 1 day ago
Comments
Comment by dnhkng 1 day ago
Comment by amirhirsch 1 day ago
I needed this info, thanks for putting it up. Can this really be an issue for every data center?
Comment by ipsum2 1 day ago
Comment by pointbob 1 day ago
Comment by baud147258 21 hours ago
Comment by devilbunny 19 hours ago
Comment by leipert 18 hours ago
Comment by devilbunny 18 hours ago
Comment by jerome-jh 1 day ago
Comment by dnhkng 19 hours ago
Comment by dauertewigkeit 1 day ago
How does the seller get these desktops directly from NVIDIA?
And if the seller's business is custom made desktop boxes, why didn't he just fit the two H100s into a better desktop box?
Comment by dnhkng 1 day ago
This thing was too unwieldy to make into a desktop (you can see how much effort it took), and it was in pretty bad condition. I think he just wanted to get rid of it without having to deal with returns. I took a bet on it, and was lucky it paid off.
Comment by Ntrails 1 day ago
I expect it's because they were no longer in the sort of condition to sell as new machines? They were clearly well used, and selling "as seen" carries the lowest reputational risk when offloading them.
Comment by wtallis 1 day ago
Comment by renewiltord 1 day ago
Comment by GPTshop 21 hours ago
H100 PCIe and GH200 are two very different things. The advantages of Grace Hopper are much higher connection speeds and bandwidth, and lower power consumption.
Comment by ProAm 1 day ago
Comment by Fire-Dragon-DoL 1 day ago
Comment by Helmut10001 1 day ago
Pre-story: For 3 years I wanted to build a rack gaming server, so I could play with my son in our small apartment, where we don't have enough space for a gaming computer (my wife also doesn't allow it). I have a stable IPsec connection to my parents' house, where I have a powerful PV plant (90 kWp) and a rack server for my freelance job.
Fast forward to 2 months ago: I see a Supermicro SYS-7049GP-TRT for €1400 on eBay. It looks clean, sold by some IT reuse warehouse. No description, just 3 photos and the case label. I ask the seller whether he knows what's in it, and he says he didn't check. The case alone costs 3k new here in Germany. I buy it.
It arrives. 64 GB ECC memory, 2x Xeon Silver, 1x 500 GB SSD, 5x GBit LAN cards, dual 2200 W power supplies. I remove the air shroud, and: an Nvidia V100S 32GB emerges. I sell the card on eBay for €1600 and buy 2x Xeon 6254 CPUs (€100 each) to replace the 2x Silver ones that were in it. Last week, I bought two Blackwell RTX 4000 Pro cards for €1100 each. Enough for gaming with my son! (And I can have some fun with LLMs and Home Assistant/smart home...)
The case fits 4x dual-slot GPUs, so I could fit 4x RTX 6000 in it (384 GB VRAM). At a price of 3k each, that would come to 12k (still too much for me... but let's check back in a couple of years...).
Buying used enterprise gear is fun. I had so many good experiences and this stuff is just rock solid.
Comment by systemtest 1 day ago
Comment by dnhkng 1 day ago
Comment by Ao7bei3s 1 day ago
Comment by rtkwe 1 day ago
Comment by ivanjermakov 1 day ago
Comment by rtkwe 1 day ago
Comment by n3t 1 day ago
Comment by djoldman 1 day ago
> # Data Center/HGX-Series/HGX H100/Linux aarch64/12.8 seem to work! wget https://us.download.nvidia.com/tesla/570.195.03/NVIDIA-Linux...
> ...
Nothing makes you feel more "I've been there" than typing inscrutable arcana to get a GPU working for ML work...
Comment by crapple8430 1 day ago
Comment by dnhkng 1 day ago
The Blackwells are superior on paper, but there's some "Nvidia math" involved: when they report performance in press announcements, they don't usually mention the precision. Yes, the Blackwells are more than double the speed of the Hopper H100s, but that's comparing FP8 to FP4 (the H100s can't do native FP4). Yes, that's great for certain workloads, but not the majority.
What's more interesting is the VRAM speed. The 6000 Pro has 96 GB of GPU memory at 1.8 TB/s bandwidth; the H100 has the same amount, but with HBM3 at 4.9 TB/s. That roughly 2.5x increase is very influential in the overall performance of the system.
Lastly, if it works, NVLink-C2C does 900 GB/s of bandwidth between the cards, so about 5x what a pair of 6000 Pros could do over PCIe 5.0. Big LLMs need well over the 96 GB on a single card, so this becomes the bottleneck.
For example, here are benchmarks on the RTX 6000 Pro using the GPT-OSS-120B model, where it generates 145 tokens/sec; I get 195 tokens/sec on the GH200. https://www.reddit.com/r/LocalLLaMA/comments/1mm7azs/openai_...
Comment by crapple8430 23 hours ago
The NVLink is definitely a strong point, I missed that detail. For LLM inference specifically it matters fairly little iirc, but for training it might.
Comment by segmondy 1 day ago
Comment by Helmut10001 1 day ago
GPUs have such a short lifespan these days that it is really important to compare new vs. used.
Comment by segmondy 20 hours ago
Comment by dnhkng 1 day ago
I had 4x 4090s that I had bought for about $2200 each in early 2023. I sold 3 of them to help pay for the GH200, and got 2K each.
Comment by skizm 1 day ago
Also:
> I arrived at a farmhouse in a small forest…
Were you not worried you were going to get murdered?
Comment by dnhkng 1 day ago
Comment by aeve890 1 day ago
Comment by jaggirs 1 day ago
Comment by the8472 1 day ago
People have gotten games to run on a DGX Spark, which is somewhat similar (GB10 instead of GH200)
Comment by dnhkng 1 day ago
Comment by throawayonthe 20 hours ago
Comment by nicman23 1 day ago
Comment by wtcactus 1 day ago
On AMD I've read it works great, but for NVIDIA chips, in mouse-heavy games, it becomes unusable for me.
Comment by nicman23 21 hours ago
Comment by wtcactus 21 hours ago
But I also think that people who haven't tried a "snappier" alternative might not realize it's there.
Try making a comparison with Parsec or even Steam's own streaming. You will notice a big difference if the issue still exists.
Comment by nicman23 7 hours ago
Comment by wtcactus 5 hours ago
I now remember there was a way to work around it (a bit cumbersome and ugly), which was to render the mouse pointer only locally. That means no mouse cursor changes for tooltips/resizing/different pointers in games, etc., but at least it gets rid of the lag.
Comment by nicman23 3 hours ago
Comment by zamadatix 1 day ago
Comment by Havoc 1 day ago
LTT tried it in one of their videos... forgot which card, but one of the serious Nvidia AI cards.
...it runs like shit for gaming workloads. It does the job, but it's comfortably beaten by a mid-tier consumer card at 1/10th the price.
Their AI-focused datacenter cards are definitely not the same thing with a different badge glued on.
Comment by mrandish 1 day ago
It's an interesting question, and since OP indicates he previously had a 4090, he's qualified to reply and hopefully will. However, I suspect the GH200 won't turn out to run games much faster than a 5090 because A) Games aren't designed to exploit the increased capabilities of this hardware, and B) The GH200 drivers wouldn't be tuned for game performance. One of the biggest differences of datacenter AI GPUs is the sheer memory size, and there's little reason for a game to assume there's more than 16GB of video memory available.
More broadly, this is a question that, for the past couple decades, I'd have been very interested in. For a lot of years, looking at today's most esoteric, expensive state-of-the-art was the best way to predict what tomorrow's consumer desktop might be capable of. However, these days I'm surprised to find myself no longer fascinated by this. Having been riveted by the constant march of real-time computer graphics from the 90s to 2020 (including attending many Siggraph conferences in the 90s and 00s), I think we're now nearing the end of truly significant progress in consumer gaming graphics.
I do realize that's a controversial statement, and sure there will always be a way to throw more polys, bigger textures and heavier algorithms at any game, but... each increasing increment just doesn't matter as much as it once did. For typical desktop and couch consumer gaming, the upgrade from 20fps to 60fps was a lot more meaningful to most people than 120fps to 360fps. With synthetic frame and pixel generation, increasing resolution beyond native 4K matters less. (Note: head-mounted AR/VR might be one of the few places 'moar pixels' really matters in the future). Sure, it can look a bit sharper, a bit more varied and the shadows can have more perfect ray-traced fall-off, but at this point piling on even more of those technically impressive feats of CGI doesn't make the game more fun to play, whether on a 75" TV at 8 feet or a 34-inch monitor at two feet. As an old-school computer graphics guy, it's incredible to see real-time path tracing adding subtle colors to shadows from light reflections bouncing off colored walls. It's living in the sci-fi future we dreamed of at Siggraph '92. But as a gamer looking for some fun tonight, honestly... the improved visuals don't contribute much to the overall gameplay between a 3070, 4070 and 5070.
Comment by Scene_Cast2 1 day ago
Comment by jsheard 1 day ago
Comment by fuzzythinker 1 day ago
Good one
Comment by volf_ 1 day ago
These are the best kinds of posts
Comment by BizarroLand 1 day ago
Comment by kinow 15 hours ago
I found it interesting to learn there are businesses built around converting used servers into desktops. Sounds like a good initiative to avoid some e-waste (assuming the desktops are easy to maintain).
Comment by Beijinger 1 day ago
Most of them are in California? Anything in NY/NJ?
Comment by bombcar 1 day ago
There should be some all over the country.
Comment by m4r1k 1 day ago
Comment by fnands 22 hours ago
Nice find, and I admire your courage for even attempting this!
Comment by mrose11 1 day ago
Comment by ycwatcher 1 day ago
Comment by dnhkng 1 day ago
Comment by tigranbs 1 day ago
Comment by Frannky 1 day ago
Comment by jauntywundrkind 1 day ago
We'll see how it goes, but what _is_ happening is RAM replacement. Nvidia 5090s with 96GB are somewhat a thing now. $4K. YMMV, caveat emptor. https://www.alibaba.com/product-detail/Newest-RTX-5090-96gb-...
Comment by MLgulabio 1 day ago
Let's continue to hope.
Comment by rcarmo 1 day ago
Comment by DANmode 22 hours ago
Comment by albertgoeswoof 1 day ago
How long would it take to recoup the cost if you made the model available for others to run inference at the same price as the big players?
Comment by kingstnap 1 day ago
Assumptions:
Batch 4x, getting 400 tokens per second and pushing power consumption to 900W instead of the underutilized 300W.
Electricity at around €0.20/kWh.
Tokens valued at €1 per 1M output tokens.
Assume ~70% utilization.
Result:
You get ~1M tokens per hour, which is a net profit of ~€0.8/hr. That works out to a payback time of a bit over a year given the €9K investment (see the sketch below).
Honestly though there is a lot of handwaving here. The most significant unknown is getting high utilization with aggressive batching and 24/7 load.
Also the demand for privacy can make the utility of the tokens much higher than typical API prices for open source models.
In a somewhat orthogonal comparison, renting 2 H100s costs around $6 per hour, which would make the payback time a bit over a couple of months.
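A minimal sketch of that back-of-envelope math, using the assumptions above (the €9K hardware cost is taken from the thread; everything else is the stated assumptions):

    # Rough payback estimate; all inputs are the assumptions listed above.
    tokens_per_sec  = 400     # batched throughput
    utilization     = 0.70    # fraction of time actually serving load
    power_kw        = 0.9     # draw under load
    electricity_eur = 0.20    # EUR per kWh
    token_price_eur = 1.0     # EUR per 1M output tokens
    hardware_eur    = 9000    # rough purchase price

    tokens_per_hour = tokens_per_sec * 3600 * utilization       # ~1.0M
    revenue_per_hr  = tokens_per_hour / 1e6 * token_price_eur   # ~EUR 1.00
    power_cost_hr   = power_kw * electricity_eur                # ~EUR 0.18
    net_per_hr      = revenue_per_hr - power_cost_hr            # ~EUR 0.83
    payback_years   = hardware_eur / net_per_hr / (24 * 365)
    print(f"net/hr: EUR {net_per_hr:.2f}, payback: {payback_years:.1f} years")

Running it lands at roughly 1.2 years, consistent with the estimate above; the real number moves a lot with utilization and token pricing.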
Comment by segmondy 1 day ago
Comment by nicman23 1 day ago
Comment by PhilippGille 1 day ago
GLM 4.5 Air, to be precise. It's a smaller 166B model, not the full 355B one.
Worth mentioning when discussing token throughput.
Comment by dnhkng 1 day ago
It will fit in system RAM, and as it's a mixture-of-experts model and the experts are not too large, I can at least run it. Tokens/second will be slower, but as system memory bandwidth is somewhere around 500-600 GB/s, it should feel OK.
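For intuition on why that should still feel OK, a hedged back-of-envelope: on a memory-bandwidth-bound setup, decode speed is roughly bandwidth divided by the bytes read per token, and with a mixture-of-experts model only the active parameters count. The active-parameter count and quantization below are illustrative assumptions, not measured numbers for this model:

    # Rough decode-speed ceiling when running an MoE model from system RAM.
    # All inputs are illustrative assumptions, not measurements.
    bandwidth_gb_s  = 550     # Grace LPDDR5X system memory, ~500-600 GB/s
    active_params_b = 12e9    # active parameters per token (assumed)
    bytes_per_param = 1.0     # ~8-bit quantization (assumed)

    bytes_per_token = active_params_b * bytes_per_param     # ~12 GB read per token
    tokens_per_sec  = bandwidth_gb_s * 1e9 / bytes_per_token
    print(f"upper bound: ~{tokens_per_sec:.0f} tokens/s")   # roughly 45-46 tok/s

Real throughput lands below that ceiling once attention/KV-cache reads and other overhead are included, but it suggests usable speeds even without the GPU's HBM.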
Comment by Gracana 22 hours ago
Comment by Deathmax 23 hours ago
Comment by dnhkng 1 day ago
I think there are probably law firms/doctors' offices that would gladly pay ~3-4K euro a month to have this thing delivered and run truly "on-prem" to work with documents they can't risk leaking (patent filings, patient records, etc.).
For a company with 20-30 people, the legal and privacy protection is worth the small premium over using cloud providers.
Just a hunch though! That would have it paid off in 3-4 months?
Comment by hollow-moe 1 day ago
Comment by jauntywundrkind 1 day ago
> 4x Arctic Liquid Freezer III 420 (B-Ware) - €180
Quite an aside, but man: I fricking love Arctic. Seeing their fans in the new Corsi-Rosenthal boxes has been awesome. Such good value. I've been using a Liquid Freezer II after nearly buying my last air-cooled heatsink and seeing the LF-II on sale for <$75. Buy.
Please give us some power consumption figures! I'm so curious how it scales up and down. Do different models take similar or different power? Asking a lot, but it'd be so neat to see a somewhat high-res view (>1 sample/s) of power consumption (watts) on these things; such a unique opportunity.
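If it helps, a minimal sketch of one way to log that, assuming the standard NVML Python bindings (nvidia-ml-py) behave on the GH200 as on other NVIDIA GPUs; this only covers the GPU module, not whole-system draw:

    # Log GPU power draw a few times per second as "timestamp,watts" lines.
    # Assumes device index 0 is the Hopper GPU; requires the nvidia-ml-py package.
    import time
    from pynvml import (nvmlInit, nvmlShutdown,
                        nvmlDeviceGetHandleByIndex, nvmlDeviceGetPowerUsage)

    nvmlInit()
    handle = nvmlDeviceGetHandleByIndex(0)
    try:
        while True:
            watts = nvmlDeviceGetPowerUsage(handle) / 1000.0  # NVML reports milliwatts
            print(f"{time.time():.3f},{watts:.1f}", flush=True)
            time.sleep(0.5)  # ~2 samples/s; lower the sleep for finer resolution
    finally:
        nvmlShutdown()

Whole-system power (Grace CPU, fans, etc.) would need a wall meter or the BMC instead.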
Comment by Tenemo 1 day ago
Comment by arein3 1 day ago
Comment by zkmon 1 day ago
Comment by 20after4 1 day ago
Comment by danr4 1 day ago
Comment by ionwake 1 day ago
Comment by Philpax 1 day ago
Comment by KellyCriterion 1 day ago
SCNR
Comment by pointbob 1 day ago
Comment by ChrisArchitect 1 day ago
Comment by dnhkng 1 day ago
Comment by walrus01 1 day ago
https://www.google.com/search?client=firefox-b-m&q=grace%20h...
Comment by pointbob 1 day ago