$100 Nvidia Tesla V100 still beats newer GPUs in local AI

A used Nvidia Tesla V100, a server accelerator that once lived in data centers, is turning up as a surprisingly sharp tool for local AI work. In a hands-on test, the Tesla V100 outpaced newer consumer cards like the GeForce RTX 3060 12 GB and Radeon RX 7800 XT 16 GB on several model runs, which is a neat reminder that gaming GPUs and AI GPUs are not always playing the same sport.

The test setup was anything but plug-and-play. The Tesla V100 SXM2 needed an SXM-to-PCIe adapter, separate power, and a homemade cooling solution with a 3D-printed air duct and fan. Even so, the whole build reportedly came in at about $235, while the card itself can be found on the second-hand market for about $100. That is still a pile of tinkering, but it is a lot less painful than shopping for brand-new hardware with enough VRAM to run larger models comfortably.

Tesla V100 AI benchmark results

The numbers are the part that makes the old server card look embarrassingly fresh. In GPT-oss 20B, the Tesla V100 reached about 130 tokens per second, ahead of the Radeon RX 7800 XT at roughly 90 tokens per second. In another run with Gemma4:e4b, it posted 108 tokens per second, beating the RTX 3060 12 GB at 76 tokens per second.

GPT-oss 20B: Tesla V100 about 130 tokens/s, Radeon RX 7800 XT about 90 tokens/s
Gemma4:e4b: Tesla V100 108 tokens/s, GeForce RTX 3060 12 GB 76 tokens/s
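To put those figures in relative terms, here is a minimal Python sketch that turns the quoted throughput numbers into speedup ratios. The throughput values come from the test described above; the function and variable names are illustrative only.

```python
# Throughput figures quoted in the article, in tokens per second.
results = {
    "GPT-oss 20B": {"Tesla V100": 130, "Radeon RX 7800 XT": 90},
    "Gemma4:e4b": {"Tesla V100": 108, "GeForce RTX 3060 12 GB": 76},
}


def speedup(baseline_tps, candidate_tps):
    """Return how many times faster the candidate is than the baseline."""
    return candidate_tps / baseline_tps


def v100_speedups(runs_by_model):
    """Compare the Tesla V100 against every other card in each run."""
    ratios = {}
    for model, runs in runs_by_model.items():
        v100_tps = runs["Tesla V100"]
        for card, tps in runs.items():
            if card != "Tesla V100":
                ratios[(model, card)] = speedup(tps, v100_tps)
    return ratios


for (model, card), ratio in v100_speedups(results).items():
    print(f"{model}: Tesla V100 runs at {ratio:.2f}x the speed of {card}")
```

By this arithmetic the V100 lands a little over 40 percent ahead in both runs, though the two comparisons use different models and different rival cards, so the ratios should not be averaged together.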

Why an old data-center card still wins

The explanation is less mysterious than Nvidia marketing would like it to sound. The Tesla V100 is built around HBM2 memory bandwidth and compute-first priorities, while modern consumer cards often spend more silicon on graphics features, efficiency tuning, and gaming tricks. That tradeoff can leave older accelerators looking oddly competitive when the job is large-language-model inference and not rendering pretty trees.
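The bandwidth argument can be made concrete with a back-of-the-envelope estimate. Single-stream token generation is often limited by how fast weights can be streamed from VRAM, so a rough ceiling is memory bandwidth divided by the data read per token. The sketch below rests on assumptions that are not from the test itself: the bandwidth values are approximate vendor spec-sheet figures, and the 4 GB read per token is a purely hypothetical placeholder, not a property of either model mentioned above.

```python
# Approximate spec-sheet memory bandwidth in GB/s.
# ASSUMPTION: these are published vendor figures, not numbers from the article.
BANDWIDTH_GBPS = {
    "Tesla V100 (HBM2)": 900,
    "Radeon RX 7800 XT (GDDR6)": 624,
    "GeForce RTX 3060 12 GB (GDDR6)": 360,
}

# HYPOTHETICAL placeholder: gigabytes of weights read per generated token.
HYPOTHETICAL_GB_PER_TOKEN = 4.0


def decode_ceiling_tps(bandwidth_gbps, gb_read_per_token):
    """Upper bound on tokens/s if each token streams this much data from VRAM."""
    if gb_read_per_token <= 0:
        raise ValueError("gb_read_per_token must be positive")
    return bandwidth_gbps / gb_read_per_token


for card, bandwidth in BANDWIDTH_GBPS.items():
    ceiling = decode_ceiling_tps(bandwidth, HYPOTHETICAL_GB_PER_TOKEN)
    print(f"{card}: at most about {ceiling:.0f} tokens/s")
```

Real throughput sits below any such ceiling because of compute limits, kernel efficiency, and software support, so this is only a way to see why a bandwidth-heavy older card can stay competitive.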

It also held up well under a 100 W power cap, where it still beat the RTX 3060 on both speed and tokens per watt. That matters because the used market keeps pushing more ex-server hardware into hobbyist AI rigs, especially as mainstream GPUs remain expensive and often VRAM-limited for bigger local models. A 32 GB Tesla V100, sold for about $400 to $500, looks even more tempting for people who want to run heavier LLMs without handing over a small fortune.
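For readers who want to run the same comparison on their own hardware, tokens per watt is simply throughput divided by power draw. The sketch below is a hedged illustration: the function name is made up for this example, and the input values are placeholders, since the capped-power throughput figures are not given here.

```python
def tokens_per_watt(tokens_per_second, watts):
    """Efficiency as tokens/s per watt, which equals tokens per joule."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tokens_per_second / watts


# PLACEHOLDER inputs, not measurements from the article:
# replace them with your own benchmark throughput and measured power draw.
placeholder_tps = 50.0
placeholder_power_cap_w = 100.0

efficiency = tokens_per_watt(placeholder_tps, placeholder_power_cap_w)
print(f"{efficiency:.2f} tokens per second per watt")
```

Measuring both cards at the same power cap, as the test did, keeps the comparison fair, because efficiency usually shifts as the power limit changes.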

The catch with buying one

This is not a simple “buy old, win AI” story. The Tesla V100 is cheap because it asks for compromise: adapters, custom cooling, and enough patience to make all the pieces talk to each other. For developers and AI hobbyists who enjoy a bit of hardware surgery, that trade may be worth it. For everyone else, the sane choice is still a modern card that works out of the box.

The more interesting question is how long this loophole stays open. If used data-center GPUs keep delivering high memory bandwidth at bargain prices, they will keep undercutting consumer models for local AI workloads, and the people who bought “last year’s” gaming card for inference may not be thrilled about that.

Source: Ixbt
