Very large numbers of gaming GPUs vs AI GPUs

TheMightyCat@ani.social · 1 day ago


starshipwinepineapple@programming.dev · 13 hours ago

TFLOPS is a generic measurement of peak throughput, not actual utilization, and it isn't specific to any type of workload. Not all workloads saturate a GPU equally, and AI models depend on CUDA and tensor cores: the generation and number of those cores determine how well a card is optimized for AI workloads and how much of its TFLOPS it can actually use for your task. And yes, AMD uses ROCm, which I didn't feel I needed to specify since it's a given (and years behind CUDA in capabilities). The point is that these cards are not equal, and there are major differences here alone.
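One standard way to see why peak TFLOPS is only a ceiling is the roofline model: attainable throughput is the lower of peak compute and memory bandwidth times arithmetic intensity (FLOPs performed per byte read from memory). Below is a minimal sketch using the peak FP16 and bandwidth figures from the table in this thread. The two intensity values are illustrative assumptions, not measurements, and real kernels land below this ceiling anyway.

```python
def attainable_tflops(peak_tflops: float, bandwidth_tbs: float, flops_per_byte: float) -> float:
    """Roofline ceiling: the lower of peak compute and bandwidth * arithmetic intensity.

    1 TB/s * 1 FLOP/byte = 1 TFLOP/s, so the memory-bound ceiling is simply
    bandwidth_tbs * flops_per_byte.
    """
    return min(peak_tflops, bandwidth_tbs * flops_per_byte)


# Peak FP16 TFLOPS and memory bandwidth (TB/s), copied from the table in this thread.
CARDS = {
    "RTX 5090": (104.8, 1.79),
    "Radeon RX 9070 XT": (97.32, 0.6446),
}

# Assumed intensities: ~1 FLOP/byte for batch-1 FP16 matrix-vector work
# (2 FLOPs per 2-byte weight), ~300 FLOP/byte for a large batched matmul.
for name, (peak, bw) in CARDS.items():
    for intensity in (1.0, 300.0):
        ceiling = attainable_tflops(peak, bw, intensity)
        print(f"{name:<18} {intensity:>6.0f} FLOP/B -> {ceiling:7.2f} TFLOPS "
              f"({ceiling / peak:.1%} of peak)")
```

At 1 FLOP/byte both cards are bandwidth-bound and reach under 2% of their peak FP16 TFLOPS. The TFLOPS figure only becomes the limit at high arithmetic intensity.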

I mentioned memory type because the cards you listed use different kinds (HBM vs GDDR), so you can't compare capacity alone and expect equal performance.
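Why bandwidth matters as much as capacity: in single-stream LLM decoding, every generated token has to read all active weights from VRAM, so bandwidth divided by bytes per token gives a rough upper bound on tokens per second. For an MoE model only the active experts count. This is a minimal sketch; the 32B-active, 8-bit model is a hypothetical placeholder, and the bound ignores KV-cache reads, compute and inter-GPU transfers.

```python
def max_decode_tokens_per_s(bandwidth_tbs: float, active_params_billion: float,
                            bytes_per_param: float) -> float:
    """Bandwidth-only upper bound on single-stream decode speed.

    Each generated token must stream every active weight from VRAM once;
    KV-cache reads, compute and inter-GPU traffic are ignored.
    """
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return bandwidth_tbs * 1e12 / bytes_per_token


# Hypothetical MoE: 32B parameters active per token, 8-bit weights (1 byte each).
for name, bw in {"H200 NVL": 4.89, "RTX 5090": 1.79, "Radeon RX 9060 XT": 0.3223}.items():
    print(f"{name:<18} <= {max_decode_tokens_per_s(bw, 32, 1.0):6.1f} tokens/s")
```

With a simple layer-wise split across several identical cards, the cards process each token one after another, so the bound stays about the same as for a single card: more cards add capacity, not single-stream speed.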

And again, for your specific use case of this large MoE model, you'd need to solve the GPU-to-GPU communication issue: making sure every card is actually connected, and that the links are fast enough not to become a bottleneck.
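A back-of-the-envelope sketch of the scale of that problem: how many cards are needed just to hold the weights, how many PCIe lanes they would occupy at x16, and how much per-token time a simple layer-wise (pipeline) split spends passing hidden states between cards. Every model and link figure here (total parameters, hidden size, link bandwidth, per-hop latency, usable VRAM fraction) is a hypothetical placeholder, not a measurement.

```python
import math


def cards_needed(total_params_billion: float, bytes_per_param: float,
                 vram_gb: float, usable_fraction: float = 0.9) -> int:
    """Cards required just to hold the weights.

    usable_fraction (assumed) leaves headroom for KV cache and activations.
    1e9 params * bytes/param = GB, so billions * bytes gives GB directly.
    """
    weights_gb = total_params_billion * bytes_per_param
    return math.ceil(weights_gb / (vram_gb * usable_fraction))


def pipeline_hop_cost_us(n_cards: int, hidden_size: int, bytes_per_value: int,
                         link_gb_s: float, hop_latency_us: float) -> float:
    """Per-token time spent passing the hidden state between consecutive cards."""
    hops = n_cards - 1
    transfer_us = hidden_size * bytes_per_value / (link_gb_s * 1e9) * 1e6
    return hops * (transfer_us + hop_latency_us)


# Hypothetical model and link figures, for illustration only.
TOTAL_PARAMS_B = 400   # total parameters (all experts) in billions
BYTES_PER_PARAM = 1.0  # 8-bit weights
HIDDEN_SIZE = 8192     # hidden-state width passed between pipeline stages
LINK_GB_S = 25.0       # assumed effective per-link bandwidth
HOP_LATENCY_US = 20.0  # assumed per-transfer latency

for name, vram in {"RTX 5090": 32, "Radeon RX 9060 XT": 16}.items():
    n = cards_needed(TOTAL_PARAMS_B, BYTES_PER_PARAM, vram)
    cost = pipeline_hop_cost_us(n, HIDDEN_SIZE, 2, LINK_GB_S, HOP_LATENCY_US)
    print(f"{name:<18} {n:3d} cards, {n * 16:4d} PCIe lanes at x16, "
          f"~{cost:.0f} us/token in hops")
```

The lane count alone is far beyond what a desktop platform provides, which is the "ensuring connections" half of the problem. Expert-parallel MoE setups typically exchange tokens between GPUs at every MoE layer, so they need considerably more interconnect bandwidth than this simple pipeline estimate suggests.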

I think you're going to need to do an actual analysis of the specific setup you're proposing. Good luck.

| GPU | VRAM (GB) | Price (€) | Bandwidth (TB/s) | FP16 TFLOPS | €/GB | € per TB/s | € per FP16 TFLOPS |
|---|---|---|---|---|---|---|---|
| NVIDIA H200 NVL | 141 | 36284 | 4.89 | 1671 | 257 | 7423 | 21 |
| NVIDIA RTX PRO 6000 Blackwell | 96 | 8450 | 1.79 | 126.0 | 88 | 4720 | 67 |
| NVIDIA RTX 5090 | 32 | 2299 | 1.79 | 104.8 | 71 | 1284 | 22 |
| AMD Radeon RX 9070 XT | 16 | 665 | 0.6446 | 97.32 | 41 | 1031 | 7 |
| AMD Radeon RX 9070 | 16 | 619 | 0.6446 | 72.25 | 38 | 960 | 8.5 |
| AMD Radeon RX 9060 XT | 16 | 382 | 0.3223 | 51.28 | 23 | 1186 | 7.45 |
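The three cost columns in the table above are plain ratios of price to VRAM, bandwidth and FP16 TFLOPS. A short sketch that recomputes them from the raw columns, which makes it easy to add cards or update prices. The table's values appear to be truncated or rounded inconsistently, so recomputed figures can differ in the last digit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    vram_gb: float
    price_eur: float
    bandwidth_tbs: float
    fp16_tflops: float

    @property
    def eur_per_gb(self) -> float:
        return self.price_eur / self.vram_gb

    @property
    def eur_per_tbs(self) -> float:
        return self.price_eur / self.bandwidth_tbs

    @property
    def eur_per_tflop(self) -> float:
        return self.price_eur / self.fp16_tflops


# Raw columns copied from the table in this thread.
GPUS = [
    Gpu("NVIDIA H200 NVL", 141, 36284, 4.89, 1671),
    Gpu("NVIDIA RTX PRO 6000 Blackwell", 96, 8450, 1.79, 126.0),
    Gpu("NVIDIA RTX 5090", 32, 2299, 1.79, 104.8),
    Gpu("AMD Radeon RX 9070 XT", 16, 665, 0.6446, 97.32),
    Gpu("AMD Radeon RX 9070", 16, 619, 0.6446, 72.25),
    Gpu("AMD Radeon RX 9060 XT", 16, 382, 0.3223, 51.28),
]

print(f"{'GPU':<30} {'EUR/GB':>8} {'EUR/(TB/s)':>11} {'EUR/TFLOPS':>11}")
for g in sorted(GPUS, key=lambda g: g.eur_per_gb):
    print(f"{g.name:<30} {g.eur_per_gb:8.1f} {g.eur_per_tbs:11.0f} {g.eur_per_tflop:11.2f}")
```

For example, the H200 NVL comes out at about 7420 € per TB/s rather than 7423, a rounding difference only.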