DEV Community

# gpu

Posts

- 99.8% of LLM Inference Power Isn't Spent on Computation (7 min read)
- Running 1M-token context on a single GPU (the math) (2 min read)
- KV cache memory calculator: how much does your LLM actually use? (3 min read)
- How Much GPU Memory Does NexusQuant Actually Save? (4 min read)
- Running Gemma 2 27B Locally: MLX vs vLLM vs llama.cpp Performance Comparison (4 min read)
- GPU-Accelerated ML/DL Performance on MacBook Pro M5 Pro vs. M4 Max: Feasibility and Benchmarks for Developers (23 min read)
- They Routed Power Through the Back of the Chip and 30% IR Drop Vanished (6 min read)
- CUDA Memory Hierarchy, Tile Programming, & DLSS 310.6 Driver Enhancements (3 min read)
- I built a duty-cycle throttler for my RTX 4060 (because undervolting wasn't enough) (4 min read)
- I tested speculative decoding on my home GPU cluster. Here's why it didn't help. (5 min read)
- If Memory Could Compute, Would We Still Need GPUs? (6 min read)
- Hackers Can Now Root Your Machine Through Your GPU. No, Really. (5 min read)
- Why I Self-Host 7 RTX 5090 GPUs Instead of Using Cloud AI (6 min read)
- We Built the First Pure Go DXIL Generator — Because Optimizing the Wrong Path Wasn't Enough (10 min read)
- Hopper/Blackwell Tensor Core Optimization, llama.cpp VRAM Fix & 4W NPU Inference (3 min read)