DEV Community

# gpu

Posts

- 99.8% of LLM Inference Power Isn't Spent on Computation (7 min read)
- Running 1M-token context on a single GPU (the math) (2 min read)
- KV cache memory calculator: how much does your LLM actually use? (3 min read)
- How Much GPU Memory Does NexusQuant Actually Save? (4 min read)
- Running Gemma 2 27B Locally: MLX vs vLLM vs llama.cpp Performance Comparison (4 min read)
- GPU-Accelerated ML/DL Performance on MacBook Pro M5 Pro vs. M4 Max: Feasibility and Benchmarks for Developers (23 min read)
- They Routed Power Through the Back of the Chip and 30% IR Drop Vanished (6 min read)
- CUDA Memory Hierarchy, Tile Programming, & DLSS 310.6 Driver Enhancements (3 min read)
- I built a duty-cycle throttler for my RTX 4060 (because undervolting wasn't enough) (4 min read)
- I tested speculative decoding on my home GPU cluster. Here's why it didn't help. (5 min read)
- If Memory Could Compute, Would We Still Need GPUs? (6 min read)
- Hackers Can Now Root Your Machine Through Your GPU. No, Really. (5 min read)
- Why I Self-Host 7 RTX 5090 GPUs Instead of Using Cloud AI (6 min read)
- We Built the First Pure Go DXIL Generator — Because Optimizing the Wrong Path Wasn't Enough (10 min read)
- Hopper/Blackwell Tensor Core Optimization, llama.cpp VRAM Fix & 4W NPU Inference (3 min read)