DEV Community: posts tagged #gpu

99.8% of LLM Inference Power Isn't Spent on Computation
plasmon · Apr 8 · #llm #gpu #hardware #ai · 7 min read

Running 1M-token context on a single GPU (the math)
João André Gomes Marques · Apr 7 · #ai #gpu #llm #infrastructure · 2 min read

KV cache memory calculator: how much does your LLM actually use?
João André Gomes Marques · Apr 7 · #llm #machinelearning #python #gpu · 3 min read

How Much GPU Memory Does NexusQuant Actually Save?
João André Gomes Marques · Apr 7 · #machinelearning #gpu #llm #python · 4 min read

Running Gemma 2 27B Locally: MLX vs vLLM vs llama.cpp Performance Comparison
augustine Egbuna · Apr 7 · #llm #mlops #aiinfrastructure #gpu · 4 min read

GPU-Accelerated ML/DL Performance on MacBook Pro M5 Pro vs. M4 Max: Feasibility and Benchmarks for Developers
Valeria Solovyova · Apr 7 · #gpu #machinelearning #applesilicon #performance · 23 min read

They Routed Power Through the Back of the Chip and 30% IR Drop Vanished
plasmon · Apr 6 · #semiconductor #hardware #ai #gpu · 6 min read

CUDA Memory Hierarchy, Tile Programming, & DLSS 310.6 Driver Enhancements
soy · Apr 6 · #gpu #nvidia #hardware · 3 min read

I built a duty-cycle throttler for my RTX 4060 (because undervolting wasn't enough)
Yaroslav Pristupa · Apr 6 · #softwaredevelopment #gpu #vram #hardware · 4 min read

I tested speculative decoding on my home GPU cluster. Here's why it didn't help.
Christopher Maher · Apr 6 · #llm #kubernetes #gpu #ai · 5 min read

If Memory Could Compute, Would We Still Need GPUs?
plasmon · Apr 5 · #ai #semiconductor #hardware #gpu · 6 min read

Hackers Can Now Root Your Machine Through Your GPU. No, Really.
Alan West · Apr 6 · #security #gpu #hardware #aiinfrastructure · 1 reaction · 5 min read

Why I Self-Host 7 RTX 5090 GPUs Instead of Using Cloud AI
Biricik Biricik · Apr 4 · #ai #gpu #selfhosted #infrastructure · 6 min read

We Built the First Pure Go DXIL Generator — Because Optimizing the Wrong Path Wasn't Enough
Andrey Kolkov · Apr 5 · #go #gpu #graphics #opensource · 10 min read

Hopper/Blackwell Tensor Core Optimization, llama.cpp VRAM Fix & 4W NPU Inference
soy · Apr 5 · #gpu #nvidia #hardware · 3 min read