#vllm
From one model to seven — what it took to make TurboQuant model-portable
Alberto Nieto
Apr 1
#python #vllm #gpu #triton
3 min read
Compressed VLM inference from a single Containerfile — turboquant-vllm v1.1
Alberto Nieto
Mar 28
#python #vllm #gpu #containers
1 reaction
2 min read
vLLM On-Demand Gateway: Zero-VRAM Standby for Local LLMs on Consumer GPUs
soy
Mar 26
#vllm #llm #gpu #python
1 reaction
4 min read
vLLM Request Lifecycle (Where TTFT is measured)
iapilgrim
Mar 11
#vllm #monitoring
2 min read
Gemma-SRE: Self-Hosted vLLM Infrastructure Agent
xbill for Google Developer Experts
Mar 27
#gemma #mcpserver #tpusprint #vllm
1 reaction
18 min read
I Pushed Local LLMs Harder. Here's What Two Models Actually Did.
Donald Cruver
Mar 2
#claudecode #vllm #selfhosted #amd
1 reaction
8 min read
The Ghost in the Batch: How vLLM Silently Switches Algorithms
Mayank Ketkar
Feb 15
#vllm #machinelearning #gpu #determinism
5 min read
Compiling the Vision Encoder: Squeezing 3% More Throughput from Qwen3-VL on Hopper GPUs
Mayank Ketkar
Feb 9
#vllm #pytorch #gpu #machinelearning
11 min read
Session 1: vLLM Overview and the User API
Ben
Feb 1
#vllm #llm #python #machinelearning
12 min read
vLLM — Session 2: The Engine Layer — Request Management
Ben
Feb 1
#vllm #llm #python #machinelearning
13 min read
Pare de Brincar com LLMs Locais: Leve a IAG Open Source para a Produção na Magalu Cloud (Stop Playing with Local LLMs: Take Open Source Generative AI to Production on Magalu Cloud)
Gláucio for Magalu Cloud
Feb 5
#ai #llm #vllm #docker
1 reaction
3 comments
22 min read
Running Claude Code with Local LLMs via vLLM and LiteLLM
Donald Cruver
Feb 5
#claudecode #vllm #selfhosted #ai
2 reactions
6 min read