Vllm - Forem


Alberto Nieto

Apr 1

From one model to seven — what it took to make TurboQuant model-portable

#python #vllm #gpu #triton

3 min read

Alberto Nieto

Mar 28

Compressed VLM inference from a single Containerfile — turboquant-vllm v1.1

#python #vllm #gpu #containers

2 min read

soy

Mar 26

vLLM On-Demand Gateway: Zero-VRAM Standby for Local LLMs on Consumer GPUs

#vllm #llm #gpu #python

4 min read

iapilgrim

Mar 11

vLLM Request Lifecycle (Where TTFT is measured)

#vllm #monitoring

2 min read

xbill for Google Developer Experts

Mar 27

Gemma-SRE: Self-Hosted vLLM Infrastructure Agent

#gemma #mcpserver #tpusprint #vllm

18 min read

Donald Cruver

Mar 2

I Pushed Local LLMs Harder. Here's What Two Models Actually Did.

#claudecode #vllm #selfhosted #amd

8 min read

Mayank Ketkar

Feb 15

The Ghost in the Batch: How vLLM Silently Switches Algorithms

#vllm #machinelearning #gpu #determinism

5 min read

Mayank Ketkar

Feb 9

Compiling the Vision Encoder: Squeezing 3% More Throughput from Qwen3-VL on Hopper GPUs

#vllm #pytorch #gpu #machinelearning

11 min read

Ben

Feb 1

Session 1: vLLM Overview and the User API

#vllm #llm #python #machinelearning

12 min read

Ben

Feb 1

vLLM — Session 2: The Engine Layer — Request Management

#vllm #llm #python #machinelearning

13 min read

Gláucio for Magalu Cloud

Feb 5

Stop Playing with Local LLMs: Take Open Source GenAI to Production on Magalu Cloud

#ai #llm #vllm #docker

22 min read

Donald Cruver

Feb 5

Running Claude Code with Local LLMs via vLLM and LiteLLM

#claudecode #vllm #selfhosted #ai

6 min read