AI Gateway

One gateway,
every model.

Bring public providers, custom endpoints, and the models you run yourself into one hosted gateway, with observability and access control built in.

Read the docs

One URL, any SDK.

Change your baseURL to https://gateway.ngrok.ai and swap your API key.
That’s it. Start routing, not deploying infrastructure.

import OpenAI from "openai";

const client = new OpenAI({
	baseURL: "https://gateway.ngrok.ai",
	apiKey: process.env.NGROK_AI_ACCESS_KEY,
});

const completion = await client.chat.completions.create({
	// Route to a self-hosted model first
	model: "unsloth/Qwen3.5-1228-A10B-GGUF",
	// Fallback to public providers
	models: ["gpt-5.4", "claude-opus-4-6"],
	messages: [
		{ role: "system", content: "Talk like a pirate." },
		{ role: "user", content: "Are semicolons optional in JavaScript?" },
	],
	stream: true,
});
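Because the request above sets `stream: true`, the OpenAI SDK returns an async iterable of chunks rather than one finished response. Here is a minimal sketch of collecting the streamed text, assuming the gateway relays chunks in the standard OpenAI chat-completions shape. `ChunkLike` and `collectText` are illustrative names, not part of any SDK.

```typescript
// Minimal structural type for the fields read below (an assumption for this
// sketch; the SDK's real chunk type carries more fields).
interface ChunkLike {
  choices: Array<{ delta?: { content?: string | null } }>;
}

// Concatenate the text deltas of a streamed chat completion into one string.
async function collectText(stream: AsyncIterable<ChunkLike>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    // A chunk may have no choices, or a delta without content; treat both as empty.
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}
```

With the snippet above, `const reply = await collectText(completion);` would gather the full reply once the stream ends.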

Local LLMs

Privately connect to your self-hosted models.

Route to any local LLM reachable from your AI Gateway without wrangling public IPs or inbound ports. It stays on private connectivity, and you call it the same way as any other model.

Read the local model guide

Diagram of self-hosted LLM providers connected through ngrok

Key management

Bring the keys you’re already paying for.

Drop in your OpenAI, Anthropic, or custom provider keys and route through them. You pay each provider directly, keep your current rates, and manage every key in one place.

Learn how BYOK works

Diagram of API keys from multiple providers managed in one place

Access control

Decide who or what gets access.

Give each app or developer its own access key, then set which providers and models that key is allowed to call. No more passing around one key that opens everything.
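As a mental model only, a scoped key can be pictured as an allow-list check. The sketch below is hypothetical: `ScopedKey`, `isAllowed`, and the rule that both the provider and the model must be listed are assumptions made for illustration, not the gateway's actual data model or API.

```typescript
// Hypothetical shape of a scoped access key (illustrative, not the gateway's schema).
interface ScopedKey {
  owner: string;
  allowedProviders: ReadonlySet<string>;
  allowedModels: ReadonlySet<string>;
}

// Assumed rule: a call is allowed only if both its provider and its model are in scope.
function isAllowed(key: ScopedKey, provider: string, model: string): boolean {
  return key.allowedProviders.has(provider) && key.allowedModels.has(model);
}

// Example key for one app, limited to a single provider and model.
const supportBot: ScopedKey = {
  owner: "support-bot",
  allowedProviders: new Set(["openai"]),
  allowedModels: new Set(["gpt-5.4"]),
};
```

Under these assumptions, `isAllowed(supportBot, "openai", "gpt-5.4")` is `true`, while a call to any other provider or model is rejected.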

See how scoped keys work

Diagram showing scoped access from an app through ngrok to AI providers

Observability

See every call, know exactly what it costs.

Provider dashboards show what you spent, not which app, dev, or model spent it. See tokens, latency, and errors rolled up across every call you route.
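To make the idea of a roll-up concrete, here is a small sketch that groups per-call records by app or by model. The `CallRecord` fields and the `rollUpBy` helper are assumptions for illustration; the gateway's real metrics and export format may differ.

```typescript
// Hypothetical per-call record (field names are illustrative).
interface CallRecord {
  app: string;
  model: string;
  tokens: number;
  latencyMs: number;
  ok: boolean;
}

interface Rollup {
  calls: number;
  tokens: number;
  errors: number;
  avgLatencyMs: number;
}

// Group records by the chosen dimension and total tokens, errors, and mean latency.
function rollUpBy(records: CallRecord[], key: "app" | "model"): Map<string, Rollup> {
  const out = new Map<string, Rollup>();
  for (const r of records) {
    const group = out.get(r[key]) ?? { calls: 0, tokens: 0, errors: 0, avgLatencyMs: 0 };
    // Update the running mean before incrementing the call count.
    group.avgLatencyMs = (group.avgLatencyMs * group.calls + r.latencyMs) / (group.calls + 1);
    group.calls += 1;
    group.tokens += r.tokens;
    if (!r.ok) group.errors += 1;
    out.set(r[key], group);
  }
  return out;
}
```

Calling `rollUpBy(records, "app")` answers "which app spent it", and `rollUpBy(records, "model")` answers the same question per model.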

See what you can measure

Dashboard showing AI gateway usage and cost by provider and model

Features

Smart defaults. Open levers.

Get started instantly. Light up advanced features without ever re-instrumenting your app.

Reroute before users notice

When a model or key slows or fails, requests reroute to a healthy alternative you've defined. Users never notice.
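The gateway handles rerouting for you, but the underlying pattern is ordered failover: try each candidate in turn and return the first success. A minimal sketch of that pattern follows, with `firstHealthy` as a hypothetical helper; how the gateway actually detects slow or failing models is not described here and is not modeled.

```typescript
// Try each candidate in order and return the first successful result.
// If every candidate fails, throw one error that summarizes all failures.
async function firstHealthy<T>(
  candidates: string[],
  call: (model: string) => Promise<T>,
): Promise<{ model: string; result: T }> {
  const errors: string[] = [];
  for (const model of candidates) {
    try {
      return { model, result: await call(model) };
    } catch (err) {
      // Record the failure and move on to the next candidate.
      errors.push(`${model}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all candidates failed: ${errors.join("; ")}`);
}
```

Doing this in the gateway rather than in each app is what keeps the fallback order out of your application code.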

Retry without the code

When a request fails, we retry it immediately, so your app keeps running without retry logic of its own.

Fully programmable

Configure your AI gateway entirely through APIs. Call it from your coding agent, Terraform, CLI, or custom tooling.

Read the docs for more

Pricing

One flat fee for routing, observability, and the rest.

$0.05

Per million tokens plus the cost of your inference

Use ngrok keys and we pass inference through at cost. Bring your own keys and your provider bills you directly. Either way, you buy credits up front and we draw them down as you route. No subscriptions or commitments.
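As a worked example of the arithmetic, the sketch below applies the $0.05 per million tokens fee stated above. The function names are illustrative, and how tokens are counted (for instance, input plus output) is an assumption the page does not spell out.

```typescript
// Flat gateway fee from the pricing above: $0.05 per million tokens.
const GATEWAY_FEE_PER_MILLION_TOKENS_USD = 0.05;

// Gateway fee for a given token count.
function gatewayFeeUsd(tokens: number): number {
  return (tokens / 1_000_000) * GATEWAY_FEE_PER_MILLION_TOKENS_USD;
}

// Total when inference is also billed through ngrok keys (passed through at cost).
function totalCostUsd(tokens: number, inferenceCostUsd: number): number {
  return gatewayFeeUsd(tokens) + inferenceCostUsd;
}
```

For 20 million tokens the gateway fee comes to about $1.00. With a made-up inference cost of $12.00 on ngrok keys, the total would be about $13.00; with your own keys, the provider bills that inference separately.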

Move fast & make things.

Build with the gateway that works with every model you use.
Even the ones you run yourself.