One gateway,
every model.
Bring public providers, custom endpoints, and the models you run yourself into one hosted gateway, with observability and access control built in.
One URL, any SDK.
Change your baseURL to https://gateway.ngrok.ai and swap your API key.
That’s it. Start routing, not deploying infrastructure.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://gateway.ngrok.ai",
  apiKey: process.env.NGROK_AI_ACCESS_KEY,
});

const completion = await client.chat.completions.create({
  // Route to a self-hosted model first
  model: "unsloth/Qwen3.5-1228-A10B-GGUF",
  // Fallback to public providers
  models: ["gpt-5.4", "claude-opus-4-6"],
  messages: [
    { role: "system", content: "Talk like a pirate." },
    { role: "user", content: "Are semicolons optional in JavaScript?" },
  ],
  stream: true,
});

Privately connect to your self-hosted models.
Route to any local LLM reachable from your AI Gateway without wrangling public IPs or inbound ports. Keep it on private connectivity, same as any other model.
Read the local model guide
Bring the keys you’re already paying for.
Drop your OpenAI, Anthropic, or custom provider keys to route through them. You pay them directly, keep your current rates, and manage every key in one place.
Learn how BYOK works
Decide who or what gets access.
Give each app or developer its own access key, then set which providers and models that key is allowed to call. No more passing around one key that opens everything.
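To make the idea of a scoped key concrete, here is a minimal sketch of an allow-list check. The `ScopedKey` shape and the `isAllowed` helper are hypothetical names used for illustration only; they are not the gateway's actual schema or API.

```typescript
// Hypothetical shape of a scoped access key; not the gateway's real schema.
interface ScopedKey {
  name: string;
  allowedProviders: string[];
  allowedModels: string[];
}

// Returns true only when both the provider and the model are on the key's allow-lists.
function isAllowed(key: ScopedKey, provider: string, model: string): boolean {
  return (
    key.allowedProviders.includes(provider) &&
    key.allowedModels.includes(model)
  );
}

// Example: a key for one app that may only call a single OpenAI model.
const supportBot: ScopedKey = {
  name: "support-bot",
  allowedProviders: ["openai"],
  allowedModels: ["gpt-5.4"],
};

console.log(isAllowed(supportBot, "openai", "gpt-5.4"));
console.log(isAllowed(supportBot, "anthropic", "claude-opus-4-6"));
```

The first call is permitted and the second is rejected, because the key lists neither the Anthropic provider nor that model.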
See how scoped keys work
See every call, know exactly what it costs.
Provider dashboards show what you spent, not which app, dev, or model spent it. See tokens, latency, and errors rolled up across every call you route.
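As a rough illustration of what "rolled up across every call" can mean, the sketch below groups per-call records by app and totals tokens, errors, and average latency. The `CallRecord` fields and the `rollupByApp` function are assumptions made for this example, not the gateway's export format.

```typescript
// Hypothetical per-call record; field names are illustrative assumptions.
interface CallRecord {
  app: string;
  model: string;
  tokens: number;
  latencyMs: number;
  ok: boolean;
}

interface Rollup {
  calls: number;
  tokens: number;
  errors: number;
  avgLatencyMs: number;
}

// Groups call records by app and accumulates totals plus a running mean latency.
function rollupByApp(records: CallRecord[]): Map<string, Rollup> {
  const out = new Map<string, Rollup>();
  for (const r of records) {
    const cur = out.get(r.app) ?? { calls: 0, tokens: 0, errors: 0, avgLatencyMs: 0 };
    const calls = cur.calls + 1;
    out.set(r.app, {
      calls,
      tokens: cur.tokens + r.tokens,
      errors: cur.errors + (r.ok ? 0 : 1),
      // Incremental mean: avoids storing every latency value.
      avgLatencyMs: cur.avgLatencyMs + (r.latencyMs - cur.avgLatencyMs) / calls,
    });
  }
  return out;
}

const sampleCalls: CallRecord[] = [
  { app: "web", model: "gpt-5.4", tokens: 100, latencyMs: 200, ok: true },
  { app: "web", model: "claude-opus-4-6", tokens: 300, latencyMs: 400, ok: false },
  { app: "cli", model: "gpt-5.4", tokens: 50, latencyMs: 100, ok: true },
];

console.log(rollupByApp(sampleCalls).get("web"));
```

Grouping by `model` instead of `app` would answer the "which model spent it" question with the same function shape.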
See what you can measure
Smart defaults. Open levers.
Get started instantly. Light up advanced features without ever re-instrumenting your app.
Reroute before users notice
When a model or key slows or fails, requests reroute to a healthy alternative you've defined. Users never notice.
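For a sense of the logic this takes off your hands, here is a conceptual sketch of ordered failover: try each model in turn and return the first successful reply. It is an illustration, not the gateway's implementation; `firstHealthy`, `CallModel`, and the `flakyCall` stand-in are hypothetical names, and the stand-in simulates an unreachable self-hosted model instead of making a network request.

```typescript
// A function that sends a prompt to one model and resolves with its reply.
type CallModel = (model: string) => Promise<string>;

// Tries each model in order and returns the first successful reply.
// If every model fails, the last error is rethrown.
async function firstHealthy(
  models: string[],
  call: CallModel,
): Promise<{ model: string; reply: string }> {
  let lastError: unknown = new Error("no models configured");
  for (const model of models) {
    try {
      return { model, reply: await call(model) };
    } catch (err) {
      lastError = err; // remember the failure and move on to the next model
    }
  }
  throw lastError;
}

// Stand-in for a real request: the self-hosted model is simulated as unreachable.
const flakyCall: CallModel = async (model) => {
  if (model === "unsloth/Qwen3.5-1228-A10B-GGUF") {
    throw new Error("model unreachable");
  }
  return `reply from ${model}`;
};

firstHealthy(
  ["unsloth/Qwen3.5-1228-A10B-GGUF", "gpt-5.4", "claude-opus-4-6"],
  flakyCall,
).then((result) => console.log(result.model));
```

With the gateway in front, the application sends a single request and this loop never has to live in your own code.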
Retry without the code
When a request fails, we retry it instantly. Your apps keep running without retry logic of their own.
Fully programmable
Configure your AI gateway entirely through APIs. Call it from your coding agent, Terraform, CLI, or custom tooling.
One flat fee for routing, observability, and the rest.
$0.05
Use ngrok keys and we pass inference through at cost. Bring your own and your provider bills you directly. Either way, you buy credits up front and we draw down as you route. No subscriptions or commitments.
Move fast & make things.
Build with the gateway that works with every model you use.
Even the ones you run yourself.