Proxy Integration
Route your AI agent's LLM calls through the Kurral proxy for automatic observability and security scanning. Zero changes to your agent's core logic.
How It Works
Your Agent ──▶ Kurral Proxy ──▶ LLM Provider (OpenAI / Anthropic / Gemini)
                    │
                    └─ Captures every call:
                       tokens, cost, latency,
                       request/response content
Instead of calling api.openai.com or api.anthropic.com directly, your agent calls the Kurral proxy. The proxy authenticates the request, forwards it to the real provider, captures observability data, and returns the response unchanged.
Prerequisites
- A Kurral API key (`kr_live_...`) from the dashboard
- An agent registered in the dashboard (note the agent key)
- Your LLM provider API key (OpenAI, Anthropic, or Gemini)
Quick Start by Provider
OpenAI
Before (direct):
from openai import OpenAI
client = OpenAI() # calls api.openai.com
After (through proxy):
from openai import OpenAI

client = OpenAI(
    base_url="https://kurral-api.onrender.com/api/proxy/openai/v1",
    api_key="sk-your-openai-key",  # still your real OpenAI key
    default_headers={
        "X-Kurral-API-Key": "kr_live_your-kurral-key",
        "x-kurral-agent": "your-agent-key",
    },
)
Anthropic
Before (direct):
import anthropic
client = anthropic.Anthropic() # calls api.anthropic.com
After (through proxy):
import anthropic

client = anthropic.Anthropic(
    base_url="https://kurral-api.onrender.com/api/proxy/anthropic",
    api_key="sk-ant-your-anthropic-key",  # still your real Anthropic key
    default_headers={
        "X-Kurral-API-Key": "kr_live_your-kurral-key",
        "x-kurral-agent": "your-agent-key",
    },
)
Gemini
Replace the base URL in your HTTP calls:
Before:
POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=YOUR_KEY
After:
POST https://kurral-api.onrender.com/api/proxy/google/v1beta/models/gemini-2.0-flash:generateContent?key=YOUR_KEY
Headers:
X-Kurral-API-Key: kr_live_your-kurral-key
x-kurral-agent: your-agent-key
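In Python this amounts to swapping the host and attaching the two Kurral headers. A minimal sketch using only the standard library (the model name, prompt, and key values are placeholders):

```python
import json
import urllib.request

KURRAL_BASE = "https://kurral-api.onrender.com/api/proxy/google"

def build_gemini_request(model, prompt, gemini_key, kurral_key, agent_key):
    """Build a proxied generateContent request; only the host differs
    from a direct call to generativelanguage.googleapis.com."""
    url = f"{KURRAL_BASE}/v1beta/models/{model}:generateContent?key={gemini_key}"
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Kurral-API-Key": kurral_key,
            "x-kurral-agent": agent_key,
        },
    )

req = build_gemini_request("gemini-2.0-flash", "Hello", "YOUR_KEY",
                           "kr_live_your-kurral-key", "your-agent-key")
# urllib.request.urlopen(req) would send it; the response body comes back unchanged
```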
Environment Variables
Keep your config clean:
# .env
KURRAL_API_KEY=kr_live_your-kurral-key
KURRAL_API_URL=https://kurral-api.onrender.com
KURRAL_AGENT_KEY=your-agent-key
# Provider key (unchanged)
ANTHROPIC_API_KEY=sk-ant-your-key
# or
OPENAI_API_KEY=sk-your-key
import os

import anthropic

KURRAL_API_URL = os.getenv("KURRAL_API_URL", "https://kurral-api.onrender.com")
KURRAL_API_KEY = os.getenv("KURRAL_API_KEY")
KURRAL_AGENT_KEY = os.getenv("KURRAL_AGENT_KEY")

client = anthropic.Anthropic(
    # api_key is omitted: the SDK reads ANTHROPIC_API_KEY from the environment
    base_url=f"{KURRAL_API_URL}/api/proxy/anthropic",
    default_headers={
        "X-Kurral-API-Key": KURRAL_API_KEY,
        "x-kurral-agent": KURRAL_AGENT_KEY,
    },
)
Required Headers
Every proxy request must include:
| Header | Value | Purpose |
|---|---|---|
| `X-Kurral-API-Key` | `kr_live_...` | Kurral authentication (use this) |
| `X-API-Key` | `kr_live_...` | Kurral authentication (legacy, deprecated Q3 2026) |
| `x-kurral-agent` | Agent key (min 3 chars) | Identifies which agent made the call |
| Provider auth | Varies (see below) | Forwarded to the LLM provider |

Note: `X-API-Key` is supported for backward compatibility but will be deprecated in Q3 2026. Use `X-Kurral-API-Key` to avoid header collisions, especially with Anthropic, which uses `x-api-key` for provider auth.
Provider authentication is passed through unchanged:
- OpenAI: `Authorization: Bearer sk-...` (set automatically by the SDK)
- Anthropic: `x-api-key: sk-ant-...` (set automatically by the SDK)
- Gemini: `x-goog-api-key` header or `?key=` query parameter
Note: Kurral strips its own headers before forwarding to the provider. Your provider API key is never stored by Kurral.
Optional Headers
| Header | Values | Default | Purpose |
|---|---|---|---|
| `x-kurral-session-id` | Any UUID | Auto-generated | Group multiple LLM calls into one session |
| `x-kurral-retention` | `none`, `metadata`, `full` | `metadata` | Controls what request/response content is stored |
| `x-kurral-env` | `production`, `staging`, `development` | `production` | Environment tag for filtering |
Session Grouping
By default, each LLM call creates its own session. To group a multi-turn conversation into a single session:
import uuid
session_id = str(uuid.uuid4())
# All calls with this session_id appear as one session in the dashboard
response1 = client.messages.create(
...,
extra_headers={"x-kurral-session-id": session_id},
)
response2 = client.messages.create(
...,
extra_headers={"x-kurral-session-id": session_id},
)
Data Retention
Control what Kurral stores with the x-kurral-retention header:
| Level | What's Stored |
|---|---|
| `none` | Only metadata (model, tokens, cost, latency) |
| `metadata` | Metadata + truncated content summaries |
| `full` | Complete request and response bodies |
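Since retention (and the other optional headers) are set per request, a small helper keeps them consistent across calls. This helper is not part of any Kurral SDK; it is a sketch that validates values against the tables above and returns a dict suitable for `extra_headers=`:

```python
VALID_RETENTION = {"none", "metadata", "full"}
VALID_ENV = {"production", "staging", "development"}

def kurral_headers(session_id=None, retention=None, env=None):
    """Build the optional per-request Kurral headers, validating
    values against the documented sets. Omitted values fall back to
    the proxy defaults (auto session, metadata retention, production)."""
    headers = {}
    if session_id is not None:
        headers["x-kurral-session-id"] = str(session_id)
    if retention is not None:
        if retention not in VALID_RETENTION:
            raise ValueError(f"retention must be one of {sorted(VALID_RETENTION)}")
        headers["x-kurral-retention"] = retention
    if env is not None:
        if env not in VALID_ENV:
            raise ValueError(f"env must be one of {sorted(VALID_ENV)}")
        headers["x-kurral-env"] = env
    return headers
```

It would then be used as `client.messages.create(..., extra_headers=kurral_headers(retention="none"))` for a privacy-sensitive call.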
Streaming
Streaming works transparently. The proxy forwards chunks in real-time while capturing the full response for observability.
# Streaming works exactly the same as direct calls
with client.messages.stream(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")
For streaming calls, Kurral also captures time-to-first-token (TTFT) in addition to total latency.
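If you want to compare the proxy's TTFT against a client-side measurement, the idea is just a timer around the first chunk. A minimal sketch (not part of Kurral; works with any chunk iterator, including `stream.text_stream`):

```python
import time

def time_to_first_token(stream):
    """Wrap a chunk iterator, recording elapsed seconds until the first
    chunk arrives. Chunks are yielded unchanged; the TTFT value lands
    in timings["ttft"] once the first chunk has been consumed."""
    timings = {}

    def gen():
        start = time.monotonic()
        first = True
        for chunk in stream:
            if first:
                timings["ttft"] = time.monotonic() - start
                first = False
            yield chunk

    return gen(), timings

# Demo with a plain in-memory iterator standing in for a real stream
chunks, timings = time_to_first_token(iter(["Hel", "lo"]))
text = "".join(chunks)
```

Note that client-side TTFT includes network round-trip through the proxy, so it will read slightly higher than the value Kurral records at the proxy.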
Proxy Endpoints Reference
| Proxy Path | Upstream Provider |
|---|---|
| `/api/proxy/openai/v1/chat/completions` | OpenAI Chat Completions |
| `/api/proxy/openai/v1/responses` | OpenAI Responses API |
| `/api/proxy/anthropic/v1/messages` | Anthropic Messages API |
| `/api/proxy/google/v1beta/models/{model}:generateContent` | Gemini |
| `/api/proxy/google/v1beta/models/{model}:streamGenerateContent` | Gemini (streaming) |
| `/api/proxy/health` | Health check |
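The health endpoint is useful for verifying connectivity before wiring up an SDK. A standard-library sketch (assuming, as is typical for health checks, that no auth headers are required):

```python
import urllib.request

KURRAL_API_URL = "https://kurral-api.onrender.com"

# Plain GET; no Kurral or provider headers attached
req = urllib.request.Request(f"{KURRAL_API_URL}/api/proxy/health", method="GET")
# urllib.request.urlopen(req).status == 200 would indicate the proxy is reachable
```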
What Gets Captured
Every LLM call through the proxy automatically records:
- Token usage — input tokens, output tokens, total
- Cost — calculated from model-specific pricing
- Latency — total request time and time-to-first-token (streaming)
- Model — which model was used
- Agent — which agent made the call
- Session — grouped conversation context
- Content — request/response bodies (based on retention setting)
- Tool interactions — tool definitions, tool call arguments (from LLM responses), and tool results (from follow-up requests) are all part of the LLM conversation and captured automatically
Because tool calling in OpenAI, Anthropic, and Gemini flows through the messages API — the LLM requests a tool call, the agent executes it, and the result is sent back in the next message — the proxy sees the full tool interaction loop without any additional instrumentation.
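To make that loop concrete, here is the message shape (Anthropic-style, with a hypothetical `get_weather` tool and placeholder ids) as it flows through the proxy; each request/response hop below is one call the proxy observes:

```python
# Turn 1 request: the user message (sent alongside tool definitions)
messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# Turn 1 response: the model asks for a tool call — captured by the proxy
tool_use = {
    "type": "tool_use",
    "id": "toolu_01",  # placeholder id
    "name": "get_weather",
    "input": {"city": "Paris"},
}
messages.append({"role": "assistant", "content": [tool_use]})

# The agent executes the tool locally; the proxy never sees the execution itself
result = {"temp_c": 18}  # stand-in for the real tool output

# Turn 2 request: the tool result goes back as a user message, so the
# proxy captures it as part of the next LLM call
messages.append({
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": tool_use["id"],
        "content": str(result),
    }],
})
```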
All of this appears in the Kurral dashboard under the agent's session list.
Rate Limits
The proxy enforces per-user rate limits:
| Limit | Default |
|---|---|
| Requests per minute | 60 |
| Max concurrent requests | 10 |
| Max request body size | 10 MB |
| Request timeout | 120 seconds |
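If your agent bursts above 60 requests per minute, the usual remedy is exponential backoff on 429s. A generic sketch — `RateLimitError` here is a placeholder for whatever your SDK raises on 429 (e.g. the Anthropic and OpenAI SDKs each have their own):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the SDK's 429 error type."""

def with_retry(call, max_attempts=5, base_delay=1.0):
    """Retry a callable on rate-limit errors with exponential backoff
    plus jitter; re-raises after max_attempts failures."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, ... plus up to base_delay of jitter
            time.sleep(base_delay * (2 ** attempt) + random.random() * base_delay)
```

Usage would look like `with_retry(lambda: client.messages.create(...))`, keeping the retry policy out of the call sites.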
What Stays the Same
After rewiring to the proxy:
- All `client.messages.create()` / `client.chat.completions.create()` calls — unchanged
- Streaming calls — unchanged
- Tool use / function calling — unchanged
- Your MCP server — unchanged (tool execution happens locally; the proxy captures tool interactions via the LLM conversation)
What You Can Remove
Once routed through the proxy, manual observability code becomes unnecessary:
- Manual session upload functions — proxy captures this automatically
- Manual token counting — proxy extracts tokens from provider responses
- Manual cost calculation — proxy calculates cost per model
- `POST /api/sessions/calls` — replaced by automatic proxy capture
Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| `401 Unauthorized` | Missing or invalid Kurral API key | Use `X-Kurral-API-Key` (preferred) with your `kr_live_...` key |
| `400 x-kurral-agent header required` | Missing agent header | Add `x-kurral-agent` to default headers |
| `429 Rate limit exceeded` | Too many requests per minute | Reduce request rate or contact support |
| `502 Bad Gateway` | Upstream provider error | Check your provider API key is valid |
| Provider auth error | Provider key not forwarded | Ensure your provider API key is set (SDK reads from env) |