Proxy Integration
Route your AI agent's LLM calls through the Kurral proxy for automatic observability and security scanning. Zero changes to your agent's core logic.
How It Works
Your Agent ──▶ Kurral Proxy ──▶ LLM Provider (OpenAI / Anthropic / Gemini)
                    │
                    └─ Captures every call:
                       tokens, cost, latency,
                       request/response content
Instead of calling api.openai.com or api.anthropic.com directly, your agent calls the Kurral proxy. The proxy authenticates the request, forwards it to the real provider, captures observability data, and returns the response unchanged.
Prerequisites
- A Kurral API key (`kr_live_...`) from the dashboard
- An agent registered in the dashboard (note the agent key)
- Your LLM provider API key (OpenAI, Anthropic, or Gemini)
Quick Start by Provider
OpenAI
Before (direct):
from openai import OpenAI
client = OpenAI() # calls api.openai.com
After (through proxy):
from openai import OpenAI

client = OpenAI(
    base_url="https://kurral-api.onrender.com/api/proxy/openai/v1",
    api_key="sk-your-openai-key",  # still your real OpenAI key
    default_headers={
        "X-Kurral-API-Key": "kr_live_your-kurral-key",
        "x-kurral-agent": "your-agent-key",
    },
)
Anthropic
Before (direct):
import anthropic
client = anthropic.Anthropic() # calls api.anthropic.com
After (through proxy):
import anthropic

client = anthropic.Anthropic(
    base_url="https://kurral-api.onrender.com/api/proxy/anthropic",
    api_key="sk-ant-your-anthropic-key",  # still your real Anthropic key
    default_headers={
        "X-Kurral-API-Key": "kr_live_your-kurral-key",
        "x-kurral-agent": "your-agent-key",
    },
)
Gemini
Replace the base URL in your HTTP calls:
Before:
POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=YOUR_KEY
After:
POST https://kurral-api.onrender.com/api/proxy/google/v1beta/models/gemini-2.0-flash:generateContent?key=YOUR_KEY
Headers:
X-Kurral-API-Key: kr_live_your-kurral-key
x-kurral-agent: your-agent-key
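In Python this amounts to swapping the host and attaching the two Kurral headers. A minimal sketch using only the standard library (the model name, prompt, and key values are placeholders):

```python
import json
import urllib.request

KURRAL_BASE = "https://kurral-api.onrender.com/api/proxy/google"

def build_gemini_request(model, prompt, gemini_key, kurral_key, agent_key):
    """Build a proxied generateContent request; only the host differs
    from a direct call to generativelanguage.googleapis.com."""
    url = f"{KURRAL_BASE}/v1beta/models/{model}:generateContent?key={gemini_key}"
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Kurral-API-Key": kurral_key,
            "x-kurral-agent": agent_key,
        },
    )

req = build_gemini_request("gemini-2.0-flash", "Hello", "YOUR_KEY",
                           "kr_live_your-kurral-key", "your-agent-key")
# urllib.request.urlopen(req) would send it; the response body comes back unchanged
```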
Environment Variables
Keep your config clean:
# .env
KURRAL_API_KEY=kr_live_your-kurral-key
KURRAL_API_URL=https://kurral-api.onrender.com
KURRAL_AGENT_KEY=your-agent-key
# Provider key (unchanged)
ANTHROPIC_API_KEY=sk-ant-your-key
# or
OPENAI_API_KEY=sk-your-key
import os

import anthropic

KURRAL_API_URL = os.getenv("KURRAL_API_URL", "https://kurral-api.onrender.com")
KURRAL_API_KEY = os.getenv("KURRAL_API_KEY")
KURRAL_AGENT_KEY = os.getenv("KURRAL_AGENT_KEY")

client = anthropic.Anthropic(
    # api_key is omitted: the SDK reads ANTHROPIC_API_KEY from the environment
    base_url=f"{KURRAL_API_URL}/api/proxy/anthropic",
    default_headers={
        "X-Kurral-API-Key": KURRAL_API_KEY,
        "x-kurral-agent": KURRAL_AGENT_KEY,
    },
)
Required Headers
Every proxy request must include:
| Header | Value | Purpose |
|---|---|---|
| `X-Kurral-API-Key` | `kr_live_...` | Kurral authentication (use this) |
| `X-API-Key` | `kr_live_...` | Kurral authentication (legacy, deprecated Q3 2026) |
| `x-kurral-agent` | Agent key (min 3 chars) | Identifies which agent made the call |
| Provider auth | Varies (see below) | Forwarded to the LLM provider |

Note: `X-API-Key` is supported for backward compatibility but will be deprecated in Q3 2026. Use `X-Kurral-API-Key` to avoid header collisions, especially with Anthropic, which uses `x-api-key` for provider auth.
Provider authentication is passed through unchanged:
- OpenAI: `Authorization: Bearer sk-...` (set automatically by the SDK)
- Anthropic: `x-api-key: sk-ant-...` (set automatically by the SDK)
- Gemini: `x-goog-api-key` header or `?key=` query parameter
Note: Kurral strips its own headers before forwarding to the provider. Your provider API key is never stored by Kurral.
Optional Headers
| Header | Values | Default | Purpose |
|---|---|---|---|
| `x-kurral-session-id` | Any UUID | Auto-generated | Group multiple LLM calls into one session |
| `x-kurral-retention` | `none`, `metadata`, `full` | `metadata` | Controls what request/response content is stored |
| `x-kurral-env` | `production`, `staging`, `development` | `production` | Environment tag for filtering |
Session Grouping
By default, each LLM call creates its own session. To group a multi-turn conversation into a single session:
import uuid
session_id = str(uuid.uuid4())
# All calls with this session_id appear as one session in the dashboard
response1 = client.messages.create(
...,
extra_headers={"x-kurral-session-id": session_id},
)
response2 = client.messages.create(
...,
extra_headers={"x-kurral-session-id": session_id},
)
Data Retention
Control what Kurral stores with the x-kurral-retention header:
| Level | What's Stored |
|---|---|
| `none` | Only metadata (model, tokens, cost, latency) |
| `metadata` | Metadata + truncated content summaries |
| `full` | Complete request and response bodies |
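Since retention (and the other optional headers) are set per request, a small helper keeps them consistent across calls. This helper is not part of any Kurral SDK; it is a sketch that validates values against the tables above and returns a dict suitable for `extra_headers=`:

```python
VALID_RETENTION = {"none", "metadata", "full"}
VALID_ENV = {"production", "staging", "development"}

def kurral_headers(session_id=None, retention=None, env=None):
    """Build the optional per-request Kurral headers, validating
    values against the documented sets. Omitted values fall back to
    the proxy defaults (auto session, metadata retention, production)."""
    headers = {}
    if session_id is not None:
        headers["x-kurral-session-id"] = str(session_id)
    if retention is not None:
        if retention not in VALID_RETENTION:
            raise ValueError(f"retention must be one of {sorted(VALID_RETENTION)}")
        headers["x-kurral-retention"] = retention
    if env is not None:
        if env not in VALID_ENV:
            raise ValueError(f"env must be one of {sorted(VALID_ENV)}")
        headers["x-kurral-env"] = env
    return headers
```

It would then be used as `client.messages.create(..., extra_headers=kurral_headers(retention="none"))` for a privacy-sensitive call.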
Streaming
Streaming works transparently. The proxy forwards chunks in real-time while capturing the full response for observability.
# Streaming works exactly the same as direct calls
with client.messages.stream(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")
For streaming calls, Kurral also captures time-to-first-token (TTFT) in addition to total latency.
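If you want to compare the proxy's TTFT against a client-side measurement, the idea is just a timer around the first chunk. A minimal sketch (not part of Kurral; works with any chunk iterator, including `stream.text_stream`):

```python
import time

def time_to_first_token(stream):
    """Wrap a chunk iterator, recording elapsed seconds until the first
    chunk arrives. Chunks are yielded unchanged; the TTFT value lands
    in timings["ttft"] once the first chunk has been consumed."""
    timings = {}

    def gen():
        start = time.monotonic()
        first = True
        for chunk in stream:
            if first:
                timings["ttft"] = time.monotonic() - start
                first = False
            yield chunk

    return gen(), timings

# Demo with a plain in-memory iterator standing in for a real stream
chunks, timings = time_to_first_token(iter(["Hel", "lo"]))
text = "".join(chunks)
```

Note that client-side TTFT includes network round-trip through the proxy, so it will read slightly higher than the value Kurral records at the proxy.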
Proxy Endpoints Reference
| Proxy Path | Upstream Provider |
|---|---|
| `/api/proxy/openai/v1/chat/completions` | OpenAI Chat Completions |
| `/api/proxy/openai/v1/responses` | OpenAI Responses API |
| `/api/proxy/anthropic/v1/messages` | Anthropic Messages API |
| `/api/proxy/google/v1beta/models/{model}:generateContent` | Gemini |
| `/api/proxy/google/v1beta/models/{model}:streamGenerateContent` | Gemini (streaming) |
| `/api/proxy/health` | Health check |
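The health endpoint is useful for verifying connectivity before wiring up an SDK. A standard-library sketch (assuming, as is typical for health checks, that no auth headers are required):

```python
import urllib.request

KURRAL_API_URL = "https://kurral-api.onrender.com"

# Plain GET; no Kurral or provider headers attached
req = urllib.request.Request(f"{KURRAL_API_URL}/api/proxy/health", method="GET")
# urllib.request.urlopen(req).status == 200 would indicate the proxy is reachable
```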
What Gets Captured
Every LLM call through the proxy automatically records:
- Token usage — input tokens, output tokens, total
- Cost — calculated from model-specific pricing
- Latency — total request time and time-to-first-token (streaming)
- Model — which model was used
- Agent — which agent made the call
- Session — grouped conversation context
- Content — request/response bodies (based on retention setting)
- Tool interactions — tool definitions, tool call arguments (from LLM responses), and tool results (from follow-up requests) are all part of the LLM conversation and captured automatically
Because tool calling in OpenAI, Anthropic, and Gemini flows through the messages API — the LLM requests a tool call, the agent executes it, and the result is sent back in the next message — the proxy sees the full tool interaction loop without any additional instrumentation.
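To make that loop concrete, here is the message shape (Anthropic-style, with a hypothetical `get_weather` tool and placeholder ids) as it flows through the proxy; each request/response hop below is one call the proxy observes:

```python
# Turn 1 request: the user message (sent alongside tool definitions)
messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# Turn 1 response: the model asks for a tool call — captured by the proxy
tool_use = {
    "type": "tool_use",
    "id": "toolu_01",  # placeholder id
    "name": "get_weather",
    "input": {"city": "Paris"},
}
messages.append({"role": "assistant", "content": [tool_use]})

# The agent executes the tool locally; the proxy never sees the execution itself
result = {"temp_c": 18}  # stand-in for the real tool output

# Turn 2 request: the tool result goes back as a user message, so the
# proxy captures it as part of the next LLM call
messages.append({
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": tool_use["id"],
        "content": str(result),
    }],
})
```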
All of this appears in the Kurral dashboard under the agent's session list.
Rate Limits
The proxy enforces per-user rate limits:
| Limit | Default |
|---|---|
| Requests per minute | 60 |
| Max concurrent requests | 10 |
| Max request body size | 10 MB |
| Request timeout | 120 seconds |
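If your agent bursts above 60 requests per minute, the usual remedy is exponential backoff on 429s. A generic sketch — `RateLimitError` here is a placeholder for whatever your SDK raises on 429 (e.g. the Anthropic and OpenAI SDKs each have their own):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the SDK's 429 error type."""

def with_retry(call, max_attempts=5, base_delay=1.0):
    """Retry a callable on rate-limit errors with exponential backoff
    plus jitter; re-raises after max_attempts failures."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, ... plus up to base_delay of jitter
            time.sleep(base_delay * (2 ** attempt) + random.random() * base_delay)
```

Usage would look like `with_retry(lambda: client.messages.create(...))`, keeping the retry policy out of the call sites.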
What Stays the Same
After rewiring to the proxy:
- All `client.messages.create()` / `client.chat.completions.create()` calls — unchanged
- Streaming calls — unchanged
- Tool use / function calling — unchanged
- Your MCP server — unchanged (tool execution happens locally; the proxy captures tool interactions via the LLM conversation)
What You Can Remove
Once routed through the proxy, manual observability code becomes unnecessary:
- Manual session upload functions — proxy captures this automatically
- Manual token counting — proxy extracts tokens from provider responses
- Manual cost calculation — proxy calculates cost per model
- `POST /api/sessions/calls` — replaced by automatic proxy capture
Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| `401 Unauthorized` | Missing or invalid Kurral API key | Use `X-Kurral-API-Key` (preferred) with your `kr_live_...` key |
| `400 x-kurral-agent header required` | Missing agent header | Add `x-kurral-agent` to default headers |
| `429 Rate limit exceeded` | Too many requests per minute | Reduce request rate or contact support |
| `502 Bad Gateway` | Upstream provider error | Check your provider API key is valid |
| Provider auth error | Provider key not forwarded | Ensure your provider API key is set (SDK reads from env) |