Metrics & Observability
Kurral captures latency, cost, token usage, and reliability metrics for every LLM call across all your agents. Use the dashboard to monitor performance, identify regressions, and control spending.
What Gets Measured
Every LLM call through the Kurral proxy records:
| Metric | Description |
|---|---|
| Latency | Total request time (ms) |
| TTFT | Time to first token for streaming calls (ms) |
| Input tokens | Tokens in the request |
| Output tokens | Tokens in the response |
| Cost | Calculated from provider-specific pricing per model |
| Model | Which model was used |
| Status | Success, error, timeout |
Dashboard Views
Overview Metrics
Top-level cards showing:
- Total sessions — across all agents
- Total tokens — input + output
- Total cost — aggregated across all providers
- Avg latency — mean response time
- P50 / P90 / P99 latency — latency percentiles
- Active models — models in use across agents
Latency Breakdown
- By model — compare latency across GPT-4o, Claude Sonnet, Gemini, etc.
- By use case — latency grouped by semantic bucket (if using SDK tracing)
- Percentiles — P50, P90, P99 distribution
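To make the percentile cards concrete, here is a minimal sketch of how P50/P90/P99 can be derived from raw per-call latencies using the nearest-rank method. This is illustrative only; Kurral computes these values server-side, and the sample latencies are made up.

```python
def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile over a list of latency samples (ms)."""
    ordered = sorted(samples)
    # Nearest-rank method: the value at the ceil-like rank for this percentile.
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical per-call latencies in milliseconds.
latencies_ms = [820, 900, 950, 980, 1010, 1100, 1250, 2100, 2300, 4500]
p50 = percentile(latencies_ms, 50)
p90 = percentile(latencies_ms, 90)
p99 = percentile(latencies_ms, 99)
```

Note how P99 is dominated by the single slowest call, which is why percentile cards surface tail latency that the average hides.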
Cost Analysis
- By model — which models are consuming the most budget
- By environment — production vs. staging vs. development spend
- Time series — daily cost trend
Usage Patterns
- Top consumers — which agents use the most tokens
- Top use cases — which workflows drive the most cost
Agent-Level Metrics
Each agent has its own metrics accessible via the dashboard or API.
Dashboard
Go to Agents → click an agent → Overview tab. Shows:
- 7-day session volume bar chart (real data)
- Total sessions, tokens, cost
- Average and percentile latency
- Security score average
API
```
GET /api/web/agents/{agent_id}/metrics?days=7
```
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| days | integer | 7 | Lookback period (1-90) |
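A client might build the request URL like this. The base host and agent id below are placeholders; only the path shape and the `days` parameter come from this page.

```python
from urllib.parse import urlencode

BASE_URL = "https://example-kurral-host"  # placeholder host, not a real endpoint
agent_id = "agent_123"                    # hypothetical agent id
days = 30                                 # lookback period, clamped to 1-90 by the API

# Assemble the agent-metrics URL with the days query parameter.
url = f"{BASE_URL}/api/web/agents/{agent_id}/metrics?{urlencode({'days': days})}"
```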
Response:
```json
{
  "period_days": 7,
  "total_sessions": 142,
  "total_tokens": 89420,
  "total_cost": 1.2340,
  "avg_latency_ms": 1250,
  "p50_latency_ms": 980,
  "p90_latency_ms": 2100,
  "p99_latency_ms": 4500,
  "avg_security_score": 78.5,
  "total_scans": 3,
  "daily_breakdown": {
    "2025-01-08": {"count": 18, "tokens": 12400, "cost": 0.18},
    "2025-01-09": {"count": 22, "tokens": 14200, "cost": 0.21}
  }
}
```
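A client can roll the `daily_breakdown` entries up into period totals. This sketch parses the sample payload above directly; in practice the JSON would come from the metrics endpoint.

```python
import json

# Trimmed copy of the sample response shown above.
payload = json.loads("""
{
  "period_days": 7,
  "total_sessions": 142,
  "daily_breakdown": {
    "2025-01-08": {"count": 18, "tokens": 12400, "cost": 0.18},
    "2025-01-09": {"count": 22, "tokens": 14200, "cost": 0.21}
  }
}
""")

# Sum per-day entries into period totals.
days = payload["daily_breakdown"].values()
day_sessions = sum(d["count"] for d in days)
day_tokens = sum(d["tokens"] for d in days)
day_cost = sum(d["cost"] for d in days)
```

The daily totals only cover days with traffic, so they can be lower than the top-level `total_sessions` when the sample is truncated, as it is here.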
Flight Recorder
The Flight Recorder is Kurral's session detail view. Click any session to open a timeline of every event in the agent's execution:
Timeline Lanes
| Lane | What it Shows |
|---|---|
| User Input | The initial query or message |
| LLM | Model calls with input/output |
| Tool Execution | Tool calls with parameters and results |
| Policy | Safety gate blocks, rate limits |
| Outcome | Final agent response |
Event Details
Click any event in the timeline to see:
- Full input/output data
- Timing (start, duration)
- Token usage for LLM events
- Tool parameters and return values
- Error messages and stack traces
Bookmarks
Mark important events for quick reference during investigation.
Observability Summary API
```
GET /api/web/observe/summary?range=7d
```
Returns aggregated observability metrics:
```json
{
  "total_runs": 452,
  "success_rate": 94.2,
  "error_rate": 5.8,
  "avg_latency_ms": 1340,
  "p95_latency_ms": 3200,
  "total_tokens": 234000,
  "total_cost": 12.45,
  "reliability_score": 87,
  "top_tools_by_error": [
    {"tool": "search_orders", "error_rate": 2.1}
  ],
  "cost_by_model": [
    {"model": "gpt-4o", "cost": 8.20},
    {"model": "claude-sonnet-4-5", "cost": 4.25}
  ]
}
```
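Two useful derivations from this payload: the per-model costs should sum to `total_cost`, and the percentage rates map back to run counts. A minimal sketch using values copied from the sample response (not live data):

```python
# Copy of the relevant fields from the sample summary above.
summary = {
    "total_runs": 452,
    "success_rate": 94.2,   # percent
    "error_rate": 5.8,      # percent
    "total_cost": 12.45,
    "cost_by_model": [
        {"model": "gpt-4o", "cost": 8.20},
        {"model": "claude-sonnet-4-5", "cost": 4.25},
    ],
}

# Per-model costs should add up to the reported total.
model_cost = sum(m["cost"] for m in summary["cost_by_model"])

# Rates are percentages, so the error rate converts back to a run count.
failed_runs = round(summary["total_runs"] * summary["error_rate"] / 100)
```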
Cost Calculation
Kurral calculates cost using provider-specific pricing tables. Costs are computed per-call based on:
- Model used
- Input token count
- Output token count
- Provider pricing at time of call
Cost is shown in USD and broken down at the session, agent, and account level.
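The per-call computation can be sketched as token counts multiplied by per-model rates. The pricing table below is illustrative only, not Kurral's actual pricing data, and rates are expressed in USD per 1M tokens.

```python
# Hypothetical pricing table: model -> (input USD / 1M tokens, output USD / 1M tokens).
PRICING_PER_1M = {
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet-4-5": (3.00, 15.00),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one LLM call, from token counts and the pricing table."""
    in_rate, out_rate = PRICING_PER_1M[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A call with 1,200 input tokens and 300 output tokens on gpt-4o.
cost = call_cost("gpt-4o", 1200, 300)
```

Because pricing is captured at call time, a later rate change does not retroactively alter historical costs.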