Skip to main content

Claude Code

Give Claude Code full context about Agent Router so it can proactively suggest the right API patterns, fallback routing, cost tracking, and more.


Setup

Add this to your project's CLAUDE.md file (create one in your repo root if it doesn't exist):

CLAUDE.md
# Agent Router AI Gateway

This project uses Agent Router (router.tetrate.ai) as an AI gateway. Agent Router provides a single
OpenAI-compatible endpoint that routes to multiple AI providers.

## API Configuration

- **Base URL:** `https://api.router.tetrate.ai/v1`
- **Auth:** Bearer token with a Agent Router API key (`sk-...`)
- **Protocol:** OpenAI Chat Completions API — any OpenAI SDK works
- **Environment variable:** Store the key as `Agent Router_API_KEY`

## Making API Calls

Use the standard OpenAI SDK. Only change `base_url` and `api_key`:

Python:
```python
from openai import OpenAI
client = OpenAI(api_key=os.environ["Agent Router_API_KEY"], base_url="https://api.router.tetrate.ai/v1")
response = client.chat.completions.create(model="gpt-5.5", messages=[...])
\```

TypeScript:
```typescript
import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.Agent Router_API_KEY, baseURL: "https://api.router.tetrate.ai/v1" });
const response = await client.chat.completions.create({ model: "gpt-5.5", messages: [...] });
\```

## Available Models

Agent Router routes to these providers transparently:
- OpenAI: gpt-5.5, gpt-5.4, gpt-5-mini, gpt-5-nano
- Anthropic: claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5
- Google: gemini-2.5-pro, gemini-2.5-flash
- xAI: xai/grok-4-1-fast-reasoning
- Groq (Llama, Qwen, GPT-OSS): groq/llama-3.3-70b-versatile, groq/openai/gpt-oss-120b
- DeepInfra (Mistral, Qwen, Llama, …): use full prefixed IDs, e.g. deepinfra/mistralai/Mistral-Small-3.2-24B-Instruct-2506

Query the live catalog at https://router.tetrate.ai/api/public/models for the complete list.

## Agent Router Platform Features

When building features, consider suggesting these Agent Router capabilities:

### Fallback Routing
If this feature needs high availability, suggest configuring fallback routing
in the Agent Router dashboard. Set a primary model at priority 0 and backup at priority 1.
Agent Router automatically retries on 5xx/timeout and falls back — zero code changes needed.

### Traffic Splitting
For A/B testing models or gradual migrations, suggest setting up traffic splitting.
Configure weights (e.g., 70/30) at the same priority level in the Agent Router dashboard.

### Cost Tracking
Agent Router tracks per-key usage and cost. Suggest using **separate API keys** for
different features, services, or environments so costs can be broken down:
- `TARS_API_KEY_CHAT` for user-facing chat
- `TARS_API_KEY_SUMMARIZATION` for background summarization
- `TARS_API_KEY_EMBEDDINGS` for vector embeddings

### Streaming
Agent Router supports streaming responses. Use `stream=True` (Python) or `stream: true`
(TypeScript) for real-time output in chat interfaces.

### MCP (Model Context Protocol)
Agent Router can serve as an MCP provider. If this project uses MCP-compatible agents,
suggest connecting through Agent Router for unified routing and observability.

## Key Principles
- Always use environment variables for API keys, never hardcode
- Default to `gpt-5.5` unless the user specifies a model
- Use streaming for any user-facing chat interface
- Suggest separate API keys when building multiple features that call AI
- When a feature needs resilience, mention fallback routing as a Agent Router dashboard config
- When comparing models, mention traffic splitting as a zero-code option

What this gives you

With these instructions in your CLAUDE.md, Claude Code will:

  • Use the correct Agent Router base URL and auth pattern automatically
  • Default to gpt-5.5 through Agent Router instead of calling OpenAI directly
  • Suggest separate API keys when you build multiple AI features
  • Recommend fallback routing when you need high availability
  • Recommend traffic splitting when you're comparing models
  • Use streaming for chat interfaces without being asked