Supported APIs

Agent Router supports three API formats: the OpenAI Chat Completions API, the OpenAI Responses API, and the Anthropic Messages API. All gateway features (routing, fallback policies, traffic splitting, cost tracking, and observability) apply equally regardless of the format chosen. Applications send requests in one format, and the gateway handles provider translation transparently, normalising responses and errors back to the format that was requested. For new projects with no existing SDK preference, Chat Completions offers the widest ecosystem compatibility.

Endpoint summary

Format	Path	SDK Method	Streaming
OpenAI Chat Completions	`/v1/chat/completions`	`client.chat.completions.create()`	`stream=True`
OpenAI Responses	`/v1/responses`	`client.responses.create()`	`stream=True`
Anthropic Messages	`/v1/messages`	`client.messages.create()`	`stream=True`

In the examples below, replace `PROXY_URL` with the hostname of the proxy endpoint shown on the Console Dashboard (for example, `proxy.poc.tetrate.ai` when the dashboard shows https://proxy.poc.tetrate.ai/v1) and `YOUR_API_KEY` with a key from API Keys.
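The examples write the base URL as `https://PROXY_URL/v1`, so a value copied from the dashboard that already carries a scheme or the `/v1` suffix would need trimming first. The following is a minimal sketch of that normalisation; the helper name `gateway_base_url` is hypothetical and not part of any SDK or of the gateway itself.

```python
def gateway_base_url(endpoint: str) -> str:
    """Return an https base URL ending in /v1 (hypothetical helper)."""
    value = endpoint.strip().rstrip("/")
    # Drop a scheme if the value was copied as a full URL.
    for scheme in ("https://", "http://"):
        if value.startswith(scheme):
            value = value[len(scheme):]
            break
    # Drop a trailing /v1 so it is not duplicated below.
    if value.endswith("/v1"):
        value = value[: -len("/v1")]
    # This sketch always emits https, even if http was supplied.
    return f"https://{value}/v1"


print(gateway_base_url("https://proxy.poc.tetrate.ai/v1"))
print(gateway_base_url("proxy.poc.tetrate.ai"))
```

Both calls print the same URL, which could then be passed as `base_url` to the OpenAI client.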

Chat Completions API (`/v1/chat/completions`)

This is the most widely supported format, compatible with OpenAI and most third-party SDKs.

Non-streaming

curl https://PROXY_URL/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }'

from openai import OpenAI

client = OpenAI(
    base_url="https://PROXY_URL/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello, world!"}],
)

print(response.choices[0].message.content)

Streaming

curl https://PROXY_URL/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello, world!"}],
    "stream": true
  }'

from openai import OpenAI

client = OpenAI(
    base_url="https://PROXY_URL/v1",
    api_key="YOUR_API_KEY",
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello, world!"}],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

SSE format

Chat Completions streaming uses data-only SSE. Each event is a `data:` line containing a JSON object, and the stream is terminated by `data: [DONE]`:

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"}}]}

data: [DONE]

To receive token usage in the stream, add `"stream_options": {"include_usage": true}` to the request. Usage appears in the final chunk before `[DONE]`:

{"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}
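For clients that read the raw stream rather than using an SDK, the data-only format can be consumed line by line. The sketch below is one possible reader, not the gateway's own code. It assumes the usage chunk carries a `usage` field alongside an empty `choices` list, as the upstream OpenAI API does; the function name `read_chat_stream` is hypothetical.

```python
import json


def read_chat_stream(lines):
    """Collect text and usage from Chat Completions SSE lines."""
    text_parts = []
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # sentinel that ends the stream
        chunk = json.loads(payload)
        # Assumption: the usage chunk may have an empty choices list.
        for choice in chunk.get("choices") or []:
            content = choice.get("delta", {}).get("content")
            if content:
                text_parts.append(content)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage


sample = [
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":" world"}}]}',
    "",
    'data: {"choices":[],"usage":{"prompt_tokens":10,"completion_tokens":5,"total_tokens":15}}',
    "",
    "data: [DONE]",
]
text, usage = read_chat_stream(sample)
print(text, usage)
```

Iterating over `choices` instead of indexing `choices[0]` keeps the reader from failing on a chunk that contains only usage data.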

Responses API (`/v1/responses`)

The newer OpenAI Responses API provides a simplified interface with semantic streaming events.

Non-streaming

curl https://PROXY_URL/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "Hello, world!"
  }'

from openai import OpenAI

client = OpenAI(
    base_url="https://PROXY_URL/v1",
    api_key="YOUR_API_KEY",
)

response = client.responses.create(
    model="gpt-4o",
    input="Hello, world!",
)

print(response.output_text)

Streaming

curl https://PROXY_URL/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "Hello, world!",
    "stream": true
  }'

from openai import OpenAI

client = OpenAI(
    base_url="https://PROXY_URL/v1",
    api_key="YOUR_API_KEY",
)

stream = client.responses.create(
    model="gpt-4o",
    input="Hello, world!",
    stream=True,
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

SSE format

Responses API streaming uses semantic `event:` plus `data:` lines. Each event has a named type describing what happened:

event: response.created
data: {"id":"resp_...","object":"response","status":"in_progress"}

event: response.output_item.added
data: {"item":{"id":"msg_...","type":"message","role":"assistant"}}

event: response.output_text.delta
data: {"delta":"Hello"}

event: response.output_text.delta
data: {"delta":" world"}

event: response.output_text.done
data: {"text":"Hello world"}

event: response.completed
data: {"id":"resp_...","status":"completed","usage":{"input_tokens":10,"output_tokens":5}}
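A raw-stream reader for this format pairs each `event:` line with the `data:` line that follows it. The sketch below illustrates the idea using the sample events above; the function names are hypothetical, and it reads `usage` from the top level of the `response.completed` data because that is where the sample shows it. A live stream may nest fields differently.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, data_dict) pairs from event:/data: SSE lines."""
    event_name = None
    for line in lines:
        line = line.strip()
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:") and event_name is not None:
            yield event_name, json.loads(line[len("data:"):].strip())
            event_name = None  # wait for the next event: line


def collect_response(lines):
    """Join text deltas and pick up usage from response.completed."""
    parts, usage = [], None
    for name, data in iter_sse_events(lines):
        if name == "response.output_text.delta":
            parts.append(data["delta"])
        elif name == "response.completed":
            # Assumption: usage sits at the top level, as in the sample.
            usage = data.get("usage")
    return "".join(parts), usage


sample = """\
event: response.created
data: {"id":"resp_...","object":"response","status":"in_progress"}

event: response.output_text.delta
data: {"delta":"Hello"}

event: response.output_text.delta
data: {"delta":" world"}

event: response.completed
data: {"id":"resp_...","status":"completed","usage":{"input_tokens":10,"output_tokens":5}}
""".splitlines()

text, usage = collect_response(sample)
print(text, usage)
```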

Differences from Chat Completions

Aspect	Chat Completions	Responses API
Input field	`messages` array	`input` (string or array)
Usage fields	`prompt_tokens` / `completion_tokens`	`input_tokens` / `output_tokens`
SSE format	Data-only (`data: {...}`) with `data: [DONE]` sentinel	Semantic events (`event: response.created`, etc.)
Stream usage	Opt-in via `stream_options.include_usage`	Always in `response.completed` event
Response access	`response.choices[0].message.content`	`response.output_text`
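Code that records token counts from both formats has to account for the different usage field names listed in the table. One way to do that, sketched here with a hypothetical helper `normalise_usage`, is to map both shapes onto a single set of keys:

```python
def normalise_usage(usage: dict) -> dict:
    """Map either usage shape onto input/output/total token counts."""
    if "prompt_tokens" in usage:  # Chat Completions naming
        tokens_in = usage["prompt_tokens"]
        tokens_out = usage["completion_tokens"]
    else:  # Responses API naming
        tokens_in = usage["input_tokens"]
        tokens_out = usage["output_tokens"]
    return {
        "input_tokens": tokens_in,
        "output_tokens": tokens_out,
        # Fall back to the sum when no total is reported.
        "total_tokens": usage.get("total_tokens", tokens_in + tokens_out),
    }


chat_usage = {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}
responses_usage = {"input_tokens": 10, "output_tokens": 5}
print(normalise_usage(chat_usage) == normalise_usage(responses_usage))
```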

Anthropic Messages API (`/v1/messages`)

Use this format for applications built with the Anthropic SDK. The gateway accepts the standard `Authorization: Bearer` header; the Anthropic-native `x-api-key` header is not required.

Non-streaming

curl https://PROXY_URL/v1/messages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }'

from anthropic import Anthropic

client = Anthropic(
    # The Anthropic SDK appends /v1/messages itself, so base_url omits /v1.
    base_url="https://PROXY_URL",
    auth_token="YOUR_API_KEY",
)

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, world!"}],
)

print(response.content[0].text)

Streaming

curl https://PROXY_URL/v1/messages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, world!"}],
    "stream": true
  }'

from anthropic import Anthropic

client = Anthropic(
    # The Anthropic SDK appends /v1/messages itself, so base_url omits /v1.
    base_url="https://PROXY_URL",
    auth_token="YOUR_API_KEY",
)

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, world!"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

SSE format

Anthropic Messages streaming uses semantic `event:` plus `data:` lines with block-level granularity:

event: message_start
data: {"type":"message_start","message":{"id":"msg_...","role":"assistant","model":"claude-sonnet-4-20250514"}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" world"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":5}}

event: message_stop
data: {"type":"message_stop"}
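Because every `data:` payload in this format repeats its own `type`, a raw-stream reader can dispatch on the data alone and group text by content block index. The sketch below illustrates that approach with the sample events above; the function name `collect_message` is hypothetical, and only text blocks are handled.

```python
import json
from collections import defaultdict


def collect_message(lines):
    """Rebuild text blocks and the stop reason from Messages SSE lines."""
    blocks = defaultdict(list)  # content block index -> text fragments
    stop_reason = None
    output_tokens = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # event: lines repeat the type carried in the data
        data = json.loads(line[len("data:"):].strip())
        kind = data.get("type")
        if kind == "content_block_delta":
            delta = data["delta"]
            if delta.get("type") == "text_delta":
                blocks[data["index"]].append(delta["text"])
        elif kind == "message_delta":
            stop_reason = data["delta"].get("stop_reason")
            output_tokens = data.get("usage", {}).get("output_tokens")
        elif kind == "message_stop":
            break
    texts = ["".join(blocks[i]) for i in sorted(blocks)]
    return texts, stop_reason, output_tokens


sample = """\
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" world"}}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":5}}

event: message_stop
data: {"type":"message_stop"}
""".splitlines()

texts, stop_reason, output_tokens = collect_message(sample)
print(texts, stop_reason, output_tokens)
```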

Note: When using the gateway, authenticate with `Authorization: Bearer YOUR_API_KEY` instead of the Anthropic-native `x-api-key` header. The gateway translates the auth header before forwarding to the provider.

Supported endpoint types

The gateway handles eight endpoint types. All use the same model-based routing logic; fallback policies and traffic splitting apply equally across every endpoint.

Endpoint	Path	Description
Chat Completions	`/v1/chat/completions`	Standard chat interface (OpenAI-compatible)
Completions	`/v1/completions`	Legacy text completions
Responses	`/v1/responses`	OpenAI Responses API
Messages	`/v1/messages`	Anthropic Messages API
Embeddings	`/v1/embeddings`	Text embeddings
Images	`/v1/images`	Image generation
Rerank	`/v1/rerank`	Reranking
Models	`/v1/models`	List available models

Provider translation

The gateway automatically translates between the canonical (OpenAI-compatible) schema and 25+ provider-specific APIs. Applications send requests in one format, and the gateway handles all conversions transparently.

Translated elements:

Request body: field names, structure, and defaults adjusted per provider
Path: endpoint paths mapped to provider conventions
Headers: authentication and provider-specific headers set automatically
Response format: provider responses normalised back to the format that was requested

For example, an OpenAI Chat Completions request that routes to Anthropic Claude is translated to the Anthropic Messages format before being forwarded, and the response is translated back to Chat Completions format. The application never sees the difference.
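To make the round trip concrete, the sketch below shows roughly what such a translation involves for a plain text request. It is an illustration only and not the gateway's implementation: the function names and the default `max_tokens` value are assumptions, and tools, streaming, and stop-reason mapping are left out.

```python
def chat_to_messages(request: dict, default_max_tokens: int = 1024) -> dict:
    """Convert a Chat Completions request body to Messages shape."""
    system_parts = []
    messages = []
    for message in request["messages"]:
        if message["role"] == "system":
            # Messages takes the system prompt as a top-level field.
            system_parts.append(message["content"])
        else:
            messages.append(
                {"role": message["role"], "content": message["content"]}
            )
    body = {
        "model": request["model"],
        # Messages requires max_tokens; this default is an assumption.
        "max_tokens": request.get("max_tokens", default_max_tokens),
        "messages": messages,
    }
    if system_parts:
        body["system"] = "\n".join(system_parts)
    return body


def messages_to_chat(response: dict) -> dict:
    """Convert a Messages response back to Chat Completions shape."""
    text = "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )
    usage = response.get("usage", {})
    prompt = usage.get("input_tokens", 0)
    completion = usage.get("output_tokens", 0)
    return {
        "object": "chat.completion",
        "model": response.get("model"),
        "choices": [
            {"index": 0, "message": {"role": "assistant", "content": text}}
        ],
        "usage": {
            "prompt_tokens": prompt,
            "completion_tokens": completion,
            "total_tokens": prompt + completion,
        },
    }


req = {
    "model": "claude-sonnet-4-20250514",
    "messages": [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hello, world!"},
    ],
}
out = chat_to_messages(req)
resp = {
    "model": "claude-sonnet-4-20250514",
    "content": [{"type": "text", "text": "Hi"}],
    "usage": {"input_tokens": 10, "output_tokens": 5},
}
chat = messages_to_chat(resp)
print(out["system"], chat["choices"][0]["message"]["content"])
```

The translated response exposes the same `choices[0].message.content` path and `prompt_tokens` / `completion_tokens` names that a Chat Completions client already reads.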

No configuration is needed; translation is built into the gateway. For how errors are normalised across providers, see Gateway Behavior.

Protocols

The gateway supports REST (HTTPS) for all inference traffic. This is the only protocol needed to use any endpoint.

gRPC is used internally for OpenTelemetry (OTLP) telemetry export. See OpenTelemetry Export for the configuration surface. WebSocket and gRPC for inference are not currently supported.

Choosing an API format

Use case	Recommended format
Widest SDK and tool compatibility	Chat Completions
New OpenAI projects with the simplified interface	Responses API
Anthropic Claude-native applications	Anthropic Messages
Agent frameworks (LangChain, CrewAI, and similar)	Chat Completions
Code assistants (Cursor, Cline, Aider)	Chat Completions
Streaming with semantic events	Responses API or Anthropic Messages

All three formats support the same gateway features. The choice is driven by SDK preference and provider ecosystem alignment.

Related

Integrate the Gateway with an App: developer-side integration patterns for each SDK
Gateway Behavior: request handling, error semantics, and routing resolution rules

Where to go next

API reference

Full endpoint, request, and response reference.

Gateway behavior

How requests are handled, normalised, and routed.
