Supported APIs
Agent Router supports three API formats: the OpenAI Chat Completions API, the OpenAI Responses API, and the Anthropic Messages API. All gateway features (routing, fallback policies, traffic splitting, cost tracking, and observability) apply equally regardless of the format chosen. Applications send requests in one format, and the gateway handles provider translation transparently, normalising responses and errors back to the format that was requested. For new projects with no existing SDK preference, Chat Completions offers the widest ecosystem compatibility.
Endpoint summary
| Format | Path | SDK Method | Streaming |
|---|---|---|---|
| OpenAI Chat Completions | /v1/chat/completions | client.chat.completions.create() | stream=True |
| OpenAI Responses | /v1/responses | client.responses.create() | stream=True |
| Anthropic Messages | /v1/messages | client.messages.create() | stream=True |
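In the examples below, replace PROXY_URL with the host name of the proxy endpoint shown on the Console Dashboard (for example, proxy.poc.tetrate.ai if the Dashboard shows https://proxy.poc.tetrate.ai/v1) and YOUR_API_KEY with a key from API Keys. The examples already include the https:// scheme and the /v1 path.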
Chat Completions API (/v1/chat/completions)
Chat Completions is the most widely supported format, compatible with the OpenAI SDK and most third-party SDKs.
Non-streaming
curl https://PROXY_URL/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello, world!"}]
}'
from openai import OpenAI
client = OpenAI(
base_url="https://PROXY_URL/v1",
api_key="YOUR_API_KEY",
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello, world!"}],
)
print(response.choices[0].message.content)
Streaming
curl https://PROXY_URL/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello, world!"}],
"stream": true
}'
from openai import OpenAI
client = OpenAI(
base_url="https://PROXY_URL/v1",
api_key="YOUR_API_KEY",
)
stream = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello, world!"}],
stream=True,
)
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
SSE format
Chat Completions streaming uses data-only SSE. Each event is a single data: line containing a JSON object, and the stream ends with a data: [DONE] sentinel:
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"}}]}
data: [DONE]
To receive token usage in the stream, add "stream_options": {"include_usage": true} to the request. Usage then appears in the usage field of the final chunk before [DONE]:
{"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}
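The data-only format above can be consumed without an SDK. The following is a minimal sketch, not the gateway's or the SDK's own parser: it assumes each event arrives as one data: line, and that the usage chunk carries an empty choices list, as it does in OpenAI's API. The function name read_chat_stream and the sample lines are illustrative only.

```python
import json


def read_chat_stream(lines):
    """Collect streamed text and optional usage from Chat Completions SSE lines."""
    text_parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # sentinel: the stream is finished
        chunk = json.loads(payload)
        # Assumption: the usage chunk may have an empty "choices" list.
        for choice in chunk.get("choices") or []:
            content = (choice.get("delta") or {}).get("content")
            if content:
                text_parts.append(content)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage


# Hypothetical sample, abbreviated in the same way as the events shown above.
sample = [
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":" world"}}]}',
    "",
    'data: {"choices":[],"usage":{"prompt_tokens":10,"completion_tokens":5,"total_tokens":15}}',
    "",
    "data: [DONE]",
]
text, usage = read_chat_stream(sample)
print(text)
print(usage)
```

When using the OpenAI SDK loop shown earlier together with include_usage, the same guard applies: check that chunk.choices is non-empty before indexing into it.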
Responses API (/v1/responses)
The newer OpenAI Responses API provides a simplified interface with semantic streaming events.
Non-streaming
curl https://PROXY_URL/v1/responses \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"input": "Hello, world!"
}'
from openai import OpenAI
client = OpenAI(
base_url="https://PROXY_URL/v1",
api_key="YOUR_API_KEY",
)
response = client.responses.create(
model="gpt-4o",
input="Hello, world!",
)
print(response.output_text)
Streaming
curl https://PROXY_URL/v1/responses \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"input": "Hello, world!",
"stream": true
}'
from openai import OpenAI
client = OpenAI(
base_url="https://PROXY_URL/v1",
api_key="YOUR_API_KEY",
)
stream = client.responses.create(
model="gpt-4o",
input="Hello, world!",
stream=True,
)
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
SSE format
Responses API streaming uses semantic event: plus data: lines. Each event has a named type describing what happened:
event: response.created
data: {"id":"resp_...","object":"response","status":"in_progress"}
event: response.output_item.added
data: {"item":{"id":"msg_...","type":"message","role":"assistant"}}
event: response.output_text.delta
data: {"delta":"Hello"}
event: response.output_text.delta
data: {"delta":" world"}
event: response.output_text.done
data: {"text":"Hello world"}
event: response.completed
data: {"id":"resp_...","status":"completed","usage":{"input_tokens":10,"output_tokens":5}}
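To make the event: plus data: pairing concrete, here is a small sketch of a reader for this format. It is a simplification under stated assumptions: each event has exactly one data: line, and the payloads follow the abbreviated shapes shown above (real payloads carry more fields, and OpenAI's own API nests usage under a response object, which the sketch also checks). The names iter_sse_events and collect_response_text are hypothetical.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, data) pairs from event:/data: SSE lines."""
    event_name = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            yield event_name, json.loads(line[len("data:"):].strip())
            event_name = None  # the next event must name itself again


def collect_response_text(lines):
    """Join output_text deltas and pick up usage from the completed event."""
    parts = []
    usage = None
    for name, data in iter_sse_events(lines):
        if name == "response.output_text.delta":
            parts.append(data["delta"])
        elif name == "response.completed":
            # Accept top-level usage (as in the sample) or nested usage.
            usage = data.get("usage") or data.get("response", {}).get("usage")
    return "".join(parts), usage


sample = [
    "event: response.created",
    'data: {"id":"resp_...","object":"response","status":"in_progress"}',
    "",
    "event: response.output_text.delta",
    'data: {"delta":"Hello"}',
    "",
    "event: response.output_text.delta",
    'data: {"delta":" world"}',
    "",
    "event: response.completed",
    'data: {"id":"resp_...","status":"completed","usage":{"input_tokens":10,"output_tokens":5}}',
]
text, usage = collect_response_text(sample)
print(text)
print(usage)
```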
Differences from Chat Completions
| Aspect | Chat Completions | Responses API |
|---|---|---|
| Input field | messages array | input (string or array) |
| Usage fields | prompt_tokens / completion_tokens | input_tokens / output_tokens |
| SSE format | Data-only (data: {...}) with data: [DONE] sentinel | Semantic events (event: response.created, etc.) |
| Stream usage | Opt-in via stream_options.include_usage | Always in response.completed event |
| Response access | response.choices[0].message.content | response.output_text |
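Code that records token counts from both formats has to reconcile the two naming schemes in the table. A minimal sketch of one way to do that follows; the helper name normalize_usage and the output keys are illustrative choices, not part of either API.

```python
def normalize_usage(usage):
    """Map either usage shape to a common input/output/total dictionary."""
    if "prompt_tokens" in usage:
        # Chat Completions naming
        inp, out = usage["prompt_tokens"], usage["completion_tokens"]
    else:
        # Responses API naming
        inp, out = usage["input_tokens"], usage["output_tokens"]
    # Fall back to the sum when the payload carries no total_tokens field.
    return {"input": inp, "output": out, "total": usage.get("total_tokens", inp + out)}


chat_usage = {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}
responses_usage = {"input_tokens": 10, "output_tokens": 5}
print(normalize_usage(chat_usage))
print(normalize_usage(responses_usage))
```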
Anthropic Messages API (/v1/messages)
Use this format for applications built with the Anthropic SDK. The gateway accepts standard Authorization: Bearer headers; the Anthropic-native x-api-key header is not required.
Non-streaming
curl https://PROXY_URL/v1/messages \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"messages": [{"role": "user", "content": "Hello, world!"}]
}'
from anthropic import Anthropic
client = Anthropic(
    base_url="https://PROXY_URL",  # no /v1 here: the Anthropic SDK appends /v1/messages itself
    auth_token="YOUR_API_KEY",  # sent as Authorization: Bearer
)
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=[{"role": "user", "content": "Hello, world!"}],
)
print(response.content[0].text)
Streaming
curl https://PROXY_URL/v1/messages \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"messages": [{"role": "user", "content": "Hello, world!"}],
"stream": true
}'
from anthropic import Anthropic
client = Anthropic(
    base_url="https://PROXY_URL",  # no /v1 here: the Anthropic SDK appends /v1/messages itself
    auth_token="YOUR_API_KEY",  # sent as Authorization: Bearer
)
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, world!"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
SSE format
Anthropic Messages streaming uses semantic event: plus data: lines with block-level granularity:
event: message_start
data: {"type":"message_start","message":{"id":"msg_...","role":"assistant","model":"claude-sonnet-4-20250514"}}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" world"}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":5}}
event: message_stop
data: {"type":"message_stop"}
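Block-level granularity means text deltas are addressed to a content block by index, and the stop reason arrives separately in message_delta. The sketch below shows one way to reassemble a message from these events. It handles only text blocks, assumes one data: line per event, and relies on the type field inside each payload rather than the event: line. The name collect_message is hypothetical.

```python
import json


def collect_message(lines):
    """Rebuild text, stop reason and output token count from Messages SSE lines."""
    blocks = {}  # content block index -> list of text fragments
    stop_reason = None
    output_tokens = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # the event name is repeated in the payload's "type" field
        event = json.loads(line[len("data:"):].strip())
        kind = event.get("type")
        if kind == "content_block_start":
            blocks[event["index"]] = [event["content_block"].get("text", "")]
        elif kind == "content_block_delta":
            delta = event["delta"]
            if delta.get("type") == "text_delta":
                blocks.setdefault(event["index"], []).append(delta["text"])
        elif kind == "message_delta":
            stop_reason = event["delta"].get("stop_reason")
            output_tokens = event.get("usage", {}).get("output_tokens")
        elif kind == "message_stop":
            break
    # Concatenate blocks in index order.
    text = "".join("".join(blocks[i]) for i in sorted(blocks))
    return text, stop_reason, output_tokens


sample = [
    "event: content_block_start",
    'data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}',
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}',
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" world"}}',
    "event: content_block_stop",
    'data: {"type":"content_block_stop","index":0}',
    "event: message_delta",
    'data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":5}}',
    "event: message_stop",
    'data: {"type":"message_stop"}',
]
result = collect_message(sample)
print(result)
```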
When using the gateway, authenticate with Authorization: Bearer YOUR_API_KEY instead of the Anthropic-native x-api-key header. The gateway translates the auth header before forwarding to the provider.
Supported endpoint types
The gateway handles eight endpoint types. All use the same model-based routing logic; fallback policies and traffic splitting apply equally across every endpoint.
| Endpoint | Path | Description |
|---|---|---|
| Chat Completions | /v1/chat/completions | Standard chat interface (OpenAI-compatible) |
| Completions | /v1/completions | Legacy text completions |
| Responses | /v1/responses | OpenAI Responses API |
| Messages | /v1/messages | Anthropic Messages API |
| Embeddings | /v1/embeddings | Text embeddings |
| Images | /v1/images | Image generation |
| Rerank | /v1/rerank | Reranking |
| Models | /v1/models | List available models |
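Because every endpoint in the table shares the same host and bearer-token authentication, requests to the non-chat endpoints can be built the same way. The sketch below only constructs the requests with the standard library and does not send them. The embeddings body follows OpenAI's conventions, and the model name text-embedding-3-small is an assumption; check which models your gateway exposes via /v1/models. The helper build_request is illustrative.

```python
import json
import urllib.request

BASE_URL = "https://PROXY_URL"  # placeholder host, as in the examples above
API_KEY = "YOUR_API_KEY"  # placeholder key


def build_request(path, payload=None):
    """Build a GET (no payload) or JSON POST request for a gateway path."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        url=BASE_URL + path,
        data=data,
        method="POST" if payload is not None else "GET",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )


# Assumed request body shape (OpenAI-style embeddings) and assumed model name.
embeddings = build_request(
    "/v1/embeddings",
    {"model": "text-embedding-3-small", "input": "Hello, world!"},
)
models = build_request("/v1/models")
print(embeddings.get_method(), embeddings.full_url)
print(models.get_method(), models.full_url)
```

Passing either object to urllib.request.urlopen would send it once the placeholders are replaced with real values.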
Provider translation
The gateway automatically translates between the canonical (OpenAI-compatible) schema and 25+ provider-specific APIs. Applications send requests in one format, and the gateway handles all conversions transparently.
Translated elements:
- Request body: field names, structure, and defaults adjusted per provider
- Path: endpoint paths mapped to provider conventions
- Headers: authentication and provider-specific headers set automatically
- Response format: provider responses normalised back to the format that was requested
For example, an OpenAI Chat Completions request that routes to Anthropic Claude is translated to the Anthropic Messages format before being forwarded, and the response is translated back to Chat Completions format. The application never sees the difference.
No configuration is needed; translation is built into the gateway. For how errors are normalised across providers, see Gateway Behavior.
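To give a feel for what request-body translation involves, here is a deliberately simplified sketch of converting a Chat Completions request into the Anthropic Messages shape. It is not the gateway's implementation. It covers only string content, lifts system messages into the top-level system field that the Messages API uses, and supplies max_tokens because the Messages API requires it. The function name chat_to_messages and the default of 1024 are assumptions for illustration.

```python
def chat_to_messages(request, default_max_tokens=1024):
    """Convert a Chat Completions request dict to an Anthropic Messages request dict."""
    system_parts = []
    messages = []
    for msg in request["messages"]:
        if msg["role"] == "system":
            # The Messages API takes system text as a top-level field.
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})
    translated = {
        "model": request["model"],
        # max_tokens is required by the Messages API; the default is an assumption.
        "max_tokens": request.get("max_tokens", default_max_tokens),
        "messages": messages,
    }
    if system_parts:
        translated["system"] = "\n".join(system_parts)
    return translated


chat_request = {
    "model": "claude-sonnet-4-20250514",
    "messages": [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hello, world!"},
    ],
}
translated = chat_to_messages(chat_request)
print(translated)
```

A real translation layer also has to handle tool calls, multimodal content, stop sequences, and the reverse mapping of the response, none of which this sketch attempts.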
Protocols
The gateway supports REST (HTTPS) for all inference traffic. This is the only protocol needed to use any endpoint.
gRPC is used internally for OpenTelemetry (OTLP) telemetry export. See OpenTelemetry Export for the configuration surface. WebSocket and gRPC for inference are not currently supported.
Choosing an API format
| Use case | Recommended format |
|---|---|
| Widest SDK and tool compatibility | Chat Completions |
| New OpenAI projects with the simplified interface | Responses API |
| Anthropic Claude-native applications | Anthropic Messages |
| Agent frameworks (LangChain, CrewAI, and similar) | Chat Completions |
| Code assistants (Cursor, Cline, Aider) | Chat Completions |
| Streaming with semantic events | Responses API or Anthropic Messages |
All three formats support the same gateway features. The choice is driven by SDK preference and provider ecosystem alignment.
Related
- Integrate the Gateway with an App: developer-side integration patterns for each SDK
- Gateway Behavior: request handling, error semantics, and routing resolution rules
Where to go next