
Supported APIs

Agent Router supports three API formats: the OpenAI Chat Completions API, the OpenAI Responses API, and the Anthropic Messages API. All gateway features (routing, fallback policies, traffic splitting, cost tracking, and observability) apply equally regardless of the format chosen. Applications send requests in one format, and the gateway handles provider translation transparently, normalising responses and errors back to the format that was requested. For new projects with no existing SDK preference, Chat Completions offers the widest ecosystem compatibility.


Endpoint summary

| Format | Path | SDK method | Streaming |
| --- | --- | --- | --- |
| OpenAI Chat Completions | /v1/chat/completions | client.chat.completions.create() | stream=True |
| OpenAI Responses | /v1/responses | client.responses.create() | stream=True |
| Anthropic Messages | /v1/messages | client.messages.create() | stream=True |

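In the examples below, replace PROXY_URL with the host of the proxy endpoint shown on the Console Dashboard (for example, proxy.poc.tetrate.ai if the endpoint is https://proxy.poc.tetrate.ai/v1) and YOUR_API_KEY with a key from API Keys. The examples already include the https:// scheme and the /v1 path prefix, so do not repeat them in the substituted value.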


Chat Completions API (/v1/chat/completions)

The most widely supported format, compatible with OpenAI and most third-party SDKs.

Non-streaming

curl https://PROXY_URL/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }'

from openai import OpenAI

# Point the OpenAI SDK at the gateway instead of api.openai.com
client = OpenAI(
    base_url="https://PROXY_URL/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello, world!"}],
)

print(response.choices[0].message.content)

Streaming

curl https://PROXY_URL/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello, world!"}],
    "stream": true
  }'

from openai import OpenAI

client = OpenAI(
    base_url="https://PROXY_URL/v1",
    api_key="YOUR_API_KEY",
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello, world!"}],
    stream=True,
)

for chunk in stream:
    # A chunk can arrive with an empty choices list (for example, a usage-only chunk)
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

SSE format

Chat Completions streaming uses data-only SSE. Each event is a data: line containing a JSON object, terminated by data: [DONE]:

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"}}]}

data: [DONE]

To receive token usage in the stream, add "stream_options": {"include_usage": true} to the request. Usage appears in the final chunk before [DONE]:

{"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}
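For clients that read the stream without an SDK, the data-only format described above can be consumed with a small line parser. The following is a minimal sketch, not part of the gateway or any SDK: the function name collect_chat_stream is hypothetical, and it assumes the usage chunk arrives with an empty choices list, as in OpenAI's own API.

```python
import json


def collect_chat_stream(lines):
    """Accumulate text and usage from data-only SSE lines (hypothetical helper)."""
    text_parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip the blank separator lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # sentinel: the stream is finished
        chunk = json.loads(payload)
        # Assumption: with include_usage, one late chunk carries the usage object
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                text_parts.append(content)
    return "".join(text_parts), usage


sample = [
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":" world"}}]}',
    "",
    'data: {"choices":[],"usage":{"prompt_tokens":10,"completion_tokens":5,"total_tokens":15}}',
    "",
    "data: [DONE]",
]
text, usage = collect_chat_stream(sample)
print(text, usage)
```

In a real client the lines would come from the HTTP response body rather than a list; the parsing logic stays the same.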

Responses API (/v1/responses)

The newer OpenAI Responses API provides a simplified interface with semantic streaming events.

Non-streaming

curl https://PROXY_URL/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "Hello, world!"
  }'

from openai import OpenAI

client = OpenAI(
    base_url="https://PROXY_URL/v1",
    api_key="YOUR_API_KEY",
)

response = client.responses.create(
    model="gpt-4o",
    input="Hello, world!",
)

# output_text joins all text output items into a single string
print(response.output_text)

Streaming

curl https://PROXY_URL/v1/responses \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "Hello, world!",
    "stream": true
  }'

from openai import OpenAI

client = OpenAI(
    base_url="https://PROXY_URL/v1",
    api_key="YOUR_API_KEY",
)

stream = client.responses.create(
    model="gpt-4o",
    input="Hello, world!",
    stream=True,
)

for event in stream:
    # Only text delta events carry incremental output text
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

SSE format

Responses API streaming uses semantic event: plus data: lines. Each event has a named type describing what happened:

event: response.created
data: {"id":"resp_...","object":"response","status":"in_progress"}

event: response.output_item.added
data: {"item":{"id":"msg_...","type":"message","role":"assistant"}}

event: response.output_text.delta
data: {"delta":"Hello"}

event: response.output_text.delta
data: {"delta":" world"}

event: response.output_text.done
data: {"text":"Hello world"}

event: response.completed
data: {"id":"resp_...","status":"completed","usage":{"input_tokens":10,"output_tokens":5}}
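The event: plus data: framing shown above can be parsed without an SDK by pairing each event name with the data line that follows it. This is a minimal sketch under stated assumptions: the helper names parse_sse_events and collect_response_stream are hypothetical, and usage is looked up both at the top level of the response.completed payload (as in the simplified sample above) and under a nested response object, since real payloads may differ.

```python
import json


def parse_sse_events(lines):
    """Yield (event_name, data_dict) pairs from event:/data: SSE lines."""
    event_name, data_lines = None, []
    for raw in list(lines) + [""]:  # a trailing blank line flushes the last event
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:
            yield event_name, json.loads("\n".join(data_lines))
            event_name, data_lines = None, []


def collect_response_stream(lines):
    """Join text deltas and pick up usage from the completed event."""
    text, usage = [], None
    for name, data in parse_sse_events(lines):
        if name == "response.output_text.delta":
            text.append(data["delta"])
        elif name == "response.completed":
            # Assumption: usage may sit at the top level or under "response"
            usage = data.get("usage") or data.get("response", {}).get("usage")
    return "".join(text), usage


sample = [
    "event: response.output_text.delta",
    'data: {"delta":"Hello"}',
    "",
    "event: response.output_text.delta",
    'data: {"delta":" world"}',
    "",
    "event: response.completed",
    'data: {"id":"resp_1","status":"completed","usage":{"input_tokens":10,"output_tokens":5}}',
]
text, usage = collect_response_stream(sample)
print(text, usage)
```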

Differences from Chat Completions

| Aspect | Chat Completions | Responses API |
| --- | --- | --- |
| Input field | messages array | input (string or array) |
| Usage fields | prompt_tokens / completion_tokens | input_tokens / output_tokens |
| SSE format | Data-only (data: {...}) with data: [DONE] sentinel | Semantic events (event: response.created, etc.) |
| Stream usage | Opt-in via stream_options.include_usage | Always in response.completed event |
| Response access | response.choices[0].message.content | response.output_text |
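Code that records token counts from both formats has to reconcile the two sets of usage field names. One possible approach is sketched below; normalise_usage and its output keys are hypothetical names chosen for illustration, not part of the gateway or either SDK.

```python
def normalise_usage(usage):
    """Map either usage shape to a common input/output/total dict."""
    if "prompt_tokens" in usage:  # Chat Completions naming
        inp, out = usage["prompt_tokens"], usage["completion_tokens"]
    else:  # Responses API naming
        inp, out = usage["input_tokens"], usage["output_tokens"]
    # Fall back to the sum when total_tokens is absent
    return {"input": inp, "output": out, "total": usage.get("total_tokens", inp + out)}


chat_usage = {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}
responses_usage = {"input_tokens": 10, "output_tokens": 5}
print(normalise_usage(chat_usage))
print(normalise_usage(responses_usage))
```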

Anthropic Messages API (/v1/messages)

For applications built with the Anthropic SDK. The gateway accepts standard Authorization: Bearer headers; the Anthropic-native x-api-key header is not required.

Non-streaming

curl https://PROXY_URL/v1/messages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }'

from anthropic import Anthropic

client = Anthropic(
    # The Anthropic SDK appends /v1/messages to base_url, so /v1 is omitted here
    base_url="https://PROXY_URL",
    # auth_token is sent as an Authorization: Bearer header
    auth_token="YOUR_API_KEY",
)

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, world!"}],
)

print(response.content[0].text)

Streaming

curl https://PROXY_URL/v1/messages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, world!"}],
    "stream": true
  }'

from anthropic import Anthropic

client = Anthropic(
    # The Anthropic SDK appends /v1/messages to base_url, so /v1 is omitted here
    base_url="https://PROXY_URL",
    auth_token="YOUR_API_KEY",
)

# messages.stream() is the SDK's streaming helper; it sets stream=true on the request
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, world!"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

SSE format

Anthropic Messages streaming uses semantic event: plus data: lines with block-level granularity:

event: message_start
data: {"type":"message_start","message":{"id":"msg_...","role":"assistant","model":"claude-sonnet-4-20250514"}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" world"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":5}}

event: message_stop
data: {"type":"message_stop"}
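
Because each Anthropic event repeats its type inside the data: payload, a raw stream can be reassembled from the data: lines alone. The sketch below groups text deltas by content block index and picks up the stop reason and output token count; collect_anthropic_stream is a hypothetical helper written for this page, and it handles only text blocks.

```python
import json


def collect_anthropic_stream(lines):
    """Rebuild text and stop metadata from Anthropic-style SSE lines."""
    blocks = {}  # content block index -> list of text fragments
    stop_reason, output_tokens = None, None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # the event: line duplicates the "type" field in the payload
        event = json.loads(line[len("data:"):])
        kind = event.get("type")
        if kind == "content_block_start":
            blocks[event["index"]] = [event["content_block"].get("text", "")]
        elif kind == "content_block_delta" and event["delta"].get("type") == "text_delta":
            blocks.setdefault(event["index"], []).append(event["delta"]["text"])
        elif kind == "message_delta":
            stop_reason = event["delta"].get("stop_reason")
            output_tokens = event.get("usage", {}).get("output_tokens")
        elif kind == "message_stop":
            break
    # Concatenate blocks in index order
    text = "".join("".join(parts) for _, parts in sorted(blocks.items()))
    return text, stop_reason, output_tokens


sample = [
    "event: content_block_start",
    'data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}',
    "",
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}',
    "",
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" world"}}',
    "",
    "event: message_delta",
    'data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":5}}',
    "",
    "event: message_stop",
    'data: {"type":"message_stop"}',
]
result = collect_anthropic_stream(sample)
print(result)
```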

Note: When using the gateway, authenticate with Authorization: Bearer YOUR_API_KEY instead of the Anthropic-native x-api-key header. The gateway translates the auth header before forwarding to the provider.


Supported endpoint types

The gateway handles eight endpoint types. All use the same model-based routing logic; fallback policies and traffic splitting apply equally across every endpoint.

| Endpoint | Path | Description |
| --- | --- | --- |
| Chat Completions | /v1/chat/completions | Standard chat interface (OpenAI-compatible) |
| Completions | /v1/completions | Legacy text completions |
| Responses | /v1/responses | OpenAI Responses API |
| Messages | /v1/messages | Anthropic Messages API |
| Embeddings | /v1/embeddings | Text embeddings |
| Images | /v1/images | Image generation |
| Rerank | /v1/rerank | Reranking |
| Models | /v1/models | List available models |
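
When calling these endpoints without an SDK, it can help to build the full URL from the proxy endpoint shown on the Console Dashboard. The sketch below is illustrative only: ENDPOINT_PATHS and endpoint_url are hypothetical names, and the helper accepts the dashboard value with or without a scheme or a trailing /v1 so that the prefix is never doubled.

```python
# Paths copied from the endpoint table above
ENDPOINT_PATHS = {
    "chat_completions": "/v1/chat/completions",
    "completions": "/v1/completions",
    "responses": "/v1/responses",
    "messages": "/v1/messages",
    "embeddings": "/v1/embeddings",
    "images": "/v1/images",
    "rerank": "/v1/rerank",
    "models": "/v1/models",
}


def endpoint_url(proxy_endpoint, name):
    """Build a full endpoint URL from the dashboard value (hypothetical helper)."""
    base = proxy_endpoint.strip().rstrip("/")
    if "://" not in base:
        base = "https://" + base  # inference traffic is REST over HTTPS
    if base.endswith("/v1"):
        base = base[: -len("/v1")]  # avoid /v1/v1/... in the result
    return base + ENDPOINT_PATHS[name]


print(endpoint_url("https://proxy.poc.tetrate.ai/v1", "embeddings"))
print(endpoint_url("proxy.poc.tetrate.ai", "models"))
```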

Provider translation

The gateway automatically translates between the canonical (OpenAI-compatible) schema and 25+ provider-specific APIs. Applications send requests in one format, and the gateway handles all conversions transparently.

Translated elements:

  • Request body: field names, structure, and defaults adjusted per provider
  • Path: endpoint paths mapped to provider conventions
  • Headers: authentication and provider-specific headers set automatically
  • Response format: provider responses normalised back to the format that was requested

For example, an OpenAI Chat Completions request that routes to Anthropic Claude is translated to the Anthropic Messages format before being forwarded, and the response is translated back to Chat Completions format. The application never sees the difference.

No configuration is needed; translation is built into the gateway. For how errors are normalised across providers, see Gateway Behavior.
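
To make the Chat Completions to Anthropic Messages example concrete, the sketch below shows roughly what such a translation involves for a plain text conversation. It is not the gateway's implementation. The function names, the default max_tokens value, the stop reason mapping, and the pass-through of the model name are all assumptions made for illustration; tools, images, and streaming are ignored.

```python
# Assumed mapping from Anthropic stop reasons to Chat Completions finish reasons
FINISH_REASONS = {"end_turn": "stop", "stop_sequence": "stop", "max_tokens": "length"}


def chat_to_anthropic(request, default_max_tokens=1024):
    """Convert a Chat Completions request body into an Anthropic Messages body."""
    system_parts, messages = [], []
    for msg in request["messages"]:
        if msg["role"] == "system":
            # Anthropic takes system text as a top-level field, not a message
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})
    body = {
        "model": request["model"],
        "messages": messages,
        # max_tokens is required by Anthropic; this default is an assumption
        "max_tokens": request.get("max_tokens", default_max_tokens),
    }
    if system_parts:
        body["system"] = "\n\n".join(system_parts)
    return body


def anthropic_to_chat(response):
    """Convert an Anthropic Messages response into a Chat Completions-shaped dict."""
    text = "".join(b["text"] for b in response["content"] if b["type"] == "text")
    usage = response.get("usage", {})
    inp, out = usage.get("input_tokens", 0), usage.get("output_tokens", 0)
    reason = response.get("stop_reason")
    return {
        "object": "chat.completion",
        "model": response.get("model"),
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": FINISH_REASONS.get(reason, reason),
        }],
        "usage": {"prompt_tokens": inp, "completion_tokens": out, "total_tokens": inp + out},
    }


req = {
    "model": "claude-sonnet-4-20250514",
    "messages": [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hello, world!"},
    ],
}
body = chat_to_anthropic(req)
resp = {
    "model": "claude-sonnet-4-20250514",
    "content": [{"type": "text", "text": "Hi"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 10, "output_tokens": 5},
}
chat = anthropic_to_chat(resp)
print(body)
print(chat["choices"][0]["message"]["content"], chat["usage"])
```

The two directions mirror the request body and response format items in the list of translated elements; path and header translation are left out of the sketch.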


Protocols

The gateway supports REST (HTTPS) for all inference traffic. This is the only protocol needed to use any endpoint.

gRPC is used internally for OpenTelemetry (OTLP) telemetry export. See OpenTelemetry Export for the configuration surface. WebSocket and gRPC for inference are not currently supported.


Choosing an API format

| Use case | Recommended format |
| --- | --- |
| Widest SDK and tool compatibility | Chat Completions |
| New OpenAI projects with the simplified interface | Responses API |
| Anthropic Claude-native applications | Anthropic Messages |
| Agent frameworks (LangChain, CrewAI, and similar) | Chat Completions |
| Code assistants (Cursor, Cline, Aider) | Chat Completions |
| Streaming with semantic events | Responses API or Anthropic Messages |

All three formats support the same gateway features. The choice is driven by SDK preference and provider ecosystem alignment.