Platform capabilities
Agent Router passes through AI capabilities (structured outputs, multimodal inputs, and function calling) to any provider that supports them. Requests are forwarded as-is when the input format already matches the target provider's native schema, and translated automatically when cross-provider mappings exist.
Capabilities covered on this page
- Structured outputs: Constrain model responses to a JSON schema. Supported in passthrough and (where available) translated modes across providers.
- Multimodal: Image generation and vision inputs. Formats are translated automatically when routing across provider boundaries.
- Function calling: Tool definitions and tool call responses are forwarded unchanged. Full examples for OpenAI and Anthropic formats.
Structured outputs
Structured outputs constrain model responses to a specific JSON schema, guaranteeing that the model's reply conforms to a defined data shape. This eliminates the need to parse free-form text, reduces the risk of malformed responses in production, and makes AI outputs directly usable in downstream systems without an intermediate validation step.
Agent Router supports structured outputs in two modes:
- Passthrough: the request is forwarded to the provider unchanged, because the request format already matches the provider's native structured output schema
- Translated: Agent Router adapts the request to the target provider's structured output format when routing across provider boundaries
| Input Format | Backend | Mode | Status |
|---|---|---|---|
| OpenAI /v1/chat/completions | OpenAI | Passthrough | Supported |
| OpenAI /v1/chat/completions | GCP Anthropic (Vertex) | Translated | Coming in next EAG release |
| Anthropic /v1/messages | Anthropic | Passthrough | Supported |
| Anthropic /v1/messages | GCP Anthropic (Vertex) | Passthrough | Supported |
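The support matrix can also be read as a lookup keyed by input format and backend. The sketch below is illustrative only: it hard-codes the four rows from the table, and the names `SUPPORT_MATRIX` and `structured_output_mode`, along with the short string labels used as keys, are hypothetical and not part of Agent Router.

```python
from typing import Optional

# Rows copied from the table: (input format, backend) -> (mode, status).
# The short labels are hypothetical identifiers chosen for this sketch.
SUPPORT_MATRIX = {
    ("openai", "openai"): ("passthrough", "supported"),
    ("openai", "gcp-anthropic"): ("translated", "planned"),
    ("anthropic", "anthropic"): ("passthrough", "supported"),
    ("anthropic", "gcp-anthropic"): ("passthrough", "supported"),
}


def structured_output_mode(input_format: str, backend: str) -> Optional[str]:
    """Return the usable mode for a pairing, or None if it is not available yet."""
    entry = SUPPORT_MATRIX.get((input_format, backend))
    if entry is None:
        return None  # Pairing is not listed in the table.
    mode, status = entry
    return mode if status == "supported" else None
```

A client could call `structured_output_mode("openai", "gcp-anthropic")` before sending a request and fall back to the provider's native format when the result is `None`.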
Example: OpenAI structured output
The following request asks the model to return a list of planets as a JSON object conforming to a strict schema. Setting "strict": true enables guaranteed schema adherence: the model will not return fields outside the defined schema.
curl https://PROXY_URL/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "List 3 planets"}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "planets",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": {
            "planets": {
              "type": "array",
              "items": { "type": "string" }
            }
          },
          "required": ["planets"],
          "additionalProperties": false
        }
      }
    }
  }'
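For applications that call the gateway from code rather than the shell, the same request might look like the following Python sketch. It uses only the standard library; `build_request` and `parse_planets` are hypothetical helper names, and the response handling assumes the usual Chat Completions shape in which the structured output arrives as a JSON string in the first choice's message content.

```python
import json
import urllib.request

PLANETS_SCHEMA = {
    "type": "object",
    "properties": {"planets": {"type": "array", "items": {"type": "string"}}},
    "required": ["planets"],
    "additionalProperties": False,
}


def build_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build the same POST as the curl example, without sending it."""
    body = {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "List 3 planets"}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "planets", "strict": True, "schema": PLANETS_SCHEMA},
        },
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def parse_planets(response_body: dict) -> list:
    """Decode the JSON string carried in the first choice's message content."""
    content = response_body["choices"][0]["message"]["content"]
    return json.loads(content)["planets"]
```

To send the request, pass the result of `build_request` to `urllib.request.urlopen`, load the response body with `json.load`, and hand the resulting dict to `parse_planets`.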
When using the Anthropic Messages API (/v1/messages), use the output_config.format field for structured output configuration. The older output_format field is deprecated.
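As a rough illustration, an Anthropic-format request body could be assembled as below. This is a sketch under assumptions: the text above only names the `output_config.format` field, so the nested shape used here (a `type` of `json_schema` plus a `schema` object) should be checked against the current Anthropic Messages API reference. The function name `anthropic_structured_body` is hypothetical.

```python
def anthropic_structured_body(schema: dict, prompt: str) -> dict:
    """Build a /v1/messages body that requests schema-constrained output."""
    return {
        "model": "claude-sonnet-4-5",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
        # Assumed shape: the page only names the output_config.format field.
        "output_config": {"format": {"type": "json_schema", "schema": schema}},
    }
```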
Cross-provider structured output translation (e.g., sending an OpenAI-format structured output request that the gateway routes to an Anthropic backend) is being added in the next AI gateway release. Track progress in envoyproxy/ai-gateway#1846. Until this lands, structured outputs work in passthrough mode only; the request format must match the target provider's native schema.
Multimodal
Agent Router supports multimodal AI capabilities, including image generation and vision (image understanding). Payloads are forwarded as-is when the request already matches the backend's native format, and vision requests are translated automatically when routing across providers. For example, an OpenAI-format vision request can be served by an Anthropic Claude backend without any changes to the client application.
Image generation
Use the /v1/images/generations endpoint to generate images from text prompts. Requests are routed to the configured image generation provider and model (e.g., OpenAI's gpt-image-1). The response contains the generated image as base64-encoded data.
# Generate one image and decode the base64 payload into output.png
curl -s -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TARE_API_KEY" \
  -d '{
    "model": "gpt-image-1",
    "prompt": "a serene mountain landscape at sunrise in watercolor",
    "size": "1024x1024",
    "n": 1
  }' \
  -X POST "$TARE_BASE_URL/v1/images/generations" | jq -r '.data[0].b64_json' | base64 -d > output.png
The example above decodes the base64 payload and saves it to output.png.
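The decoding step performed by `jq` and `base64 -d` can be sketched in Python as follows. The helper name `decode_first_image` is hypothetical; the only assumption about the response is the `data[0].b64_json` path already used in the shell pipeline.

```python
import base64
import json


def decode_first_image(response_text: str) -> bytes:
    """Return the raw image bytes stored at data[0].b64_json in a response body."""
    payload = json.loads(response_text)
    return base64.b64decode(payload["data"][0]["b64_json"])
```

Writing the returned bytes to a file opened in binary mode (`open("output.png", "wb")`) reproduces the result of the shell pipeline.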

Vision (image understanding)
To send an image to a model for analysis, include it in the messages array using the OpenAI content parts format: a content array that combines text items and image items in a single message. This works with any vision-capable model regardless of provider.
Images can be supplied as inline base64 data URIs (for local files) or as publicly accessible URLs.
Inline base64 (local file):
# Encode on a single line: some base64 implementations wrap their output, and a raw newline would break the JSON string
IMAGE_BASE64=$(base64 < image.png | tr -d '\n')
curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TARE_API_KEY" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "What is in this image?" },
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/png;base64,'"$IMAGE_BASE64"'"
            }
          }
        ]
      }
    ]
  }' \
  -X POST "$TARE_BASE_URL/v1/chat/completions"
URL reference (publicly accessible image):
curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TARE_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in detail." },
          {
            "type": "image_url",
            "image_url": { "url": "https://example.com/image.png" }
          }
        ]
      }
    ]
  }' \
  -X POST "$TARE_BASE_URL/v1/chat/completions"
Agent Router translates the content parts format automatically when routing across providers; an OpenAI-format vision request can be routed to an Anthropic Claude backend transparently. For a full list of models that support vision inputs, see the Model Catalog.
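When the request is assembled in application code, the content parts for the inline base64 case might be built as in the sketch below. The helper names `image_part_from_bytes` and `vision_message` are hypothetical; the dictionaries mirror the OpenAI content parts format shown in the curl examples.

```python
import base64


def image_part_from_bytes(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an image_url content part with a base64 data URI."""
    # b64encode emits no line breaks, so the value is safe inside a JSON string.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}}


def vision_message(prompt: str, image_part: dict) -> dict:
    """Combine a text part and an image part into a single user message."""
    return {"role": "user", "content": [{"type": "text", "text": prompt}, image_part]}
```

For a publicly accessible image, the image part is simply `{"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}` and can be passed to `vision_message` in the same way.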
Function calling
Function calling (also called tool use) allows the model to request that your application execute a function and return the result. This enables the model to take actions and retrieve information it cannot access on its own, such as querying a database, calling an API, or reading a file. Agent Router forwards tool definitions and tool call responses unchanged to providers that support native function calling.
The interaction follows a two-round pattern:
- Round 1: Your application sends a request with a tools array defining the available functions. If the model decides it needs to call a tool to answer the question, it returns a response with finish_reason: "tool_calls" (OpenAI) or stop_reason: "tool_use" (Anthropic) instead of a final answer.
- Round 2: Your application executes the function, then sends a follow-up request containing the original messages, the model's tool call, and the function result. The model uses this context to generate its final response.
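The decision between the two rounds comes down to one field in the response. A minimal sketch of that check, covering both formats, is shown here; `wants_tool_call` is a hypothetical helper name, and telling the formats apart by the presence of a `choices` key is a simplification made for this example.

```python
def wants_tool_call(response: dict) -> bool:
    """Return True when a response asks the application to run a tool.

    Handles OpenAI Chat Completions responses (finish_reason on the first
    choice) and Anthropic Messages responses (top-level stop_reason).
    """
    if "choices" in response:  # OpenAI-format response
        return response["choices"][0].get("finish_reason") == "tool_calls"
    return response.get("stop_reason") == "tool_use"  # Anthropic-format response
```

When the function returns `False`, the response already contains the final answer and no second request is needed.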
OpenAI format
Use the tools array and optional tool_choice parameter with the Chat Completions API.
Round 1, request with tool definition:
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "What's the weather in NYC?"}],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": { "location": { "type": "string" } },
          "required": ["location"]
        }
      }
    }
  ]
}
Round 1, model response requesting a tool call:
{
  "choices": [{
    "finish_reason": "tool_calls",
    "message": {
      "role": "assistant",
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\": \"NYC\"}"
        }
      }]
    }
  }]
}
Round 2, follow-up request with the function result:
{
  "model": "gpt-4o",
  "messages": [
    {"role": "user", "content": "What's the weather in NYC?"},
    {
      "role": "assistant",
      "tool_calls": [{"id": "call_abc123", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\": \"NYC\"}"}}]
    },
    {
      "role": "tool",
      "tool_call_id": "call_abc123",
      "content": "{\"temperature\": 72, \"condition\": \"sunny\"}"
    }
  ],
  "tools": [...]
}
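The step between the two OpenAI-format rounds (run the function, then append the tool call and its result to the conversation) could be sketched like this. The `get_weather` implementation, the `TOOL_REGISTRY` mapping, and `build_followup_messages` are all hypothetical names for illustration; a real application would substitute its own functions.

```python
import json


def get_weather(location: str) -> dict:
    """Hypothetical stand-in for a real weather lookup."""
    return {"temperature": 72, "condition": "sunny"}


# Maps tool names from the tools array to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def build_followup_messages(messages: list, response: dict) -> list:
    """Return the Round 2 messages: history, assistant tool call, tool results."""
    assistant = response["choices"][0]["message"]
    followup = list(messages) + [assistant]
    for call in assistant.get("tool_calls", []):
        func = TOOL_REGISTRY[call["function"]["name"]]
        # arguments arrives as a JSON-encoded string, not an object.
        args = json.loads(call["function"]["arguments"])
        followup.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(func(**args)),
        })
    return followup
```

The returned list becomes the `messages` field of the Round 2 request, sent with the same `model` and `tools` values as Round 1.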
Anthropic format
Use the tools array in the Anthropic Messages API (/v1/messages).
Round 1, request with tool definition:
{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "messages": [{"role": "user", "content": "What's the weather in NYC?"}],
  "tools": [
    {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "input_schema": {
        "type": "object",
        "properties": { "location": { "type": "string" } },
        "required": ["location"]
      }
    }
  ]
}
Round 1, model response requesting tool use:
{
  "stop_reason": "tool_use",
  "content": [{
    "type": "tool_use",
    "id": "toolu_01abc",
    "name": "get_weather",
    "input": {"location": "NYC"}
  }]
}
Round 2, follow-up request with the tool result:
{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "messages": [
    {"role": "user", "content": "What's the weather in NYC?"},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "toolu_01abc", "name": "get_weather", "input": {"location": "NYC"}}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "toolu_01abc", "content": "{\"temperature\": 72, \"condition\": \"sunny\"}"}]}
  ],
  "tools": [...]
}
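The equivalent step for the Anthropic format differs in two details visible in the payloads: the tool input is already a parsed object, and the result goes back as a `tool_result` block inside a user message. A sketch under the same assumptions as before (`get_weather`, `TOOL_REGISTRY`, and `build_anthropic_followup` are hypothetical names):

```python
import json


def get_weather(location: str) -> dict:
    """Hypothetical stand-in for a real weather lookup."""
    return {"temperature": 72, "condition": "sunny"}


# Maps tool names from the tools array to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def build_anthropic_followup(messages: list, response: dict) -> list:
    """Return the Round 2 messages: history, assistant content, tool results."""
    results = []
    for block in response["content"]:
        if block.get("type") != "tool_use":
            continue  # Skip text blocks that may accompany the tool call.
        func = TOOL_REGISTRY[block["name"]]
        # input is already a parsed object, so no JSON decoding is needed.
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": json.dumps(func(**block["input"])),
        })
    return list(messages) + [
        {"role": "assistant", "content": response["content"]},
        {"role": "user", "content": results},
    ]
```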
Native function calling (described here) is application-side: your code defines the tools, detects tool call responses, executes the function, and sends the result back in a second request. MCP Profiles provide gateway-side tool execution: Agent Router connects models to external MCP servers directly, so your application sends a single request and receives a final answer without implementing the tool execution loop.
Known limitations
Current constraints
- Cross-provider structured output translation is not yet supported. Structured outputs currently work in passthrough mode only; the request format must match the target provider's native schema. Translation is coming in the next AI gateway release (envoyproxy/ai-gateway#1846).
- Embeddings, images, and rerank endpoints accept requests in passthrough mode only. No provider translation is applied; the request must already match the target provider's native format.
- Cross-provider translation of function calling follows the same support matrix as the base API formats. Routing an OpenAI-format tool call request to an Anthropic backend is supported for the standard message format; check Supported APIs for current coverage.
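To give a sense of what translating a tool definition involves, the sketch below maps the OpenAI tool shape from this page onto the Anthropic one. It is an illustration based only on the two example payloads in the Function calling section, not Agent Router's actual translation logic; `openai_tool_to_anthropic` is a hypothetical name, and the empty-object fallback for a missing `parameters` field is an assumption.

```python
def openai_tool_to_anthropic(tool: dict) -> dict:
    """Map an OpenAI function tool definition onto the Anthropic tool shape."""
    if tool.get("type") != "function":
        raise ValueError("only function tools are handled in this sketch")
    func = tool["function"]
    converted = {
        "name": func["name"],
        # OpenAI calls the JSON schema "parameters"; Anthropic calls it "input_schema".
        "input_schema": func.get("parameters", {"type": "object", "properties": {}}),
    }
    if "description" in func:
        converted["description"] = func["description"]
    return converted
```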
What's next