Platform capabilities

Agent Router passes through AI capabilities (structured outputs, multimodal inputs, and function calling) to any provider that supports them. Requests are forwarded as-is when the input format already matches the target provider's native schema, and translated automatically when cross-provider mappings exist.

Capabilities covered on this page

Structured outputs

Constrain model responses to a JSON schema. Supported in passthrough and (where available) translated modes across providers.

Multimodal

Image generation and vision inputs. Formats are translated automatically when routing across provider boundaries.

Function calling

Tool definitions and tool call responses forwarded unchanged. Full examples for OpenAI and Anthropic formats.

Structured outputs

Structured outputs constrain model responses to a specific JSON schema, guaranteeing that the model's reply conforms to a defined data shape. This eliminates the need to parse free-form text, reduces the risk of malformed responses in production, and makes AI outputs directly usable in downstream systems without an intermediate validation step.

Agent Router supports structured outputs in two modes:

Passthrough: the request is forwarded to the provider unchanged, because the request format already matches the provider's native structured output schema
Translated: Agent Router adapts the request to the target provider's structured output format when routing across provider boundaries

Input Format	Backend	Mode	Status
OpenAI `/v1/chat/completions`	OpenAI	Passthrough	Supported
OpenAI `/v1/chat/completions`	GCP Anthropic (Vertex)	Translated	Coming in the next AI gateway release
Anthropic `/v1/messages`	Anthropic	Passthrough	Supported
Anthropic `/v1/messages`	GCP Anthropic (Vertex)	Passthrough	Supported
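
The support matrix above can be mirrored in client code as a simple lookup, which may help an application fail fast before sending a request that depends on a translation that is not yet available. This is a minimal sketch: the string labels and the names `SUPPORT_MATRIX` and `structured_output_mode` are illustrative assumptions, not identifiers used by Agent Router.

```python
from typing import Optional

# Mirrors the table above: (input format, backend) -> (mode, supported today?).
# The string keys are illustrative labels, not Agent Router identifiers.
SUPPORT_MATRIX = {
    ("openai", "openai"): ("passthrough", True),
    ("openai", "gcp-anthropic"): ("translated", False),  # listed as coming later
    ("anthropic", "anthropic"): ("passthrough", True),
    ("anthropic", "gcp-anthropic"): ("passthrough", True),
}


def structured_output_mode(input_format: str, backend: str) -> Optional[str]:
    """Return the mode if the combination is supported today, otherwise None."""
    mode, supported = SUPPORT_MATRIX.get((input_format, backend), (None, False))
    return mode if supported else None
```

A caller could check `structured_output_mode("openai", "gcp-anthropic")` and, on `None`, switch to the backend's native request format instead.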

Example: OpenAI structured output

The following request asks the model to return a list of planets as a JSON object conforming to a strict schema. Setting "strict": true enables guaranteed schema adherence: the model will not return fields outside the defined schema.

curl "$TARE_BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $TARE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "List 3 planets"}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "planets",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": {
            "planets": {
              "type": "array",
              "items": { "type": "string" }
            }
          },
          "required": ["planets"],
          "additionalProperties": false
        }
      }
    }
  }'
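
For applications that build this request in code, the following Python sketch constructs the same body and reads the structured reply. It assumes the standard Chat Completions response shape, where the JSON document arrives as a string in `choices[0].message.content`. The helper names `build_planets_request` and `parse_planets` are hypothetical, and sending the request is left to whichever HTTP client you use.

```python
import json

# Same schema as the curl example above.
PLANETS_SCHEMA = {
    "type": "object",
    "properties": {
        "planets": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["planets"],
    "additionalProperties": False,
}


def build_planets_request(model: str = "gpt-4o") -> dict:
    """Build the request body for a strict JSON-schema response."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": "List 3 planets"}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "planets",
                "strict": True,
                "schema": PLANETS_SCHEMA,
            },
        },
    }


def parse_planets(response: dict) -> list:
    """Extract the planets list from a Chat Completions response."""
    content = response["choices"][0]["message"]["content"]
    planets = json.loads(content)["planets"]
    # A light sanity check; strict mode should already guarantee this shape.
    if not all(isinstance(name, str) for name in planets):
        raise ValueError("expected a list of strings")
    return planets
```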

Anthropic Structured Outputs

When using the Anthropic Messages API (/v1/messages), use the output_config.format field for structured output configuration. The older output_format field is deprecated.
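
As a rough illustration, a Messages API body using `output_config.format` might be assembled as below. The inner shape of `format` (a `type` of `json_schema` plus a `schema` object) is an assumption based on Anthropic's published structured outputs format, and the helper name and default model are illustrative; check the provider reference before relying on it.

```python
def build_anthropic_structured_request(
    prompt: str,
    schema: dict,
    model: str = "claude-sonnet-4-5",
    max_tokens: int = 1024,
) -> dict:
    """Build a /v1/messages body that requests schema-constrained output."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        # Assumed shape: the schema sits under output_config.format.
        "output_config": {
            "format": {"type": "json_schema", "schema": schema},
        },
    }
```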

Cross-Provider Translation

Cross-provider structured output translation (e.g., sending an OpenAI-format structured output request that the gateway routes to an Anthropic backend) is being added in the next AI gateway release. Track progress in envoyproxy/ai-gateway#1846. Until this lands, structured outputs work in passthrough mode only; the request format must match the target provider's native schema.

Multimodal

Agent Router supports multimodal AI capabilities, including image generation and vision (image understanding). Payloads are forwarded as-is when the request format already matches the backend, and vision content is translated automatically when routing across providers. For example, an OpenAI-format vision request can be served by an Anthropic Claude backend without any changes to the client application.

Image generation

Use the /v1/images/generations endpoint to generate images from text prompts. Requests are routed to the configured image generation provider and model (e.g., OpenAI's gpt-image-1). The response contains the generated image as base64-encoded data.

curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TARE_API_KEY" \
  -d '{
    "model": "gpt-image-1",
    "prompt": "a serene mountain landscape at sunrise in watercolor",
    "size": "1024x1024",
    "n": 1
  }' \
  -X POST "$TARE_BASE_URL/v1/images/generations" | jq -r '.data[0].b64_json' | base64 -d > output.png

The example above decodes the base64 payload and saves it to output.png.

Figure: Example generated image, a watercolor mountain landscape at sunrise.
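
The decoding step performed by `jq` and `base64 -d` in the shell pipeline can also be done in application code. The sketch below assumes the response has already been parsed into a dictionary and that the image sits in `data[0].b64_json`, as in the curl example; the function name `decode_first_image` is hypothetical.

```python
import base64


def decode_first_image(response: dict) -> bytes:
    """Return the raw bytes of the first generated image in the response."""
    return base64.b64decode(response["data"][0]["b64_json"])
```

The returned bytes can then be written to disk, for example with `pathlib.Path("output.png").write_bytes(...)`.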

Vision (image understanding)

To send an image to a model for analysis, include it in the messages array using the OpenAI content parts format: a content array that combines text items and image items in a single message. This works with any vision-capable model regardless of provider.

Images can be supplied as inline base64 data URIs (for local files) or as publicly accessible URLs.

Inline base64 (local file):

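# Strip newlines: some base64 implementations wrap output, which would break the JSON string.
IMAGE_BASE64=$(base64 < image.png | tr -d '\n')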

curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TARE_API_KEY" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "What is in this image?" },
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/png;base64,'"$IMAGE_BASE64"'"
            }
          }
        ]
      }
    ]
  }' \
  -X POST $TARE_BASE_URL/v1/chat/completions

URL reference (publicly accessible image):

curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TARE_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in detail." },
          {
            "type": "image_url",
            "image_url": { "url": "https://example.com/image.png" }
          }
        ]
      }
    ]
  }' \
  -X POST $TARE_BASE_URL/v1/chat/completions
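
Both variants above use the same content parts structure and differ only in the `url` value. The following Python sketch builds either form; the helper names are hypothetical, and the MIME type defaults to `image/png` to match the example.

```python
import base64


def image_part_from_bytes(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an inline base64 data URI content part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")  # no line breaks
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def image_part_from_url(url: str) -> dict:
    """Reference a publicly accessible image by URL."""
    return {"type": "image_url", "image_url": {"url": url}}


def build_vision_request(model: str, prompt: str, image_part: dict) -> dict:
    """Combine a text part and an image part into a single user message."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [{"type": "text", "text": prompt}, image_part],
            }
        ],
    }
```

For a local file, pass the result of reading it in binary mode, for example `image_part_from_bytes(open("image.png", "rb").read())`.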

Agent Router translates the content parts format automatically when routing across providers; an OpenAI-format vision request can be routed to an Anthropic Claude backend transparently. For a full list of models that support vision inputs, see the Model Catalog.

Function calling

Function calling (also called tool use) allows the model to request that your application execute a function and return the result. This enables the model to take actions and retrieve information it cannot access on its own, such as querying a database, calling an API, or reading a file. Agent Router forwards tool definitions and tool call responses unchanged to providers that support native function calling.

The interaction follows a two-round pattern:

Round 1: Your application sends a request with a tools array defining the available functions. If the model decides it needs to call a tool to answer the question, it returns a response with finish_reason: "tool_calls" (OpenAI) or stop_reason: "tool_use" (Anthropic) instead of a final answer.
Round 2: Your application executes the function, then sends a follow-up request containing the original messages, the model's tool call, and the function result. The model uses this context to generate its final response.

OpenAI format

Use the tools array and optional tool_choice parameter with the Chat Completions API.

Round 1, request with tool definition:

{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "What's the weather in NYC?"}],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": { "location": { "type": "string" } },
          "required": ["location"]
        }
      }
    }
  ]
}

Round 1, model response requesting a tool call:

{
  "choices": [{
    "finish_reason": "tool_calls",
    "message": {
      "role": "assistant",
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\": \"NYC\"}"
        }
      }]
    }
  }]
}

Round 2, follow-up request with the function result:

{
  "model": "gpt-4o",
  "messages": [
    {"role": "user", "content": "What's the weather in NYC?"},
    {
      "role": "assistant",
      "tool_calls": [{"id": "call_abc123", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\": \"NYC\"}"}}]
    },
    {
      "role": "tool",
      "tool_call_id": "call_abc123",
      "content": "{\"temperature\": 72, \"condition\": \"sunny\"}"
    }
  ],
  "tools": [...]
}
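
The step between the two rounds, turning the Round 1 response into the Round 2 message list, can be sketched in Python as follows. The `get_weather` body is a stand-in that returns fixed data, and `TOOL_REGISTRY` and `build_followup_messages` are hypothetical names; a real application would call its own implementation and handle unknown tool names and malformed arguments.

```python
import json


def get_weather(location: str) -> dict:
    """Stand-in tool; a real application would query a weather service."""
    return {"temperature": 72, "condition": "sunny"}


# Maps tool names from the request's tools array to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def build_followup_messages(messages: list, response: dict) -> list:
    """Return the Round 2 messages for an OpenAI-format tool call response.

    If the model did not request a tool call, the messages are returned
    unchanged (as a copy).
    """
    choice = response["choices"][0]
    if choice["finish_reason"] != "tool_calls":
        return list(messages)

    assistant_message = choice["message"]
    followup = list(messages) + [assistant_message]
    for call in assistant_message["tool_calls"]:
        func = TOOL_REGISTRY[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string, not an object.
        arguments = json.loads(call["function"]["arguments"])
        result = func(**arguments)
        followup.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            }
        )
    return followup
```

The returned list becomes the `messages` field of the Round 2 request, sent together with the same `tools` array.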

Anthropic format

Use the tools array in the Anthropic Messages API (/v1/messages).

Round 1, request with tool definition:

{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "messages": [{"role": "user", "content": "What's the weather in NYC?"}],
  "tools": [
    {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "input_schema": {
        "type": "object",
        "properties": { "location": { "type": "string" } },
        "required": ["location"]
      }
    }
  ]
}

Round 1, model response requesting tool use:

{
  "stop_reason": "tool_use",
  "content": [{
    "type": "tool_use",
    "id": "toolu_01abc",
    "name": "get_weather",
    "input": {"location": "NYC"}
  }]
}

Round 2, follow-up request with the tool result:

{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "messages": [
    {"role": "user", "content": "What's the weather in NYC?"},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "toolu_01abc", "name": "get_weather", "input": {"location": "NYC"}}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "toolu_01abc", "content": "{\"temperature\": 72, \"condition\": \"sunny\"}"}]}
  ],
  "tools": [...]
}
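
The equivalent step for the Anthropic format differs mainly in where the pieces live: tool requests are content blocks, and results go back in a `user` message. A minimal Python sketch, assuming a caller-supplied mapping from tool names to callables (the names `build_anthropic_followup` and `tools_by_name` are hypothetical):

```python
import json


def build_anthropic_followup(messages: list, response: dict, tools_by_name: dict) -> list:
    """Return the Round 2 messages for an Anthropic-format tool use response.

    If the model did not request tool use, the messages are returned
    unchanged (as a copy).
    """
    if response.get("stop_reason") != "tool_use":
        return list(messages)

    results = []
    for block in response["content"]:
        if block["type"] != "tool_use":
            continue  # skip text blocks that may accompany the tool request
        # Unlike the OpenAI format, input is already a parsed object.
        output = tools_by_name[block["name"]](**block["input"])
        results.append(
            {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": json.dumps(output),
            }
        )

    return list(messages) + [
        {"role": "assistant", "content": response["content"]},
        {"role": "user", "content": results},
    ]
```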

Function Calling vs. MCP

Native function calling (described here) is application-side: your code defines the tools, detects tool call responses, executes the function, and sends the result back in a second request. MCP Profiles provide gateway-side tool execution: Agent Router connects models to external MCP servers directly, so your application sends a single request and receives a final answer without implementing the tool execution loop.

Known limitations

Current constraints

Cross-provider structured output translation is not yet supported. Structured outputs currently work in passthrough mode only; the request format must match the target provider's native schema. Translation is coming in the next AI gateway release (envoyproxy/ai-gateway#1846).
Embeddings, images, and rerank endpoints accept requests in passthrough mode only. No provider translation is applied; the request must already match the target provider's native format.
Cross-provider translation for function calling follows the same support matrix as the base API formats. Routing an OpenAI-format tool call request to an Anthropic backend is supported for the standard message format; check Supported APIs for current coverage.

What's next

Supported APIs

Full reference for all three API formats with streaming examples.

MCP Profiles

Gateway-side tool execution via MCP servers.

Model Catalog

Browse model capabilities and multimodal support across providers.
