Gateway behaviour

The gateway provides consistent, predictable behaviour across all AI providers. It normalises provider errors into a single format, attaches correlation IDs to every request, and emits structured observability data regardless of which upstream provider handles the request. This page covers the error semantics, the correlation-ID model, and the data captured per request.

Error handling

Every AI provider has its own error schema, status code semantics, and message vocabulary. When a request routes through the gateway (whether to a primary provider, a fallback target, or one leg of a traffic split), every error response is normalised into a single OpenAI-compatible format before being returned to the client. A single error-handling path is sufficient on the application side even as providers change behind the scenes.

Error response format

All errors, whether they originate at the gateway itself or at an upstream provider, are returned with the following JSON structure:

{
  "error": {
    "message": "A human-readable description of the error",
    "type": "error_type",
    "param": null,
    "code": "error_code"
  }
}

Field	Description
`message`	A human-readable explanation. Useful for logging and surfacing context during development.
`type`	A machine-readable error category (`rate_limit_error`, `timeout_error`, `invalid_request_error`, and similar). Use this field for programmatic error handling.
`param`	The specific request parameter that caused the error, when applicable. Often `null` for provider-level and gateway-level errors.
`code`	A specific error code that narrows down the error within its `type` (`rate_limit_exceeded` or `model_not_found`, for example).
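
As an illustration of a single error-handling path, the following minimal sketch parses the envelope described above into a small Python type. The names `GatewayError` and `parse_gateway_error` are hypothetical and exist only for this example; the gateway does not ship them.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GatewayError:
    """One normalised error, mirroring the four documented fields."""
    message: str
    type: str
    param: Optional[str]
    code: Optional[str]


def parse_gateway_error(body: str) -> Optional[GatewayError]:
    """Return a GatewayError if the body matches the documented envelope, else None."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None
    return GatewayError(
        message=str(error.get("message", "")),
        type=str(error.get("type", "")),
        param=error.get("param"),
        code=error.get("code"),
    )


sample = (
    '{"error": {"message": "Rate limit exceeded.", "type": "rate_limit_error", '
    '"param": null, "code": "rate_limit_exceeded"}}'
)
err = parse_gateway_error(sample)
# Branch on the machine-readable `type`, not on the human-readable message.
if err is not None and err.type == "rate_limit_error":
    print("back off and retry:", err.code)
```

Because the envelope is the same for gateway and provider errors, one parser of this kind can sit in front of all application error handling.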

Error origin

Errors originate at two distinct points in the request lifecycle:

Gateway errors are produced by the gateway before the request reaches a provider. These include authentication failures, malformed request bodies, unknown model names, and policy violations. HTTP status codes are typically in the 4xx range.
Provider errors are returned by an upstream provider and normalised by the gateway before being forwarded to the client. These include rate limits, model overload conditions, and provider outages. The HTTP status code reflects the nature of the provider failure.

In both cases, the response body uses the same JSON format and the response carries the X-Request-ID correlation header.

HTTP status codes

HTTP Status	Meaning	Typical cause
400	Bad Request	Malformed request body, unsupported parameters, or missing required fields
401	Unauthorized	Invalid or missing API key
403	Forbidden	API key lacks permission for the requested model or endpoint
404	Not Found	Unknown endpoint path or model not present in the catalogue
429	Too Many Requests	Rate limit exceeded at the client level or by the upstream provider
500	Internal Server Error	Unexpected error within the gateway
502	Bad Gateway	The upstream provider returned an invalid or unparseable response
503	Service Unavailable	The upstream provider is temporarily unavailable or returning server errors
504	Gateway Timeout	The upstream provider did not respond within the configured timeout window

Retryable vs. non-retryable errors

Status	Retryable?	Recommended action
400	No	Inspect the `message` and `param` fields; fix the request before retrying
401	No	Verify that the API key is present and valid; rotate it if necessary
403	No	Confirm the model or endpoint is enabled for the API key
404	No	Check the model name against the model catalogue
429	Yes	Retry with exponential backoff; honour any `Retry-After` header if present
500	Maybe	Retry once; if the error persists, investigate using Request Logs
502	Yes	Retry; the provider response was malformed but may succeed on a subsequent attempt
503	Yes	Retry with backoff; the provider is temporarily unavailable
504	Yes	Retry; consider adjusting timeout settings if this error occurs consistently
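
The retry guidance in the table above can be sketched as a small client-side policy. This is one possible interpretation, not the gateway's own logic: the attempt limit, the base delay, the cap, and the single retry for 500 are assumptions chosen for the example, and header names are assumed to arrive lower-cased.

```python
import random
from typing import Mapping

# Status codes the table marks as retryable. 500 is listed as "Maybe",
# so this sketch allows exactly one retry for it (an assumption).
RETRYABLE = {429, 502, 503, 504}
RETRY_ONCE = {500}


def should_retry(status: int, attempt: int, max_attempts: int = 4) -> bool:
    """attempt is 1 for the first try, 2 for the first retry, and so on."""
    if attempt >= max_attempts:
        return False
    if status in RETRY_ONCE:
        return attempt == 1
    return status in RETRYABLE


def retry_delay(attempt: int, headers: Mapping[str, str],
                base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before the next attempt."""
    retry_after = headers.get("retry-after")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled in this sketch
    # Exponential backoff with full jitter: 0.5 s, 1 s, 2 s, ... capped at `cap`.
    return random.uniform(0.0, min(cap, base * 2 ** (attempt - 1)))
```

A caller would loop over attempts, stop as soon as `should_retry` returns `False`, and sleep for `retry_delay(...)` seconds between attempts.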

Tip: When fallback policies are configured, the gateway automatically retries retryable errors (5xx and 429) against the next provider in the fallback chain, transparently to the calling application. The application only receives an error response if every provider in the chain is exhausted or if the error is non-retryable.

Error examples

Upstream rate limit (provider returns 429):

{
  "error": {
    "message": "Rate limit exceeded. Please retry after 30 seconds.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}

Provider timeout (no response within the timeout window):

{
  "error": {
    "message": "Upstream provider did not respond in time.",
    "type": "timeout_error",
    "param": null,
    "code": "timeout"
  }
}

Invalid or missing API key:

{
  "error": {
    "message": "Invalid API key. Please check your credentials.",
    "type": "authentication_error",
    "param": null,
    "code": "invalid_api_key"
  }
}

Model not present in the catalogue:

{
  "error": {
    "message": "The model 'example-model' is not available in this gateway.",
    "type": "not_found_error",
    "param": "model",
    "code": "model_not_found"
  }
}

Streaming errors

For streaming requests, the gateway attempts to detect and translate provider errors before the stream is opened. If a provider returns an error mid-stream, the gateway closes the stream and, where possible, emits a final error event. Client applications should handle stream interruptions and check for error payloads at stream termination.
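
A minimal sketch of client-side stream handling follows. This page does not specify the wire format of the final error event, so the sketch assumes OpenAI-style server-sent events: `data:` lines that carry JSON, a `data: [DONE]` end marker, and an error delivered as a chunk with a top-level `error` object. `StreamError` and `iter_chunks` are hypothetical names.

```python
import json
from typing import Iterable, Iterator


class StreamError(Exception):
    """Raised when a stream carries an error payload or ends unexpectedly."""


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded chunks; raise StreamError on an error payload or a truncated stream."""
    finished = False
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream marker
            finished = True
            break
        chunk = json.loads(data)
        if isinstance(chunk, dict) and "error" in chunk:
            err = chunk["error"] if isinstance(chunk["error"], dict) else {}
            raise StreamError(f"{err.get('type')}: {err.get('message')}")
        yield chunk
    if not finished:
        # The stream closed without an end marker: treat it as an interruption.
        raise StreamError("stream closed without an end-of-stream marker")
```

Treating a missing end marker as an error covers the case where the gateway closes the stream without being able to emit a final error event.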

Request tracking

The gateway attaches correlation IDs to every request. These IDs link the HTTP response the application receives to the detailed record stored in Request Logs and to the spans emitted to the OpenTelemetry backend.

Correlation headers

Header	Direction	Description
`X-Request-ID`	Response	A gateway-generated UUID attached to every response. The primary identifier for looking up the request in Request Logs and OTel traces.
`X-Client-Request-ID`	Response	Echoed back from the `X-Request-ID` header sent by the client, if present. Allows correlation between gateway records and application-side request identifiers.

When a client sends an X-Request-ID header, the gateway preserves it as X-Client-Request-ID in the response and generates its own X-Request-ID. Both IDs appear in Request Logs and OpenTelemetry traces.

Sending a correlation ID

curl -i https://PROXY_URL/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Request-ID: my-session-abc-123" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Response headers:

HTTP/2 200
x-request-id: 8f14e45f-ceea-467f-a8f0-6b2a1c3d4e5f
x-client-request-id: my-session-abc-123
content-type: application/json

The x-request-id value (8f14e45f-...) is the gateway's canonical identifier. The x-client-request-id (my-session-abc-123) is the original client ID, preserved for joining gateway records with application-side logs.
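
The same exchange can be sketched in Python with only the standard library. `build_request` and `correlation_ids` are hypothetical helper names, and `PROXY_URL` and `YOUR_API_KEY` are the same placeholders used in the curl example.

```python
import json
import urllib.request
from typing import Mapping, Optional, Tuple


def build_request(base_url: str, api_key: str,
                  client_request_id: str) -> urllib.request.Request:
    """Build a chat completion request that carries the application's own ID."""
    body = json.dumps({
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello"}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "X-Request-ID": client_request_id,
        },
    )


def correlation_ids(headers: Mapping[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (gateway_id, client_id) read from the response headers."""
    return headers.get("x-request-id"), headers.get("x-client-request-id")


req = build_request("https://PROXY_URL", "YOUR_API_KEY", "my-session-abc-123")
# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(req) as resp:
#     gateway_id, client_id = correlation_ids(resp.headers)
```

Logging both returned values alongside the application's own log entry is what makes the later join between gateway records and application logs possible.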

Data captured per request

Every request processed by the gateway produces a record containing:

Field	Description
Request ID	The gateway-assigned `X-Request-ID` UUID
Client Request ID	The client-supplied `X-Request-ID` value, echoed as `X-Client-Request-ID`, if the client sent one
Timestamp	When the request was received by the gateway
Model	The model identifier specified in the request
Provider	The upstream provider that served the response
HTTP Status	The final HTTP status code returned to the client
Latency	Total round-trip time from gateway receipt to completed response
Token usage	Prompt tokens, completion tokens, and total tokens
Request / response body	Full payloads, subject to the deployment's data retention settings

This data is searchable in Request Logs and is also available as OpenTelemetry span attributes in the tracing backend.

Correlating in observability tools

Request Logs. Search by X-Request-ID (gateway-assigned) or X-Client-Request-ID (the application's ID) to retrieve the complete request record: provider used, latency breakdown, token counts, and raw request and response payloads.
OTel traces. Both IDs are emitted as span attributes on every trace. Search for them in the tracing backend (Jaeger, Grafana Tempo, Honeycomb, Datadog, and others) to view the full execution path including any fallback retries.

Tip: Custom application-defined headers (agent-session-id, user-id, workflow-run-id, and similar) are forwarded as OpenTelemetry span attributes. This allows traces to be grouped or filtered by any application-level concept (conversation ID, agent run, team, deployment) without affecting how the gateway routes requests.

Debugging workflow

A typical end-to-end debugging workflow using correlation IDs:

Capture the response header. Log the x-request-id value from every response in the application. For error responses, also capture the response body.
Search Request Logs. Paste the x-request-id value into the search field in Request Logs to retrieve the full request record: which provider was used, total latency, token counts, fallback attempts, and raw payloads.
Inspect the trace. If OTel export is configured, search for the same x-request-id as a span attribute in the tracing backend to see provider-level timing and any retry hops.
Join with application logs. If a custom X-Request-ID was sent, use the echoed x-client-request-id in the gateway record to join against application logs and reconstruct the full request context.
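
The first and last steps above can be sketched on the application side as follows. The record shapes are assumptions made for this example: the dictionary keys `request_id` and `client_request_id` are hypothetical and do not reflect the actual Request Logs export schema, and header names are assumed to arrive lower-cased.

```python
from typing import Any, Dict, Iterable, List, Mapping, Optional, Tuple


def response_log_fields(status: int, headers: Mapping[str, str],
                        body: Optional[str] = None) -> Dict[str, Any]:
    """Step 1: collect the fields worth logging for every response."""
    fields: Dict[str, Any] = {
        "status": status,
        "gateway_request_id": headers.get("x-request-id"),
        "client_request_id": headers.get("x-client-request-id"),
    }
    if status >= 400 and body is not None:
        fields["error_body"] = body  # keep the body for error responses only
    return fields


def join_with_app_logs(
    app_logs: Iterable[Dict[str, Any]],
    gateway_records: Iterable[Dict[str, Any]],
) -> List[Tuple[Dict[str, Any], Dict[str, Any]]]:
    """Step 4: pair each application log entry with the gateway record echoing its ID."""
    by_client_id = {
        rec["client_request_id"]: rec
        for rec in gateway_records
        if rec.get("client_request_id")
    }
    return [
        (entry, by_client_id[entry["request_id"]])
        for entry in app_logs
        if entry.get("request_id") in by_client_id
    ]
```

Steps 2 and 3 happen in Request Logs and the tracing backend, so they are not represented in the sketch.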

Related

Supported APIs: API formats and endpoint reference
OTel Metrics: metric names, types, and labels exported by the gateway
Audit Log Events: event schema for administrative actions

Where to go next

Supported APIs

API formats and endpoint reference.

Monitor traffic and usage

Find requests in Request Logs by correlation ID.
