
Gateway behavior

The gateway provides consistent, predictable behaviour across all AI providers. It normalises provider errors into a single format, attaches correlation IDs to every request, and emits structured observability data regardless of which upstream provider handles the request. This page covers the error semantics, the correlation-ID model, and the data captured per request.


Error handling

Every AI provider has its own error schema, status code semantics, and message vocabulary. When a request routes through the gateway (whether to a primary provider, a fallback target, or one leg of a traffic split), every error response is normalised into a single OpenAI-compatible format before it is returned to the client. As a result, the application needs only one error-handling path, even as the providers behind the gateway change.

Error response format

All errors, whether they originate at the gateway itself or at an upstream provider, are returned with the following JSON structure:

{
  "error": {
    "message": "A human-readable description of the error",
    "type": "error_type",
    "param": null,
    "code": "error_code"
  }
}
| Field | Description |
|---|---|
| message | A human-readable explanation. Useful for logging and surfacing context during development. |
| type | A machine-readable error category (rate_limit_error, timeout_error, invalid_request_error, and similar). Use this field for programmatic error handling. |
| param | The specific request parameter that caused the error, when applicable. Often null for provider-level and gateway-level errors. |
| code | A more specific code that identifies the exact condition within the error type (for example, rate_limit_exceeded or model_not_found). |
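On the client, the envelope can be parsed into a small typed value so that handling code branches on type and code rather than on message text. A minimal Python sketch, assuming only the four fields listed above; GatewayError and parse_gateway_error are hypothetical names, not part of any SDK.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GatewayError:
    """Client-side view of the normalised error envelope (hypothetical)."""
    message: str
    type: str
    param: Optional[str]
    code: Optional[str]


def parse_gateway_error(body: str) -> GatewayError:
    """Parse an error response body; raise ValueError if it has no error object."""
    payload = json.loads(body)
    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict):
        raise ValueError("response body does not contain an 'error' object")
    return GatewayError(
        message=str(err.get("message", "")),
        type=str(err.get("type", "")),
        param=err.get("param"),  # null in JSON becomes None
        code=err.get("code"),
    )
```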

Error origin

Errors originate at two distinct points in the request lifecycle:

  • Gateway errors are produced by the gateway before the request reaches a provider. These include authentication failures, malformed request bodies, unknown model names, and policy violations. HTTP status codes are typically in the 4xx range.
  • Provider errors are returned by an upstream provider and normalised by the gateway before being forwarded to the client. These include rate limits, model overload conditions, and provider outages. The HTTP status code reflects the nature of the provider failure.

In both cases, the response body uses the same JSON format and the same X-Request-ID correlation header is present.

HTTP status codes

| HTTP status | Meaning | Typical cause |
|---|---|---|
| 400 | Bad Request | Malformed request body, unsupported parameters, or missing required fields |
| 401 | Unauthorized | Invalid or missing API key |
| 403 | Forbidden | API key lacks permission for the requested model or endpoint |
| 404 | Not Found | Unknown endpoint path or model not present in the catalogue |
| 429 | Too Many Requests | Rate limit exceeded at the client level or by the upstream provider |
| 500 | Internal Server Error | Unexpected error within the gateway |
| 502 | Bad Gateway | The upstream provider returned an invalid or unparseable response |
| 503 | Service Unavailable | The upstream provider is temporarily unavailable or returning server errors |
| 504 | Gateway Timeout | The upstream provider did not respond within the configured timeout window |

Retryable vs. non-retryable errors

| Status | Retryable? | Recommended action |
|---|---|---|
| 400 | No | Inspect the message and param fields; fix the request before retrying |
| 401 | No | Verify and rotate the API key |
| 403 | No | Confirm the model or endpoint is enabled for the API key |
| 404 | No | Check the model name against the model catalogue |
| 429 | Yes | Retry with exponential backoff; honour any Retry-After header if present |
| 500 | Maybe | Retry once; if the error persists, investigate using Request Logs |
| 502 | Yes | Retry; the provider response was malformed but may succeed on a subsequent attempt |
| 503 | Yes | Retry with backoff; the provider is temporarily unavailable |
| 504 | Yes | Retry; consider adjusting timeout settings if this error occurs consistently |
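The retry column translates into a small client-side policy. A minimal Python sketch, assuming a caller-supplied send() function (hypothetical) that returns (status, headers, body). It retries 429, 502, 503, and 504 with full-jitter exponential backoff, retries 500 once, honours Retry-After when it is given in seconds, and returns non-retryable errors immediately. The backoff constants are illustrative defaults, not gateway recommendations.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# (status, headers, body) as returned by the caller's transport.
Response = Tuple[int, Mapping[str, str], str]

# Statuses marked "Yes" in the table. 500 ("Maybe") is handled separately.
RETRYABLE = {429, 502, 503, 504}


def is_retryable(status: int) -> bool:
    return status in RETRYABLE


def retry_after_seconds(headers: Mapping[str, str]) -> Optional[float]:
    """Return Retry-After in seconds, or None if absent or not delta-seconds."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            value = value.strip()
            # The HTTP-date form of Retry-After is not handled in this sketch.
            return float(value) if value.isascii() and value.isdigit() else None
    return None


def call_with_retries(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), retrying retryable errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    retried_500 = False
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status < 400:
            return status, headers, body
        last_attempt = attempt == max_attempts - 1
        if status == 500 and not retried_500 and not last_attempt:
            retried_500 = True  # "Retry once" for 500
        elif last_attempt or not is_retryable(status):
            return status, headers, body
        delay = retry_after_seconds(headers)
        if delay is None:
            # Full jitter: a random delay up to the exponential cap.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
        sleep(delay)
    raise AssertionError("unreachable: the last attempt always returns")
```

Injecting sleep keeps the policy testable without real delays; in production the default time.sleep applies.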
Tip

When fallback policies are configured, the gateway automatically retries retryable errors (5xx and 429) against the next provider in the fallback chain, transparently to the calling application. The application only receives an error response if every provider in the chain is exhausted or if the error is non-retryable.
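Conceptually, this fallback behaviour is a loop over the chain that advances only on retryable errors. The Python sketch below is an illustrative model of the documented behaviour, not the gateway's implementation; the providers are stand-in callables returning (status, body), and route_with_fallback is a hypothetical name.

```python
from typing import Callable, Sequence, Tuple

# A stand-in for one provider leg: returns (status, body).
Provider = Callable[[], Tuple[int, str]]


def is_retryable(status: int) -> bool:
    # "5xx and 429", as stated in the tip.
    return status == 429 or 500 <= status <= 599


def route_with_fallback(chain: Sequence[Tuple[str, Provider]]) -> Tuple[str, int, str]:
    """Try providers in order, advancing only on retryable errors.

    Returns (provider_name, status, body) for the response the client sees.
    """
    if not chain:
        raise ValueError("fallback chain is empty")
    for name, call in chain:
        status, body = call()
        if not is_retryable(status):
            # Success or a non-retryable error ends the chain immediately.
            return name, status, body
    # Every provider failed with a retryable error: surface the last one.
    return name, status, body
```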

Error examples

Upstream rate limit (provider returns 429):

{
  "error": {
    "message": "Rate limit exceeded. Please retry after 30 seconds.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}

Provider timeout (no response within the timeout window):

{
  "error": {
    "message": "Upstream provider did not respond in time.",
    "type": "timeout_error",
    "param": null,
    "code": "timeout"
  }
}

Invalid or missing API key:

{
  "error": {
    "message": "Invalid API key. Please check your credentials.",
    "type": "authentication_error",
    "param": null,
    "code": "invalid_api_key"
  }
}

Model not present in the catalogue:

{
  "error": {
    "message": "The model 'example-model' is not available in this gateway.",
    "type": "not_found_error",
    "param": "model",
    "code": "model_not_found"
  }
}
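Because type is the field intended for programmatic handling, client code can dispatch on it directly. A minimal Python sketch covering only the four types shown in these examples; the action names are hypothetical placeholders for application logic, and any other type falls through to a generic default.

```python
import json

# Hypothetical actions keyed on the "type" values from the examples.
ACTIONS = {
    "rate_limit_error": "retry_with_backoff",
    "timeout_error": "retry",
    "authentication_error": "check_api_key",
    "not_found_error": "check_model_name",
}


def action_for(body: str) -> str:
    """Choose an action from an error body, keyed on error.type."""
    err = json.loads(body)["error"]
    action = ACTIONS.get(err.get("type"), "log_and_fail")
    # param names the offending request field when one applies.
    if err.get("param"):
        action += f" (param={err['param']})"
    return action
```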

Streaming errors

For streaming requests, the gateway attempts to detect and translate provider errors before the stream is opened. If a provider returns an error mid-stream, the gateway closes the stream and, where possible, emits a final error event. Client applications should handle stream interruptions and check for error payloads at stream termination.
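One way to follow this advice on the client is to treat both an error payload and a missing end-of-stream marker as failures. A Python sketch, assuming OpenAI-style server-sent events (data: lines terminated by data: [DONE]) and assuming the final error event is a data: line whose JSON carries a top-level error object in the format above. This page does not specify the exact shape of the error event, so verify against real traffic; StreamError and iter_stream_chunks are hypothetical names.

```python
import json
from typing import Iterable, Iterator


class StreamError(Exception):
    """The stream ended with an error payload or without a terminator."""


def iter_stream_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSON chunks from SSE lines, raising on errors.

    Assumes OpenAI-style framing ('data: {...}' lines ending with
    'data: [DONE]') and an error event carrying a top-level "error" object.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, or other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # normal end of stream
        chunk = json.loads(data)
        if "error" in chunk:
            err = chunk["error"]
            message = err.get("message") if isinstance(err, dict) else str(err)
            raise StreamError(message or "stream error")
        yield chunk
    # The input ran out without [DONE]: treat as an interrupted stream.
    raise StreamError("stream ended without [DONE]")
```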


Request tracking

The gateway attaches correlation IDs to every request. These IDs link the HTTP response the application receives to the detailed record stored in Request Logs and to the spans emitted to the OpenTelemetry backend.

Correlation headers

| Header | Direction | Description |
|---|---|---|
| X-Request-ID | Response | A gateway-generated UUID attached to every response. The primary identifier for looking up the request in Request Logs and OTel traces. |
| X-Client-Request-ID | Response | Echoed back from the X-Request-ID header sent by the client, if present. Allows correlation between gateway records and application-side request identifiers. |

When a client sends an X-Request-ID header, the gateway preserves it as X-Client-Request-ID in the response and generates its own X-Request-ID. Both IDs appear in Request Logs and OpenTelemetry traces.
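The header rule can be stated as a tiny function. This Python model only restates the documented behaviour for clarity; it is not gateway code, and correlation_headers is a hypothetical name.

```python
import uuid
from typing import Dict, Mapping


def correlation_headers(request_headers: Mapping[str, str]) -> Dict[str, str]:
    """Always mint a new X-Request-ID; echo the client's as X-Client-Request-ID."""
    response = {"X-Request-ID": str(uuid.uuid4())}
    # HTTP header names are case-insensitive, so match ignoring case.
    for name, value in request_headers.items():
        if name.lower() == "x-request-id":
            response["X-Client-Request-ID"] = value
            break
    return response
```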

Sending a correlation ID

curl -i https://PROXY_URL/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Request-ID: my-session-abc-123" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Response headers:

HTTP/2 200
x-request-id: 8f14e45f-ceea-467f-a8f0-6b2a1c3d4e5f
x-client-request-id: my-session-abc-123
content-type: application/json

The x-request-id value (8f14e45f-...) is the gateway's canonical identifier. The x-client-request-id (my-session-abc-123) is the original client ID, preserved for joining gateway records with application-side logs.
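The same request can be built with Python's standard library, and the two IDs read back from the response headers. A minimal sketch: PROXY_URL and YOUR_API_KEY are the placeholders from the curl example, and build_chat_request and read_correlation_ids are hypothetical helper names. Header names are matched case-insensitively because HTTP/2 sends them in lowercase.

```python
import json
import urllib.request
from typing import Mapping, Optional, Tuple

PROXY_URL = "https://PROXY_URL"  # placeholder, as in the curl example
API_KEY = "YOUR_API_KEY"         # placeholder


def build_chat_request(client_request_id: str) -> urllib.request.Request:
    """Build (but do not send) the chat completion request shown above."""
    body = {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}
    return urllib.request.Request(
        f"{PROXY_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
            "X-Request-ID": client_request_id,  # echoed as X-Client-Request-ID
        },
        method="POST",
    )


def read_correlation_ids(headers: Mapping[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (gateway x-request-id, echoed x-client-request-id)."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get("x-request-id"), lowered.get("x-client-request-id")
```

To send it, open urllib.request.urlopen(req) in a with block and pass resp.headers to read_correlation_ids. For 4xx and 5xx responses urlopen raises urllib.error.HTTPError, which also exposes .headers, so the IDs remain available on error paths.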

Data captured per request

Every request processed by the gateway produces a record containing:

| Field | Description |
|---|---|
| Request ID | The gateway-assigned X-Request-ID UUID |
| Client Request ID | The echoed X-Client-Request-ID, if provided by the client |
| Timestamp | When the request was received by the gateway |
| Model | The model identifier specified in the request |
| Provider | The upstream provider that served the response |
| HTTP Status | The final HTTP status code returned to the client |
| Latency | Total round-trip time from gateway receipt to completed response |
| Token usage | Prompt tokens, completion tokens, and total tokens |
| Request / response body | Full payloads, subject to the deployment's data retention settings |

This data is searchable in Request Logs and is also available as OpenTelemetry span attributes in the tracing backend.
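If an application mirrors these records for its own analysis, a typed container keeps the fields explicit. A hypothetical Python sketch: the field names and the latency unit (milliseconds) are assumptions that mirror the table, not a documented export schema, and request and response bodies are omitted.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class RequestRecord:
    """Hypothetical client-side mirror of one gateway request record."""
    request_id: str                   # gateway-assigned X-Request-ID
    client_request_id: Optional[str]  # echoed X-Client-Request-ID, if sent
    timestamp: datetime               # when the gateway received the request
    model: str
    provider: str                     # upstream provider that served it
    http_status: int                  # final status returned to the client
    latency_ms: float                 # assumed unit: milliseconds
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

    @property
    def is_error(self) -> bool:
        return self.http_status >= 400

    def summary(self) -> str:
        """One-line summary suitable for an application log."""
        return (f"{self.request_id} {self.model}@{self.provider} "
                f"status={self.http_status} latency_ms={self.latency_ms:.0f} "
                f"tokens={self.total_tokens}")
```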

Correlating in observability tools

  • Request Logs. Search by X-Request-ID (gateway-assigned) or X-Client-Request-ID (the application's ID) to retrieve the complete request record: provider used, latency breakdown, token counts, and raw request and response payloads.
  • OTel traces. Both IDs are emitted as span attributes on every trace. Search for them in the tracing backend (Jaeger, Grafana Tempo, Honeycomb, Datadog, and others) to view the full execution path including any fallback retries.
Tip

Custom application-defined headers (agent-session-id, user-id, workflow-run-id, and similar) are forwarded as OpenTelemetry span attributes. This allows traces to be grouped or filtered by any application-level concept (conversation ID, agent run, team, deployment) without affecting how the gateway routes requests.
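Custom headers are ordinary HTTP request headers, so the client-side requirement is simply that names and values be valid. A minimal Python sketch that turns an application context dictionary into headers, rejecting names that are not RFC 9110 tokens and values containing line breaks; tracing_headers is a hypothetical helper, and the header names in the test are the examples from the tip.

```python
import re
from typing import Dict, Mapping

# RFC 9110 "token" characters, the set allowed in HTTP header field names.
_TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def tracing_headers(context: Mapping[str, str]) -> Dict[str, str]:
    """Validate application context and return it as request headers."""
    headers: Dict[str, str] = {}
    for name, value in context.items():
        if not _TOKEN.fullmatch(name):
            raise ValueError(f"invalid header name: {name!r}")
        if "\r" in value or "\n" in value:
            raise ValueError(f"header value for {name!r} contains a line break")
        headers[name] = value
    return headers
```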

Debugging workflow

A typical end-to-end debugging workflow using correlation IDs:

  1. Capture the response header. Log the x-request-id value from every response in the application. For error responses, also capture the response body.
  2. Search Request Logs. Paste the x-request-id value into the search field in Request Logs to retrieve the full request record: which provider was used, total latency, token counts, fallback attempts, and raw payloads.
  3. Inspect the trace. If OTel export is configured, search for the same x-request-id as a span attribute in the tracing backend to see provider-level timing and any retry hops.
  4. Join with application logs. If a custom X-Request-ID was sent, use the echoed x-client-request-id in the gateway record to join against application logs and reconstruct the full request context.
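Steps 1 and 4 are the parts that live in application code. A minimal Python sketch, assuming the application keeps its own log entries as dictionaries with a request_id key (the value it sent as X-Request-ID) and that gateway records are available as dictionaries with a client_request_id key; both shapes are hypothetical.

```python
import logging
from typing import Dict, List, Mapping, Optional, Tuple

logger = logging.getLogger("gateway.client")


def log_gateway_response(status: int, headers: Mapping[str, str], body: str) -> str:
    """Step 1: log x-request-id for every response, plus the body on errors."""
    lowered = {k.lower(): v for k, v in headers.items()}
    message = (f"status={status} "
               f"x-request-id={lowered.get('x-request-id', '-')} "
               f"x-client-request-id={lowered.get('x-client-request-id', '-')}")
    if status >= 400:
        message += f" body={body}"
        logger.error("gateway response %s", message)
    else:
        logger.info("gateway response %s", message)
    return message


def join_with_app_logs(
    gateway_records: List[Dict[str, str]],
    app_logs: List[Dict[str, str]],
) -> List[Tuple[Dict[str, str], Optional[Dict[str, str]]]]:
    """Step 4: pair each app log entry with its gateway record, if any."""
    by_client_id = {
        r["client_request_id"]: r
        for r in gateway_records
        if r.get("client_request_id")
    }
    return [(entry, by_client_id.get(entry["request_id"])) for entry in app_logs]
```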