Gateway behaviour
The gateway provides consistent, predictable behaviour across all AI providers. It normalises provider errors into a single format, attaches correlation IDs to every request, and emits structured observability data regardless of which upstream provider handles the request. This page covers the error semantics, the correlation-ID model, and the data captured per request.
Error handling
Every AI provider has its own error schema, status code semantics, and message vocabulary. When a request routes through the gateway (whether to a primary provider, a fallback target, or one leg of a traffic split), every error response is normalised into a single OpenAI-compatible format before being returned to the client. A single error-handling path is sufficient on the application side even as providers change behind the scenes.
Error response format
All errors, whether they originate at the gateway itself or at an upstream provider, are returned with the following JSON structure:
{
  "error": {
    "message": "A human-readable description of the error",
    "type": "error_type",
    "param": null,
    "code": "error_code"
  }
}
| Field | Description |
|---|---|
| message | A human-readable explanation. Useful for logging and surfacing context during development. |
| type | A machine-readable error category (rate_limit_error, timeout_error, invalid_request_error, and similar). Use this field for programmatic error handling. |
| param | The specific request parameter that caused the error, when applicable. Often null for provider-level and gateway-level errors. |
| code | A specific error code within the error type. Distinguishes subtypes (rate_limit_exceeded vs. model_not_found, for example). |
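Because every error shares this envelope, a client can parse it in one place. The following is a minimal Python sketch, assuming the response body has already been read as text. `GatewayError` and `parse_gateway_error` are illustrative names chosen for this example, not part of any gateway SDK.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GatewayError:
    """Illustrative container for the normalised error envelope."""
    message: str
    type: str
    param: Optional[str]
    code: Optional[str]


def parse_gateway_error(body: str) -> Optional[GatewayError]:
    """Return a GatewayError if the body holds an error envelope, else None."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None
    return GatewayError(
        message=str(error.get("message", "")),
        type=str(error.get("type", "")),
        param=error.get("param"),  # often null for provider-level errors
        code=error.get("code"),
    )
```

Application code can then branch on `type` (and, where needed, `code`) rather than on provider-specific message text.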
Error origin
Errors originate at two distinct points in the request lifecycle:
- Gateway errors are produced by the gateway before the request reaches a provider. These include authentication failures, malformed request bodies, unknown model names, and policy violations. HTTP status codes are typically in the 4xx range.
- Provider errors are returned by an upstream provider and normalised by the gateway before being forwarded to the client. These include rate limits, model overload conditions, and provider outages. The HTTP status code reflects the nature of the provider failure.
In both cases, the response body uses the same JSON format and the same X-Request-ID correlation header is present.
HTTP status codes
| HTTP Status | Meaning | Typical cause |
|---|---|---|
| 400 | Bad Request | Malformed request body, unsupported parameters, or missing required fields |
| 401 | Unauthorized | Invalid or missing API key |
| 403 | Forbidden | API key lacks permission for the requested model or endpoint |
| 404 | Not Found | Unknown endpoint path or model not present in the catalogue |
| 429 | Too Many Requests | Rate limit exceeded, either the client-level limit enforced by the gateway or the upstream provider's limit |
| 500 | Internal Server Error | Unexpected error within the gateway |
| 502 | Bad Gateway | The upstream provider returned an invalid or unparseable response |
| 503 | Service Unavailable | The upstream provider is temporarily unavailable or returning server errors |
| 504 | Gateway Timeout | The upstream provider did not respond within the configured timeout window |
Retryable vs. non-retryable errors
| Status | Retryable? | Recommended action |
|---|---|---|
| 400 | No | Inspect the message and param fields; fix the request before retrying |
| 401 | No | Verify that the API key is present and valid; rotate it if it has been revoked or compromised |
| 403 | No | Confirm the model or endpoint is enabled for the API key |
| 404 | No | Check the model name against the model catalogue |
| 429 | Yes | Retry with exponential backoff; honour any Retry-After header if present |
| 500 | Maybe | Retry once; if the error persists, investigate using Request Logs |
| 502 | Yes | Retry; the provider response was malformed but may succeed on a subsequent attempt |
| 503 | Yes | Retry with backoff; the provider is temporarily unavailable |
| 504 | Yes | Retry; consider adjusting timeout settings if this error occurs consistently |
When fallback policies are configured, the gateway automatically retries retryable errors (5xx and 429) against the next provider in the fallback chain, transparently to the calling application. The application receives an error response only if every provider in the chain is exhausted or if the error is non-retryable.
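For errors that still reach the application, the table above can be turned into a small client-side retry policy. The sketch below is one possible reading, not the gateway's own retry logic: the attempt limit, base delay, cap, and the use of full jitter are assumptions, and `send` is a hypothetical callable that performs one request and returns `(status, headers, body)`.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

RETRYABLE_STATUSES = {429, 502, 503, 504}
RETRY_ONCE_STATUSES = {500}  # "Maybe" in the table: retry a single time


def should_retry(status: int, attempt: int, max_attempts: int = 4) -> bool:
    """attempt is the number of attempts already made (1 after the first call)."""
    if attempt >= max_attempts:
        return False
    if status in RETRYABLE_STATUSES:
        return True
    if status in RETRY_ONCE_STATUSES:
        return attempt < 2
    return False


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before the next attempt."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # delta-seconds form
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled here
    # Exponential backoff with full jitter (an assumed strategy).
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retries(send: Callable[[], Tuple[int, Mapping[str, str], str]],
                      max_attempts: int = 4,
                      sleep: Callable[[float], None] = time.sleep):
    """Call send() until it succeeds, fails permanently, or attempts run out."""
    attempt = 0
    while True:
        status, headers, body = send()
        attempt += 1
        if status < 400 or not should_retry(status, attempt, max_attempts):
            return status, headers, body
        retry_after = headers.get("Retry-After") or headers.get("retry-after")
        sleep(backoff_delay(attempt, retry_after))
```

Passing `sleep` as a parameter keeps the policy testable without real delays.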
Error examples
Upstream rate limit (provider returns 429):
{
  "error": {
    "message": "Rate limit exceeded. Please retry after 30 seconds.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
Provider timeout (no response within the timeout window):
{
  "error": {
    "message": "Upstream provider did not respond in time.",
    "type": "timeout_error",
    "param": null,
    "code": "timeout"
  }
}
Invalid or missing API key:
{
  "error": {
    "message": "Invalid API key. Please check your credentials.",
    "type": "authentication_error",
    "param": null,
    "code": "invalid_api_key"
  }
}
Model not present in the catalogue:
{
  "error": {
    "message": "The model 'example-model' is not available in this gateway.",
    "type": "not_found_error",
    "param": "model",
    "code": "model_not_found"
  }
}
Streaming errors
For streaming requests, the gateway attempts to detect and translate provider errors before the stream is opened. If a provider returns an error mid-stream, the gateway closes the stream and, where possible, emits a final error event. Client applications should handle stream interruptions and check for error payloads at stream termination.
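A client-side sketch of this check follows. It assumes server-sent-event framing with `data:` lines and a `[DONE]` end-of-stream marker, as used by OpenAI-style streaming APIs; this page does not specify the wire format, so treat both as assumptions. `StreamError` and `iter_stream_chunks` are illustrative names.

```python
import json
from typing import Iterable, Iterator


class StreamError(Exception):
    """Raised when the stream carries an error payload or ends early."""


def iter_stream_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed chunks; raise StreamError on an error event or a cut-off."""
    finished = False
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream marker
            finished = True
            break
        try:
            event = json.loads(data)
        except json.JSONDecodeError as exc:
            raise StreamError(f"unparseable stream event: {data!r}") from exc
        error = event.get("error") if isinstance(event, dict) else None
        if error:
            if not isinstance(error, dict):
                error = {"message": str(error)}
            raise StreamError(f"{error.get('type')}: {error.get('message')}")
        yield event
    if not finished:
        # The connection closed without a final marker or error event.
        raise StreamError("stream closed before the end-of-stream marker")
```

Chunks yielded before the error remain valid, so the caller decides whether to keep the partial output or discard it and retry.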
Request tracking
The gateway attaches correlation IDs to every request. These IDs link the HTTP response the application receives to the detailed record stored in Request Logs and to the spans emitted to the OpenTelemetry backend.
Correlation headers
| Header | Direction | Description |
|---|---|---|
| X-Request-ID | Response | A gateway-generated UUID attached to every response. The primary identifier for looking up the request in Request Logs and OTel traces. |
| X-Client-Request-ID | Response | Echoed back from the X-Request-ID header sent by the client, if present. Allows correlation between gateway records and application-side request identifiers. |
When a client sends an X-Request-ID header, the gateway preserves it as X-Client-Request-ID in the response and generates its own X-Request-ID. Both IDs appear in Request Logs and OpenTelemetry traces.
Sending a correlation ID
curl -i https://PROXY_URL/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Request-ID: my-session-abc-123" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Response headers:
HTTP/2 200
x-request-id: 8f14e45f-ceea-467f-a8f0-6b2a1c3d4e5f
x-client-request-id: my-session-abc-123
content-type: application/json
The x-request-id value (8f14e45f-...) is the gateway's canonical identifier. The x-client-request-id (my-session-abc-123) is the original client ID, preserved for joining gateway records with application-side logs.
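On the application side, the two identifiers can be read from the response headers with a small helper. This is a sketch only: `new_client_request_id` and `correlation_ids` are hypothetical helper names, and the ID format produced here (a prefix plus a UUID) is an assumption, since the gateway accepts any client-chosen value such as `my-session-abc-123`.

```python
import uuid
from typing import Mapping, Optional, Tuple


def new_client_request_id(prefix: str = "app") -> str:
    """Generate a value to send as the X-Request-ID request header."""
    return f"{prefix}-{uuid.uuid4()}"


def correlation_ids(headers: Mapping[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (gateway_id, client_id) from response headers.

    Header names are matched case-insensitively, because HTTP/2 responses
    carry lowercase names while some HTTP clients preserve other casings.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get("x-request-id"), lowered.get("x-client-request-id")
```

Applied to the response headers shown above, `correlation_ids` returns the gateway UUID first and `my-session-abc-123` second.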
Data captured per request
Every request processed by the gateway produces a record containing:
| Field | Description |
|---|---|
| Request ID | The gateway-assigned X-Request-ID UUID |
| Client Request ID | The echoed X-Client-Request-ID, if provided by the client |
| Timestamp | When the request was received by the gateway |
| Model | The model identifier specified in the request |
| Provider | The upstream provider that served the response |
| HTTP Status | The final HTTP status code returned to the client |
| Latency | Total round-trip time from gateway receipt to completed response |
| Token usage | Prompt tokens, completion tokens, and total tokens |
| Request / response body | Full payloads, subject to the deployment's data retention settings |
This data is searchable in Request Logs and is also available as OpenTelemetry span attributes in the tracing backend.
Correlating in observability tools
- Request Logs. Search by X-Request-ID (gateway-assigned) or X-Client-Request-ID (the application's ID) to retrieve the complete request record: provider used, latency breakdown, token counts, and raw request and response payloads.
- OTel traces. Both IDs are emitted as span attributes on every trace. Search for them in the tracing backend (Jaeger, Grafana Tempo, Honeycomb, Datadog, and others) to view the full execution path including any fallback retries.
Custom application-defined headers (agent-session-id, user-id, workflow-run-id, and similar) are forwarded as OpenTelemetry span attributes. This allows traces to be grouped or filtered by any application-level concept (conversation ID, agent run, team, deployment) without affecting how the gateway routes requests.
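As an illustration, a client might assemble its request headers as follows. The header names `agent-session-id` and `workflow-run-id` come from the examples above; `build_request_headers` and the reserved-name check are assumptions made for this sketch, not gateway requirements.

```python
from typing import Dict, Mapping

# Headers the client already sets; custom context must not overwrite them.
RESERVED = {"authorization", "content-type", "x-request-id"}


def build_request_headers(api_key: str, client_request_id: str,
                          context: Mapping[str, object]) -> Dict[str, str]:
    """Combine the standard request headers with application-defined ones."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "X-Request-ID": client_request_id,
    }
    for name, value in context.items():
        if name.lower() in RESERVED:
            raise ValueError(f"{name} is reserved and cannot carry trace context")
        headers[name] = str(value)  # header values must be strings
    return headers
```

For example, `build_request_headers("YOUR_API_KEY", "my-session-abc-123", {"agent-session-id": "s-1"})` yields the three standard headers plus `agent-session-id`.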
Debugging workflow
A typical end-to-end debugging workflow using correlation IDs:
- Capture the response header. Log the x-request-id value from every response in the application. For error responses, also capture the response body.
- Search Request Logs. Paste the x-request-id value into the search field in Request Logs to retrieve the full request record: which provider was used, total latency, token counts, fallback attempts, and raw payloads.
- Inspect the trace. If OTel export is configured, search for the same x-request-id as a span attribute in the tracing backend to see provider-level timing and any retry hops.
- Join with application logs. If a custom X-Request-ID was sent, use the echoed x-client-request-id in the gateway record to join against application logs and reconstruct the full request context.
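The first step of this workflow can be sketched with Python's standard `logging` module. The logger name, the record field names, and the choice to log JSON are assumptions for illustration; the only behaviour taken from this page is logging x-request-id for every response and the body for error responses.

```python
import json
import logging
from typing import Mapping

logger = logging.getLogger("gateway_client")  # hypothetical logger name


def gateway_log_record(status: int, headers: Mapping[str, str], body: str) -> dict:
    """Build the fields worth logging for one gateway response."""
    lowered = {name.lower(): value for name, value in headers.items()}
    record = {
        "status": status,
        "x_request_id": lowered.get("x-request-id"),
        "x_client_request_id": lowered.get("x-client-request-id"),
    }
    if status >= 400:
        record["error_body"] = body  # keep the body only for error responses
    return record


def log_gateway_response(status: int, headers: Mapping[str, str], body: str) -> dict:
    """Log the record as one JSON line and return it for further use."""
    record = gateway_log_record(status, headers, body)
    level = logging.ERROR if status >= 400 else logging.INFO
    logger.log(level, json.dumps(record))
    return record
```

With the IDs in every application log line, the later steps reduce to copying `x_request_id` into Request Logs or the tracing backend.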
Related
- Supported APIs: API formats and endpoint reference
- OTel Metrics: metric names, types, and labels exported by the gateway
- Audit Log Events: event schema for administrative actions