Manage budgets for users and groups

Spend questions rarely arrive as questions about API keys. They arrive as questions about people and the parts of the business those people belong to. A finance partner wants to know what the data-science organisation spent last month, not what a particular service credential consumed. A business owner wants a ceiling on a department's monthly AI bill, expressed in money rather than tokens. A platform operator, sitting between the two, needs a way to answer both at the dimension the conversation actually happens in: the user and the group.


Working with budgets establishes the per-key budgeting discipline: rate limits as the inline ceiling, usage analytics as the after-the-fact view, and the API-key-per-purpose convention that keeps both precise. This guide extends that same discipline to the user and group dimensions. The mechanisms are the ones already described there; what changes is the unit of analysis, the reporting that finance and business owners actually consume, and the decision about whether a budget should warn or block when it is exceeded. This guide assumes Tetrate Agent Router is already in operation, that the per-key conventions are in place, and that group identity flows in from the organisation's identity provider.

Persona: Platform operator working in the Admin Dashboard, in partnership with the finance and business owners who set the spend ceilings and consume the reports.

Estimated time: 20 to 30 minutes for an initial setup with one or two groups; ongoing as the group structure and workloads evolve.

When this guide applies

This guide is the right surface whenever spend needs to be tracked, capped, or reported against people and organisational units rather than against individual credentials:

| Situation | Approach that fits |
| --- | --- |
| A department's monthly AI spend should stay under a ceiling agreed with finance | A group budget with the appropriate enforcement mode |
| A single user's consumption should be bounded regardless of how many keys they hold | A per-user budget aggregating across that user's keys |
| Finance needs a monthly statement of spend per business function for chargeback | Group-based consumption reports, exported per billing cycle |
| A business owner wants to be warned, not blocked, when a team approaches its ceiling | A group budget in soft-enforcement mode with alerting |
| A capped evaluation budget for a team must hard-stop when exhausted | A group budget in hard-enforcement mode |
| Spend must be attributed to cost centres that do not map cleanly to individual keys | Group budgets aligned to the business-function group mapping |

For the per-credential view (bounding a single integration, a leaked key, or a CI pipeline) the per-key patterns in Working with budgets remain the right tool. The two dimensions are complementary: a key budget bounds a credential, a user budget bounds a person, and a group budget bounds an organisational unit.

Outcomes

By the end of this guide:

  • Usage and cost, in both tokens and money, can be read per user and per group over a chosen period.
  • At least one per-user and one per-group monthly budget limit has been set, with an enforcement mode chosen deliberately.
  • The distinction between soft enforcement (warn and monitor) and hard enforcement (block on exhaustion) is understood, along with when each is appropriate.
  • A user- or group-based consumption report has been produced and exported in a form suitable for chargeback or showback.
  • The relationship between budgets and rate limits, a spend cap versus flow control, is clear, and the two are configured to complement rather than duplicate each other.

Prerequisites

  • Administrator access to the Admin Dashboard, typically the super_admin or billing_admin role.
  • API keys that follow the per-purpose convention, so that usage attributed to a user or group is precise rather than blurred across shared credentials. The convention is established in Onboard developers and issue keys.
  • Group identity flowing in from the identity provider. Group budgets are only as meaningful as the group membership behind them; the mapping from identity-provider groups to business functions is covered in Map Entra ID groups to business functions.
  • Agreement with finance or the relevant business owners on the ceilings themselves and on what should happen when a ceiling is reached. A budget is a policy decision before it is a configuration; the operator implements the decision rather than originating it.

Step 1: track usage and cost per user and per group

Before a budget can be set sensibly, the existing consumption has to be visible at the same dimension the budget will use. Usage Analytics is the surface for this, and the By User breakdown is the entry point.

  1. Sign in to the Admin Dashboard and open Usage Analytics.
  2. Apply a time range that matches the budgeting cadence: last 30 days or the last billing cycle is the usual choice for a monthly budget.
  3. Switch to the By User breakdown and review the ranked list. Each row carries the request count, token totals, and estimated cost for that user across all of their keys.
  4. Select a user row to drill into that user's individual usage: the specific keys, models, and costs behind the row. This is the fastest way to move from "this person is an outlier" to the reason why.

Both units of measure matter, and they answer different questions. Token totals describe the workload, how much the user or group is actually asking the models to do, and are stable across pricing changes. Cost translates that workload into money using the configured model pricing, and is the unit finance and business owners think in. A budget conversation is almost always conducted in money; a sizing or capacity conversation is usually conducted in tokens. Both are available in the same breakdown.
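The relationship between the two units can be sketched as below. This is an illustration only: the model name, the per-million-token prices, and the split into input and output prices are assumptions made for the example, not the platform's configured pricing.

```python
from decimal import Decimal

# Hypothetical prices per million tokens; real values come from the configured model pricing.
PRICING = {"model-a": {"input": Decimal("3.00"), "output": Decimal("15.00")}}


def estimate_cost(model, input_tokens, output_tokens, pricing=PRICING):
    """Translate a token workload into money using per-million-token prices."""
    price = pricing[model]
    million = Decimal(1_000_000)
    return (input_tokens * price["input"] + output_tokens * price["output"]) / million


# The token counts describe the workload; the cost depends on the price table in force.
cost = estimate_cost("model-a", input_tokens=2_000_000, output_tokens=500_000)
```

If the price table changes, the same token totals produce a different cost, which is why token totals are the steadier unit for sizing and cost is the unit for budgeting.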

The group dimension is the aggregate of its members. Where the deployment surfaces a group breakdown directly, it is read the same way as the By User view. Where it does not, group spend is derived by summing the cost of the users mapped to that group, which is exactly why the group mapping in Map Entra ID groups to business functions has to be correct before group budgets carry any weight. A user counted in the wrong group is spend charged to the wrong cost centre.
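Where a group breakdown is not surfaced directly, the roll-up described above can be sketched as follows. This is a minimal illustration rather than the platform's implementation: the row fields (`user`, `tokens`, `cost`) and the `user_to_group` mapping are hypothetical stand-ins for the By User figures and the identity-provider group mapping.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical By User rows; the field names are assumptions, not the platform's schema.
user_rows = [
    {"user": "alice", "tokens": 1_200_000, "cost": Decimal("18.40")},
    {"user": "bob", "tokens": 300_000, "cost": Decimal("4.10")},
    {"user": "carol", "tokens": 2_500_000, "cost": Decimal("41.75")},
    {"user": "dave", "tokens": 50_000, "cost": Decimal("0.70")},
]

# Hypothetical mapping of users to business-function groups; dave is deliberately missing.
user_to_group = {"alice": "data-science", "bob": "data-science", "carol": "support"}


def roll_up_by_group(rows, mapping, unmapped="UNMAPPED"):
    """Sum tokens and cost per group, keeping unmapped users visible in their own bucket."""
    totals = defaultdict(lambda: {"tokens": 0, "cost": Decimal("0")})
    for row in rows:
        group = mapping.get(row["user"], unmapped)
        totals[group]["tokens"] += row["tokens"]
        totals[group]["cost"] += row["cost"]
    return dict(totals)


group_totals = roll_up_by_group(user_rows, user_to_group)
```

Routing unmapped users into a visible bucket, rather than dropping them, is a design choice in this sketch: spend that belongs to no group is exactly the spend a mapping review needs to see.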

Step 2: set a monthly budget limit per user and per group

A budget is a spend ceiling expressed over a period, typically a calendar month, attached to a user or a group rather than to a single key. Setting one is the act of recording the ceiling agreed with finance and selecting how the platform should behave as the ceiling is approached and crossed.

  1. Open the budget configuration for the target user or group from the Admin Dashboard.
  2. Set the period to match the agreed cadence: a monthly budget aligned to the billing cycle is the common case.
  3. Enter the ceiling in the unit finance agreed to. A money ceiling is the usual choice for a user or group budget, because the conversation behind it was conducted in money.
  4. Choose the enforcement mode, soft or hard, as described in Step 3.
  5. Record the reasoning behind the ceiling somewhere durable. The audit trail in Audit platform activity captures that the budget changed and who changed it; the business justification for the number belongs alongside it.

A per-user budget bounds a person across every key they hold, which is the right unit when the concern is an individual's consumption regardless of how their work is split across credentials. A per-group budget bounds an organisational unit across all of its members, which is the right unit when finance owns a cost centre and wants a single ceiling over everyone charged to it.

The two layers coexist. A group can carry an overall ceiling while individual members carry their own, tighter ceilings underneath it; the most restrictive applicable limit is the one that governs. Sizing the numbers follows the same judgment as sizing a rate limit: derive the baseline from the historical spend read in Step 1, set the ceiling comfortably above normal consumption but well below any level that would be a problem, and tighten a new budget that has no history rather than leaving it loose.
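One way to picture "the most restrictive applicable limit governs" is the sketch below. It assumes, for illustration, that the governing budget is the one with the least remaining headroom; the scope names and figures are hypothetical, and the platform's own evaluation order may differ.

```python
from decimal import Decimal


def remaining_headroom(limits, spent):
    """Return (governing_scope, headroom) for the tightest applicable budget.

    limits: scope name -> monthly ceiling; spent: scope name -> spend so far.
    Headroom is clamped at zero once a ceiling has been crossed.
    """
    zero = Decimal("0")
    headrooms = {
        scope: max(ceiling - spent.get(scope, zero), zero)
        for scope, ceiling in limits.items()
    }
    governing = min(headrooms, key=headrooms.get)
    return governing, headrooms[governing]


# Hypothetical figures: the group has plenty left, the user is near their own ceiling.
limits = {"group:data-science": Decimal("5000"), "user:alice": Decimal("400")}
spent = {"group:data-science": Decimal("3100"), "user:alice": Decimal("380")}

scope, headroom = remaining_headroom(limits, spent)
```

Here the user's own ceiling governs even though the group still has room, which is the behaviour a tighter per-member budget is meant to produce.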

Step 3: choose soft or hard enforcement

The single most consequential decision in a budget is what happens when it is exceeded. There are two modes, and they suit different situations.

| Mode | Behaviour at the ceiling | When it is appropriate |
| --- | --- | --- |
| Soft enforcement | Spend is tracked against the ceiling and an alert is raised as the budget is approached or crossed, but requests continue to be served | Production workloads where cutting traffic off mid-month is a worse outcome than a temporary overspend; budgets that are forecasting and accountability tools rather than hard contractual caps |
| Hard enforcement | Requests are blocked once the ceiling is reached, until the period resets or the ceiling is raised | Capped evaluation or experimentation budgets; teams with a genuinely fixed allocation; any situation where overspend is unacceptable and a hard stop is preferable to an invoice surprise |

Soft enforcement is the safer default for anything serving real users. A budget overrun is rarely an emergency, whereas a sudden block on a production path looks like an outage to the business it serves. The alert gives the operator and the business owner time to decide deliberately (raise the ceiling, investigate the cause, or accept the overspend) rather than having the platform decide for them.

Hard enforcement is the right tool when the ceiling is a real boundary rather than a guideline. An evaluation budget that is meant to be spent down and then stop, a team whose allocation genuinely cannot be exceeded without approval, a cost centre under a strict freeze: in each of these, a block is the intended behaviour and a silent overrun would defeat the purpose. The cost of hard enforcement is that legitimate work can be cut off at an inconvenient moment. It therefore pairs best with an alert raised well before the ceiling, which makes the block anticipated rather than a surprise.

Many deployments combine the two across the group structure: production-facing groups run soft so that customer traffic is never cut, while evaluation and discretionary groups run hard so that a runaway is stopped automatically. The choice is a business decision; the operator's role is to implement it and to make sure the alerting behind a soft budget is actually wired to someone who will act on it.
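The difference between the two modes can be sketched as a single decision function. The names here (`Mode`, `Budget`, `evaluate`) and the 80% early-warning threshold are assumptions made for illustration; they are not the platform's configuration fields.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class Mode(Enum):
    SOFT = "soft"
    HARD = "hard"


@dataclass(frozen=True)
class Budget:
    ceiling: Decimal
    mode: Mode
    alert_at: Decimal = Decimal("0.8")  # assumed early-warning fraction of the ceiling


def evaluate(budget: Budget, spent: Decimal) -> tuple[bool, bool]:
    """Return (serve_request, raise_alert) for the current spend."""
    alert = spent >= budget.ceiling * budget.alert_at
    exhausted = spent >= budget.ceiling
    # Only a hard budget blocks; a soft budget keeps serving and relies on the alert.
    serve = not (exhausted and budget.mode is Mode.HARD)
    return serve, alert


soft = Budget(ceiling=Decimal("1000"), mode=Mode.SOFT)
hard = Budget(ceiling=Decimal("1000"), mode=Mode.HARD)
```

Both modes alert at the same point in this sketch; they differ only in what happens once the ceiling is reached, which mirrors the table above.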

Step 4: produce user- and group-based consumption reports

Reporting is where budgets earn their keep with finance and business owners. The same Usage Analytics breakdowns used to track spend in Step 1 are the basis for the periodic statement.

  1. Open Usage Analytics and set the time range to the reporting period. For a monthly statement, this is the closed billing cycle.
  2. Select the By User breakdown for a per-person statement, or the group view for a per-business-function statement.
  3. Sort by cost descending so that the largest consumers sit at the top, which is the order most reviews want to read in.
  4. Review the figures against the budgets set in Step 2. A consumption report read next to the ceiling it was measured against is far more useful than either in isolation.

A consumption report is a point-in-time statement of what was spent; a budget is the ceiling it is measured against. Presenting them together (spend, ceiling, and the gap between them, per user and per group) is what turns raw usage data into something a business owner can act on. For the combined investigation pattern, where an unexpected line item in a report is traced back to the change that caused it, the spot-the-anomaly-then-find-the-cause workflow in Audit platform activity is the companion surface.
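Presenting spend, ceiling, and gap together can be sketched as below. The group names and figures are hypothetical, and the function is an offline illustration of the layout rather than a platform feature.

```python
from decimal import Decimal

# Hypothetical figures for a closed billing cycle.
spend = {"support": Decimal("950.00"), "data-science": Decimal("4200.00")}
ceilings = {"data-science": Decimal("5000.00"), "support": Decimal("800.00")}


def statement(spend, ceilings):
    """Rows of (name, spend, ceiling, gap, percent_used), largest spender first."""
    rows = []
    for name, spent in sorted(spend.items(), key=lambda kv: kv[1], reverse=True):
        ceiling = ceilings.get(name)  # None when no budget has been set
        gap = ceiling - spent if ceiling is not None else None
        used = round(spent / ceiling * 100, 1) if ceiling else None
        rows.append((name, spent, ceiling, gap, used))
    return rows


report = statement(spend, ceilings)
for name, spent, ceiling, gap, used in report:
    print(f"{name:<14}{spent!s:>10}{ceiling!s:>10}{gap!s:>10}{used!s:>8}")
```

A negative gap marks a budget that was exceeded during the period, which is the line a business owner will want to read first.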

Step 5: export data for chargeback and showback

Chargeback (billing each business unit for its own consumption) and showback (reporting that consumption without actually moving money) both depend on getting the spend data out of the platform and into whatever financial system the organisation already runs.

  1. Apply the time range and the user or group breakdown that match the chargeback period and the cost-centre structure.
  2. Use the Export function to download the filtered data, typically as CSV.
  3. Load the exported file into the billing, finance, or reporting system that owns the chargeback or showback process.

The exported rows carry the dimensions chargeback needs (user, cost, token totals, and the period they cover) in a form a downstream system can join to its own cost-centre records. The platform's role ends at producing accurate, attributable data; the allocation of that data to ledgers and the decision to charge or merely to show belongs to the finance process consuming it. For organisations that need always-on cost dashboards rather than a periodic export, the OpenTelemetry export described in Working with budgets carries the same per-request dimensions into an external observability stack.
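On the consuming side, joining an export to cost-centre records might look like the sketch below. The CSV header names, the sample rows, and the cost-centre codes are assumptions for illustration; the real export schema should be checked before anything is built on it.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Stand-in for an exported file; the header names are assumptions, not the real export schema.
exported_csv = """user,tokens,cost,period
alice,1200000,18.40,2025-01
bob,300000,4.10,2025-01
carol,2500000,41.75,2025-01
erin,10000,0.15,2025-01
"""

# Hypothetical cost-centre records owned by the finance system; erin is not yet recorded.
cost_centres = {"alice": "CC-100", "bob": "CC-100", "carol": "CC-200"}


def allocate(csv_text, centres):
    """Sum exported cost per (period, cost centre); unknown users are flagged, not dropped."""
    totals = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(csv_text)):
        centre = centres.get(row["user"], "UNALLOCATED")
        totals[(row["period"], centre)] += Decimal(row["cost"])
    return dict(totals)


ledger = allocate(exported_csv, cost_centres)
```

Parsing the cost column as `Decimal` rather than `float` keeps the sums exact, which matters once the figures land in a ledger.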

Step 6: attribute cost with caller-set tags

User and group are fixed dimensions: spend is grouped by who issued the request and which organisational unit they belong to. A chargeback conversation frequently runs along a different axis (the team that owns a workload, the application a request belongs to, or the project a cost should be booked against) and those dimensions do not always map cleanly onto identity-provider groups. Attribution tags supply that missing axis. A tag is a label the caller sets on a request (a team, an app, a project, or whatever cost dimension the business reports along) so that cost can be grouped beyond the fixed user and group breakdowns.

  1. Agree the tag dimensions with finance and the business owners before any are set. A tag is only useful for chargeback if everyone producing requests and everyone reading reports uses the same set of names; an ungoverned tag space produces a long tail of near-duplicate labels that no report can sum across.
  2. Establish how each tag is carried on the request, typically as request metadata the calling code attaches, alongside the credential that already identifies the user. The convention is documented in Monitor traffic and usage for the teams that own the workloads, so that the tags a report depends on are set consistently at the source.
  3. Constrain the tags by policy where the attribution has to be trustworthy. A policy can require that a tag is present before a request is served and can restrict its value to an agreed set, so that a caller can neither omit the attribution nor book spend against a team or project that is not theirs. Without this, a tag is a hint rather than a billable fact.
  4. Group a consumption report by a tag (by team or by project, for example) the same way Step 4 groups by user or group, and read it against the budgets the tagged workloads sit under.

Tag-based attribution coexists with the user and group dimensions rather than replacing them. A single request carries its user, its group membership, and whatever tags the caller set, so the same spend can be reported per person, per cost centre, and per project without re-running the workload. The fixed dimensions answer "who spent this"; the tags answer "what should it be booked against", and the two are most useful read together.

Where the attribution feeds chargeback, the policy constraint is what makes it defensible. A report grouped by a tag that any caller could set to any value invites disputes about whose budget a line item belongs to; a report grouped by a tag the platform required and validated carries the same authority as the user dimension behind it. That a tagging or attribution policy changed, and who changed it, is recorded in the audit trail described in Audit platform activity, alongside the budget changes from Step 2.
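The kind of constraint such a policy expresses, that a tag must be present and must take a value from an agreed set, can be sketched as below. The tag names, the allowed values, and the `validate_tags` helper are hypothetical; this is not the platform's policy syntax, and it does not check whether a caller is entitled to a particular team or project.

```python
# Hypothetical policy: tag names and allowed values are illustrative only.
TAG_POLICY = {
    "team": {"search", "checkout", "platform"},
    "project": {"eval-2025", "assistant"},
}


def validate_tags(tags, policy=TAG_POLICY):
    """Return a list of violations; an empty list means the tags satisfy the policy."""
    problems = []
    for name, allowed in policy.items():
        value = tags.get(name)
        if value is None:
            problems.append(f"missing required tag: {name}")
        elif value not in allowed:
            problems.append(f"tag {name}={value!r} is not in the agreed set")
    return problems


ok = validate_tags({"team": "search", "project": "assistant"})
bad = validate_tags({"team": "Search"})
```

The second call fails twice: `Search` differs in case from the agreed `search`, and `project` is missing. Near-duplicates of this kind are what an ungoverned tag space accumulates.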

Step 7: export grouped cost reports for finance and team leads

A tag dimension earns its keep the same way the user and group dimensions do, in a report finance and team leads actually consume. The export described in Step 5 carries the tag dimensions out of the platform alongside the user, cost, and token totals already covered.

  1. Apply the time range, the tag grouping (by team or by project), and any user or group filter that match the chargeback period and the cost-centre structure.
  2. Use the Export function to download the grouped data, typically as CSV, with one row per tag value over the period.
  3. Hand the team- or project-grouped file to the finance process or the team lead who owns that cost, or load it into the billing system that owns the chargeback allocation.

A tag-grouped export gives a team lead a statement of their own team's spend without exposing the wider organisation's figures, and gives finance a per-project breakdown that maps onto the cost centres a user or group view cannot express. The platform's role still ends at producing accurate, attributable rows; the grouped file joins to the same downstream ledgers as the user- and group-based exports, along the dimension the business reports against.
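Producing a team lead's statement from a tag-grouped export can be sketched as a simple filter. The column names and rows below are assumptions for illustration, not the actual export layout.

```python
import csv
import io

# Hypothetical tag-grouped export rows; the column names are assumptions.
rows = [
    {"team": "search", "project": "assistant", "tokens": "900000", "cost": "13.20"},
    {"team": "search", "project": "eval-2025", "tokens": "150000", "cost": "2.05"},
    {"team": "checkout", "project": "assistant", "tokens": "400000", "cost": "6.10"},
]


def statement_for_team(rows, team):
    """Render only the given team's rows as CSV text, header included."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["team", "project", "tokens", "cost"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(r for r in rows if r["team"] == team)
    return out.getvalue()


search_csv = statement_for_team(rows, "search")
```

Filtering before the file leaves the operator's hands is what keeps other teams' figures out of a statement meant for one team lead.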

How budgets relate to rate limits

Budgets and rate limits are easy to conflate because both are ceilings, but they answer different questions and operate on different axes.

| | Budget | Rate limit |
| --- | --- | --- |
| Controls | Total spend over a period, a spend cap | Request or token throughput over a short window, flow control |
| Unit | Money or cumulative tokens, per user or group | Tokens per rolling window, per key |
| Question answered | "How much may this person or team spend this month?" | "How fast may this credential consume right now?" |
| Failure it prevents | A cumulative overrun discovered at the end of the period | A burst or runaway happening in real time |

A rate limit is flow control: it bounds how fast a single credential consumes within a short window, and it fires the moment traffic crosses that rate. A budget is a spend cap: it bounds cumulative consumption over a whole period, and it tracks toward exhaustion across many requests and many keys. A rate limit set generously can still allow a month-long drift that quietly exhausts a budget; a budget can be intact while a single misconfigured key bursts hard enough to need a rate limit. The two are configured to complement each other rather than duplicate: the rate limit catches the fast runaway at the key, and the budget catches the slow accumulation at the user or group. Configuring rate limits is covered in Set rate limits.
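The contrast can be made concrete with a toy model of each control. Both classes below are simplified sketches with hypothetical names and numbers; they are not how the platform implements either mechanism.

```python
from collections import deque


class RollingWindowLimit:
    """Flow control: caps tokens consumed inside a short rolling window."""

    def __init__(self, max_tokens, window_seconds):
        self.max_tokens = max_tokens
        self.window = window_seconds
        self.events = deque()  # (timestamp, tokens) pairs still inside the window

    def allow(self, now, tokens):
        # Forget anything that has aged out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()
        if sum(t for _, t in self.events) + tokens > self.max_tokens:
            return False
        self.events.append((now, tokens))
        return True


class PeriodBudget:
    """Spend cap: accumulates cost across the whole period and never forgets."""

    def __init__(self, ceiling):
        self.ceiling = ceiling
        self.spent = 0

    def record(self, cost):
        self.spent += cost
        return self.spent <= self.ceiling


limit = RollingWindowLimit(max_tokens=5_000, window_seconds=60)
budget = PeriodBudget(ceiling=100)

# Slow drift: one modest request a minute never trips the rate limit,
# yet the cumulative cost eventually crosses the budget.
rate_limited = 0
within_budget = True
for minute in range(150):
    if not limit.allow(now=minute * 60, tokens=1_000):
        rate_limited += 1
    within_budget = budget.record(cost=1)
```

In this run the rate limit never fires while the budget is exceeded, which is the slow-accumulation case. A burst of six 1,000-token requests in the same second would show the opposite: the sixth is refused by the window while the budget is barely touched.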

What to do next

  • Set rate limits: the flow-control companion to the spend cap this guide configures. See Set rate limits.
  • Working with budgets: the per-key budgeting discipline that this guide extends to the user and group dimensions. See Working with budgets.
  • Map Entra ID groups to business functions: the group mapping that makes group budgets and chargeback meaningful. See Map Entra ID groups to business functions.
  • Audit platform activity: the analytics and audit surface this guide reads from and records against. See Audit platform activity.
  • Monitor traffic and usage: the developer-side view of the same consumption data, for the teams that own the spend. See Monitor traffic and usage.