Cost control and chargeback
Accurate GPU utilization, reclaimed idle capacity, and chargeback that holds up
The most common state of an expensive GPU cluster is high allocation and low utilization. Allocation says the cluster is full; DCGM shows the GPUs at 15% utilization. We help you see where capacity is going, reclaim what is idle, and produce chargeback records that attribute usage to the user, project, or team that incurred it.
What we deliver
Visibility, reclamation, and attribution
In that order. First you need accurate utilization data. Then you reclaim the idle capacity. Then you attribute what was used so the teams incurring cost have an incentive to use capacity efficiently.
Accurate utilization data
DCGM and Prometheus measure actual GPU utilization, memory use, and SM activity per pod, per node, and per job, and the measurements are joined to the user and project that own the workload. The gap between allocation and use becomes visible.
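A minimal sketch of the allocation-versus-use comparison, assuming per-pod samples have already been joined to an owning project. In practice the utilization figure might come from a dcgm-exporter metric such as `DCGM_FI_DEV_GPU_UTIL` scraped by Prometheus; here the `PodSample` record and the `allocation_vs_use` function are hypothetical names, and the samples are hard-coded.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PodSample:
    pod: str
    project: str   # owning project, joined from pod labels (assumed available)
    gpus: int      # GPUs allocated to the pod
    util: float    # mean GPU utilization over the interval, as a 0.0-1.0 fraction
    hours: float   # length of the sampling interval in hours


def allocation_vs_use(samples):
    """Return {project: (allocated_gpu_hours, utilized_gpu_hours)}."""
    allocated = defaultdict(float)
    utilized = defaultdict(float)
    for s in samples:
        gpu_hours = s.gpus * s.hours
        allocated[s.project] += gpu_hours
        # Utilized GPU-hours weight allocated time by measured utilization.
        utilized[s.project] += gpu_hours * s.util
    return {p: (allocated[p], utilized[p]) for p in allocated}
```

For a project whose notebook held one GPU for ten hours at 5% utilization, this reports 10 allocated GPU-hours against 0.5 utilized, which is the gap the dashboards surface.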
Idle detection and reclamation
Workloads that hold GPUs while doing nothing get flagged and reclaimed: idle interactive sessions culled after a configurable window, zombie pods cleaned up, abandoned jobs terminated. Capacity returns to the queue.
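One way to implement the configurable idle window, sketched under assumptions: a session is culled only if it is older than the window and every utilization sample inside the window is below a threshold. The `should_cull` name, the two-hour window, and the 5% threshold are illustrative defaults rather than recommendations, and the code only makes the decision; the action of stopping the session is left out.

```python
from datetime import datetime, timedelta


def should_cull(session_start, samples, now,
                window=timedelta(hours=2), threshold=0.05):
    """Return True if the session stayed below `threshold` for the whole `window`.

    samples: list of (timestamp, utilization) pairs, utilization as 0.0-1.0.
    Assumes samples arrive at a regular interval; sparse sampling can miss bursts.
    """
    # A session younger than the window cannot have been idle for all of it.
    if now - session_start < window:
        return False
    recent = [u for t, u in samples if now - window <= t <= now]
    # No telemetry in the window: do not reclaim on missing data.
    if not recent:
        return False
    return max(recent) < threshold
```

Refusing to cull when telemetry is missing errs on the side of not killing someone's work because an exporter went down.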
Quota and fair-share scheduling
Quota so one team cannot exhaust the cluster, and fair-share so a team that has used more than its share yields priority to one that has not. The queue stays fair without manual arbitration.
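A sketch of quota and fair-share working together, assuming each team has a hard GPU quota and a normalized share of the cluster. The priority factor `2 ** (-usage / share)` is one common choice, similar in form to Slurm's classic fair-share factor: it is 1.0 for a team that has used nothing, 0.5 when usage equals share, and falls toward zero as a team overspends. `Team`, `fair_share_factor`, `admissible`, and `order_queue` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Team:
    name: str
    share: float     # normalized entitlement; shares across teams sum to 1.0
    usage: float     # normalized recent usage (fraction of cluster GPU-hours)
    quota_gpus: int  # hard cap on GPUs held at once
    held_gpus: int   # GPUs currently held


def fair_share_factor(team):
    # 1.0 when unused, 0.5 at usage == share, approaching 0 as usage grows.
    return 2 ** (-team.usage / team.share)


def admissible(team, requested):
    # Quota check: the request must not push the team past its cap.
    return team.held_gpus + requested <= team.quota_gpus


def order_queue(pending):
    """pending: list of (team, requested_gpus).

    Drops requests that would exceed quota, then orders the rest so the
    team furthest under its share goes first.
    """
    ok = [(t, n) for t, n in pending if admissible(t, n)]
    return sorted(ok, key=lambda job: fair_share_factor(job[0]), reverse=True)
```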
Per-user, per-project chargeback records
GPU-hours attributed to the user, project, and cost center that consumed them, exported in a format your finance or FinOps tooling can ingest. We document the attribution methodology so the numbers can survive a dispute.
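A minimal sketch of turning job records into chargeback rows, assuming each record already carries user, project, cost center, GPU count, and wall-clock hours, and that finance ingests CSV. The flat `rate_per_gpu_hour`, the column names, and the `chargeback_csv` function are illustrative assumptions; a real export would follow whatever schema your FinOps tooling expects.

```python
import csv
import io
from collections import defaultdict


def chargeback_csv(jobs, rate_per_gpu_hour):
    """jobs: iterable of dicts with user, project, cost_center, gpus, hours.

    Aggregates GPU-hours per (user, project, cost_center) and returns CSV text.
    """
    totals = defaultdict(float)
    for j in jobs:
        key = (j["user"], j["project"], j["cost_center"])
        totals[key] += j["gpus"] * j["hours"]

    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["user", "project", "cost_center", "gpu_hours", "cost"])
    # Sorted output keeps exports stable and easy to diff between periods.
    for (user, project, cost_center), gpu_hours in sorted(totals.items()):
        writer.writerow([user, project, cost_center,
                         f"{gpu_hours:.2f}",
                         f"{gpu_hours * rate_per_gpu_hour:.2f}"])
    return buf.getvalue()
```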
Reserved vs on-demand analysis
If you rent GPUs, the split between reserved commitments and on-demand capacity is a significant cost variable. We model your actual usage patterns against your pricing and recommend a split.
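The core of the reserved-versus-on-demand model can be sketched as a search over reservation sizes, assuming reserved GPUs are billed every hour whether used or not, on-demand GPUs are billed only for hours actually used, and demand is expressed as GPUs needed per hour. Prices and the `total_cost` and `best_reservation` names are hypothetical.

```python
def total_cost(demand, reserved, p_reserved, p_on_demand):
    """Cost of a reservation size over an hourly demand series.

    demand: GPUs needed in each hour.
    reserved: GPUs committed for every hour, billed whether used or not.
    """
    committed = reserved * len(demand) * p_reserved
    # Demand above the reservation spills over to on-demand capacity.
    overflow_gpu_hours = sum(max(0, d - reserved) for d in demand)
    return committed + overflow_gpu_hours * p_on_demand


def best_reservation(demand, p_reserved, p_on_demand):
    # Try every size from zero to peak demand; ties go to the smaller commitment.
    return min(range(max(demand) + 1),
               key=lambda r: total_cost(demand, r, p_reserved, p_on_demand))
```

The intuition behind the result: one more reserved GPU pays for itself only if it would be busy in more than `p_reserved / p_on_demand` of the hours. For the demand series `[4, 4, 4, 4, 8, 8, 2, 2]` with reserved capacity at half the on-demand price, the fourth GPU is busy in six of eight hours and the fifth in only two, so reserving four gives the lowest total cost.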
Utilization reporting
Dashboards and exportable reports that answer the questions leadership asks: what was spent, who spent it, what fraction produced useful work, and how the trend is moving.
Two different problems
Internal chargeback vs operator invoicing
Splitting one cluster across your own teams
If you run a shared GPU cluster inside one organization and need to attribute usage to business units, that is workload-level chargeback: who ran what, for how long, sourced from DCGM, job records, and workspace lifecycle events. That is the subject of this page.
Billing external tenants for hardware
If you operate a GPU cloud and invoice external customers, the billing unit is hardware-time, not workload-time, and the data source is the allocation ledger, not DCGM. That is a different model covered in the tenant platform service line and the operator chargeback page.
Showback before chargeback
Most organizations start with showback: show each team what they consumed and what it cost, with no money changing hands. The visibility alone changes behavior. Real internal chargeback comes once the numbers are trusted.
Attribution methodology matters
Chargeback only holds up if the team being charged believes the number. We attribute carefully across whole-GPU, MIG slice, shared node, and idle time cases, and document the methodology so it survives the first dispute.
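Two of those cases can be sketched concretely, under stated assumptions: MIG slices are converted to whole-GPU equivalents by compute fraction (assuming a GPU with seven compute slices, as on the A100, so a `1g.10gb` slice counts as 1/7 of a GPU), and idle capacity is either spread across projects in proportion to their direct use or reported as a separate line. Weighting by memory fraction instead would give different numbers, which is exactly the kind of choice the methodology should record. `gpu_equivalent` and `attribute` are hypothetical names.

```python
from collections import defaultdict

# Assumption: an A100-style GPU exposing seven MIG compute slices.
MIG_COMPUTE_SLICES = 7


def gpu_equivalent(resource):
    """'gpu' -> 1.0; a MIG profile such as '3g.40gb' -> 3/7 of a GPU."""
    if resource == "gpu":
        return 1.0
    slices = int(resource.split("g.", 1)[0])
    return slices / MIG_COMPUTE_SLICES


def attribute(usages, capacity_gpu_hours, spread_idle=True):
    """usages: list of (project, resource, count, hours).

    Returns (per-project GPU-hours, unattributed idle GPU-hours).
    """
    direct = defaultdict(float)
    for project, resource, count, hours in usages:
        direct[project] += gpu_equivalent(resource) * count * hours
    used = sum(direct.values())
    idle = max(0.0, capacity_gpu_hours - used)
    if not spread_idle or used == 0:
        # Policy A: idle time stays a separate line (e.g. charged to the platform).
        return dict(direct), idle
    # Policy B: idle time is spread in proportion to each project's direct use.
    return {p: h + idle * h / used for p, h in direct.items()}, 0.0
```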
Where GPU spend goes to waste
Common sources, and what we do about each
| Source | What we do |
|---|---|
| Idle interactive sessions | Auto-shutdown after a configurable idle window, so a notebook left open overnight stops holding a GPU. |
| Zombie and abandoned pods | Detection and reaping of pods that hold GPUs without running useful work. |
| Over-allocated jobs | Right-sizing: identifying where a MIG slice would suffice, or where fewer GPUs are warranted based on observed utilization. |
| Capacity hoarding | Quota and fair-share so capacity is shared rather than held by whichever team submits first. |
| Wrong reserved/on-demand split | Usage modeling against your pricing to minimize total cost. |
| No accountability | Chargeback that attributes cost to the teams that incurred it, which changes behavior without manual enforcement. |