Cost control and chargeback

Accurate GPU utilization, reclaimed idle capacity, and chargeback that holds up

The most common state of an expensive GPU cluster is high allocation and low utilization. Allocation says the cluster is full; DCGM says GPUs are at 15%. We help you see where capacity is going, reclaim what is idle, and produce chargeback records attributing usage to the user, project, or team that incurred it.


What we deliver

Visibility, reclamation, and attribution

In that order. First you need accurate utilization data. Then you reclaim the idle capacity. Then you attribute what was used so the teams incurring cost have an incentive to use capacity efficiently.

Accurate utilization data

DCGM and Prometheus measuring actual GPU utilization, memory, and SM activity per pod, per node, and per job, joined to the user and project that owns the workload. The gap between allocation and use, made visible.
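
As a rough illustration of the join described above, the following sketch averages per-scrape GPU utilization by owning project. The sample layout and the keys `project` and `gpu_util` are assumptions made for this example, not the schema of any particular exporter.

```python
from collections import defaultdict

def utilization_by_project(samples):
    """Average GPU utilization (percent) per project.

    Each sample is a dict with hypothetical keys:
      project  - owner label joined from workload metadata
      gpu_util - utilization percent (0-100) of one allocated GPU at one scrape
    """
    totals = defaultdict(lambda: [0.0, 0])  # project -> [sum, count]
    for s in samples:
        acc = totals[s["project"]]
        acc[0] += s["gpu_util"]
        acc[1] += 1
    return {p: total / count for p, (total, count) in totals.items()}

# Every GPU below is allocated, yet the projects use them very differently.
samples = [
    {"project": "vision", "gpu_util": 90.0},
    {"project": "vision", "gpu_util": 70.0},
    {"project": "nlp", "gpu_util": 10.0},
    {"project": "nlp", "gpu_util": 20.0},
]
print(utilization_by_project(samples))  # {'vision': 80.0, 'nlp': 15.0}
```

Both projects look identical from an allocation point of view; only the utilization average shows the gap.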

Idle detection and reclamation

Workloads that hold GPUs while doing nothing get flagged and reclaimed: idle interactive sessions culled after a configurable window, zombie pods cleaned up, abandoned jobs terminated. Capacity returns to the queue.
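
One simple way to express an idle window, shown here as a minimal sketch: treat a workload as idle when every utilization sample in the most recent window falls below a threshold. The function name, the 5% default threshold, and the sample format are assumptions for illustration; real policies typically also consider memory use and exemptions.

```python
def is_idle(util_samples, window, threshold=5.0):
    """Return True if the last `window` samples are all below `threshold` percent.

    util_samples is a chronological list of GPU utilization percentages.
    With fewer than `window` samples there is not enough history to judge,
    so the workload is not flagged.
    """
    if len(util_samples) < window:
        return False
    return all(u < threshold for u in util_samples[-window:])

print(is_idle([80.0, 2.0, 1.0, 0.0], window=3))  # True: last three samples are near zero
print(is_idle([0.0, 0.0, 50.0], window=3))       # False: recent activity
```

Requiring the whole window to be quiet, rather than the average, avoids culling a session that is bursty but active.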

Quota and fair-share scheduling

Quota so one team cannot exhaust the cluster, and fair-share so a team that has used more than its budget yields to one that has not. The queue stays fair without manual arbitration.
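
The fair-share idea can be sketched as an ordering by usage relative to budget: the team that has consumed the smallest fraction of its budget goes first. This is a simplified illustration, not the algorithm of any specific scheduler, and it assumes every budget is greater than zero.

```python
def fair_share_order(teams):
    """Order team names so the team furthest under its budget is served first.

    teams maps name -> (used_gpu_hours, budget_gpu_hours).
    Assumes each budget is positive.
    """
    return sorted(teams, key=lambda name: teams[name][0] / teams[name][1])

teams = {
    "a": (120.0, 100.0),  # over budget: ratio 1.2
    "b": (30.0, 100.0),   # ratio 0.3
    "c": (50.0, 200.0),   # ratio 0.25
}
print(fair_share_order(teams))  # ['c', 'b', 'a']
```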

Per-user, per-project chargeback records

GPU-hours attributed to the user, project, and cost center that consumed them, exported in a format your finance or FinOps tooling can ingest. We document the attribution methodology so the numbers can survive a dispute.
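
A minimal sketch of what such a record could look like: GPU-hours computed as wall-clock hours multiplied by GPU count, summed per (user, project, cost center), and written as CSV. The job fields and column names are hypothetical and would need to match whatever your finance tooling expects.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

def gpu_hours(jobs):
    """Sum GPU-hours per (user, project, cost_center).

    Each job is a dict with hypothetical keys: user, project, cost_center,
    gpus (int), start and end (datetime).
    """
    totals = defaultdict(float)
    for j in jobs:
        hours = (j["end"] - j["start"]).total_seconds() / 3600
        totals[(j["user"], j["project"], j["cost_center"])] += hours * j["gpus"]
    return dict(totals)

def to_csv(totals):
    """Render the totals as CSV text, one row per attribution key."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["user", "project", "cost_center", "gpu_hours"])
    for key in sorted(totals):
        writer.writerow([*key, f"{totals[key]:.2f}"])
    return buf.getvalue()

jobs = [
    {"user": "alice", "project": "vision", "cost_center": "cc-100", "gpus": 4,
     "start": datetime(2024, 1, 1, 0, 0), "end": datetime(2024, 1, 1, 2, 0)},
    {"user": "alice", "project": "vision", "cost_center": "cc-100", "gpus": 2,
     "start": datetime(2024, 1, 1, 2, 0), "end": datetime(2024, 1, 1, 3, 0)},
]
print(to_csv(gpu_hours(jobs)))
```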

Reserved vs on-demand analysis

If you rent GPUs, the split between reserved commitments and on-demand capacity is a significant cost variable. We model your actual usage patterns against your pricing and recommend how much capacity to reserve and how much to leave on-demand.
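
The underlying arithmetic can be sketched with a toy model: reserved GPUs are paid for every hour whether used or not, and any demand above the reservation is bought on-demand. The rates and the demand series below are invented for illustration, and the model ignores commitment terms, discounts, and availability limits.

```python
def total_cost(hourly_demand, reserved, reserved_rate, on_demand_rate):
    """Cost of serving hourly_demand (GPUs needed per hour) with `reserved` GPUs.

    Reserved GPUs are billed every hour; overflow is billed at the on-demand rate.
    """
    cost = 0.0
    for demand in hourly_demand:
        overflow = max(0, demand - reserved)
        cost += reserved * reserved_rate + overflow * on_demand_rate
    return cost

def best_reserved(hourly_demand, reserved_rate, on_demand_rate):
    """Reservation size (0..peak demand) with the lowest total cost.

    Assumes hourly_demand is non-empty.
    """
    candidates = range(max(hourly_demand) + 1)
    return min(
        candidates,
        key=lambda r: total_cost(hourly_demand, r, reserved_rate, on_demand_rate),
    )

demand = [2, 2, 10, 2]  # one short peak above a steady baseline
print(best_reserved(demand, reserved_rate=1.0, on_demand_rate=3.0))  # 2
```

In this example it is cheaper to reserve only the steady baseline and pay on-demand rates for the peak than to reserve for the peak.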

Utilization reporting

Dashboards and exportable reports that answer the questions leadership asks: what was spent, who spent it, what fraction produced useful work, and how the trend is moving.

Two different problems

Internal chargeback vs operator invoicing

Splitting one cluster across your own teams

If you run a shared GPU cluster inside one organization and need to attribute usage to business units, that is workload-level chargeback: who ran what, for how long, sourced from DCGM, job records, and workspace lifecycle. This is what this page covers.

Billing external tenants for hardware

If you operate a GPU cloud and invoice external customers, the billing unit is hardware-time, not workload-time, and the data source is the allocation ledger, not DCGM. That is a different model covered in the tenant platform service line and the operator chargeback page.

Showback before chargeback

Most organizations start with showback: show each team what it consumed and what that cost, with no money changing hands. The visibility alone changes behavior. Real internal chargeback comes once the numbers are trusted.

Attribution methodology matters

Chargeback only holds up if the team being charged believes the number. We attribute carefully across the whole-GPU, MIG-slice, shared-node, and idle-time cases, and document the methodology so it survives the first dispute.
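
To make the cases concrete, here is one possible policy, sketched purely as an example and not as the methodology itself: charge each allocation as a fraction of a GPU (1.0 for a whole GPU, less for a MIG slice), then spread unallocated idle capacity across teams in proportion to their direct usage. The fraction values and the proportional idle rule are assumptions; other policies, such as leaving idle time unallocated, are equally defensible.

```python
def attribute_gpu_hours(allocations, capacity_gpu_hours):
    """Attribute GPU-hours per team, including a share of idle capacity.

    allocations is a list of (team, gpu_fraction, hours) tuples, where
    gpu_fraction is 1.0 for a whole GPU and smaller for a slice.
    Idle capacity (capacity minus direct usage) is split in proportion to
    each team's direct usage. Returns an empty dict if nothing was used.
    """
    direct = {}
    for team, fraction, hours in allocations:
        direct[team] = direct.get(team, 0.0) + fraction * hours
    used = sum(direct.values())
    if used == 0:
        return {}
    idle = max(0.0, capacity_gpu_hours - used)
    return {team: h + idle * h / used for team, h in direct.items()}

allocations = [
    ("a", 1.0, 6.0),  # whole GPU for 6 hours
    ("b", 0.5, 4.0),  # half-GPU slice for 4 hours
]
print(attribute_gpu_hours(allocations, capacity_gpu_hours=10.0))
# {'a': 7.5, 'b': 2.5}
```

Whatever rule is chosen, the attributed totals should sum to the capacity being charged for, which is what makes the numbers checkable in a dispute.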

Where GPU spend goes to waste

Common sources, and what we do about each

Source	What we do
Idle interactive sessions	Auto-shutdown after a configurable idle window, so a notebook left open overnight stops holding a GPU.
Zombie and abandoned pods	Detection and reaping of pods that hold GPUs without running useful work.
Over-allocated jobs	Right-sizing: identifying where a MIG slice would suffice, or where fewer GPUs are warranted based on observed utilization.
Capacity hoarding	Quota and fair-share so capacity is shared rather than held by whichever team submits first.
Wrong reserved/on-demand split	Usage modeling against your pricing to minimize total cost.
No accountability	Chargeback that attributes cost to the teams that incurred it, which changes behavior without manual enforcement.