📣 Introducing $2.95/Hr H100, H200, B200s, and B300s: train, fine-tune, and scale ML models affordably, without having to DIY the infrastructure   📣 Run Saturn Cloud on AWS, GCP, Azure, Nebius, Crusoe, or on-prem.

GPU Clouds, Aggregators, and the New Economics of AI Compute

How the GPU cloud market breaks into hyperscalers, GPU clouds, and aggregators, what services each tier actually provides, and a framework for choosing between them.


Saturn Cloud CTO Hugo Shi recently joined the AI Engineering Podcast to discuss the GPU cloud landscape – how the market is structured, what services different providers offer, and how teams should think about choosing between them. You can listen to the full episode here.

This post distills key insights from that conversation.

The Market Has Three Distinct Tiers

Hyperscalers (AWS, GCP, Azure, Oracle): Deep managed service ecosystems, but GPU pricing runs around $10/hour for H100s.

GPU Clouds: Specialists with significantly better pricing – typically $1.50 to $4/hour for H100s. This tier splits further:

  • Full-service clouds (Lambda Labs, CoreWeave, Nebius, Crusoe) offer managed Kubernetes, VPCs, load balancers, and growing managed service catalogs

  • Bare metal/concierge clouds work more like traditional hardware deals – talk to sales, wire money, receive IP addresses

GPU Aggregators (Shadeform, Vast.ai, RunPod, FluidStack, SF Compute, BrevDev): Single interface to GPUs across multiple providers. Some operate their own hardware too; others purely aggregate.
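To make the price gap between tiers concrete, here is a rough monthly cost comparison for an 8x H100 node using the headline rates above ($10/hour per GPU at a hyperscaler vs. $2.95/hour at a GPU cloud). The figures are illustrative ballparks, not quotes:

```shell
# Rough monthly cost for an 8x H100 node (illustrative rates, not quotes)
GPUS=8
HOURS=730                 # ~hours in a month
HYPERSCALER_RATE=10.00    # $/GPU-hour, hyperscaler ballpark
GPU_CLOUD_RATE=2.95       # $/GPU-hour, GPU cloud ballpark

awk -v g="$GPUS" -v h="$HOURS" -v a="$HYPERSCALER_RATE" -v b="$GPU_CLOUD_RATE" \
  'BEGIN {
     printf "hyperscaler: $%.0f/month\n", g*h*a    # 8*730*10   = 58400
     printf "gpu cloud:   $%.0f/month\n", g*h*b    # 8*730*2.95 = 17228
   }'
```

At these rates the gap is roughly $41,000/month per node, which is why the tier distinctions matter even before considering services.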

The aggregator model runs counter to how enterprises typically approach the cloud – by staying with one provider. But GPU scarcity and pricing have made multi-provider access practical in ways that didn’t make sense for general compute.

Service Availability Varies More Than You’d Expect

Teams coming from hyperscalers often assume GPU clouds offer similar service depth. They don’t. Here’s the rough hierarchy of what to expect:

Universal: Machines with GPUs. That’s the baseline.

Common but not guaranteed: Managed Kubernetes or Slurm clusters.

Varies significantly: Storage. Some clouds only have ephemeral on-node storage. Others offer block storage, shared file systems, and object storage. Many now partner with Vast Data or Weka for high-performance storage that dramatically outperforms NFS, which matters for training data throughput.

Often missing: Load balancers and VPCs. You take these for granted in hyperscalers, but bare-metal providers may expose every machine directly to the internet.

Rare but growing: Managed services like MLflow or Postgres. GPU clouds are moving up the stack, but this layer is still thin.

For clouds missing required services, the workaround is to deploy open-source Kubernetes equivalents into the cluster and point workloads to those endpoints instead.
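As a sketch of that workaround: if a provider lacks managed object storage or Postgres, widely used open-source Helm charts can fill the gap. The chart repos below are the public MinIO and Bitnami repos; release names and namespaces are placeholders, and this assumes a working kubeconfig for the GPU cloud's cluster:

```shell
# Sketch: fill service gaps with open-source equivalents via Helm.
# Assumes kubectl/helm are pointed at the GPU cloud's Kubernetes cluster;
# release names and namespaces are placeholders.

# Object storage (S3-compatible) via MinIO
helm repo add minio https://charts.min.io/
helm install object-store minio/minio --namespace storage --create-namespace

# Postgres via the Bitnami chart
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install pg bitnami/postgresql --namespace databases --create-namespace

# Workloads then point at in-cluster endpoints instead of a managed
# service, e.g.:
#   http://object-store.storage.svc.cluster.local:9000
#   pg-postgresql.databases.svc.cluster.local:5432
```

The tradeoff is that you now operate these services yourself – upgrades, backups, and scaling become your problem rather than the provider's.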

Kubernetes Portability Is Real, With Caveats

Helm charts and Kubernetes manifests generally transfer across providers without major changes. The containers and configurations work. That's the Kubernetes portability promise actually being delivered.

But there’s a distinction worth making. “Cloud native” and “Kubernetes native” aren’t the same thing. Terraform orchestrating ECS is cloud native, but moving to GCP means rewriting everything – those APIs don’t transfer. Kubernetes-native deployments are genuinely more portable.
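In practice, a Kubernetes-native deployment moves between providers by switching kubeconfig contexts and re-running the same release. The context names and chart path below are placeholders for whatever your setup uses:

```shell
# Sketch: the same chart deployed unchanged to two providers.
# Context names ("nebius-prod", "coreweave-prod") and the chart path
# are placeholders for your own kubeconfig entries and chart.

kubectl config use-context nebius-prod
helm upgrade --install trainer ./charts/trainer -f values.yaml

kubectl config use-context coreweave-prod
helm upgrade --install trainer ./charts/trainer -f values.yaml

# Same manifests, same values file – only the kubeconfig context changes.
```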

The harder problem is data gravity. If your data lives in AWS, accessing it from a GPU cloud means egress costs and latency. Options are limited: copy once and absorb the egress cost, or cache continuously and try to minimize it.

One counterintuitive option: most GPU clouds have free egress. Storing data in the GPU cloud and accessing it from a hyperscaler is technically viable – though few teams do this because they view hyperscalers as a home base.
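To put numbers on data gravity: the one-time cost of copying a training corpus out of a hyperscaler is dominated by the egress rate. The $0.09/GB figure below is an assumed ballpark for internet egress (actual pricing is tiered and varies by provider and volume):

```shell
# Back-of-envelope egress cost for a one-time dataset copy.
# $0.09/GB is an assumed ballpark rate; real pricing is tiered.
DATASET_GB=10000          # 10 TB training corpus
EGRESS_PER_GB=0.09        # $/GB, assumed

awk -v gb="$DATASET_GB" -v rate="$EGRESS_PER_GB" \
  'BEGIN { printf "one-time egress: $%.0f\n", gb*rate }'
# one-time egress: $900
```

A few hundred dollars for a one-time copy is often acceptable; continuous syncing of a growing dataset is where egress costs start to dominate the decision.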

Specialization Makes Things Easier

GPU clouds can be easier to use than hyperscalers for GPU-specific work. This seems counterintuitive, but the specialization helps.

On EKS, teams manage CSI drivers for EBS, configure the NVIDIA device plugin, and handle various integrations to make GPUs work properly. On providers like Nebius or Crusoe, managed Kubernetes includes GPU operators preconfigured. Everything works out of the box.
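For contrast, here is the kind of manual step EKS requires before pods can request GPUs at all: installing NVIDIA's Kubernetes device plugin yourself. The Helm repo below is the one the plugin's documentation publishes; pin a version you have actually validated:

```shell
# Sketch of one EKS prerequisite that managed GPU clouds preconfigure:
# installing NVIDIA's device plugin so pods can request nvidia.com/gpu
# resources. (Helm repo per the plugin's docs; pin a validated version.)
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm upgrade --install nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin --create-namespace

# Once the plugin is running, a pod spec can request a GPU:
#   resources:
#     limits:
#       nvidia.com/gpu: 1
```

On a GPU-specialized provider, this step – along with storage CSI drivers and InfiniBand configuration – is typically already done.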

This is possible because GPUs are the primary use case. Everyone using these clusters needs InfiniBand, so it’s enabled by default – not buried as an opt-in. Permission models are simpler because there’s less to manage.

Security and Compliance Have Caught Up

Getting security approval for a GPU cloud used to be difficult. That’s changed.

Most established GPU clouds have SOC 2 and ISO 27001 certifications. Many have HIPAA compliance. Security reviews still take work – onboarding any new cloud vendor does – but the compliance documentation exists.

For aggregators, there’s an additional consideration. It’s not that they’re insecure – they’re built by capable engineers. But working through an aggregator adds links to the chain, and more links mean more surface area. Teams with strict security requirements may prefer full-service GPU clouds.

How to Choose

The decision framework comes down to security requirements, managed-service needs, and where your existing infrastructure lives:

High security needs + extensive managed services: Full-service GPU clouds (CoreWeave, Nebius, Crusoe, Vultr)

Lighter security requirements + minimal managed service needs: Aggregators offer the best pricing (SF Compute, Vast.ai)

Need to stay integrated with existing hyperscaler infrastructure: Keep training there; consider moving inference to GPU clouds

That last point matters. Training workloads touch sensitive data, call internal APIs, and require tight integration. Inference is more portable – a public endpoint that calls out for context but runs relatively isolated. Many GPU clouds now offer dedicated managed inference services for this reason.

Availability Is Improving

GPU scarcity is real, but less severe than in 2023. The emerging dynamic: reserved demand concentrates on the newest hardware. When GB300s ship, contracts roll toward them, freeing H100 capacity for on-demand use.

H100s are notably easier to get now – not trivial, but better. This should enable GPU clouds to offer spot instances and capacity reservations, tools hyperscalers have used for years.

Some providers are already experimenting. SF Compute runs a financial marketplace model in which teams can reserve GPUs at specific future dates, with pricing set by projected supply and demand.

AMD Is Becoming Viable

AMD GPUs had real reliability and software issues early on. That’s improved substantially.

The ROCm/PyTorch integration is maturing, and AMD is contributing directly to the PyTorch open-source project – not maintaining a separate fork as they did with TensorFlow. That’s the right approach for ecosystem adoption.

NVIDIA’s software investment remains unmatched, particularly around CUDA and Python. AMD won’t close that gap completely. But “good enough for PyTorch and JAX workloads” is achievable, and having multiple viable hardware options benefits the ecosystem.

What’s Coming

More on-demand capacity: As demand shifts to newer GPU generations, previous generations become available for on-demand and spot.

Consolidation: There are arguably 100 GPU clouds, counting loosely. That won’t last. Mergers and failures will leave fewer, more capable providers.

GPU clouds moving up the stack: CoreWeave acquired Weights & Biases, Digital Ocean acquired Paperspace, and Lightning AI merged with Voltage Park. Raw compute isn’t enough; developer experience matters.

Continued service expansion: Managed Kubernetes, storage, and networking are becoming table stakes. Managed ML services are next.

The Takeaway

The GPU cloud market is genuinely different from the hyperscaler world most teams know. Providers vary dramatically in capabilities. Pricing can be 2.5-7x cheaper, based on the H100 rates above. The specialization can make GPU-specific work easier, not harder.

But it requires homework. Check what services actually exist. Understand the security model. Plan for data gravity. The market is maturing fast, and for the right workloads, the economics are hard to ignore.
