GPU Cloud Comparison: 16 Neoclouds for AI Training in 2025

A technical comparison of GPU cloud providers beyond AWS, GCP, and Azure, covering pricing, InfiniBand networking, storage options, and platform maturity for AI training workloads.

If you’re running AI training workloads and hitting GPU availability limits or cost walls on AWS, GCP, or Azure, a wave of specialized GPU cloud providers (often called “neoclouds”) offers an alternative. These providers focus exclusively on GPU infrastructure, often with simpler pricing, immediate availability, and hardware optimized for AI workloads.

This guide compares 16 GPU cloud providers across the dimensions that matter for production AI training: GPU pricing, InfiniBand networking, storage options, and platform capabilities. We focus on what’s publicly documented, noting where information requires sales contact.

The Neocloud Landscape

The term “neocloud” refers to cloud providers primarily offering GPU-as-a-Service (GPUaaS). Unlike hyperscalers with broad service portfolios, neoclouds focus on delivering GPU compute with high-speed interconnects for AI and HPC workloads.

According to McKinsey, 10 to 15 neoclouds currently operate at meaningful scale in the US, with footprints growing across Europe, the Middle East, and Asia. The largest, CoreWeave, went public in 2025 with over 250,000 NVIDIA GPUs across 32 data centers.

The value proposition is straightforward: neoclouds price GPUs 30-85% cheaper than hyperscalers, offer faster provisioning (minutes vs weeks for quota approvals), and provide specialized infrastructure configurations with InfiniBand networking standard on GPU nodes.
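To make the pricing gap concrete, here is a back-of-the-envelope comparison. The rates are illustrative placeholders, not quotes from any specific provider; see the pricing tables for actual figures.

```python
# Back-of-the-envelope: monthly cost of one 8x H100 node, on-demand, 24/7.
# Rates are illustrative (~$2/hr/GPU neocloud vs ~$6/hr/GPU hyperscaler
# list price); actual rates vary by provider and commitment.
HOURS_PER_MONTH = 730

def monthly_usd(rate_per_gpu_hr: float, gpus: int = 8) -> float:
    return rate_per_gpu_hr * gpus * HOURS_PER_MONTH

neocloud = monthly_usd(2.00)      # $11,680
hyperscaler = monthly_usd(6.00)   # $35,040
print(f"savings: {1 - neocloud / hyperscaler:.0%}")  # 67%
```

At these assumed rates, a single always-on node saves roughly $23,000 per month, which is why committed-use discounts on hyperscalers are often the real comparison point.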

GPU Pricing Comparison

Nearly all of these providers offer NVIDIA H100 80GB GPUs (TensorWave is the exception, offering AMD only). Pricing varies significantly based on whether you’re renting individual GPUs, full nodes (typically 8 GPUs), or multi-node clusters with InfiniBand.

On-Demand GPU Pricing

| Provider | H100 | H200 | B200 | GB200 | Source |
| --- | --- | --- | --- | --- | --- |
| Vast.ai | $0.90-1.87/hr | Varies | Varies | — | Link |
| SF Compute | $1.43-1.77/hr | Available | — | — | Link |
| Hyperstack | $1.90-2.40/hr | $3.50/hr | Contact | Contact | Link |
| GMI Cloud | $2.10/hr | $2.50-3.50/hr | Pre-order | Pre-order | Link |
| DataCrunch/Verda | $1.99/hr | $2.59/hr | $3.79/hr | — | Link |
| Voltage Park | $1.99-2.49/hr | Contact | Contact | Contact | Link |
| Lambda | $2.29-2.99/hr | — | $2.99/hr | — | Link |
| RunPod | $1.99-2.69/hr | $3.59/hr | $5.98/hr | — | Link |
| FluidStack | $2.89/hr | Contact | Contact | Contact | Link |
| Nebius | $2.95/hr | $3.50/hr | $5.50/hr | Pre-order | Link |
| Vultr | $2.99/hr | $2.99/hr | $2.89/hr | — | Link |
| OVHcloud | $2.99/hr | — | — | — | Link |
| Crusoe | $3.90/hr | $4.29/hr | Contact | Contact | Link |
| CoreWeave | ~$6.15/hr | ~$6.30/hr | $8.60/hr | $10.50/hr | Link |
| TensorWave | N/A | N/A | N/A | N/A | — |
| Nscale | Contact | Contact | — | Contact | Link |

AMD GPU Availability

Six providers offer AMD Instinct GPUs (see GPU Model Availability below). Pricing, where published:

| Provider | MI300X | MI325X | MI355X | Source |
| --- | --- | --- | --- | --- |
| TensorWave | $1.50/hr | — | $2.25/hr | Link |
| Vultr | $1.85/hr | $2.00/hr | $2.59/hr | Link |
| Crusoe | $3.45/hr | Contact | Contact | Link |
| Nscale | Contact | Contact | Contact | Link |

GPU Model Availability

GPU selection varies significantly by provider. Here’s what each offers:

NVIDIA (Hopper H100/H200, Blackwell B200/GB200, and older generations):

  • Nebius: H100, H200, B200, GB200 (pre-order), L40S
  • CoreWeave: H100, H200, B200, GB200, A100, L40S, RTX A-series
  • Crusoe: H100, H200, B200, GB200, A100, L40S
  • GMI Cloud: H100, H200, GB200 NVL72, HGX B200 (coming soon)
  • Lambda: H100, B200, A100
  • Voltage Park: H100 (H200/B200/GB200 require sales contact)
  • FluidStack: H100, H200, A100, L40S (B200/GB200 require sales contact)
  • RunPod: H100, H200, B200, A100, L40S, RTX 3090/4090
  • Hyperstack: H100, H200, A100, L40S (B200/GB200 require sales contact)
  • DataCrunch/Verda: H100, H200, B200, A100, L40S
  • Vultr: H100, H200, B200, A100, L40S
  • OVHcloud: H100, A100, L40S
  • Nscale: H100, H200, GB200 (contact), A100
  • Vast.ai: H100, H200, B200, A100, L40S (plus full range of consumer GPUs)
  • SF Compute: H100, H200

AMD Instinct:

  • TensorWave: MI300X, MI355X (AMD-only, no NVIDIA)
  • Vultr: MI300X, MI325X, MI355X
  • Crusoe: MI300X (MI325X/MI355X require sales contact)
  • RunPod: MI300X
  • Nscale: MI300X (contact for pricing)
  • Vast.ai: MI300X

Infrastructure Ownership Models

Understanding whether a provider owns their infrastructure or aggregates from others matters for reliability, support, and pricing stability.

Ownership Model Comparison

| Provider | Model | Description | Source |
| --- | --- | --- | --- |
| Crusoe | Owner | Vertically integrated; manufactures own modular DCs via Easter-Owens Electric acquisition | Link |
| OVHcloud | Owner | Fully vertically integrated; designs/manufactures servers, builds/manages own DCs | Link |
| GMI Cloud | Owner | Full-stack ownership; offshoot of Realtek/GMI Tech with Taiwan supply chain advantage | Link |
| Nebius | Owner + Colo | Owns DCs in Finland and NJ (300 MW); colocation in Kansas City, Iceland, Paris | Link |
| CoreWeave | Owner | Acquired Core Scientific ($9B, 1.3 GW) and NEST DC ($322M); 250K+ GPUs across 32 DCs | Link |
| Nscale | Owner | Owns 60MW Glomfjord DC; JV with Aker for 230MW Stargate Norway facility | Link |
| FluidStack | Owner + Aggregator | 62% owned infrastructure, 38% marketplace; $10B GPU asset financing via Macquarie | Link |
| Lambda | Owner (colo) | Owns GPU hardware; colocation in SF and Texas; Nvidia leases back GPUs ($1.5B deal) | Link |
| Voltage Park | Owner (colo) | Owns 24K H100s ($500M) across 6 Tier 3 DCs in TX, VA, WA | Link |
| Hyperstack | Owner (colo) | Owns 13K GPUs; long-term agreements with hyperscalers and renewable energy DCs | Link |
| DataCrunch/Verda | Owner (colo) | Owns GPUs in 4 Nordic colos (3x Helsinki, 1x Iceland); building own DCs in 2025 | Link |
| Vultr | Owner (colo) | Owns hardware across 32 global colocation facilities (Sabey, Singtel partnerships) | Link |
| TensorWave | Owner (colo) | Owns 8K AMD GPUs; leases 1 GW capacity across TECfusions portfolio (AZ, VA, PA) | Link |
| RunPod | Owner + Aggregator | Secure Cloud (Tier 3/4 partners) + Community Cloud (aggregated third-party hosts) | Link |
| Vast.ai | Aggregator | Pure marketplace connecting 10K+ GPUs from individuals to datacenters | Link |
| SF Compute | Aggregator | Two-sided marketplace (“Airbnb for GPUs”); manages $100M+ hardware, ~10% fee | Link |

What this means for you:

  • Owner: Full control over hardware and facilities; consistent performance but finite capacity
  • Owner (colo): Owns GPUs/servers but rents data center space; good control with geographic flexibility
  • Owner + Colo: Mix of owned and colocated data centers
  • Owner + Aggregator: Mix of owned infrastructure and marketplace aggregation
  • Aggregator: No owned infrastructure; maximum price competition but variable quality

InfiniBand and High-Speed Networking

For multi-node distributed training, network bandwidth between GPUs is critical. InfiniBand provides lower latency and higher bandwidth than Ethernet, with RDMA (Remote Direct Memory Access) enabling GPU-to-GPU communication without CPU involvement.

InfiniBand Availability

| Provider | InfiniBand | Speed (per GPU) | Availability | Topology | Source |
| --- | --- | --- | --- | --- | --- |
| Nebius | Yes | 400Gb/s (Quantum-2) | All GPU nodes | Rail-optimized | Link |
| CoreWeave | Yes | 400Gb/s (Quantum-2) | H100/H200 clusters | Non-blocking fat-tree | Link |
| Crusoe | Yes | 400Gb/s | All GPU nodes | Partition-based isolation | Link |
| DataCrunch/Verda | Yes | 400Gb/s (NDR) | Instant clusters | Rail-optimized | Link |
| GMI Cloud | Yes | 400Gb/s (NDR) | All GPU nodes | Not documented | Link |
| Voltage Park | Yes | 400Gb/s (Quantum-2) | IB tier ($2.49/hr) | Rail-optimized | Link |
| FluidStack | Yes | 400Gb/s | Clusters | Not documented | Link |
| Vultr | Yes | 400Gb/s (Quantum-2) | H100/H200 clusters | Non-blocking | Link |
| Lambda | Clusters only | 400Gb/s (Quantum-2) | 1-Click Clusters | Rail-optimized | Link |
| RunPod | Clusters only | 200-400Gb/s | Instant Clusters | Not documented | Link |
| Hyperstack | Supercloud only | 400Gb/s | H100/H200 SXM | Quantum-2 | Link |
| Vast.ai | By request | Not specified | Custom clusters | Varies by host | Link |
| OVHcloud | Custom only | Not documented | H100 SXM (sales) | Not documented | Link |
| TensorWave | RoCE only | 400Gb Ethernet | All nodes | Aviz ONES fabric | Link |
| Nscale | RoCE only | Not documented | All nodes | Broadcom-based | Link |
| SF Compute | Yes | 400Gb/s | K8s clusters only | Not documented | Link |

Key observations:

  • 400Gb/s NDR InfiniBand, with a dedicated 400Gb/s NIC per GPU, is now standard among providers that offer InfiniBand. No provider publicly documents 800Gb/s availability yet.
  • Rail-optimized topology minimizes hops for all-reduce operations by connecting each GPU’s NIC to a different leaf switch.
  • TensorWave and Nscale use RoCE (RDMA over Converged Ethernet) instead of InfiniBand. RoCE provides RDMA capabilities over standard Ethernet, with lower cost but potentially higher latency under congestion.
  • Single-GPU instances typically don’t include InfiniBand at Lambda, RunPod, and Hyperstack. You need to provision cluster configurations.
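To see why per-GPU NIC bandwidth matters for distributed training, consider the standard ring all-reduce cost model: each GPU sends and receives roughly 2(n-1)/n times the gradient payload through its NIC. The sketch below is an idealized estimate; it ignores NVLink within a node and NCCL protocol overhead, so real step times are higher.

```python
def ring_allreduce_seconds(payload_gb: float, n_gpus: int, nic_gbps: float) -> float:
    """Idealized ring all-reduce time: each rank moves 2*(n-1)/n of the
    payload through its NIC. nic_gbps is line rate in bits/s; payload_gb
    is the gradient size in gigabytes."""
    bytes_per_sec = nic_gbps / 8 * 1e9
    traffic_bytes = 2 * (n_gpus - 1) / n_gpus * payload_gb * 1e9
    return traffic_bytes / bytes_per_sec

# Syncing fp16 gradients for a 7B-parameter model (~14 GB) across 64 GPUs:
t_400g = ring_allreduce_seconds(14, 64, 400)  # ~0.55 s per step, ideal
t_100g = ring_allreduce_seconds(14, 64, 100)  # ~2.2 s: four times slower
```

The model makes the trade-off explicit: dropping from a 400Gb/s NIC per GPU to 100Gb/s quadruples the communication time of every optimizer step, which is why cluster tiers without full InfiniBand are a poor fit for multi-node training.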

Storage Options

Training workloads need three types of storage: block storage for OS and application data, object storage for datasets and checkpoints, and shared filesystems for multi-node data access.

Storage Comparison

| Provider | Block Storage | Object Storage | Shared FS | Technology | Source |
| --- | --- | --- | --- | --- | --- |
| Nebius | $0.05-0.12/GB/mo | S3 $0.015/GB/mo | $0.08/GB/mo | NFS | Link |
| CoreWeave | Yes | S3 $0.03-0.06/GB/mo | $0.07/GB/mo | VAST, WEKA, DDN | Link |
| Crusoe | $0.08/GB/mo | — | $0.07/GB/mo | Lightbits | Link |
| Lambda | — | S3 adapter only | $0.20/GB/mo | VAST Data | Link |
| Voltage Park | Local NVMe | VAST S3 | VAST NFS | VAST Data | Link |
| DataCrunch/Verda | $0.05-0.20/GB/mo | Coming soon | $0.20/GB/mo | NVMe SFS | Link |
| GMI Cloud | Integrated | VAST S3 | VAST NFS | VAST Data, GPUDirect | Link |
| RunPod | $0.10/GB/mo | S3 (5 DCs) | $0.05-0.07/GB/mo | Network volumes | Link |
| Vultr | $0.10/GB/mo | S3 $0.018-0.10/GB/mo | $0.10/GB/mo | NVMe-backed | Link |
| OVHcloud | $0.022/GB/mo | S3 + egress | $120-150/TB/mo | NetApp | Link |
| Hyperstack | ~$0.07/GB/mo | In development | WEKA (Supercloud) | NVMe | Link |
| FluidStack | — | — | Filesystem only | Not documented | Link |
| Vast.ai | Per-host | — | — | Varies | Link |
| TensorWave | Local only | — | WEKA (custom) | Not documented | Link |
| Nscale | Not documented | Not documented | “Parallel FS” | Not documented | Link |
| SF Compute | Local NVMe only | — | — | 1.5TB+ per node | Link |

Key observations:

  • VAST Data is popular: Lambda, Voltage Park, and CoreWeave all use VAST for high-performance shared storage.
  • Object storage gaps: Crusoe, FluidStack, Vast.ai, TensorWave, and Nscale don’t offer native S3-compatible object storage.
  • Shared filesystem is critical for multi-node training: Without it, you need to copy data to each node’s local storage or stream from object storage.
  • OVHcloud’s shared storage is pricey: NetApp-based Enterprise File Storage runs $120-150/TB/month, roughly double the per-TB cost of Nebius ($0.08/GB ≈ $80/TB) or CoreWeave ($0.07/GB ≈ $70/TB).

Storage Performance

Most providers don’t publish detailed storage performance specs. Where documented:

| Provider | Shared FS Throughput | Notes | Source |
| --- | --- | --- | --- |
| Nebius | 12 GB/s read, 8 GB/s write per 8-GPU VM | Significantly faster than AWS EFS | Link |
| Lambda | ~11 GB/s per mount (VAST) | With nconnect=32 and 100Gb NIC | Link |
| DataCrunch/Verda | 2 GB/s continuous (NVMe SFS) | Per volume | Link |

For comparison, AWS EFS maxes out at 1.5 GB/s, Azure Premium Files at 10 GB/s, and GCP Filestore at 25 GB/s.
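The practical impact of these throughput numbers shows up in checkpoint writes. A rough sketch; the 1 TB checkpoint size is an illustrative assumption (a 70B-parameter model with full-precision optimizer state can land in that ballpark), not a figure from any provider.

```python
def write_minutes(checkpoint_gb: float, throughput_gb_s: float) -> float:
    """Minutes to write a checkpoint at a sustained throughput."""
    return checkpoint_gb / throughput_gb_s / 60

# Illustrative ~1 TB checkpoint (assumption, not a provider spec):
ckpt_gb = 1000
print(f"at 8 GB/s write (Nebius figure): {write_minutes(ckpt_gb, 8.0):.1f} min")   # ~2.1 min
print(f"at 1.5 GB/s (EFS ceiling above): {write_minutes(ckpt_gb, 1.5):.1f} min")  # ~11.1 min
```

If you checkpoint every 30 minutes, that difference is the gap between pausing training briefly and spending a third of your GPU-hours waiting on storage.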

Egress Pricing

Data transfer costs can significantly impact total cost of ownership, especially for workloads that move large datasets or serve inference traffic.

Egress Comparison

| Provider | Egress Cost | Notes | Source |
| --- | --- | --- | --- |
| Nebius | Free | Explicitly zero egress | Link |
| CoreWeave | Free (object storage) | Object storage via LOTA | Link |
| Crusoe | Free | Zero data transfer fees | Link |
| Lambda | Free | Zero egress | Link |
| Voltage Park | Free | No hidden costs | Link |
| FluidStack | Free | Zero egress/ingress | Link |
| RunPod | Free | Zero data transfer | Link |
| Hyperstack | Free | Zero bandwidth charges | Link |
| Vultr | $0.01/GB | After 2TB/mo free | Link |
| OVHcloud | $0.011/GB | Object storage only, compute egress free | Link |
| Vast.ai | Varies | Per-host, can be $20+/TB | Link |
| DataCrunch/Verda | Not documented | — | Link |
| GMI Cloud | Not documented | — | Link |
| TensorWave | Not documented | Claims “no hidden costs” | Link |
| Nscale | Not documented | — | Link |
| SF Compute | Free | No ingress/egress fees | Link |

Free egress is now standard among GPU neoclouds. This is a significant differentiator from hyperscalers, where egress costs can add 20-40% to monthly bills for data-intensive workloads.
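A quick sketch of what free egress is worth. The ~$0.09/GB hyperscaler rate is an assumed typical list price for illustration, not tied to a specific provider or tier.

```python
def egress_usd(tb_moved: float, usd_per_gb: float) -> float:
    """Monthly egress bill for tb_moved terabytes at a per-GB rate."""
    return tb_moved * 1000 * usd_per_gb

# Moving 50 TB of datasets/checkpoints out per month:
hyperscaler_bill = egress_usd(50, 0.09)  # $4,500 at an assumed ~$0.09/GB list rate
vultr_bill = egress_usd(48, 0.01)        # $480 (first 2 TB/mo free per the table)
neocloud_bill = 0.0                      # free-egress providers above
```

For teams that regularly sync checkpoints to external object storage or serve model weights to other regions, this line item alone can rival the savings on GPU hours.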

Kubernetes and Orchestration

Most production AI workloads run on Kubernetes. Support varies from fully managed Kubernetes to bring-your-own orchestration.

Kubernetes Support

| Provider | Managed K8s | Slurm | Autoscaling | Notes | Source |
| --- | --- | --- | --- | --- | --- |
| Nebius | Yes | Managed + Soperator | Yes | First Slurm Kubernetes operator | Link |
| CoreWeave | Yes (CKS) | SUNK | Yes | Bare-metal K8s, no hypervisor | Link |
| Crusoe | Yes (CMK) | Yes | Yes | Run:ai integration | Link |
| Lambda | Available | — | — | Focus on 1-Click Clusters | Link |
| Voltage Park | Add-on | — | — | Helm/Rook-Ceph guides | Link |
| DataCrunch/Verda | — | Pre-installed | — | Slurm on clusters | Link |
| GMI Cloud | Yes (Cluster Engine) | — | Yes | K8s-based orchestration | Link |
| RunPod | Yes | — | — | Serverless focus | Link |
| Vultr | Yes (VKE) | — | Yes | Standard managed K8s | Link |
| OVHcloud | Yes | — | Yes | Standard managed K8s | Link |
| Hyperstack | — | — | — | VMs only | Link |
| FluidStack | — | — | — | Atlas platform | Link |
| Vast.ai | — | — | — | Container-based | Link |
| TensorWave | — | Yes | — | Pyxis/Enroot containers | Link |
| Nscale | Yes (NKS) | Yes | — | Limited docs | Link |
| SF Compute | Yes | — | — | Managed K8s per zone | Link |

Key observations:

  • Nebius and CoreWeave have the most mature Kubernetes offerings with GPU-optimized features like pre-installed drivers and topology-aware scheduling.
  • Slurm remains popular for HPC-style workloads. Nebius’s Soperator is notable as the first open-source Kubernetes operator for running Slurm clusters.
  • Serverless/container platforms (RunPod, Vast.ai, FluidStack) trade Kubernetes flexibility for simpler deployment models.
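Regardless of provider, requesting GPUs on managed Kubernetes looks much the same: the NVIDIA device plugin, which these managed offerings typically pre-install, exposes GPUs as an extended resource. A minimal pod spec, sketched as a Python dict; the pod name and image are placeholders.

```python
import json

# Minimal pod spec requesting all 8 GPUs on a node via the nvidia.com/gpu
# extended resource. Name and image are placeholders, not provider defaults.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "train-job"},
    "spec": {
        "containers": [{
            "name": "trainer",
            "image": "nvcr.io/nvidia/pytorch:24.01-py3",
            # GPUs must go under limits; they cannot be overcommitted.
            "resources": {"limits": {"nvidia.com/gpu": 8}},
        }],
        "restartPolicy": "Never",
    },
}
print(json.dumps(pod, indent=2))
```

What differs between providers is everything around this spec: whether drivers and the device plugin come pre-installed, whether the scheduler is topology-aware, and whether InfiniBand interfaces are exposed to pods.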

Platform Maturity

Beyond raw GPU access, production workloads need automation, observability, and enterprise features. MLOps capabilities vary significantly across providers.

Terraform and API Support

| Provider | Terraform Provider | API | CLI | Source |
| --- | --- | --- | --- | --- |
| Nebius | Official | gRPC | Yes | Link |
| CoreWeave | Official (Feb 2025) | Yes | Yes | Link |
| Crusoe | Official | REST | Yes | Link |
| Lambda | — | Yes | Yes | Link |
| Vultr | Official | REST | Yes | Link |
| OVHcloud | Official | REST | Yes | Link |
| Hyperstack | — | Infrahub API | — | Link |
| DataCrunch/Verda | — | Yes | — | Link |
| GMI Cloud | — | REST | Yes | Link |
| RunPod | — | GraphQL | Yes | Link |
| Voltage Park | — | — | — | Link |
| FluidStack | — | Atlas API | — | Link |
| Vast.ai | — | Yes | Yes | Link |
| TensorWave | — | — | — | Link |
| Nscale | — | Yes | — | Link |
| SF Compute | — | Yes | — | Link |

Self-Service Access

| Provider | Access Model | Notes | Source |
| --- | --- | --- | --- |
| Nebius | Self-service | Sign up, add $25+, deploy up to 32 GPUs immediately | Link |
| Lambda | Self-service | Create account and launch GPUs in minutes, pay-as-you-go | Link |
| Hyperstack | Self-service | Instant access, one-click deployment | Link |
| DataCrunch/Verda | Self-service | Order GPU instances in minutes via dashboard or API | Link |
| GMI Cloud | Self-service | Sign up, launch instances in 5-15 minutes via console/API | Link |
| Vultr | Self-service | Free account signup, provision via portal/API/CLI | Link |
| OVHcloud | Self-service | Create account, $200 free credit for first project | Link |
| FluidStack | Self-service | Sign up at auth.fluidstack.io, launch in under 5 minutes | Link |
| RunPod | Self-service | Deploy GPUs in under a minute, no rate limits | Link |
| Vast.ai | Self-service | $5 minimum to start, per-second billing | Link |
| Crusoe | Self-service | Sign up via console, larger deployments contact sales | Link |
| Voltage Park | Self-service | On-demand GPUs available, reserved capacity contact sales | Link |
| Nscale | Hybrid | Self-service for inference only; training clusters require sales | Link |
| SF Compute | Self-service | Sign up to buy, larger deployments contact sales | Link |
| CoreWeave | Sales-gated | Requires organizational approval from sales team | Link |
| TensorWave | Sales-gated | Contact sales/solutions engineers to get started | Link |

Compliance and Enterprise Features

| Provider | Certifications | SSO/SAML | Regions | Source |
| --- | --- | --- | --- | --- |
| Nebius | SOC 2 Type II + HIPAA | Microsoft Entra ID | US, EU (Finland, France, Iceland, UK) | Link |
| CoreWeave | SOC 2 aligned | SAML/OIDC | US, UK, Spain, Sweden, Norway | Link |
| Crusoe | SOC 2 Type II | Yes | US (TX, VA), Iceland, Norway (soon) | Link |
| Lambda | — | — | US | Link |
| Vultr | — | — | 32 global locations | Link |
| OVHcloud | Yes | — | Global | Link |
| DataCrunch/Verda | ISO 27001 | — | EU (Finland, Iceland) | Link |
| GMI Cloud | SOC 2 Type 1, ISO 27001 | — | Taiwan, Thailand, Malaysia, US (CA) | Link |
| RunPod | SOC 2 Type II | — | Multiple | Link |
| Voltage Park | — | — | US (WA, TX, VA, UT) | Link |
| Hyperstack | — | — | US, Canada, EU | Link |
| FluidStack | — | — | US, EU | Link |
| Vast.ai | — | — | Varies by host | Link |
| TensorWave | — | — | US | Link |
| Nscale | — | — | Norway | Link |
| SF Compute | — | — | US | Link |

Choosing a Provider

For Multi-Node Training at Scale

Best options: Nebius, CoreWeave, Crusoe

These providers offer:

  • InfiniBand on all GPU nodes (not just clusters)
  • Managed Kubernetes with GPU-optimized features
  • High-performance shared storage
  • Free egress
  • Terraform providers for infrastructure-as-code

CoreWeave has the largest scale and was first to market with Blackwell (GB200). Nebius offers the most complete managed service stack (K8s, Slurm, PostgreSQL, MLflow). Crusoe is the only option if you need AMD GPUs with enterprise features.

For Cost-Optimized Experimentation

Best options: Vast.ai, SF Compute, DataCrunch/Verda

These providers offer:

  • Lowest per-GPU pricing (Vast.ai $0.90-1.87/hr, SF Compute $1.43-1.77/hr, DataCrunch $1.99/hr)
  • Per-second billing (Vast.ai, DataCrunch)
  • Quick deployment
  • Community/spot options for non-production workloads

Trade-offs: Vast.ai and SF Compute are marketplace aggregators with variable infrastructure quality. Less documentation and fewer managed services than enterprise-focused providers.

For European Data Sovereignty

Best options: Nebius, DataCrunch/Verda, Nscale, OVHcloud

All operate exclusively or primarily in EU data centers with:

  • GDPR compliance
  • SOC 2 Type II + HIPAA (Nebius), ISO 27001 (DataCrunch)
  • 100% renewable energy (DataCrunch, Nscale)
  • EU-based data centers (Nebius in Finland, DataCrunch in Finland/Iceland, Nscale in Norway, OVHcloud in France)

For AMD GPUs

Best options: TensorWave, Crusoe

TensorWave is AMD-focused with MI300X at $1.50/hr (the cheapest option). Crusoe offers MI300X with enterprise features (SOC 2, managed K8s) at $3.45/hr.

What’s Missing from Public Documentation

Across all providers, several categories consistently require sales contact:

  • Reserved/committed pricing: All providers advertise discounts (30-60%) but don’t publish specific tiers
  • Detailed SLAs: Most mention “99.9% uptime” without specifics on what’s covered
  • Network topology: Rail configurations, oversubscription ratios, switch specifications
  • Maximum cluster sizes: How many GPUs can you provision in a single cluster?
  • Compliance details: Which specific certifications and what’s the audit scope?

Conclusion

The GPU neocloud market has matured significantly. Free egress is now standard, 400Gb/s InfiniBand is table stakes for serious providers, and pricing has compressed to $2-4/hr for H100s (vs $6-12/hr on hyperscalers).

For production AI training, Nebius, CoreWeave, and Crusoe offer the most complete platforms. For cost-sensitive experimentation, Vast.ai, SF Compute, and DataCrunch provide the lowest prices. For European data sovereignty, Nebius and DataCrunch combine EU data centers with enterprise compliance (SOC 2/ISO 27001) and competitive pricing.

The main remaining gap is documentation transparency. Most providers require sales conversations for pricing on reserved capacity, large clusters, and enterprise features. As the market matures, expect more self-service options and published pricing for these categories.


Last updated: December 2025. Pricing and features change frequently. Verify current offerings on provider websites before making decisions.