GPU Cloud Comparison: 16 Neoclouds for AI Training in 2025

If you’re running AI training workloads and hitting GPU availability limits or cost walls on AWS, GCP, or Azure, a wave of specialized GPU cloud providers (often called “neoclouds”) offers an alternative. These providers focus exclusively on GPU infrastructure, often with simpler pricing, immediate availability, and hardware optimized for AI workloads.
This guide compares 16 GPU cloud providers across the dimensions that matter for production AI training: GPU pricing, InfiniBand networking, storage options, and platform capabilities. We focus on what’s publicly documented, noting where information requires sales contact.
The Neocloud Landscape
The term “neocloud” refers to cloud providers primarily offering GPU-as-a-Service (GPUaaS). Unlike hyperscalers with broad service portfolios, neoclouds focus on delivering GPU compute with high-speed interconnects for AI and HPC workloads.
According to McKinsey, between 10 and 15 neoclouds currently operate at meaningful scale in the US, with footprints growing across Europe, the Middle East, and Asia. The largest, CoreWeave, went public in 2025 with over 250,000 NVIDIA GPUs across 32 data centers.
The value proposition is straightforward: neoclouds price GPUs 30-85% cheaper than hyperscalers, offer faster provisioning (minutes vs weeks for quota approvals), and provide specialized infrastructure configurations with InfiniBand networking standard on GPU nodes.
GPU Pricing Comparison
Nearly every provider offers NVIDIA H100 80GB GPUs (TensorWave is AMD-only). Pricing varies significantly based on whether you’re renting individual GPUs, full nodes (typically 8 GPUs), or multi-node clusters with InfiniBand.
On-Demand GPU Pricing
| Provider | H100 | H200 | B200 | GB200 | Source |
|---|---|---|---|---|---|
| Vast.ai | $0.90-1.87/hr | Varies | Varies | — | Link |
| SF Compute | $1.43-1.77/hr | Available | — | — | Link |
| Hyperstack | $1.90-2.40/hr | $3.50/hr | Contact | Contact | Link |
| GMI Cloud | $2.10/hr | $2.50-3.50/hr | Pre-order | Pre-order | Link |
| DataCrunch/Verda | $1.99/hr | $2.59/hr | $3.79/hr | — | Link |
| Voltage Park | $1.99-2.49/hr | Contact | Contact | Contact | Link |
| Lambda | $2.29-2.99/hr | — | $2.99/hr | — | Link |
| RunPod | $1.99-2.69/hr | $3.59/hr | $5.98/hr | — | Link |
| FluidStack | $2.89/hr | Contact | Contact | Contact | Link |
| Nebius | $2.95/hr | $3.50/hr | $5.50/hr | Pre-order | Link |
| Vultr | $2.99/hr | $2.99/hr | $2.89/hr | — | Link |
| OVHcloud | $2.99/hr | — | — | — | Link |
| Crusoe | $3.90/hr | $4.29/hr | Contact | Contact | Link |
| CoreWeave | ~$6.15/hr | ~$6.30/hr | $8.60/hr | $10.50/hr | Link |
| TensorWave | — | — | — | — | N/A |
| Nscale | Contact | Contact | — | Contact | Link |
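The hourly rates above translate into run-level budgets quickly. A rough sketch, assuming hypothetical run parameters (GPU count, duration) chosen only for illustration:

```python
# Rough total-cost sketch for an on-demand training run, using rates from the
# table above. The run size and duration here are hypothetical examples.

def training_cost(rate_per_gpu_hr: float, num_gpus: int, hours: float) -> float:
    """Total on-demand cost: hourly rate x GPU count x wall-clock hours."""
    return rate_per_gpu_hr * num_gpus * hours

# Example: a 7-day run on 64 H100s at two points in the published price range.
run_gpus, run_hours = 64, 7 * 24
low = training_cost(1.99, run_gpus, run_hours)   # e.g. DataCrunch/Voltage Park rate
high = training_cost(6.15, run_gpus, run_hours)  # e.g. CoreWeave on-demand rate

print(f"low end:  ${low:,.0f}")
print(f"high end: ${high:,.0f}")
```

At this scale the spread between the cheapest and most expensive on-demand rates is tens of thousands of dollars per week, which is why reserved and committed pricing (discussed later) matters so much for sustained training.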
AMD GPU Availability
Six providers currently offer AMD Instinct GPUs (RunPod and Vast.ai list only the MI300X; see GPU Model Availability below). Pricing where published:
| Provider | MI300X Price | MI325X Price | MI355X Price | Source |
|---|---|---|---|---|
| TensorWave | $1.50/hr | — | $2.25/hr | Link |
| Vultr | $1.85/hr | $2.00/hr | $2.59/hr | Link |
| Crusoe | $3.45/hr | Contact | Contact | Link |
| Nscale | Contact | Contact | Contact | Link |
GPU Model Availability
GPU selection varies significantly by provider. Here’s what each offers:
NVIDIA Hopper and Blackwell (H100/H200/B200/GB200):
- Nebius: H100, H200, B200, GB200 (pre-order), L40S
- CoreWeave: H100, H200, B200, GB200, A100, L40S, RTX A-series
- Crusoe: H100, H200, B200, GB200, A100, L40S
- GMI Cloud: H100, H200, GB200 NVL72, HGX B200 (coming soon)
- Lambda: H100, B200, A100
- Voltage Park: H100 (H200/B200/GB200 require sales contact)
- FluidStack: H100, H200, A100, L40S (B200/GB200 require sales contact)
- RunPod: H100, H200, B200, A100, L40S, RTX 3090/4090
- Hyperstack: H100, H200, A100, L40S (B200/GB200 require sales contact)
- DataCrunch/Verda: H100, H200, B200, A100, L40S
- Vultr: H100, H200, B200, A100, L40S
- OVHcloud: H100, A100, L40S
- Nscale: H100, H200, GB200 (contact), A100
- Vast.ai: H100, H200, B200, A100, L40S (plus full range of consumer GPUs)
- SF Compute: H100, H200
AMD Instinct:
- TensorWave: MI300X, MI355X (AMD-only, no NVIDIA)
- Vultr: MI300X, MI325X, MI355X
- Crusoe: MI300X (MI325X/MI355X require sales contact)
- RunPod: MI300X
- Nscale: MI300X (contact for pricing)
- Vast.ai: MI300X
Infrastructure Ownership Models
Understanding whether a provider owns their infrastructure or aggregates from others matters for reliability, support, and pricing stability.
Ownership Model Comparison
| Provider | Model | Description | Source |
|---|---|---|---|
| Crusoe | Owner | Vertically integrated; manufactures own modular DCs via Easter-Owens Electric acquisition | Link |
| OVHcloud | Owner | Fully vertically integrated; designs/manufactures servers, builds/manages own DCs | Link |
| GMI Cloud | Owner | Full-stack ownership; offshoot of Realtek/GMI Tech with Taiwan supply chain advantage | Link |
| Nebius | Owner + Colo | Owns DCs in Finland and NJ (300 MW); colocation in Kansas City, Iceland, Paris | Link |
| CoreWeave | Owner | Acquired Core Scientific ($9B, 1.3 GW) and NEST DC ($322M); 250K+ GPUs across 32 DCs | Link |
| Nscale | Owner | Owns 60MW Glomfjord DC; JV with Aker for 230MW Stargate Norway facility | Link |
| FluidStack | Owner + Aggregator | 62% owned infrastructure, 38% marketplace; $10B GPU asset financing via Macquarie | Link |
| Lambda | Owner (colo) | Owns GPU hardware; colocation in SF and Texas; Nvidia leases back GPUs ($1.5B deal) | Link |
| Voltage Park | Owner (colo) | Owns 24K H100s ($500M) across 6 Tier 3 DCs in TX, VA, WA | Link |
| Hyperstack | Owner (colo) | Owns 13K GPUs; long-term agreements with hyperscalers and renewable energy DCs | Link |
| DataCrunch/Verda | Owner (colo) | Owns GPUs in 4 Nordic colos (3x Helsinki, 1x Iceland); building own DCs in 2025 | Link |
| Vultr | Owner (colo) | Owns hardware across 32 global colocation facilities (Sabey, Singtel partnerships) | Link |
| TensorWave | Owner (colo) | Owns 8K AMD GPUs; leases 1 GW capacity across TECfusions portfolio (AZ, VA, PA) | Link |
| RunPod | Owner + Aggregator | Secure Cloud (Tier 3/4 partners) + Community Cloud (aggregated third-party hosts) | Link |
| Vast.ai | Aggregator | Pure marketplace connecting 10K+ GPUs from individuals to datacenters | Link |
| SF Compute | Aggregator | Two-sided marketplace (“Airbnb for GPUs”); manages $100M+ hardware, ~10% fee | Link |
What this means for you:
- Owner: Full control over hardware and facilities; consistent performance but finite capacity
- Owner (colo): Owns GPUs/servers but rents data center space; good control with geographic flexibility
- Owner + Colo: Mix of owned and colocated data centers
- Owner + Aggregator: Mix of owned infrastructure and marketplace aggregation
- Aggregator: No owned infrastructure; maximum price competition but variable quality
InfiniBand and High-Speed Networking
For multi-node distributed training, network bandwidth between GPUs is critical. InfiniBand provides lower latency and higher bandwidth than Ethernet, with RDMA (Remote Direct Memory Access) enabling GPU-to-GPU communication without CPU involvement.
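To see why per-GPU bandwidth matters, here is a back-of-envelope estimate of gradient synchronization time. It assumes an idealized, bandwidth-bound ring all-reduce with no latency or protocol overhead; real NCCL performance also depends on topology, compute/communication overlap, and intra-node NVLink, so treat this as a lower bound sketch rather than a benchmark:

```python
# Idealized ring all-reduce time: each GPU sends and receives 2*(N-1)/N of the
# buffer over its NIC. Ignores latency, NVLink hierarchy, and NCCL overhead.

def ring_allreduce_seconds(size_bytes: float, num_gpus: int, link_gbps: float) -> float:
    link_bytes_per_s = link_gbps * 1e9 / 8          # 400 Gb/s -> 50 GB/s
    traffic = 2 * (num_gpus - 1) / num_gpus * size_bytes
    return traffic / link_bytes_per_s

# Hypothetical example: ~140 GB of fp16 gradients (70B parameters),
# 64 GPUs, 400 Gb/s InfiniBand per GPU.
t = ring_allreduce_seconds(140e9, 64, 400)
print(f"~{t:.2f} s per full gradient all-reduce")
```

Halving the per-GPU link speed doubles this time, which is why a provider offering 200Gb/s instead of 400Gb/s can meaningfully slow multi-node training even when the GPUs themselves are identical.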
InfiniBand Availability
| Provider | InfiniBand | Speed (per GPU) | Availability | Topology | Source |
|---|---|---|---|---|---|
| Nebius | Yes | 400Gb/s (Quantum-2) | All GPU nodes | Rail-optimized | Link |
| CoreWeave | Yes | 400Gb/s (Quantum-2) | H100/H200 clusters | Non-blocking fat-tree | Link |
| Crusoe | Yes | 400Gb/s | All GPU nodes | Partition-based isolation | Link |
| DataCrunch/Verda | Yes | 400Gb/s (NDR) | Instant clusters | Rail-optimized | Link |
| GMI Cloud | Yes | 400Gb/s (NDR) | All GPU nodes | Not documented | Link |
| Voltage Park | Yes | 400Gb/s (Quantum-2) | IB tier ($2.49/hr) | Rail-optimized | Link |
| FluidStack | Yes | 400Gb/s | Clusters | Not documented | Link |
| Vultr | Yes | 400Gb/s (Quantum-2) | H100/H200 clusters | Non-blocking | Link |
| Lambda | Clusters only | 400Gb/s (Quantum-2) | 1-Click Clusters | Rail-optimized | Link |
| RunPod | Clusters only | 200-400Gb/s | Instant Clusters | Not documented | Link |
| Hyperstack | Supercloud only | 400Gb/s | H100/H200 SXM | Quantum-2 | Link |
| Vast.ai | By request | Not specified | Custom clusters | Varies by host | Link |
| OVHcloud | Custom only | Not documented | H100 SXM (sales) | Not documented | Link |
| TensorWave | RoCE only | 400Gb Ethernet | All nodes | Aviz ONES fabric | Link |
| Nscale | RoCE only | Not documented | All nodes | Broadcom-based | Link |
| SF Compute | Yes | 400Gb/s | K8s clusters only | Not documented | Link |
Key observations:
- 400Gb/s NDR InfiniBand is now standard (per GPU) among providers with InfiniBand. Each GPU has its own 400Gb/s NIC. No provider publicly documents 800Gb/s availability yet.
- Rail-optimized topology minimizes hops for all-reduce operations by connecting each GPU’s NIC to a different leaf switch.
- TensorWave and Nscale use RoCE (RDMA over Converged Ethernet) instead of InfiniBand. RoCE provides RDMA capabilities over standard Ethernet, with lower cost but potentially higher latency under congestion.
- Lambda, RunPod, and Hyperstack typically don’t include InfiniBand on single-GPU instances; you need to provision cluster configurations.
Storage Options
Training workloads need three types of storage: block storage for OS and application data, object storage for datasets and checkpoints, and shared filesystems for multi-node data access.
Storage Comparison
| Provider | Block Storage | Object Storage | Shared FS | Technology | Source |
|---|---|---|---|---|---|
| Nebius | $0.05-0.12/GB/mo | S3 $0.015/GB/mo | $0.08/GB/mo | NFS | Link |
| CoreWeave | Yes | S3 $0.03-0.06/GB/mo | $0.07/GB/mo | VAST, WEKA, DDN | Link |
| Crusoe | $0.08/GB/mo | — | $0.07/GB/mo | Lightbits | Link |
| Lambda | — | S3 adapter only | $0.20/GB/mo | VAST Data | Link |
| Voltage Park | Local NVMe | VAST S3 | VAST NFS | VAST Data | Link |
| DataCrunch/Verda | $0.05-0.20/GB/mo | Coming soon | $0.20/GB/mo | NVMe SFS | Link |
| GMI Cloud | Integrated | VAST S3 | VAST NFS | VAST Data, GPUDirect | Link |
| RunPod | $0.10/GB/mo | S3 (5 DCs) | $0.05-0.07/GB/mo | Network volumes | Link |
| Vultr | $0.10/GB/mo | S3 $0.018-0.10/GB/mo | $0.10/GB/mo | NVMe-backed | Link |
| OVHcloud | $0.022/GB/mo | S3 + egress | $120-150/TB/mo | NetApp | Link |
| Hyperstack | ~$0.07/GB/mo | In development | WEKA (Supercloud) | NVMe | Link |
| FluidStack | Filesystem only | — | — | Not documented | Link |
| Vast.ai | Per-host | — | — | Varies | Link |
| TensorWave | Local only | — | WEKA (custom) | Not documented | Link |
| Nscale | Not documented | Not documented | “Parallel FS” | Not documented | Link |
| SF Compute | Local NVMe only | — | — | 1.5TB+ per node | Link |
Key observations:
- VAST Data is popular: Lambda, Voltage Park, and CoreWeave all use VAST for high-performance shared storage.
- Object storage gaps: Crusoe, FluidStack, Vast.ai, TensorWave, and Nscale don’t offer native S3-compatible object storage.
- Shared filesystem is critical for multi-node training: Without it, you need to copy data to each node’s local storage or stream from object storage.
- OVHcloud’s shared storage is expensive: NetApp-based Enterprise File Storage at $120-150/TB/month works out to $0.12-0.15/GB/month, roughly double the $0.07/GB rates at CoreWeave and Crusoe (though Lambda and DataCrunch charge $0.20/GB for their shared filesystems).
Storage Performance
Most providers don’t publish detailed storage performance specs. Where documented:
| Provider | Shared FS Throughput | Notes | Source |
|---|---|---|---|
| Nebius | 12 GB/s read, 8 GB/s write per 8-GPU VM | Significantly faster than AWS EFS | Link |
| Lambda | ~11 GB/s per mount (VAST) | With nconnect=32 and 100Gb NIC | Link |
| DataCrunch/Verda | 2000 MB/s continuous (NVMe SFS) | Per volume | Link |
For comparison, AWS EFS maxes out at roughly 1.5 GB/s per client, Azure Premium Files at 10 GB/s, and GCP Filestore at 25 GB/s.
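These throughput figures translate directly into checkpoint and data-loading times. A quick sketch using the documented write speeds above and a hypothetical 500 GB checkpoint (a plausible size for a large model plus optimizer state):

```python
# Time to write one checkpoint at various shared-FS speeds. The checkpoint
# size is a hypothetical example; throughputs come from the table above
# (note Lambda's ~11 GB/s figure is a per-mount read measurement, used here
# only as a rough proxy).

def write_seconds(size_gb: float, throughput_gb_s: float) -> float:
    return size_gb / throughput_gb_s

ckpt_gb = 500
for name, gb_s in [("Nebius (8 GB/s write)", 8.0),
                   ("Lambda VAST (~11 GB/s)", 11.0),
                   ("DataCrunch SFS (2 GB/s)", 2.0)]:
    print(f"{name}: {write_seconds(ckpt_gb, gb_s):.0f} s")
```

At 2 GB/s a 500 GB checkpoint ties up over four minutes per save, so checkpoint frequency and storage throughput need to be budgeted together.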
Egress Pricing
Data transfer costs can significantly impact total cost of ownership, especially for workloads that move large datasets or serve inference traffic.
Egress Comparison
| Provider | Egress Cost | Notes | Source |
|---|---|---|---|
| Nebius | Free | Explicitly zero egress | Link |
| CoreWeave | Free (object storage) | Object storage via LOTA | Link |
| Crusoe | Free | Zero data transfer fees | Link |
| Lambda | Free | Zero egress | Link |
| Voltage Park | Free | No hidden costs | Link |
| FluidStack | Free | Zero egress/ingress | Link |
| RunPod | Free | Zero data transfer | Link |
| Hyperstack | Free | Zero bandwidth charges | Link |
| Vultr | $0.01/GB | After 2TB/mo free | Link |
| OVHcloud | $0.011/GB | Object storage only, compute egress free | Link |
| Vast.ai | Varies | Per-host, can be $20+/TB | Link |
| DataCrunch/Verda | Not documented | — | Link |
| GMI Cloud | Not documented | — | Link |
| TensorWave | Not documented | Claims “no hidden costs” | Link |
| Nscale | Not documented | — | Link |
| SF Compute | Free | No ingress/egress fees | Link |
Free egress is now standard among GPU neoclouds. This is a significant differentiator from hyperscalers, where egress costs can add 20-40% to monthly bills for data-intensive workloads.
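To put a number on what free egress is worth, here is a quick comparison. The ~$0.09/GB hyperscaler rate is an assumption based on typical first-tier internet egress pricing, and the 50 TB monthly volume is hypothetical:

```python
# Monthly egress cost sketch. Neocloud rates come from the table above; the
# hyperscaler rate is an assumed typical first-tier internet-egress price.

def egress_cost(tb_per_month: float, rate_per_gb: float, free_tb: float = 0.0) -> float:
    billable_gb = max(tb_per_month - free_tb, 0) * 1000
    return billable_gb * rate_per_gb

tb = 50  # hypothetical monthly transfer: datasets out + inference traffic
print(f"typical hyperscaler (~$0.09/GB): ${egress_cost(tb, 0.09):,.0f}")
print(f"Vultr ($0.01/GB after 2TB free): ${egress_cost(tb, 0.01, free_tb=2):,.0f}")
print(f"Nebius/Lambda/Crusoe (free):     ${egress_cost(tb, 0.0):,.0f}")
```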
Kubernetes and Orchestration
Most production AI workloads run on Kubernetes. Support varies from fully managed Kubernetes to bring-your-own orchestration.
Kubernetes Support
| Provider | Managed K8s | Slurm | Autoscaling | Notes | Source |
|---|---|---|---|---|---|
| Nebius | Yes | Managed + Soperator | Yes | First Slurm Kubernetes operator | Link |
| CoreWeave | Yes (CKS) | SUNK | Yes | Bare-metal K8s, no hypervisor | Link |
| Crusoe | Yes (CMK) | Yes | Yes | Run:ai integration | Link |
| Lambda | — | Available | — | Focus on 1-Click Clusters | Link |
| Voltage Park | Add-on | — | — | Helm/Rook-Ceph guides | Link |
| DataCrunch/Verda | — | Pre-installed | — | Slurm on clusters | Link |
| GMI Cloud | Yes (Cluster Engine) | — | Yes | K8s-based orchestration | Link |
| RunPod | — | — | Yes | Serverless focus | Link |
| Vultr | Yes (VKE) | — | Yes | Standard managed K8s | Link |
| OVHcloud | Yes | — | Yes | Standard managed K8s | Link |
| Hyperstack | — | — | — | VMs only | Link |
| FluidStack | — | — | — | Atlas platform | Link |
| Vast.ai | — | — | — | Container-based | Link |
| TensorWave | — | Yes | — | Pyxis/Enroot containers | Link |
| Nscale | Yes (NKS) | Yes | — | Limited docs | Link |
| SF Compute | Yes | — | — | Managed K8s per zone | Link |
Key observations:
- Nebius and CoreWeave have the most mature Kubernetes offerings with GPU-optimized features like pre-installed drivers and topology-aware scheduling.
- Slurm remains popular for HPC-style workloads. Nebius’s Soperator is notable as the first open-source Kubernetes operator for running Slurm clusters.
- Serverless/container platforms (RunPod, Vast.ai, FluidStack) trade Kubernetes flexibility for simpler deployment models.
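On any of the managed Kubernetes offerings above, requesting GPUs follows the same pattern: set the standard `nvidia.com/gpu` resource limit exposed by NVIDIA’s device plugin. A minimal sketch, built as a plain manifest dict so it runs anywhere; the container image tag and training command are placeholders, not provider-specific values:

```python
# Minimal GPU pod manifest of the kind accepted by any conformant Kubernetes
# cluster with the NVIDIA device plugin installed. The image tag and command
# are illustrative placeholders.
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "train-worker"},
    "spec": {
        "restartPolicy": "Never",
        "containers": [{
            "name": "trainer",
            "image": "nvcr.io/nvidia/pytorch:latest",          # placeholder
            "command": ["torchrun", "--nproc_per_node=8", "train.py"],
            # "nvidia.com/gpu" is the standard device-plugin resource name;
            # requesting 8 claims one full HGX node on most providers.
            "resources": {"limits": {"nvidia.com/gpu": 8}},
        }],
    },
}
print(json.dumps(pod, indent=2))
```

Providers with topology-aware scheduling (Nebius, CoreWeave) additionally place multi-pod jobs on the same InfiniBand rail group; on the others you may need node selectors or affinity rules to get pods co-located.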
Platform Maturity
Beyond raw GPU access, production workloads need automation, observability, and enterprise features. MLOps capabilities vary significantly across providers.
Terraform and API Support
| Provider | Terraform Provider | API | CLI | Source |
|---|---|---|---|---|
| Nebius | Official | gRPC | Yes | Link |
| CoreWeave | Official (Feb 2025) | Yes | Yes | Link |
| Crusoe | Official | REST | Yes | Link |
| Lambda | — | Yes | Yes | Link |
| Vultr | Official | REST | Yes | Link |
| OVHcloud | Official | REST | Yes | Link |
| Hyperstack | — | Infrahub API | — | Link |
| DataCrunch/Verda | — | Yes | — | Link |
| GMI Cloud | — | REST | Yes | Link |
| RunPod | — | GraphQL | Yes | Link |
| Voltage Park | — | — | — | Link |
| FluidStack | — | Atlas API | — | Link |
| Vast.ai | — | Yes | Yes | Link |
| TensorWave | — | — | — | Link |
| Nscale | — | Yes | — | Link |
| SF Compute | — | — | Yes | Link |
Self-Service Access
| Provider | Access Model | Notes | Source |
|---|---|---|---|
| Nebius | Self-service | Sign up, add $25+, deploy up to 32 GPUs immediately | Link |
| Lambda | Self-service | Create account and launch GPUs in minutes, pay-as-you-go | Link |
| Hyperstack | Self-service | Instant access, one-click deployment | Link |
| DataCrunch/Verda | Self-service | Order GPU instances in minutes via dashboard or API | Link |
| GMI Cloud | Self-service | Sign up, launch instances in 5-15 minutes via console/API | Link |
| Vultr | Self-service | Free account signup, provision via portal/API/CLI | Link |
| OVHcloud | Self-service | Create account, $200 free credit for first project | Link |
| FluidStack | Self-service | Sign up at auth.fluidstack.io, launch in under 5 minutes | Link |
| RunPod | Self-service | Deploy GPUs in under a minute, no rate limits | Link |
| Vast.ai | Self-service | $5 minimum to start, per-second billing | Link |
| Crusoe | Self-service | Sign up via console, larger deployments contact sales | Link |
| Voltage Park | Self-service | On-demand GPUs available, reserved capacity contact sales | Link |
| Nscale | Hybrid | Self-service for inference only; training clusters require sales | Link |
| SF Compute | Self-service | Sign up to buy, larger deployments contact sales | Link |
| CoreWeave | Sales-gated | Requires organizational approval from sales team | Link |
| TensorWave | Sales-gated | Contact sales/solutions engineers to get started | Link |
Compliance and Enterprise Features
| Provider | SOC 2 | SSO/SAML | Regions | Source |
|---|---|---|---|---|
| Nebius | Type II + HIPAA | Microsoft Entra ID | US, EU (Finland, France, Iceland, UK) | Link |
| CoreWeave | Aligned | SAML/OIDC | US, UK, Spain, Sweden, Norway | Link |
| Crusoe | Type II | Yes | US (TX, VA), Iceland, Norway (soon) | Link |
| Lambda | — | — | US | Link |
| Vultr | — | — | 32 global locations | Link |
| OVHcloud | — | Yes | Global | Link |
| DataCrunch/Verda | ISO 27001 | — | EU (Finland, Iceland) | Link |
| GMI Cloud | SOC 2 Type 1, ISO 27001 | — | Taiwan, Thailand, Malaysia, US (CA) | Link |
| RunPod | Type II | — | Multiple | Link |
| Voltage Park | — | — | US (WA, TX, VA, UT) | Link |
| Hyperstack | — | — | US, Canada, EU | Link |
| FluidStack | — | — | US, EU | Link |
| Vast.ai | — | — | Varies by host | Link |
| TensorWave | — | — | US | Link |
| Nscale | — | — | Norway | Link |
| SF Compute | — | — | US | Link |
Choosing a Provider
For Multi-Node Training at Scale
Best options: Nebius, CoreWeave, Crusoe
These providers offer:
- InfiniBand on all GPU nodes (not just clusters)
- Managed Kubernetes with GPU-optimized features
- High-performance shared storage
- Free egress
- Terraform providers for infrastructure-as-code
CoreWeave has the largest scale and was first to market with Blackwell (GB200). Nebius offers the most complete managed service stack (K8s, Slurm, PostgreSQL, MLflow). Crusoe is the only option if you need AMD GPUs with enterprise features.
For Cost-Optimized Experimentation
Best options: Vast.ai, SF Compute, DataCrunch/Verda
These providers offer:
- Lowest per-GPU pricing (Vast.ai $0.90-1.87/hr, SF Compute $1.43-1.77/hr, DataCrunch $1.99/hr)
- Per-second billing (Vast.ai, DataCrunch)
- Quick deployment
- Community/spot options for non-production workloads
Trade-offs: Vast.ai and SF Compute are marketplace aggregators with variable infrastructure quality. Less documentation and fewer managed services than enterprise-focused providers.
For European Data Sovereignty
Best options: Nebius, DataCrunch/Verda, Nscale, OVHcloud
All operate exclusively or primarily in EU data centers with:
- GDPR compliance
- SOC 2 Type II + HIPAA (Nebius), ISO 27001 (DataCrunch)
- 100% renewable energy (DataCrunch, Nscale)
- EU-based data centers (Nebius in Finland, DataCrunch in Finland/Iceland, Nscale in Norway, OVHcloud in France)
For AMD GPUs
Best options: TensorWave, Crusoe
TensorWave is AMD-focused with MI300X at $1.50/hr (the cheapest option). Crusoe offers MI300X with enterprise features (SOC 2, managed K8s) at $3.45/hr.
What’s Missing from Public Documentation
Across all providers, several categories consistently require sales contact:
- Reserved/committed pricing: All providers advertise discounts (30-60%) but don’t publish specific tiers
- Detailed SLAs: Most mention “99.9% uptime” without specifics on what’s covered
- Network topology: Rail configurations, oversubscription ratios, switch specifications
- Maximum cluster sizes: How many GPUs can you provision in a single cluster?
- Compliance details: Which specific certifications and what’s the audit scope?
Conclusion
The GPU neocloud market has matured significantly. Free egress is now standard, 400Gb/s InfiniBand is table stakes for serious providers, and pricing has compressed to $2-4/hr for H100s (vs $6-12/hr on hyperscalers).
For production AI training, Nebius, CoreWeave, and Crusoe offer the most complete platforms. For cost-sensitive experimentation, Vast.ai, SF Compute, and DataCrunch provide the lowest prices. For European data sovereignty, Nebius and DataCrunch combine EU data centers with enterprise compliance (SOC 2/ISO 27001) and competitive pricing.
The main remaining gap is documentation transparency. Most providers require sales conversations for pricing on reserved capacity, large clusters, and enterprise features. As the market matures, expect more self-service options and published pricing for these categories.
Last updated: December 2025. Pricing and features change frequently. Verify current offerings on provider websites before making decisions.
Saturn Cloud provides customizable, ready-to-use cloud environments for collaborative data teams.
Try Saturn Cloud and join thousands of users moving to the cloud without having to switch tools.