GPU Cloud Comparison Report: Neoclouds for AI Infrastructure
Executive Summary
This report analyzes 17 GPU cloud providers (neoclouds) across the dimensions that determine production readiness for AI workloads: pricing, networking infrastructure, storage performance, orchestration capabilities, and enterprise compliance.
Key Findings
- Price advantage over hyperscalers: Neoclouds offer on-demand H100s from $1.45/hr versus $6.88-12.29/hr on AWS/GCP/Azure (up to 88% savings)
- Immediate availability: Self-service provisioning in minutes vs multi-week quota approval processes on hyperscalers
- InfiniBand standard: 400Gb/s per GPU networking is baseline for serious providers; hyperscalers charge premium tiers for equivalent
- Free egress: 10 of 17 providers charge no compute egress fees, unlike hyperscalers ($0.08-0.12/GB)
- Enterprise maturity: Top-tier providers (Nebius, CoreWeave, Crusoe) offer SOC 2 Type II, managed Kubernetes, and Slurm
Report Structure:
This report provides comparative analysis tables across all providers, followed by detailed profiles assessing each provider’s infrastructure, strengths, gaps, and optimal use cases. Recommendations by workload type are provided in the “Choosing a Provider” section.
The Neocloud Landscape
The term “neocloud” refers to cloud providers primarily offering GPU-as-a-Service. Unlike hyperscalers with broad service portfolios, neoclouds focus on delivering GPU compute with high-speed interconnects for AI and HPC workloads.
Roughly 10-15 neoclouds currently operate at meaningful scale in the US, with footprints growing across Europe, the Middle East, and Asia.
Why consider neoclouds over AWS, GCP, or Azure?
The hyperscaler GPU experience involves quota requests, waitlists, and premium pricing:
| Provider | H100 80GB | Availability |
|---|---|---|
| AWS | $6.88/hr (p5.48xlarge) | Quota approval required, multi-week waitlists common |
| Azure | $12.29/hr (ND96isr H100 v5) | Quota requests, capacity constraints |
| GCP | $11.06/hr (a3-highgpu-8g) | Limited regions, quota approval process |
| SF Compute | $1.45-1.50/hr | Self-service signup, provision in minutes |
Even AWS, after cutting prices 44% in June 2025, is still 4.7x more expensive than SF Compute. GCP and Azure are 7-8x more expensive.
Beyond cost, neoclouds eliminate the friction of quota approvals. On AWS, requesting H100 quota often requires a support ticket explaining your use case, with approval taking days to weeks. GCP and Azure have similar processes. Neoclouds typically offer self-service access: sign up, add payment, deploy GPUs in minutes.
Infrastructure is also optimized differently. Neoclouds treat InfiniBand as standard—400Gb/s per GPU for multi-node training. Hyperscalers charge premium tiers for similar networking (AWS EFA, GCP GPUDirect) and availability varies by region.
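The multiples quoted above can be sanity-checked directly from the table's per-GPU-hour rates. A quick sketch (the rates are the on-demand figures listed, which change frequently):

```python
# Rough per-GPU-hour H100 cost multiples, using the on-demand rates
# from the comparison table above (all USD per GPU-hour).
HYPERSCALER_H100 = {"AWS": 6.88, "Azure": 12.29, "GCP": 11.06}
SF_COMPUTE = 1.47  # midpoint of the $1.45-1.50 range

for provider, rate in HYPERSCALER_H100.items():
    multiple = rate / SF_COMPUTE
    savings = 1 - SF_COMPUTE / rate
    print(f"{provider}: {multiple:.1f}x the cost, {savings:.0%} potential savings")
```

This reproduces the 4.7x AWS and 7-8x GCP/Azure figures cited above.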
Market Segmentation
The GPU cloud market has fragmented into distinct tiers with different business models, capital structures, and target customers. Understanding these tiers helps match your needs to the right provider type.
| Tier | Description | Capital Structure | Best For |
|---|---|---|---|
| Bespoke Wholesale | Multi-year buildouts for frontier labs | Project finance / debt | 100MW+ deployments, $1B+ contracts |
| Sales-Gated Cloud | Standardized infrastructure, approval required | Venture / debt | Large enterprises, consistent workloads |
| Self-Service Neoclouds | On-demand, transparent pricing | Venture / debt | Most AI teams, flexible scaling |
| Marketplaces | Aggregated supply, variable quality | Marketplace fees | Cost optimization, fault-tolerant workloads |
Tier 1: Bespoke Wholesale
Massive, single-tenant infrastructure projects with multi-year, multi-billion dollar contracts. Providers secure 100MW+ of power and build custom data centers to customer specifications.
Known examples: FluidStack (enterprise), Crusoe (Stargate), CoreWeave (enterprise). Most providers will entertain bespoke deals for sufficiently large customers; these arrangements are rarely disclosed publicly.
Characteristics:
- Physical site control and custom network topologies
- Take-or-pay agreements with guaranteed capacity
- Not publicly documented or priced
Tier 2: Sales-Gated Cloud
Standardized multi-tenant infrastructure that requires sales approval to access. Once approved, you use the provider’s standard cloud platform.
Providers: CoreWeave, TensorWave, Nscale (training clusters)
Characteristics:
- Enterprise-grade infrastructure with SLAs
- Requires organizational approval process
- Volume commitments often required
Tier 3: Self-Service Neoclouds
Dedicated GPU clouds with transparent, on-demand pricing via web console. The core of this report.
Providers: Nebius, Lambda, Crusoe, Vultr, Hyperstack, DataCrunch/Verda, RunPod (Secure Cloud), OVHcloud, Voltage Park, GMI Cloud, Hot Aisle, SF Compute
Characteristics:
- Sign up and deploy in minutes
- Published pricing, no sales conversation required
- Bare metal or containerized access
Tier 4: Marketplaces & Aggregators
Software platforms pooling supply from third-party hardware owners. High variability in quality and uptime, but lowest prices.
Providers: Vast.ai, RunPod (Community Cloud), FluidStack (marketplace)
Characteristics:
- Bidding or spot pricing
- Variable hardware quality and reliability
- Ideal for fault-tolerant batch processing
Note: Many providers span multiple tiers. FluidStack operates both wholesale (62% of revenue) and marketplace (38%). RunPod offers both Secure Cloud (Tier 3) and Community Cloud (Tier 4). Crusoe runs self-service cloud while building Stargate for OpenAI. The bespoke wholesale tier is likely larger than publicly known, as these deals are rarely disclosed.
GPU Hardware & Pricing
Your first decision: which GPUs do you need, and what will they cost? This section covers on-demand pricing for NVIDIA and AMD GPUs. Most providers offer reserved capacity discounts of 30-60%, but those require sales conversations.
On-Demand GPU Pricing
| Provider | H100 | H200 | B200 | GB200 | Source |
|---|---|---|---|---|---|
| CoreWeave | PCIe $4.25 / SXM $6.16 | ~$6.30/hr | $8.60/hr | $10.50/hr | Link |
| Crusoe | $3.90/hr | $4.29/hr | Contact | Contact | Link |
| DataCrunch/Verda | $2.29/hr | $2.99/hr | $3.79/hr | — | Link |
| FluidStack | $2.10/hr | $2.30/hr | Contact | Contact | Link |
| GMI Cloud | $2.10/hr | From $2.50/hr | Pre-order | Pre-order | Link |
| Hot Aisle | — | — | — | — | N/A |
| Hyperstack | PCIe $1.90 / SXM $2.40 | $3.50/hr | Contact | Contact | Link |
| Lambda | PCIe $2.49 / SXM $2.99 | — | $4.99/hr | — | Link |
| Nebius | $2.95/hr | $3.50/hr | $5.50/hr | Pre-order | Link |
| Nscale | Contact | Contact | — | Contact | Link |
| OVHcloud | $2.99/hr | — | — | — | Link |
| RunPod | PCIe $1.99-2.39 / SXM $2.69-2.99 | $3.59/hr | $5.19-5.98/hr | — | Link |
| SF Compute | $1.45-1.50/hr | Contact | — | — | Link |
| TensorWave | — | — | — | — | N/A |
| Vast.ai | $1.49-1.87/hr | Varies | Varies | — | Link |
| Voltage Park | From $1.99/hr | Contact | Contact | Contact | Link |
| Vultr | $2.99/hr | Contact | Contact* | — | Link |
*Vultr’s B200 is available via 36-month reserved commitment at $2.89/hr ($23.12/hr for 8x B200 HGX system); on-demand pricing not published.
Pricing varies significantly based on whether you’re renting individual GPUs, full nodes (typically 8 GPUs), or multi-node clusters with InfiniBand.
AMD GPU Availability
| Provider | MI300X | MI325X | MI355X | Source |
|---|---|---|---|---|
| Crusoe | $3.45/hr | — | Contact | Link |
| Hot Aisle | $1.99/hr | — | Pre-order | Link |
| Nscale | Pre-order | — | — | Link |
| TensorWave | Sold out | $1.95/hr | $2.85/hr | Link |
| Vultr | $1.85/hr | $2.00/hr | $2.59/hr (on-demand) / $2.29/hr (36-month) | Link |
AMD adoption is growing. Vultr offers one of the cheapest MI300X options at $1.85/hr. Hot Aisle and TensorWave are AMD-only providers.
Training Infrastructure
For multi-node training, your infrastructure determines actual performance regardless of GPU specs. Network bandwidth between GPUs and shared storage throughput are the critical factors. Without proper networking and storage, even the fastest GPUs will sit idle waiting for data.
InfiniBand and High-Speed Networking
For multi-node distributed training, network bandwidth between GPUs is critical. InfiniBand provides lower latency and higher bandwidth than Ethernet, with RDMA enabling GPU-to-GPU communication without CPU involvement.
Note: This table describes publicly available cloud offerings. Bespoke wholesale buildouts (Tier 1) can support arbitrary network configurations. “Not documented” indicates information not publicly available.
| Provider | InfiniBand | Speed (per GPU) | Availability | Topology | Source |
|---|---|---|---|---|---|
| CoreWeave | Yes | 400Gb/s (Quantum-2) | H100/H200 clusters | Non-blocking fat-tree (rail-optimized) | Link |
| Crusoe | Yes | 400Gb/s | H100/H200 instances | Rail-optimized | Link |
| DataCrunch/Verda | Yes | 400Gb/s (NDR) | Instant clusters | Rail-optimized | Link |
| FluidStack | Yes | 400Gb/s | Dedicated clusters | Not documented | Link |
| GMI Cloud | Yes | 400Gb/s | H100/H200 clusters | Not documented | Link |
| Hot Aisle | RoCE only | 400Gb Ethernet | All nodes | Dell/Broadcom | Link |
| Hyperstack | Supercloud only | 400Gb/s (Quantum-2) | H100/H200 SXM | Not documented | Link |
| Lambda | Clusters only | 400Gb/s (Quantum-2) | 1-Click Clusters | Rail-optimized | Link |
| Nebius | Yes | 400Gb/s (Quantum-2) | All GPU nodes | Not documented | Link |
| Nscale | RoCE only | 400Gb Ethernet | All nodes | Nokia 7220 IXR | Link |
| OVHcloud | No | 25Gb Ethernet (Public) / 50-100Gb (Bare Metal) | Public Cloud GPU / Bare Metal | vRack OLA | Link |
| RunPod | Clusters only | 200-400Gb/s | Instant Clusters | Not documented | Link |
| SF Compute | K8s only | 400Gb/s | K8s clusters only | Not documented | Link |
| TensorWave | RoCE only | 400Gb Ethernet | All nodes | Aviz ONES fabric | Link |
| Vast.ai | No | Varies by host | Marketplace | Varies by host | Link |
| Voltage Park | Yes | 400Gb/s | IB tier ($2.49/hr) | Not documented | Link |
| Vultr | Yes | 400Gb/s (Quantum-2) | H100/H200 clusters | Non-blocking | Link |
Key observations:
- 400Gb/s NDR InfiniBand is now standard among providers with InfiniBand. CoreWeave, Crusoe, DataCrunch/Verda, FluidStack, GMI Cloud, Hyperstack (Supercloud), Lambda, Nebius, and Vultr all offer 400Gb/s per GPU.
- Rail-optimized topology minimizes hops for all-reduce operations. Verified at CoreWeave, Crusoe, DataCrunch/Verda, and Lambda. Each GPU’s NIC connects to a different leaf switch.
- RoCE (RDMA over Converged Ethernet) is an alternative to InfiniBand used by TensorWave, Hot Aisle, and Nscale. RoCE provides RDMA capabilities over standard Ethernet with lower cost but potentially higher tail latency under network congestion.
- Single-GPU instances typically don’t include InfiniBand at Lambda, RunPod, and Hyperstack. You need to provision cluster configurations.
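In practice, the first multi-node debugging step on any of these fabrics is confirming NCCL actually selects the RDMA transport. A hedged sketch of typical environment settings; the interface names (`mlx5`, `eth0`) are placeholders for whatever `ibstat` and `ip link` report on your provider's nodes:

```python
import os

# Common NCCL settings for multi-node training over InfiniBand.
nccl_env = {
    "NCCL_DEBUG": "INFO",          # log transport selection (verify IB is chosen)
    "NCCL_IB_DISABLE": "0",        # allow the InfiniBand transport
    "NCCL_IB_HCA": "mlx5",         # restrict NCCL to Mellanox IB adapters
    "NCCL_SOCKET_IFNAME": "eth0",  # interface for bootstrap/out-of-band traffic
}
os.environ.update(nccl_env)
```

Set these before initializing the process group; with `NCCL_DEBUG=INFO`, the startup logs show whether NCCL fell back to TCP sockets instead of RDMA.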
Storage Options
Training workloads need three types of storage: block storage for OS and application data, object storage for datasets and checkpoints, and shared filesystems for multi-node data access. This table describes publicly available offerings; bespoke buildouts (Tier 1) integrate customer-specified storage (VAST, WEKA, DDN, etc.).
| Provider | Block Storage | Object Storage | Shared FS | Technology | Source |
|---|---|---|---|---|---|
| CoreWeave | Yes | S3 Hot $0.06 / Warm $0.03 / Cold $0.015 | $0.07/GB/mo | VAST, WEKA, DDN | Link |
| Crusoe | $0.08/GB/mo | — | $0.07/GB/mo | Lightbits | Link |
| DataCrunch/Verda | $0.05-0.20/GB/mo | Coming soon | $0.20/GB/mo | NVMe SFS | Link |
| FluidStack | Filesystem only | — | Not documented | Not documented | Link |
| GMI Cloud | Integrated | VAST S3 | VAST NFS | VAST Data, GPUDirect | Link |
| Hot Aisle | Not documented | — | — | Not documented | Link |
| Hyperstack | ~$0.07/GB/mo | In development | WEKA (Supercloud) | NVMe | Link |
| Lambda | — | S3 adapter only | $0.20/GB/mo | VAST Data | Link |
| Nebius | $0.05-0.12/GB/mo | S3 Standard $0.0147 / Enhanced $0.11 | $0.08/GB/mo | NFS | Link |
| Nscale | Not documented | Not documented | “Parallel FS” | Not documented | Link |
| OVHcloud | $0.022/GB/mo | S3 + egress | $120-150/TB/mo | NetApp | Link |
| RunPod | $0.10/GB/mo | S3 (5 DCs) | $0.05-0.07/GB/mo | Network volumes | Link |
| SF Compute | Local NVMe only | — | — | 1.5TB+ per node | Link |
| TensorWave | Local only | — | Not documented | Not documented | Link |
| Vast.ai | Per-host | — | — | Varies | Link |
| Voltage Park | Local NVMe | VAST S3 | VAST NFS | VAST Data | Link |
| Vultr | $0.10/GB/mo | S3 $0.018-0.10/GB/mo | $0.10/GB/mo | NVMe-backed | Link |
Key observations:
- VAST Data is popular: Lambda, Voltage Park, CoreWeave, and GMI Cloud all use VAST for high-performance shared storage.
- Object storage gaps: Crusoe, FluidStack, Vast.ai, TensorWave, Hot Aisle, and Nscale don’t offer native S3-compatible object storage.
- Shared filesystem is critical for multi-node training: Without it, you need to copy data to each node’s local storage or stream from object storage.
- OVHcloud’s shared storage is comparatively expensive: NetApp-based Enterprise File Storage at $120-150/TB/month ($0.12-0.15/GB) runs roughly double the $0.05-0.08/GB/month charged by most providers here.
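Shared-filesystem throughput translates directly into checkpoint stall time. A back-of-envelope sketch (model size and throughput figures are illustrative):

```python
def checkpoint_seconds(params_billions: float, bytes_per_param: int,
                       write_gb_per_s: float) -> float:
    """Time to write one full checkpoint at a given sustained throughput.
    1e9 params * bytes/param is approximately GB."""
    size_gb = params_billions * bytes_per_param
    return size_gb / write_gb_per_s

# A 70B-parameter model in bf16 (2 bytes/param) is ~140 GB per checkpoint.
# At 8 GB/s of sustained writes that is ~17.5 s; at 1 GB/s it exceeds 2 minutes,
# during which GPUs may sit idle unless checkpointing is asynchronous.
print(checkpoint_seconds(70, 2, 8.0))
```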
Orchestration & Platform
How you’ll actually run workloads matters as much as the hardware. But there’s an important distinction: infrastructure orchestration (Kubernetes, Slurm) vs. the platform layer.
Neoclouds provide Kubernetes or Slurm to schedule containers or jobs on GPU nodes. That’s infrastructure orchestration—it gets your code running on hardware. But production AI teams need more: hosted dev environments where data scientists can iterate, distributed training orchestration that handles multi-node configurations, parallel job scheduling with automatic retries, and cost allocation by user and project.
Most neoclouds stop at infrastructure. The platform layer—the operational tooling that makes GPU infrastructure actually usable for teams—is what you build on top, or what Saturn Cloud provides out of the box.
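Much of the distributed-training orchestration described above reduces to generating a correct per-node launch command. A minimal sketch using PyTorch's `torchrun` rendezvous flags (script name and port are placeholders):

```python
def torchrun_cmd(node_rank: int, nnodes: int, gpus_per_node: int,
                 head_addr: str, script: str) -> list[str]:
    """Build the launch command for one node of a multi-node PyTorch job.
    Every node runs the same command with its own --node_rank; the head
    node's address serves as the rendezvous endpoint."""
    return [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--nproc_per_node={gpus_per_node}",
        f"--node_rank={node_rank}",
        "--rdzv_backend=c10d",
        f"--rdzv_endpoint={head_addr}:29500",
        script,
    ]

# Node 0 of a 4-node, 32-GPU job:
print(" ".join(torchrun_cmd(0, 4, 8, "10.0.0.1", "train.py")))
```

A platform layer automates exactly this: templating the command per node, distributing it, and restarting the rendezvous when a node fails.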
Kubernetes and Orchestration
| Provider | Managed K8s | Slurm | Autoscaling | Notes | Source |
|---|---|---|---|---|---|
| CoreWeave | Yes (CKS) | SUNK | Yes | Bare-metal K8s, no hypervisor | Link |
| Crusoe | Yes (CMK) | Yes | Yes | Run:ai integration | Link |
| DataCrunch/Verda | — | Yes | — | Slurm on clusters | Link |
| FluidStack | — | — | — | Atlas platform | Link |
| GMI Cloud | Yes (Cluster Engine) | — | Yes | K8s-based orchestration | Link |
| Hot Aisle | — | — | — | Bare-metal focus | Link |
| Hyperstack | Yes (On-Demand K8s) | Not documented | — | API-driven K8s clusters | Link |
| Lambda | Yes (1-Click Clusters) | Available | — | Managed K8s and Slurm | Link |
| Nebius | Yes | Managed + Soperator | Yes | First Slurm Kubernetes operator | Link |
| Nscale | Yes (NKS) | Yes | — | Limited docs | Link |
| OVHcloud | Yes | — | Yes | Standard managed K8s | Link |
| RunPod | — | — | Yes | Serverless focus | Link |
| SF Compute | Yes | — | — | Managed K8s per zone | Link |
| TensorWave | — | Yes | — | Pyxis/Enroot containers | Link |
| Vast.ai | — | — | — | Container-based | Link |
| Voltage Park | Add-on | — | — | Helm/Rook-Ceph guides | Link |
| Vultr | Yes (VKE) | — | Yes | Standard managed K8s | Link |
Key observations:
- Nebius and CoreWeave have the most mature Kubernetes offerings with GPU-optimized features like pre-installed drivers and topology-aware scheduling.
- Slurm remains popular for HPC-style workloads. Nebius’s Soperator is notable as the first open-source Kubernetes operator for running Slurm clusters. CoreWeave’s SUNK scales to jobs spanning 32,000+ GPUs.
- Serverless/container platforms (RunPod, Vast.ai, FluidStack) trade Kubernetes flexibility for simpler deployment models.
The Platform Layer: Saturn Cloud
The Infrastructure vs. Platform Gap
Neoclouds provide GPU infrastructure (compute, networking, storage, Kubernetes). The platform layer—operational tooling that makes this infrastructure production-ready for AI teams—is not included and must be built or purchased separately.
Common Platform-Layer Requirements
AI organizations deploying on neocloud infrastructure typically implement the following capabilities in-house:
| Capability | Purpose | Typical Build Time |
|---|---|---|
| Hosted Development Environments | JupyterLab/VS Code instances with GPU access for data scientist iteration | 2-3 months |
| Distributed Training Orchestration | Automated multi-node configuration (torchrun, DeepSpeed, InfiniBand, NCCL) | 3-4 months |
| Job Scheduling & Failure Handling | Parallel execution of thousands of experiments with automatic retries | 2-3 months |
| Cost Allocation & Tracking | GPU usage tracking by user/team/project for chargebacks | 1-2 months |
| Idle Resource Detection | Automated shutdown of unused instances to prevent waste | 1-2 months |
Total in-house development: 6-12 months of infrastructure engineering effort. This work is operationally necessary but provides no competitive differentiation—all AI organizations require similar implementations.
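As an illustration of the idle-detection item above, a minimal policy sketch; the threshold and sampling cadence are assumptions each team would tune:

```python
def should_shutdown(utilization_samples: list[float], sample_interval_min: int,
                    idle_threshold: float = 0.05, timeout_min: int = 60) -> bool:
    """Shut down when GPU utilization has stayed below the threshold for the
    full timeout window. Samples are ordered oldest to newest."""
    window = timeout_min // sample_interval_min
    if len(utilization_samples) < window:
        return False  # not enough history yet to decide
    return all(u < idle_threshold for u in utilization_samples[-window:])

# Twelve 5-minute samples of near-zero utilization -> shut the instance down.
print(should_shutdown([0.01] * 12, sample_interval_min=5))
```

The real build cost is not this logic but the plumbing around it: collecting per-GPU metrics, exempting interactive sessions, and wiring shutdowns into each provider's API.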
Saturn Cloud Platform Capabilities
Saturn Cloud provides platform-layer functionality as a managed service deployable on any Kubernetes cluster (Nebius, Crusoe, CoreWeave, or bare-metal infrastructure).
Core Platform Features:
- Hosted JupyterLab and VS Code environments with one-click GPU provisioning
- Single-click conversion of single-node jobs to multi-node distributed training with automated torchrun/InfiniBand configuration
- Parallel job execution (100s concurrent) with dependency management and automatic retry logic
- Real-time GPU usage dashboards with granular tracking by user, team, and project
- Configurable idle shutdown policies with customizable timeout thresholds
- Enterprise SSO integration (SAML/OIDC) with role-based access control
Deployment Model:
Saturn Cloud deploys via Helm chart to existing Kubernetes clusters. All data remains within customer infrastructure—Saturn Cloud provides only the control plane and user interface layer.
Evaluation Criteria:
Organizations should consider Saturn Cloud when infrastructure teams are allocating significant engineering resources to platform tooling development, when cost allocation and idle detection are required for GPU spend management, or when immediate data scientist productivity is prioritized over custom platform development.
Operational Considerations
Hidden costs and networking capabilities often determine whether a provider works for production deployments. Egress fees can add 20-40% to monthly bills at hyperscalers, while load balancers and VPCs are baseline requirements for inference endpoints.
Egress Pricing
| Provider | Egress Cost | Notes | Source |
|---|---|---|---|
| CoreWeave | Free | Zero egress, ingress, and I/O operations | Link |
| Crusoe | Free | Zero data transfer fees | Link |
| DataCrunch/Verda | Not documented | — | Link |
| FluidStack | Free | Zero egress/ingress | Link |
| GMI Cloud | Not documented | — | Link |
| Hot Aisle | Not documented | — | Link |
| Hyperstack | Free | Zero bandwidth charges | Link |
| Lambda | Free | Zero egress | Link |
| Nebius | Compute free | S3 Standard $0.015/GB egress; S3 Enhanced free egress | Link |
| Nscale | Not documented | — | Link |
| OVHcloud | Compute free | Object Storage $0.011/GB egress | Link |
| RunPod | Free | Zero data transfer | Link |
| SF Compute | Free | No ingress/egress fees | Link |
| TensorWave | Not documented | Claims “no hidden costs” | Link |
| Vast.ai | Varies | Per-host, can be $20+/TB | Link |
| Voltage Park | Free | No hidden costs | Link |
| Vultr | $0.01/GB | 2TB/month free, then $0.01/GB | Link |
Free egress is the norm among neoclouds that publish transfer pricing. This is a significant differentiator from hyperscalers, where egress costs can add 20-40% to monthly bills for data-intensive workloads.
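To see how egress lands on a real bill, a quick illustrative calculation; the traffic volume and rates are assumptions in the hyperscaler range quoted above:

```python
def egress_share(gpu_hours: float, gpu_rate: float,
                 egress_tb: float, egress_per_gb: float) -> float:
    """Fraction of the monthly bill attributable to egress fees."""
    compute = gpu_hours * gpu_rate
    egress = egress_tb * 1000 * egress_per_gb
    return egress / (compute + egress)

# One 8-GPU H100 node on-demand for a month at ~$12/GPU-hr on a hyperscaler,
# serving 300 TB of model outputs at $0.09/GB: egress is ~28% of the bill.
# On a free-egress neocloud, that line item is zero.
share = egress_share(gpu_hours=8 * 730, gpu_rate=12.0,
                     egress_tb=300, egress_per_gb=0.09)
print(f"{share:.0%}")
```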
Network Services
| Provider | Load Balancer | VPC/Private Network | VPN/Peering | Public IPs | Source |
|---|---|---|---|---|---|
| CoreWeave | Yes (K8s LB) | Yes (VPC) | Direct Connect (Equinix, Megaport) | Yes + BYOIP | Link |
| Crusoe | Yes | Yes (VPC) | Yes (global backbone) | Yes | Link |
| DataCrunch/Verda | Not documented | Not documented | Not documented | Not documented | Link |
| FluidStack | Not documented | Not documented | Not documented | Not documented | Link |
| GMI Cloud | Not documented | Yes (VPC) | Not documented | Yes (Elastic IPs) | Link |
| Hot Aisle | Not documented | Not documented | Not documented | Yes | Link |
| Hyperstack | Not documented | Yes (VPC) | Not documented | Yes | Link |
| Lambda | Not documented | Yes (private network) | Not documented | Yes | Link |
| Nebius | Yes (K8s LB) | Yes | — | Yes | Link |
| Nscale | Not documented | Not documented | Not documented | Not documented | Link |
| OVHcloud | Yes (L4/L7, Octavia) | Yes (vRack) | OVHcloud Connect | Yes (Floating IPs) | Link |
| RunPod | Serverless only | Global networking (Pod-to-Pod) | — | Shared (port mapping) | Link |
| SF Compute | Not documented | Not documented | Not documented | Not documented | Link |
| TensorWave | Not documented | Not documented | Not documented | Not documented | Link |
| Vast.ai | — | — | — | Shared (port mapping) | Link |
| Voltage Park | Not documented | Yes (VPC) | Not documented | Not documented | Link |
| Vultr | Yes (L4, $10/mo) | Yes (VPC 2.0) | — | Yes | Link |
Key observations:
- Managed Direct Connect is rare: Only CoreWeave (Equinix, Megaport) and OVHcloud (OVHcloud Connect) offer managed private connectivity to on-prem or other clouds. Most providers expect you to run your own VPN gateway on a VM.
- Load balancers + VPC is the baseline for production inference: Nebius, CoreWeave, Crusoe, Vultr, and OVHcloud meet this bar.
- Marketplace providers (Vast.ai, RunPod) use port mapping instead of dedicated IPs, which complicates production inference deployments.
Developer Experience & Enterprise Readiness
How easy is it to get started, and does the platform meet enterprise requirements? Terraform providers and APIs enable infrastructure-as-code, self-service access determines time-to-first-GPU, and compliance certifications gate enterprise adoption.
Terraform and API Support
| Provider | Terraform Provider | API | CLI | Source |
|---|---|---|---|---|
| CoreWeave | Official | Yes | Yes | Link |
| Crusoe | Official | REST | Yes | Link |
| DataCrunch/Verda | — | REST | — | Link |
| FluidStack | — | REST | — | Link |
| GMI Cloud | — | REST | — | Link |
| Hot Aisle | — | REST | — | Link |
| Hyperstack | Community | REST | — | Link |
| Lambda | Community | REST | Yes | Link |
| Nebius | Official | Yes | Yes | Link |
| Nscale | Community | REST | Yes | Link |
| OVHcloud | Official | REST | Yes | Link |
| RunPod | Community | GraphQL | Yes | Link |
| SF Compute | — | Yes | Yes | Link |
| TensorWave | — | REST | — | Link |
| Vast.ai | Community | REST | Yes | Link |
| Voltage Park | — | REST | — | Link |
| Vultr | Official | REST | Yes | Link |
Self-Service Access
| Provider | Tier | Access Model | Notes | Source |
|---|---|---|---|---|
| CoreWeave | Sales-Gated | Sales-gated | Requires organizational approval from sales team | Link |
| Crusoe | Neocloud | Self-service | Sign up via console, larger deployments contact sales | Link |
| DataCrunch/Verda | Neocloud | Self-service | Order GPU instances in minutes via dashboard or API | Link |
| FluidStack | Neocloud + Marketplace | Self-service | Sign up at auth.fluidstack.io, launch in under 5 minutes | Link |
| GMI Cloud | Neocloud | Self-service | Sign up, launch instances in 5-15 minutes via console/API | Link |
| Hot Aisle | Neocloud | Self-service | SSH-based signup, credit card, no contracts | Link |
| Hyperstack | Neocloud | Self-service | Instant access, one-click deployment | Link |
| Lambda | Neocloud | Self-service | Create account and launch GPUs in minutes, pay-as-you-go | Link |
| Nebius | Neocloud | Self-service | Sign up, add $25+, deploy up to 32 GPUs immediately | Link |
| Nscale | Sales-Gated | Hybrid | Self-service for inference only; training clusters require sales | Link |
| OVHcloud | Neocloud | Self-service | Create account, $200 free credit for first project | Link |
| RunPod | Neocloud + Marketplace | Self-service | Deploy GPUs in under a minute, no rate limits | Link |
| SF Compute | Marketplace | Self-service | Sign up to buy, larger deployments contact sales | Link |
| TensorWave | Sales-Gated | Sales-gated | Contact sales/solutions engineers to get started | Link |
| Vast.ai | Marketplace | Self-service | $5 minimum to start, per-second billing | Link |
| Voltage Park | Neocloud | Self-service | On-demand GPUs available, reserved capacity contact sales | Link |
| Vultr | Neocloud | Self-service | Free account signup, provision via portal/API/CLI | Link |
Compliance and Enterprise Features
| Provider | Compliance | SSO/SAML | Regions | Source |
|---|---|---|---|---|
| CoreWeave | SOC 2, ISO 27001 | SAML/OIDC/SCIM | US, UK, Spain, Sweden, Norway | Security |
| Crusoe | SOC 2 Type II | Not documented | US (TX, VA), Iceland, Norway (soon) | Link |
| DataCrunch/Verda | ISO 27001 | — | EU (Finland, Iceland) | Link |
| FluidStack | — | — | Not documented | Link |
| GMI Cloud | SOC 2 Type 1, ISO 27001 | — | Not documented | Link |
| Hot Aisle | SOC 2 Type II, HIPAA | — | US (MI) | Link |
| Hyperstack | — | — | Europe, North America | Link |
| Lambda | SOC 2 Type II | Not documented | Not documented | Link |
| Nebius | SOC 2 Type II, HIPAA, ISO 27001 | Yes | US, EU (Finland, France, Iceland) | Regions, Trust Center |
| Nscale | — | — | Norway | Link |
| OVHcloud | SOC 2, ISO 27001, PCI DSS, HDS, SecNumCloud | Not documented | Global (46 DCs) | Infrastructure, Certifications |
| RunPod | SOC 2 Type II | — | Multiple | Link |
| SF Compute | — | — | Not documented | Link |
| TensorWave | — | — | Not documented | Link |
| Vast.ai | — | — | Varies by host | Link |
| Voltage Park | SOC 2 Type II, ISO 27001, HIPAA | — | US (WA, TX, VA, UT) | Infrastructure, Security |
| Vultr | SOC 2 (HIPAA), ISO 27001, PCI DSS | — | 32 global locations | Locations, Compliance |
Infrastructure Ownership Models
Understanding whether a provider owns their infrastructure or aggregates from others matters for reliability, support, and pricing stability. See the Market Segmentation section for how this maps to business model tiers.
| Provider | Model | Description | Source |
|---|---|---|---|
| CoreWeave | Owner | Acquired NEST DC ($322M); 250K+ GPUs across 32 DCs | Link |
| Crusoe | Owner | Vertically integrated; manufactures own modular DCs via Easter-Owens Electric acquisition | Link |
| DataCrunch/Verda | Owner (colo) | Owns GPUs; operates in Iceland and Finland | Link |
| FluidStack | Owner + Aggregator | 62% Private Cloud (custom-built for enterprises like Anthropic, Meta), 38% Marketplace; $10B debt financing from Macquarie | Link |
| GMI Cloud | Owner (colo) | Owns GPU hardware; offshoot of Realtek/GMI Technology | Link |
| Hot Aisle | Owner (colo) | Owns AMD GPUs; colocation at Switch Pyramid Tier 5 DC in Grand Rapids, MI | Link |
| Hyperstack | Owner (colo) | Owns GPU hardware; colocation partnerships | Link |
| Lambda | Owner (colo) | Owns GPU hardware; colocation in SF and Texas; NVIDIA leaseback partnership | Link |
| Nebius | Owner + Colo | Owns DCs in Finland; colocation in US and other regions | Link |
| Nscale | Owner | Owns data centers in Norway (Glomfjord, Stargate Norway JV with Aker) | Link |
| OVHcloud | Owner | Fully vertically integrated; designs/manufactures servers, builds/manages own DCs | Link |
| RunPod | Owner + Aggregator | Secure Cloud (Tier 3/4 partners) + Community Cloud (aggregated third-party hosts) | Link |
| SF Compute | Aggregator | Two-sided marketplace connecting GPU cloud providers | Link |
| TensorWave | Owner (colo) | Owns AMD GPU hardware; colocation across US data centers | Link |
| Vast.ai | Aggregator | Pure marketplace connecting 10K+ GPUs from individuals to datacenters | Link |
| Voltage Park | Owner (colo) | Owns H100 GPU hardware; colocation in Texas, Virginia, Washington | Link |
| Vultr | Colo | Operates across 32 global colocation facilities (Digital Realty, Equinix, QTS partnerships) | Link |
Choosing a Provider
Provider selection should align with workload requirements and organizational constraints. The following recommendations categorize providers by primary use case.
Production Multi-Node Training
Recommended Providers: Nebius, CoreWeave, Crusoe
Selection Criteria:
- InfiniBand networking on all GPU nodes (not cluster-only)
- Managed Kubernetes with GPU-optimized scheduling
- High-performance shared storage (VAST Data or equivalent)
- Enterprise compliance (SOC 2 Type II minimum)
- Terraform provider for infrastructure-as-code
Provider Differentiation:
- CoreWeave: Largest scale (250K+ GPUs, 32 data centers), first to market with GB200
- Nebius: Most complete managed service stack (Kubernetes, Slurm via Soperator, MLflow, PostgreSQL, Spark)
- Crusoe: Only provider offering AMD GPUs with full enterprise features (SOC 2, managed K8s, Slurm)
Cost-Optimized Workloads
Recommended Providers: SF Compute, Vast.ai
Selection Criteria:
- Lowest per-GPU pricing ($1.45-1.87/hr for H100)
- Marketplace models enabling spot pricing
- Flexible reservation windows without long-term contracts
Provider Differentiation:
- SF Compute: Managed Kubernetes marketplace with flexible time-based reservations (book 7 days, 2 weeks, or custom windows at guaranteed pricing). Infrastructure provisioned from vetted partners to your specifications.
- Vast.ai: Pure peer-to-peer marketplace with per-second billing and highly variable pricing/quality
Trade-offs:
- Variable infrastructure quality (aggregated from multiple underlying providers)
- Less comprehensive documentation than enterprise-focused providers
European Data Sovereignty
Recommended Providers: Nebius, DataCrunch/Verda
Selection Criteria:
- EU-based data center operations
- GDPR compliance
- Renewable energy infrastructure (100% hydro/geothermal)
- European regulatory certifications
Provider Differentiation:
- Nebius: SOC 2 Type II + HIPAA + ISO 27001, managed Kubernetes and Slurm, Finland/France/Iceland locations
- DataCrunch/Verda: ISO 27001, Iceland-based with genuine carbon-neutral operations (not carbon offsets)
AMD GPU Access
Recommended Providers: Vultr, Hot Aisle, TensorWave
Selection Criteria:
- AMD Instinct GPU availability (MI300X/MI325X/MI355X)
- ROCm software support
- Competitive pricing vs NVIDIA equivalents
Provider Differentiation:
- Vultr: Cheapest MI300X ($1.85/hr), managed Kubernetes available, 32 global locations
- Hot Aisle: AMD-exclusive specialist with SOC 2 Type II and HIPAA certifications; TensorWave: AMD-exclusive with MI325X/MI355X availability, though compliance certifications are not documented
Provider Profiles
Each profile below covers infrastructure details, strengths, gaps, and best-fit use cases for the provider.
Nebius

Overview
Nebius spun off from Yandex N.V. in 2024 following Russia-related sanctions pressures. The company repositioned from a search conglomerate to a dedicated AI infrastructure provider, led by Yandex co-founder Arkady Volozh. In December 2024, Nebius raised $700M from NVIDIA and Accel, followed by $1B in debt financing in June 2025 for global expansion.
The company reported $105M revenue in Q2 2025, up 625% year-over-year, and targets $900M-$1.1B ARR. A major Microsoft deal worth $17-19B over 5+ years was announced in late 2025.
Infrastructure
Nebius owns data centers in Finland (Mäntsälä, ranked #19 globally for supercomputing) and is building in the US. The Kansas City facility launched Q1 2025 with 5MW base capacity scalable to 40MW (~35K GPUs). Additional sites in Paris, Iceland, and the UK are operational or under development. Target: 1GW+ power capacity by end of 2026.
Hardware: H100 ($2.95/hr), H200 ($3.50/hr), B200 ($5.50/hr), L40S, and GB200 NVL72 (pre-order). All GPU nodes include 400Gb/s Quantum-2 InfiniBand with rail-optimized topology.
Storage performance is a differentiator: 12 GB/s read and 8 GB/s write per 8-GPU VM on their shared filesystem.
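As a rough illustration of what that throughput means in practice, the sketch below estimates idealized checkpoint transfer times. The 500 GB checkpoint size is a hypothetical workload, not a Nebius figure, and real transfers carry metadata and protocol overhead.

```python
# Idealized transfer-time estimate for the published shared-filesystem
# throughput: 12 GB/s read, 8 GB/s write per 8-GPU VM.

READ_GBPS = 12.0   # GB/s read per 8-GPU VM (provider figure)
WRITE_GBPS = 8.0   # GB/s write per 8-GPU VM (provider figure)

def transfer_seconds(size_gb: float, throughput_gbps: float) -> float:
    """Best-case transfer time, ignoring metadata and protocol overhead."""
    return size_gb / throughput_gbps

checkpoint_gb = 500  # hypothetical model checkpoint
print(f"read:  {transfer_seconds(checkpoint_gb, READ_GBPS):.1f} s")
print(f"write: {transfer_seconds(checkpoint_gb, WRITE_GBPS):.1f} s")
```

At these rates, even a large checkpoint loads in well under two minutes, which matters for frequent checkpointing during long training runs.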
Strengths
- Most complete managed service stack among neoclouds: Kubernetes ($0 for control plane), Slurm via Soperator, MLflow, Spark, PostgreSQL
- Soperator is the first fully-featured open-source Kubernetes operator for Slurm, enabling 20-30 minute cluster deployment
- 20% lower TCO through proprietary hardware design and energy efficiency
- Strong sustainability angle: Mäntsälä facility’s heat recovery covers 65% of local municipality heating
- Competitive pricing at $2.95/hr for H100 vs ~$12/hr on AWS
Gaps
- US presence is new (Kansas City launched Q1 2025); limited footprint compared to CoreWeave
- No Asia-Pacific data centers yet (expansion planned)
- No documented spot/preemptible instance pricing
- As a 2024 spinoff, long-term operational stability still being proven
Best for: Teams wanting a fully-managed platform (K8s + Slurm + MLflow) with competitive pricing and strong European presence.
CoreWeave

Overview
CoreWeave is the largest neocloud by GPU count. Founded in 2017 as Atlantic Crypto (Ethereum mining), the company pivoted to GPU cloud in 2019 and went public on Nasdaq (CRWV) in March 2025, raising ~$1.5B at a ~$23B valuation. The company operates 250,000 GPUs across 32 data centers.
Major customers include OpenAI ($22.4B total contract), Microsoft (62% of 2024 revenue), Mistral AI, IBM, and Databricks. 2024 revenue was $1.92B with projected $8B in 2025.
Infrastructure
CoreWeave has expanded through organic growth and strategic acquisitions, including the March 2025 acquisition of NEST Data Center ($322M) in New Jersey.
Locations span the US (New Jersey, Texas, Pennsylvania, North Dakota, Georgia, Kentucky, North Carolina, Alabama, Oklahoma), UK (Crawley, London Docklands), and planned European expansion (Norway, Sweden, Spain by end 2025).
CoreWeave was first to deploy NVIDIA Blackwell at scale: 110,000 Blackwell GPUs with Quantum-2 InfiniBand, GB200 NVL72 systems (April 2025), and Blackwell Ultra GB300 NVL72 (July 2025).
Strengths
- Largest GPU fleet and first-mover on new NVIDIA architectures
- SUNK (Slurm on Kubernetes) supports 32,000+ GPU jobs with GitOps deployment via ArgoCD
- Non-blocking fat-tree InfiniBand topology with NVIDIA SHARP (2x effective bandwidth)
- NVIDIA’s top cloud partner with exclusive hardware co-design relationship
- Published pricing with up to 60% discounts for committed usage
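To make the committed-usage discounts concrete, the sketch below computes effective hourly rates at a few discount tiers. The $6.15/hr on-demand base is the upper end of the H100 price range cited in this report, used purely for illustration; it is not a stated CoreWeave rate, and real discount tiers depend on contract terms.

```python
# Effective per-GPU hourly cost under committed-usage discounts.
# $6.15/hr base is illustrative (upper end of the H100 range in this report).

ON_DEMAND = 6.15  # $/GPU-hr, illustrative

def committed_rate(on_demand: float, discount_pct: float) -> float:
    """Hourly rate after applying a committed-usage percentage discount."""
    return on_demand * (1 - discount_pct / 100)

for discount in (20, 40, 60):
    rate = committed_rate(ON_DEMAND, discount)
    print(f"{discount}% commitment discount -> ${rate:.2f}/GPU-hr")
```

At the full 60% discount the effective rate drops to $2.46/GPU-hr, bringing committed CoreWeave capacity into the same band as the budget neoclouds profiled below.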
Gaps
- Sales-gated access: requires organizational approval, no self-service signup
- Extreme customer concentration: Microsoft was 62% of 2024 revenue, top two customers 77%
- Material weaknesses in internal controls disclosed in SEC S-1; remediation expected through 2026
- High debt load (~$14B) with $310M quarterly interest expense (6x operating profit)
- Stock fell 30% in November 2025 after guidance cut due to data center construction delays
- Documentation gaps: custom configurations and large-scale pricing require sales conversations
Best for: Large enterprises needing massive scale, latest NVIDIA hardware, and willingness to work through sales process.
Crusoe
Overview
Crusoe was founded in 2018 with a unique angle: converting stranded natural gas (flared at oil wells) into computational power. Their Digital Flare Mitigation technology captures methane with 99.9% combustion efficiency, reducing emissions by ~99% compared to regular flaring.
The company raised $1.375B in Series E (October 2024) at $10B+ valuation, with investors including NVIDIA, Mubadala, Founders Fund, Fidelity, and Tiger Global. Total funding: $2.64B across 13 rounds.
In March 2025, Crusoe divested its Bitcoin mining operations to NYDIG (which had been 55% of 2024 revenue) to focus purely on AI infrastructure. They’re now the lead developer on the Stargate project’s flagship Abilene campus (OpenAI/Oracle/SoftBank’s $500B AI initiative).
Infrastructure
Crusoe operates 22 data centers across 6 regions with 1.6+ GW under operations/construction and 10+ GW in development. The Abilene, Texas Stargate campus will total 1.2 GW across ~4M sq ft when Phase 2 completes (mid-2026), designed for up to 50,000 GB200 NVL72 GPUs per building. A 1.8 GW Wyoming campus is under development.
European presence includes Iceland (57 MW, 100% geothermal/hydro) and Norway (12 MW, 100% hydroelectric).
Vertical integration through the 2022 Easter-Owens acquisition gives Crusoe in-house data center design and manufacturing capability.
Hardware: GB200 NVL72 ($10.50/hr per GPU), B200 ($8.60/hr), H200 (~$6.30/hr), H100 (PCIe $4.25/hr, SXM $6.16/hr), L40S ($1.45/hr), and AMD MI300X ($3.45/hr). First major cloud to virtualize AMD MI300X on Linux KVM.
Strengths
- Energy-first model provides long-term cost predictability and genuine sustainability credentials
- Vertical integration from power generation through hardware to software orchestration
- Full platform: Managed Kubernetes (CMK), Slurm, Run:ai integration, Kubeflow
- SemiAnalysis ClusterMAX 2.0 “Gold” rating
- Strong AMD GPU support alongside NVIDIA
- 99.98% uptime SLA
Gaps
- No native managed object storage; customers must self-manage MinIO or integrate VAST Data/Lightbits
- Limited geographic footprint (22 data centers vs 30+ for hyperscalers)
- Energy price volatility exposure: Texas grid crisis (March 2025) saw costs spike 40%
- Stranded gas supply may decline as world transitions away from fossil fuels
- Certain GPU types in certain regions require sales discussion
Best for: Teams prioritizing sustainability, AMD GPU access, or participation in Stargate-class infrastructure.
Lambda

Overview
Lambda was founded in 2012 by brothers Stephen and Michael Balaban. The company is known for its developer-friendly approach and the Lambda Stack (pre-configured PyTorch/TensorFlow/CUDA environment) used by 100K+ users.
Funding has accelerated: $320M Series C (February 2024), $480M Series D (February 2025) at $2.5B valuation, and over $1.5B Series E (November 2025) led by TWG Global. NVIDIA is a major investor and strategic partner. Lambda is targeting an IPO in H1 2026.
The NVIDIA relationship is notably deep: a September 2024 $1.5B GPU leaseback deal has NVIDIA leasing 18,000 GPUs from Lambda over 4 years, with NVIDIA researchers using the capacity.
Infrastructure
Lambda operates on a pure colocation model (no owned facilities). Current locations: San Francisco, Allen (TX), Plano (TX), with additional sites across North America, Australia, and Japan (6 total data centers).
The May 2025 Aligned Data Centers partnership added a liquid-cooled facility in Plano, TX (~$700M investment) designed for Blackwell and Blackwell Ultra.
Hardware: HGX B200 ($4.99/hr), HGX H200, HGX H100 (PCIe $2.49/hr, SXM $2.99/hr). 1-Click Clusters scale from 16 to 2,040+ GPUs. All clusters include Quantum-2 InfiniBand (400Gb/s per GPU) and VAST Data storage integration.
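Since 1-Click Clusters carry a one-week minimum, the per-GPU rates above imply a predictable floor cost. The sketch below prices a hypothetical 16-GPU cluster at the listed $2.99/hr SXM H100 rate; the cluster size is our assumption for illustration.

```python
# Floor cost of a one-week minimum reservation, assuming a 16-GPU
# cluster at the listed $2.99/GPU-hr SXM H100 rate (hypothetical sizing).

GPUS = 16
RATE = 2.99            # $/GPU-hr
HOURS_PER_WEEK = 7 * 24

weekly_cost = GPUS * RATE * HOURS_PER_WEEK
print(f"one-week minimum: ${weekly_cost:,.2f}")  # → one-week minimum: $8,037.12
```

That ~$8K floor is the smallest commitment the product allows, which is the trade-off for skipping long-term contracts.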
Strengths
- 1-Click Clusters: instant multi-node provisioning with one-week minimum (no long-term contracts)
- Simple, transparent pricing: $2.49-4.99/GPU/hour depending on generation
- Pre-installed Lambda Stack eliminates environment configuration
- VAST Data partnership for petabyte-scale shared storage with S3 API
- No egress/ingress fees
- SOC 2 Type II certified
Gaps
- GPU availability issues during peak demand (“out of stock” messages common)
- No free tier or trial
- No built-in cost allocation/usage tracking by team or project
- Limited European presence
- Cross-data-center networking falls back to Ethernet (degrades vs single-cluster InfiniBand)
Best for: Teams wanting fast, simple cluster provisioning without long-term commitments, comfortable with SSH/terminal workflows.
Voltage Park

Overview
Voltage Park was founded in 2023 with an unusual structure: it’s backed by a $1B grant from Navigation Fund, a nonprofit founded by Jed McCaleb (Stellar co-founder, Ripple co-founder). The mission is democratizing AI infrastructure access.
Leadership includes Ozan Kaya (CEO, ex-CarLotz President) and Saurabh Giri (Chief Product & Technology Officer, ex-Amazon Bedrock lead). The company has ~80 employees and 100+ customers including Cursor, Phind, Dream.3D, Luma AI, and Caltech researchers.
In March 2025, Voltage Park acquired TensorDock (GPU cloud marketplace), expanding their portfolio beyond first-party H100s.
Infrastructure
Voltage Park owns 24,000 H100 GPUs (80GB HBM3, SXM5) deployed across 6 Tier 3+ data centers in Washington, Texas, Virginia, and Utah. The Quincy, WA facility runs on hydro and wind power.
Hardware runs on Dell PowerEdge XE9680 servers: 8 H100s per node with NVLink, 1TB RAM, dual Intel Xeon Platinum 8470 (52-core each). Quantum-2 InfiniBand provides 400Gb/s per GPU, scaling in 8,176 GPU increments.
Next-gen hardware (B200, GB200, B300, GB300) is available for pre-lease with capacity reserved ahead of public release.
June 2025 brought two major updates: VAST Data partnership for enterprise storage and managed Kubernetes launch.
Strengths
- Competitive pricing: $1.99/hr for H100s with 15-minute spinup, no contracts required
- Bare-metal access claimed to provide 40% acceleration for LLM training vs managed services
- VAST AI OS integration: unified file/object/block storage, multi-tenant security
- SOC 2 Type II, ISO 27001, and HIPAA certified (details at trust.voltagepark.com)
- Only neocloud partner in NSF NAIRR pilot; donated 1M H100 GPU hours for research
- 99.982% uptime SLA
- Nonprofit backing suggests mission-driven rather than pure profit optimization
Gaps
- Only Ubuntu Server 22.04 LTS supported (no alternative OS, no GUI, SSH only)
- VM instances limited to 100Gbps Ethernet (vs bare-metal InfiniBand at 400Gb/s per GPU)
- No data recovery after instance termination; customers must backup externally
- Historically focused on H100s only (TensorDock acquisition broadens selection)
- Limited documentation depth
- Managed Kubernetes only launched June 2025; VM support still in development
Best for: Researchers, startups, and teams wanting low-cost H100 access with VAST Data storage, especially those eligible for NAIRR research allocations.
GMI Cloud

Overview
GMI Cloud was founded in 2023 as an offshoot of Realtek Semiconductor and GMI Technology. The company is headquartered in Mountain View, California and raised $82M in Series A funding (October 2024). GMI Cloud has approximately 120 employees.
GMI Cloud is an NVIDIA Cloud Partner, providing access to the latest GPU architectures including H200 and upcoming Blackwell systems.
Infrastructure
Regions include the US (primary) and Asia (Taiwan, Singapore, with Tokyo and Malaysia planned). GMI operates 9 global data centers with capacity for multi-tenant workloads.
Hardware: H100 ($2.10/hr), H200 (from $2.50/hr), B200 HGX (pre-order), GB200 (pre-order). All training clusters include 400Gb/s InfiniBand.
Storage is powered by VAST Data, providing S3-compatible object storage and NFS shared filesystems with GPUDirect integration for high-throughput data loading.
Strengths
- GMI Cluster Engine provides managed Kubernetes orchestration for GPU workloads
- VAST Data partnership delivers enterprise-grade storage with GPUDirect
- Strong Asia-Pacific presence through regional data centers
- B200 and GB200 available for pre-order
- $82M Series A funding provides runway for expansion
Gaps
- Limited public documentation and pricing transparency
- Smaller footprint than major neoclouds
- Less established brand recognition in North America and Europe
- No Slurm offering documented
- Compliance certifications not prominently published
- Early-stage company (founded 2023)
Best for: Teams seeking H200/B200 access with VAST Data storage and managed Kubernetes, especially with Asia-Pacific presence needs.
RunPod

Overview
RunPod was founded in 2022 and is headquartered in New Jersey. The company raised $20M in seed funding (May 2024) co-led by Intel Capital and Dell Technologies Capital, following an earlier $18.5M round in November 2023. Total funding is $38.5M. RunPod has approximately 80 employees.
The platform serves 500,000+ developers, from individual researchers to enterprise teams. RunPod’s differentiator is simplicity: GPU instances launch in under a minute with pre-configured ML environments.
Infrastructure
RunPod operates 31+ data centers globally with a mix of first-party and partner infrastructure. The platform offers three deployment models:
- Pods: GPU VMs with persistent storage, available on-demand or spot (up to 80% cheaper)
- Serverless: Auto-scaling inference endpoints billed per-second
- Community Cloud: Marketplace of third-party GPU capacity at lower prices
Hardware: H100 (PCIe $1.99-2.39/hr, SXM $2.69-2.99/hr), H200 ($3.59/hr), B200 ($5.19-5.98/hr), A100 80GB, L40S, RTX 4090/3090. InfiniBand is available for dedicated clusters only; standard instances use Ethernet.
Storage: Network volumes ($0.10/GB/mo standard, $0.05-0.07/GB/mo shared), S3-compatible object storage available in 5 data centers.
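To see how per-second serverless billing plays out, the sketch below prices a burst of short inference calls. The $3.59/hr H200 rate comes from the hardware list above; the request count and per-request duration are hypothetical, and the sketch ignores cold-start time.

```python
# Per-second serverless billing for bursty inference, using the $3.59/hr
# H200 rate listed above. Request volume and duration are hypothetical.

HOURLY_RATE = 3.59
PER_SECOND = HOURLY_RATE / 3600

def serverless_cost(requests: int, seconds_each: float) -> float:
    """Cost when billed only for seconds of actual GPU execution."""
    return requests * seconds_each * PER_SECOND

# 10,000 requests at 2 s each = ~5.6 GPU-hours of billed compute
print(f"${serverless_cost(10_000, 2.0):.2f}")
```

The point of per-second billing is that idle time between bursts costs nothing, unlike an always-on instance billed at the full hourly rate.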
Strengths
- Sub-minute instance launch times with one-click templates for PyTorch, TensorFlow, Stable Diffusion
- Serverless inference with pay-per-second billing and automatic scaling
- Spot instances at 50-80% discount for interruptible workloads
- Simple, transparent pricing with no hidden fees
- Active community and template marketplace
- RESTful API and CLI for automation
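Spot instances can be reclaimed at any time, so training loops need periodic checkpoints they can resume from. The sketch below is a generic resume-from-checkpoint pattern, not RunPod-specific; the file name, step counts, and checkpoint interval are invented for illustration.

```python
import json
import os

CKPT = "checkpoint.json"   # hypothetical checkpoint path

def load_checkpoint() -> int:
    """Resume from the last saved step, or start fresh if no checkpoint exists."""
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            return json.load(f)["step"]
    return 0

def save_checkpoint(step: int) -> None:
    # Write-then-rename so an interruption never leaves a torn checkpoint file.
    tmp = CKPT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step}, f)
    os.replace(tmp, CKPT)

start = load_checkpoint()
for step in range(start, 1000):
    # ... one training step would run here ...
    if step % 100 == 0:
        save_checkpoint(step)
```

In real workloads the checkpoint would hold model and optimizer state (e.g. via your framework's serialization), but the structure — atomic saves at intervals, resume on startup — is what makes the 50-80% spot discount usable.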
Gaps
- No managed Kubernetes (container-focused, not K8s-focused)
- InfiniBand limited to dedicated clusters; standard instances use Ethernet
- Community Cloud capacity quality varies by host
- Limited enterprise compliance certifications documented
- No native Slurm support
- Multi-node distributed training requires manual configuration
Best for: Individual developers and small teams wanting fast, simple GPU access for inference and single-node training without enterprise overhead.
Hyperstack

Overview
Hyperstack is the GPU cloud arm of NexGen Cloud, a UK-based infrastructure company founded in 2020. The platform positions itself as a cost-effective alternative to hyperscalers, with pricing 30-75% lower than AWS/Azure/GCP.
NexGen Cloud has invested significantly in GPU infrastructure, partnering with NVIDIA and operating data centers across multiple regions. The company targets AI startups, researchers, and enterprises looking to reduce GPU costs without sacrificing performance.
Infrastructure
Hyperstack operates across 3 regions: CANADA-1, NORWAY-1, and US-1. The platform offers tiered service levels:
- Standard Tier: GPU VMs with standard Ethernet networking
- Supercloud Tier: High-performance clusters with 400Gb/s Quantum-2 InfiniBand for distributed training
Hardware: H100 (PCIe $1.90/hr, SXM $2.40/hr), H200 ($3.50/hr), A100, L40S. H100 PCIe at $1.90/hr is among the cheapest in the market. B200 and GB200 available via contact.
Pre-configured environments include PyTorch, TensorFlow, and popular ML frameworks. All instances include local NVMe storage.
Strengths
- Aggressive pricing: H100 at $1.90/hr, H200 at $3.50/hr
- Supercloud tier provides InfiniBand for multi-node training
- Simple RESTful API and web console
- No long-term contracts required
- Pre-configured ML environments reduce setup time
- Growing European presence with GDPR-compliant data centers
Gaps
- InfiniBand only available on Supercloud tier (standard tier is Ethernet)
- Managed Kubernetes (On-Demand K8s) available but less mature than CoreWeave/Nebius
- Limited documentation compared to larger providers
- Smaller GPU fleet than CoreWeave, Nebius, or Lambda
- Storage options less mature than competitors
- Less visibility into infrastructure topology
Best for: Cost-conscious teams wanting affordable H100/H200 access, comfortable with VM-based workflows rather than managed K8s.
DataCrunch / Verda
Overview
DataCrunch was founded in 2019 in Helsinki, Finland. In 2024, the company rebranded to Verda to emphasize its sustainability positioning. The company is now headquartered in Iceland.
The core differentiator is 100% renewable energy: Verda’s Icelandic data centers run on geothermal and hydroelectric power, providing genuine carbon-neutral AI infrastructure rather than offset-based claims. The company holds ISO 27001 certification and is GDPR compliant.
Infrastructure
Primary data centers are in Iceland, leveraging abundant renewable energy and natural cooling. The cold climate reduces cooling costs significantly while enabling higher density deployments.
Hardware: H100 SXM5 ($2.29/hr), H200 SXM5 ($2.99/hr), B200 SXM6 ($3.79/hr), B300 ($1.24/hr), A100. Multi-node clusters from 16 to 128 GPUs include 400Gb/s NDR InfiniBand.
Storage: Block storage ($0.05-0.20/GB/mo), NVMe shared filesystem ($0.20/GB/mo). Spot pricing available at 50% discount for serverless containers.
Strengths
- 100% renewable energy (geothermal/hydro), not offsets
- B300 available at competitive $1.24/hr
- 400Gb/s NDR InfiniBand standard on clusters
- Natural cooling in Iceland reduces operational costs
- Strong sustainability credentials for ESG-conscious organizations
- ISO 27001 certified, GDPR compliant
Gaps
- No managed Kubernetes offering
- Single geographic region (Iceland) may cause latency for US/Asia users
- Smaller brand recognition than US-based neoclouds
- Cluster sizes limited to 128 GPUs with InfiniBand
- Bare-metal focus; less abstraction than serverless platforms
Best for: Organizations with sustainability mandates needing genuine renewable energy infrastructure, or European teams wanting low-latency access with strong compliance.
Vultr

Overview
Vultr was founded in 2014 as a general-purpose cloud provider, making it one of the more established players in this comparison. The company has expanded aggressively into GPU cloud, becoming an NVIDIA Cloud Partner with both NVIDIA and AMD GPU offerings.
Vultr differentiates through global footprint: 32 data center locations worldwide, more than any neocloud. This enables low-latency inference deployments close to end users. The company is privately held with undisclosed revenue.
Infrastructure
32 locations spanning North America, Europe, Asia-Pacific, South America, and Australia. This geographic diversity is unmatched among neoclouds.
Hardware: NVIDIA H100 ($2.99/hr on-demand, $2.30/hr 36-month prepaid), A100, L40S, A40. AMD MI300X ($1.85/hr, cheapest in market), MI325X ($2.00/hr), MI355X ($2.59/hr on-demand, $2.29/hr 36-month reserved).
H100 and H200 clusters include 400Gb/s Quantum-2 InfiniBand in non-blocking topology. Bare-metal GPU servers are also available.
Storage: Block storage ($0.10/GB/mo), Object storage ($0.018-0.10/GB/mo with S3 API), NVMe shared filesystem ($0.10/GB/mo).
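The spread between on-demand and 36-month prepaid H100 rates above implies meaningful savings for steady workloads. The sketch below quantifies it, assuming 24/7 utilization of a single GPU (our assumption; both rates are from the text).

```python
# Annual cost of one H100 at Vultr's listed on-demand ($2.99/hr) vs
# 36-month prepaid ($2.30/hr) rates, assuming 24/7 utilization.

HOURS_PER_YEAR = 8760
on_demand = 2.99 * HOURS_PER_YEAR
prepaid = 2.30 * HOURS_PER_YEAR

print(f"on-demand: ${on_demand:,.0f}/yr")
print(f"prepaid:   ${prepaid:,.0f}/yr")
print(f"savings:   {100 * (1 - prepaid / on_demand):.0f}%")
```

The ~23% saving only pays off if utilization stays high for the full term; bursty workloads are usually better served on-demand.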
Strengths
- Cheapest AMD MI300X in market at $1.85/hr
- 32 global locations enable low-latency edge deployments
- Full stack: VKE (managed Kubernetes), bare-metal, block/object/shared storage
- Strong compliance: SOC 2 (HIPAA), ISO 27001, PCI DSS
- Both NVIDIA and AMD GPU availability
- Self-service signup, no sales approval required
- Established company with 10+ years operational history
Gaps
- H100 pricing ($2.99/hr) higher than budget neoclouds
- GPU fleet smaller than CoreWeave, Lambda, or Nebius
- No Slurm offering
- Less AI/ML-specific tooling than specialized neoclouds
- Documentation spread across general cloud and GPU-specific content
- InfiniBand availability may vary by location
Best for: Teams needing global GPU presence for inference, AMD GPU access, or preferring an established provider with comprehensive compliance certifications.
OVHcloud

Overview
OVHcloud is a French cloud provider founded in 1999, making it the oldest company in this comparison. Publicly traded on Euronext Paris, OVHcloud reported €1.03B revenue in 2024. The company operates 43 data centers globally with a strong European presence.
OVHcloud’s GPU offerings are part of their broader cloud portfolio. The company emphasizes European data sovereignty, owning and operating all infrastructure without reliance on US hyperscalers. OVHcloud has achieved SecNumCloud certification (French government security standard), making it one of few providers qualified for French public sector AI workloads.
Infrastructure
43 data centers across Europe, North America, and Asia-Pacific. Primary GPU capacity is in European facilities. OVHcloud uses water-cooled systems reducing energy consumption by up to 50%.
Hardware: H100 (from $2.99/hr), A100, L4, L40S. The company focuses on the Private AI offering: dedicated GPU infrastructure managed by OVHcloud within isolated environments.
Private AI includes managed Kubernetes for GPU workloads. OVHcloud’s Managed Kubernetes Service is also available for standard workloads.
Strengths
- European data sovereignty with no US hyperscaler dependencies
- SecNumCloud, SOC 2, ISO 27001, PCI DSS, HDS certifications
- 43 data centers provide extensive geographic coverage
- Water-cooled infrastructure reduces environmental impact
- Private AI offering for isolated, dedicated GPU environments
- 25+ years operational track record
- Competitive European pricing
Gaps
- H100 list pricing is published, but Private AI and large-cluster pricing requires a sales conversation
- GPU portfolio smaller than US neoclouds
- No InfiniBand documented for standard offerings
- AI/ML tooling less developed than specialized providers
- Slower to adopt latest GPU architectures (no B200/GB200 listed)
- Primary focus remains general cloud; GPU is secondary business
Best for: European enterprises with data sovereignty requirements, French public sector organizations needing SecNumCloud certification, or teams preferring established European infrastructure.
FluidStack
Overview
FluidStack was founded in 2017 by Oxford University students and is headquartered in London. The company raised $200M in Series A funding (February 2025) led by Cacti, following earlier rounds including a $24.7M SAFE and $37.5M debt financing in 2024. FluidStack manages 100,000+ GPUs and has secured up to $10B in GPU-collateralized debt financing from Macquarie Group.
The company has transitioned from a GPU marketplace aggregator to primarily building dedicated infrastructure for large enterprises. Revenue mix shifted to 62% Private Cloud (owned infrastructure, $100M+ contracts) vs 38% Marketplace (aggregator, ~$340K contracts).
Notable customers include Anthropic (selected to build custom data centers in NY and TX), Meta, Mistral AI, Character.AI, Poolside, and Black Forest Labs. The company reported $180M ARR as of December 2024, up 620% year-over-year, with long-term contracts including 10+ year agreements with TeraWulf.
Infrastructure
FluidStack now operates two distinct models:
Private Cloud: Purpose-built GPU clusters for enterprise customers. Single-tenant, fully isolated infrastructure with dedicated engineering support. Storage integrates customer-preferred solutions (VAST, WEKA, DDN).
Marketplace: Aggregated GPU capacity from partner data centers with variable specifications.
Hardware: H100 ($2.10/hr), H200 ($2.30/hr), A100, L40S. InfiniBand available for dedicated clusters. Atlas OS provides bare-metal orchestration with managed Kubernetes and Slurm options.
Strengths
- Proven track record building custom data centers for top AI labs (Anthropic, Meta)
- $10B debt capacity enables rapid scaling without customer pre-funding
- Single-tenant by default with full infrastructure isolation
- Enterprise compliance: GDPR, HIPAA, ISO 27001, SOC 2 Type II
- Competitive H100 pricing starting at $2.10/hr
- Flexible storage integration (VAST, WEKA, DDN) for enterprise deployments
Gaps
- Enterprise Private Cloud requires sales engagement; not self-service
- Marketplace tier has variable infrastructure quality
- Less public documentation than competitors
- Storage options depend on deployment type (enterprise vs marketplace)
Best for: AI labs and enterprises needing custom-built GPU infrastructure with long-term contracts, or teams wanting marketplace access for shorter-term workloads.
Vast.ai

Overview
Vast.ai was founded in 2018 as a marketplace for GPU compute. Unlike traditional cloud providers, Vast.ai connects renters with independent GPU hosts, similar to Airbnb for compute. This model enables the lowest prices in the market but with significant variability in infrastructure quality.
The platform is popular with researchers, hobbyists, and cost-conscious startups. Vast.ai serves hundreds of thousands of users running ML training, inference, and rendering workloads.
Infrastructure
Vast.ai is a marketplace, not a traditional cloud provider. GPU capacity comes from:
- Data Center Hosts: Professional operators with standardized infrastructure
- Individual Hosts: Enthusiasts renting out personal hardware
Prices vary widely as a result: H100 80GB runs from $1.74-1.87/hr (the lowest rates in the market) up to $3+/hr, depending on host. The platform shows real-time availability, reliability scores, and host ratings.
Hardware: H100, H200, A100, L40S, RTX 4090/3090, and older GPUs. Most instances are Ethernet-only; InfiniBand available only from specific data center hosts.
Docker-based deployments with templates for PyTorch, TensorFlow, Stable Diffusion, and other frameworks.
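Given the price and reliability spread across hosts, a common approach is to filter listings by a reliability floor and then take the cheapest match. The sketch below is generic: the listing data and field names are invented for illustration, and the real Vast.ai search API has its own schema.

```python
# Pick the cheapest host above a reliability floor. Listing data and
# field names are invented; the real Vast.ai API uses its own schema.

listings = [
    {"host": "dc-alpha", "gpu": "H100", "price": 1.79, "reliability": 0.998},
    {"host": "home-rig", "gpu": "H100", "price": 1.74, "reliability": 0.91},
    {"host": "dc-beta",  "gpu": "H100", "price": 2.05, "reliability": 0.999},
]

def best_offer(listings, min_reliability=0.99):
    """Cheapest listing meeting the reliability floor, or None."""
    candidates = [l for l in listings if l["reliability"] >= min_reliability]
    return min(candidates, key=lambda l: l["price"], default=None)

offer = best_offer(listings)
print(offer["host"], offer["price"])  # cheapest host meeting the floor
```

Note how the absolute cheapest listing loses to a slightly pricier one once reliability is weighted in — the core trade-off of the marketplace model.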
Strengths
- Lowest H100 prices in market ($1.74-1.87/hr from quality hosts)
- Massive selection of GPU types including consumer hardware
- Real-time availability and pricing transparency
- Host reliability ratings help identify quality infrastructure
- Docker-based deployment with pre-built templates
- No minimum commitments; pay-per-minute billing
- Good for experimentation and prototyping
Gaps
- Infrastructure quality varies dramatically by host
- No InfiniBand on most instances (data center hosts only)
- No managed Kubernetes or enterprise orchestration
- Limited enterprise compliance certifications
- Host reliability can be inconsistent
- Multi-node training difficult due to fragmented infrastructure
- No SLA guarantees on marketplace instances
- Support quality varies by host
Best for: Researchers and hobbyists prioritizing cost over reliability, teams willing to trade consistency for the lowest prices in market.
TensorWave

Overview
TensorWave was founded in December 2023 in Las Vegas by Darrick Horton, Jeff Tatarchuk, and Piotr Tomasik. The company raised $100M in Series A funding (May 2025) led by Magnetar and AMD Ventures, with total funding of $146.7M. TensorWave is an AMD-exclusive GPU cloud provider.
While most neoclouds build on NVIDIA infrastructure, TensorWave bet entirely on AMD’s Instinct line. The company deployed the largest AMD training cluster in North America (8,192 MI325X GPUs) and was first to deploy large-scale direct liquid-cooled AMD GPU infrastructure. TensorWave has over 1GW of capacity and holds SOC 2 Type II and HIPAA certifications.
Infrastructure
TensorWave operates US-based data centers purpose-built for AMD GPUs. The infrastructure uses Aviz ONES fabric for 400Gb Ethernet networking with RoCE (RDMA over Converged Ethernet) rather than InfiniBand.
Hardware: MI300X (sold out), MI325X ($1.95/hr), MI355X ($2.85/hr for reservations). All systems support AMD ROCm for PyTorch, TensorFlow, and JAX workloads.
The RoCE implementation provides RDMA capabilities over standard Ethernet, offering lower cost than InfiniBand while maintaining reasonable performance for distributed training.
Strengths
- AMD-first specialization provides access when NVIDIA is constrained
- MI325X at $1.95/hr is competitive with NVIDIA H100 pricing
- RoCE networking provides RDMA without InfiniBand cost
- Deep ROCm expertise for AMD software optimization
- Early access to AMD’s latest GPU generations
- Lower cost infrastructure through AMD partnership
Gaps
- AMD ROCm requires workload adaptation (not drop-in CUDA replacement)
- MI300X sold out; availability constraints
- RoCE has higher latency than InfiniBand under network congestion
- No managed Kubernetes offering documented
- Less ecosystem tooling compared to NVIDIA-focused providers
Best for: Teams with ROCm expertise seeking AMD GPU access, or those willing to adapt workloads to benefit from AMD’s price/performance.
Hot Aisle

Overview
Hot Aisle was founded in October 2023 by Jon Stevens and Clint Armstrong, with backing from Joseph Lubin (ConsenSys founder) and Mesh. The founders have decades of technical experience deploying infrastructure across 9 data centers.
The company specializes exclusively in AMD Instinct GPUs, providing an alternative to NVIDIA-dominant neoclouds. Hot Aisle holds SOC 2 Type II and HIPAA certifications, with ISO 27001 planned.
Infrastructure
Hot Aisle operates from the Switch Pyramid facility in Grand Rapids, Michigan (Tier 5 Platinum data center). Infrastructure includes Dell XE9680 servers and Broadcom 57608 networking with Dell PowerSwitch Z9864F spine switches at 400G.
Networking: RoCEv2 delivering 3200 Gbps throughput per node. Per-minute billing with no long-term contracts.
Hardware: MI300X ($1.99/hr with 192GB), MI355X (available for reservations). Configurations range from 1x GPU VMs to 8x GPU bare metal.
Strengths
- AMD MI300X at $1.99/hr is competitive pricing
- Early MI355X availability for reservations
- SOC 2 Type II and HIPAA certified
- 3200 Gbps RoCEv2 throughput per node
- Dell/Broadcom enterprise infrastructure
- Per-minute billing, no contracts required
Gaps
- Single data center location (Michigan)
- Small company (founded October 2023)
- AMD ROCm requires workload adaptation from CUDA
- Limited GPU selection (AMD only)
Best for: Teams seeking AMD MI300X access at competitive pricing with enterprise compliance certifications.
Nscale

Overview
Nscale launched from stealth in May 2024 and has raised significant funding: $155M Series A (December 2024) and $1.1B Series B (2025) led by Aker ASA. Other investors include NVIDIA, Microsoft, G Squared, Dell, Nokia, Fidelity, Point72, and OpenAI. The company is targeting an IPO in 2026.
Nscale focuses on sustainable AI infrastructure, operating data centers in Norway powered by renewable energy. The company has a joint venture with Aker ASA for “Stargate Norway” in Narvik (230MW initial, 290MW expansion planned) and a partnership with OpenAI.
Infrastructure
Owned facilities in Glomfjord, Norway (30MW, expanding to 60MW) and a 15MW lease agreement with Verne for colocation in Iceland (deploying 4,600 Blackwell Ultra GPUs throughout 2026). All facilities leverage hydroelectric and geothermal power with natural cooling.
Hardware: H100, H200, GB200 NVL72, A100, and AMD MI300X (all contact pricing).
Networking uses Nokia 7220 IXR switches (recently upgraded to the IXR-H6 with 800GE/1.6T capability) with RoCE rather than InfiniBand.
Strengths
- Genuine renewable energy (hydro/geothermal), not carbon offsets
- Nordic locations provide natural cooling efficiency
- Strong investor backing (NVIDIA, Microsoft, OpenAI, Aker ASA)
- GB200 NVL72 capacity available
- OpenAI partnership via Stargate Norway
- Nokia partnership for latest networking hardware
Gaps
- All pricing requires sales contact; no self-service
- Limited documentation and transparency
- RoCE has higher latency than InfiniBand under congestion
- Nordic locations may cause latency for US/Asia workloads
- No managed Kubernetes or Slurm documented
- Early-stage company (launched 2024); operational track record developing
Best for: Large enterprises with sustainability mandates seeking renewable-powered GPU infrastructure, especially those interested in OpenAI ecosystem alignment.
SF Compute
Overview
SF Compute (San Francisco Compute) was founded in 2023 by Evan Conrad and raised $40M in 2025 led by DCVC and Wing Venture Capital at a $300M valuation. Other backers include Jack Altman and Electric Capital. The company has approximately 30 employees and recently hired Eric Park (former Voltage Park CEO) as CTO.
SF Compute operates as a GPU marketplace/broker, deliberately avoiding hardware ownership. The platform enables buyers to access compute and resell unused capacity, creating spot and forward markets for GPU compute. SF Compute manages $100M+ worth of GPU hardware through its marketplace model.
The key differentiator is flexible time-based reservations: you can book GPU clusters for arbitrary windows (e.g., 7 days starting next Tuesday) at guaranteed pricing. When you purchase capacity, SF Compute provisions infrastructure from partner providers configured to your specifications, ensuring consistent performance across the reservation period.
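A fixed-window reservation makes total cost knowable up front. The sketch below prices the report's example window (7 days) at the $1.45/hr low end of SF Compute's H100 range, for a hypothetical 8-GPU cluster (our assumption).

```python
# Total cost of a fixed reservation window: GPUs x rate x hours.
# The 8-GPU cluster size is hypothetical; $1.45/hr is the low end of
# the H100 range cited in this report.

def reservation_cost(gpus: int, rate_per_gpu_hr: float, days: float) -> float:
    """Guaranteed-price cost for a fixed reservation window."""
    return gpus * rate_per_gpu_hr * days * 24

print(f"${reservation_cost(8, 1.45, 7):,.2f}")
```

Because the price is locked when the window is booked, this number is the bill — unlike on-demand capacity, where rates and availability can shift mid-run.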
Infrastructure
As a marketplace, SF Compute does not own infrastructure but provides access to partner capacity. The platform offers:
- Kubernetes clusters: 400Gb/s InfiniBand per GPU, 0.5-second spinup
- VMs: No InfiniBand, 5-minute spinup
- Bare-metal: Available upon request
Hardware: H100i and H100v at $1.45-1.50/hr, H200 (requires contact), B300 coming soon. Storage: 1.5TB+ NVMe per node. Free egress with 100% uptime SLA.
Strengths
- H100 pricing at $1.45-1.50/hr is among the lowest in market
- Flexible time-based reservations: book arbitrary windows (7 days, 2 weeks, etc.) at guaranteed pricing
- Managed Kubernetes with InfiniBand configured automatically
- Marketplace model allows resale of unused capacity
- 24/7 support via Slack, phone, email
- 100% uptime SLA with automated refund for failed nodes
Gaps
- Marketplace model means infrastructure varies by underlying provider
- No owned infrastructure (relies on partner capacity)
- H200 not yet available (requires contact)
- Limited geographic control
Best for: Price-sensitive teams wanting lowest H100 costs with managed Kubernetes, comfortable with marketplace model and varying underlying infrastructure.
Last updated: December 2025. Pricing and features change frequently. Verify current offerings on provider websites before making decisions.