
GPU Cloud Comparison Report: Neoclouds for AI Infrastructure

An in-depth comparison of GPU cloud providers for AI training and inference, with detailed provider profiles and technical analysis.

This report provides what you won’t find on vendor websites: honest assessments of each provider’s strengths, weaknesses, and best-fit use cases. We cover pricing, InfiniBand networking, storage, Kubernetes support, and platform maturity across the neocloud landscape.


The Neocloud Landscape

The term “neocloud” refers to cloud providers primarily offering GPU-as-a-Service. Unlike hyperscalers with broad service portfolios, neoclouds focus on delivering GPU compute with high-speed interconnects for AI and HPC workloads.

Roughly 10-15 neoclouds currently operate at meaningful scale in the US, with footprints growing across Europe, the Middle East, and Asia.

Why consider neoclouds over AWS, GCP, or Azure?

The hyperscaler GPU experience involves quota requests, waitlists, and premium pricing:

| Provider | H100 80GB | Availability |
|---|---|---|
| AWS | ~$12/hr (p5.48xlarge, 8x H100) | Quota approval required, multi-week waitlists common |
| Azure | ~$12/hr (NC H100 v5, 8x H100) | Quota requests, capacity constraints |
| GCP | ~$12/hr (a3-highgpu-8g, 8x H100) | Limited regions, quota approval process |
| SF Compute | $1.45-1.50/hr | Self-service signup, provision in minutes |

That’s an 8x price difference. Even at the higher end of neocloud pricing ($3-4/hr for H100s), you’re paying 70% less than hyperscalers.
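
To make the arithmetic concrete, here is a quick back-of-envelope sketch. The rates are the on-demand numbers quoted above and will drift, so verify before budgeting:

```python
# Back-of-envelope monthly cost of one 8x H100 node at ~730 hours/month,
# using the on-demand rates quoted above (these change; verify first).
GPUS, HOURS = 8, 730

hyperscaler = GPUS * HOURS * 12.00  # ~$12/GPU/hr on AWS/Azure/GCP
neocloud = GPUS * HOURS * 1.50      # upper end of SF Compute's quoted range

print(f"hyperscaler: ${hyperscaler:,.0f}/month")  # $70,080
print(f"neocloud:    ${neocloud:,.0f}/month")     # $8,760
```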

Beyond cost, neoclouds eliminate the friction of quota approvals. On AWS, requesting H100 quota often requires a support ticket explaining your use case, with approval taking days to weeks. GCP and Azure have similar processes. Neoclouds typically offer self-service access: sign up, add payment, deploy GPUs in minutes.

Infrastructure is also optimized differently. Neoclouds treat InfiniBand as standard: 400Gb/s per GPU for multi-node training. Hyperscalers reserve comparable networking for premium instance tiers (AWS EFA, GCP GPUDirect), and availability varies by region.

GPU Hardware & Pricing

Your first decision: which GPUs do you need, and what will they cost? This section covers on-demand pricing for NVIDIA and AMD GPUs. Most providers offer reserved capacity discounts of 30-60%, but those require sales conversations.

On-Demand GPU Pricing

| Provider | H100 | H200 | B200 | GB200 | Source |
|---|---|---|---|---|---|
| CoreWeave | ~$6.15/hr | ~$6.30/hr | $8.60/hr | ~$5.25/hr | Link |
| Crusoe | $3.90/hr | $4.29/hr | Contact | Contact | Link |
| DataCrunch/Verda | $4.75/hr | $4.80/hr | $4.95/hr | β€” | Link |
| FluidStack | $2.10-2.89/hr | Contact | Contact | Contact | Link |
| GMI Cloud | $2.10/hr | $2.50-3.50/hr | Pre-order | Pre-order | Link |
| Hot Aisle | β€” | β€” | β€” | β€” | N/A |
| Hyperstack | $1.90-2.40/hr | $3.50/hr | Contact | Contact | Link |
| Lambda | $2.29-3.29/hr | β€” | $3.79/hr | β€” | Link |
| Nebius | $2.95/hr | $3.50/hr | $5.50/hr | Pre-order | Link |
| Nscale | Contact | Contact | β€” | Contact | Link |
| OVHcloud | Contact | β€” | β€” | β€” | Link |
| RunPod | $1.99-2.69/hr | $3.59/hr | Contact | β€” | Link |
| SF Compute | $1.45-1.50/hr | Contact | β€” | β€” | Link |
| TensorWave | β€” | β€” | β€” | β€” | N/A |
| Vast.ai | $1.74-1.87/hr | Varies | Varies | β€” | Link |
| Voltage Park | $1.99-2.49/hr | Contact | Contact | Contact | Link |
| Vultr | $2.99/hr | Contact | $2.89/hr | β€” | Link |

Pricing varies significantly based on whether you’re renting individual GPUs, full nodes (typically 8 GPUs), or multi-node clusters with InfiniBand.

AMD GPU Availability

| Provider | MI300X | MI325X | MI355X | Source |
|---|---|---|---|---|
| Crusoe | $3.45/hr | β€” | Contact | Link |
| Hot Aisle | $1.99/hr | β€” | Reservations | Link |
| Nscale | Contact | β€” | β€” | Link |
| TensorWave | Sold out | $1.95/hr | $2.85/hr | Link |
| Vultr | $1.85/hr | $2.00/hr | $2.59/hr | Link |

AMD adoption is growing. Vultr offers the cheapest MI300X at $1.85/hr. Hot Aisle and TensorWave are AMD-only providers.

Training Infrastructure

For multi-node training, your infrastructure determines actual performance regardless of GPU specs. Network bandwidth between GPUs and shared storage throughput are the critical factors. Without proper networking and storage, even the fastest GPUs will sit idle waiting for data.

InfiniBand and High-Speed Networking

For multi-node distributed training, network bandwidth between GPUs is critical. InfiniBand provides lower latency and higher bandwidth than Ethernet, with RDMA enabling GPU-to-GPU communication without CPU involvement.

| Provider | InfiniBand | Speed (per GPU) | Availability | Topology | Source |
|---|---|---|---|---|---|
| CoreWeave | Yes | 400Gb/s (Quantum-2) | H100/H200 clusters | Non-blocking fat-tree | Link |
| Crusoe | Yes | 400Gb/s | All GPU nodes | Partition-based isolation | Link |
| DataCrunch/Verda | Yes | 400Gb/s (NDR) | Instant clusters | Rail-optimized | Link |
| FluidStack | Yes | 400Gb/s | Clusters | Not documented | Link |
| GMI Cloud | Yes | 400Gb/s | All GPU nodes | Not documented | Link |
| Hot Aisle | RoCE only | 400Gb Ethernet | All nodes | Dell/Broadcom | Link |
| Hyperstack | Supercloud only | 400Gb/s | H100/H200 SXM | Quantum-2 | Link |
| Lambda | Clusters only | 400Gb/s (Quantum-2) | 1-Click Clusters | Rail-optimized | Link |
| Nebius | Yes | 400Gb/s (Quantum-2) | All GPU nodes | Rail-optimized | Link |
| Nscale | RoCE only | 400Gb Ethernet | All nodes | Nokia 7220 IXR | Link |
| OVHcloud | No | 25Gb Ethernet | Public Cloud GPU | vRack private networking | Link |
| RunPod | Clusters only | 200-400Gb/s | Instant Clusters | Not documented | Link |
| SF Compute | Yes | 400Gb/s | K8s clusters only | Not documented | Link |
| TensorWave | RoCE only | 400Gb Ethernet | All nodes | Aviz ONES fabric | Link |
| Vast.ai | No | Varies by host | Marketplace | Varies by host | Link |
| Voltage Park | Yes | 400Gb/s (Quantum-2) | IB tier ($2.49/hr) | Rail-optimized | Link |
| Vultr | Yes | 400Gb/s (Quantum-2) | H100/H200 clusters | Non-blocking | Link |

400Gb/s NDR InfiniBand is now standard among providers with InfiniBand. TensorWave, Hot Aisle, and Nscale use RoCE (RDMA over Converged Ethernet) instead, which provides RDMA capabilities over standard Ethernet with lower cost but potentially higher latency under congestion.
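
To make this concrete, here is a minimal sketch of a multi-node fabric smoke test in PyTorch. The NCCL environment variables are standard knobs, but the HCA name is a placeholder that depends on the provider's node image; launch it with torchrun across your nodes:

```python
# Minimal multi-node fabric smoke test; launch one process per GPU, e.g.:
#   torchrun --nnodes=2 --nproc_per_node=8 ... this_script.py
import os
import torch
import torch.distributed as dist

os.environ.setdefault("NCCL_DEBUG", "INFO")   # logs which transport NCCL selects
os.environ.setdefault("NCCL_IB_HCA", "mlx5")  # placeholder HCA prefix; provider-specific
# Note: NCCL uses the same IB verbs path for RoCE; setting NCCL_IB_DISABLE=1
# would force plain TCP sockets instead.

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

x = torch.ones(64 * 1024 * 1024, device="cuda")  # 64M floats = 256 MB payload
dist.all_reduce(x)                               # exercises the node-to-node fabric
torch.cuda.synchronize()
if dist.get_rank() == 0:
    print(f"all_reduce OK across {dist.get_world_size()} ranks")
dist.destroy_process_group()
```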

Storage Options

Training workloads need three types of storage: block storage for OS and application data, object storage for datasets and checkpoints, and shared filesystems for multi-node data access.
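
Because object storage is almost always S3-compatible across these providers, checkpoint code stays portable. A minimal sketch, assuming a hypothetical endpoint and bucket (every provider documents its own endpoint URL and credential flow):

```python
# Upload a training checkpoint to any S3-compatible object store.
# Endpoint, bucket, and credentials below are placeholders.
import boto3
import torch
import torch.nn as nn

model = nn.Linear(512, 512)  # stand-in for your real model

s3 = boto3.client(
    "s3",
    endpoint_url="https://object-store.example-provider.com",  # placeholder
    aws_access_key_id="YOUR_KEY",
    aws_secret_access_key="YOUR_SECRET",
)

torch.save({"step": 1000, "model": model.state_dict()}, "/tmp/ckpt-1000.pt")
s3.upload_file("/tmp/ckpt-1000.pt", "training-checkpoints", "exp42/ckpt-1000.pt")
```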

| Provider | Block Storage | Object Storage | Shared FS | Technology | Source |
|---|---|---|---|---|---|
| CoreWeave | Yes | S3 $0.03-0.06/GB/mo | $0.07/GB/mo | VAST, WEKA, DDN | Link |
| Crusoe | $0.08/GB/mo | β€” | $0.07/GB/mo | Lightbits | Link |
| DataCrunch/Verda | $0.05-0.20/GB/mo | Coming soon | $0.20/GB/mo | NVMe SFS | Link |
| FluidStack | Filesystem only | β€” | β€” | Not documented | Link |
| GMI Cloud | Integrated | VAST S3 | VAST NFS | VAST Data, GPUDirect | Link |
| Hot Aisle | Not documented | β€” | β€” | Not documented | Link |
| Hyperstack | ~$0.07/GB/mo | In development | WEKA (Supercloud) | NVMe | Link |
| Lambda | β€” | S3 adapter only | $0.20/GB/mo | VAST Data | Link |
| Nebius | $0.05-0.12/GB/mo | S3 $0.015/GB/mo | $0.08/GB/mo | NFS | Link |
| Nscale | Not documented | Not documented | β€œParallel FS” | Not documented | Link |
| OVHcloud | $0.022/GB/mo | S3 + egress | $120-150/TB/mo | NetApp | Link |
| RunPod | $0.10/GB/mo | S3 (5 DCs) | $0.05-0.07/GB/mo | Network volumes | Link |
| SF Compute | Local NVMe only | β€” | β€” | 1.5TB+ per node | Link |
| TensorWave | Local only | β€” | WEKA (custom) | Not documented | Link |
| Vast.ai | Per-host | β€” | β€” | Varies | Link |
| Voltage Park | Local NVMe | VAST S3 | VAST NFS | VAST Data | Link |
| Vultr | $0.10/GB/mo | S3 $0.018-0.10/GB/mo | $0.10/GB/mo | NVMe-backed | Link |

VAST Data is popular among neoclouds (Lambda, Voltage Park, CoreWeave, GMI Cloud). Crusoe notably lacks native object storage, requiring customers to self-manage MinIO or integrate VAST Data separately.

Orchestration & Platform

How you’ll actually run workloads matters as much as the hardware. But there’s an important distinction: infrastructure orchestration (Kubernetes, Slurm) vs. the platform layer.

Neoclouds provide Kubernetes or Slurm to schedule containers or jobs on GPU nodes. That’s infrastructure orchestrationβ€”it gets your code running on hardware. But production AI teams need more: hosted dev environments where data scientists can iterate, distributed training orchestration that handles multi-node configurations, parallel job scheduling with automatic retries, and cost allocation by user and project.

Most neoclouds stop at infrastructure. The platform layerβ€”the operational tooling that makes GPU infrastructure actually usable for teamsβ€”is what you build on top, or what Saturn Cloud provides out of the box.

Kubernetes and Orchestration

| Provider | Managed K8s | Slurm | Autoscaling | Notes | Source |
|---|---|---|---|---|---|
| CoreWeave | Yes (CKS) | SUNK | Yes | Bare-metal K8s, no hypervisor | Link |
| Crusoe | Yes (CMK) | Yes | Yes | Run:ai integration | Link |
| DataCrunch/Verda | β€” | Pre-installed | β€” | Slurm on clusters | Link |
| FluidStack | β€” | β€” | β€” | Atlas platform | Link |
| GMI Cloud | Yes (Cluster Engine) | β€” | Yes | K8s-based orchestration | Link |
| Hot Aisle | β€” | β€” | β€” | Bare-metal focus | Link |
| Hyperstack | β€” | β€” | β€” | VMs only | Link |
| Lambda | Yes (1-Click Clusters) | Available | β€” | Managed K8s and Slurm | Link |
| Nebius | Yes | Managed + Soperator | Yes | First Slurm Kubernetes operator | Link |
| Nscale | Yes (NKS) | Yes | β€” | Limited docs | Link |
| OVHcloud | Yes | β€” | Yes | Standard managed K8s | Link |
| RunPod | β€” | β€” | Yes | Serverless focus | Link |
| SF Compute | Yes | β€” | β€” | Managed K8s per zone | Link |
| TensorWave | β€” | Yes | β€” | Pyxis/Enroot containers | Link |
| Vast.ai | β€” | β€” | β€” | Container-based | Link |
| Voltage Park | Add-on | β€” | β€” | Helm/Rook-Ceph guides | Link |
| Vultr | Yes (VKE) | β€” | Yes | Standard managed K8s | Link |

Nebius’s Soperator is the first open-source Kubernetes operator for running Slurm clusters. CoreWeave’s SUNK supports 32,000+ GPU jobs. Most serverless/container platforms (RunPod, Vast.ai, FluidStack) trade Kubernetes flexibility for simpler deployment models.

The Platform Layer: Saturn Cloud

Neoclouds provide GPU infrastructure (compute, networking, storage, Kubernetes). What they don’t provide is the platform layerβ€”the operational tooling that makes that infrastructure usable for AI teams.

What infrastructure teams build on top of neoclouds:

Every AI team running on neoclouds eventually builds or buys similar platform capabilities:

  • Hosted dev environments: JupyterLab or VS Code instances where data scientists iterate on code and run experiments, with SSH access to GPU nodes for debugging
  • Distributed training orchestration: Automated setup of multi-node training (environment variables for torchrun/DeepSpeed, InfiniBand configuration, NCCL tuning)
  • Job scheduling with failure handling: Run thousands of hyperparameter sweeps or training runs in parallel, with automatic retries and resource cleanup
  • Cost allocation: Track GPU usage by user, team, or project for chargebacks and budget management
  • Idle detection: Automatically shut down resources when not in use to prevent GPU waste

This is undifferentiated work. It doesn’t give your team competitive advantageβ€”every AI org needs it, and the implementation is similar across companies. Building it in-house typically takes 6-12 months of infrastructure engineering time.

What Saturn Cloud provides:

Saturn Cloud is the platform layer for GPU infrastructure. It runs on any Kubernetes clusterβ€”Nebius, Crusoe, CoreWeave, or your own bare-metal.

Core capabilities:

  • Dev environments: Hosted JupyterLab and VS Code with one-click GPU access, git integration, and SSH tunneling for debugging
  • Distributed training: Click a checkbox to convert a single-node job to multi-node, Saturn Cloud handles torchrun configuration and InfiniBand setup
  • Parallel jobs: Run 100s of experiments simultaneously with dependency graphs, automatic retries, and resource limits per user
  • Usage tracking: Real-time dashboard showing GPU hours by user/team/project, with export to your billing system
  • Idle shutdown: Configurable policies to stop instances after N minutes of inactivity, preventing wasted spend
  • SSO and RBAC: Integrate with your identity provider, assign permissions by team

Saturn Cloud deploys via Helm chart on your existing K8s cluster. Your data never leaves your infrastructureβ€”Saturn Cloud just adds the control plane and UI.

When to consider Saturn Cloud:

  • You’ve chosen a neocloud (or have bare-metal) and need to make it usable for your AI team
  • Your infra team is spending significant time building dev environment provisioning, job scheduling, or usage tracking
  • You need cost allocation and idle detection to control GPU spend
  • You want data scientists productive on day one rather than waiting months for platform tooling

Saturn Cloud handles the commodity 90% (platform layer) so your infrastructure team can focus on the 10% that differentiates your business.

Operational Considerations

Hidden costs and networking capabilities often determine whether a provider works for production deployments. Egress fees can add 20-40% to monthly bills at hyperscalers, while load balancers and VPCs are baseline requirements for inference endpoints.

Egress Pricing

| Provider | Egress Cost | Notes | Source |
|---|---|---|---|
| CoreWeave | Free (object storage) | Object storage via LOTA | Link |
| Crusoe | Free | Zero data transfer fees | Link |
| DataCrunch/Verda | Not documented | β€” | Link |
| FluidStack | Free | Zero egress/ingress | Link |
| GMI Cloud | Not documented | β€” | Link |
| Hot Aisle | Not documented | β€” | Link |
| Hyperstack | Free | Zero bandwidth charges | Link |
| Lambda | Free | Zero egress | Link |
| Nebius | Free (networking) | S3 standard $0.015/GB, enhanced free | Link |
| Nscale | Not documented | β€” | Link |
| OVHcloud | $0.011/GB | Object storage only, compute egress free | Link |
| RunPod | Free | Zero data transfer | Link |
| SF Compute | Free | No ingress/egress fees | Link |
| TensorWave | Not documented | Claims β€œno hidden costs” | Link |
| Vast.ai | Varies | Per-host, can be $20+/TB | Link |
| Voltage Park | Free | No hidden costs | Link |
| Vultr | $0.01/GB | After 2TB/mo free | Link |

Network Services

| Provider | Load Balancer | VPC/Private Network | VPN/Peering | Public IPs | Source |
|---|---|---|---|---|---|
| CoreWeave | Yes (K8s LB) | Yes (VPC) | Direct Connect (Equinix, Megaport) | Yes + BYOIP | Link |
| Crusoe | Yes | Yes (VPC) | β€” | Yes | Link |
| DataCrunch/Verda | β€” | Not documented | β€” | Not documented | Link |
| FluidStack | β€” | Not documented | β€” | Not documented | Link |
| GMI Cloud | β€” | Yes (VPC) | β€” | Yes (Elastic IPs) | Link |
| Hot Aisle | β€” | β€” | β€” | Yes | Link |
| Hyperstack | β€” | β€” | β€” | Not documented | Link |
| Lambda | β€” | β€” | β€” | Not documented | Link |
| Nebius | Yes (K8s LB) | Yes | β€” | Yes | Link |
| Nscale | β€” | β€” | β€” | Not documented | Link |
| OVHcloud | Yes (L4/L7, Octavia) | Yes (vRack) | OVHcloud Connect | Yes (Floating IPs) | Link |
| RunPod | Serverless only | Global networking (Pod-to-Pod) | β€” | Shared (port mapping) | Link |
| SF Compute | β€” | β€” | β€” | Not documented | Link |
| TensorWave | β€” | β€” | β€” | Not documented | Link |
| Vast.ai | β€” | β€” | β€” | Shared (port mapping) | Link |
| Voltage Park | β€” | β€” | β€” | Not documented | Link |
| Vultr | Yes (L4, $10/mo) | Yes (VPC 2.0) | β€” | Yes | Link |

Developer Experience & Enterprise Readiness

How easy is it to get started, and does the platform meet enterprise requirements? Terraform providers and APIs enable infrastructure-as-code, self-service access determines time-to-first-GPU, and compliance certifications gate enterprise adoption.

Terraform and API Support

| Provider | Terraform Provider | API | CLI | Source |
|---|---|---|---|---|
| CoreWeave | Official | Yes | Yes | Link |
| Crusoe | Official | REST | Yes | Link |
| DataCrunch/Verda | β€” | Not documented | β€” | Link |
| FluidStack | β€” | Not documented | β€” | Link |
| GMI Cloud | β€” | REST | β€” | Link |
| Hot Aisle | β€” | REST | β€” | Link |
| Hyperstack | β€” | Infrahub API | β€” | Link |
| Lambda | β€” | REST | Yes | Link |
| Nebius | Official | Yes | Yes | Link |
| Nscale | β€” | REST | Yes | Link |
| OVHcloud | Official | REST | Yes | Link |
| RunPod | β€” | GraphQL | Yes | Link |
| SF Compute | β€” | Yes | Yes | Link |
| TensorWave | β€” | β€” | β€” | Link |
| Vast.ai | β€” | REST | Yes | Link |
| Voltage Park | β€” | β€” | β€” | Link |
| Vultr | Official | REST | Yes | Link |

Self-Service Access

| Provider | Access Model | Notes | Source |
|---|---|---|---|
| CoreWeave | Sales-gated | Requires organizational approval from sales team | Link |
| Crusoe | Self-service | Sign up via console, larger deployments contact sales | Link |
| DataCrunch/Verda | Self-service | Order GPU instances in minutes via dashboard or API | Link |
| FluidStack | Self-service | Sign up at auth.fluidstack.io, launch in under 5 minutes | Link |
| GMI Cloud | Self-service | Sign up, launch instances in 5-15 minutes via console/API | Link |
| Hot Aisle | Self-service | SSH-based signup, credit card, no contracts | Link |
| Hyperstack | Self-service | Instant access, one-click deployment | Link |
| Lambda | Self-service | Create account and launch GPUs in minutes, pay-as-you-go | Link |
| Nebius | Self-service | Sign up, add $25+, deploy up to 32 GPUs immediately | Link |
| Nscale | Hybrid | Self-service for inference only; training clusters require sales | Link |
| OVHcloud | Self-service | Create account, $200 free credit for first project | Link |
| RunPod | Self-service | Deploy GPUs in under a minute, no rate limits | Link |
| SF Compute | Self-service | Sign up to buy, larger deployments contact sales | Link |
| TensorWave | Sales-gated | Contact sales/solutions engineers to get started | Link |
| Vast.ai | Self-service | $5 minimum to start, per-second billing | Link |
| Voltage Park | Self-service | On-demand GPUs available, reserved capacity contact sales | Link |
| Vultr | Self-service | Free account signup, provision via portal/API/CLI | Link |

Compliance and Enterprise Features

| Provider | Compliance | SSO/SAML | Regions | Source |
|---|---|---|---|---|
| CoreWeave | SOC 2, ISO 27001 | SAML/OIDC/SCIM | US, UK, Spain, Sweden, Norway | Security |
| Crusoe | SOC 2 Type II | Not documented | US (TX, VA), Iceland, Norway (soon) | Link |
| DataCrunch/Verda | ISO 27001 | β€” | EU (Finland, Iceland) | Link |
| FluidStack | β€” | β€” | Not documented | Link |
| GMI Cloud | SOC 2 Type 1, ISO 27001 | β€” | Not documented | Link |
| Hot Aisle | SOC 2 Type II, HIPAA | β€” | US (MI) | Link |
| Hyperstack | β€” | β€” | Europe, North America | Link |
| Lambda | SOC 2 Type II | Not documented | Not documented | Link |
| Nebius | SOC 2 Type II, HIPAA, ISO 27001 | Yes | US, EU (Finland, France, Iceland) | Regions, Trust Center |
| Nscale | β€” | β€” | Norway | Link |
| OVHcloud | SOC 2, ISO 27001, PCI DSS, HDS, SecNumCloud | Not documented | Global (46 DCs) | Infrastructure, Certifications |
| RunPod | SOC 2 Type I | β€” | Multiple | Link |
| SF Compute | β€” | β€” | Not documented | Link |
| TensorWave | β€” | β€” | Not documented | Link |
| Vast.ai | β€” | β€” | Varies by host | Link |
| Voltage Park | SOC 2 Type II, ISO 27001, HIPAA | β€” | US (WA, TX, VA, UT) | Infrastructure, Security |
| Vultr | SOC 2 (HIPAA), ISO 27001, PCI DSS | β€” | 32 global locations | Locations, Compliance |

Infrastructure Ownership Models

Understanding whether a provider owns their infrastructure or aggregates from others matters for reliability, support, and pricing stability.

| Provider | Model | Description | Source |
|---|---|---|---|
| CoreWeave | Owner | Acquired Core Scientific ($9B, 1.3 GW) and NEST DC ($322M); 250K+ GPUs across 32 DCs | Link |
| Crusoe | Owner | Vertically integrated; manufactures own modular DCs via Easter-Owens Electric acquisition | Link |
| DataCrunch/Verda | Owner (colo) | Owns GPUs in 4 Nordic colos (3x Helsinki, 1x Iceland); building own DCs in 2025 | Link |
| FluidStack | Owner + Aggregator | 62% owned infrastructure, 38% marketplace; $10B GPU asset financing via Macquarie | Link |
| GMI Cloud | Owner | Full-stack ownership; offshoot of Realtek/GMI Tech with Taiwan supply chain advantage | Link |
| Hot Aisle | Owner (colo) | Owns AMD GPUs; colocation at Switch Pyramid Tier 5 DC in Grand Rapids, MI | Link |
| Hyperstack | Owner (colo) | Owns 13K GPUs; long-term agreements with hyperscalers and renewable energy DCs | Link |
| Lambda | Owner (colo) | Owns GPU hardware; colocation in SF and Texas; Nvidia leases back GPUs ($1.5B deal) | Link |
| Nebius | Owner + Colo | Owns DCs in Finland and NJ (300 MW); colocation in Kansas City, Iceland, Paris | Link |
| Nscale | Owner | Owns 60MW Glomfjord DC; JV with Aker for 230MW Stargate Norway facility | Link |
| OVHcloud | Owner | Fully vertically integrated; designs/manufactures servers, builds/manages own DCs | Link |
| RunPod | Owner + Aggregator | Secure Cloud (Tier 3/4 partners) + Community Cloud (aggregated third-party hosts) | Link |
| SF Compute | Aggregator | Two-sided marketplace connecting GPU cloud providers | Link |
| TensorWave | Owner (colo) | Owns 8K AMD GPUs; leases 1 GW capacity across TECfusions portfolio (AZ, VA, PA) | Link |
| Vast.ai | Aggregator | Pure marketplace connecting 10K+ GPUs from individuals to datacenters | Link |
| Voltage Park | Owner (colo) | Owns 24K H100s ($500M) across 6 Tier 3 DCs in TX, VA, WA | Link |
| Vultr | Colo | Operates across 32 global colocation facilities (Digital Realty, Equinix, QTS partnerships) | Link |

Choosing a Provider

For production multi-node training: Nebius, CoreWeave, and Crusoe provide the most complete platforms with InfiniBand on all GPU nodes, managed Kubernetes, high-performance shared storage, and enterprise compliance. CoreWeave has the largest scale and was first to market with GB200. Nebius offers the most complete managed service stack (K8s, Slurm via Soperator, MLflow, PostgreSQL). Crusoe is the only option if you need AMD GPUs with full enterprise features.

For cost-optimized workloads: SF Compute ($1.45-1.50/hr for H100) and Vast.ai ($1.74-1.87/hr) offer the lowest pricing. Both are marketplace aggregatorsβ€”SF Compute provides managed Kubernetes, while Vast.ai is container-based with per-second billing. Trade-off: variable infrastructure quality and less documentation than enterprise providers.

For European data sovereignty: Nebius and DataCrunch/Verda operate EU data centers with 100% renewable energy. Nebius provides SOC 2 Type II + HIPAA + ISO 27001 with managed K8s and Slurm. DataCrunch/Verda offers Iceland-based infrastructure with ISO 27001 and genuine carbon-neutral positioning (geothermal/hydro power, not offsets).

For AMD GPUs: Vultr offers MI300X at $1.85/hr (cheapest in market) with managed Kubernetes. Hot Aisle and TensorWave are AMD-only providers with MI300X/MI325X/MI355X at competitive pricing and SOC 2/HIPAA compliance.


Provider Profiles

Each profile below covers infrastructure details, strengths, gaps, and best-fit use cases for the provider.

Nebius


Overview

Nebius spun off from Yandex N.V. in 2024 following Russia-related sanctions pressures. The company repositioned from a search conglomerate to a dedicated AI infrastructure provider, led by Yandex co-founder Arkady Volozh. In December 2024, Nebius raised $700M from NVIDIA and Accel, followed by $1B in debt financing in June 2025 for global expansion.

The company reported $105M revenue in Q2 2025, up 625% year-over-year, and targets $900M-$1.1B ARR. A major Microsoft deal worth $17-19B over 5+ years was announced in late 2025.

Infrastructure

Nebius owns data centers in Finland (MΓ€ntsΓ€lΓ€, ranked #19 globally for supercomputing) and is building in the US. The Kansas City facility launched Q1 2025 with 5MW base capacity scalable to 40MW (~35K GPUs). Additional sites in Paris, Iceland, and the UK are operational or under development. Target: 1GW+ power capacity by end of 2026.

Hardware: H100 ($2.95/hr), H200 ($3.50/hr), B200 ($5.50/hr), L40S, and GB200 NVL72 (pre-order). All GPU nodes include 400Gb/s Quantum-2 InfiniBand with rail-optimized topology.

Storage performance is a differentiator: 12 GBps read, 8 GBps write per 8-GPU VM on their shared filesystem.

Strengths

  • Most complete managed service stack among neoclouds: Kubernetes ($0 for control plane), Slurm via Soperator, MLflow, Spark, PostgreSQL
  • Soperator is the first fully-featured open-source Kubernetes operator for Slurm, enabling 20-30 minute cluster deployment
  • 20% lower TCO through proprietary hardware design and energy efficiency
  • Strong sustainability angle: MΓ€ntsΓ€lΓ€ facility’s heat recovery covers 65% of local municipality heating
  • Competitive pricing at $2.95/hr for H100 vs ~$12/hr on AWS

Gaps

  • US presence is new (Kansas City launched Q1 2025); limited footprint compared to CoreWeave
  • No Asia-Pacific data centers yet (expansion planned)
  • No documented spot/preemptible instance pricing
  • As a 2024 spinoff, long-term operational stability still being proven

Best for: Teams wanting a fully-managed platform (K8s + Slurm + MLflow) with competitive pricing and strong European presence.


CoreWeave


Overview

CoreWeave is the largest neocloud by GPU count. Founded in 2017 as Atlantic Crypto (Ethereum mining), the company pivoted to GPU cloud in 2019 and went public on Nasdaq (CRWV) in March 2025, raising ~$1.5B at a ~$23B valuation. The company operates 250,000 GPUs across 32 data centers.

Major customers include OpenAI ($22.4B total contract), Microsoft (62% of 2024 revenue), Mistral AI, IBM, and Databricks. 2024 revenue was $1.92B with projected $8B in 2025.

Infrastructure

CoreWeave has aggressively expanded through both organic growth and acquisition. The July 2025 acquisition of Core Scientific ($9B all-stock) added 1.3 GW of power capacity, with 500 MW from Bitcoin mining infrastructure being converted to AI workloads.

Locations span the US (New Jersey, Texas, Pennsylvania, North Dakota, Georgia, Kentucky, North Carolina, Alabama, Oklahoma), UK (Crawley, London Docklands), and planned European expansion (Norway, Sweden, Spain by end 2025).

CoreWeave was first to deploy NVIDIA Blackwell at scale: 110,000 Blackwell GPUs with Quantum-2 InfiniBand, GB200 NVL72 systems (April 2025), and Blackwell Ultra GB300 NVL72 (July 2025).

Strengths

  • Largest GPU fleet and first-mover on new NVIDIA architectures
  • SUNK (Slurm on Kubernetes) supports 32,000+ GPU jobs with GitOps deployment via ArgoCD
  • Non-blocking fat-tree InfiniBand topology with NVIDIA SHARP (2x effective bandwidth)
  • NVIDIA’s top cloud partner with exclusive hardware co-design relationship
  • Published pricing with up to 60% discounts for committed usage

Gaps

  • Sales-gated access: requires organizational approval, no self-service signup
  • Extreme customer concentration: Microsoft was 62% of 2024 revenue, top two customers 77%
  • Material weaknesses in internal controls disclosed in SEC S-1; remediation expected through 2026
  • High debt load (~$14B) with $310M quarterly interest expense (6x operating profit)
  • Stock fell 30% in November 2025 after guidance cut due to data center construction delays
  • Documentation gaps: custom configurations and large-scale pricing require sales conversations

Best for: Large enterprises needing massive scale, latest NVIDIA hardware, and willingness to work through sales process.


Crusoe


Overview

Crusoe was founded in 2018 with a unique angle: converting stranded natural gas (flared at oil wells) into computational power. Their Digital Flare Mitigation technology captures methane with 99.9% combustion efficiency, cutting methane emissions by ~99% compared to routine flaring.

The company raised $1.375B in Series E (October 2024) at $10B+ valuation, with investors including NVIDIA, Mubadala, Founders Fund, Fidelity, and Tiger Global. Total funding: $2.64B across 13 rounds.

In March 2025, Crusoe divested its Bitcoin mining operations to NYDIG (which had been 55% of 2024 revenue) to focus purely on AI infrastructure. They’re now the lead developer on the Stargate project’s flagship Abilene campus (OpenAI/Oracle/SoftBank’s $500B AI initiative).

Infrastructure

Crusoe operates 22 data centers across 6 regions with 1.6+ GW under operations/construction and 10+ GW in development. The Abilene, Texas Stargate campus will total 1.2 GW across ~4M sq ft when Phase 2 completes (mid-2026), designed for up to 50,000 GB200 NVL72 GPUs per building. A 1.8 GW Wyoming campus is under development.

European presence includes Iceland (57 MW, 100% geothermal/hydro) and Norway (12 MW, 100% hydroelectric).

Vertical integration through the 2022 Easter-Owens acquisition gives Crusoe in-house data center design and manufacturing capability.

Hardware: GB200 NVL72, B200, H200, H100, L40S ($1.45/hr), and AMD MI300X ($3.45/hr). First major cloud to virtualize AMD MI300X on Linux KVM.

Strengths

  • Energy-first model provides long-term cost predictability and genuine sustainability credentials
  • Vertical integration from power generation through hardware to software orchestration
  • Full platform: Managed Kubernetes (CMK), Slurm, Run:ai integration, Kubeflow
  • SemiAnalysis ClusterMAX 2.0 “Gold” rating
  • Strong AMD GPU support alongside NVIDIA
  • 99.98% uptime SLA

Gaps

  • No native managed object storage; customers must self-manage MinIO or integrate VAST Data/Lightbits
  • Limited geographic footprint (22 data centers vs 30+ for hyperscalers)
  • Energy price volatility exposure: Texas grid crisis (March 2025) saw costs spike 40%
  • Stranded gas supply may decline as world transitions away from fossil fuels
  • Certain GPU types in certain regions require sales discussion

Best for: Teams prioritizing sustainability, AMD GPU access, or participation in Stargate-class infrastructure.


Lambda


Overview

Lambda was founded in 2012 by brothers Stephen and Michael Balaban. The company is known for its developer-friendly approach and the Lambda Stack (pre-configured PyTorch/TensorFlow/CUDA environment) used by 100K+ users.

Funding has accelerated: $320M Series C (February 2024), $480M Series D (February 2025) at $2.5B valuation, and over $1.5B Series E (November 2025) led by TWG Global. NVIDIA is a major investor and strategic partner. Lambda is targeting an IPO in H1 2026.

The NVIDIA relationship is notably deep: a September 2024 $1.5B GPU leaseback deal has NVIDIA leasing 18,000 GPUs from Lambda over 4 years, with NVIDIA researchers using the capacity.

Infrastructure

Lambda operates on a pure colocation model (no owned facilities). Current locations: San Francisco, Allen (TX), Plano (TX), with additional sites across North America, Australia, and Japan (6 total data centers).

The May 2025 Aligned Data Centers partnership added a liquid-cooled facility in Plano, TX (~$700M investment) designed for Blackwell and Blackwell Ultra.

Hardware: HGX B200, HGX H200, HGX H100. 1-Click Clusters scale from 16 to 2,040+ nodes. All clusters include Quantum-2 InfiniBand (400Gb/s per GPU, 3.2Tbps per node) and VAST Data storage integration.

Strengths

  • 1-Click Clusters: instant multi-node provisioning with one-week minimum (no long-term contracts)
  • Simple, transparent pricing: $2.99-4.49/GPU/hour depending on generation
  • Pre-installed Lambda Stack eliminates environment configuration
  • VAST Data partnership for petabyte-scale shared storage with S3 API
  • No egress/ingress fees
  • SOC 2 Type II certified

Gaps

  • GPU availability issues during peak demand (“out of stock” messages common)
  • No free tier or trial
  • No built-in cost allocation/usage tracking by team or project
  • Limited European presence
  • Cross-data-center networking falls back to Ethernet (degrades vs single-cluster InfiniBand)

Best for: Teams wanting fast, simple cluster provisioning without long-term commitments, comfortable with SSH/terminal workflows.


Voltage Park


Overview

Voltage Park was founded in 2023 with an unusual structure: it’s backed by a $1B grant from Navigation Fund, a nonprofit founded by Jed McCaleb (Stellar co-founder, Ripple co-founder). The mission is democratizing AI infrastructure access.

Leadership includes Ozan Kaya (CEO, ex-CarLotz President) and Saurabh Giri (Chief Product & Technology Officer, ex-Amazon Bedrock lead). The company has ~80 employees and 100+ customers including Cursor, Phind, Dream.3D, Luma AI, and Caltech researchers.

In March 2025, Voltage Park acquired TensorDock (GPU cloud marketplace), expanding their portfolio beyond first-party H100s.

Infrastructure

Voltage Park owns 24,000 H100 GPUs (80GB HBM3, SXM5) deployed across 6 Tier 3+ data centers in Washington, Texas, Virginia, and Utah. The Quincy, WA facility runs on hydro and wind power.

Hardware runs on Dell PowerEdge XE9680 servers: 8 H100s per node with NVLink, 1TB RAM, dual Intel Xeon Platinum 8470 (52-core each). Quantum-2 InfiniBand provides 3.2Tbps aggregate bandwidth per node, scaling in 8,176 GPU increments.

Next-gen hardware (B200, GB200, B300, GB300) is available for pre-lease with capacity reserved ahead of public release.

June 2025 brought two major updates: VAST Data partnership for enterprise storage and managed Kubernetes launch.

Strengths

  • Competitive pricing: $1.99/hr for H100s with 15-minute spinup, no contracts required
  • Bare-metal access claimed to provide 40% acceleration for LLM training vs managed services
  • VAST AI OS integration: unified file/object/block storage, multi-tenant security
  • SOC 2 Type II, ISO 27001, and HIPAA certified (details at trust.voltagepark.com)
  • Only neocloud partner in NSF NAIRR pilot; donated 1M H100 GPU hours for research
  • 99.982% uptime SLA
  • Nonprofit backing suggests mission-driven rather than pure profit optimization

Gaps

  • Only Ubuntu Server 22.04 LTS supported (no alternative OS, no GUI, SSH only)
  • VM instances limited to 100Gbps Ethernet (vs bare-metal InfiniBand at 3.2Tbps)
  • No data recovery after instance termination; customers must backup externally
  • Historically focused on H100s only (TensorDock acquisition broadens selection)
  • Limited documentation depth
  • Managed Kubernetes only launched June 2025; VM support still in development

Best for: Researchers, startups, and teams wanting low-cost H100 access with VAST Data storage, especially those eligible for NAIRR research allocations.


GMI Cloud


Overview

GMI Cloud was founded in 2023 as an offshoot of Realtek Semiconductor and GMI Technology. The company is headquartered in Mountain View, California, and raised $82M in Series A funding (October 2024). GMI Cloud has approximately 120 employees.

GMI Cloud is an NVIDIA Cloud Partner, providing access to the latest GPU architectures including H200 and upcoming Blackwell systems.

Infrastructure

Regions include the US (primary) and Asia (Taiwan, Singapore, with Tokyo and Malaysia planned). GMI operates 9 global data centers with capacity for multi-tenant workloads.

Hardware: H200 HGX, B200 HGX (pre-order), GB200 (pre-order), and H100 clusters. All training clusters include 400Gb/s InfiniBand with 3.2Tbps aggregate bandwidth per node.

Storage is powered by VAST Data, providing S3-compatible object storage and NFS shared filesystems with GPUDirect integration for high-throughput data loading.

Strengths

  • GMI Cluster Engine provides managed Kubernetes orchestration for GPU workloads
  • VAST Data partnership delivers enterprise-grade storage with GPUDirect
  • Strong Asia-Pacific presence through regional data centers
  • B200 and GB200 available for pre-order
  • $82M Series A funding provides runway for expansion

Gaps

  • Limited public documentation and pricing transparency
  • Smaller footprint than major neoclouds
  • Less established brand recognition in North America and Europe
  • No Slurm offering documented
  • Compliance certifications not prominently published
  • Early-stage company (founded 2023)

Best for: Teams seeking H200/B200 access with VAST Data storage and managed Kubernetes, especially with Asia-Pacific presence needs.


RunPod


Overview

RunPod was founded in 2022 and is headquartered in New Jersey. The company raised $20M in seed funding (May 2024) co-led by Intel Capital and Dell Technologies Capital, following an earlier $18.5M round in November 2023. Total funding is $38.5M. RunPod has approximately 80 employees.

The platform serves 500,000+ developers, from individual researchers to enterprise teams. RunPod’s differentiator is simplicity: GPU instances launch in under a minute with pre-configured ML environments.

Infrastructure

RunPod operates 31+ data centers globally with a mix of first-party and partner infrastructure. The platform offers three deployment models:

  1. Pods: GPU VMs with persistent storage, available on-demand or spot (up to 80% cheaper)
  2. Serverless: Auto-scaling inference endpoints billed per-second
  3. Community Cloud: Marketplace of third-party GPU capacity at lower prices

Hardware: H100 80GB ($2.40/hr on-demand, $1.99/hr spot), H200 ($3.59/hr), A100 80GB, L40S, RTX 4090/3090. InfiniBand is available for dedicated clusters only; standard instances use Ethernet.

Storage: Network volumes ($0.10/GB/mo standard, $0.05-0.07/GB/mo shared), S3-compatible object storage available in 5 data centers.

Strengths

  • Sub-minute instance launch times with one-click templates for PyTorch, TensorFlow, Stable Diffusion
  • Serverless inference with pay-per-second billing and automatic scaling
  • Spot instances at 50-80% discount for interruptible workloads
  • Simple, transparent pricing with no hidden fees
  • Active community and template marketplace
  • RESTful API and CLI for automation

Gaps

  • No managed Kubernetes (container-focused, not K8s-focused)
  • InfiniBand limited to dedicated clusters; standard instances use Ethernet
  • Community Cloud capacity quality varies by host
  • Limited enterprise compliance certifications documented
  • No native Slurm support
  • Multi-node distributed training requires manual configuration

Best for: Individual developers and small teams wanting fast, simple GPU access for inference and single-node training without enterprise overhead.


Hyperstack


Overview

Hyperstack is the GPU cloud arm of NexGen Cloud, a UK-based infrastructure company founded in 2020. The platform positions itself as a cost-effective alternative to hyperscalers, with pricing 30-75% lower than AWS/Azure/GCP.

NexGen Cloud has invested significantly in GPU infrastructure, partnering with NVIDIA and operating data centers across multiple regions. The company targets AI startups, researchers, and enterprises looking to reduce GPU costs without sacrificing performance.

Infrastructure

Hyperstack operates across 3 regions: CANADA-1, NORWAY-1, and US-1. The platform offers tiered service levels:

  1. Standard Tier: GPU VMs with standard Ethernet networking
  2. Supercloud Tier: High-performance clusters with 400Gb/s Quantum-2 InfiniBand for distributed training

Hardware: H100 ($1.90-2.40/hr), H200 ($3.50/hr), A100, L40S. H100 at $1.90/hr is among the cheapest in the market. B200 and GB200 available via contact.

Pre-configured environments include PyTorch, TensorFlow, and popular ML frameworks. All instances include local NVMe storage.

Strengths

  • Aggressive pricing: H100 at $1.90/hr, H200 at $3.50/hr
  • Supercloud tier provides InfiniBand for multi-node training
  • Simple RESTful API and web console
  • No long-term contracts required
  • Pre-configured ML environments reduce setup time
  • Growing European presence with GDPR-compliant data centers

Gaps

  • InfiniBand only available on Supercloud tier (standard tier is Ethernet)
  • No managed Kubernetes offering
  • Limited documentation compared to larger providers
  • Smaller GPU fleet than CoreWeave, Nebius, or Lambda
  • Storage options less mature than competitors
  • Less visibility into infrastructure topology

Best for: Cost-conscious teams wanting affordable H100/H200 access, comfortable with VM-based workflows rather than managed K8s.


DataCrunch / Verda


Overview

DataCrunch was founded in 2019 in Helsinki, Finland. In 2025, the company rebranded to Verda to emphasize its sustainability positioning.

The core differentiator is 100% renewable energy: Verda’s Icelandic data centers run on geothermal and hydroelectric power, providing genuine carbon-neutral AI infrastructure rather than offset-based claims. The company holds ISO 27001 certification and is GDPR compliant.

Infrastructure

Primary data centers are in Finland and Iceland, leveraging abundant renewable energy and natural cooling. The cold Nordic climate significantly reduces cooling costs while enabling higher-density deployments.

Hardware: H100 SXM5 ($4.75/hr), H200 SXM5 ($4.80/hr), B200 SXM6 ($4.95/hr), B300 ($1.24/hr), A100. Multi-node clusters include 400Gb/s NDR InfiniBand (3.2 Tb/s per 8-GPU node) and scale from 16 to 128 GPUs.

Storage: Block storage ($0.05-0.20/GB/mo), NVMe shared filesystem ($0.20/GB/mo). Spot pricing available at 50% discount for serverless containers.

Strengths

  • 100% renewable energy (geothermal/hydro), not offsets
  • B300 available at competitive $1.24/hr
  • 400Gb NDR InfiniBand standard on clusters (3.2 Tb/s per node)
  • Natural cooling in Iceland reduces operational costs
  • Strong sustainability credentials for ESG-conscious organizations
  • ISO 27001 certified, GDPR compliant

Gaps

  • No managed Kubernetes offering
  • Single geographic region (Iceland) may cause latency for US/Asia users
  • Smaller brand recognition than US-based neoclouds
  • Cluster sizes limited to 128 GPUs with InfiniBand
  • Bare-metal focus; less abstraction than serverless platforms

Best for: Organizations with sustainability mandates needing genuine renewable energy infrastructure, or European teams wanting low-latency access with strong compliance.


Vultr


Overview

Vultr was founded in 2014 as a general-purpose cloud provider, making it one of the more established players in this comparison. The company has expanded aggressively into GPU cloud, becoming an NVIDIA Cloud Partner with both NVIDIA and AMD GPU offerings.

Vultr differentiates through global footprint: 32 data center locations worldwide, more than any neocloud. This enables low-latency inference deployments close to end users. The company is privately held with undisclosed revenue.

Infrastructure

32 locations spanning North America, Europe, Asia-Pacific, South America, and Australia. This geographic diversity is unmatched among neoclouds.

Hardware: NVIDIA H100, A100, L40S, A40. AMD MI300X ($1.85/hr, cheapest in market), MI325X ($2.00/hr), MI355X ($2.59/hr). H100 pricing is $2.99/hr on-demand.

H100/H200 clusters include 400Gb/s Quantum-2 InfiniBand in a non-blocking topology. Bare-metal GPU servers are also available.

Storage: Block storage ($0.10/GB/mo), Object storage ($0.018-0.10/GB/mo with S3 API), NVMe shared filesystem ($0.10/GB/mo).

Strengths

  • Cheapest AMD MI300X in market at $1.85/hr
  • 32 global locations enable low-latency edge deployments
  • Full stack: VKE (managed Kubernetes), bare-metal, block/object/shared storage
  • Strong compliance: SOC 2 (HIPAA), ISO 27001, PCI DSS
  • Both NVIDIA and AMD GPU availability
  • Self-service signup, no sales approval required
  • Established company with 10+ years operational history

Gaps

  • H100 pricing ($2.99/hr) higher than budget neoclouds
  • GPU fleet smaller than CoreWeave, Lambda, or Nebius
  • No Slurm offering
  • Less AI/ML-specific tooling than specialized neoclouds
  • Documentation spread across general cloud and GPU-specific content
  • InfiniBand availability may vary by location

Best for: Teams needing global GPU presence for inference, AMD GPU access, or preferring an established provider with comprehensive compliance certifications.


OVHcloud


Overview

OVHcloud is a French cloud provider founded in 1999, making it the oldest company in this comparison. Publicly traded on Euronext Paris, OVHcloud reported €1.03B revenue in 2024. The company operates 43 data centers globally with a strong European presence.

OVHcloud’s GPU offerings are part of their broader cloud portfolio. The company emphasizes European data sovereignty, owning and operating all infrastructure without reliance on US hyperscalers. OVHcloud has achieved SecNumCloud certification (French government security standard), making it one of few providers qualified for French public sector AI workloads.

Infrastructure

43 data centers across Europe, North America, and Asia-Pacific. Primary GPU capacity is in European facilities. OVHcloud uses water-cooled systems reducing energy consumption by up to 50%.

Hardware: A100, H100, L4, L40S. H100 pricing requires sales contact. The company focuses on the Private AI offering: dedicated GPU infrastructure managed by OVHcloud within isolated environments.

Private AI includes managed Kubernetes for GPU workloads. OVHcloud’s Managed Kubernetes Service is also available for standard workloads.

Strengths

  • European data sovereignty with no US hyperscaler dependencies
  • SecNumCloud, SOC 2, ISO 27001, PCI DSS, HDS certifications
  • 43 data centers provide extensive geographic coverage
  • Water-cooled infrastructure reduces environmental impact
  • Private AI offering for isolated, dedicated GPU environments
  • 25+ years operational track record
  • Competitive European pricing

Gaps

  • H100 pricing not published; requires sales conversation
  • GPU portfolio smaller than US neoclouds
  • No InfiniBand documented for standard offerings
  • AI/ML tooling less developed than specialized providers
  • Slower to adopt latest GPU architectures (no B200/GB200 listed)
  • Primary focus remains general cloud; GPU is secondary business

Best for: European enterprises with data sovereignty requirements, French public sector organizations needing SecNumCloud certification, or teams preferring established European infrastructure.


FluidStack


Overview

FluidStack was founded in 2017 by Oxford University students and is headquartered in London. The company raised $200M in Series A funding (February 2025) led by Cacti, following earlier rounds including a $24.7M SAFE and $37.5M debt financing in 2024. FluidStack operates 100,000+ GPUs under management across a distributed network.

Notable customers include Anthropic (selected for custom data centers in NY and TX), Mistral AI, Character.AI, and Meta. The company reported $180M ARR as of December 2024, up 620% year-over-year.

The model is API-first: FluidStack provides programmatic access to GPU compute without the complexity of managing infrastructure. The company focuses on both training and inference at scale.

Infrastructure

FluidStack aggregates GPU capacity across multiple data centers and partners. This distributed model provides flexibility but means infrastructure specifications vary by location.

Hardware: H100 ($2.10-2.89/hr depending on configuration), A100, L40S. InfiniBand is available at select locations for multi-node training.

Storage partnerships include VAST Data integration at supported locations.

Strengths

  • API-first approach simplifies programmatic GPU access
  • Flexible capacity through distributed network
  • VAST Data partnership at select locations
  • Competitive H100 pricing starting at $2.10/hr
  • Focus on inference scalability
  • No long-term commitments required

Gaps

  • Infrastructure varies by location; inconsistent specifications
  • InfiniBand availability limited to specific data centers
  • No managed Kubernetes offering documented
  • Less transparency into underlying infrastructure
  • Documentation less comprehensive than larger providers

Best for: Teams building training or inference applications needing elastic GPU capacity and API-first integration, especially those wanting access to a large distributed network.


Vast.ai


Overview

Vast.ai was founded in 2018 as a marketplace for GPU compute. Unlike traditional cloud providers, Vast.ai connects renters with independent GPU hosts, similar to Airbnb for compute. This model enables the lowest prices in the market but with significant variability in infrastructure quality.

The platform is popular with researchers, hobbyists, and cost-conscious startups. Vast.ai serves hundreds of thousands of users running ML training, inference, and rendering workloads.

Infrastructure

Vast.ai is a marketplace, not a traditional cloud provider. GPU capacity comes from:

  1. Data Center Hosts: Professional operators with standardized infrastructure
  2. Individual Hosts: Enthusiasts renting out personal hardware

This creates extreme price variation: H100 80GB ranges from $1.74-1.87/hr (cheapest in market) to $3+/hr depending on host. The platform shows real-time availability, reliability scores, and host ratings.

Hardware: H100, H200, A100, L40S, RTX 4090/3090, and older GPUs. Most instances are Ethernet-only; InfiniBand available only from specific data center hosts.

Docker-based deployments with templates for PyTorch, TensorFlow, Stable Diffusion, and other frameworks.

Strengths

  • Lowest H100 prices in market ($1.74-1.87/hr from quality hosts)
  • Massive selection of GPU types including consumer hardware
  • Real-time availability and pricing transparency
  • Host reliability ratings help identify quality infrastructure
  • Docker-based deployment with pre-built templates
  • No minimum commitments; pay-per-minute billing
  • Good for experimentation and prototyping

Gaps

  • Infrastructure quality varies dramatically by host
  • No InfiniBand on most instances (data center hosts only)
  • No managed Kubernetes or enterprise orchestration
  • Limited enterprise compliance certifications
  • Host reliability can be inconsistent
  • Multi-node training difficult due to fragmented infrastructure
  • No SLA guarantees on marketplace instances
  • Support quality varies by host

Best for: Researchers and hobbyists prioritizing cost over reliability, teams willing to trade consistency for the lowest prices in market.


TensorWave


Overview

TensorWave was founded in December 2023 in Las Vegas by Darrick Horton, Jeff Tatarchuk, and Piotr Tomasik. The company raised $100M in Series A funding (May 2025) led by Magnetar and AMD Ventures, with total funding of $146.7M. TensorWave is an AMD-exclusive GPU cloud provider.

While most neoclouds build on NVIDIA infrastructure, TensorWave bet entirely on AMD’s Instinct line. The company deployed the largest AMD training cluster in North America (8,192 MI325X GPUs) and was first to deploy large-scale direct liquid-cooled AMD GPU infrastructure. TensorWave has over 1GW of capacity and holds SOC 2 Type II and HIPAA certifications.

Infrastructure

TensorWave operates US-based data centers purpose-built for AMD GPUs. The infrastructure uses Aviz ONES fabric for 400Gb Ethernet networking with RoCE (RDMA over Converged Ethernet) rather than InfiniBand.

Hardware: MI300X (sold out), MI325X ($1.95/hr), MI355X ($2.85/hr for reservations). All systems support AMD ROCm for PyTorch, TensorFlow, and JAX workloads.

The RoCE implementation provides RDMA capabilities over standard Ethernet, offering lower cost than InfiniBand while maintaining reasonable performance for distributed training.

Strengths

  • AMD-first specialization provides access when NVIDIA is constrained
  • MI325X at $1.95/hr is competitive with NVIDIA H100 pricing
  • RoCE networking provides RDMA without InfiniBand cost
  • Deep ROCm expertise for AMD software optimization
  • Early access to AMD’s latest GPU generations
  • Lower cost infrastructure through AMD partnership

Gaps

  • AMD ROCm requires workload adaptation (not drop-in CUDA replacement)
  • MI300X sold out; availability constraints
  • RoCE has higher latency than InfiniBand under network congestion
  • No managed Kubernetes offering documented
  • Less ecosystem tooling compared to NVIDIA-focused providers

Best for: Teams with ROCm expertise seeking AMD GPU access, or those willing to adapt workloads to benefit from AMD’s price/performance.


Hot Aisle


Overview

Hot Aisle was founded in October 2023 by Jon Stevens and Clint Armstrong, with backing from Joseph Lubin (ConsenSys founder) and Mesh. The founders have decades of technical experience deploying infrastructure across 9 data centers.

The company specializes exclusively in AMD Instinct GPUs, providing an alternative to NVIDIA-dominant neoclouds. Hot Aisle holds SOC 2 Type II and HIPAA certifications, with ISO 27001 planned.

Infrastructure

Hot Aisle operates from the Switch Pyramid facility in Grand Rapids, Michigan (Tier 5 Platinum data center). Infrastructure includes Dell XE9680 servers and Broadcom 57608 networking with Dell PowerSwitch Z9864F spine switches at 400G.

Networking: RoCEv2 delivering 3200 Gbps throughput per node. Per-minute billing with no long-term contracts.

Hardware: MI300X ($1.99/hr with 192GB), MI355X (available for reservations). Configurations range from 1x GPU VMs to 8x GPU bare metal.

Strengths

  • AMD MI300X at $1.99/hr is competitive pricing
  • Early MI355X availability for reservations
  • SOC 2 Type II and HIPAA certified
  • 3200 Gbps RoCEv2 throughput per node
  • Dell/Broadcom enterprise infrastructure
  • Per-minute billing, no contracts required

Gaps

  • Single data center location (Michigan)
  • Small company (founded October 2023)
  • AMD ROCm requires workload adaptation from CUDA
  • Limited GPU selection (AMD only)

Best for: Teams seeking AMD MI300X access at competitive pricing with enterprise compliance certifications.


Nscale


Overview

Nscale launched from stealth in May 2024 and has raised significant funding: $155M Series A (December 2024) and $1.1B Series B (2025) led by Aker ASA. Other investors include NVIDIA, Microsoft, G Squared, Dell, Nokia, Fidelity, Point72, and OpenAI. The company is targeting an IPO in 2026.

Nscale focuses on sustainable AI infrastructure, operating data centers in Norway powered by renewable energy. The company has a joint venture with Aker ASA for “Stargate Norway” in Narvik (230MW initial, 290MW expansion planned) and a partnership with OpenAI.

Infrastructure

Owned facilities in Glomfjord, Norway (30MW, expanding to 60MW) and a 15MW lease agreement with Verne for colocation in Iceland (deploying 4,600 Blackwell Ultra GPUs throughout 2026). All facilities leverage hydroelectric and geothermal power with natural cooling.

Hardware: H100, H200, GB200 NVL72, A100, and AMD MI300X (all contact pricing).

Networking uses Nokia 7220 IXR switches (recently upgraded to IXR-H6 with 800GE/1.6TE capability) with RoCE rather than InfiniBand.

Strengths

  • Genuine renewable energy (hydro/geothermal), not carbon offsets
  • Nordic locations provide natural cooling efficiency
  • Strong investor backing (NVIDIA, Microsoft, OpenAI, Aker ASA)
  • GB200 NVL72 capacity available
  • OpenAI partnership via Stargate Norway
  • Nokia partnership for latest networking hardware

Gaps

  • All pricing requires sales contact; no self-service
  • Limited documentation and transparency
  • RoCE has higher latency than InfiniBand under congestion
  • Nordic locations may cause latency for US/Asia workloads
  • No managed Kubernetes or Slurm documented
  • Early-stage company (launched 2024); operational track record developing

Best for: Large enterprises with sustainability mandates seeking renewable-powered GPU infrastructure, especially those interested in OpenAI ecosystem alignment.


SF Compute


Overview

SF Compute (San Francisco Compute) was founded in 2023 by Evan Conrad and raised $40M in 2025 led by DCVC and Wing Venture Capital at a $300M valuation. Other backers include Jack Altman and Electric Capital. The company has approximately 30 employees and recently hired Eric Park (former Voltage Park CEO) as CTO.

SF Compute operates as a GPU marketplace/broker, deliberately avoiding hardware ownership. The platform enables buyers to access compute and resell unused capacity, creating spot and forward markets for GPU compute. SF Compute manages $100M+ worth of GPU hardware through its marketplace model.

Infrastructure

As a marketplace, SF Compute does not own infrastructure but provides access to partner capacity. The platform offers:

  • Kubernetes clusters: 3.2Tb/s InfiniBand, 0.5-second spinup
  • VMs: No InfiniBand, 5-minute spinup
  • Bare-metal: Available upon request

Hardware: H100i and H100v at $1.45-1.50/hr, H200 (requires contact), B300 coming soon. Storage: 1.5TB+ NVMe per node. Free egress with 100% uptime SLA.

Strengths

  • H100 pricing at $1.45-1.50/hr is among the lowest in market
  • Managed Kubernetes with 3.2Tb/s InfiniBand
  • Marketplace model allows resale of unused capacity
  • 24/7 support via Slack, phone, email
  • 100% uptime SLA with automated refund for failed nodes
  • No long-term contracts required

Gaps

  • Marketplace model means infrastructure varies by underlying provider
  • No owned infrastructure (relies on partner capacity)
  • H200 not yet available (requires contact)
  • Limited geographic control

Best for: Price-sensitive teams wanting lowest H100 costs with managed Kubernetes, comfortable with marketplace model and varying underlying infrastructure.


The Platform Layer

Neoclouds provide GPU infrastructure, but most don’t provide what you need to run production AI workloads: dev environments, distributed training orchestration, job scheduling with failure handling, and cost allocation by user and project.

This is the platform layer, and it’s where Saturn Cloud fits. Whether you’re on Nebius, Crusoe, CoreWeave, or your own bare-metal infrastructure, Saturn Cloud provides:

Dev Environments: Hosted Jupyter and IDE environments with pre-configured ML frameworks, SSO integration, and automatic idle detection to prevent GPU waste.

Distributed Training Orchestration: Multi-node training coordination with environment variables pre-configured for torchrun, DeepSpeed, and other distributed frameworks. No manual IP address wrangling or host file management.
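
As a hedged illustration of what that automation replaces, this is roughly the command a platform layer must assemble and run on every node for a torchrun-based job (the head-node address and script name are placeholders):

```python
# The per-node launch command a platform layer assembles for multi-node
# torchrun jobs; HEAD_NODE and train.py are placeholders.
NNODES = 4
HEAD_NODE = "10.0.0.1"  # placeholder: node 0's reachable address

def torchrun_cmd(node_rank: int) -> list[str]:
    return [
        "torchrun",
        f"--nnodes={NNODES}",
        "--nproc_per_node=8",                    # one process per GPU
        f"--node_rank={node_rank}",
        "--rdzv_backend=c10d",
        f"--rdzv_endpoint={HEAD_NODE}:29500",    # rendezvous point for all nodes
        "train.py",                              # placeholder training script
    ]

for rank in range(NNODES):
    print(f"node {rank}:", " ".join(torchrun_cmd(rank)))
```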

Job Scheduling: Massively parallel job orchestration with automatic retry on failure, dependency management, and resource quotas.

Cost Allocation: Usage tracking by user, team, and project. Know who’s using what and allocate costs back to business units.

This is undifferentiated work that every AI team needs but doesn’t provide competitive advantage. Saturn Cloud handles the commodity platform layer so your infrastructure team can focus on what actually differentiates your business.


Last updated: December 2025. Pricing and features change frequently. Verify current offerings on provider websites before making decisions.