On the way to the AI revolution, the industry made a costly miscalculation. The Neocloud economy, a class of purpose-built GPU cloud providers estimated at $35 billion and projected to approach $240 billion by 2031 [1], was built almost entirely around a single obsession: acquiring GPUs. The arms race was measured in chips per rack.
Nobody was talking about storage.
That silence has become very expensive. In the span of eighteen months, NAND flash prices surged to an order of magnitude above the cost of hard disk drives on a per-TB basis [2]. Leading NAND vendors have indicated tight supply capacity through 2026. Storage, once a quiet 10% line item in AI infrastructure budgets, is now trending toward 20 to 30% or more in all-flash deployments, and it is growing faster than any other component of the stack. For Neocloud operators who bet their entire data storage infrastructure on ‘all-flash’ architectures, this is the bill coming due for a planning assumption that no longer holds: that flash storage is cheap, abundant, and power efficient. Today it is none of those things.
The Economics That Changed Everything
The all-flash storage era, it turns out, was a product of temporarily favorable economics, not architectural wisdom. When NVMe SSDs were affordable and plentiful, it made sense to throw flash at every storage problem. The default position of going all-flash because it was easy was, for a brief window, economically defensible.
That window has closed. Phison CEO Khein-Seng Pua described NAND production capacity as effectively allocated through 2026 [3]. Silicon Motion CEO Wallace Kou called the current supply environment unprecedented, with HDD, DRAM, HBM, and NAND all facing severe constraints simultaneously [4]. Goldman Sachs research projects DRAM prices rising by double-digit percentages quarter-over-quarter throughout 2026 [5].
The math is stark. At representative street pricing, a 122.88 TB QLC NVMe SSD costs roughly $27,000. A 7.68 TB drive from the same generation delivers comparable sequential throughput, approximately 14.5 GB/s, for around $1,800. For a 4,096-GPU cluster on NVIDIA’s Enhanced specification, the flash bill ranges from $600,000 to $9.6 million depending on which SSD capacity you select. The throughput is effectively the same. The only variable is how much cold data you are willing to store on premium media that delivers no additional throughput for the privilege.
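That price range is easy to reproduce. The sketch below uses the per-GPU read target and the ~5.8 GB/s measured per-SSD throughput cited elsewhere in this article, plus the two street prices above; all are the article's own figures, not vendor list prices, and real deployments will add capacity and protection overheads.

```python
import math

# Figures quoted in this article (assumptions, not vendor specs):
GPUS = 4096
READ_PER_GPU_GBS = 0.5        # Enhanced-tier read target per GPU
MEASURED_PER_SSD_GBS = 5.8    # measured read throughput per SSD
PRICE_SMALL = 1_800           # 7.68 TB drive, street price
PRICE_LARGE = 27_000          # 122.88 TB QLC drive, street price

required_gbs = GPUS * READ_PER_GPU_GBS               # 2,048 GB/s aggregate
drives = math.ceil(required_gbs / MEASURED_PER_SSD_GBS)

low = drives * PRICE_SMALL    # same throughput, small drives
high = drives * PRICE_LARGE   # same throughput, big drives
print(f"{drives} drives: ${low:,} vs ${high:,}")
```

The drive count is fixed by the throughput target, so the only lever left is per-drive price, which is how the same cluster can cost either ~$0.6M or ~$9.6M in flash.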
This is the question that separates operators who survive consolidation from those who do not: Are you buying flash for performance, or are you buying it out of architectural inertia?
What Google, Meta, and Microsoft Already Know
The largest AI builders in the world do not run all-flash storage. Google, Meta, and Microsoft commonly deploy mixed-tier architectures with intelligent tiering, using just enough NVMe flash to saturate GPU throughput requirements, then draining data to high-density HDDs as fast as the physics allow. They have known for years what the Neocloud market is now learning the hard way: flash is a performance medium, not a capacity medium.
This is not a philosophical preference. It is an economic imperative driven by the physics of AI workloads. NVIDIA’s DGX storage guidance specifies that text-based LLM training requires approximately 0.5 GB/s of read throughput per GPU, while physical AI and visualization workloads require approximately 4 GB/s of reads and 2 GB/s of writes per GPU [6]. A storage architecture that delivers 5.8 GB/s of measured read throughput per SSD in internal testing [7] needs roughly 353 drives to saturate a 4,096-GPU cluster on the Enhanced tier. An architecture that delivers approximately 1.9 GB/s per SSD based on published deployment configurations, because it interposes an intermediary between the drive and the client, needs over 1,000. Three times the SSDs means three times the power, three times the rack space, and three times the capital. At $12,000 per 30 TB drive, that is not a rounding error. It is a business-model-level decision.
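The drive-count comparison follows directly from the per-GPU and per-SSD throughput figures quoted above. A quick check, under those assumptions:

```python
import math

GPUS = 4096
READ_PER_GPU_GBS = 0.5                    # Enhanced-tier read target per GPU
required_gbs = GPUS * READ_PER_GPU_GBS    # 2,048 GB/s aggregate

# Delivered read throughput per SSD, per the article's cited figures:
direct_path = math.ceil(required_gbs / 5.8)     # client reads the SSD directly
intermediated = math.ceil(required_gbs / 1.9)   # an intermediary caps per-SSD rate

print(f"direct path: {direct_path} SSDs, intermediated: {intermediated} SSDs")
```

Roughly 354 drives against roughly 1,078: the same aggregate bandwidth, a ~3x difference in hardware, power, and rack space.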
The Power Equation Nobody Planned For
NVIDIA’s own BlueField-4 architecture documentation states it plainly: power availability is the primary constraint for scaling AI factories, making energy efficiency a defining metric [8]. AI workloads now consume 40 to 250 kilowatts per rack, compared with 10 to 15 kW for conventional compute [9]. Every watt consumed by an inefficient storage layer is a watt that cannot power a GPU.
The performance-per-watt differential across storage architectures is not marginal. It is multiplicative. Based on published specifications and comparable configurations, delivering approximately 1,340 GB/s of read throughput, one architecture burns 55 kW while another achieves similar output at roughly 16 kW. That represents a 3.4x difference in performance per watt. In a power-constrained data center, which describes nearly every Neocloud facility being built today, this translates directly into GPU capacity. Wasted storage watts are GPUs you cannot deploy.
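The 3.4x figure is the ratio of the two power draws at equal delivered throughput. A minimal check, using the article's published-specification numbers (the architecture labels here are placeholders, not specific products):

```python
THROUGHPUT_GBS = 1_340    # aggregate read throughput, same for both configs

# Power draw at that throughput, per the article's comparison:
power_kw = {"arch_a": 55.0, "arch_b": 16.0}

# Performance per watt, expressed as GB/s per kW
perf_per_kw = {name: THROUGHPUT_GBS / kw for name, kw in power_kw.items()}
ratio = perf_per_kw["arch_b"] / perf_per_kw["arch_a"]

print(f"{perf_per_kw['arch_a']:.1f} vs {perf_per_kw['arch_b']:.1f} GB/s per kW "
      f"-> {ratio:.1f}x")
```

Since throughput is held constant, the efficiency ratio reduces to the power ratio, 55/16 ≈ 3.4, and every saved storage kilowatt is a kilowatt available for GPUs.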
And this is before accounting for the hidden resource taxes imposed by certain storage clients. Some architectures require 5 GB of DRAM and one to four dedicated CPU cores permanently locked per GPU node to achieve peak storage performance. Across a 500-node cluster, that adds up to 2.5 TB of DRAM and up to 2,000 CPU cores permanently unavailable to AI workloads. When you are paying $30,000 or more per GPU, every stolen core and every gigabyte of locked memory is a direct tax on the compute investment that is supposedly the entire point of the infrastructure.
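The cluster-wide tax is simple multiplication. The per-node host sizing used to express it as a percentage below (96 cores, 1 TB DRAM) is purely an illustrative assumption, not a figure from this article:

```python
NODES = 500
DRAM_LOCKED_GB = 5     # per GPU node, per the article
CORES_LOCKED = 4       # upper end of the article's 1-4 core range

# Illustrative host sizing -- NOT from the article:
HOST_CORES = 96

dram_locked_tb = NODES * DRAM_LOCKED_GB / 1_000   # cluster-wide DRAM lost
cores_locked = NODES * CORES_LOCKED               # cluster-wide cores lost
core_tax = CORES_LOCKED / HOST_CORES              # share of each host's CPUs

print(f"{dram_locked_tb} TB DRAM, {cores_locked} cores "
      f"({core_tax:.1%} of each assumed 96-core host)")
```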
SLAs: The Battleground That Storage Built
The Neocloud market is entering a phase where GPU counts no longer win deals. The SemiAnalysis ClusterMAX 2.0 rating system, an increasingly influential benchmark for evaluating GPU cloud providers, makes this explicit: SLAs are a critical piece during pricing negotiations between customer and provider [10]. CoreWeave and Oracle already offer 99% rack-level uptime [11]. Providers without competitive SLAs are losing deals today.
But here is the uncomfortable truth that most Neocloud operators have not internalized: storage is the single largest blast radius in any GPU cluster. If shared storage fails, every GPU rack breaches SLA simultaneously. A 5,000-GPU cluster with 98% storage availability does not deliver a marginal SLA shortfall. It produces 876,000 GPU-hours of lost compute per year, roughly $2.6 million in idle GPU costs alone, plus SLA credits owed on all 50 racks at once.
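The 876,000-hour figure falls out of the availability gap directly. The $3 per GPU-hour rate below is an assumption inferred from the article's ~$2.6 million estimate, not a quoted market price:

```python
GPUS = 5_000
HOURS_PER_YEAR = 8_760
STORAGE_AVAILABILITY = 0.98      # the scenario in the article
GPU_HOUR_RATE = 3.00             # assumed $/GPU-hour (illustrative)

# Shared storage down => every GPU idles, so downtime multiplies across the fleet
lost_gpu_hours = GPUS * HOURS_PER_YEAR * (1 - STORAGE_AVAILABILITY)
idle_cost = lost_gpu_hours * GPU_HOUR_RATE

print(f"{lost_gpu_hours:,.0f} GPU-hours/year lost, ~${idle_cost:,.0f} idle cost")
```

And that is only the idle-compute cost; SLA credits across every affected rack come on top of it.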
The durability story is even more sobering. Enterprise customers benchmark against the hyperscalers: AWS S3 offers 11 nines of durability, Azure Blob offers 12 or more [13]. Legacy HPC storage architectures built on local RAID can fall below 5 nines of durability at scale, depending on drive failure rates and rebuild windows, potentially resulting in thousands of files lost per year across a billion-file corpus. Modern network erasure coding with multi-level protection can push durability well past 11 nines [14]. The gap between these two realities is not incremental. It is the difference between a storage system you can underwrite an SLA against and one you cannot.
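To see what the nines gap means in files, the sketch below treats "N nines" as an annual per-object loss probability of 10^-N. That is a back-of-envelope convention, not a vendor durability model, and real loss rates depend on failure correlation and rebuild behavior:

```python
FILES = 1_000_000_000   # the article's billion-file corpus

def expected_annual_loss(nines: int, files: int = FILES) -> float:
    """Expected files lost per year if each object independently has a
    10**-nines annual loss probability (rough convention, assumption)."""
    return files * 10 ** -nines

legacy = expected_annual_loss(5)    # ~10,000 files/year at 5 nines
modern = expected_annual_loss(11)   # ~0.01 files/year at 11 nines
print(f"5 nines: ~{legacy:,.0f} files/year; 11 nines: ~{modern:g} files/year")
```

Four orders of magnitude separate the two; only one of them supports an underwritable durability SLA.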
Consolidation Is the Storage Test
The Neocloud market in 2026 is entering its first real reckoning. First-generation infrastructure from the 2021 to 2022 GPU deployment wave is hitting depreciation limits, forcing fleet-wide replacements. H100 rental rates have dropped significantly from their peak, with market data suggesting declines of 60% or more. The era of rewarding companies simply for amassing GPUs is definitively over. The market is shifting from “build it and they will come” to “prove the return on invested capital.”
In this environment, the operators who survive will be those who have solved the total cost of ownership equation, not just the GPU procurement equation. And the answer is not simply buying a flash layer and bolting on a separate object store for cold data. That approach, which has become the default for operators who recognize the economics but have not rethought the architecture, introduces a second software stack, a second data plane, and external data movers that create networking complexity, operational overhead, and performance bottlenecks between tiers. It is a workaround dressed up as a strategy.
The hyperscalers solved this differently. Google, Meta, and Microsoft run SSD and HDD within the same software stack, the same data plane, with high-performance native tiering and no external data movers. Data flows between flash and disk as a first-class operation inside the storage system, not as a batch job managed by a separate tool. That is the architecture that keeps storage closer to 10% of the infrastructure budget while still saturating every GPU.
The winning pattern is consistent: software-defined architectures that can ride independent cost curves for flash and disk; high-performance tiering that keeps flash thin and hot while HDD handles density and cold data; self-healing systems that maintain six-nines availability without specialized administrators performing manual recovery at 3 AM; one control plane, one data plane, one namespace.
This is the hyperscaler playbook, finally available to every AI factory. It is what separates infrastructure built for day-one benchmarks from infrastructure built for year-three economics.
What Comes Next
The Neocloud economy is not slowing down. Forrester projects $20 billion in Neocloud revenue this year [15]. Mordor Intelligence expects the market to reach $236 billion by 2031. Microsoft, Google, Amazon, and Meta have committed more than $380 billion in combined infrastructure spending for 2025 per public earnings guidance [16]. The demand for GPU compute has never been higher.
But demand for GPU compute is, at its core, demand for a functioning AI factory. One where every GPU-second is productive, every checkpoint is protected, every inference request is served at low latency, and every SLA is backed by infrastructure that can actually deliver on the promise. Storage is not peripheral to this equation. It is foundational.
The providers who recognize this, who treat storage architecture as a first-class infrastructure decision rather than a procurement afterthought, will be the ones still standing when the consolidation wave recedes. The question every Neocloud operator should be asking is not how many GPUs they can acquire. It is how much useful work each GPU actually produces per dollar, at scale, over time, through failures.
That question has always been, and will always be, a storage question.
Ken Claffey, CEO at VDURA, and Matt Swalley, Senior Director at VDURA
VDURA is modern data storage infrastructure software purpose-built for AI and HPC. Its HYDRA architecture delivers hyperscaler-grade storage on commodity hardware for the world’s most demanding GPU environments.