
Meta is building a data center the size of Manhattan. How cool is that?

By Kevin Roof

Meta is spending hundreds of billions of dollars on a string of multi-gigawatt AI data centers that will be less like traditional business facilities and more like cities in their own right.

That's not hyperbole. In July, Mark Zuckerberg said that its Hyperion data center would "be able to scale up to 5GW" over several years, and that "multiple more" of these titan clusters would be built.

"Just one of these covers a significant part of the footprint of Manhattan," the Facebook founder said. To prove his point, he posted a GIF that illustrated precisely how one of these massive facilities would fit across the sliver of land that is home to 1.66 million people.

But if running a city like Manhattan is a challenge, keeping it cool is even harder.

Manhattan is a classic example of the urban heat island effect, the phenomenon where cities run several degrees hotter than their surrounding areas thanks to the built environment.

And that's without running millions of GPUs and associated equipment.

We know that data center operators already face a massive challenge when it comes to managing the heat their facilities produce – and calming the public's concern over their potential environmental impact.

So, how should Meta approach the cooling challenge for these truly titanic data centers? Here are some thoughts based on our experience of cooling at scale.

Calculating the options

Current cutting-edge data center designs already push rack densities into the hundreds of kilowatts, and we can safely assume that Meta is laying its plans with NVIDIA's roadmap in mind – a roadmap that envisions 1MW racks.

Traditional evaporative cooling is considered a non-starter for data centers of this scale, except for peripheral loads. The amount of water needed would be astronomical – on the order of tens of thousands of Olympic-sized swimming pools every year.
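As a rough sketch of that arithmetic (the water-usage figure and pool volume below are illustrative assumptions, not Meta's numbers):

```python
# Back-of-envelope water use for evaporative cooling at a 5 GW site.
# Assumptions (illustrative, not Meta's figures):
#   - 5 GW of IT load running around the clock
#   - water usage effectiveness (WUE) of ~1.8 litres per kWh of heat rejected
#   - an Olympic pool holds roughly 2,500 cubic metres (2.5 million litres)

it_load_kw = 5_000_000            # 5 GW expressed in kW
hours_per_year = 24 * 365
wue_litres_per_kwh = 1.8          # assumed evaporative-cooling WUE
pool_litres = 2_500_000           # ~2,500 m^3 per Olympic pool

heat_kwh_per_year = it_load_kw * hours_per_year
water_litres_per_year = heat_kwh_per_year * wue_litres_per_kwh
pools_per_year = water_litres_per_year / pool_litres

print(f"~{water_litres_per_year / 1e9:.0f} billion litres per year")
print(f"~{pools_per_year:,.0f} Olympic pools per year")
```

However you tune those assumptions, the volumes quickly become untenable for a site of this size.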

So liquid cooling is the most practical option for the majority of cooling in these next-generation data centers.

But what type of liquid cooling? Immersion does have an edge when it comes to pure cooling potential, but it is complex and relatively inflexible. Direct-to-chip is the most likely option here. Indeed, NVIDIA's own reference designs lean towards direct-to-chip liquid cooling for its cutting-edge platforms.

Opting for direct-to-chip liquid cooling is the easy part, though. Meta's engineers then face the challenge of implementing it at titanic scale. Taking the right approach from the outset could make their lives easier for years into the future.

Just consider the amount of kit involved. Whether you do some sums on the back of a napkin or drop a few prompts into ChatGPT, Meta's planned data centers are likely to house between one and two million GPUs spread across ten to twenty thousand racks, along with associated networking and storage.
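Here is a minimal napkin-math sketch of how those ranges hang together; the per-rack power and GPUs-per-rack figures are assumptions for illustration, not Meta's specification:

```python
# Napkin math: how many racks and GPUs fit inside a 5 GW power envelope?
# Assumptions (illustrative only): racks drawing 250-500 kW each, roughly
# 100 accelerators per rack, and most of the site's power feeding compute.

site_power_kw = 5_000_000           # 5 GW
rack_power_options_kw = (250, 500)  # assumed per-rack draw, low and high
gpus_per_rack = 100                 # assumed accelerator count per rack

for rack_kw in rack_power_options_kw:
    racks = site_power_kw // rack_kw
    gpus = racks * gpus_per_rack
    print(f"{rack_kw} kW racks -> ~{racks:,} racks, ~{gpus / 1e6:.0f}M GPUs")
```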

That is an incredible amount of equipment to deploy and manage, and a correspondingly massive amount of cooling infrastructure to deploy and maintain. And there will likely be failures or upgrades that require components to be replaced over time.

So, when designing this cooling infrastructure, Meta will need to consider not just the amount of heat it must shift today, or next year, but generations of silicon into the future.

This means Meta will need a cooling solution, or solutions, that doesn't just offer raw cooling capability. It needs a liquid cooling approach that can evolve alongside its plans.

Even someone with the vast resources of Meta won't want to tie up capital or delay deploying racks while it waits for cooling components to be delivered. But neither will it want cooling equipment sitting in storage, waiting to be deployed, or in place but underutilized.

It needs partners who can supply at scale, continuously, and who have an engineering ecosystem capable of deploying and installing cooling at the same pace it rolls off the production line.

Even better, Meta should choose a liquid cooling architecture that can be deployed incrementally. Why wait until the whole of Manhattan is covered in racks before flicking the on switch? Why not start serving customers, or refining models, when the equivalent of Central Park is ready to go online?

A liquid vision?

That's just the start of the data center lifecycle, though. From that point on, manageability is key. The racks of compute and storage in a data center are designed with redundancy in mind, and components are standardized and hot-swappable as far as possible. Zuckerberg's engineers should ensure the same applies across the infrastructure keeping it all cool.

A decoupled system – where the control console sits in a separate unit, for example – will give Meta more options when it comes to scaling up over time. If there is a fault in one component, it won't need to replace the entire unit. So cooling units and individual components, such as pumps or sensors, should be specced in the same way.
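As a hedged illustration of that decoupled approach, the sketch below models a coolant distribution unit as a set of independently replaceable parts, so a failed pump can be swapped without retiring the whole unit. The names and fields are hypothetical, not any vendor's API:

```python
# A sketch of the "decoupled unit" idea: a coolant distribution unit (CDU)
# modelled as independently replaceable parts, so a failed pump or sensor
# can be hot-swapped without retiring the whole unit. All names and fields
# are hypothetical, not a real product API.

from dataclasses import dataclass, field

@dataclass
class Component:
    part_id: str
    kind: str              # e.g. "pump", "sensor", "controller"
    healthy: bool = True

@dataclass
class CoolingUnit:
    unit_id: str
    slots: dict = field(default_factory=dict)

    def swap(self, slot: str, spare: Component) -> Component:
        """Replace one slot's component and return the removed part."""
        removed = self.slots[slot]
        self.slots[slot] = spare
        return removed

cdu = CoolingUnit("cdu-row12-03", {
    "pump-a": Component("P-1001", "pump"),
    "pump-b": Component("P-1002", "pump"),
    "flow-sensor": Component("S-2001", "sensor"),
})

cdu.slots["pump-a"].healthy = False                       # fault detected
removed = cdu.swap("pump-a", Component("P-1050", "pump"))
print(f"Swapped out {removed.part_id}; unit {cdu.unit_id} stays in service")
```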

And of course, this will all be for nothing if those components are not easily accessible – and spares readily available. Front access will be critical, and will also allow more flexibility when it comes to positioning units and designing the aisles.

Presumably, Meta will have the brain and compute power to develop cutting-edge predictive analytics to manage maintenance. But it will help immensely if the cooling architecture it chooses can serve up the right, real-time information to feed those systems.
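To make that concrete, here is a minimal sketch of the kind of real-time signal a cooling loop could feed into predictive-maintenance analytics – watching each pump's flow rate for sustained drift below a baseline. The thresholds and identifiers are assumptions for illustration:

```python
# A sketch of the real-time signal a cooling loop could feed into predictive
# maintenance: flag a pump whose recent average flow drifts below baseline.
# Baseline, threshold, and identifiers are assumptions for illustration.

from collections import deque

BASELINE_LPM = 220.0     # assumed nominal flow in litres per minute
DRIFT_THRESHOLD = 0.90   # flag if the recent average drops below 90% of it
WINDOW = 5               # number of recent readings to average

recent: dict = {}

def ingest(pump_id: str, flow_lpm: float) -> bool:
    """Record a reading; return True if the pump should be flagged."""
    window = recent.setdefault(pump_id, deque(maxlen=WINDOW))
    window.append(flow_lpm)
    if len(window) < WINDOW:
        return False
    return sum(window) / len(window) < BASELINE_LPM * DRIFT_THRESHOLD

for flow in (221, 218, 205, 190, 182, 175):
    if ingest("cdu-row12-03/pump-a", flow):
        print(f"pump-a flagged for maintenance (flow now {flow} lpm)")
```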

At a broader level, the heat the cooling system extracts must still go somewhere. Municipal authorities may be grateful for the jobs a data center brings, but they may not be happy if it creates microclimates in the immediate area. Environmental groups and regulators will be on the watch for any adverse environmental impact.

Meta should be thinking from the outset about how this energy can be repurposed, whether for district heating systems or other industrial uses such as factories or greenhouses. A cooling partner with a broader vision of how heat can be reused, and the engineering skills to make this happen, will be a great help here too.
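Another hedged back-of-envelope suggests why this matters: even recovering a fraction of the waste heat adds up to a serious district-heating resource. The recovery fraction and per-home demand below are illustrative assumptions:

```python
# Hedged back-of-envelope: how many homes could recovered waste heat serve?
# Assumptions (illustrative only): essentially all 5 GW ends up as heat,
# 20% is recoverable at useful temperatures, and an average home needs
# about 12,000 kWh of heat per year.

site_heat_kw = 5_000_000
recovery_fraction = 0.2
home_demand_kwh_per_year = 12_000

recovered_kwh = site_heat_kw * 24 * 365 * recovery_fraction
homes = recovered_kwh / home_demand_kwh_per_year

print(f"Recovered heat: ~{recovered_kwh / 1e9:.1f} TWh per year")
print(f"Roughly {homes / 1e6:.1f} million homes' worth of heating")
```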

Meta's data center ambitions are more than just a massive test of its resolve to dominate AI. They represent an engineering bet of truly titanic proportions. And that means they also present the biggest test yet for the liquid cooling sector. It's in all our interests that it passes.
