
Running AI Workloads Is Getting Extremely Expensive—Here’s How to Keep Costs Down

By Heidi Kay Carson, Pepperdata, Inc.

Industry surveys repeatedly find that roughly a third of cloud spend is wasted, equating to hundreds of billions of dollars globally every year.  

The problem is only worsening as enterprises scale their cloud usage and increasingly adopt compute-intensive technologies like AI and ML. Graphics processing units (GPUs), once the domain of high-performance gaming, have become the resource of choice for large AI workloads, and cloud providers often charge a premium for GPU instances. What’s worse, GPU resources often go to waste, with only 7% of companies believing their GPU infrastructure achieves more than 85% utilization during peak periods.  

As organizations expand the scope of their AI applications, managing these new expenses is becoming a key focus for those operating at scale.  

Increases in Costs Driven by Increases in Consumption 

Think of cloud computing like the self-service food bar at the supermarket: every option you choose costs you (and sometimes quite a lot) whether you consume it or not. But, unlike a food bar, cloud computing is incredibly complex and highly dynamic, down to the millisecond level or lower.  

Overprovisioning resources like GPU, CPU, and memory to ensure workloads run to completion is common, as is paying for resources that were spun up and then left idle or forgotten. Real-time resource management can be extremely challenging, if not impossible, especially if attempted manually. 

On the surface, AI applications promise massive ROI, but deploying and managing the resources they need can erode that ROI. Automated, dynamic resource optimization is becoming essential in today’s large-scale data environments.  

Industry Use Cases That Can Cause GPU Costs to Skyrocket 

Financial Services 

Financial services currently attracts more generative AI investment than any other industry. From risk management to cyber fraud to personalized mobile banking, GenAI promises a swath of use cases that are expected to revolutionize banking. However, if cloud resource waste continues at 30% or more, then the costs to run these applications could potentially outweigh the actual business benefit. 

Advertising Technology 

With audience segmentation, personalized messaging, and bot detection, AdTech companies are adopting AI for more effective campaign targeting and management. Improved campaign performance means fewer wasted impressions, fewer irrelevant ads, and better ROI. In theory, that means ad spend is more efficient and less budget is wasted. However, just like the FinServ use cases, if the AI models and pipelines are not well aligned with the business value, the ROI of new GPU-dependent AdTech projects may be poor, leading to wasted costs. 

Automated Solutions to the Rescue 

Cloud cost optimization platforms leverage advanced algorithms to continuously monitor GPU, CPU, and memory resource utilization, identify inefficiencies, and autonomously adjust configurations to optimize spending without manual tuning or other human intervention. This proactive approach ensures that resources are allocated precisely as needed, preventing wasteful overprovisioning and reducing costs. 
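To make the rightsizing idea concrete, here is a minimal sketch of how a recommendation might be derived from observed utilization. The function name and the 15% headroom figure are illustrative assumptions, not any vendor's actual algorithm, which would typically weigh many more signals.

```python
# Hypothetical sketch: recommend a rightsized allocation from observed
# utilization samples. The 15% headroom is an illustrative assumption,
# not any platform's real policy.

def recommend_allocation(samples, provisioned, headroom=0.15):
    """Suggest a new allocation based on peak observed usage plus headroom."""
    if not samples:
        return provisioned
    peak = max(samples)
    suggested = peak * (1 + headroom)
    # Never recommend more than what is already provisioned.
    return min(suggested, provisioned)

# Example: an instance provisioned at 100 capacity units that peaks
# at 40 units could be rightsized to roughly 46 units.
gpu_usage = [22, 31, 40, 28, 35]
print(recommend_allocation(gpu_usage, provisioned=100))
```

A production system would run this loop continuously and would likely use a percentile (say, p95) rather than the raw peak to avoid chasing transient spikes.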

  1. All the major cloud computing platforms now offer proprietary tools designed to provide granular visibility into spend, down to the node level and on a minute-by-minute basis. These tools typically include highly detailed visualizations that allow operators to pinpoint the precise drivers of their cloud costs, such as specific services, projects, or individual users. Operators can also define spending limits for various aspects of their cloud infrastructure and receive automated alerts when actual costs approach or exceed these predefined thresholds. Many cost issues associated with running AI workloads stem from a simple lack of visibility. 
  2. Automated cloud FinOps platforms are transforming how organizations manage their cloud infrastructure costs. They provide real-time, granular visibility into an organization’s entire cloud spending across various providers and services. Beyond simple reporting, they leverage advanced algorithms to proactively identify and automatically implement significant cost-saving opportunities, for example, through the purchase of committed-use discounts (CUDs), Savings Plans, and preemptible VMs, all of which can significantly reduce cloud costs.  
  3. Kubernetes automation tools automatically rightsize workloads, “bin-pack” resources (filling nodes to maximize utilization), and dynamically scale clusters up or down based on real-time demand. This ensures that resources are allocated and consumed in the most efficient manner possible. Their core functionalities typically include: 
  • Automated workload rightsizing: Monitoring and adjusting resource allocations for workloads, including CPU, memory, and GPU requests and limits, to ensure optimal allocation, which is especially crucial for AI workloads with variable demands 
  • Efficient bin packing: Maximizing node utilization by intelligently scheduling pods to fill nodes completely, reducing the number of required virtual machines or physical servers 
  • Dynamic autoscaling: Scaling clusters up or down in response to real-time demand 
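The "bin packing" step above can be sketched with the classic first-fit-decreasing heuristic. This toy version packs a single resource dimension; real Kubernetes schedulers weigh many more factors (CPU, memory, GPU, affinity rules, taints), so treat it only as an illustration of the principle.

```python
# Minimal first-fit-decreasing bin packing: the heuristic idea behind
# "filling nodes to maximize utilization." Packs one resource dimension;
# real schedulers are far more sophisticated.

def first_fit_decreasing(pod_requests, node_capacity):
    """Pack resource requests onto as few nodes as possible."""
    nodes = []  # each node is a list of placed requests
    for request in sorted(pod_requests, reverse=True):
        for node in nodes:
            if sum(node) + request <= node_capacity:
                node.append(request)  # fits on an existing node
                break
        else:
            nodes.append([request])  # no existing node fits; open a new one
    return nodes

# Six pods that naively might occupy six nodes fit on three 8-unit nodes.
placement = first_fit_decreasing([5, 3, 4, 2, 6, 2], node_capacity=8)
print(len(placement))  # 3
```

Fewer nodes means fewer billed virtual machines, which is where the cost savings come from.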

Automated solutions such as these transform GPU resource management in the cloud from a reactive, manual process into a proactive, intelligent, and automated one. By integrating one or more automation capabilities, organizations can achieve a continuous optimization loop within their AI workload environments, translating into not just reduced spend, but also improved performance and reliability. 
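As an illustration of the committed-use arithmetic such platforms automate, a simple break-even estimate might look like the following. All rates here are invented for the example; actual discount structures vary by provider, region, and instance type.

```python
# Toy break-even estimate for a committed-use discount. The rates below
# are invented for illustration; real pricing varies widely.

ON_DEMAND_RATE = 2.50   # $/GPU-hour, hypothetical on-demand price
COMMITTED_RATE = 1.60   # $/GPU-hour, hypothetical 1-year committed price
HOURS_PER_MONTH = 730

def monthly_cost(utilization, rate, committed=False):
    """Committed capacity is billed in full whether or not it is used."""
    billable = 1.0 if committed else utilization
    return billable * HOURS_PER_MONTH * rate

# The commitment only pays off above a break-even utilization level.
break_even = COMMITTED_RATE / ON_DEMAND_RATE
print(f"Break-even utilization: {break_even:.0%}")  # 64%

# At 50% utilization, on-demand is cheaper; at 80%, the commitment wins.
for u in (0.5, 0.8):
    print(u, monthly_cost(u, ON_DEMAND_RATE),
          monthly_cost(u, COMMITTED_RATE, committed=True))
```

The value of automating this is that utilization shifts constantly, so the right answer this month may be wrong next month, which is exactly the kind of recalculation a FinOps platform performs continuously.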

The Imperative to Invest in Cloud Resource Management 

Global cloud spend is expected to exceed $1 trillion by 2027, driven in no small part by AI. While AI comes with significant costs, the greater risk, of course, lies in ignoring it and falling behind. Organizations that continue to invest in AI today are establishing the data, talent, and infrastructure crucial for future leadership. Equally important is the commitment to tools, techniques, and processes that effectively manage the increase in cloud spend driven by AI. Even a small percentage of waste savings in a GPU environment can have an outsized financial impact, enabling further investment for growth.  
