AI

Autonomous but Not Infallible: Your AI Agents Require Oversight

By Howard Beader, Vice President of Product Marketing, Catchpoint

Justย asย youย were beginning toย get the hang ofย generative AIย as a concept, along comesย agentic AI,ย GenAIโ€™sย evenย moreย free-thinkingย cousin.ย The rise ofย agentic AI, or machine learning models that mimic human decision-making to solve problems in realย time,ย is transforming how businessesย operate.ย ย 

Unlike traditional AI,ย agentic AIย goes beyondย assistanceย toย make decisions, completeย tasksย and communicateย autonomously with other systems.ย According toย EYโ€™s US AI Pulse,ย 97% of senior leaders investing in AI are seeing positive ROI from AIย throughoutย business functions,ย across industries.ย Furthermore,ย 34% of senior leaders reportedย they’reย already looking ahead toย what’sย nextย and implementing agentic AI technology.ย When it comes to implementation, agentic AI isย largely beingย deployed in administration and supportive roles: Customer support, IT and cybersecurity being its primary reported applications.ย ย 

These agentsย are poweringย everything from supply chains to customer satisfaction, but when they fail, the impact is immediateย and often costly.ย The shift from generative to agentic AI is happening at lightning speed, but not all businessย leadersย are ready for the true complexity it bringsย to their tech stack.ย ย 

The Trueย Vulnerabilities of Agentic AIย 

Autonomous systemsย canย boostย efficiency, butย theyโ€™reย built on fragile foundationsย likeย APIs, cloud infrastructure,ย databasesย andย third-party tools.ย Agentic AIย in particularย reliesย on a web of third-party services, meaning a single point of failure anywhere in the chain can derail an operationย entirely.ย Andย without the proper protections in place,ย andย organizationย may not noticeย itsย weak points until thereย isย a full-blown incidentย impactingย customers or crippling business operations.ย ย 

The impact ofย thoseย disruptions on the bottom line is well-recorded: A recent Forresterย study of eCommerce companies found thatย 42% ofย retail executivesย reported losing $6M+ annuallyย inย Internetย disruption costs.ย More broadlyย across industries,ย 51% of companiesย beyond retailย realized even steeper costs to disruptions: $1M+ย from monthly incidents.ย 

As AI adoption grows, each new dependency multiplies the risk.ย You may recallย June’s ChatGPTย outage, when aย globalย OpenAIย disruptionย left businessesย that relied on its APIs unresponsive.ย Despite theย optionย for alternative AI engines, onceย youโ€™veย built business operations aroundย oneย option,ย itโ€™sย virtually impossibleย to switch the foundation of your AIย infrastructureย fast enough to react to an ongoing issue.ย ย 

When one link breaks, the fallout is immediate, includingย haltedย or slowedย operations, lostย revenueย and damaged trust.ย The impactย of agentic AI vulnerabilitiesย will comeย down to how quickly you canย detect,ย identifyย and resolve the issue.ย 

AI Can Fail Quietlyย 

A recent reportย found thatย 57% of businessesย noticeย AI issues in real-time, butย an alarming 43% rely on alerts,ย userย complaintsย orย other forms ofย delayed discoveryย to surface issues.ย That reactive strategy puts organizations at risk, since they are not proactivelyย monitoringย what could go wrong.ย Even in 2024,ย 75% ofย surveyed workersย were already relying on AIย to support their workflows. If Internetย disruptionsย costย $1M+ per month for enterprises,ย AI outages could costย significantlyย more due toย additionalย productivity loss.ย 

Youย mayย hope that whenย your companyโ€™s infrastructure experiences a disruption, internal or external,ย yourย IT team will catch it fast enough to mitigate impact on users. Hopeย isnโ€™tย enoughย with increasingly distributed infrastructure and with the advent of AI complicating dependencies further.ย When autonomous agents fail,ย even if your team isย immediatelyย notified,ย the problemย mightย not beย obvious. You knowย somethingโ€™sย broken: Now, is it your system, your AIย providerย or a buried issue in the network path?ย Pinpointing the root cause without full visibility isย nearly impossible,ย andย unfortunately,ย traditionalย monitoring toolsย areย stillย catching up toย this level of complexity.ย ย 

Here’s Whereย Traditional Monitoring Falls Shortย 

AI autonomy does not equal AI stability.ย Agentic AI is alluring, and for good reason:ย it is making unmatched leaps inย personalized customer serviceย andย automatedย logistics.ย Every action the agent takes depends onย a number ofย other factorsย workingย seamlessly.ย Often, thoseย other factors are out of your control.ย ย 

Agentic AI is just the latest example of networks distributed across external environments, butย itโ€™sย a critical one.ย Asย many agents rely on foundational LLMsย in addition to an organizationโ€™s internal infrastructure,ย eachย emergesย asย a potential failure point.ย Ifย youโ€™reย putting something asย importantย as customer support in agentic AIโ€™s hands, you need to beย monitoringย theseย evolving platformsย as emerging risk vectors, from every angle.ย 

Building Resilience into theย Internetย Stackย 

To mitigate the risksย in the age of agentic AI,ย itโ€™sย time we redefine resilience. While theย challengesย are real, the answerย isnโ€™tย pulling back on AI. The answer isย achieving real visibility intoย whatโ€™sย happening behind the scenesย so that agentic AI canย do its assigned task withoutย disruptions.ย 

In order toย achieveย trueย visibility, an organization needs to incorporateย Internet Performance Monitoring toย monitorย third party systems,ย LLMโ€™s, andย APIโ€™sย for theirย AI workflows.ย Some Internet Performance Monitoring toolsย areย keepingย pace with AI adaptations โ€“ย offeringย AI-specific monitoring capabilities that address these challenges directly.ย Everyย organization implementing agentic AI must also implementย tools designed toย catchย issuesย within the agentic ecosystemย before they become incidents.ย ย 

Strategies for Ensuring Resilientย Autonomous Agentsย 

When implementing agentic AI in workflows and business operations, regardless of industry, there are a fewย best practices to ensure you keep the agentic AIย lightsย on:ย 

  • Define customer-focused SLOsย (Service-Level Objectives)/XLOsย (Experience-Level Objectives)ย to guide resilience prioritiesย 
  • Map andย monitorย all dependencies, includingย internal systems and third-party servicesย 
  • Prepare for failure with automated alerts and clearly outlinedย incident response plansย 

Agentic AI relies onย a living, breathingย setย of dependencies, and thus requires a living, breathing monitoring strategyย and response planย forย when even one of them falters.ย 

The future of businessย mightย beย autonomous,ย butย it still requires supervision.ย Agentic AIย wonโ€™tย raise the bar for resilience unlessย the humansย integratingย itย do.ย To ensure business continuity and protectย user experiences, businesses must treat AI like any other critical system:ย withย proactivityย and a plan forย missteps.ย Autonomousย doesnโ€™tย necessarily mean invincible, and whileย itโ€™sย tempting to โ€œset it and forget itโ€ with increasingly capable agentic AI, a thoughtful integration strategy that incorporates full-stack visibility is a necessity to keep things running smoothly.ย 

Author

Related Articles

Back to top button