AI & Technology

Routing cuts your compute bill. It can’t touch the root cause of exploding AI costs: discovery.

By Mario Moscatiello, Head of Growth, Airbyte

Multi-engine routing makes each agent query cheaper. That’s awesome, but agents shouldn’t be running so many queries in the first place. 

For years, the thing keeping data leaders up at night was the cost of serving all the analysts querying their data warehouse. Things have changed. Now, the biggest consumer of enterprise data is no longer analysts, but AI agents. 

Barry McCardel, the CEO of Hex, pointed out in March that agents now create more cells in his platform than humans do. MotherDuck recently wrote about getting ready for a thousand agents to hit their lakehouse at once. Everyone is seeing the same thing: the traffic is coming, and it’s coming fast. 

A recent guest post on the Orchestra newsletter written by the CEO of a routing startup pitched a fix for this: multi-engine routing. Instead of firing every agent query at your data warehouse, you route it to whatever engine can answer it most cheaply. Right now that’s usually DuckDB over your Iceberg tables instead of the warehouse itself.  

Smart idea, and it does exactly what it promises. But it’s not enough to fix your AI agent economics.  

The diagnosis is accurate. Ask an agent which of your enterprise accounts are most likely to churn this quarter, and it won’t fire one query. It’ll fire six, maybe twelve, before it lands on an answer. Multiply that across every agent and every question, bill it per credit, and you’ve got a line item for cost that gets scarier every quarter as costs mount up. 

Routing does solve part of the problem by reducing query costs. But the deeper issue is that agents are running this many queries in the first place.  

That’s what we call the discovery problem and it’s the actual root cause of these exploding AI bills. Routing makes the queries cheaper to run. I’d argue agents shouldn’t be running so many queries to begin with. 

Why one question becomes twelve queries 

Hand an agent a question of any real complexity and it starts from zero. It doesn’t know which systems hold the data. It doesn’t know what your tables are called. It has no idea what any of the fields actually mean. So it goes digging, one redundant round trip at a time: 

  1. List the tables in a database; 
  2. Inspect the schema of one of those tables; 
  3. Pull a sample of the records in that table to determine what the fields mean; 
  4. Determine how to join those tables together to answer the question; 
  5. Build the query to execute; 
  6. Review the records that were returned to make sure the query actually answers the question; 
  7. If not, go back to steps 4 and 5, fix the query, and run it again. 

That whole loop is the discovery problem, and it’s where your money goes. It’s not the price of any single query. It’s the sheer number of redundant ones an agent has to run because it doesn’t have a way to understand where relevant context actually resides. Route them to a cheaper engine if you like. You’ve still got the agent running every last one of them. 

Routing reprices the symptom and leaves the disease 

Multi-engine routing sees the discovery problem and tries to make it cheaper to live with. For big analytical queries, sure, sending them to an engine that costs less than your warehouse is a good call. But for the discovery queries, the ones that make up the bulk of agent traffic, routing changes nothing about the problem itself. 

Two things happen when you route. First, your cost per query drops, but it still scales with the total number of queries your agents fire. Routing doesn’t shrink that count. If agent traffic increases your query volume by 5, you’re still running 5 times the queries. They’re just cheaper now. 

Second, routing only works inside the warehouse behind your lakehouse. But your agents are querying your entire business: the open deal in your CRM, the unresolved ticket in Jira, the long Slack thread where your customer spelled out what they actually wanted. None of that lives in your warehouse. Routing can’t touch it. 

The Context Store closes the discovery gap 

Discovery is exactly what we built the Context Store to solve. The idea is simple: hand the agent the resolved context up front so it never has to go rediscover it. 

The data your agent touches across all your business systems gets replicated into the Context Store, where it can find the entities it needs. Replication runs continuously, the data refreshes hourly, and you control access at the organization level so the agent only sees what it should from each source. 

So, instead of poring through a list of discovery, schema, and validation queries every time someone asks a question, the agent searches the Context Store for the entities and attributes it needs and gets them back in milliseconds. It only reaches out to an external system when it needs something the store doesn’t already hold. 

Take that same question: “Which of your enterprise accounts are most likely to churn this quarter?” No listing tables or inspecting schemas. In fact, no building or running queries at all. The agent just asks the Context Store, which already resolved all of that ahead of time, and gets its answer. 

And when 10 different people ask that same question? You don’t get 10 times the queries. They all draw on the same resolved context. The bursty, unpredictable agent traffic the original article flagged as the real risk becomes your most predictable traffic, because the discovery queries behind it are simply gone. 

The Context Store has already shown promising results. 40% fewer tool calls, 80% fewer tokens consumed, and 90% lower costs on multi-source queries. 

Our benchmarks are public if you want to review them for yourself. 

This isn’t an argument against your data warehouse 

To be clear, this isn’t me telling you to rip out your warehouse. To the contrary, for a lot of workflows data warehouses are still the gold standard, and Airbyte still powers a ton of analytics use cases for customers. When it makes sense to route big analytical queries to a cheaper engine, do it. 

But the bulk of agent query traffic isn’t analytical. It’s discovery. Routing makes those queries cheaper to run. The Context Store eliminates them entirely.  

Efficient query engines are genuinely useful, and we lean on them ourselves. But ultimately, if you want to manage costs, you need to reduce query volumes at source.  

Your agents are going to eat more of your data stack every year from here on out. You can spend the next two years buying tools that make each round trip cheaper. Or you can give your agents the context they need to answer the question without ever making the trip.  

I know which bet I’d make. 

Author

Related Articles

Back to top button