
The Pre-Selection Layer: Routing Intent Before the Model Sees It

By Martin Lucas, founder and CEO of TheTMXGroup.com and inventor of SDCI™ (Synthetic Deterministic Cognitive Intelligence).

The architectural mistake that defines current AI deployment is sending everything to the model.

Every query. Every interaction. Every decision. Every routing question. Every classification task. All of it routed to the same probabilistic substrate, which then has to figure out what kind of question it is being asked before it can answer.

This is expensive. It is slow. It is non-deterministic. It is unnecessary.

The fix is a pre-selection layer.

What a pre-selection layer is

A pre-selection layer sits between the user input and the model. Its job is to examine the input, classify the intent, score the confidence, route the request to the appropriate execution path, and pass to the model only the cases that genuinely require generative work.

This is not a router in the trivial sense. It is not “if input contains the word ‘invoice’, send to invoice handler.” That kind of brittle keyword routing has been around for decades and is rightly distrusted.

A pre-selection layer scores intent against a structured cognitive index. The index is built on the same six-dimensional coordinate system that underpins the rest of the architecture. The scoring is deterministic. The routing is auditable. The model only sees inputs the pre-selection layer has determined genuinely require generation.
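The shape of that layer can be sketched in a few lines. This is a minimal illustration, not the production implementation: the names preselect, RoutingDecision and demo_scorer are hypothetical, the scorer is a pluggable stand-in for scoring against the cognitive index, and the threshold value is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical scorer signature: returns (intent name or None, confidence in [0, 1]).
Scorer = Callable[[str], Tuple[Optional[str], float]]


@dataclass(frozen=True)
class RoutingDecision:
    """One auditable record per input: what was seen, what was decided, and why."""
    text: str
    intent: Optional[str]
    confidence: float
    route: str  # "deterministic" or "model"


def preselect(text: str, scorer: Scorer, threshold: float,
              audit_log: List[RoutingDecision]) -> RoutingDecision:
    """Score the input, choose a route, and append the decision to the audit log."""
    intent, confidence = scorer(text)
    if intent is not None and confidence >= threshold:
        route = "deterministic"
    else:
        route = "model"  # unknown or low-confidence inputs fall through to the model
    decision = RoutingDecision(text, intent, confidence, route)
    audit_log.append(decision)
    return decision


def demo_scorer(text: str) -> Tuple[Optional[str], float]:
    """Stand-in scorer for this demo only: a fixed table, not real intent scoring."""
    table = {"what is my balance": ("account_balance", 0.97)}
    return table.get(text.strip().lower(), (None, 0.0))


log: List[RoutingDecision] = []
d1 = preselect("What is my balance", demo_scorer, threshold=0.9, audit_log=log)
d2 = preselect("Write a poem about audits", demo_scorer, threshold=0.9, audit_log=log)
```

The same input with the same scorer and threshold always yields the same decision, and every decision lands in the log. Those are the two properties the layer is meant to provide.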

In production deployments, this layer absorbs sixty to eighty percent of the workload. The model sees the residual.

Why this is not just a cost optimisation

The first thing people see when they encounter pre-selection is the cost story. Send fewer queries to the model, pay less in aggregate, win.

This is true and it is the smaller part of the story.

The larger part is that pre-selection changes the system’s reliability profile. Inputs that have a deterministic answer get a deterministic answer. Inputs that genuinely benefit from generative work get generative work. The system stops asking a probabilistic engine to do work a deterministic engine could do better, faster, and identically every time.

Drift drops. Audit trails become possible. Compliance teams stop objecting to output variance because the output of the deterministic path has no variance.

The intent-scoring problem

The hard problem in pre-selection is intent scoring. Get this wrong and you either route too much to the deterministic path (some of which the model should have handled) or too much to the model (defeating the purpose).

The naive approaches do not work. Keyword matching is brittle. Regex routing is brittle. Even small classifier models trained on examples are brittle, because they inherit the same probabilistic limitations as the larger models.

What works is intent scoring against a structured cognitive index. The input is mapped into the coordinate system. Its position is compared to the coordinates of known intent patterns. If the position is close to a known pattern with high confidence, the request is routed to the deterministic execution path for that pattern. If the position is genuinely novel, or low confidence, or near a boundary, it goes to the model.
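One simple way to read that routing rule is as a distance check with two guards. The sketch below makes several assumptions: the input has already been mapped to a six-value coordinate (the mapping itself is not shown), closeness is measured with plain Euclidean distance, and the function name match_intent, the pattern names, and both thresholds are hypothetical. It illustrates the rule as described, not the actual index.

```python
import math
from typing import Dict, Optional, Tuple

# Assumed representation: one float per dimension of the six-dimensional index.
Point = Tuple[float, float, float, float, float, float]


def match_intent(point: Point, patterns: Dict[str, Point],
                 max_distance: float, min_margin: float) -> Optional[str]:
    """Return the matched pattern name, or None to send the input to the model."""
    if not patterns:
        return None
    # Sort by (distance, name) so ties resolve the same way on every run.
    ranked = sorted((math.dist(point, coords), name)
                    for name, coords in patterns.items())
    best_distance, best_name = ranked[0]
    if best_distance > max_distance:
        return None  # genuinely novel: nothing known is close enough
    if len(ranked) > 1 and ranked[1][0] - best_distance < min_margin:
        return None  # near a boundary: two patterns are almost equally close
    return best_name


# Hypothetical pattern coordinates, for illustration only.
patterns = {
    "invoice_status": (1.0, 0.0, 0.0, 0.0, 0.0, 0.0),
    "reset_password": (0.0, 1.0, 0.0, 0.0, 0.0, 0.0),
}

clear = match_intent((0.9, 0.1, 0.0, 0.0, 0.0, 0.0), patterns, 1.0, 0.3)
boundary = match_intent((0.5, 0.5, 0.0, 0.0, 0.0, 0.0), patterns, 1.0, 0.3)
novel = match_intent((0.0, 0.0, 0.0, 0.0, 0.0, 5.0), patterns, 1.0, 0.3)
```

Here the first input matches invoice_status, while the boundary and novel inputs both return None and would go to the model. There is no sampling and no learned weights, so the decision is a pure function of the coordinates and the thresholds.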

This is not classification. It is structural matching. The difference matters.

What the deterministic execution path looks like

When the pre-selection layer routes a request to a deterministic execution path, the path is not a single function. It is a verb-based execution chain.

A verb is a cognitive operator. The pre-selection layer has identified which operators apply. The execution path applies them in order, against the persistent cognitive state, against the registry of structured artefacts, producing a structured output.

The output may be the final response, or it may be a structured payload that the LLM is then asked to render. In the second case, the LLM renders that payload rather than generating a response from raw input. The generative task is bounded, fast, and deterministic in everything except the surface phrasing.
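A verb-based chain of this kind might look like the following. It is a sketch under assumptions: verbs are modelled as plain functions over a state dictionary and a registry, and the verb names lookup and summarise, the invoice fields, and run_chain are all hypothetical, chosen only to show operators applied in order to produce a structured payload.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Registry = Dict[str, Dict[str, Any]]
# Assumed verb shape: takes the current state and the registry, returns a new state.
Verb = Callable[[State, Registry], State]


def lookup(state: State, registry: Registry) -> State:
    """Fetch the structured artefact the request refers to."""
    return {**state, "record": registry[state["invoice_id"]]}


def summarise(state: State, registry: Registry) -> State:
    """Reduce the record to the structured payload handed on for rendering."""
    record = state["record"]
    payload = {
        "invoice_id": state["invoice_id"],
        "status": record["status"],
        "amount_due": record["amount_due"],
    }
    return {**state, "payload": payload}


def run_chain(verbs: List[Verb], state: State, registry: Registry) -> State:
    """Apply each verb in order; every step returns a new state dictionary."""
    for verb in verbs:
        state = verb(state, registry)
    return state


# Hypothetical registry entry, for illustration only.
registry: Registry = {"INV-1": {"status": "paid", "amount_due": 0}}
result = run_chain([lookup, summarise], {"invoice_id": "INV-1"}, registry)
payload = result["payload"]
```

The payload is either returned as the final response or passed to the LLM purely for phrasing. Running the chain again on the same state and registry produces an identical payload.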

This is a fundamentally different architecture from “send the input to the model and hope.”

What this looks like at scale

In a production deployment running tens of thousands of queries per day, the pre-selection layer’s profile looks roughly like this.

About forty to sixty percent of inputs are classified as known patterns with high confidence. These are answered deterministically in tens of milliseconds, with zero token cost on the model, with full audit trail.

Another twenty to thirty percent are classified as known patterns where the deterministic execution produces a structured payload that the model renders. The model is invoked, but the input it receives is structured, the output it produces is bounded, and the cost per invocation is a fraction of an unbounded generation.

The remaining ten to thirty percent are routed directly to the model for unbounded generation. This is the genuinely generative work — open-ended writing, creative synthesis, novel framing — where probabilistic generation is the right tool.
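To see how a split like this feeds into cost, the arithmetic can be written out. The numbers below are illustrative assumptions, not measurements: the shares are picked from inside the ranges above so that they sum to one, and the relative unit costs are hypothetical placeholders with unbounded generation normalised to 1.

```python
from typing import Dict


def blended_cost(shares: Dict[str, float], unit_costs: Dict[str, float]) -> float:
    """Average model cost per query, relative to sending every query to the model."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(shares[path] * unit_costs[path] for path in shares)


# Assumed shares, chosen from within the ranges quoted above.
shares = {"deterministic": 0.50, "render": 0.25, "generate": 0.25}

# Hypothetical relative costs: the deterministic path uses no model tokens,
# rendering a payload is assumed to cost a quarter of an unbounded generation.
unit_costs = {"deterministic": 0.0, "render": 0.25, "generate": 1.0}

relative_cost = blended_cost(shares, unit_costs)
```

Under these assumed figures the blended cost is 0.5 × 0 + 0.25 × 0.25 + 0.25 × 1 = 0.3125 of the model-only baseline. Different shares or unit costs change the result, so the calculation should be rerun with measured values.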

The cost profile, the latency profile, the reliability profile — all three improve simultaneously, by margins that are not achievable through any tuning of a model-only architecture.

Why this is patentable

The pre-selection layer is one of the eight patent families. Specifically, it covers the pattern of structured intent scoring against a cognitive coordinate system, followed by routing to verb-based deterministic execution where applicable, with the model invoked only for the bounded residual.

This is not “use a classifier in front of an LLM.” Many people do that. It is a defensible engineering practice, but it is not a defensible IP position, because it is the obvious thing.

The IP position is in the structured intent representation, the deterministic routing logic against a defined cognitive algebra, and the closure properties that allow the routing decisions themselves to be audited.

That is what gets filed. That is what has been filed.

The bottom line

Sending everything to the model is the dominant architecture today because it is the path of least resistance, not because it is the right architecture.

The right architecture has a pre-selection layer. The layer scores intent deterministically, routes to verb-based execution where applicable, and reserves the model for genuinely generative work.

The cost story is large. The reliability story is larger. The compliance story is largest.

The companies that adopt this pattern as their AI deployments mature will outperform the companies that do not, on every dimension that matters at enterprise scale.

The pattern is here. It is in production. It is filed.
