
Aleksandr Timashov has built ML systems at Meta and PETRONAS, handling everything from matching millions of products in real time to risk scoring at scale. Later, he applied those lessons to legal AI. We talked about what actually works, what breaks, and where the hype ends.
Let’s start with the pragmatic question: Which legal workflows deliver repeatable ROI today? Contract review and e-discovery get mentioned constantly, but what are you actually seeing work in production?
The winners are high-volume, repetitive tasks where you can measure results. First-pass contract review: NDAs, MSAs. Privilege screening in e-discovery. Clause extraction, cite checks, basic triage.
These aren’t exactly sexy, but they’re the backbone of legal work. And they work because you can measure them: time saved, error rates compared to a baseline team. We’re seeing multiple-x faster throughput at the same or lower error rates. Not “feels faster”, but actual measured gains. The key is that these tasks are grounded in specific documents. You can verify every output. Hallucinations have nowhere to hide.
You build “agents” rather than copilots. When does a multi-step agent beat a smart copilot?
Agents win when you want end-to-end progress without constant supervision. A copilot requires the expert to guide every step. An agent can churn through an entire privilege log overnight and give you a structured report to review in the morning.
But when they fail, they fail in predictable ways. Tool mis-selection is common: the agent chooses the wrong retrieval strategy or tries to extract structured data from an image-heavy PDF. Retrieval drift is another: you ask for cases about forum selection in employment disputes, and it starts pulling general employment law because the vector embeddings lack jurisdictional context.
Citation errors are the most dangerous. Agents will confidently cite to the wrong paragraph or, worse, generate plausible-looking case names.
Why legal AI? Why now?
Going through an EB-1 visa petition showed me the problem. My immigration attorney knew exactly what to argue and what evidence mattered. The bottleneck was drafting. Months of assembling exhibits, structuring narratives, grounding every claim in boilerplate sections. Brilliant legal minds spending 70% of their time on assembly work.
I kept thinking: this is exactly what LLMs could handle well. Let agents do the repetitive drafting so lawyers can focus on strategy. That’s the unlock. Not replacing lawyers but amplifying them so the bottleneck isn’t typing speed.
You’ve seen these systems in production now. What are the biggest pros and cons, and what do teams consistently underestimate?
The pros are speed, consistency, and audit trails. You get work product faster, it doesn’t vary based on analyst fatigue, and everything is logged for compliance. Teams love that they can trace every output back to specific source documents and model decisions.
The challenges are more human than technical. Bureaucracy: legal organizations move slowly. Compliance: getting past data privacy and privilege concerns takes longer than building the system. Adoption: even when it works, convincing partners to trust the output requires serious change management.
Teams underestimate the organizational friction. You can build a brilliant agent that handles cite checks flawlessly, but if the senior partner doesn’t trust it or the IT department won’t approve deployment because of data residency rules, it doesn’t matter.
You’ve matched ~10 million products in real time. How does that translate to legal retrieval?
The core principle is the same: start broad, then go deep. First extract a small, relevant subset fast — then apply heavier algorithms for precision.
In product matching, you use structured signals like brand, category, and price to narrow down millions of SKUs, then apply embedding-based similarity to finalize the match. Legal retrieval follows the same pattern.
My recipe is schema-aware hybrid retrieval: begin with structured filters (jurisdiction, court level, date range), then apply dense embeddings for semantic queries like “find cases about forum selection in employment agreements.”
The key to avoiding drift is source grounding at every step — ensuring the system always knows where each piece of text came from before reasoning over it.
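A rough sketch of that recipe in code is below. The `Doc` record, its field names, and the precomputed embeddings are illustrative assumptions, not a description of his actual stack; the point is the two-stage shape, with source IDs kept attached for grounding.

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical document record; a real system would pull these fields
# from a case-law or contract database.
@dataclass
class Doc:
    doc_id: str            # source identifier, kept attached for grounding
    jurisdiction: str
    court_level: str
    year: int
    text: str
    embedding: np.ndarray  # precomputed dense vector for the text


def hybrid_retrieve(docs, query_vec, *, jurisdiction, court_level, min_year, top_k=20):
    """Schema-aware hybrid retrieval: structured filters first, then
    dense similarity over the surviving subset."""
    # Stage 1: cheap structured filters narrow the corpus.
    subset = [
        d for d in docs
        if d.jurisdiction == jurisdiction
        and d.court_level == court_level
        and d.year >= min_year
    ]
    if not subset:
        return []  # nothing satisfies the structured constraints

    # Stage 2: cosine similarity only over the filtered subset.
    mat = np.stack([d.embedding for d in subset])
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    scores = mat @ q

    ranked = np.argsort(-scores)[:top_k]
    # Every result carries its doc_id, so downstream reasoning stays grounded.
    return [(subset[i].doc_id, float(scores[i])) for i in ranked]
```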
How do you make an agent admit when it doesn’t know something?
Admit unknowns fast. I combine confidence calibration with a risk score based on coverage, agreement, and support from sources.
Basically, just add a validation step to the loop: if the risk score is too high or coverage is weak, the agent should stop and escalate with context. “I found three cases, but they’re from different circuits and reach conflicting conclusions. Human review needed on whether to cite district court dicta or wait for circuit split resolution.”
This is way better than a confidently wrong answer. In legal work, saying “I don’t know, here’s why, and here’s what you should check” is often more valuable than a hallucination.
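As a minimal sketch of that validation step: the `Evidence` record, the weights, and the escalation threshold here are all assumptions for illustration; in practice they would be calibrated against a reviewed sample rather than hard-coded.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Evidence:
    doc_id: str
    conclusion: str        # e.g. "enforceable" vs. "unenforceable"
    supports_answer: bool  # did a verification pass confirm the quoted text?

# Hypothetical weights and threshold, not calibrated values.
WEIGHTS = {"coverage": 0.4, "agreement": 0.3, "support": 0.3}
ESCALATE_ABOVE = 0.5
EXPECTED_SOURCES = 3

def risk_score(evidence: list[Evidence]) -> float:
    """Combine coverage, agreement, and source support into one risk number."""
    if not evidence:
        return 1.0  # no evidence at all: maximal risk
    coverage = min(len(evidence) / EXPECTED_SOURCES, 1.0)
    majority = Counter(e.conclusion for e in evidence).most_common(1)[0][1]
    agreement = majority / len(evidence)
    support = sum(e.supports_answer for e in evidence) / len(evidence)
    return 1.0 - (
        WEIGHTS["coverage"] * coverage
        + WEIGHTS["agreement"] * agreement
        + WEIGHTS["support"] * support
    )

def validate_or_escalate(evidence: list[Evidence], draft_answer: str) -> str:
    """The validation step in the agent loop: answer only when risk is low."""
    risk = risk_score(evidence)
    if risk > ESCALATE_ABOVE:
        cites = ", ".join(e.doc_id for e in evidence) or "none"
        return (f"ESCALATE (risk={risk:.2f}). Sources found: {cites}. "
                "Coverage or agreement is weak; human review needed.")
    return draft_answer
```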
Legal teams want fast answers, but partners want bullet-proof ones. How do you balance speed and defensibility?
The goal is to keep end-to-end latency under a threshold where the agent is still usable: say, 30 seconds for a cite check, a few minutes for a first-pass contract review. Every stage needs a quality gate. Fast and defensible means designing the pipeline so verification happens in parallel or in a final sweep, not blocking every intermediate step.
If you chase pure speed, you’ll cut corners on verification and ship garbage. If you chase pure correctness, you’ll be too slow to be useful. The sweet spot is streaming results. Show the agent’s work incrementally as it finishes subtasks so the user sees progress while verification runs in the background.
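A minimal sketch of that streaming pattern, assuming asyncio and stand-in coroutines (`review_clause` and `verify` are placeholders for real model and verification calls, not his implementation):

```python
import asyncio

async def review_clause(clause: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for a model call
    return f"finding for {clause!r}"

async def verify(finding: str) -> bool:
    await asyncio.sleep(0.2)  # stand-in for a slower verification pass
    return True

async def review_contract(clauses: list[str]) -> None:
    pending_checks: list[tuple[str, asyncio.Task]] = []
    # Stream each finding to the user as soon as its subtask completes...
    for task in asyncio.as_completed([review_clause(c) for c in clauses]):
        finding = await task
        print("DRAFT:", finding)  # the user sees progress immediately
        pending_checks.append((finding, asyncio.create_task(verify(finding))))
    # ...while verification runs in the background and lands as a final sweep.
    for finding, check in pending_checks:
        if not await check:
            print("FLAGGED FOR REVIEW:", finding)
    print("verification sweep complete")

asyncio.run(review_contract(["indemnity", "termination", "governing law"]))
```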
Based on your experience moving ML from lab to production at Meta and PETRONAS, what won’t you automate in the next 12 months, and where is human judgment structurally necessary?
I won’t automate strategy, negotiation, or risk trade-offs. These require contextual judgment with real stakes. Forum choice, novel legal arguments, settlement decisions. These aren’t retrieval problems.
In the next 12 months, I’d focus on triage, cite checks, and first-pass drafts with uncertainty flags. These are tasks where speed and consistency matter more than creativity. Agents can handle “pull all relevant cases from the Second Circuit on this issue” or “draft the background section of this memo using these sources.” Humans should own “which argument do we lead with?” and “do we settle or go to trial?”
Agents are force multipliers, not replacements. Let them handle the grunt work so lawyers can focus on high-stakes decisions that actually require judgment. That’s the future: brilliant attorneys not bottlenecked by typing and formatting, and more people getting access to quality legal help because the profession scales differently.



