
As airlines face growing pressure to improve operational efficiency while maintaining the industry’s uncompromising safety standards, artificial intelligence is moving from experimentation into real-world aviation environments. From predictive maintenance and operational decision support to infrastructure monitoring and flight operations, AI is increasingly becoming part of the systems that keep airlines running. At the same time, recent aviation incidents and heightened scrutiny of technology-driven decision-making have reinforced the importance of human oversight, transparency, and trust when deploying AI in safety-critical settings.
In this interview, AI Journal speaks with Nikhil Atkuri, Lead PM for AI Strategy at a major U.S. airline, about what it takes to move AI from theory to production in aviation. Drawing on his experience leading enterprise AI deployments across flight operations, maintenance, and operational infrastructure, as well as his previous work building large-scale AI platforms at Microsoft Azure, Nikhil shares insights on human-in-the-loop system design, regulatory considerations, operator trust, and the challenges of deploying AI responsibly in one of the world’s most complex operational environments.
How did you get involved in applying AI within airline operations, and what drew you to working in such a safety-critical environment?
My path to aviation AI wasn’t straightforward. Over several years and across multiple companies and diverse domains, I built my capabilities as an AI product leader.
Most recently, I built AI systems for infrastructure operations at a major cloud platform. Specifically, I created a troubleshooting platform that helped roughly 100,000 engineers diagnose and resolve critical incidents faster. That work taught me what it actually means to deploy AI in high-stakes environments, where there’s a real cost to getting it wrong.
So when the opportunity came to lead AI strategy at an airline, I jumped at it. Aviation is a more contained environment to build with AI, and one where you have to tread carefully. That was exactly the kind of challenge I wanted.
In software operations, a bad AI recommendation costs you time. In aviation, the calculus is completely different. That tension between the power of AI and the genuine consequences of getting it wrong is what makes this work exciting.
Your work spans flight operations, maintenance systems, and infrastructure. How do you approach deploying AI across such different but interconnected parts of an airline?
There’s always a temptation to treat each domain separately, as its own problem with its own solution. I think that’s a mistake. Flight operations, maintenance, infrastructure, and crew systems are constantly talking to each other. An AI model that doesn’t understand those dependencies will produce recommendations that look right in isolation but fail once they meet the rest of the operation.
My approach starts with understanding the operational workflows and handoffs. Where does information flow between systems? Where do humans make decisions based on inputs from multiple sources?
I also insist on defining what “wrong” looks like before we deploy anything. Every system needs a human-in-the-loop element to recognize and correct its output. If we can’t articulate that clearly, we’re not ready to deploy.
When moving from theory to real-world implementation, what are the biggest challenges you’ve faced in actually getting AI systems into production?
The technical challenges are real, but most of them are solvable. The harder problem is the context layer. AI models are trained on data, but operational reality includes context that often lives outside that data: a maintenance event that was logged minutes ago, a crew change that just happened, or a regional weather pattern that experienced operators know is important but that no dataset captures cleanly or accurately.
I’ve watched AI systems flag genuine-looking alerts that seasoned operators immediately knew were explained by something the model simply couldn’t see. When that happens repeatedly, operators and technicians stop trusting the system. Once that trust is broken, it is very hard to rebuild.
So closing that context gap and connecting AI reasoning to the full operational picture is the hardest production challenge I’ve faced.
In aviation, decisions are tightly regulated and high-stakes. How do regulatory requirements shape the way you design and deploy AI systems?
You’re right that FAA regulatory frameworks are often treated as constraints, and I’d argue rightly so. These regulations exist because the industry has already learned, sometimes the hard way, that human judgment is non-negotiable. When a regulation requires human sign-off on a dispatch decision, that’s not bureaucracy. That’s decades of institutional knowledge encoded into process.
That framing changes how I approach AI design. Rather than asking “how do we automate this?” I ask, “What does the regulation tell us about where the human must remain in control, and how do we build AI around that so it makes people more effective rather than trying to replace them?”
The result is AI that earns regulatory acceptance because oversight was designed in from the start as the norm, not retrofitted as an afterthought.
You’ve worked on human-in-the-loop models in operational environments. What does that look like in practice, and why is it so important in aviation?
In aviation especially, AI is for now an advisor, not an authority and, more importantly, not an independent executor. That may change a few years down the line, but we are far from it at the moment.
Let me give you a concrete example. An AI monitoring system flagged a potential reliability issue on an aircraft scheduled for an extended overwater route. The signal was technically valid; the telemetry showed patterns that matched historical risk signatures. But an experienced maintenance operations manager reviewing the alert had context that the model didn’t. The aircraft had just completed a routine servicing event, and the reading in question was a temporary fluctuation during system stabilization, not a real issue.
The maintenance operations manager cross-checked the maintenance log, confirmed the context, and cleared the aircraft. The flight operated without any issues.
So human-in-the-loop is the architecture that makes that moment possible: the system surfaces the signal, the human provides the context, and the decision is better than either could have produced in isolation.
Operator trust is often a barrier to adoption. How do you build confidence among pilots, engineers, and operations teams using AI-driven systems?
I think this is the elephant in the room. You don’t build trust by telling people the system is accurate. You build it by showing them the system understands its own limitations.
Operators in safety-critical environments are highly experienced, tenured, and sophisticated. They’ve seen tools oversell their capabilities before. What earns their confidence is an AI that tells them when it’s uncertain, shows its reasoning rather than just the conclusion, surfaces the sources behind that reasoning, and makes it easy to override without penalty.
I also learned early that the first failure matters enormously. If an AI system gets something wrong in a visible way, and the operator feels like the system didn’t flag its own uncertainty, that trust is very hard to rebuild. So we spend as much time designing how the system communicates doubt as we do designing how it communicates confidence.
In my opinion, getting the trust factor right is the very first step, before shipping or piloting anything.
Based on your experience, what distinguishes AI deployments that succeed in safety-critical environments from those that struggle to gain traction?
The ones that succeed treat operator trust as an architecture problem, not a change management problem. This matters especially when you’re dealing with probabilistic systems rather than deterministic ones. With deterministic systems, you can often address trust after the build through training, communication, and adoption campaigns. With probabilistic AI in safety-critical environments, that’s too late. Trust has to be designed in from the very beginning.
Concretely, that means three things.
First, explainability is non-negotiable. If operators can’t interrogate and understand why the AI made a recommendation, or if there are no sources behind it, they won’t act on it.
Second, confidence calibration matters more than accuracy. A system that’s right 95% of the time but overconfident in the remaining 5% will fail catastrophically, because that 5% is where the disasters live.
Third, the system must learn from overrides. Every time an operator corrects the AI, that’s a signal, and the system should capture and incorporate that signal to improve. Systems that do this earn trust over time.
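The second point, calibration versus accuracy, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Prediction` record, the 0.9 "confident" threshold, and both sets of numbers are hypothetical and do not come from any airline system. It compares two made-up systems that are each right 95% of the time, one of which states 0.99 confidence on everything while the other drops to 0.5 on the cases it tends to get wrong.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    confidence: float  # stated probability that the recommendation is right
    correct: bool      # whether it actually turned out to be right


def accuracy(preds):
    """Fraction of recommendations that were right."""
    return sum(p.correct for p in preds) / len(preds)


def expected_calibration_error(preds, n_bins=10):
    """Weighted average gap between stated confidence and observed
    accuracy, computed over equal-width confidence bins."""
    bins = [[] for _ in range(n_bins)]
    for p in preds:
        idx = min(int(p.confidence * n_bins), n_bins - 1)
        bins[idx].append(p)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(p.confidence for p in b) / len(b)
        acc = sum(p.correct for p in b) / len(b)
        ece += (len(b) / len(preds)) * abs(avg_conf - acc)
    return ece


def confident_errors(preds, threshold=0.9):
    """Number of wrong recommendations delivered with high stated confidence."""
    return sum(1 for p in preds if not p.correct and p.confidence >= threshold)


# Hypothetical system A: always says 0.99, wrong 5 times out of 100.
overconfident = [Prediction(0.99, True)] * 95 + [Prediction(0.99, False)] * 5

# Hypothetical system B: same accuracy, but it signals doubt (0.5) on a
# small group of cases, and all of its errors fall inside that group.
calibrated = (
    [Prediction(0.99, True)] * 90
    + [Prediction(0.5, True)] * 5
    + [Prediction(0.5, False)] * 5
)

for name, preds in [("overconfident", overconfident), ("calibrated", calibrated)]:
    print(
        name,
        "accuracy:", round(accuracy(preds), 3),
        "ECE:", round(expected_calibration_error(preds), 3),
        "confident errors:", confident_errors(preds),
    )
```

In this toy data both systems share the same accuracy, yet only the first delivers its mistakes with high stated confidence. The second gives an operator a reason to look closer at exactly the cases where it fails, which is the behavior the interview argues for.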

