Interview

The Engineer Who Has Seen Both Ends: Omkar Wagle on Embedded Systems, Cloud Scale, and the Future of Infrastructure

Few engineers have built systems at both ends of the computing spectrum. Most cloud architects have never written multithreaded C++ for a resource-constrained IoT gateway, and most embedded engineers have never had to keep distributed storage alive at cloud scale. 

Omkar Wagle has done both, and that range shapes everything about how he thinks. Starting with Linux-based IoT platforms and Zigbee protocol integrations at HealthAsyst before moving through Roku’s embedded and developer tooling environment, he has spent his career at the intersection of hardware constraints and software design. Now a Software Engineer II focused on high-availability protocols and C++ systems architecture for large-scale distributed storage, Omkar brings a ground-up systems perspective to questions that most cloud engineers only encounter from the top down. 

With deep expertise in Linux internals and database optimization, he has delivered efficiency improvements exceeding 70 percent in production environments, gains that came not from clever shortcuts but from methodical profiling and a refusal to accept assumptions about where performance was actually being lost. In this interview with AI Journal, he discusses how the debugging instincts built at the metal level translate to cloud-scale failure analysis, why infrastructure teams need a seat at the table earlier in AI system design, and what the next generation of self-aware, adaptive infrastructure will need to look like.

You began your career working on Linux-based IoT platforms and Zigbee integrations before moving into large-scale distributed storage and high-availability protocols. How did you get started in systems engineering, and how has that journey shaped your perspective on the future of computing?

Right after finishing my undergraduate degree, I developed an interest in embedded systems. That led me to dive deep into system software and learn how to write code for low-level embedded systems.

At HealthAsyst, I wrote multithreaded C++ for a Linux IoT gateway, handling race conditions, Zigbee clusters, and TCP sockets. It was messy, close to the metal, and I loved it. That environment teaches you to think carefully, because bugs at that level can be really hard to track down.

That debugging instinct is honestly what’s carried me forward. At Roku, I was chasing memory leaks in partner SoCs, and now I’m conducting root-cause analysis of load balancer failures and partition movements in Azure Storage. The scale is completely different, but the mindset is the same. I stay curious, don’t assume anything, and keep digging until I find the real problem.

As for the future, I think systems fundamentals matter more than ever, not less. Whether you’re programming a tiny ARM9 chip or managing distributed storage at cloud scale, it all comes back to reliability, performance, and understanding failure modes. Starting from the ground up gave me a perspective I really value.

As a Software Engineer II focused on C and C++ systems architecture for large-scale distributed storage, how do you see the relationship between hardware constraints and software design evolving as computing becomes more distributed across cloud systems and connected devices?

Working on Azure Storage, you’re dealing with systems where even a small inefficiency compounds massively at scale. And having started my career writing code for ARM9 IoT gateways, I’ve seen both ends of the spectrum pretty closely.

What I’ve noticed is that hardware constraints never really go away; they just change shape. On an IoT device, you’re fighting memory and processing power. In the cloud, you’re fighting latency, network overhead, and I/O bottlenecks. I think as computing becomes more distributed, spanning cloud infrastructure, edge devices, and connected hardware, the engineers who understand both worlds will have a real advantage. The abstraction layers keep growing, but underneath it all, someone has to understand what’s actually happening at the metal level.

Honestly, my IoT background made me a better cloud engineer. When you’ve had to squeeze performance out of a resource-constrained device, you develop habits, like profiling religiously and questioning every unnecessary operation, that translate really well to large-scale distributed systems. I think that mindset becomes more valuable as systems get more complex, not less.
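That habit of profiling before optimizing is easy to illustrate. The sketch below is a minimal, made-up example, not code from any of the systems discussed; it uses Python's built-in cProfile to surface where time actually goes in a deliberately wasteful function:

```python
import cProfile
import io
import pstats

def hot_path(n):
    # Deliberately wasteful: repeated string concatenation in a loop,
    # exactly the kind of hidden cost a profiler surfaces quickly.
    s = ""
    for i in range(n):
        s += str(i)
    return s

def profile_report(func, *args, top=5):
    """Run func under cProfile and return the top entries by cumulative time."""
    pr = cProfile.Profile()
    pr.enable()
    func(*args)
    pr.disable()
    buf = io.StringIO()
    pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(top)
    return buf.getvalue()

print(profile_report(hot_path, 10_000))
```

The point is the workflow, not the function: measure first, and let the report, rather than intuition, decide what is worth rewriting.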

Your work spans embedded environments at Roku and Linux-based IoT platforms at HealthAsyst, alongside cloud-scale infrastructure. What are the biggest architectural challenges when bridging edge devices with centralized cloud systems?

The biggest challenge, in my experience, is reliability across unreliable boundaries. On an IoT gateway at HealthAsyst, I built OTA firmware upgrade mechanisms and TCP/IP communication bridges. The hard part was always handling the cases where things go wrong mid-transmission. Devices drop off, networks hiccup, and your software has to be designed for failure from day one. That’s very different from traditional software thinking, which assumes connectivity.
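Designing for failure mid-transmission usually reduces to a pattern like the following, shown here as a hedged Python sketch rather than the gateway's actual C++: send in chunks and advance the offset only after a clean send, so a dropped connection resumes from the last acknowledged point instead of restarting. The `send_chunk` callback, chunk size, and retry limit are all illustrative.

```python
def transfer_with_resume(payload: bytes, send_chunk, chunk_size=4096, max_retries=5):
    """Send payload in fixed-size chunks; on ConnectionError, retry the same
    chunk so the transfer resumes from the last acknowledged offset."""
    offset = 0
    retries = 0
    while offset < len(payload):
        chunk = payload[offset:offset + chunk_size]
        try:
            send_chunk(offset, chunk)       # may fail mid-transmission
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise                       # give up; caller can reschedule
            continue                        # retry from the same offset
        offset += len(chunk)                # advance only after a clean send
        retries = 0
    return offset

# Usage: a flaky channel that drops every third send still completes.
received = bytearray(12_000)
calls = {"n": 0}

def flaky_send(offset, chunk):
    calls["n"] += 1
    if calls["n"] % 3 == 0:
        raise ConnectionError("link dropped")
    received[offset:offset + len(chunk)] = chunk

sent = transfer_with_resume(bytes(range(256)) * 40, flaky_send)
```

The same shape applies whether the payload is a firmware image over Zigbee or a blob replicated between datacenters: the protocol assumes the link will fail and makes progress durable anyway.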

At Roku, it shifted slightly. I was dealing with memory constraints on partner SoCs and working across firmware teams to get new workflows adopted. That cross-functional piece is actually an underrated architectural challenge. Edge environments often involve hardware you don’t fully control, so your software design has to be defensive and flexible by nature.

Then, moving to cloud scale, the challenge flips a bit. Now you have the compute and reliability, but you’re managing coordination across massive distributed systems: load balancing, partition movements, failover protocols. The question becomes less “will this device stay on” and more “how do we keep the whole system consistent when something inevitably fails?”

Honestly, having worked both sides gives you a much more grounded intuition for where the real failure points are. You stop making assumptions that either end can guarantee, and you design accordingly.

AI workloads place enormous pressure on storage systems’ latency and reliability. From your experience in high-availability protocols and distributed storage, what must change in infrastructure design to support AI at scale?

Working on storage systems day to day, you feel that pressure directly. The workloads are unpredictable, the data volumes are huge, and when you’re talking about inference, latency isn’t something you can just shrug off. At AI scale, that kind of thing doesn’t stay quiet for long.

The load balancing problem is one I feel strongly about. Right now, most systems react after something starts degrading. That’s too slow for AI workloads that can spike almost instantly. The failover protocol work I’ve been doing is trying to push in the right direction: systems that are continuously self-monitoring rather than waiting for something to break before responding.
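The self-monitoring idea can be sketched in a few lines. This is a toy illustration, not Azure Storage's mechanism: each backend keeps a rolling window of observed latencies, and routing always prefers the backend with the lowest recent average, so traffic drifts away from a degrading node before it fails outright.

```python
from collections import deque

class ProactiveBalancer:
    """Toy balancer: route to the backend with the lowest recent average
    latency, shifting load as soon as a node starts degrading."""

    def __init__(self, backends, window=20):
        # One rolling window of observed latencies per backend.
        self.latencies = {b: deque(maxlen=window) for b in backends}

    def record(self, backend, latency_ms):
        self.latencies[backend].append(latency_ms)

    def _avg(self, backend):
        window = self.latencies[backend]
        return sum(window) / len(window) if window else 0.0

    def pick(self):
        return min(self.latencies, key=self._avg)

# Usage: node-b shows an early latency spike, so new traffic goes to node-a.
lb = ProactiveBalancer(["node-a", "node-b"])
for _ in range(10):
    lb.record("node-a", 5.0)
    lb.record("node-b", 5.0)
for _ in range(5):
    lb.record("node-b", 80.0)   # degradation signal, well before outright failure
choice = lb.pick()
```

A production system would weigh more signals than latency (error rates, queue depth, partition health), but the principle is the same: observe continuously and act on trends, not on failures.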

The I/O piece is honestly where I think people underestimate the challenge. It’s not always about raw hardware capacity. Back at HealthAsyst, switching to an in-memory write strategy for SQLite improved response times by 78% without changing hardware. That same mindset of really questioning how and when data moves is going to matter enormously for AI infrastructure.

But the bigger shift I think needs to happen is cultural, not just technical. Infrastructure teams need to be in the room earlier, when AI systems are being designed, not brought in later to speed things up. By the time you’re retrofitting, you’ve already painted yourself into a corner.

You have delivered efficiency improvements exceeding 70 percent in production systems. How critical is low-level performance optimization in enabling real-time analytics and AI inference in production environments?

Honestly, it’s not just critical; it’s often the difference between something working in production and something that looks good on paper but falls apart under real load.

The 72% efficiency improvement at HealthAsyst didn’t come from a single clever trick. It came from methodically profiling the system, finding where cycles were actually being wasted, and making targeted algorithmic changes. The 78% database response time improvement was similar; we weren’t doing anything exotic, we just identified that hard write cycles to SQLite were killing us and switched to an in-memory writing strategy. Simple in hindsight, but you only see it when you’re looking at the right level of the stack.
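The in-memory write idea generalizes well. Below is a small Python sketch of the same pattern using the standard library's sqlite3 module; the schema, file name, and flush cadence are invented for illustration. Inserts hit an in-memory database, and the backup API copies one consistent snapshot to disk, replacing many small synchronous writes with a single bulk one.

```python
import sqlite3

# Hot-path writes go to an in-memory database: no per-row disk I/O.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE readings (device TEXT, value REAL)")

def record(device, value):
    mem.execute("INSERT INTO readings VALUES (?, ?)", (device, value))

def flush_to_disk(path):
    """Copy one consistent snapshot to disk; call this periodically."""
    disk = sqlite3.connect(path)
    mem.backup(disk)        # single bulk copy instead of many small writes
    disk.close()

for i in range(1000):
    record("sensor-1", float(i))
flush_to_disk("readings.db")
```

The trade-off is explicit: data recorded since the last flush is lost on a crash, so the flush interval becomes a durability-versus-throughput dial you tune deliberately rather than paying disk latency on every write.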

That low-level instinct matters enormously for real-time analytics and AI inference. Inference, especially, is latency sensitive in an unforgiving way. You can have a brilliant model, but if your storage layer introduces unnecessary I/O overhead or your memory access patterns are inefficient, you’re leaving a lot of performance on the table. What I’ve learned across these roles is that performance optimization isn’t a phase you do at the end. It has to be a continuous discipline baked into how you think about architecture from the start. Profiling regularly, questioning assumptions, understanding what’s happening beneath the abstractions.

As connected devices generate growing volumes of data, some of which feeds AI systems, how can organizations better align hardware capabilities, distributed storage, and software architecture to avoid bottlenecks?

The honest answer is that most organizations don’t discover the misalignment between hardware, storage, and software until something breaks in production. And by then it’s expensive to fix. At HealthAsyst, I was building across the full stack on an IoT gateway, so when things were misaligned, I felt it immediately. That end-to-end exposure was really valuable because you can’t hide from the consequences of poor design decisions.

The biggest thing I’d say is design the data path early, not as an afterthought. At Roku, fixing a memory leak that pushed bandwidth headroom from 8% to 26% was a great result, but that constraint shouldn’t have existed in the first place. That kind of thing happens when hardware and software teams aren’t talking to each other until late in the process.

On the storage side, the assumption of steady, predictable traffic is where many systems get into trouble. Connected devices are bursty and unpredictable by nature. My work on load balancing really hammered this home. If your system can’t adapt dynamically to shifting workloads, it will bottleneck regardless of how powerful the hardware underneath is.

But honestly, beyond the technical stuff, the organizational piece is underrated. At Roku, I had to work closely with firmware teams just to get a new workflow adopted. Great architecture means nothing if the teams building on top of it aren’t aligned.

From your experience optimizing embedded and developer tooling systems at Roku, what lessons can consumer-scale platforms offer enterprise engineers building hybrid edge and cloud systems?

When I joined Roku, I built a Flask application to process GitLab notifications and catch missing modules in merge requests. It cut code audit effort by around 60%. That sounds like a developer tooling win, but the deeper lesson was about how much invisible overhead accumulates in large engineering organizations when you don’t automate the right things. At the consumer scale, with hundreds of engineers and constant shipping pressure, that overhead compounds fast. Enterprise teams building hybrid edge-cloud systems face the exact same problem, just dressed differently.
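The original service's code isn't shown here, but the heart of that kind of audit is usually a small payload check. The sketch below is hypothetical: the pairing rules are invented, and the `changed_files` list is assumed to have been fetched from the GitLab API in an earlier step and attached to the event (GitLab's `object_kind` field is real; the rest of the shape is simplified).

```python
import json

# Hypothetical policy: changing a module's sources requires updating its
# manifest in the same merge request. These prefixes are invented.
REQUIRED_PAIRING = {
    "modules/audio/": "modules/audio/MANIFEST",
    "modules/video/": "modules/video/MANIFEST",
}

def audit_merge_request(payload_json):
    """Return module prefixes whose sources changed without a manifest update.
    Assumes `changed_files` was attached by an earlier API-fetch step."""
    payload = json.loads(payload_json)
    if payload.get("object_kind") != "merge_request":
        return []          # ignore push events, pipeline events, etc.
    files = set(payload.get("changed_files", []))
    violations = []
    for prefix, manifest in REQUIRED_PAIRING.items():
        touched_src = any(f.startswith(prefix) and f != manifest for f in files)
        if touched_src and manifest not in files:
            violations.append(prefix)
    return violations

# Usage: an MR touches audio sources but not the audio manifest.
event = json.dumps({
    "object_kind": "merge_request",
    "changed_files": ["modules/audio/mixer.cpp", "docs/README.md"],
})
flagged = audit_merge_request(event)
```

The automation win comes less from the check itself than from running it on every merge request, so the policy is enforced mechanically instead of during manual review.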

The other big one was the Copybara and Bazel workaround preventing accidental IP module sharing. What made that hard was getting firmware teams across the organization to actually change their workflow. And that’s a lesson that translates directly to enterprise hybrid systems. You can architect a beautiful edge-cloud boundary, but if the teams on either side have different tooling cultures and incentives, the seams will leak.

On the embedded side, there’s the memory leak I found at Roku that improved bandwidth headroom from 8% to 26%. Finding it came from just sitting with the profiler long enough to see what was actually happening. Consumer platforms move fast and ship constantly, so you develop a habit of instrumenting everything and trusting data over intuition. Enterprise engineers sometimes have more breathing room, but I think that discipline of continuous profiling and measurement is something they should borrow regardless.

The consumer world is ruthless about performance and scale. Those instincts translate really well when you’re designing systems that have to work reliably across both edge and cloud environments.

Looking ahead, what innovations in systems architecture, distributed storage, or protocol design will most influence how hardware, software, and data come together to power the next generation of intelligent applications?

The thing I keep coming back to is intelligent, self-aware infrastructure. The work I’ve been doing around failover protocols and load balancing feels like an early version of where everything is heading. Right now, we’re still heavily reliant on engineers like me digging into logs and doing root cause analysis manually. The next generation of infrastructure should do much of that itself, continuously and proactively.

On the protocol side, I think the assumptions baked into a lot of current designs are going to get stress-tested hard. TCP worked beautifully for the internet we built, but intelligent applications spanning cloud, edge, and connected devices need something more adaptive. Lower latency, better handling of intermittent connectivity, smarter about prioritizing what data moves where and when. My background with Zigbee, MQTT, and TCP across constrained environments gave me a real appreciation for how much protocol choice shapes what’s even possible at the application layer.

Storage architecture is probably the area I’m most closely watching. AI workloads are exposing gaps that traditional storage design never had to solve for because the I/O patterns are just fundamentally different. I think we’ll see storage systems that are much more tightly coupled with compute, designed from scratch around how models actually consume data rather than retrofitted from general purpose designs.


Author

  • Tom Allen

    Founder and Director at The AI Journal. Created this platform with the vision to lead conversations about AI. I am an AI enthusiast.
