
“Not There Yet, But Getting Close”: Kirill Starkov on AWS Neuron vs NVIDIA from a Developer’s Viewpoint

When people talk about AI infrastructure, one name inevitably comes up: NVIDIA. For over a decade, it has been the undisputed backbone of machine learning workloads, dominating both training and inference with its powerful GPUs and mature developer ecosystem. But over the past few years, a challenger has emerged in a rather strategic and subtle way: AWS Neuron, a software development kit created to power Amazon’s custom chips for AI workloads, namely Inferentia and Trainium. What was once considered experimental is now gaining traction among forward-thinking engineers.

Kirill Starkov is one of them.

Kirill is not your average AI enthusiast. With over eight years of experience in machine learning, computer vision, and AI deployment at scale, his projects have spanned everything from vehicle tracking systems to pandemic-related safety tools. While many developers work within frameworks handed to them by companies, Kirill has consistently been at the frontier — writing tracking algorithms from scratch, implementing transformer-based models before they became widespread, and recently, evaluating AWS Neuron’s real-world readiness.

“We worked on a project with Refact.ai that involved AWS Inferentia (Neuron),” he explains. “It was a real system, not a benchmark or lab test. That makes a difference. You’re not just reading documentation, you’re watching your model’s behaviour in production.”

The aim was to shift parts of the inference workload to Inferentia, achieving more cost-efficient throughput than standard GPU deployments. The results were, in his words, “promising, but not seamless.”

The Market Landscape: Why This Comparison Matters

Before diving into technical details, Kirill is careful to contextualise what’s at stake. “When you’re working on large-scale machine learning systems, compute becomes your biggest bottleneck. Not just in terms of performance, but also in cost. NVIDIA’s top-tier GPUs, while powerful, are expensive. And when you deploy at scale, that adds up quickly. If you’re a small company or startup, every penny matters — and that’s where alternatives like AWS Inferentia start to look appealing. You may trade off some flexibility, but the cost savings can make a real difference.”

AWS’s approach of building dedicated AI chips optimised for specific types of workloads is fundamentally different. Inferentia is focused on inference. Trainium is built for both training and inference, though it is more expensive. By keeping things tightly integrated with the AWS cloud, Amazon controls the entire pipeline, from chip design to deployment.

“It’s like comparing an all-terrain SUV to a racing bike,” Kirill says. “One is versatile. The other is optimised. The question becomes: do you need versatility, or do you need speed and cost-efficiency on a well-defined track?”

AWS Neuron Through a Developer’s Eyes

From a developer’s perspective, the transition to AWS Neuron isn’t always smooth. “If you’ve spent years working with CUDA, PyTorch, and GPU-optimised tools, there’s a learning curve,” Kirill admits. “Neuron is young. Some operations aren’t fully supported. Documentation is improving, support is very helpful, but you’ll still find gaps.”

But he’s quick to note the progress. “What impressed me was the consistency of performance once the models were compiled properly. Especially for inference-heavy systems — things like LLM deployment or computer vision classification tasks — you can get real gains.”

He shares an example from the project: “We were running a hybrid architecture that included parts of a transformer model optimised for inference. On GPUs, the performance was solid, but once ported and compiled for Inferentia, we saw a noticeable reduction in latency per request — around 25 to 30 percent in some cases. That’s not trivial.”
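
For readers who want a concrete picture of what “ported and compiled” means here, below is a minimal sketch of the ahead-of-time compile step, assuming the torch_neuronx package (the Neuron SDK’s PyTorch interface for Trainium and second-generation Inferentia; first-generation Inferentia uses the older torch-neuron package). The model, shapes, and file name are illustrative, not taken from Kirill’s project.

```python
# A minimal, hypothetical sketch of compiling a model for Neuron.
import torch
import torch_neuronx  # AWS Neuron SDK PyTorch support (Trn1 / Inf2)

# Stand-in for "parts of a transformer model optimised for inference".
model = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=256, nhead=8),
    num_layers=4,
).eval()

# Neuron compiles ahead of time against fixed input shapes, so trace
# with a representative input: (seq_len=128, batch=1, d_model=256).
example = torch.rand(128, 1, 256)
neuron_model = torch_neuronx.trace(model, example)

# The result is a TorchScript module: save it once, then load it on
# the Inferentia instance that serves traffic.
torch.jit.save(neuron_model, "encoder.neuron.pt")
```

The fixed-shape constraint is part of why the compile step takes planning: dynamic batch sizes or sequence lengths that a GPU handles transparently have to be pinned down, or bucketed, before Neuron will accept the model.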

Still, Neuron’s compiler and SDK tools are very much a work in progress. “I wouldn’t say it’s plug-and-play. You need to allocate time to debug Neuron-specific errors and understand how to refactor models so that they’re compatible. There are some simple examples that just work — the kind of pipelines AWS provides out of the box — but the moment you need something that isn’t part of the standard library, it becomes a struggle. You end up writing workarounds, rethinking your architecture, and generally spending a lot more time than you would on CUDA. But if you’re patient, the results are there.”
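
What does such a workaround look like? One common pattern, sketched here with invented module names and an invented “unsupported” operator, is to split the pipeline so that only the compiler-friendly subgraph is traced for Neuron while the offending operation stays on the CPU:

```python
# Hypothetical illustration of the workaround pattern described above.
# All names, shapes, and the "rejected" operator are invented.
import torch
import torch_neuronx

class Backbone(torch.nn.Module):
    """The bulk of the model; assume it compiles cleanly."""
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(512, 512),
            torch.nn.ReLU(),
            torch.nn.Linear(512, 256),
        )
    def forward(self, x):
        return self.layers(x)

class Head(torch.nn.Module):
    """Pretend this contains an op the Neuron compiler rejects."""
    def forward(self, x):
        return torch.topk(x, k=5).indices

backbone, head = Backbone().eval(), Head().eval()
neuron_backbone = torch_neuronx.trace(backbone, torch.rand(1, 512))

def infer(x):
    # Supported subgraph runs on the NeuronCore; the rest on CPU.
    return head(neuron_backbone(x))
```

The price is an extra hop between device and host on every request, which is exactly the kind of architectural rethinking the quote describes.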

Comparing Against NVIDIA: Where the Gap Remains

Despite the optimism, Kirill doesn’t downplay NVIDIA’s continued dominance. “The ecosystem around CUDA is massive. Every major framework supports it natively. If something breaks, someone else has already found a fix. With Neuron, you’re often the first.”

One of the biggest gaps is flexibility. “NVIDIA cards are general-purpose. You can use them for training, inference, simulations, even graphics. With AWS Neuron, you’re playing within a narrower sandbox. And if your use case doesn’t fit perfectly, you feel it.”

Tooling also plays a role. “NVIDIA has Nsight, TensorRT, all kinds of performance profilers. With Neuron, you have the basics, but not yet the same depth. You’ll sometimes spend extra time chasing what should be obvious metrics.”
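
The SDK does ship command-line basics such as neuron-top and neuron-ls, but for the per-request numbers that matter in a latency comparison, a plain-Python probe is often the quickest route. The sketch below, with a hypothetical measure_latency helper, times a compiled model the same way one would time its GPU counterpart, so the two deployments can be compared on equal terms:

```python
# A simple, framework-agnostic latency probe (illustrative helper).
import time
import statistics
import torch

def measure_latency(model, example, warmup=10, iters=100):
    with torch.inference_mode():
        for _ in range(warmup):      # let queues and caches settle
            model(example)
        samples_ms = []
        for _ in range(iters):
            start = time.perf_counter()
            model(example)
            samples_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples_ms), max(samples_ms)

# Usage, with the compiled model from the earlier sketch:
# p50, worst = measure_latency(neuron_model, example)
# print(f"median {p50:.2f} ms, worst {worst:.2f} ms")
```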

Looking Ahead: Potential and Strategy

Despite the gaps, Kirill believes that AWS is playing the long game, and playing it well. “AWS isn’t trying to beat NVIDIA at its own game. They’re redefining the game. By offering cost-efficient, scalable, and tightly integrated hardware-software stacks, they make it easier for companies to say: we don’t need the best, we need the most manageable.”

He predicts that within the next two to three years, Neuron will become a strong alternative for startups and mid-size businesses aiming to scale cost-effectively. “If you’re starting something new today, and planning to deploy primarily on AWS, I’d say give Neuron a shot. The learning curve is worth it.”

Kirill also sees a broader benefit in the emergence of Neuron. “It’s good that AWS is trying to compete with NVIDIA — real competition drives progress. It forces everyone to raise the bar, improve their ecosystems, and think more critically about developer needs. That’s good not just for companies, but for the whole industry.”

He pauses for a moment before concluding.

“Right now, Neuron is a highly promising product with a lot of potential. It’s not yet a full replacement for everything NVIDIA offers, and it still comes with challenges — especially when you step outside the most common use cases. But AWS is actively investing in its development, and you can see tangible progress with each update. I believe Neuron will carve out a solid niche for itself in the near future, especially for teams already building within the AWS ecosystem. It’s becoming more accessible with time, and many of the early limitations are already being addressed.”

Author

Tom Allen
Founder of The AI Journal. I like to write about AI and emerging technologies to inform people how they are changing our world for the better.
