
Rohan Benkar on Why Infrastructure Decisions Define What’s Possible to Learn

Rohan Benkar joined Coursera in 2016 as a founding member of the company’s AI engineering team. For nearly a decade, he’s built the infrastructure that supports hands-on learning at internet scale, including the Labs platform that enables millions of learners worldwide to execute code, debug systems, and interact with real development environments directly in the browser. Before Coursera, Benkar worked at PayPal’s research lab, where he won the Money20/20 Global Hackathon and contributed to work that resulted in a U.S. patent in virtual display technology. He holds a Master’s in Computer Science from The Ohio State University and has published research on interactive geographic information systems.

His work sits at the intersection of distributed systems, generative AI, and pedagogy. The technical choices his team makes about security isolation, session persistence, and compute provisioning aren’t just operational details; they directly determine what kinds of learning experiences instructors can design and what learners can actually do. That perspective has become more urgent as AI-generated code shifts the bottleneck in technical education from syntax to judgment, and as hands-on execution environments move from supplementary tools to core curriculum infrastructure.

You’ve spent nine years building AI systems at Coursera. What does educational AI get wrong that consumer AI gets right, and vice versa?

You can think of consumer AI and educational AI as having totally different jobs—it all comes down to what they are designed to achieve. Consumer AI (like your favorite chatbot) is focused on making you happy and keeping you engaged right now; it’s fast, flexible, and just needs to be “good enough” in the moment. Educational AI, though, has a much tougher contract: its mission is making sure the stuff you learn actually sticks with you, which means it has to be super rigorous about correctness, sequencing, and teaching methods.

Consumer AI absolutely nails a couple of key things. First, personalization: it instantly changes gears based on what you’re trying to do, which makes you feel seen. Second, it wins on engagement; it’s so intuitive and low-friction to use, while educational systems can sometimes feel stiff or overly structured.

Educational AI, however, is the expert on how people learn. It builds on solid learning science—using techniques like scaffolding (gradually increasing difficulty), spaced repetition, and assessment. It understands that a little friction, like practice and feedback, is actually crucial for remembering things. Plus, it gives you clear learning paths so you build concepts systematically, which is something consumer AI usually doesn’t bother with.

The big opportunity is bringing these two together: marrying the fun, adaptable spirit of consumer AI with the rigor and deep structure of serious educational systems.

When you’re designing AI for a learner who might be in Lagos or Manila on a mobile connection, how does that change the architecture decisions versus building for a San Francisco office worker?

It changes things quite a bit. You can’t just assume everyone has a stable connection, a powerful device, or long, uninterrupted sessions. Instead of just making everything as rich as possible, we have to build for resilience. That means smaller payloads, fewer round trips, and making sure the system can recover gracefully if a connection drops.

It also shifts how we think about the learner’s experience. Someone on a phone might be learning in short bursts between other tasks, they might be very sensitive to data costs, and they’ll likely get interrupted. We need to make sure responses are concise, progress is easy to save and resume, and the system works reliably even when conditions aren’t perfect.
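The patterns he describes here can be sketched in a few lines: cheap retries with backoff, and progress checkpointed as a small payload so an interrupted session resumes cleanly. The names and parameters below are illustrative, not Coursera’s actual client code.

```python
import json
import time


def sync_with_retry(send, payload, retries=3, base_delay=0.5):
    """Retry a flaky upload with exponential backoff, keeping
    round trips cheap on an unstable mobile connection."""
    for attempt in range(retries):
        try:
            return send(payload)
        except ConnectionError:
            if attempt == retries - 1:
                raise  # surface the failure only after the last attempt
            time.sleep(base_delay * 2 ** attempt)


class ProgressTracker:
    """Checkpoint progress as a small JSON payload after every step,
    so an interrupted session can resume exactly where it stopped."""

    def __init__(self):
        self.completed = []

    def complete_step(self, step_id):
        self.completed.append(step_id)
        return json.dumps({"completed": self.completed})  # tiny payload to sync

    def resume_from(self, snapshot):
        self.completed = json.loads(snapshot)["completed"]
```

The key design choice is that every checkpoint is self-describing and small, so losing a connection mid-session costs the learner one step at most, not the whole session.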

Then there’s the human element. A learner in Lagos or Manila isn’t just a “low-bandwidth” version of someone in San Francisco. A truly global system needs to be context-aware, not just translated. It has to adapt to local languages, examples, and educational backgrounds. That requirement goes all the way back to our core architecture—from how we design prompts to how we retrieve and serve content.

You’ve argued that compute is part of the curriculum itself. Can you walk through a specific decision on the Labs platform where an infrastructure choice directly changed what learners could actually do?

One concrete example was our decision to back Labs with Docker containers and allow instructors to bring their own environments. At the time, this looked like an infrastructure choice—standardizing execution and making it easier to run arbitrary code safely.

But it fundamentally changed what instructors could teach. Suddenly, instructors weren’t stuck with rigid, pre-set environments. They had the freedom to build whatever they could imagine. For instance, the University of London created this amazing interactive tool called Sleuth. It turned learning to code into a detective game where students solved puzzles through programming.

That’s not just “running code.” It’s a totally different learning experience that weaves problem solving, narrative and interactivity together. We couldn’t have offered that kind of immersion without that flexible compute layer underneath.

That’s exactly why I say compute is part of the curriculum. The choices we make about infrastructure don’t just support the lesson—they define the boundaries of what’s possible to learn. When we expanded those technical limits, we moved from static exercises to truly applied, hands-on experiences.
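Under the hood, the bring-your-own-environment idea amounts to turning an instructor’s spec into a locked-down container launch. A minimal sketch, with an illustrative spec format that is not Coursera’s actual schema:

```python
def docker_run_command(spec):
    """Build a sandboxed `docker run` invocation from an instructor-
    supplied environment spec (the spec format here is hypothetical)."""
    cmd = [
        "docker", "run", "--rm",
        "--network", "none",                      # no network unless the lab needs it
        "--memory", spec.get("memory", "512m"),   # cap RAM per learner
        "--cpus", str(spec.get("cpus", 1.0)),     # cap CPU per learner
    ]
    for host_path, container_path in spec.get("mounts", {}).items():
        cmd += ["-v", f"{host_path}:{container_path}:ro"]  # read-only course assets
    cmd.append(spec["image"])                     # the instructor's own image
    cmd += spec.get("command", [])
    return cmd
```

Because the image is the instructor’s to define, the platform only has to enforce the isolation boundary; everything inside it, from a standard Python shell to a detective game like Sleuth, is the course author’s choice.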

What’s the most counterintuitive thing about scaling hands-on learning environments that people outside the problem wouldn’t expect?

The counterintuitive part is that scaling a hands-on learning environment isn’t at all like scaling a typical cloud service. In a normal system, you’re managing compute; here, you’re actually managing a human’s train of thought.

Usually, you’d optimize for things like statelessness and aggressive autoscaling—just spinning things up and down as needed. But when someone is learning, their session is long and deeply stateful. Think about a learner who’s been deep in a debugging session for 30 minutes; they’ve built up all this mental and technical context that you can’t just reset.

This makes scaling down much trickier. You can’t just terminate an instance because it’s “efficient” if it means kicking a student out of their flow. You have to wait and drain capacity gradually, which feels “wrong” to a pure infrastructure person but is absolutely right for the learner.

The stakes of a technical failure change, too. In most apps, a dropped request is a minor blip. In education, dropping a session mid-stride doesn’t just break the code—it breaks the learner’s motivation and momentum.

So, we end up making choices that might look inefficient on a spreadsheet—like keeping idle capacity ready—because we know it’s necessary to protect the experience.

How do you handle the conflict between aggressive cost controls and protecting active learner sessions? Where does that tension actually show up in practice?

It’s a constant tug-of-war. Standard cloud systems are built to squeeze every bit of value out of a server, but learning systems have to be built to protect a student’s focus.

In practice, this really comes to a head when we talk about timing out sessions. From a CFO’s perspective, you want to shut down any machine that isn’t “active.” But for a learner, “idle” doesn’t mean they’ve walked away—it often means they’re deep in thought, reading a complex diagram, or trying to wrap their head around a bug. If the system kills their session right then to save a few cents, we haven’t just saved money; we’ve completely derailed their momentum.

We’ve had to make very deliberate choices to be “inefficient.” We drain our capacity much more slowly and keep sessions alive longer than a typical web app would. It might look messy on a server utilization chart, but it’s the right thing for the person on the other side of the screen.

You see it everywhere—from keeping environments “warm” so students don’t have to wait for a cold start, to over-provisioning when we know a big exam is coming up. We’ve learned to accept that a little bit of waste is a fair price to pay for making sure a learner is never interrupted mid-epiphany.
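As a sketch, a warm pool with an exam-window boost might look like this; all sizes are illustrative stand-ins, not actual provisioning numbers.

```python
class WarmPool:
    """Keep pre-started environments on standby so a learner never
    waits on a cold start, and boost the buffer ahead of known exam
    windows. Sizes here are illustrative."""

    def __init__(self, min_warm=5, exam_boost=20, start_env=None):
        self.min_warm = min_warm
        self.exam_boost = exam_boost
        self.start_env = start_env or (lambda: object())  # stand-in for booting a container
        self.warm = []

    def target_size(self, exam_scheduled=False):
        return self.min_warm + (self.exam_boost if exam_scheduled else 0)

    def reconcile(self, exam_scheduled=False):
        # Deliberate "waste": we pay for idle capacity to buy instant starts.
        while len(self.warm) < self.target_size(exam_scheduled):
            self.warm.append(self.start_env())
        return len(self.warm)

    def claim(self):
        # Hand out a pre-warmed environment immediately, if one exists.
        return self.warm.pop() if self.warm else None
```

On a utilization chart the standing buffer reads as inefficiency; from the learner’s side it reads as a lab that opens instantly, even at the start of an exam.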

Ultimately, we don’t try to “solve” the tension between cost and experience. We just acknowledge it and consistently choose to side with the learner.

AI-generated code means learners can produce working software without understanding it. Does that change what execution environments need to do?

Absolutely. It’s a total shift in where the “hard part” of learning actually happens. AI has effectively moved the bottleneck from the act of writing code to the much more difficult task of evaluating it.

Think about it: a learner can prompt their way to a working script in seconds, but that doesn’t mean they’ve internalized how it works. The challenge for them now isn’t syntax; it’s judgment. They have to figure out if the code is actually correct, how it handles edge cases, and—most importantly—how to fix it when it inevitably breaks.

This means our execution environments have to evolve. They can’t just be a “play” button anymore. They need to be sandboxes for inspection and iteration—places where a student can take what the AI gave them, pull it apart, and really reason through it.

It also forces us to design better curriculum. If a lab is just a narrow, single-step problem, the AI solves it instantly and the learning stops. We have to build broader, more realistic workflows that require human decision-making and the ability to navigate ambiguity.

We also have to embrace how people actually work now. The environment should support AI-native workflows—like prompting, tweaking generated code, and rapid experimentation—because that is the reality of modern engineering.

In a strange way, the execution environment is more critical now than it was before AI. It’s no longer just where the code runs; it’s where the learner develops the technical intuition and judgment that an LLM simply can’t provide.

There’s a real debate about whether AI tutors personalize learning or just personalize the pace of the same bad curriculum. Where do you land on that?

I think a lot of what’s called “personalization” in AI tutors today is overstated. Most systems are really personalizing the interaction—they adjust tone, pacing, or how they explain something—but the underlying curriculum is largely unchanged.

So in that sense, it often ends up being a more engaging way to move through the same content, rather than true personalization.

Real personalization in education is much harder. It requires the system to accurately infer what a learner actually understands—not just what they’ve been exposed to—and then adapt what comes next. That means changing the tasks, the level of challenge, and even the sequence of concepts, not just rephrasing explanations.

Then there’s the human side of it: knowing when to help and when to let someone struggle. That “productive struggle” is where the real learning happens, yet most AI today just focuses on being helpful because that’s the easy part to code.

Right now, AI tutors are great at personalizing the surface. But the real breakthrough will come when we can personalize the learning path itself. That’s much harder because it requires a deep, messy integration of technology and actual teaching science.

What did your time in PayPal’s research lab teach you about distributed systems that shaped how you approach education technology?

My time at PayPal shaped how I think about reliability and correctness in distributed systems, and that carried directly into education.

One key lesson is that failure isn’t an edge case—it’s the default. Networks fail, services time out, things degrade. So you design systems to handle that gracefully. In education, that becomes even more critical, because a failure isn’t just a technical issue—it disrupts a learner in the middle of thinking or problem-solving. And in high-stakes scenarios like exams or graded assignments, that failure can have real consequences.

Another is correctness. In payments, you can’t be “mostly right”—even small errors can have direct financial consequences, and they compound quickly at scale. In learning, it’s similar. If the system gives incorrect feedback or behaves inconsistently, it can create misconceptions that are hard to undo.

I also think a lot about safe retries. In payments, retries shouldn’t double-charge. In learning, learners should be able to rerun code, retry exercises, and recover their work without penalty. That requires the system to be stateful and predictable.
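The idempotency-key pattern that prevents double charges maps directly onto graded work. A minimal sketch, with a hypothetical API:

```python
class SubmissionService:
    """Idempotent submission handling: retrying with the same
    idempotency key returns the stored result instead of grading
    (or penalizing) the attempt twice."""

    def __init__(self, grade_fn):
        self.grade_fn = grade_fn
        self.results = {}  # idempotency_key -> graded result

    def submit(self, idempotency_key, answer):
        if idempotency_key in self.results:
            return self.results[idempotency_key]  # replayed retry: no side effects
        result = self.grade_fn(answer)
        self.results[idempotency_key] = result
        return result
```

With this shape, a learner on a flaky connection can resend the same attempt as many times as needed and the system stays predictable: one attempt, one grade, no penalty for the retry.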

More broadly, PayPal instilled the idea that trust is a systems property. Users trust the product because it works reliably under all conditions. In education, that trust is just as important—if the platform fails at the moment a learner is most engaged, or during something like an exam, you don’t just lose a request, you lose trust in the system.

Author

  • Tom Allen

    Founder and Director at The AI Journal. Created this platform with the vision to lead conversations about AI. I am an AI enthusiast.
