
The car key has been quietly disappearing for two decades. The metal blade gave way to the remote fob, the fob gave way to the phone in your pocket, and now the car recognizes you before you touch anything. A camera on the exterior reads a gesture, identifies a face, and decides whether to unlock. No fob, no app tap, no contact at all. Production vehicles across Asia and Europe already do it.
That turns a plain exterior camera into something more interesting than a parking aid. The same lens that watches for an authorized gesture can load your driver profile as you walk up, then stand guard over the car while it’s parked.
But here’s the part that trips up most engineering teams. Getting a model to recognize a face or a hand in a lab is the easy part. Doing it on an automotive-grade processor, inside a wake-up latency budget, a power budget, a privacy boundary, and a formal development process, is where viability gets decided.
Vehicle access is becoming an edge AI problem
Three functions are converging on the exterior camera. Gesture access: a touchless wave to open a door or tailgate. Driver identification by face, which loads a personal profile (seat, mirrors, climate, driving mode) before the driver sits down. And parked-vehicle monitoring, a sentry mode that watches the surroundings while the car sleeps.
All three sit on the critical path of a physical interaction. A door that opens half a second late feels broken. A face check that stalls waiting for a server is worse than no face check at all.
Cloud processing isn’t inherently bad here. Plenty of automotive features depend on it. But for access functions, latency, privacy, and power push you toward on-device processing. The decision has to happen locally, fast, and without streaming who-stood-near-your-car to a remote service.
Why lab-grade computer vision doesn’t transfer to automotive hardware
Teams that have shipped a vision demo on a workstation tend to underestimate the gap to a production vehicle. Three barriers come up again and again.
Off-the-shelf vision models rarely scale to automotive processors without serious custom work. A model that tops the benchmark on still images can fall apart on a live camera stream, and porting it to an embedded inference engine is its own project.
An always-on exterior camera lives inside a tight power budget. Draw too much current in standby and you flatten the 12 V battery. Pick the wrong wake-up path and you’re looking at a late hardware redesign.
ASPICE process requirements can’t be bolted on at the end. Requirements traceability, design specs, and test procedures have to be planned from the architecture stage, or the project carries debt that surfaces the moment an OEM asks for evidence.
None of these is a model-accuracy problem. They’re systems-engineering problems, and they decide whether a prototype ever reaches a vehicle.
One camera platform, three operating modes
A reference build shows how these constraints collide better than any slide. As an internal R&D project, Promwad’s automotive vision engineering team built a reusable AI camera platform for exterior vehicle access, to check whether all three functions could run on a single automotive-grade chip. The build ran on an Ambarella CVflow System-on-Chip, picked because one part handles both image processing and AI inference, which suits edge AI in automotive camera systems. Other platforms can do this too. The engineering logic carries over; the silicon brand doesn’t.
The platform runs three modes. Gesture access for touchless entry. Driver face ID to trigger profile loading over the vehicle bus. A Sentry Vision mode for parked-vehicle surveillance. Mode switching happens in firmware on the same hardware, so one validated platform gets reused across vehicle programs instead of being respun for each one.
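A minimal sketch of that firmware-level mode switch, written in Python for readability. It is not the platform's code: the class, method, and mode names are hypothetical, the pipelines are placeholders, and starting in sentry mode is an assumption.

```python
from enum import Enum, auto


class Mode(Enum):
    GESTURE_ACCESS = auto()
    DRIVER_FACE_ID = auto()
    SENTRY_VISION = auto()


class CameraPlatform:
    """Hypothetical mode dispatcher: one frame source, one pipeline per mode."""

    def __init__(self, initial_mode=Mode.SENTRY_VISION):
        # Assumption: the camera starts in sentry mode while parked.
        self.mode = initial_mode
        # Each mode maps to its own pipeline; the hardware path is shared.
        self._pipelines = {
            Mode.GESTURE_ACCESS: self._gesture_pipeline,
            Mode.DRIVER_FACE_ID: self._face_id_pipeline,
            Mode.SENTRY_VISION: self._sentry_pipeline,
        }

    def set_mode(self, mode):
        if mode not in self._pipelines:
            raise ValueError(f"unsupported mode: {mode!r}")
        self.mode = mode

    def process(self, frame):
        # Dispatch the frame to whichever pipeline the current mode selects.
        return self._pipelines[self.mode](frame)

    # Placeholder pipelines: real ones would run inference on the SoC.
    def _gesture_pipeline(self, frame):
        return {"mode": "gesture_access", "frame": frame}

    def _face_id_pipeline(self, frame):
        return {"mode": "driver_face_id", "frame": frame}

    def _sentry_pipeline(self, frame):
        return {"mode": "sentry_vision", "frame": frame}


platform = CameraPlatform()
platform.set_mode(Mode.GESTURE_ACCESS)
result = platform.process("frame-0")
```

Only the dispatch table changes per mode. The capture path and the hardware stay the same, which is what lets one validated platform carry across vehicle programs.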
Latency and wake-up time are product requirements, not benchmarks
For a feature you trigger while standing at the door, response time is the experience. In the tested prototype configuration, the system captured its first video frame within 100 ms of a CAN wake-up signal and returned a gesture decision within 120 ms. Single-frame inference ran at roughly 7 ms.
Those numbers matter because the person at the car has no patience for a system that “thinks.” Respond the instant they gesture and it feels effortless. Lag, and it feels unreliable, which defeats the whole point of access control. These are prototype figures under defined test conditions, not a production guarantee. What they show is that the latency budget was a design input from day one, which is how it has to work.
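One way to treat those figures as a design input is to check them like a budget. The sketch below rests on an assumption the text does not settle: that both the 100 ms and the 120 ms limits are measured from the same CAN wake-up signal. The function names and the bench-log samples are hypothetical.

```python
def headroom_ms(first_frame_ms, inference_ms, deadline_ms):
    """Time left before the decision deadline after one inference pass.

    Assumes every value is measured from the same wake-up signal.
    """
    return deadline_ms - (first_frame_ms + inference_ms)


def check_samples(samples, first_frame_limit_ms=100.0, decision_limit_ms=120.0):
    """Return the samples that break either limit.

    Each sample is a (first_frame_ms, decision_ms) pair from a bench log.
    """
    return [
        (first_frame, decision)
        for first_frame, decision in samples
        if first_frame > first_frame_limit_ms or decision > decision_limit_ms
    ]


# Worst case from the quoted figures: first frame at 100 ms, ~7 ms inference.
worst_case_headroom = headroom_ms(100.0, 7.0, 120.0)  # 13.0 ms

# Hypothetical bench log; the second sample misses the decision limit.
violations = check_samples([(92.0, 111.0), (98.0, 126.0)])
```

Under that assumption, a first frame arriving right at the limit leaves about 13 ms after one inference pass for everything else: pre-processing, the decision logic, and the message back onto the bus.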
Face recognition must be private by architecture
Biometric access raises an obvious question: where does the face data go? Here, nowhere. All processing runs on-device, and no biometric data leaves the vehicle. That’s privacy by design in the literal sense: the architecture, not a policy document, keeps the data local.
On accuracy, some restraint is warranted. In the prototype setup, face verification cleared 99% for two enrolled users under defined test conditions. That’s a real result. It doesn’t promise the same number across every user, lighting condition, market, or vehicle, and validation across a real enrolled population is separate work. On-device processing supports GDPR compliance without guaranteeing it. Final alignment still depends on retention policy, user consent, biometric data handling, and the OEM’s deployment context.
Power budget decides whether the system can stay ready
A sentry that flattens the battery overnight isn’t a feature. While parked, the platform sleeps at under 100 µA and wakes on motion instead of recording around the clock. When something moves, it classifies the scene (a passerby versus someone inspecting the car), pushes a notification with a short clip to the owner’s app, and goes back to sleep.
The payoff cuts two ways. Motion-triggered capture keeps the always-ready function inside a viable power envelope, and skipping continuous recording shrinks stored data. Less to store, and less to worry about on privacy.
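The power envelope is easy to rough out with arithmetic. The sketch below treats the 100 µA sleep figure as an upper bound. The active current and the awake time per day are hypothetical placeholders, not measurements from the platform.

```python
SECONDS_PER_DAY = 86_400


def average_current_ma(sleep_ua, active_ma, active_s_per_day):
    """Duty-cycle-weighted average current in mA."""
    duty = active_s_per_day / SECONDS_PER_DAY
    return (sleep_ua / 1000.0) * (1.0 - duty) + active_ma * duty


def drain_mah(avg_ma, days):
    """Charge drawn from the battery over the given number of days, in mAh."""
    return avg_ma * 24.0 * days


# Sleep floor only: 100 uA for 30 days (upper bound from the text).
sleep_only = drain_mah(average_current_ma(100.0, 0.0, 0.0), 30)

# Hypothetical: 500 mA while awake, 10 minutes of wake events per day.
with_wakeups = drain_mah(average_current_ma(100.0, 500.0, 600.0), 30)
```

The sleep floor alone comes to roughly 72 mAh over 30 days, a small draw on a typical 12 V battery. In the hypothetical second case the awake time dominates the total, which is why the wake-up path and the motion trigger matter more than shaving the sleep current further.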
Production readiness starts before the software does
The least glamorous part of this work is often the most decisive. The platform was built with ASPICE CL2 process readiness baked in from the architecture stage: requirements traceability, design specs, and test procedures written in parallel with the engineering, not reconstructed after the fact.
For an OEM or Tier 1, that’s the difference between a demo and a candidate for a vehicle program. The work shows up with a documented integration path instead of a folder of clever code no auditor can trace. Process readiness isn’t paperwork for its own sake. It’s the connective tissue between a working prototype and a part a carmaker can actually ship.
What this means for OEMs and Tier 1 suppliers
For a team launching an exterior vision program, the value here is a validated base architecture, not a generic AI demo. Models are already ported to the target SoC (two real toolchain incompatibilities got fixed along the way), the system architecture has been exercised, and the ASPICE groundwork is in place. You start from a working base backed by test results instead of a blank repository, which de-risks the early phase and trims the configuration work that usually delays readiness.
The architecture scales inside existing ECU boundaries: more gestures, more registered users, and integration with vehicle or fleet management apps, all without a hardware redesign. As a planning reference, the build targets an estimated hardware cost under $80 at 10K units and under $49 at 1M+. Treat that as a cost model to design against, not a guaranteed price. Single-chip integration pays off here too: fewer parts and one ECU make certification simpler. This kind of embedded AI and automotive camera development comes down to getting an entire stack to agree with itself.
The real shift: cameras as software-defined edge nodes
The exterior camera is quietly turning into a software-defined edge AI node, one piece of hardware whose job is redefined in firmware as access controller, personalization sensor, and parked-vehicle guard. The teams that win this transition won’t be the ones with the highest benchmark accuracy. They’ll be the ones who treat the model, the embedded hardware, the power management, the firmware, and the process readiness as one system, designed together. The model is the easy part. Everything around it is the engineering.