
The Quiet AI Revolution Happening Inside Your Messaging Apps

While the AI world obsesses over massive models and cloud-scale infrastructure, a quieter revolution is unfolding in your pocket. Valerii Popov, a senior software engineer at WhatsApp, is part of a small but impactful group making AI smarter, faster, and entirely local.

With over a decade of experience across mobile platforms, from early Android work to leading machine-learning deployment at scale, Valerii has helped shape everything from encrypted messaging to intelligent voice features. At WhatsApp, his focus is bringing cutting-edge intelligence directly to users’ devices without sacrificing privacy, speed, or battery life.

We sat down with Valerii to talk about what’s working in on-device AI today, what’s still hard, and where the next wave of innovation will hit.

What’s been the biggest on-device AI win for messaging recently?

“Early wins that really landed with users were on-device translation, both text and voice, and voice-note summarization. Especially in global markets, this is a game changer. Imagine receiving a voice message in a language you don’t speak and getting a concise summary, instantly and offline.”

That’s not all. Valerii points to under-the-hood AI features like local ranking of messages, filtering low-value business content, and even context-aware smart replies. These aren’t flashy, but they work: fast, private, and without pinging a server.

“These features keep latency low, work offline, and avoid sending sensitive content to servers. That’s a trifecta of user value in messaging.”

How do you decide what runs locally versus in the cloud?

It’s a balancing act between privacy, latency, and resource cost.

“If it touches user content and we don’t have clear consent, it stays on-device. Full stop.”

Valerii often starts with a server-based model to test feasibility, then moves components on-device when privacy or real-time performance demands it. For example:

  • Offline needs? Device wins.

  • Battery-intensive task? Push it server-side.

  • Simple classification task? Use a small on-device model; don’t waste a 30B-parameter LLM.

Most production systems at scale end up hybrid: “We often try the model on-device first. If confidence is low, we fall back to the server. A remote config controls this gating logic.”
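In code, that gating logic might look something like the minimal Python sketch below; `local_model`, `server_infer`, and the `remote_config` lookups are hypothetical stand-ins for illustration, not WhatsApp’s actual APIs:

```python
# Hypothetical sketch of on-device-first inference with server fallback.
# All names here are illustrative placeholders, not a real production API.

DEFAULT_THRESHOLD = 0.85  # in practice, tunable via remote config


def classify(message: str, local_model, remote_config, server_infer):
    """Run the small on-device model first; escalate only when unsure."""
    if not remote_config.get("on_device_enabled", True):
        return server_infer(message)  # feature gated off for this device

    label, confidence = local_model.predict(message)
    threshold = remote_config.get("confidence_threshold", DEFAULT_THRESHOLD)

    if confidence >= threshold:
        return label  # fast path: no network round trip, content stays local
    return server_infer(message)  # low confidence: fall back to the server
```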

What’s actually working in reducing model size and power use on mobile in 2025?

“Quantization to INT8 is the default: it gives us 3–4x smaller models and up to 3x speedups with minimal accuracy loss.”
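For readers who want to try this themselves, post-training dynamic quantization in PyTorch is one common way to get the INT8 behavior he describes. The toy model below is purely illustrative; the `quantize_dynamic` call is the standard PyTorch workflow:

```python
import torch
import torch.nn as nn

# Toy classifier standing in for a production model.
model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 4))
model.eval()

# Post-training dynamic quantization: weights stored as INT8, activations
# quantized on the fly. Linear layers typically shrink roughly 4x with a
# small accuracy cost, in line with the numbers quoted above.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(quantized)  # Linear layers are replaced by dynamic quantized variants
```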

Beyond that, Valerii highlights distillation (“teacher-student training is gold for on-device”), operator fusion to reduce memory traffic, and NPU offloading as game changers, when supported.
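As a sketch of what teacher-student training actually optimizes, here is one textbook form of the distillation loss, blending softened teacher targets with the usual cross-entropy; this is a common recipe, not necessarily the exact one his team uses:

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Soften both distributions with a temperature, match them with KL
    divergence, and blend in cross-entropy on the ground-truth labels."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # The T^2 factor keeps the soft-target gradients on the same scale
    # as the hard-label term.
    kd = F.kl_div(soft_student, soft_targets,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```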

Pruning is hit-or-miss: “It saves size, but real-world speedups depend on kernel support. And over-pruning hurts performance, especially on attention-heavy models.”

His rule of thumb? Keep classifiers under 1 MB, speech or ASR models between 5 and 20 MB, and cold-start times under 500 ms.
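Budgets like these are easiest to enforce as automated release gates. Below is a hypothetical check along those lines; the file names, directory layout, and `load_model` callable are all invented for illustration:

```python
import os
import time

# Hypothetical release-gate check encoding the rule-of-thumb budgets above.
SIZE_BUDGET_BYTES = {"classifier.tflite": 1_000_000, "asr.tflite": 20_000_000}
COLD_START_BUDGET_S = 0.5  # 500 ms


def check_budgets(model_dir: str, load_model) -> None:
    for name, budget in SIZE_BUDGET_BYTES.items():
        size = os.path.getsize(os.path.join(model_dir, name))
        assert size <= budget, f"{name} is {size} bytes, over budget {budget}"

    start = time.perf_counter()
    load_model(os.path.join(model_dir, "classifier.tflite"))  # cold start
    elapsed = time.perf_counter() - start
    assert elapsed <= COLD_START_BUDGET_S, f"cold start {elapsed:.3f}s > 500 ms"
```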

Can you really personalize without sending user data to the cloud?

Yes, but it’s tricky. The go-to solution: federated learning.

“A global model gets fine-tuned on your device using local data. Only anonymized weight updates are sent back, never the raw messages.”

In practice, personalization often involves fine-tuning only the final layers of a model or running a few local SGD epochs. For features like speech recognition or smart replies, this adds meaningful value without compromising privacy.
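A minimal sketch of that last-layer recipe follows, assuming a PyTorch `nn.Sequential` whose final module is the classification head; everything here is illustrative, not a production federated system:

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


def local_update(global_model: nn.Sequential, local_batches,
                 lr: float = 0.01, epochs: int = 2):
    """Fine-tune only the final layer on local data; return weight deltas."""
    model = copy.deepcopy(global_model)
    for p in model.parameters():      # freeze everything...
        p.requires_grad_(False)
    head = model[-1]                  # assumes the last module is the head
    for p in head.parameters():       # ...except the final layer
        p.requires_grad_(True)

    optimizer = torch.optim.SGD(head.parameters(), lr=lr)
    for _ in range(epochs):           # a few local SGD epochs
        for x, y in local_batches:
            optimizer.zero_grad()
            F.cross_entropy(model(x), y).backward()
            optimizer.step()

    # Only these parameter deltas leave the device; the raw examples never do.
    return {
        name: p.detach() - p0.detach()
        for (name, p), (_, p0) in zip(
            head.named_parameters(), global_model[-1].named_parameters()
        )
    }
```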

How do you enforce safety and moderation in E2E encrypted environments?

This is where on-device shines. Valerii separates concerns clearly:

  • On-device: Lightweight moderation (e.g., blur NSFW content, flag suspicious links), user-side nudges, local heuristics.

  • Server-side: Network-wide abuse detection, rate limiting, and coordinated spam analysis, done via metadata, not content.

“On-device checks are conservative and reversible; think ‘Are you sure you want to send this?’ prompts. No permanent actions without server-side confirmation.”

Importantly, any local model runners are sandboxed and cryptographically verified before execution to ensure security.
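The verify-before-execute pattern is simple to express. The sketch below checks a downloaded artifact against an expected SHA-256 digest before handing it to a loader; real deployments use full code signing with asymmetric keys, but the shape of the check is the same:

```python
import hashlib
import hmac

# Illustrative integrity check before loading a downloaded model artifact.
# `loader` and the digest source are placeholders, not a real pipeline.


def load_verified_model(path: str, expected_digest: str, loader):
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    # Constant-time comparison avoids leaking information via timing.
    if not hmac.compare_digest(digest, expected_digest):
        raise ValueError("model artifact failed integrity check; refusing to load")
    return loader(path)  # only reached when the artifact matches its manifest
```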

What’s your blueprint for scalable hybrid architecture?

“Local-first inference with server fallback is standard. We use remote config to enable or disable features based on device capabilities.”

iOS and Android bring their own pain points:

  • iOS: More predictable performance; tight Core ML + ANE integration.

  • Android: Device fragmentation, inconsistent NNAPI support, stricter APK size budgets.

Maintaining parity means building from a common intermediate representation like ONNX or Core ML, using device allowlists, and testing latency/battery regressions across the board.
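As one concrete example of the common-IR approach, a trained PyTorch model can be exported once to ONNX and then converted into per-platform artifacts (Core ML on iOS, NNAPI-friendly formats on Android). The model and file names below are placeholders:

```python
import torch
import torch.nn as nn

# Placeholder model; in practice this would be the trained production network.
model = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 2)).eval()
example_input = torch.randn(1, 256)

# Export to ONNX as the shared intermediate representation.
torch.onnx.export(
    model,
    example_input,
    "classifier.onnx",
    input_names=["features"],
    output_names=["logits"],
    dynamic_axes={"features": {0: "batch"}},  # allow variable batch size
)
```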

“Rollout is blocked unless we hit latency and memory targets on both platforms.”

How do you make on-device AI privacy-defensible?

Beyond slogans, Valerii details a rigorous approach:

  • Key management + secure enclaves for sensitive caches.

  • Rollback protection for model integrity.

  • Schema validation + code signing for model runners.

  • Clear user consent flows and opt-outs.

“No shortcuts. Every component is reviewed with a privacy lens.”

So, what’s next in on-device AI for messaging in the next 12–18 months?

Valerii is bullish on small multimodal models, better on-device ASR, and edge summarization.

“We’ll see more intent classification, lightweight agents, and context-aware workflows, all running locally. The user expects intelligence without compromise.”

And maybe, just maybe, we’ll stop calling it “on-device” and just call it “smart.”

Author

  • I am Erika Balla, a technology journalist and content specialist with over 5 years of experience covering advancements in AI, software development, and digital innovation. With a foundation in graphic design and a strong focus on research-driven writing, I create accurate, accessible, and engaging articles that break down complex technical concepts and highlight their real-world impact.

