
Overcoming the challenges of deploying on-device AI

By StJohn Deakins, co-founder, DataSapien

Intelligence no longer dwells only in the cloud. Just as the rise of mobile shifted compute from centralised mainframes to the gadgets in people’s pockets, on-device AI is about to change the face of a newly mainstream technology and usher in a whole new era of innovation.

Apple Intelligence offers a glimpse of what’s ahead, combining on-device models with a cloud fallback system called Private Cloud Compute, which shifts work between local and server runtimes depending on complexity, privacy, and compute needs. The next step is models that are fully device-based.

On-device AI is a clear solution to the accelerating challenges of traditional cloud-dependent GenAI. Small models that run on modern smartphones are certainly not AGI. But neither are cloud LLMs, which have not achieved superintelligence and may not get there anytime soon – if ever, according to some critics of large language models.

But what on-device models can do is perform useful tasks independently, with no need for an always-on cloud connection. This is big news for brands, which currently pay for every AI “action” – Salesforce, for example, charges 10 cents apiece. Personal, on-device AI models can perform similar tasks at no per-action cost.

The dawn of this new paradigm is also great for consumers, who will be able to access deeply personalised experiences like nutrition coaches, financial assistants, automobile navigation systems and much more. These experiences draw on consumers’ own data to enable personalisation without sending it off to the black boxes of distant data centres.

On-device AI – which we’re calling device-native AI – is a hugely exciting new segment that gives developers massive opportunities to deliver new services in new ways. But to get the best out of the technology, several obstacles need to be overcome first.

Siloes and competing giants 

One of the biggest obstacles to developing on-device AI is the emergence of new siloes between ecosystems. Apple, Google, and other platform owners are each building their own frameworks, privacy rules, and hardware accelerators – meaning developers can’t write once and deploy everywhere. Apple’s tight integration of Neural Engine APIs and on-device privacy requirements won’t easily translate to Android’s more open but fragmented AI stack, where Google Tensor and Qualcomm chips each have different capabilities and SDKs.

The result is a fractured landscape: engineers must tailor models, optimise runtimes, and comply with unique data policies for every major platform. Instead of accelerating innovation, this balkanisation of AI standards risks slowing progress, entrenching walled gardens and making life harder for independent developers trying to reach users across devices.  

The answer is to use an orchestration platform that’s data- and model-agnostic, allowing AI workloads to run securely across devices and ecosystems without being locked into Apple’s or Google’s proprietary rules.
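
As a rough sketch of what such a layer can look like, the Kotlin below defines a single TextModel interface with per-platform adapters. CoreMlAdapter and NnapiAdapter are hypothetical names, not real Apple or Google APIs; the point is that orchestration logic is written once against the interface.

```kotlin
// A minimal sketch of a model- and platform-agnostic runtime interface.
// CoreMlAdapter and NnapiAdapter are hypothetical names, not real APIs.
interface TextModel {
    fun generate(prompt: String): String
}

// Each platform supplies its own adapter behind the common interface.
class CoreMlAdapter : TextModel {          // would wrap Apple's on-device runtime
    override fun generate(prompt: String) = "coreml:$prompt"
}

class NnapiAdapter : TextModel {           // would wrap an Android NPU delegate
    override fun generate(prompt: String) = "nnapi:$prompt"
}

// Application code targets the interface, so the same orchestration logic
// can deploy to either ecosystem without rewriting.
fun runAssistant(model: TextModel, query: String): String = model.generate(query)

fun main() {
    val model: TextModel = NnapiAdapter() // chosen per platform at build or run time
    println(runAssistant(model, "Summarise today's schedule"))
}
```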

Infrastructure and tooling gaps 

The most fundamental challenge in device-native AI is the absence of orchestration infrastructure built for on-device deployment. Traditional cloud-era tools were designed for centralised processing, not for the distributed, privacy-sensitive nature of mobile environments.

As a result, developers are forced to repurpose enterprise frameworks that don’t easily support rapid iteration or local model updates. Bridging this gap requires a new generation of orchestration and deployment platforms that handle distributed workloads, local monitoring, and secure model lifecycle management – giving developers the same control and visibility they expect from cloud infrastructure, but directly on the device. 
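
One piece of that lifecycle management can be sketched concretely. The Kotlin below checks an installed model against a version manifest and verifies a download’s checksum before activating it; ModelManifest and the model names are illustrative assumptions, not a real platform API.

```kotlin
import java.security.MessageDigest

// Hypothetical sketch of secure model lifecycle management on device:
// compare the installed model version against a manifest and verify
// downloaded bytes before activating them. Names are illustrative.
data class ModelManifest(val name: String, val version: Int, val sha256: String)

fun sha256Hex(bytes: ByteArray): String =
    MessageDigest.getInstance("SHA-256").digest(bytes)
        .joinToString("") { "%02x".format(it) }

fun needsUpdate(installedVersion: Int, manifest: ModelManifest) =
    manifest.version > installedVersion

// Refuse to activate a model whose checksum does not match the manifest.
fun verifyAndInstall(downloaded: ByteArray, manifest: ModelManifest): Boolean =
    sha256Hex(downloaded) == manifest.sha256

fun main() {
    val payload = "model-weights-v2".toByteArray()
    val manifest = ModelManifest("nutrition-coach", 2, sha256Hex(payload))
    if (needsUpdate(installedVersion = 1, manifest = manifest)) {
        println("update verified: ${verifyAndInstall(payload, manifest)}")
    }
}
```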

Fragmented data flows and complexity 

Building on-device AI can introduce chaos in data flow management. Developers must choreograph deterministic rules, machine learning models and generative components without letting sensitive data leak off the device.  

Fragmentation worsens in multi-tiered intelligence setups, where logic, inference, and generation must operate in sync. Traditional frameworks weren’t built for this kind of task, so the emerging solution is to use a neutral orchestration layer that manages state and coordination across devices and models while keeping all data movement privacy-compliant. 
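
To make that concrete, here is a minimal Kotlin sketch of such a layer: three tiers tried in order – deterministic rules, a local classifier, then a generative fallback – with nothing serialised off the device. The tier implementations are stand-ins for real components.

```kotlin
// A minimal sketch of a neutral orchestration layer coordinating three tiers
// on device. All data stays inside the process; the tiers are placeholders.
sealed interface Outcome
data class Handled(val reply: String) : Outcome
object NotHandled : Outcome

fun deterministicRules(input: String): Outcome =
    if (input.startsWith("/help")) Handled("Here are the available commands...") else NotHandled

fun localClassifier(input: String): Outcome =
    if (input.contains("order", ignoreCase = true)) Handled("Routing to order tracking") else NotHandled

fun generativeFallback(input: String): Handled =
    Handled("Generated answer for: $input") // would call the on-device SLM

fun orchestrate(input: String): String {
    // Try each tier in order; no tier ever sends the input off device.
    for (tier in listOf(::deterministicRules, ::localClassifier)) {
        val result = tier(input)
        if (result is Handled) return result.reply
    }
    return generativeFallback(input).reply
}

fun main() = println(orchestrate("Where is my order?"))
```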

Model size and storage constraints 

Even the latest small language models, such as Gemma 3, Llama variants, and Qwen2.5, strain against smartphone memory limits. Delivering rich, personalised AI often requires several specialist models, which can quickly exhaust device storage. Dynamic model management – compressing, caching, and loading models on demand – addresses this challenge by balancing sophistication with footprint, maintaining seamless experiences without offloading data to the cloud.
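
A minimal Kotlin sketch of the idea, assuming illustrative model sizes and a hypothetical LoadedModel handle: an access-ordered cache that loads specialist models on demand and evicts the least recently used when a storage budget is exceeded.

```kotlin
// Dynamic model management sketch: load on demand, evict least recently used.
class LoadedModel(val name: String, val sizeMb: Int)

class ModelCache(private val budgetMb: Int) {
    // accessOrder = true makes iteration order least- to most-recently used
    private val cache = LinkedHashMap<String, LoadedModel>(16, 0.75f, true)
    private var usedMb = 0

    fun get(name: String, sizeMb: Int): LoadedModel {
        cache[name]?.let { return it } // cache hit, now most recently used
        // Evict least recently used models until the new one fits the budget.
        val iterator = cache.entries.iterator()
        while (usedMb + sizeMb > budgetMb && iterator.hasNext()) {
            usedMb -= iterator.next().value.sizeMb
            iterator.remove()
        }
        return LoadedModel(name, sizeMb).also { cache[name] = it; usedMb += sizeMb }
    }
}

fun main() {
    val cache = ModelCache(budgetMb = 4000)
    cache.get("nutrition-coach", 1800)
    cache.get("finance-assistant", 1800)
    cache.get("navigator", 1500) // evicts nutrition-coach, the least recently used
}
```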

Constrained context windows 

On-device models can’t yet match the vast context windows of their cloud-based counterparts, limiting long-term coherence and memory. Maintaining continuity in conversations or decision-making requires smarter context handling – summarisation, hierarchical memory, and selective retrieval that preserve meaning while minimising data exposure. These techniques allow small models to act as if they have broader awareness, without breaching on-device privacy boundaries. 
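
The Kotlin sketch below illustrates one way this can work: a rolling summary absorbs older turns, and only notes sharing words with the query are retrieved. The one-line summariser and keyword-overlap scoring are naive placeholders for real components.

```kotlin
// Context handling for a small window: rolling summary plus selective retrieval.
class ContextManager(private val maxTurns: Int = 4) {
    private val turns = ArrayDeque<String>()
    private var summary = ""

    private fun summarise(old: String) =
        old.take(40) // placeholder; a real system would use the on-device model

    fun addTurn(turn: String) {
        turns.addLast(turn)
        if (turns.size > maxTurns) {
            // Fold the oldest turn into the running summary instead of dropping it.
            summary = (summary + " " + summarise(turns.removeFirst())).trim()
        }
    }

    // Selective retrieval: only notes sharing words with the query are included.
    fun buildPrompt(query: String, notes: List<String>): String {
        val relevant = notes.filter { note ->
            query.lowercase().split(" ").any { it in note.lowercase() }
        }
        return "Summary: $summary\nNotes: $relevant\nRecent: $turns\nUser: $query"
    }
}

fun main() {
    val ctx = ContextManager()
    listOf("I run 5k daily", "I am vegetarian", "I prefer oats", "Dinner ideas?", "Low sugar please")
        .forEach(ctx::addTurn)
    println(ctx.buildPrompt("vegetarian dinner", notes = listOf("vegetarian recipes file", "tax records")))
}
```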

Resource management and battery optimisation 

AI inference on mobile hardware is a constant balancing act between performance, heat, and battery life. Because conditions vary with user behaviour and device state, developers increasingly rely on adaptive scheduling and workload distribution systems that scale model complexity in real time. This allows AI to respond intelligently to device constraints, maintaining user experience while preventing runaway resource drain. 
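
As an illustration, this Kotlin sketch picks a model tier from battery and thermal readings. DeviceState and the thresholds are assumptions for the example, not a real platform API.

```kotlin
// Adaptive workload scaling sketch: choose model complexity from device state.
enum class Tier { FULL, DISTILLED, RULES_ONLY }

data class DeviceState(val batteryPct: Int, val thermalHeadroom: Double, val charging: Boolean)

fun chooseTier(s: DeviceState): Tier = when {
    s.charging && s.thermalHeadroom > 0.5         -> Tier.FULL       // plugged in and cool
    s.batteryPct > 30 && s.thermalHeadroom > 0.25 -> Tier.DISTILLED  // degrade gracefully
    else                                          -> Tier.RULES_ONLY // protect battery and heat
}

fun main() {
    println(chooseTier(DeviceState(batteryPct = 18, thermalHeadroom = 0.2, charging = false)))
    // -> RULES_ONLY: the scheduler trades model sophistication for battery life
}
```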

Limited device-native tooling 

Conventional mobile dev environments offer few tools for tracing model performance or debugging privacy boundaries. Developers are filling this gap with localised debugging frameworks and portable runtime environments that can monitor inference, manage versioning, and validate compliance directly on the device. These tools bring cloud-grade observability to the edge without compromising the security model. 
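
A minimal Kotlin sketch of that kind of device-local observability: each inference is timed and recorded with its model version and a flag for whether any data crossed the privacy boundary, in a log that never leaves the handset. All names are illustrative.

```kotlin
// Device-local inference tracing sketch; the trace log stays on the handset.
data class InferenceTrace(
    val model: String,
    val version: Int,
    val latencyMs: Long,
    val leftDevice: Boolean // should always be false for device-native flows
)

class LocalTraceLog {
    private val traces = mutableListOf<InferenceTrace>()

    fun <T> traced(model: String, version: Int, block: () -> T): T {
        val start = System.nanoTime()
        val result = block()
        traces += InferenceTrace(model, version, (System.nanoTime() - start) / 1_000_000, false)
        return result
    }

    // A compliance check a debugger could surface: flag any off-device traffic.
    fun violations() = traces.filter { it.leftDevice }
}

fun main() {
    val log = LocalTraceLog()
    val answer = log.traced("qwen2.5-slm", version = 3) { "42" }
    println("$answer, violations=${log.violations().size}")
}
```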

Privacy compliance by design 

Global privacy laws mean compliance can’t be retrofitted – it must be engineered into every data flow. Granular consent controls, data minimisation, and transparent user governance are now embedded into the application layer itself. By managing data contextually and locally, developers can prove compliance across jurisdictions without sending personal data off-device. 
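
To illustrate data minimisation at the application layer, this Kotlin sketch gates profile fields behind a hypothetical consent store, so only fields the user has explicitly consented to ever reach a model.

```kotlin
// Consent-gated data minimisation sketch; field names and store are hypothetical.
class ConsentStore(private val granted: Set<String>) {
    fun allows(field: String) = field in granted
}

// Only consented fields are released to the model; everything else is withheld.
fun minimise(profile: Map<String, String>, consent: ConsentStore): Map<String, String> =
    profile.filterKeys(consent::allows)

fun main() {
    val profile = mapOf("diet" to "vegetarian", "income" to "55000", "steps" to "8000")
    val consent = ConsentStore(granted = setOf("diet", "steps"))
    println(minimise(profile, consent)) // {diet=vegetarian, steps=8000}
}
```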

Brand risk and liability 

As AI agents make autonomous decisions on personal data, brand and legal exposure rise sharply. Bias, hallucination, and misinformation pose reputational hazards, particularly in health or finance. To mitigate these risks, developers are building real-time validation and feedback loops into AI workflows, ensuring every decision is traceable, explainable, and reversible before it reaches the user. 
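
A minimal Kotlin sketch of such a validation loop: every generated answer passes a set of checks, is logged for traceability, and falls back to a safe reply on failure. The checks shown are placeholders for real domain validators.

```kotlin
// Pre-delivery validation loop sketch; the checks are illustrative placeholders.
data class Validated(val text: String, val passed: Boolean, val reason: String?)

val checks: List<Pair<String, (String) -> Boolean>> = listOf(
    "non-empty" to { t -> t.isNotBlank() },
    "no unvetted medical advice" to { t -> "dosage" !in t.lowercase() }
)

fun validate(candidate: String): Validated {
    for ((name, check) in checks) {
        if (!check(candidate)) return Validated(candidate, false, name)
    }
    return Validated(candidate, true, null)
}

fun deliver(candidate: String): String {
    val result = validate(candidate)
    println("audit: passed=${result.passed} reason=${result.reason}") // traceable record
    return if (result.passed) result.text
    else "I can't help with that; please consult a professional."
}

fun main() = println(deliver("Increase the dosage to 400mg"))
```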

Integration complexity 

On-device AI can’t exist in isolation. It must connect with enterprise systems like CRMs and CDPs while preserving local data sovereignty. The answer lies in using open, model-agnostic orchestration layers that synchronise device intelligence with enterprise infrastructure through secure metadata and insight sharing, rather than raw data transfer. This maintains analytical value without compromising privacy or compliance. 
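
As a sketch of insight sharing rather than raw data transfer, the Kotlin below derives an aggregate, shareable insight from purchase events that never leave the device. The payload shape and field names are assumptions for illustration.

```kotlin
// Insight-sharing sketch: raw events stay local; only derived metadata syncs.
data class PurchaseEvent(val category: String, val amountGbp: Double) // never leaves the device

data class Insight(val segment: String, val weeklySpendBand: String)  // safe to share

fun deriveInsight(events: List<PurchaseEvent>): Insight {
    val weekly = events.sumOf { it.amountGbp }
    val band = if (weekly > 100) "high" else "standard"
    val top = events.groupingBy { it.category }.eachCount().maxByOrNull { it.value }?.key ?: "none"
    return Insight(segment = top, weeklySpendBand = band)
}

fun main() {
    val events = listOf(
        PurchaseEvent("groceries", 62.0),
        PurchaseEvent("groceries", 18.5),
        PurchaseEvent("fuel", 45.0)
    )
    // Only this derived object would be sent to the CRM/CDP, not the raw events.
    println(deriveInsight(events))
}
```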

The next evolution of AI won’t live in the cloud. It will live with us, on the devices we carry. By overcoming fragmentation, tooling gaps, and privacy hurdles, developers can create systems that are faster, safer, and more personal than anything built on remote servers. Device-native AI is the foundation of a new, decentralised era of intelligence.  

DataSapien is already offering the tools that let developers overcome these problems and start building for tomorrow. Small models are going to have a huge impact. The new competitive battleground they are creating is not yet dominated by big players like OpenAI, so it offers new opportunities for innovative players as well as brands who will be able to reach their customers in new ways. Developers who overcome the challenges we’ve set out will have a chance to carve out a strong niche in the evolving AI ecosystem.  
