Intelligence at the edge
In the not-so-distant past, AI lived almost exclusively in the cloud. Large-scale inference was the domain of hyperscalers and GPU-rich data centres. But in 2025, that paradigm is dissolving. From Google’s Gemini Nano on smartphones to Meta’s LLaMA models running on Raspberry Pi-class boards, artificial intelligence is going local – embedded, offline, and edge-first.
This is not just a cost or privacy optimisation. It’s a tectonic shift in application architecture. Lightweight inference runtimes such as llama.cpp and its underlying GGML tensor library, together with compact architectures like Mamba variants, are turning even modest silicon (think Apple’s Neural Engine, Qualcomm’s Hexagon DSPs, or NVIDIA’s Jetson Nano) into real-time reasoning machines. But while models have leapt forward, the infrastructure underpinning them has barely budged.
Enter the embedded database. Or rather, re-enter it, now facing questions it was never built to answer.
From cloud-bound to context-aware: LLMs at the edge
For decades, edge computing meant fixed-function systems: IoT devices sending sensor data back to the cloud. But with transformer models shrinking through quantisation and architectural distillation, these devices are now capable of contextual understanding and reasoning locally.
Examples abound:
- Gemini Nano powers summarisation and smart reply entirely on Android devices
- Alpaca and LLaMA derivatives are running interactively on laptops and single-board computers
- PrivateGPT enables retrieval-augmented generation without ever sending tokens to external APIs.
Developers now seek agentic behaviour on-device: models that interpret local data, recall user-specific memory, and respond fluidly – all without an internet connection. This requires persistent, evolving knowledge bases that live beside the model. And it’s here that traditional data infrastructure is showing its age.
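To make that concrete, here is a minimal sketch of on-device retrieval in Python. The embed() function is a deliberate placeholder (a hash-seeded random vector, not a real model); in practice it would be replaced by a locally served embedding model, and the in-memory list by a persistent store.

```python
# Minimal on-device retrieval sketch: embeddings and search never leave the device.
# embed() is a stand-in for a local embedding model; it only illustrates the flow.
import hashlib
import numpy as np

DIM = 64

def embed(text: str) -> np.ndarray:
    """Placeholder embedding: hash-seeded random unit vector, not semantically meaningful."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    vec = np.random.default_rng(seed).standard_normal(DIM)
    return vec / np.linalg.norm(vec)

# Local "memory": user notes that never leave the device.
notes = [
    "Meeting with the landlord moved to Thursday at 10am",
    "Prefer decaf coffee after 3pm",
    "Wi-Fi password for the cabin is in the red notebook",
]
memory = [(note, embed(note)) for note in notes]

def recall(query: str, k: int = 2) -> list[str]:
    """Return the k notes most similar to the query by cosine similarity."""
    q = embed(query)
    ranked = sorted(memory, key=lambda item: float(q @ item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

print(recall("when am I seeing the landlord?"))
```

Even at this toy scale, the open question is where those (note, embedding) pairs live between sessions, and that is exactly where traditional infrastructure starts to creak.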
Where legacy embedded databases fall short
Embedded databases like SQLite, Berkeley DB, or LevelDB are marvels of compact engineering. But they were never built for the demands of intelligent, schema-evolving, real-time applications.
Key shortcomings include:
- rigid schemas: updating schemas dynamically (e.g. as LLMs evolve representations) is non-trivial or unsupported
- lack of vector support: most cannot store, index, or search vector embeddings natively – making retrieval-augmented generation (RAG) hard or hacky
- no temporal context: these databases aren’t designed to act as memory – tracking evolving states, agents, or conversational threads
- weak concurrency and streaming: handling multiple reasoning threads, event streams, or multi-modal data (e.g. image + text) at once is challenging.
As a result, developers often bolt together multiple systems: a relational core, a key-value cache, an in-memory vector store. This incurs latency, complexity, and fragility – not ideal at the constrained edge.
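A caricature of that bolted-together stack makes the fragility obvious: three stores, one logical record, and all the consistency work left to application code. The snippet below is illustrative only.

```python
# A caricature of the bolted-together edge stack: a relational core, a
# key-value cache, and an in-memory "vector store", stitched together by hand.
import sqlite3
import numpy as np

conn = sqlite3.connect(":memory:")          # relational core
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")

kv_cache: dict[str, str] = {}               # key-value cache
vectors: list[tuple[int, np.ndarray]] = []  # in-memory vector store

def add_note(note_id: int, body: str, embedding: np.ndarray) -> None:
    # Every write touches three systems; a crash between any two of them
    # leaves the stores out of sync, and nothing here reconciles them.
    conn.execute("INSERT INTO notes VALUES (?, ?)", (note_id, body))
    kv_cache[f"note:{note_id}"] = body
    vectors.append((note_id, embedding))
    conn.commit()

add_note(1, "Meeting moved to Thursday", np.ones(64) / 8.0)
```

Nothing here is wrong, exactly; it is simply three consistency models and three failure modes on a device that can afford one.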
What a modern edge-ready database must support
To support AI-native applications at the edge, a new generation of databases must go beyond storage. They must become cognitive substrates – the memory, context, and decision fabric for autonomous systems.
Key requirements include:
- Multi-model flexibility. Edge agents may work with structured tabular data, unstructured text, hierarchical documents, time-series metrics, and vector embeddings. A modern database must support multi-model storage and querying without complex glue code.
- Native vector support. The future is retrieval-augmented. Whether for summarisation, personalisation, or local knowledge grounding, edge systems need persistent vector memory with fast approximate nearest-neighbour (ANN) search and efficient updates (see the sketch after this list).
- Dynamic schemas and metadata. Applications evolve. So should their schemas. Systems must support on-the-fly schema changes, flexible metadata, and introspective capabilities – enabling autonomous agents to extend their world models without downtime.
- Low latency, high concurrency. Edge inference is reactive. Queries must resolve in milliseconds. Support for concurrent, non-blocking operations, in-memory caching, and direct model-data affinity is crucial.
- Security at the core. Edge deployments are inherently exposed. Databases need zero-trust authentication, encryption at rest and in transit, and fine-grained access control – especially when handling user memory or private embeddings.
- Local-first, cloud-optional. Cloud synchronisation is useful – but optional. The database should function fully offline, with eventual consistency or cloud merge when reconnected. Think Git for memory, not just Dropbox for data.
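One way to read this list is as an API surface. The sketch below is purely hypothetical (EdgeStore is not an existing library); it only shows the shape of interface the first two requirements imply: one write path for structured fields and embeddings, no fixed schema to migrate, and similarity search built in.

```python
# Hypothetical interface sketch for the requirements above. EdgeStore is not
# a real library; it is a way of making the wish list concrete.
from dataclasses import dataclass, field
from typing import Any

@dataclass
class EdgeStore:
    """Illustrative facade over documents, flexible metadata, and vectors."""
    docs: dict[str, dict[str, Any]] = field(default_factory=dict)
    vecs: dict[str, list[float]] = field(default_factory=dict)

    def put(self, key: str, doc: dict[str, Any],
            embedding: list[float] | None = None) -> None:
        # Multi-model: structured fields and an optional embedding in one call,
        # with no fixed schema to migrate.
        self.docs[key] = doc
        if embedding is not None:
            self.vecs[key] = embedding

    def nearest(self, query: list[float], k: int = 3) -> list[str]:
        # Native vector support: brute force here; a real engine would use an ANN index.
        def score(key: str) -> float:
            return sum(a * b for a, b in zip(query, self.vecs[key]))
        return sorted(self.vecs, key=score, reverse=True)[:k]

store = EdgeStore()
store.put("pref:coffee", {"user": "a", "value": "decaf after 3pm"},
          embedding=[0.1, 0.9, 0.0])
print(store.nearest([0.0, 1.0, 0.0], k=1))
```

A production engine would replace the brute-force scan with an ANN index and layer on the latency, security, and local-first synchronisation properties from the rest of the list.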
The database as a decision engine
As LLMs move to the edge, we must rethink the role of the database entirely. It’s no longer just a store for facts – it’s the stateful memory and decision engine of autonomous applications.
In this emerging view:
- The database maintains agentic state – tracking goals, actions, environment, and interactions.
- It hosts semantic context – embedding similarity, user preferences, local observations.
- It guides reasoning – providing structured grounding for inferences, summaries, or responses.
- It supports collaboration – syncing across multiple devices, agents, or even users in mesh networks.
This turns the database into a co-pilot, not just a bookkeeper.
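To ground the idea, and using plain SQLite purely for familiarity, agentic state of this kind might be laid out roughly as below. The table and column names are invented for this sketch, not a proposed standard.

```python
# Illustrative agentic-state layout in SQLite; the schema is invented for
# this sketch and persists goals, observations, and actions locally.
import sqlite3

conn = sqlite3.connect("agent_memory.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS goals (
    id INTEGER PRIMARY KEY,
    description TEXT NOT NULL,
    status TEXT DEFAULT 'open'          -- open / done / abandoned
);
CREATE TABLE IF NOT EXISTS observations (
    id INTEGER PRIMARY KEY,
    observed_at TEXT DEFAULT CURRENT_TIMESTAMP,
    source TEXT,                        -- sensor, user, another agent
    body TEXT NOT NULL,
    embedding BLOB                      -- vector memory stored alongside
);
CREATE TABLE IF NOT EXISTS actions (
    id INTEGER PRIMARY KEY,
    goal_id INTEGER REFERENCES goals(id),
    taken_at TEXT DEFAULT CURRENT_TIMESTAMP,
    description TEXT NOT NULL,
    outcome TEXT
);
""")

conn.execute("INSERT INTO goals (description) VALUES (?)",
             ("Keep the greenhouse between 18 and 24 degrees",))
conn.commit()
```

The point is not this particular schema but the pattern: goals, observations, actions, and embeddings living in one local store that the model can query and update as it reasons.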
As companies like Open Interpreter, Personal.ai, and Ollama explore multi-agent frameworks, the demand for shared, contextual, low-latency memory becomes paramount. The edge will not be passive. It will reason, adapt, and act – with the database at the core.
Rethinking the stack
The AI edge revolution is not just about smaller models. It’s about re-architecting the entire intelligence stack – from models to memory, runtimes to storage.
The database must evolve accordingly. It must:
- Speak vectors and documents natively
- Adapt its shape on the fly
- Serve agents in real time
- Run on anything from a phone to a cloud cluster
- Guard data as fiercely as it serves it.
In this sense, we do not just need a new database for the edge – we need a new category. One that blends database, vector store, knowledge graph, and stateful memory into a unified, dynamic system.