
The Future of AI is Local

By Butian Li, CEO of Bless

While AI tools frequently provide rapid and useful responses to our queries, we’ve all seen ChatGPT or Claude get stuck. Sometimes the answers aren’t as detailed or accurate as we want, and sometimes they don’t come at all: we sit staring at the loading dots and eventually refresh the page.

We know that AI can get smarter. But fixing the lag that interrupts our use of these tools may prove more challenging.

Indeed, the next frontier in AI is the shift from cloud computing to local-first computing. For us to create lightning-fast virtual assistants (perhaps appearing vividly through augmented reality), we’ll need to reduce the distance between ourselves and the machines that run the compute. Right now, we’re seeing performance and latency issues because of the massive distances involved in cloud inferencing and round-trip data transmission. Beyond user experience problems, the current model also consumes large amounts of energy and presents risks around privacy, security and centralized data custody. Users are left with a sluggish experience and little control.

Edge computing, which brings the computational workloads needed for AI closer to end users, has become a necessity for real-world applications like autonomous vehicles, personal agents, robotics and more. Some commentators have described the race to refine and scale edge computing architectures as a new gold rush, with the edge AI market projected to become larger than the current AI-centric cloud computing industry. 

The good news is that personal devices are becoming increasingly powerful. The iPhone 16, for example, has a 16-core Neural Engine purpose-built for AI computation, capable of handling the kinds of inference tasks that once required a server.

Of course, we won’t see servers replaced anytime soon — by nature, servers have fewer physical constraints than consumer devices, and are more capable of running the most popular systems like large language models (LLMs). Edge devices must prioritize resource efficiency, and so are a better fit for small language models (SLMs), the slimmed-down counterparts of LLMs. Lightweight SLMs are already available, such as Google’s Gemma and Microsoft’s Phi. 
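To make this concrete, here is a minimal sketch of on-device inference, assuming the open-source Hugging Face transformers library and the publicly released microsoft/phi-2 checkpoint (any comparable SLM would work the same way). Once the weights are downloaded, nothing leaves the device:

    # Minimal sketch: running a small language model entirely on-device.
    # Assumes the Hugging Face `transformers` library and the public
    # microsoft/phi-2 checkpoint (~2.7B parameters); after the one-time
    # download, inference involves no network round trip at all.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "microsoft/phi-2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)  # weights stay local

    prompt = "Explain why edge inference reduces latency:"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))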

LLMs are designed to support an enormous variety of uses, but most applications we use today are designed to perform specific tasks. From a practical perspective, optimizing SLMs to excel in specific areas could save tremendous resources, both during training and when running applications.

Single-device inferencing is sufficient for lightweight models with millions to a few billion parameters. But to unleash the true power of edge inferencing, which involves complex tasks and continuous updates, edge networks could interoperate with decentralized compute systems to tackle more intensive workloads.

By distributing computation across many machines, edge networks could handle a far broader range of AI tasks. Real-time, lightweight experiences, like personal AI assistants and robotics, would be processed locally on a single device, while more complex, resource-intensive workloads could be offloaded to nearby powerful hardware or a network of collaborating machines.

This distributed approach addresses key challenges such as network congestion, limited device capability, and availability. Instead of relying solely on centralized cloud servers, a local-first model enables parallel processing across multiple devices, dramatically improving compute speed, efficiency, and responsiveness.
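As a rough illustration of that routing logic, here is a toy Python sketch. Everything in it (run_locally, send_to_peer, PEER_URLS, the task object) is a hypothetical stand-in for whatever runtime and peer discovery a real edge network would provide, not an actual API:

    # Sketch of local-first routing: keep lightweight tasks on-device and
    # fan heavy work out to nearby peers in parallel. All names below
    # (run_locally, send_to_peer, PEER_URLS, task.split/merge) are
    # illustrative placeholders, not a real edge-network API.
    from concurrent.futures import ThreadPoolExecutor

    LOCAL_BUDGET_PARAMS = 3_000_000_000  # rough ceiling this device can serve
    PEER_URLS = ["http://peer-a.local", "http://peer-b.local"]

    def run_locally(task): ...         # on-device SLM inference, as sketched above
    def send_to_peer(url, shard): ...  # e.g. POST a work shard to a nearby node

    def dispatch(task):
        if task.model_params <= LOCAL_BUDGET_PARAMS:
            return run_locally(task)            # lightweight: stay on-device
        shards = task.split(len(PEER_URLS))     # heavyweight: split the work
        with ThreadPoolExecutor() as pool:      # process shards in parallel
            results = list(pool.map(send_to_peer, PEER_URLS, shards))
        return task.merge(results)              # combine partial results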

But how do we aggregate insights across a decentralized network without requiring centralized processing? A unified edge architecture could allow peer nodes to share insights in real time, ensuring both the AI models and data remain up to date. Regular model updates would be pushed to edge devices, creating an adaptive AI ecosystem that evolves based on collective learning rather than relying on a single point of control.
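One well-established pattern for this kind of collective learning is federated averaging, in which peers train locally and then merge only their model weights, never raw data. Whether any particular edge network adopts exactly this scheme is an open design question; here is a toy sketch in plain NumPy, with three simulated peers standing in for real devices:

    # Toy sketch of decentralized model averaging, federated-averaging style.
    # Each peer trains on its own data, then peers exchange and average
    # weights, so no central server ever sees the underlying user data.
    # The random arrays below simulate three peers' local weight updates.
    import numpy as np

    rng = np.random.default_rng(0)
    peer_weights = [rng.normal(size=4) for _ in range(3)]

    def aggregate(weight_sets):
        # Elementwise mean of peers' weights: the collective model update.
        return np.mean(weight_sets, axis=0)

    merged = aggregate(peer_weights)
    print("update pushed back to every edge device:", merged)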

Swarm intelligence, where multiple AI agents collaborate autonomously, becomes viable in this framework. Coordinating agents across different devices, ownership structures, and security permissions will require robust trust protocols and decentralized verification mechanisms. When implemented effectively, however, this approach allows for real-time decision-making without manual intervention. In a drone swarm, for example, if one drone detects an obstacle, the entire fleet can adjust its flight path within milliseconds, demonstrating the potential for autonomous, large-scale coordination.
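The coordination pattern itself is simple to sketch; the hard parts are the trust and verification layers around it. Here is a toy Python illustration, with made-up Drone and broadcast objects, of one detection triggering a fleet-wide replan:

    # Toy sketch of swarm coordination: one agent detects an obstacle,
    # broadcasts it, and every agent replans locally. No central
    # controller is involved; all names here are illustrative.
    class Drone:
        def __init__(self, name, heading):
            self.name, self.heading = name, heading

        def on_obstacle(self, position):
            self.heading = (self.heading + 15) % 360  # naive avoidance turn
            print(f"{self.name}: obstacle at {position}, new heading {self.heading}")

    swarm = [Drone(f"drone-{i}", heading=90) for i in range(4)]

    def broadcast_obstacle(swarm, position):
        for drone in swarm:  # stand-in for a real-time mesh broadcast
            drone.on_obstacle(position)

    broadcast_obstacle(swarm, position=(12.5, 48.1))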

Looking ahead, edge-network inferencing unlocks a scalable path for real-time AI, removing the limitations of local devices. It paves the way for intelligent, autonomous systems that can operate efficiently, securely, and with minimal latency. By shifting AI closer to users, we move toward a future where AI is not only more responsive but also more privacy-preserving, energy-efficient, and resilient, enabling the next generation of intelligent, purpose-built applications.

________

About Butian Li

Butian Li is the CEO of Bless, which is pioneering the world’s first shared computer: a decentralized network where everyday consumer devices power the internet. Under his leadership, Bless is reshaping internet infrastructure by enabling laptops, smartphones, and tablets to contribute their compute power collectively.

Bless’ first-generation product, Tap Compute, allows users to seamlessly share their device’s computing resources through a web browser, supporting AI inference, data processing, and web hosting. By decentralizing these essential services, Bless is shifting control away from large corporations with massive data centers and back into the hands of everyday people.
