AI use within businesses has reached a turning point. The era of early exploration – fuelled by the boom of ChatGPT and generative AI – is giving way to execution, as companies face mounting pressures to demonstrate how they are putting their vision and ambition into action.
Organisations are now coming to terms with the realisation that adopting AI also requires adapting the systems that came before it. They may be generating more data than ever, but their infrastructures are often ill-equipped to cope with rising data demands – leaving some business-critical insights trapped in silos while creating data sprawl elsewhere.
Inefficiencies in data handling not only impede the efficacy of organisations' AI outputs and create data security blind spots; they also result in significant financial losses. In fact, organisations lose an average of six percent of their global annual revenue to underperforming AI models, which equates to approximately 406 million US dollars.
The pursuit of reliability and control
If organisations want to trust the accuracy of AI outputs, they must first have full confidence in, and control over, how data is fed to models and generative AI tools. This requires reliable and complete datasets, as well as transparent and auditable processes. However, once personal data is input into third-party tools like ChatGPT, control over that data is often lost. The data may become part of the AI tool's knowledge base, and organisations may struggle to satisfy requests to delete or amend data in line with legislation.
To mitigate this, more and more organisations are localising generative AI tools to run on their own data and systems, using an approach called Retrieval-Augmented Generation (RAG). RAG enhances the relevance of AI outputs by overlaying trusted proprietary data onto large language models (LLMs) while keeping internal data private. It improves security by removing the need to retrain models on sensitive information, and it offers an alternative to building end-to-end models, which require significant processing power.
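As a rough illustration of the pattern, the sketch below retrieves a few internal documents and folds them into the prompt before calling the model. The document snippets, the keyword-overlap retriever and the `call_llm` placeholder are hypothetical stand-ins; a production RAG setup would typically use an embedding-based vector store and the organisation's privately hosted LLM.

```python
# Minimal sketch of the RAG pattern: retrieve trusted internal documents,
# then ground the model's answer in them. The retriever is a toy
# keyword-overlap scorer and call_llm is a placeholder for whichever
# model the organisation hosts or licenses.

from typing import List

INTERNAL_DOCS = [
    "Refund requests over 500 GBP require approval from the finance team.",
    "Customer PII must never leave the EU data region.",
    "Support tickets are archived after 24 months.",
]

def retrieve(query: str, docs: List[str], top_k: int = 2) -> List[str]:
    """Rank documents by naive keyword overlap with the query."""
    query_terms = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(query_terms & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def call_llm(prompt: str) -> str:
    """Placeholder for a call to a privately hosted language model."""
    return f"[model response grounded in a prompt of {len(prompt)} characters]"

def answer(query: str) -> str:
    """Build a grounded prompt from retrieved context and query the model."""
    context = "\n".join(retrieve(query, INTERNAL_DOCS))
    prompt = (
        "Answer using only the internal context below.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return call_llm(prompt)

if __name__ == "__main__":
    print(answer("Who approves large refund requests?"))
```

Because the model only ever sees the retrieved snippets at query time, nothing sensitive has to be baked into the model's weights.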
Technical barriers to AI adoption
The potential of RAG has made companies eager to unlock more data – structured, semi-structured and unstructured – from sources such as applications and databases. However, the pipelines carrying this data require frequent maintenance and strict controls, so data teams are often stretched thin and forced into difficult trade-offs to maximise the relevance of the insights.
Custom data pipelines are difficult to maintain because vendors and SaaS providers frequently update their schemas or APIs, forcing data engineers to constantly update pipelines. If updates are missed, pipelines break, requiring even more engineering resources to fix.
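To make that failure mode concrete, the sketch below shows a hypothetical hard-coded field mapping that raises an error the moment an upstream source renames a column. The payloads and field names are invented for illustration; the point is simply that every upstream schema or API change lands on the data engineering team.

```python
# Illustrative sketch of how a custom pipeline breaks on schema drift.
# The expected fields and payloads are hypothetical.

EXPECTED_FIELDS = {"customer_id", "amount", "currency"}

def load_record(raw: dict) -> dict:
    """Validate an incoming record against the fields the pipeline expects."""
    missing = EXPECTED_FIELDS - raw.keys()
    unexpected = raw.keys() - EXPECTED_FIELDS
    if missing:
        # A renamed upstream field (e.g. "amount" -> "amount_minor_units")
        # surfaces here instead of silently corrupting downstream tables.
        raise ValueError(
            f"Schema drift detected: missing {sorted(missing)}, new {sorted(unexpected)}"
        )
    return {field: raw[field] for field in EXPECTED_FIELDS}

# Yesterday's payload loads fine; today's renamed field fails immediately.
load_record({"customer_id": "c-42", "amount": 1999, "currency": "GBP"})
try:
    load_record({"customer_id": "c-42", "amount_minor_units": 1999, "currency": "GBP"})
except ValueError as err:
    print(err)  # someone now has to patch the mapping by hand
```

Multiply that manual patching across dozens of connectors and frequent vendor releases, and the maintenance burden described above becomes clear.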
These technical barriers are also to blame for data scientists spending two-thirds of their time on basic data preparation instead of building ML models. With inefficient data movement processes leading to wasted resources, underutilised data talent and potential business risks, there has never been greater urgency for CIOs to resolve this conundrum.
Making security the top priority
If the task of centralising all relevant data in a clean, fast and secure way weren't hard enough, organisations in highly regulated industries such as finance and healthcare face additional hurdles. They must take extra care when handling sensitive data for AI-driven insights and may find moving data from on-premises systems to the cloud particularly difficult. O'Reilly's Cloud Adoption report shows that 55 percent of enterprises still rely on traditionally managed on-premises systems, and only five percent plan to switch from cloud to on-premises infrastructure.
What’s more, the sensitivity of certain data may necessitate limiting who processes it and where that processing occurs. To get ahead of this challenge, organisations are increasingly adopting a hybrid approach to data management.
With a hybrid deployment, organisations can retain full control over their sensitive data by performing all the processing in their own secure environment, while tasking a data integration partner with maintaining and updating pipelines automatically. This way, enterprises can manage both cloud-based and on-premises pipelines from a single, unified control plane – ensuring compliance without compromising performance. Unlike custom-built data pipelines that lack proper visibility and governance, emerging hybrid models offer a secure and scalable approach to data operations, while freeing up teams to focus on value-added tasks.
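The sketch below illustrates the general idea under stated assumptions: sensitive records are transformed entirely inside the organisation's own environment, and only non-sensitive run metadata is reported to a hosted control plane. The names, fields and reporting function are illustrative, not any particular vendor's API.

```python
# Simplified sketch of the hybrid pattern: the data itself is transformed
# on-premises, and only pipeline metadata (status, row count, schema version)
# leaves the secure environment.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class PipelineRunReport:
    pipeline: str
    rows_processed: int
    schema_version: str
    status: str  # no record-level data is ever included

def run_locally(records: List[Dict]) -> PipelineRunReport:
    """Transform sensitive records inside the organisation's environment."""
    transformed = [{**r, "amount_gbp": r["amount_pence"] / 100} for r in records]
    # In practice, `transformed` is written to the organisation's own warehouse here.
    return PipelineRunReport("payments_daily", len(transformed), "v12", "success")

def report_to_control_plane(report: PipelineRunReport) -> None:
    """Stand-in for a call to the hosted control plane's monitoring endpoint."""
    print(
        f"[control plane] {report.pipeline}: {report.status}, "
        f"{report.rows_processed} rows, schema {report.schema_version}"
    )

report_to_control_plane(run_locally([{"id": 1, "amount_pence": 1250}]))
```

The design choice is simply a separation of concerns: record-level processing stays behind the organisation's firewall, while orchestration, monitoring and connector updates are handled centrally.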
Creating a sustainable and robust data strategy
AI has the potential to vastly improve our lives, and demand is increasing in many industries, chief among them healthcare, financial services and the public sector. In fact, Gartner predicts that by 2026, eight in 10 enterprises will be using generative AI-enabled applications.
For now, however, major concerns around data management remain: data readiness for AI is a growing worry, and CIOs are increasingly focused on building the right data foundation for AI success. If organisations want to make rapid progress on their AI roadmaps, they must put robust data processes in place to ensure trust in the data. This will give them the best chance of minimising data risks and driving sustainable growth.