The most common problems when it comes to data for AI

The most common AI challenge businesses face is simple: companies realise far too late that they do not have the right data foundations to build an AI program that meets their needs. At this point, businesses tend to acknowledge that they have neglected their information management and that the quality and structure of their data, their core asset, are lacking.

Good information management

AI works only when it is built on a solid foundation of good information management. If your organisation has not prioritised the creation of structured information and information management practices that provide context and meaning for your data wherever it is needed, you will create data silos and meaningless data. This siloing – or ‘data disconnections’ – makes it extremely difficult to leverage your data and information for new business use cases, as well as for AI. Without strong information management, you will not only have issues building your training corpus (the dataset required to train machine learning models with the desired behaviour), but you will also hit issues with explainability, feedback loops, performance optimisation, and the efficient ongoing support of a live production system.

The physical separation of data is not a problem – in fact, distributed data solutions have numerous potential benefits. However, regardless of where the 0s and 1s are stored, they must live together conceptually: this is the importance of data governance and coordinated management. Only well-structured and well-described data can carry context and meaning across an enterprise. ‘Data portability’ breaks down existing information silos and prevents the creation of new ones.

Domain modelling

The best way to understand how your company perceives its current data assets is to create a shared, live map of them. As a starting point, I recommend Domain Modelling, which creates a strong foundation for all projects and supports portable structured data – essential for AI implementations. Domain Modelling is part of a broader practice of Domain-Driven Design that focuses on the structure and language of the business domain and helps organisations understand the variety and shape of their data.
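As a rough illustration of what a shared map can look like once it is written down, here is a minimal sketch in Python. The publishing-style entities (Article, Author, Topic) and all class and method names are hypothetical, chosen only for this example; a real domain model would come out of workshops with your own teams.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Entity:
    """A business concept, described in the language the business uses."""
    name: str
    description: str
    attributes: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Relationship:
    """A named link between two entities in the model."""
    subject: str
    predicate: str
    obj: str


@dataclass
class DomainModel:
    entities: Dict[str, Entity] = field(default_factory=dict)
    relationships: List[Relationship] = field(default_factory=list)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        # Only allow links between concepts that are already on the map.
        for name in (subject, obj):
            if name not in self.entities:
                raise KeyError(f"unknown entity: {name}")
        self.relationships.append(Relationship(subject, predicate, obj))

    def undescribed(self) -> List[str]:
        # Entities nobody has agreed a description for yet.
        return sorted(
            name for name, entity in self.entities.items()
            if not entity.description.strip()
        )


# Hypothetical example domain, for illustration only.
model = DomainModel()
model.add_entity(Entity("Article", "A published piece of editorial content",
                        ("headline", "published_at")))
model.add_entity(Entity("Author", "A person credited with writing an Article",
                        ("name",)))
model.add_entity(Entity("Topic", ""))  # no agreed description yet
model.relate("Article", "written_by", "Author")
model.relate("Article", "about", "Topic")

print(model.undescribed())
```

In this sketch, `undescribed()` flags `Topic` as a concept that still lacks an agreed description, which is the kind of gap the modelling exercise is meant to surface.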

Different employees will often have very different perceptions of the business domain model, so involve them in this process. Do a little digging and find out what they think. Trust me, the value locked in their heads is a significant part of your competitive advantage. In essence, this is a de-risking exercise that helps a business understand the gaps in its shared understanding and build a new shared map of its core business information. Identify together where your data could be better structured to provide value to new services, new experiments and new product ranges.

To support more efficient and continuous innovation, you should challenge your teams to articulate how existing processes should change. If your company wants to implement any kind of AI program, it’s likely that you’ll want to focus on the immediate changes and capabilities required, but looking ahead to the future and creating strong information management foundations will add real value.

As a result of this exercise, you gain a better understanding of your data landscape: where perspectives on the domain differ, and which areas can be improved. Business value can then be aligned with the work that drives data improvement, whether that is sourcing missing data, aligning existing data to a common structure, or simply improving data quality.

Knowledge graphs

To paint a clear picture: organisations that have a strong information management foundation are in a position to launch new products and AI-powered capabilities rapidly, enjoyably, and with relatively small investments. Conversely, organisations with disconnected and context-less information silos will find it expensive, painful and slow to deliver new products and features. To make things worse, they will tend to create expensive new legacy support overheads with every move.

Many organisations are now considering the role of a Knowledge Graph – effectively a backbone of information adhering to the structure described in the domain model – as a foundation for their information management strategy. A knowledge graph allows the glueing together of business information with context and meaning, allowing both machines and humans to understand and efficiently use the data in a plethora of contexts.
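To make the idea concrete, the sketch below stores facts as subject-predicate-object triples and only accepts predicates that the domain model defines. It is a toy in-memory illustration, not a production graph store; the identifiers (`article:42`, `person:jane`, `topic:ai`), the predicate names and the `KnowledgeGraph` class are all assumptions made for this example.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class KnowledgeGraph:
    """A tiny in-memory triple store constrained by a set of allowed predicates."""

    def __init__(self, allowed_predicates: Iterable[str]) -> None:
        self.allowed: Set[str] = set(allowed_predicates)
        self.triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        # Reject facts that do not fit the structure of the domain model.
        if predicate not in self.allowed:
            raise ValueError(f"predicate not in domain model: {predicate}")
        self.triples.add(Triple(subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> List[Triple]:
        # None acts as a wildcard for that position.
        return sorted(
            t for t in self.triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


# Hypothetical data, for illustration only.
graph = KnowledgeGraph(allowed_predicates={"written_by", "about"})
graph.add("article:42", "written_by", "person:jane")
graph.add("article:42", "about", "topic:ai")
graph.add("article:43", "about", "topic:ai")

# Which articles are about AI?
ai_articles = [t.subject for t in graph.match(predicate="about", obj="topic:ai")]
print(ai_articles)
```

Because every fact carries its relationship name, the same data can answer different questions (who wrote an article, what a topic covers) without a new silo being built for each use case.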

The deeper we delve into these principles, the more clearly we see how valuable it is to be able to use and access data in a way that is contextual and meaningful. This facilitates the creation of an information management strategy that covers how data is created, organised, described, and harnessed.

AI programs that add real commercial value require a combination of experienced people, reliable engineering and tools, and good information management practices. By getting it right, you can maximise your chances of success in the marketplace and generate new revenue and value.

Author

Matt Shearer

Matt Shearer is CPO at Data Language, a UK-based data science consultancy and solution provider that has worked with organisations including News UK, Jamie Oliver and Cochrane to deliver AI and knowledge graph-based digital transformation solutions. Prior to working with Data Language, he was Head of BBC News Labs, where he delivered BBC News' R&D and innovation strategy and the #newsHACK innovation event programme and turned BBC News Labs into an active innovation incubator. He holds a Bachelor of Science degree in Biology from University College London.
