The pace of change brought about by the internet has upended the business models of publishers. Some have adapted quickly to the digital world, but others are struggling to keep up with digital-led publishing processes and the transition from a document-centric landscape to a data-driven one. Publishers are aware of the challenges they face in a disruptive and competitive environment, but many do not know how to unlock the value of their content and data.
The value of good content classification
Many are looking to data science and contemporary AI for answers. One powerful and successful use of AI in publishing is content classification, which applies subject metadata – descriptive information detailing what the content is about – to content automatically.
Subject metadata can provide useful business intelligence by giving a business better visibility of how its content performs against core business “subject” terms. Combined with user analytics, well-executed subject tagging supports accurate decisions about when and what to publish, opens up new opportunities for discovery, targeted advertising and content recommendations, and improves search engine optimisation (SEO). Importantly, content classification makes innovation more agile: existing content is easier and faster to adapt to new purposes when it is described and organised, and classification reduces duplication of effort as content moves through publishing workflows.
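To make “content performance against subject terms” concrete, here is a minimal Python sketch that rolls page views up by subject tag. The record layout and the sample figures are hypothetical, invented purely for illustration; in practice the views would come from whatever analytics platform the publisher already uses.

```python
from collections import defaultdict

# Hypothetical sample records: each article carries its subject tags and a
# page-view count from analytics. The numbers are illustrative only.
articles = [
    {"id": "a1", "tags": ["climate", "energy"], "views": 1200},
    {"id": "a2", "tags": ["energy"], "views": 300},
    {"id": "a3", "tags": ["climate", "policy"], "views": 800},
]


def views_by_subject(records):
    """Count articles and sum page views for each subject tag."""
    totals = defaultdict(lambda: {"articles": 0, "views": 0})
    for record in records:
        for tag in record["tags"]:
            totals[tag]["articles"] += 1
            totals[tag]["views"] += record["views"]
    return dict(totals)


if __name__ == "__main__":
    report = views_by_subject(articles)
    # Rank subjects by total views to see which topics perform best.
    for tag, stats in sorted(report.items(), key=lambda kv: kv[1]["views"], reverse=True):
        average = stats["views"] / stats["articles"]
        print(f"{tag}: {stats['articles']} articles, {stats['views']} views, "
              f"{average:.0f} views per article")
```

Because an article tagged with several subjects counts towards each of them, the per-subject totals overlap and should not be added together into a site-wide figure.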
How AI auto-tagging transforms publishing
Automated classification software lets publishers categorise any piece of content according to their subject metadata. Using AI to classify content automatically, or to “auto-suggest” the right terms to content editors, can greatly streamline content production. It also offers a route to classify archive content automatically and to re-classify it whenever the subject metadata is updated, so that a business can view its archive knowledge assets through the same lens as its new content.
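As a rough illustration of “auto-suggest”, the sketch below ranks concepts from a small controlled vocabulary by how often their labels appear in a piece of text. It is deliberately simple dictionary matching, not the machine-learning approach a commercial auto-tagging service would use; the vocabulary, the concept IDs and the `suggest_terms` function are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary: concept ID -> labels (preferred label plus synonyms).
VOCABULARY = {
    "subj:renewable-energy": ["renewable energy", "solar power", "wind power"],
    "subj:climate-policy": ["climate policy", "carbon tax", "emissions trading"],
    "subj:electric-vehicles": ["electric vehicles", "ev charging"],
}


def suggest_terms(text, vocabulary=VOCABULARY, top_k=3):
    """Suggest up to top_k concept IDs, ranked by how often their labels occur in text."""
    # Lower-case and collapse punctuation so labels match regardless of formatting.
    normalised = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    scores = Counter()
    for concept, labels in vocabulary.items():
        for label in labels:
            # Whole-phrase match on word boundaries, so "ev" cannot match inside "every".
            pattern = r"\b" + re.escape(label) + r"\b"
            hits = len(re.findall(pattern, normalised))
            if hits:
                scores[concept] += hits
    return [concept for concept, _ in scores.most_common(top_k)]


if __name__ == "__main__":
    print(suggest_terms("Solar power and wind power are growing fast, while a carbon tax is debated."))
```

An editor would accept or reject the suggestions; re-running the same function over archive content after the vocabulary changes corresponds to the re-classification step described above.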
Once a publisher has assembled enough examples of best-practice tagging, the automated classification software can learn to emulate that behaviour, in both accuracy and style. At that point, the process can be fully automated. This speeds up publishing and takes a repetitive job away from time-poor content creators and experts. By automating tagging, publishers can dramatically reduce human error, increase consistency, and free up resources to focus on core business processes. Downstream, this enables better user journeys for consumers, better content analytics, and faster, more effective innovation.
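One classic way for software to learn tagging behaviour from editor-labelled examples is a multinomial naive Bayes classifier. The sketch below is a minimal, single-label version with add-one smoothing, offered only to show the idea of learning from examples; production auto-tagging is usually multi-label and uses stronger models. The class name, method names and training sentences are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTagger:
    """Minimal multinomial naive Bayes that learns one tag per document from tagged examples."""

    def __init__(self):
        self.doc_counts = Counter()              # tag -> number of training documents
        self.word_counts = defaultdict(Counter)  # tag -> word -> occurrences
        self.vocab = set()                       # every word seen in training

    def learn(self, text, tag):
        """Add one editor-tagged example; counts are updated incrementally."""
        tokens = tokenize(text)
        self.doc_counts[tag] += 1
        self.word_counts[tag].update(tokens)
        self.vocab.update(tokens)

    def predict(self, text):
        """Return the tag with the highest log posterior (None if nothing has been learned)."""
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        best_tag, best_score = None, -math.inf
        for tag, n_docs in self.doc_counts.items():
            counts = self.word_counts[tag]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokens:
                if token in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[token] + 1) / denom)  # add-one smoothing
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


if __name__ == "__main__":
    tagger = NaiveBayesTagger()
    tagger.learn("Central bank raises interest rates to curb inflation", "economy")
    tagger.learn("Inflation and unemployment figures worry markets", "economy")
    tagger.learn("Striker scores twice as the club wins the cup final", "sport")
    tagger.learn("Coach praises players after league victory", "sport")
    print(tagger.predict("Markets react as inflation rises again"))
```

Because `learn` only updates counts, newly approved editor examples can be folded in without retraining from scratch.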
Knowledge graphs are revolutionising the classic “tagging” paradigm
Tagging content has been a digital publishing staple for over two decades, helping publishers organise content, perform rudimentary content analytics, and deliver tag-based user journeys and content aggregations. In many cases, publishers use the tagging features built into their standard CMSs. WordPress and other widely used CMSs have a rudimentary, taxonomy-oriented tagging capability that lets a publisher quickly generate aggregations and some level of meaningful interconnection. Simple tag aggregations do bring SEO and discovery gains, but this approach dates from before 2010 and falls well short of what publishers need today.
To innovate rapidly and use AI to gain market advantage, publishers now need to make their subject metadata available across the entire enterprise, beyond the CMS, and to expose the meaning of that metadata, and the relationships within it, to both people and machines. Without this, tagging tends to deliver little value and, as a result, is neglected. Symptoms include disambiguation problems, duplication, inconsistency, poor user journeys for audiences and content creators, and, most importantly, a disconnect between business intelligence, audience analytics and commissioning intelligence.
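A knowledge graph differs from a flat tag list in that each concept has a stable identifier, several labels and explicit relationships. The sketch below is loosely modelled on the W3C SKOS vocabulary (preferred and alternative labels, “broader” links, scope notes). It shows how the graph helps with the disambiguation problems mentioned above by surfacing every concept behind an ambiguous label, and how a specific tag can roll up into broader subjects. The `kg:` identifiers, the `Concept` and `SubjectGraph` classes and the sample concepts are hypothetical; a real deployment would typically sit on an RDF store or graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    concept_id: str
    pref_label: str
    alt_labels: list = field(default_factory=list)  # synonyms and alternative names
    broader: list = field(default_factory=list)     # IDs of broader concepts
    note: str = ""                                  # scope note to help humans disambiguate


class SubjectGraph:
    def __init__(self, concepts):
        self.concepts = {c.concept_id: c for c in concepts}

    def lookup(self, label):
        """Return every concept whose preferred or alternative label matches (case-insensitive)."""
        key = label.lower()
        return [c for c in self.concepts.values()
                if key == c.pref_label.lower() or key in (a.lower() for a in c.alt_labels)]

    def ancestors(self, concept_id):
        """Follow 'broader' links transitively so a tag can roll up into wider subjects."""
        seen, stack = [], list(self.concepts[concept_id].broader)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.append(parent)
                stack.extend(self.concepts[parent].broader)
        return seen


# Hypothetical sample graph with one deliberately ambiguous label ("Jaguar").
graph = SubjectGraph([
    Concept("kg:animals", "Animals"),
    Concept("kg:big-cats", "Big cats", broader=["kg:animals"]),
    Concept("kg:jaguar-animal", "Jaguar", ["Panthera onca"], ["kg:big-cats"], "The wild cat"),
    Concept("kg:automotive", "Automotive industry"),
    Concept("kg:jaguar-cars", "Jaguar Cars", ["Jaguar"], ["kg:automotive"], "The car manufacturer"),
])

if __name__ == "__main__":
    for concept in graph.lookup("jaguar"):
        print(concept.concept_id, "-", concept.note)
    print(graph.ancestors("kg:jaguar-animal"))
```

Because both people and software see the same identifiers and scope notes, an editor (or a tagging model) can pick the intended “Jaguar” explicitly instead of attaching an ambiguous text tag.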
Automated content classification has been available for some time, in the form of NLP, machine learning and off-the-shelf products, yet publishers trying to put it to work still struggle with the same issues: scalability, training overheads, accuracy, aboutness and consistency.
Shaping the future of publishing
Some large publishers have tasked data scientists with building their own content classification systems. However, building a robust and scalable AI-driven auto-tagging platform requires extensive software engineering, which is not always part of the core “data scientist” skill set. Publishers should focus their machine learning investment on core differentiators rather than on automated classification, which is already commoditised and available as SaaS. Partnerships are often seen as a good compromise in such situations, but the end product of such a partnership needs to differentiate the brand, and AI-driven content classification fundamentally does not.
With the publishing sector financially stretched, total cost of ownership and product differentiation are more important than ever. To drive down total cost of ownership and make innovation more efficient, publishers should first implement a knowledge graph to manage their subject metadata and make it available across the enterprise. Second, they should buy in a highly scalable AI auto-tagging service that applies metadata to new content sent to it for prediction. Integrated with the CMS and publishing stack, such a service can learn continuously, on the fly, and react quickly to new concepts as they arise in the ever-changing world of publishing.
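Reacting to new concepts can be approached in many ways. As one hypothetical sketch, the function below flags two-word phrases that recur across recent articles but are not yet covered by any label in the subject vocabulary, so that a taxonomist can decide whether to add them as new concepts. The stopword list, the threshold and the `candidate_concepts` function are assumptions for illustration, not features of any particular auto-tagging product.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "for", "is", "are", "as", "with", "new"}


def candidate_concepts(recent_texts, known_labels, min_count=2):
    """Flag frequent two-word phrases in recent content that no existing label covers.

    Returns (phrase, document_count) pairs for a human to review; nothing is
    added to the vocabulary automatically.
    """
    known = {label.lower() for label in known_labels}
    counts = Counter()
    for text in recent_texts:
        words = re.findall(r"[a-z0-9]+", text.lower())
        phrases_in_doc = set()
        for first, second in zip(words, words[1:]):
            if first in STOPWORDS or second in STOPWORDS:
                continue  # skip bigrams that start or end with a function word
            phrase = f"{first} {second}"
            if phrase not in known:
                phrases_in_doc.add(phrase)
        counts.update(phrases_in_doc)  # document frequency: each phrase counted once per text
    return [(phrase, n) for phrase, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    recent = [
        "Regulators question generative AI tools in newsrooms",
        "Publishers test generative AI for summaries",
        "Generative AI raises copyright questions for publishers",
    ]
    print(candidate_concepts(recent, known_labels=["copyright", "publishing"]))
```

Counting documents rather than raw occurrences keeps one repetitive article from dominating the list, which matters when the goal is spotting topics that are spreading across the content stream.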