Driven by changing consumer needs and expectations, the global natural language processing (NLP) market is projected to grow from $20.98 billion in 2021 to $127.26 billion in 2028, a compound annual growth rate (CAGR) of 29.4 percent over the seven-year period.
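
As a quick check on those figures, CAGR is (end value / start value)^(1/n) - 1, where n is the number of annual compounding periods. The short Python sketch that follows applies that formula to the projection quoted here; the `cagr` helper is purely illustrative and not taken from the underlying market report.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / periods) - 1."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


# Market size in billions of US dollars, as quoted in the projection.
start_2021 = 20.98
end_2028 = 127.26
periods = 2028 - 2021  # 2021 -> 2028 spans seven annual growth steps

rate = cagr(start_2021, end_2028, periods)
print(f"CAGR over {periods} years: {rate:.1%}")  # prints: CAGR over 7 years: 29.4%
```

Treating 2021 to 2028 as eight periods instead would give roughly 25.3 percent, so the quoted 29.4 percent corresponds to seven years of compounding.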

A survey carried out in 2022 to gain insight into people’s perceptions of AI voice applications found, among other things, a growing expectation among consumers that they will be greeted by automated chatbots and voice systems when engaging with companies. The recent launch of ChatGPT, which has taken public fascination with AI to another level, is only likely to strengthen that expectation. Companies that get the NLP experience right can improve the customer experience by reducing waiting times, while freeing up their customer service representatives for more complex tasks.

But getting that experience right isn’t necessarily straightforward. Developing AI-powered NLP technologies demands a level of attention to detail and nuance that building other digital products does not. Preventing bias and ensuring accurate voice recognition require organisations to focus on the quantity, quality and diversity of the data used to train the algorithms behind these technologies, and to commit to ongoing testing to catch unforeseen flaws.

Expectations and experiences

According to the survey, about a third of consumers (31%) always expect companies to use automated chatbots, although three in five (61%) said it depends on the industry. And while one in ten respondents (11%) don’t think call centres should use automated interactive voice response (IVR) systems, almost half (46%) said it should be normal practice. Similarly, 44 percent said they always expect mobile apps to offer voice assistants or voice search features, with two in five (41%) saying it depends on the app category.

Despite these expectations, users’ actual encounters with such technology weren’t always satisfactory. The top complaints included being unable to find the answer they were looking for, chatbots failing to understand what they were asking, and chatbots wasting their time before eventually connecting them to an agent anyway.

Many of these complaints may stem from the quality of the data used to train the machine learning (ML) algorithms that power chat functionality, and from the way those algorithms were trained, a process whose complexity is often underestimated.

Expanding the range

When training an ML algorithm, data scientists must incorporate a wide range of data sets and inputs. Depending on the algorithm’s purpose, these can comprise huge numbers of voices, images, sounds or documents. Ultimately, though, the algorithm will only be as good as the data used to train it. Sourcing the high-quality training data needed to meet an organisation’s requirements can be a huge logistical challenge, especially when those requirements involve powering a mass-market, consumer-facing chatbot or automated IVR system.

In the case of voice, for instance, it is crucial to collect data from a large and diverse group of speakers. In-house software engineers, developers and QA specialists often share similar age ranges, locations, genders and socio-economic backgrounds, so bias can creep in, however unintentionally, if they are the sole source of data collection and labelling.

It would be far more productive to expose the algorithm to people and experiences that have more in common with the customers it will eventually serve.
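
Whether a speaker pool resembles the eventual customers can be checked mechanically. The Python sketch that follows is a minimal illustration, assuming each recorded speaker carries a single demographic label (here an age band) and that the organisation has an estimate of its customer mix. The function name `representation_gaps`, the labels, the shares and the 0.5 tolerance are all hypothetical, not figures from the survey or project described in this article.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def representation_gaps(
    speaker_groups: Iterable[str],
    target_shares: Dict[str, float],
    tolerance: float = 0.5,
) -> Dict[str, Tuple[float, float]]:
    """Return {group: (actual_share, target_share)} for every group whose
    share of recorded speakers is below `tolerance` times its share of the
    target customer base (hypothetical threshold)."""
    counts = Counter(speaker_groups)
    total = sum(counts.values())
    gaps: Dict[str, Tuple[float, float]] = {}
    for group, target in target_shares.items():
        # A group with no recorded speakers has an actual share of 0.
        actual = counts.get(group, 0) / total if total else 0.0
        if actual < tolerance * target:
            gaps[group] = (actual, target)
    return gaps


# Hypothetical example: ten in-house speakers, mostly from one age band.
in_house_speakers = ["25-34"] * 8 + ["35-44"] * 2
# Hypothetical customer mix by age band (shares sum to 1.0).
customer_mix = {"18-24": 0.15, "25-34": 0.25, "35-44": 0.25, "45-64": 0.25, "65+": 0.10}

gaps = representation_gaps(in_house_speakers, customer_mix)
for group, (actual, target) in gaps.items():
    print(f"{group}: {actual:.0%} of speakers vs {target:.0%} of customers")
```

In this made-up example the 18-24, 45-64 and 65+ bands are flagged because no in-house speaker falls into them, which is the kind of gap a broader pool of real speakers is meant to close. The same check could be run on accent or region labels instead of age bands.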

Sourcing at scale

Bias isn’t the only issue that can arise from limited training data sets. Speech can be simulated to some degree, but AI needs exposure to a diverse range of real voices and accents to approach anything like accuracy. With more than 40 dialects in the UK, for example, data scientists at a UK broadcaster working on a recent voice assistant project found it almost impossible to replicate that variety of language, speech and intonation in a traditional lab. To address this, the smart voice assistant was trained on 100,000 voice utterances from 972 people across the UK, exposing it to a wide variety of voices and accents.

Superior digital quality does not end with data, however. Testing experiences with real people’s interactions can help unearth unexpected flaws and glitches in the customer experience. A crowdtesting model provides the diversity of voices, devices, locations, cultural nuances and many other demographics and data points that lab-based or in-house testing cannot.

As with any technology that becomes ubiquitous, consumers’ expectations of NLP interactions will keep rising. The companies that succeed most with NLP and voice-activated technology in future will be those that focus uncompromisingly on quality and on delivering excellent customer experiences.

AIJ Guest Post 10 February 2023

Crowdsourcing training data in future-proofing chat applications

By Adonis Celestine, Senior Director, Global Automation Practice Lead at Applause
