Christopher Weiss, a trilingual actor with a nuanced tenor, used to read newspapers for the visually impaired. In 2012, he was replaced by a rudimentary text-to-speech robot voice. While it didn’t sound particularly good, it was enough.

“The subscribers were able to listen to the whole newspaper instead of a curated version I provided,” recalls Weiss. “The advantage was there and the quality wasn’t that important.”

One of the few voice actors who have embraced AI, Weiss believes it doesn’t make sense financially to have humans read simple texts.

“The quality of untrained text-to-speech generators in English, as well as Finnish and Swedish, which are my main languages, is so good that it sounds like a human voice to an untrained ear,” he said.

Weiss recently recorded a radio commercial using a voice clone powered by text-to-speech (TTS) technology. Unlike “speech to speech” (STS), where an initial audio recording is used as a base, TTS transforms written text directly into speech.

Weiss also uses STS to refine his voice to a more professional quality, surpassing the limitations of the recording environment. He is currently dubbing multiple characters using STS for a movie project.

Inflection and tonality: 12 months to AI disruption

AI is inspiring a new generation of creative work, such as Bioadapted. This “theater documentary” by Tjasa Ferme, an artistic director who merges science with storytelling, is based on real-life stories of AI stripping actors of their identity.

“With no legislation, there’s a market for exploitation, especially for small background actors,” Ferme said. “Some have already signed away the rights to their likeness, and their digital replicas will be filling the backgrounds in movies.”

Although AI voices sound just fine for news and simple texts, they tend to underperform with complex content, such as audiobooks.

“The intonation, tonality, and inflection are completely off in audiobook narration,” Ferme said. “One voice actor can provide personalities for all characters, and AI could never replicate this world of emotions.”

Machines were made to exclude “the probability of unpredictability and eliminate mistakes,” Ferme said. “What makes us humans is the irrational,” she added. “Robots don’t understand the nuances of subtext. They can’t ‘read in between the lines’.”

Despite limitations, a breakthrough might be imminent. According to Weiss, within the next 12 months, AI will be able to change inflection and tonality in a sentence.

Dubbing: The last Rubicon

The dubbing of movies and ads is one of the most difficult areas for AI. Non-factual content requires emotional range, tonality, inflection, and a variety of voices, which AI can’t yet provide.

Dubbing tools, however, are becoming more sophisticated, explained Heidi Boisvert, Assistant Professor of AI and the Arts at the University of Florida.

“Software is going to be increasingly explicit, posing real threats to certain jobs,” Boisvert said. “However, humans will never be excluded from the process completely.”

This sentiment is echoed by software developers. Anton Dvorkovich, founder of Dubformer, an AI dubbing platform, is convinced that actors and sound engineers will remain in demand.

“Voiceovers in movies immerse viewers in a particular atmosphere, evoking their emotions and experiences through sound,” he said. “It’s a creative process.” Crucially, he argued, AI will never stop making mistakes.

“Only humans can guarantee 100% quality,” said Dvorkovich, whose startup is now using a voiceover method that combines the creativity of actors and AI. Some voiceover professionals are also experimenting with AI to enhance their abilities.

Christopher Weiss records studio-quality commercials on his iPhone using STS. “I can add a character, change my age, accent or sex. I can do pretty much any voice needed by recording a quick audio clip on my phone at a traffic light.”

Weiss believes that AI will have a serious impact on the number of voice actors needed for dubbing and other projects. But it will also open up opportunities for people who are adept in both the creative and technological aspects of voice acting.

What’s next for technology?

As AI continues to advance, it will be able to duplicate humans in the broadcast media industry, a prospect that could be just three to five years away, said Dustin Gallegos, founder of Kmeleon, a Miami-based generative AI consultancy.

So to what degree is AI-produced content going to be in demand? Gallegos envisions a world in which there will be both human and AI actors, “including licensed digital replicas like George Clooney-2050.” “Probably, pricing will also play a role, with a subscription to an ‘AI Netflix’ costing a fraction of a traditional streaming service,” he said.

As AI-generated content becomes almost indistinguishable from human-made content, human talent and voice will become a more valuable commodity. It will also be harder for us to detect the difference, for a number of reasons, one of which is what is known as “generational shift.” As we age, our own sensory modalities become increasingly limited. “When we can’t make sense of something, our brain is filling the gaps,” said Boisvert.

For that reason, older audiences will perceive AI-generated content as human-made. Meanwhile, younger generations will simply have different tastes, with lower expectations for naturalness.

“Now, if a movie doesn’t sound human-like, people don’t want to see it,” said Gallegos. “But, eventually, everyone will get used to AI-generated content.”

The researchers are sure about one thing: some limitations will never be broken. Boisvert highlights a fundamental aspect of human creativity: our bodies are in “constant negotiation with the environment” – what she describes as a “multi-sensory engagement” that allows us to constantly create a response. The upshot? AI might remain forever “retarded” in the broadcast media.

Author

Victoria Zavyalova

Victoria Zavyalova is a media professional and freelance journalist, currently the publisher of The Vertical, a publication dedicated to international tech entrepreneurship. She has worked with both artists and tech companies, bringing out the best of both worlds. With over 15 years of experience in media across the U.S. and Europe, Victoria mentors and advises VC funds, startups, and innovative organizations.


Victoria Zavyalova 29 May 2024
