It’s looking like 2022 will be a defining year for synthetic data.
With companies keen to press on with innovation in AI networks, against the backdrop of a still-uncertain picture regarding Covid-19, and with changing attitudes towards the way big technology companies handle private information, privacy-compliant, ready-annotated data is going to grow and grow in popularity over the next 12 months.
At Mindtech Global, we’re already starting to see the trends that’ll turn this growing acceptance of synthetic data into a full-blown revolution. I’ve pinpointed three of these trends below.
1. A changing world means updating the data AI networks are built on
Our world changes constantly. Whether that’s new rules on social distancing or a new design on your favourite cereal box—we’re greeted by newness every day.
Beautiful as this all sounds, it’s a nightmare for AI networks and those in charge of training them. Every new addition to a location, and every change to an everyday scenario, quickly dates the AI operating within it.
Developers will learn, going into 2022, that creating a ‘timeless’ prototype is the easy part. The production stage is harder: models require constant maintenance to keep their intelligence relevant, and developers are forced to keep reacting to change.
And with 2022 no doubt set to bring with it more significant changes to everyday life, network developers will need to go out and gather and annotate the photos to plug data gaps—that’s if the photos exist. And if companies don’t mind their highly skilled and sought-after machine learning engineers spending their time on what’s really a sophisticated filing job.
Or, instead, we’ll see companies creating their own synthetic images with real-world likeness and extracting data from them to meet the demands of our perpetually forward-moving world.
2. An appreciation of quality and quantity
As our understanding of what AI networks are capable of continues to grow, so do our expectations of the quality of the work they do.
There’s a raft of legislation on the horizon to govern against so-called ‘faulty networks’—networks that are biased or limited in their necessary understanding of the diversity of the world around them.
Companies will soon be responsible for the consequences of these blind spots. There’s no more just putting an AI solution out in the wild and hoping it works. So I predict companies are going to be more focused on ensuring the robustness of their network before launch.
This will be achieved through more testing of corner and edge cases: those awkward and dangerous situations that, although rare, have the potential to compromise the safety of a location or of individuals.
The rarity and danger of these cases make data on them scarce—a gap that will be plugged in years to come through simulations that produce synthetic data of the same nature.
That’s what our new Smart Home Application Pack aims to achieve. Released at this month’s CES, the pack lets customers developing AI vision systems use our Chameleon platform to quickly build unlimited scenes and scenarios for training Smart Home AIs to improve home security, home automation, and home safety.
3. The age of hyperspeed meets the need for innovation
We’re in the age of hyperspeed. Our expectations of technology, and of life, have increased to the extent that everything needs to be done faster than ever before.
This includes inventing things that have never even existed before. The way our cities, offices, and homes changed through lockdown is an example of the speed at which technology needs to adapt to answer new and unforeseen challenges.
That’s an issue for AI networks reliant on real-world data. Innovation at hyperspeed is an impossibility when it takes a machine learning engineer 20 weeks to gather and annotate the 100,000 real-world images required to train a visual AI system to see and understand the world as a human does.
And that’s assuming the required images exist and are of high enough quality: see Facebook/Meta deleting its database of Faceprints because of regulatory issues. Huge, real-world datasets don’t exactly lend themselves to innovation anyway. Multiple companies using the same datasets drives down the uniqueness of their AI networks, while large corporations restrict access to the data they own because it hands them a competitive advantage.
In 2022, I expect companies that have grown tired of slow-moving projects built on real-world data, and of the scarcity of real-world data itself, to level the playing field themselves with a pivot to synthetic sources.