
Google's recent decision to cut the num=100 search parameter from Google Search might have seemed like a fairly innocuous change, but from the perspective of AI developers it was a crushing move. It has severely hobbled their ability to use the public internet as a source of training data.
AI systems are extremely reliant on indexed search results, and by slashing visibility to just 10 results per request instead of 100, Google has made it almost impossible for them to carry out a deeper analysis of the web. It's a decision that worsens the already crippling shortage of high-quality data for AI training.
Developers are struggling with limits imposed on the publicly available content they can access. This was highlighted in a 2024 research paper by Data Provenance, which analyzed 14,000 websites to gain insight into the restrictions imposed to limit access by AI models. The authors concluded that over the preceding 12 months there had been a "rapid crescendo of data restrictions", with content creators actively limiting the ability of algorithms to access their web pages.
Much of what's left of the public internet has already been scraped repeatedly and fed into today's top AI models anyway, and the lack of new data is forcing developers to rethink their AI training strategies. Some have switched to using condensed, domain-specific datasets and a more focused approach that involves training models to do one thing specifically, such as maths or image creation, as opposed to building large general-purpose models. It's a logical solution. A model with a narrower purpose can be trained on a smaller dataset, so developers won't need nearly as much data.
Data Tokens Are The Answer
The question is, how can we source these narrow datasets and get them into the hands of developers? This is where blockchain comes in. It allows anyone to upload their data to a distributed network and create digital tokens that represent ownership of it. The blockchain facilitates seamless transfer of those tokens from creators to developers, and also supports "fractional" ownership, where dozens of researchers can buy fractions of tokens to access data collectively, lowering their costs.
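To make the fractional ownership idea concrete, here is a minimal Python sketch. It is an in-memory stand-in rather than a real smart contract, and every name in it (`DatasetToken`, `buy_fraction`, the price unit) is a hypothetical chosen for illustration. It only shows how the cost of a dataset token might be divided among several buyers.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class DatasetToken:
    """Hypothetical in-memory stand-in for an on-chain dataset token."""

    dataset_id: str
    price: int  # total price in the smallest currency unit (an assumption)
    shares: dict = field(default_factory=dict)  # buyer -> Fraction owned

    def buy_fraction(self, buyer: str, fraction: Fraction) -> int:
        """Record a fractional purchase and return what that share costs."""
        sold = sum(self.shares.values(), Fraction(0))
        if fraction <= 0 or sold + fraction > 1:
            raise ValueError("fraction must be positive and still unsold")
        self.shares[buyer] = self.shares.get(buyer, Fraction(0)) + fraction
        # Truncates to a whole unit; a real system would need a rounding policy.
        return int(self.price * fraction)


# Twelve research labs split one token equally.
token = DatasetToken(dataset_id="crop-images", price=1200)
costs = [token.buy_fraction(f"lab-{i}", Fraction(1, 12)) for i in range(12)]
```

In this sketch each lab pays one twelfth of the full price, and any further purchase is rejected once the whole token has been sold.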
Tokenized datasets offer significant advantages. Not only are they easily divisible and tradable, they're transparent and publicly verifiable too. Smart contracts can be used to create and enforce revenue streams for data creators, ensuring they're fairly rewarded. Blockchain-based interactions are driven by supply and demand, which means the richest and most unique datasets will command greater value. In other words, tokenized data can be transformed into a new, investable asset class that drives a fresh wave of innovation in AI.
The beauty of this concept is that everyone can contribute. For instance, an agricultural model that's designed to detect crop disease could be trained on thousands of images supplied by farmers who snap photos of diseased crops using their smartphones. Alternatively, healthcare organizations can donate anonymized images of medical scans to support the development of diagnostic AI models.
With blockchain we can support two key functions that are necessary for decentralized data markets to thrive: verification and reputation. The value of data depends on how accurate and reliable it is, and the transparency of blockchain makes it possible for anyone to verify its origins and quality. Users who submit lots of good-quality data will build their reputations over time because every dataset they create can be traced back to them. Communities can play a part too, with individuals reviewing datasets for quality and earning rewards based on their honesty.
Cryptocurrency is borderless, which means tokenized data can be inclusive of contributors from anywhere in the world. This isn't possible with fiat, where high fees and a lack of infrastructure make it impossible for many to participate. So long as someone has a smartphone with an internet connection, they can send and receive micropayments instantly, without a bank account, giving everyone the opportunity to participate in the data economy. The result is more diverse datasets that originate from every corner of the globe, reducing bias in AI outputs.
Rewarding The Biggest Contributors
With crypto-based payments and foolproof verification, we have the foundation in place to create thriving decentralized data markets that will operate according to standard supply and demand principles.
Consider the example of an agricultural AI model that can detect crop disease. A farmer from Malawi can play a role in its development by uploading photos of an infected maize crop. The farmer's images will be tokenized and verified, contributing to a global AI data supply chain that's coordinated by cryptographic protocols and community governance. The quality of those images will determine their value, and consequently, the size of the rewards the farmer earns. When the AI model accesses those images to process a prompt, the interaction will be recorded on the blockchain and smart contracts will automatically send a micropayment to the farmer. The more often that model is used, the more it will query that dataset, increasing the rewards the farmer can earn.
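The pay-per-query flow described above can be sketched in a few lines of Python. This is a simplified illustration, not an actual smart contract: the `UsageLedger` class, the fixed fee per query, and the rule of splitting that fee in proportion to a quality score are all assumptions made for the example.

```python
from collections import defaultdict


class UsageLedger:
    """Hypothetical stand-in for a contract that pays contributors per query."""

    def __init__(self, fee_per_query: int):
        self.fee_per_query = fee_per_query  # smallest currency unit (assumed)
        self.quality = {}                   # contributor -> quality score
        self.balances = defaultdict(int)    # contributor -> rewards earned
        self.log = []                       # append-only record of queries

    def register(self, contributor: str, quality_score: int) -> None:
        """Register verified data along with the quality score it was given."""
        if quality_score <= 0:
            raise ValueError("quality score must be positive")
        self.quality[contributor] = quality_score

    def record_query(self, model_id: str) -> int:
        """Log one model query and split the fee by quality score."""
        total = sum(self.quality.values())
        if total == 0:
            raise ValueError("no registered contributors")
        paid = 0
        for contributor, score in self.quality.items():
            payment = self.fee_per_query * score // total  # integer share
            self.balances[contributor] += payment
            paid += payment
        self.log.append((model_id, paid))
        return paid


ledger = UsageLedger(fee_per_query=100)
ledger.register("farmer_malawi", quality_score=3)
ledger.register("farmer_other", quality_score=1)
for _ in range(2):
    ledger.record_query("crop-disease-model")
```

Under these assumptions, the contributor with the higher quality score earns three quarters of each fee, and both balances grow with every query the model makes.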
For AI startups, this is beneficial because they won't have to pay enormous amounts of cash upfront to access training data. Instead, they'll pay as they grow, once the revenue starts flowing in. It's easy to envisage how this ecosystem might expand organically over time. The rewards for providing data will attract contributors seeking an income. They'll compete to provide higher quality data, increasing the volume available. This will entice more developers who are hungry for data to increase the sophistication of their models. As adoption of those models increases, so does the value flowing back to the data providers, making data provision even more lucrative.
A Data Economy For All
The AI industry is growing like wildfire and the effects of the data shortage are already being felt, with websites limiting access and content creators firing off lawsuits against transgressors like there's no tomorrow. Developers are desperate for an alternative to scraping the web, and decentralized data markets are an enticing solution.
Blockchain likely won't be the only fix to AI's data conundrum. There's merit to other ideas around synthetic data and the creation of data consortiums, where companies share private data with their peers under strict limits on how it can be used. But decentralized data economies are perhaps the most romantic and practical, with their potential to utilize existing infrastructure and incentivize everyone to participate in the AI revolution.



