It’s been 12 years since Marc Andreessen said software is eating the world. APIs have since eaten software, and data has eaten APIs. Now data is the fuel of business and AI systems, providing value, insight, and innovation, so it’s critical to ensure that your business is running on trustworthy data from verified sources. But that entails building trust in the data supply chain.
Businesses have supply chains for all sorts of things. Software supply chains have gotten a lot of attention due to the Kaseya attack, the SUNBURST attack that affected SolarWinds and others, and the catastrophic bug inside Log4j that opened more than a third of the world’s largest enterprises to cyberattacks overnight.
So, if there’s a supply chain in code, how big is the supply chain in data? Answer: It’s enormous and growing every day due to supply chain evolution and widespread business adoption of AI.
Every time you receive an email or document – or data flows into your software, your database, or your AI models – you need to assess whether you can trust that data or if that data will pose a risk to your organization. That’s making the cyber supply chain harder and harder to manage.
The BBC recently hired 60 people to validate digital content in an effort to maintain trust with its audience in a world in which “audiences are constantly bombarded with mis- and disinformation, and with fake images, including those generated by AI.”
However, not everyone can afford to hire 60 people to look after data integrity in their businesses. Even if they could, hiring people to address data integrity amid the growing mountain of data simply does not scale.
Yet it’s important to take steps to avoid using data if you don’t know where it came from. A look at how a man stole $122 million from Google and Facebook simply by asking for it provides a down-to-earth example of why: there was no sophisticated code hack here. The man used fake invoices to extract payments from these tech-savvy companies.
He eventually got caught and went to prison, but he was successful for a while and could have gotten away with it. Those invoices – electronically sent and digitally stored – are a small example of the data that runs the business today.
This may sound like a one-off, but it is not. Different versions of this kind of event occur hundreds of times a day at law offices, local banks, and other small companies. Hackers break into email and other systems, change numbers in PDFs or other documents, and the money is paid. And this kind of thing is becoming increasingly easy to do.
You can use generative AI to make a fake picture of the Pope, which is clearly fun, but you could also make a convincing fake of some other person or place for more nefarious purposes. You can also use the same techniques to create a convincing invoice from a fake company with just a press of a button.
Clearly, times have changed. The world today is data-driven, multiparty, and highly connected. Things operate at lightning speed, and generative AI gives bad actors the upper hand, letting them generate and fire digital ammunition at a speed and scale we haven’t seen before. And while spending on traditional cybersecurity is at an all-time high, so are supply chain attacks.
So, the traditional, manually intensive IT security approaches that relied on erecting silos and perimeters and on using encryption to keep secrets in and other people out no longer work.
We need a better model of trust – a model that is powered by provenance and authenticity. That way consumers of data within businesses will know if data is trustworthy, and the people and organizations in the data supply chain that produce data can prove that data is trustworthy.
You are likely familiar with authenticity, but you may not know much about data provenance. The basic idea of data provenance is this: don’t use data if you don’t know where it came from.
Leading public and private sector organizations understand the importance of provenance. In fact, provenance is at the center of Microsoft’s AI safety mandate. The National Cyber Security Centre (NCSC) in the U.K. and the National Institute of Standards and Technology (NIST) are concerned with using provenance as a core component of trust. And the Supply Chain Integrity, Transparency and Trust (SCITT) working group within the Internet Engineering Task Force (IETF) has been building an architecture for trustworthy and transparent digital supply chains.
As the draft standards of IETF SCITT lay out, provenance requires three critical things.
One, you need strong identification of the source of data, and you need to know that the data you have received is the same data that was sent. This enables you to look out for data corruption, tampering, and modification and prevents fraud in the data supply chain.
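To make the first requirement concrete, here is a minimal Python sketch of checking both properties – who sent the data and whether it arrived unmodified. It uses a shared-secret HMAC from the standard library purely for illustration; real data supply chains would use asymmetric signatures (e.g. Ed25519) so receivers need only the sender’s public key. The invoice payload and key are invented examples.

```python
import hashlib
import hmac

# Hypothetical shared key; in practice an asymmetric key pair would be used.
SENDER_KEY = b"example-shared-secret"

def sign_data(data: bytes) -> str:
    """Sender attaches a tag binding the data to its identity."""
    return hmac.new(SENDER_KEY, data, hashlib.sha256).hexdigest()

def verify_data(data: bytes, tag: str) -> bool:
    """Receiver checks the data is unmodified and came from the key holder."""
    expected = hmac.new(SENDER_KEY, data, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

invoice = b'{"vendor": "Acme Corp", "amount": 4200}'
tag = sign_data(invoice)

assert verify_data(invoice, tag)        # untouched data passes
tampered = invoice.replace(b"4200", b"9900")
assert not verify_data(tampered, tag)   # a changed number in the invoice fails
```

This is exactly the control missing in the fake-invoice fraud above: a payment system that refuses data whose tag does not verify cannot be fooled by an edited PDF.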
Two, you need immutable provenance records, so organizations in your supply chain can’t rewrite history and take back a statement after they have made it. This is valuable because when something goes wrong, communications between supply chain partners tend to shut down, which makes it hard to fix the problem. But if you have an immutable record, you can keep it and check it forever.
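A common way to get that immutability is a hash chain, where each record’s hash covers the hash of the record before it. The sketch below, with invented record contents, shows why a supply chain partner cannot quietly take back a statement: rewriting any past entry breaks every later hash.

```python
import hashlib
import json

GENESIS = "0" * 64

def record_hash(prev_hash: str, statement: dict) -> str:
    """Each entry's hash covers the previous hash, chaining the history."""
    payload = prev_hash + json.dumps(statement, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

class ProvenanceLog:
    """Append-only log: altering any past entry invalidates the chain."""
    def __init__(self):
        self.entries = []      # list of (statement, hash) pairs
        self.head = GENESIS

    def append(self, statement: dict) -> str:
        self.head = record_hash(self.head, statement)
        self.entries.append((statement, self.head))
        return self.head

    def verify(self) -> bool:
        prev = GENESIS
        for statement, h in self.entries:
            if record_hash(prev, statement) != h:
                return False   # history was rewritten
            prev = h
        return True

log = ProvenanceLog()
log.append({"supplier": "Acme", "artifact": "dataset-v1"})
log.append({"supplier": "Acme", "artifact": "dataset-v2"})
assert log.verify()

# An attempt to rewrite history is detectable forever after.
log.entries[0] = ({"supplier": "Acme", "artifact": "dataset-v1-edited"},
                  log.entries[0][1])
assert not log.verify()
```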
Three, you need to prevent equivocation – the keeping of split histories. You don’t want to get into a situation in which a company tells one thing to one person, such as the organization consuming the data, and another thing to another person, such as its own auditor. In other words, you need to build an appropriate level of transparency for the supply chain by establishing a single source of truth for each important fact or statement.
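A simple way to picture equivocation detection: two honest views of a hash-chained log must agree on every shared head hash, with one view being a prefix of the other. The sketch below (with invented log entries) flags a split history when two views diverge. Production systems do this more efficiently with Merkle-tree consistency proofs, as in Certificate Transparency, rather than recomputing the whole chain.

```python
import hashlib

def chain_heads(statements: list) -> list:
    """Head hash of a hash-chained log after each entry."""
    heads, prev = [], "0" * 64
    for s in statements:
        prev = hashlib.sha256((prev + s).encode()).hexdigest()
        heads.append(prev)
    return heads

def consistent(view_a: list, view_b: list) -> bool:
    """Two views are consistent only if they agree on their common prefix:
    a single source of truth, not a split history."""
    n = min(len(view_a), len(view_b))
    return chain_heads(view_a)[:n] == chain_heads(view_b)[:n]

shared = ["shipment received", "payment authorized"]
auditor_view = shared + ["payment of $10,000 sent"]
customer_view = shared + ["payment of $95,000 sent"]   # equivocation

assert consistent(shared, auditor_view)              # honest extension
assert not consistent(auditor_view, customer_view)   # split history detected
```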
Moving beyond traditional IT security and embracing a data-centric, verify-then-trust model based on transparency is possible today. You can now instantly prove who did what, and when, for any digital asset. With that visibility, you can decide with confidence what data to use.