
AI companies often promise safety, but is that claim holding up? Dmitrii Volkov, Research Lead at Palisade, is one of the few people putting those claims to the test. Under his leadership, Palisade runs open, reproducible experiments that expose how today’s most advanced AI systems can fail — or be exploited. The team’s work has been featured in top media outlets, cited in the U.S. Senate, viewed millions of times online, and even reposted by Elon Musk.
We spoke with Dmitrii about the real risks behind AI, why transparency matters, and how Palisade is helping shift the global conversation around safety and accountability.
Dmitrii, I would like to begin this interview by asking what drew you to this field in the first place. Why did you choose to focus on AI risk research rather than advancing innovation through product development?
There are very few organizations studying AI risks, so we have enormous freedom to explore. It’s far more engaging than building a product by following standard business playbooks. Our field is different — it demands constant experimentation, interdisciplinary thinking, and entirely new approaches. We regularly uncover insights that feel completely original — and that’s incredibly motivating.
AI safety is also a pressing challenge that won’t resolve on its own. I know I can have a real impact here. Our research enables us to work directly with major players like OpenAI, Meta, and Google DeepMind — helping make their systems more transparent. At the same time, we help policymakers assess AI risks based on experiments, not PR spin.
How did Palisade come about, and how is your organization fundamentally different from product-driven AI companies?
Palisade was founded by cybersecurity expert Jeffrey Ladish. It’s one of the few nonprofit organizations dedicated to AI safety.
Unlike product-based companies, we operate more as a research lab or think tank. Our mission is to surface and demonstrate AI-related risks, and make these findings public. Based on our research, we publish articles — and notably, some of them have drawn attention from figures like Elon Musk and Turing Award winner Yoshua Bengio, a pioneer in deep learning.
We absolutely see AI as a force for progress and economic growth. But like any powerful technology, it comes with risks. It’s a bit like nuclear energy: if managed carefully, the benefits are immense. However, if something goes wrong, the consequences can be serious. That’s why with today’s rapid AI development, we find it critical to consider the risks.
AI lobbyists have their own agenda. They may claim their technologies are entirely safe, but their real priority is often cash — not safety. In some cases they overlook the risks. We aim to bring balance to the conversation around AI by highlighting the aspects no one else is talking about.
For example, in one of our recent projects, we ran a hacking competition where both humans and AI models competed. In the very first round, the AI solved every challenge but one — and at the same pace as professional teams of five. They outperformed most of the human participants. We published a paper on the event and plan to run similar experiments in the future to measure how AI might amplify hacking threats.
We also test AI models released by major companies. For instance, Meta released its LLaMA model, describing it as safe and secure. But we showed that its guardrails can be completely eliminated in just under 30 minutes. As a result, this information reached the U.S. Senate — and Zuckerberg was even asked why his model was so easy to exploit.
You lead Palisade’s global research team. How is your team structured, and how do you ensure transparency and credibility in your research culture?
Palisade is made up of two teams — the California Team, led by our founder Jeffrey in the U.S., and the Global Team, which I built from the ground up. The California Team handles communications and government relations while my team focuses on the technical side — research, experiments, presentations, and writing.
Our workflow begins with an idea. From there we carry out the technical work, conduct experiments. Based on the results, we write an article or prepare a demo. Then, we publish the output online, and pass it to the California Team for further distribution. I oversee every stage of this process — except for the California team’s briefings.
Since joining Palisade as a founding engineer, I’ve grown to lead a team of 10 people. Today I’m taking on more product-owner responsibilities, finding ideas that make sense to the public and policymakers shaping AI regulations.
At the same time, in pushing their agendas, some players make bold claims or fund research designed to promote their products. We strive for full transparency and aim to leave no doubt about the credibility of our work. Palisade isn’t about pressuring stakeholders into regulation — it’s about fostering thoughtful dialogue around risk.
Our main goal is to find solutions that are broadly acceptable. And I believe transparency works better than slogans or sensationalism. Articles we publish are accompanied by full experiment logs written by an independent third party. Anyone can verify or reproduce our results.
I’m directly involved in maintaining the transparency of our work. I also developed standards for our research, inspired by academic best practices from other disciplines. For example, before we publish an article, I do a quality assurance check on our data. If an experiment cannot be reproduced, we can’t publish it.
This high bar helped us earn trust from leading global media outlets — our findings have been featured in TIME, MIT Technology Review, Computerworld, and more. That sets a clear standard for us: we can no longer publish anything that doesn’t meet this level.
Thanks to our values, we attract researchers who are genuinely interested in advancing science and technology. That’s what has allowed me to build a strong, motivated team that is passionate about conducting high-quality research.
Your work is regularly cited by policymakers and the media, with features in Senate hearings and TV highlights. How have you built such effective communication with policymakers and influenced decision-making?
Many policymakers today are seriously thinking about AI safety. I see a growing demand for a more balanced conversation — one that includes not only tech lobbyists but also independent researchers like us.
Beyond our AI experience, we bring a unique strength: deep expertise in cybersecurity. It’s a complex field with relatively few qualified experts. Our California Team also has strong ties to policymakers, which enhances the value and reach of our research.
We primarily speak to two audiences: technical experts and policymakers. It’s generally easier to connect with the technical community — they can dive into our experiments and often reproduce them, so they tend to take us seriously right away. Policymakers, by contrast, are more challenging to reach, since their expertise lies in other areas. That’s where we benefit from intermediaries — people deeply engaged in AI who work directly with legislators.
One such ally is Max Tegmark, the MIT professor and well-known AI safety advocate. He regularly speaks with senators and often takes the initiative to share our work with them. Turing Award winner Yoshua Bengio is also among those who support our work and help amplify it in key discussions.
Some organizations also serve in that intermediary role. We collaborate with top research institutions — some of which are closely connected to governments but still committed to serious, complex research. These institutions act as bridges between us and policymakers.
For example, one of our most significant projects was a collaboration with the RAND Corporation. I built a relationship with them through consulting, and RAND hosted numerous briefings based on our findings. We also drew the attention of the UK’s AISI — the country’s leading institute for AI safety — and completed a follow-up project with them based on our earlier research.
One of your most widely recognized projects was the “hacking” of a chess-playing AI. What is specification gaming, and why did this project become a turning point in the AI risk debate?
Concerns about AI safety have existed for a long time — especially around the idea of misalignment, where an AI acts in pursuit of its own goals the user did not intend. We were aware of the issue, but hadn’t been able to demonstrate it clearly using real-world models. That changed in 2024, when OpenAI released its first reasoning LLM, o1. This model could not only generate text, but also reason toward outcomes — making it more likely to do “whatever it takes” to reach a goal.
In the o1 system card, one case stood out: during a cybersecurity capabilities test, there was an error in the test setup itself. The error that should have caused it to fail. But the LLM scanned the network, accessed the test controls, corrected the flaw, and completed the task successfully. Many found this a concerning overreach: the AI was supposed to stay contained, not fiddle with admin controls.
I found the concept of AI trying to solve the problem at all costs fascinating, and decided to design a public-facing experiment to explain the concept more clearly. We chose a familiar format: chess. The models were tasked with beating a powerful chess engine. What we observed was impressive: during the experiments, we saw that the most advanced models used manipulative tactics during gameplay — even attempting to infiltrate the opponent’s code to win.
This clearly demonstrated to the public that misalignment is real — AI can indeed follow its own logic, bypassing rules. The results went viral, with our post on X garnering over 16 million views, including reposts from Elon Musk and Yoshua Bengio.
Why is transparency — open publications, GitHub logs, reproducibility — more important in AI safety than closed corporate tests? And can public debate hinder progress?
It’s true that public discourse can attract voices with limited expertise, which sometimes makes professional discussion more difficult. Even so, I believe frontier labs should be more transparent such that policymakers and the public are informed about the risks. A company’s goal is to make money — and that creates a natural incentive to downplay or hide issues during model testing. However, that approach isn’t sustainable. The problems will eventually surface. I’m not saying every internal process should be made public, but the industry needs more transparency.
What’s next? Which research directions are you prioritizing for future work with society and governments?
We’re currently focused on two main tracks: cybersecurity and misalignment. We want to push our AI agent to the top of the global leaderboard in hacking — to clearly show how powerful AI has become in breaching systems. We’re also continuing our work on specification gaming, aiming to uncover more real-world examples of rule-bending AI behaviour.
In addition, we’re preparing new demos — for example, showing how an AI agent could access a computer through a USB charging cable, or how two AI agents could secretly exchange information during a Zoom call while a human remains unaware. I have many more ideas in the pipeline, and I’m excited about bringing them to life in the near future.