
AI will inevitably grow in importance as a tool for conducting penetration testing (pen testing). So far, however, the tools have not reached the maturity necessary for widespread use in this domain, and practitioners should approach them with caution.
To an extent, AI tools have already proven valuable in limited aspects of pen testing. The area in which they have demonstrated the most value so far is reporting: helping testers translate the results of tests into a consumable format. Traditionally, it can take a pen tester up to a week to generate a detailed report. With AI, testers have seen excellent value in producing written reports, presentations, visuals and charts on test results – especially once the tools have learned the tester's unique tone and the proper context.
Most of the benefits we've seen from AI in these scenarios to date can be measured in efficiency and saved time – not necessarily in the quality of reports. In the reporting example above, AI won't necessarily produce a more eloquent report than a pen tester with decent writing skills; the difference is that the average human tester needs days, while AI is nearly instantaneous. AI is also useful for running concurrent analyses of findings, shortening the time from testing to results.
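As a rough illustration of that reporting workflow, the sketch below feeds a findings export to a general-purpose language model to produce a first draft. It assumes the official OpenAI Python client; the model name, prompt and findings file format are illustrative assumptions rather than any specific product's pipeline, and the output is only a draft the tester still reviews and edits.

```python
import json
from openai import OpenAI  # assumes the official OpenAI Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical findings export; real pen test tooling varies widely.
with open("findings.json") as f:
    findings = json.load(f)

prompt = (
    "You are drafting a penetration test report for a non-technical "
    "audience. Summarize each finding, its business impact, and a "
    "remediation recommendation.\n\n"
    + json.dumps(findings, indent=2)
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
)

draft = response.choices[0].message.content
print(draft)  # a first draft only -- the tester still reviews and edits it
```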
But despite its utility in these limited areas, other aspects of pen testing remain out of AI's reach. In many cases, AI pen testing tools have so far demonstrated little, if any, benefit beyond what traditional tools deliver. For example, a client recently ran an AI pen test against their Azure Active Directory, and it found nothing that hadn't already been identified through traditional vulnerability scanners.
The most limiting aspect of AI for pen testing is the risk it can introduce when testers attempt to use it for active exploitation. When pen testers decide whether to exploit a potential vulnerability, we apply business logic – considering which applications may be affected by the exploitation, where the vulnerability sits on the network, and what would happen if the system or network went down. Today's AI solutions aren't equipped to make those decisions; they are more likely to identify a vulnerability and simply go after it.
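To make that contrast concrete, here is a minimal sketch of the kind of pre-exploitation gate a human tester applies implicitly and today's autonomous tools skip. The Finding fields and the rules themselves are hypothetical assumptions for illustration; in practice, such constraints come from the rules of engagement agreed with the client.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    host: str
    cve: str
    network_zone: str        # e.g. "dmz", "internal", "ot"
    business_critical: bool  # would an outage halt revenue or safety systems?
    exploit_disruptive: bool # could the exploit crash the service?

def safe_to_exploit(f: Finding) -> bool:
    """Apply the business logic a human tester uses before exploiting.

    These rules are illustrative, not a standard; real engagements
    encode them in the agreed rules of engagement.
    """
    if f.network_zone == "ot":  # never auto-exploit OT/ICS environments
        return False
    if f.business_critical and f.exploit_disruptive:
        return False            # downtime risk outweighs the proof value
    return True

finding = Finding("10.0.2.15", "CVE-2021-34473", "internal", True, True)
if safe_to_exploit(finding):
    print(f"proceed with exploitation of {finding.host}")
else:
    print(f"flag {finding.cve} on {finding.host} for manual review")
```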
For that reason, AI tools are not yet viable for environments such as operational technology (OT) or industrial control systems, where failure could have severe consequences. In industries such as energy utilities, manufacturing and healthcare, we can envision applying AI to pen testing producing catastrophic results.
AI tools also show practical limitations: we've seen cases in which they prioritize things that are unimportant to the enterprise but have good "textbook value." The tools often over-categorize findings or inflate the severity of certain types of criteria, producing "chasing ghosts" scenarios that create extra work and distraction for testers. We've also witnessed AI misprioritizing entire attack chains, resulting in effort wasted disproving false positives.
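One mitigation we can sketch is a simple triage pass that weighs an AI-assigned severity against what the asset is actually worth to the business before a tester spends time on it. The scoring scheme and thresholds below are hypothetical assumptions, not a standard.

```python
# Sanity-check AI-assigned severities against asset context before
# testers chase them. Scores and cutoffs are illustrative only.

AI_SEVERITY = {"critical": 4, "high": 3, "medium": 2, "low": 1}

def triage(finding: dict, asset_value: int) -> str:
    """Downgrade 'textbook' findings reported on low-value assets.

    finding: e.g. {"id": "F-101", "ai_severity": "critical"}
    asset_value: 1 (lab box) through 4 (crown-jewel system)
    """
    score = AI_SEVERITY[finding["ai_severity"]] * asset_value
    if score >= 12:
        return "investigate now"
    if score >= 6:
        return "queue for review"
    return "verify before spending tester time"  # likely a ghost

print(triage({"id": "F-101", "ai_severity": "critical"}, asset_value=1))
# -> "verify before spending tester time"
```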
Despite these concerns, AI tools do show potential for pen testing as they mature. We might soon see AI models trained to analyze the output of pen testing tools – and, in some cases, to replace them. The safest future application for AI would be external-facing tests: the internet is already hitting enterprise infrastructure at the maximum possible rate, so AI could do little to exacerbate the risk in those scenarios. And as attackers look for holes, AI can learn by observing those exploitation attempts, analyze them and give testers data on possible attack paths to consider.
Even with AI in the picture, testers would still have to use their judgment on the risks of attacking the infrastructure, the benefits to be gained, communication with the customer and other tasks. And even then, there would be a difficult balance to strike: limiting risk while allowing AI to conduct the exploitation necessary to prove that an attack path is viable.
In many cases, AI acts like a bulldozer where a scalpel is needed. If you're a "good guy," AI can be difficult to use because you have to be careful about what you knock over while defending your organization. That isn't an issue for the "bad guys," who don't care what they destroy in order to achieve their objectives.
In the cybersecurity world, we often hear the term "trust but verify." That concept also applies to AI for pen testing. As with any AI application used correctly, users can ask it questions about all kinds of things but then have to verify the answers for themselves. Because we're not yet at a point where there's a completely reliable AI model that doesn't hallucinate, that verification remains an imperative for pen testers looking to exploit the potential of AI.
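In practice, "verify" can be as simple as checking an AI-reported fact against the live system. The sketch below confirms an AI-claimed vulnerable service version by grabbing the service banner directly; the host, port and claimed version are hypothetical placeholders.

```python
# "Trust but verify": before acting on an AI-reported vulnerable
# service version, confirm the version directly from the target.

import socket

def grab_banner(host: str, port: int, timeout: float = 3.0) -> str:
    """Read the first line a service sends on connect (e.g. an SSH banner)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return sock.recv(1024).decode(errors="replace").strip()

# Hypothetical AI output to be verified.
ai_claim = {"host": "10.0.2.15", "port": 22, "version": "OpenSSH_7.2"}

banner = grab_banner(ai_claim["host"], ai_claim["port"])
if ai_claim["version"] in banner:
    print("AI claim corroborated:", banner)
else:
    print("possible hallucination -- banner says:", banner)
```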