
Securing software code is a complex risk-management exercise: it involves understanding your developers’ strengths and weaknesses to determine who is best suited to which tasks, and where security issues are most likely to arise. As generative AI large language models (LLMs) write more code, managing them requires the same approach. Each model has its own preferred coding style, with strong points in areas like code creation and weak ones such as security blind spots. In short, each model has a personality of its own.
AI-generated code may be “made by machine”, but taking a cookie-cutter approach to securing that code would fall well short of mitigating the vulnerabilities LLMs can introduce. Organizations need to establish precise security reviews, with human developers anchoring the process to implement effective security controls while also managing the specific coding temperament of each LLM used. AI-generated code must undergo the same personalized risk assessments as code written by human developers.
With Code Creation, LLMs Are People Too
For its latest State of the Code report, Sonar undertook an in-depth analysis of five leading LLMs: Anthropic’s Claude Sonnet 4 and 3.7, OpenAI’s GPT-4o, Meta’s Llama 3.2 90B and the open-source OpenCoder-8B. The report details the strengths shared to different degrees among the models tested, such as producing syntactically valid code, demonstrating technical competence and working across different programming languages. It also highlights common weaknesses, including a glaring lack of security awareness, particularly in leaving software vulnerable to attacks of the highest severity levels, a lack of engineering discipline and a penchant for producing messy code.
Another significant finding relevant to risk management addresses the fact that no LLMs are quite alike; they each have their own quirks, preferences and security blind spots. Much like individual human developers, each has its own style, referred to in the report as a “coding personality”, that can affect how human developers review AI-generated code.
The report identified three primary, measurable traits of the LLMs tested: verbosity, complexity and communication. A more verbose model, for instance, generates many more lines of code to perform a task than a comparatively taciturn model, which can make code review more involved. Likewise, verbose models tend to produce more complex solutions. They may be necessary for certain tasks, but they require more cognitive effort from code reviewers than a model with a more straightforward approach. The models tested also exhibited a range of communication styles, as evidenced by the amount of documentation they provide to explain their work.
Together, these metrics constitute a personality type, comparable to human developer types, according to the report.
The senior architect (Claude Sonnet 4). Good at building complex, enterprise systems, but all that impressive code could be hiding serious bugs.
The rapid prototyper (OpenCoder-8B). Fast and furious, good at getting projects underway quickly, but with a font of technical debt, which can create challenges later.
The unfulfilled promise (Llama 3.2 90B). Lots of potential, but middle-of-the-pack performance, with critical security blind spots.
The efficient generalist (GPT-4o). Neither exceedingly verbose nor concise, it can be used for a variety of jobs and tends to avoid the most serious security issues, but it can be careless, opening the door to mistakes that compromise quality and reliability.
The balanced predecessor (Claude 3.7 Sonnet). Produces highly functional code with excellent documentation but can also produce high-severity vulnerabilities.
Each type has specific strengths and weaknesses, which are good to know when deciding which LLM should be applied to certain projects, just as you might trust a senior architect with a critical application before giving the job to a junior programmer. But it also underscores the importance of providing developers with the skills they need to effectively and efficiently review AI-generated code.
Developers Need the Skills to Manage AI Personalities
Sonar’s findings are echoed in Secure Code Warrior’s own research, in which we examined how 20 LLMs performed on specific code-security tasks and compared them with the performance of human developers at high, medium and low security proficiency levels.
SCW’s Learning Platform gave participants—whether human or AI—challenges in three areas:
Identify. Presented participants with a vulnerable code snippet and asked them to select the correct vulnerability category.
Locate. Gave them a vulnerability category and asked them to select the vulnerable snippet of code that falls into that category.
Fix. Asked them to identify the most effective fix for a specific vulnerability.
The LLMs performed pretty well on straightforward or superficial challenges, such as dealing with injection categories, but they had difficulty with vague or subjective categories such as DoS protection, insufficient logging or misconfigured permissions (the latter a very common—and commonly exploited—vulnerability, according to OWASP tests). The best LLMs were on par with top developers in a range of straightforward tasks but showed a drop in consistency across other tasks, languages and vulnerability categories. The more subjective the problem, such as authentication, the worse the LLMs performed.
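To make the injection-style challenges concrete, here is a minimal, illustrative sketch (not taken from SCW’s platform) of the kind of clear-cut vulnerability and fix that both top developers and the best LLMs tend to handle well. It uses Python’s built-in sqlite3 module; the table and function names are hypothetical.

```python
import sqlite3

def find_user_unsafe(conn, username):
    # Vulnerable: user input is concatenated directly into the SQL string,
    # so a payload like "' OR '1'='1" matches every row (SQL injection).
    query = "SELECT id, name FROM users WHERE name = '" + username + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, username):
    # Fixed: a parameterized query treats the input strictly as data.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()

# Small in-memory database to demonstrate the difference.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])

payload = "' OR '1'='1"
print(find_user_unsafe(conn, payload))  # leaks both rows
print(find_user_safe(conn, payload))    # returns no rows
```

The fix is mechanical and pattern-matchable, which helps explain why LLMs score well here; subjective categories like authentication design or sufficient logging offer no such single-line remedy.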
Overall, top developers outperformed all of the LLMs, while average developers did not: those with mid-level skills landed in the middle of the pack, and those with low-level skills came in at the bottom.
The results from the two studies clearly demonstrate that LLMs, while efficient at generating code, must be carefully managed within the software development lifecycle (SDLC), and that developers need stronger security proficiency to keep that risk in check. Humans must be in the loop, but they need the education and skills to ensure that AI-generated code is trustworthy and that technical debt doesn’t pile up in a development environment accelerated by AI assistants. Developers must also be able to prompt AI models in ways that help generate secure code and to perform competent security reviews of the AI’s output. Understanding an AI model’s coding personality can help, but upskilling developers is essential.
The best way to ensure that software is secure is to stop vulnerabilities early in the SDLC. Whether code is written by developers or LLMs, developer education is the key to implementing a process of secure coding, code review and testing. CISOs can choose models that pose less risk for the tasks at hand and empower developers through education and upskilling. In doing so, they can close gaps in security knowledge, target technical debt and mitigate the growing risk within their own codebases.