AI Business Strategy

Understanding LLM Coding Personalities Is Now Key to Developer Risk Management

By Matias Madou, Ph.D., CTO & Co-Founder, Secure Code Warrior

Securing software code is a complex risk-management exercise that involves understanding your developers' strengths and weaknesses to determine who is best suited to which tasks, and where security issues are most likely to arise. As generative AI large language models (LLMs) write more code, managing them requires the same approach. Each model has its own preferred coding style, with strong points in areas like code creation and weak ones, such as security blind spots. In short, each has its own personality, too.

AI-generated code may be "made by machine," but taking a cookie-cutter approach to securing that code would fall well short of mitigating the vulnerabilities LLMs can introduce. Organizations need to establish precise security reviews, with human developers anchoring the process to implement effective security controls while also managing the specific coding temperament of each LLM used. AI-generated code must undergo the same personalized risk assessments as code written by human developers.

With Code Creation, LLMs Are People Too

For its latest State of the Code report, Sonar undertook an in-depth analysis of five leading LLMs: Anthropic's Claude Sonnet 4 and 3.7, OpenAI's GPT-4o, Meta's Llama 3.2 90B and the open source OpenCoder-8B. The report details the strengths shared to different degrees among the models tested, such as producing syntactically valid code, demonstrating technical competence and working across different programming languages. It also highlighted common weaknesses, including a glaring lack of security awareness, particularly in leaving software vulnerable to attacks of the highest severity levels, a lack of engineering discipline and a penchant for producing messy code.

Another significant finding relevant to risk management is that no two LLMs are quite alike; each has its own quirks, preferences and security blind spots. Much like individual human developers, each has its own style, referred to in the report as a "coding personality," that can affect how human developers review AI-generated code.

The report identified three primary, measurable traits of the LLMs tested: complexity, verbosity and documentation. A more verbose model, for instance, generates many more lines of code to perform a task than a comparatively taciturn model, which can make code review more involved. Likewise, verbose models tend to produce more complex solutions. Such solutions may be necessary for certain tasks, but they require more cognitive effort from code reviewers than a model with a more straightforward approach. The models tested also exhibited a range of communication styles, as evidenced by the amount of documentation they provide to explain their work.
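To make these traits concrete, here is a rough sketch of how such per-snippet signals might be quantified. This is my own simplification for illustration, not the methodology Sonar used: verbosity is proxied by non-blank line count, documentation by comment density.

```python
def coding_personality_metrics(code: str) -> dict:
    """Crude per-snippet proxies for two 'personality' traits.

    These are simplified stand-ins, not the metrics from Sonar's report:
    - verbosity: count of non-blank lines
    - documentation: fraction of those lines that are comments
    """
    lines = [ln.strip() for ln in code.splitlines() if ln.strip()]
    comments = [ln for ln in lines if ln.startswith("#")]
    return {
        "verbosity": len(lines),
        "documentation": len(comments) / len(lines) if lines else 0.0,
    }
```

A model that emits 120 lines with sparse comments for a task another model solves in 30 well-documented lines would score very differently here, which is exactly the kind of difference a reviewer feels during code review.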

Together, these metrics constitute a personality type, comparable to human developer types, according to the report.

The senior architect (Claude Sonnet 4). Good at building complex, enterprise systems, but all that impressive code could be hiding serious bugs.

The rapid prototyper (OpenCoder-8B). Fast and furious, good at getting projects underway quickly, but with a font of technical debt, which can create challenges later.

The unfulfilled promise (Llama 3.2 90B). Lots of potential, but middle-of-the-pack performance, with critical security blind spots.

The efficient generalist (GPT-4o). Neither exceedingly verbose nor concise, it can be used for a variety of jobs and tends to avoid the most serious security issues, but it can be careless, opening the door to mistakes that compromise quality and reliability.

The balanced predecessor (Claude 3.7 Sonnet). Produces highly functional code with excellent documentation but can also produce high-severity vulnerabilities.

Each type has specific strengths and weaknesses, which are good to know when deciding which LLM should be applied to certain projects, just as you might trust a senior architect with a critical application before giving the job to a junior programmer. But it also underscores the importance of providing developers with the skills they need to effectively and efficiently review AI-generated code.

Developers Need the Skills to Manage AI Personalities

Sonar's findings are echoed in Secure Code Warrior's own research, in which we examined how 20 LLMs performed on specific code-security tasks and compared them with the performance of human developers at high, medium and low security proficiency levels.

SCW's Learning Platform gave participants, whether human or AI, challenges in three areas:

Identify. Presented participants with a vulnerable code snippet and asked them to select the correct vulnerability category.

Locate. Gave them a vulnerability category and asked them to select the vulnerable snippet of code that falls into that category.

Fix. Asked them to identify the most effective fix for a specific vulnerability.
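As a concrete illustration of the kind of "Fix" challenge described above, consider a classic SQL injection case. This is a hypothetical example of my own, not material from SCW's platform: the vulnerable version interpolates user input directly into a query, while the fix a participant should select uses a parameterized query.

```python
import sqlite3

# Vulnerable: user input is interpolated directly into the SQL string,
# so a payload like "' OR '1'='1" changes the query's logic.
def find_user_vulnerable(conn, username):
    cur = conn.execute(f"SELECT id FROM users WHERE name = '{username}'")
    return cur.fetchall()

# Fixed: a parameterized query keeps data separate from SQL syntax,
# so the same payload is treated as a literal (and matches nothing).
def find_user_fixed(conn, username):
    cur = conn.execute("SELECT id FROM users WHERE name = ?", (username,))
    return cur.fetchall()
```

Against a table with two users, the injection payload `' OR '1'='1` makes the vulnerable version return every row, while the parameterized version returns none, which is precisely the distinction these challenges test.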

The LLMs performed well on straightforward or superficial challenges, such as dealing with injection categories, but they had difficulty with vague or subjective categories such as DoS protection, insufficient logging or misconfigured permissions (the latter a very common, and commonly exploited, vulnerability, according to OWASP tests). The best LLMs were on par with top developers in a range of straightforward tasks but showed a drop in consistency across other tasks, languages and vulnerability categories. The more subjective the problem, such as authentication, the worse the LLMs performed.

Overall, top developers outperformed all of the LLMs, while average developers did not; those with mid-level skills landed in the middle of the pack, and those with low-level skills at the bottom.

The results from the two studies clearly demonstrate that LLMs, while efficient at producing code, must be carefully managed within the software development lifecycle (SDLC), and that developers need to be upskilled in security proficiency to ensure that risk can be managed. Humans must be in the loop, but they need the education and skills to ensure that AI-generated code is trustworthy and that technical debt doesn't pile up in a development environment accelerated by AI assistants. Developers must also be able to prompt AI models in a way that helps generate secure code, and to perform competent security reviews of the AI's output. Understanding an AI model's coding personality can help, but upskilling developers is essential.

The best way to ensure that software is secure is to stop vulnerabilities early in the SDLC. Whether code is created by developers or LLMs, developer education is the key to implementing a process of secure coding, code review and testing. CISOs can choose models that are less risky for the tasks at hand and empower developers through education and upskilling. In doing so, they can close gaps in security knowledge, target technical debt and help mitigate the growing risk within their own codebases.
