The risks of AI-generated code are real — here’s how enterprises can manage the risk

Not that long ago, humans wrote almost all application code. But that’s no longer the case: The use of AI tools to write code has expanded dramatically. Some experts, such as Anthropic CEO Dario Amodei, expect that AI will write 90% of all code within the next 6 months.
Against that backdrop, what is the impact for enterprises? Code development practices have traditionally involved various levels of control, oversight and governance to help ensure quality, compliance and security. With AI-developed code, do organizations have the same assurances? Perhaps even more importantly, organizations need to know which models generated their AI-developed code.
Understanding where code comes from is not a new challenge for enterprises. That’s where source code analysis (SCA) tools fit in. Historically, SCA tools have not provided insight into AI, but that’s now changing. Multiple vendors, including Sonar, Endor Labs and Sonatype, are now providing different types of insights that can help enterprises with AI-developed code.
“Every customer we talk to now is interested in how they should be responsibly using AI code generators,” Sonar CEO Tariq Shaukat told VentureBeat.
Financial firm suffers one outage a week due to AI-developed code
AI tools are not infallible. Many organizations learned that lesson early on when content development tools provided inaccurate results known as hallucinations.
The same basic lesson applies to AI-developed code. As organizations move from experimentation into production, they are increasingly realizing that the code can be very buggy. Shaukat noted that AI-developed code can also lead to security and reliability issues. The impact is real, and it is not trivial.
“I had a CTO, for example, of a financial services company about six months ago tell me that they were experiencing an outage a week because of AI generated code,” said Shaukat.
When Shaukat asked the CTO whether the company was doing code reviews, the answer was yes. That said, the developers felt far less accountable for the code and were not applying the same time and rigor to it as they had previously.
The reasons code ends up buggy, especially at large enterprises, vary. One particularly common issue is that enterprises often have large code bases with complex architectures that an AI tool may not know about. In Shaukat’s view, AI code generators generally don’t deal well with the complexity of larger and more sophisticated code bases.
“Our largest customer analyzes over 2 billion lines of code,” said Shaukat. “You start dealing with those code bases, and they’re much more complex, they have a lot more tech debt and they have a lot of dependencies.”
The challenges of AI-developed code
To Mitchell Johnson, chief product development officer at Sonatype, it is also very clear that AI-developed code is here to stay.
Software developers must follow what he calls the engineering Hippocratic Oath. That is, to do no harm to the codebase. This means rigorously reviewing, understanding and validating every line of AI-generated code before committing it — just as developers would do with manually written or open-source code.
“AI is a powerful tool, but it does not replace human judgment when it comes to security, governance and quality,” Johnson told VentureBeat.
The biggest risks of AI-generated code, according to Johnson, are:
- Security risks: AI is trained on massive open-source datasets, often including vulnerable or malicious code. If unchecked, it can introduce security flaws into the software supply chain.
- Blind trust: Developers, especially less experienced ones, may assume AI-generated code is correct and secure without proper validation, leading to unchecked vulnerabilities.
- Compliance and context gaps: AI lacks awareness of business logic, security policies and legal requirements, making compliance and performance trade-offs risky.
- Governance challenges: AI-generated code can sprawl without oversight. Organizations need automated guardrails to track, audit and secure AI-created code at scale.
“Despite these risks, speed and security don’t have to be a trade-off,” said Johnson. “With the right tools, automation and data-driven governance, organizations can harness AI safely — accelerating innovation while ensuring security and compliance.”
Models matter: Identifying open source model risk for code development
Organizations are using a variety of models to generate code. Anthropic’s Claude 3.7, for example, is a particularly powerful option. Google Code Assist, OpenAI’s o3 and GPT-4o models are also viable choices.
Then there’s open source. Vendors such as Meta and Qodo offer open-source models, and there is a seemingly endless array of options available on Hugging Face. Karl Mattson, Endor Labs CISO, warned that these models pose security challenges that many enterprises aren’t prepared for.
“The systematic risk is the use of open source LLMs,” Mattson told VentureBeat. “Developers using open-source models are creating a whole new suite of problems. They’re introducing into their code base using sort of unvetted or unevaluated, unproven models.”
Unlike commercial offerings from companies like Anthropic or OpenAI, which Mattson describes as having “substantially high quality security and governance programs,” open-source models from repositories like Hugging Face can vary dramatically in quality and security posture. Mattson emphasized that rather than trying to ban the use of open-source models for code generation, organizations should understand the potential risks and choose appropriately.
Endor Labs can help organizations detect when open-source AI models, particularly from Hugging Face, are being used in code repositories. The company’s technology also evaluates these models across 10 attributes of risk including operational security, ownership, utilization and update frequency to establish a risk baseline.
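Endor Labs has not published its scoring formula, but the general shape of such a baseline — several risk attributes rolled up into one comparable number — can be sketched as a simple weighted score. The attribute names, weights and 0-to-1 scale below are hypothetical, chosen only to mirror the categories Mattson describes; they are not the vendor's actual model.

```python
# Hypothetical sketch of a weighted risk baseline for an open-source model.
# Attributes, weights and scale are illustrative assumptions, not Endor Labs' scoring model.
from dataclasses import dataclass

@dataclass
class ModelRiskProfile:
    operational_security: float  # 0.0 (poor) to 1.0 (strong)
    ownership: float             # clarity and reputation of the maintainers
    utilization: float           # breadth of real-world adoption
    update_frequency: float      # how recently and regularly it is maintained

WEIGHTS = {
    "operational_security": 0.4,
    "ownership": 0.25,
    "utilization": 0.2,
    "update_frequency": 0.15,
}

def risk_baseline(profile: ModelRiskProfile) -> float:
    """Return a 0-100 risk score, where higher means riskier."""
    strength = sum(WEIGHTS[name] * getattr(profile, name) for name in WEIGHTS)
    return round((1.0 - strength) * 100, 1)

# Example: a model with weak operational security but active maintenance.
print(risk_baseline(ModelRiskProfile(0.2, 0.6, 0.5, 0.9)))  # -> 53.5
```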
Specialized detection technologies emerge
To deal with these emerging challenges, SCA vendors have released a range of new capabilities.
For instance, Sonar has developed an AI code assurance capability that can identify code patterns unique to machine generation. The system can detect when code was likely AI-generated, even without direct integration with the coding assistant. Sonar then applies specialized scrutiny to those sections, looking for hallucinated dependencies and architectural issues that wouldn’t appear in human-written code.
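Sonar has not published the internals of that detection, but the core idea behind catching a hallucinated dependency can be sketched simply: parse the generated file, collect its imports and compare them against what the project actually declares. In the minimal Python sketch below, the file names (requirements.txt, generated_module.py) and the pip-style dependency convention are assumptions for illustration, not a description of Sonar's product.

```python
# Illustrative sketch only: flag imports in a generated file that are not declared
# project dependencies. File names and the requirements.txt convention are assumptions.
import ast
import sys
from pathlib import Path

def declared_packages(requirements_file: str = "requirements.txt") -> set[str]:
    """Collect package names from a pip requirements file, dropping version pins."""
    names = set()
    for line in Path(requirements_file).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.add(line.split("==")[0].split(">=")[0].strip().lower())
    return names

def undeclared_imports(source_file: str, declared: set[str]) -> list[str]:
    """Return top-level imports that are neither standard library nor declared.

    Note: package and module names do not always match (e.g. pyyaml vs yaml),
    so a real tool would map between the two; this sketch ignores that detail.
    """
    tree = ast.parse(Path(source_file).read_text())
    suspicious = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules = [node.module.split(".")[0]]
        else:
            continue
        for module in modules:
            if module not in sys.stdlib_module_names and module.lower() not in declared:
                suspicious.append(module)
    return suspicious

if __name__ == "__main__":
    flagged = undeclared_imports("generated_module.py", declared_packages())
    if flagged:
        print("Imports with no matching declared dependency:", flagged)
```

A production-grade tool would go further — resolving package-to-module name mappings and checking registries for nonexistent or typosquatted packages — but the comparison above captures the basic shape of the check.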
Endor Labs and Sonatype take a different technical approach, focusing on model provenance. Sonatype’s platform can be used to identify, track and govern AI models alongside their software components. Endor Labs can also identify when open-source AI models are being used in code repositories and assess the potential risk.
When implementing AI-generated code in enterprise environments, organizations need structured approaches to mitigate risks while maximizing benefits.
There are several key best practices that enterprises should consider, including:
- Implement rigorous verification processes: Shaukat recommends that organizations have a rigorous process for understanding where code generators are used in specific parts of the code base. This is necessary to ensure the right level of accountability and scrutiny of generated code (a minimal sketch of one such check appears after this list).
- Recognize AI’s limitations with complex codebases: While AI-generated code can easily handle simple scripts, it can struggle with complex code bases that have many dependencies.
- Understand the unique issues in AI-generated code: Shaukat noted that while AI avoids common syntax errors, it tends to create more serious architectural problems through hallucinations. Code hallucinations can include making up a variable name or a library that doesn’t actually exist.
- Require developer accountability: Johnson emphasizes that AI-generated code is not inherently secure. Developers must review, understand and validate every line before committing it.
- Streamline AI approval: Johnson also warns of the risk of shadow AI, or uncontrolled use of AI tools. Many organizations either ban AI outright (which employees ignore) or create approval processes so complex that employees bypass them. Instead, he suggests businesses create a clear, efficient framework to evaluate and greenlight AI tools, ensuring safe adoption without unnecessary roadblocks.
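As a concrete illustration of the verification and accountability points above, a team could wire a lightweight gate into CI that refuses AI-assisted commits with no recorded reviewer. The "AI-Assisted:" and "Reviewed-by:" commit trailers below are assumed conventions for this sketch, not a requirement from any of the vendors quoted here.

```python
# Minimal sketch of a CI gate: block AI-assisted commits that lack a recorded review.
# The commit-trailer convention used here is an assumption for illustration only.
import subprocess
import sys

def last_commit_message() -> str:
    """Return the full message of the most recent commit."""
    return subprocess.run(
        ["git", "log", "-1", "--pretty=%B"],
        capture_output=True, text=True, check=True,
    ).stdout

def main() -> int:
    message = last_commit_message()
    ai_assisted = "AI-Assisted:" in message
    reviewed = "Reviewed-by:" in message
    if ai_assisted and not reviewed:
        print("AI-assisted commit has no recorded reviewer; failing the check.")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Run as a pre-receive hook or CI step, a check like this keeps the accountability explicit without adding an approval process heavy enough for developers to route around.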
What this means for enterprises
The risk of shadow AI code development is real.
The volume of code that organizations can produce with AI assistance is dramatically increasing and could soon comprise the majority of all code.
The stakes are particularly high for complex enterprise applications where a single hallucinated dependency can cause catastrophic failures. For organizations looking to adopt AI coding tools while maintaining reliability, implementing specialized code analysis tools is rapidly shifting from optional to essential.
“If you’re allowing AI-generated code in production without specialized detection and validation, you’re essentially flying blind,” Mattson warned. “The types of failures we’re seeing aren’t just bugs — they’re architectural failures that can bring down entire systems.”