The case for embedding audit trails in AI systems before scaling

Editor’s note: Emilia will lead an editorial roundtable on this topic at VB Transform this month. Register today.

Orchestration frameworks for AI services serve multiple functions for enterprises. They not only define how applications or agents flow together, but should also let administrators manage those workflows and agents, and audit their systems.

As enterprises begin to scale their AI services and put them into production, building a manageable, traceable, auditable and robust pipeline ensures their agents run exactly as they're supposed to. Without these controls, organizations may not know what is happening inside their AI systems, and may discover a problem only after something goes wrong or they fail to comply with regulations.

Kevin Kiley, president of enterprise orchestration company Airia, told VentureBeat in an interview that frameworks must include auditability and traceability. 

“It’s critical to have that observability and be able to go back to the audit log and show what information was provided at what point again,” Kiley said. “You have to know if it was a bad actor, or an internal employee who wasn’t aware they were sharing information or if it was a hallucination. You need a record of that.”
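
To make that concrete, here is a minimal sketch in Python of what one such audit record might look like. The schema, field names and JSON-lines log file are illustrative assumptions, not any particular vendor's format.

```python
import json
import uuid
from datetime import datetime, timezone

AUDIT_LOG_PATH = "agent_audit.jsonl"  # hypothetical append-only audit log


def record_agent_event(agent_id: str, user_id: str, model: str,
                       data_sources: list[str], prompt: str, response: str) -> str:
    """Append one structured record per agent call so it can be replayed later."""
    event = {
        "event_id": str(uuid.uuid4()),                       # unique handle for lookups
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,                                # which agent acted
        "user_id": user_id,                                  # who (or what) triggered it
        "model": model,                                      # which model version answered
        "data_sources": data_sources,                        # what information was provided
        "prompt": prompt,
        "response": response,
    }
    with open(AUDIT_LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return event["event_id"]
```

With a record like this, an investigator can later distinguish a bad actor from a careless employee or a hallucination, because every response is tied back to the inputs, data sources and identity behind it.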

Ideally, robustness and audit trails should be built into AI systems at a very early stage. Understanding the potential risks of a new AI application or agent and ensuring they continue to perform to standards before deployment would help ease concerns around putting AI into production.

However, organizations did not initially design their systems with traceability and auditability in mind; many AI pilot programs began life as experiments run without an orchestration layer or an audit trail.

The big question enterprises now face is how to manage all their agents and applications, keep their pipelines robust, monitor AI performance and, if something goes wrong, know exactly what went wrong.

Choosing the right method

Before building any AI application, however, experts said organizations need to take stock of their data. If a company knows which data it is comfortable letting AI systems access and which data it used to fine-tune a model, it has a baseline against which to compare long-term performance.

“When you run some of those AI systems, it’s more about, what kind of data can I validate that my system’s actually running properly or not?” Yrieix Garnier, vice president of products at Datadog, told VentureBeat in an interview. “That’s very hard to actually do, to understand that I have the right system of reference to validate AI solutions.”

Once the organization identifies and locates its data, it needs to establish dataset versioning (essentially assigning a timestamp or version number) to make experiments reproducible and to understand what changed between model versions. These datasets and models, any applications or agents that use them, authorized users and the baseline runtime numbers can then be loaded into either an orchestration or observability platform.
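
One lightweight way to do that, sketched below in Python, is to fingerprint each dataset file with a content hash and record it alongside a timestamp in a manifest. The manifest format and file names are assumptions for illustration, not a prescribed standard.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def version_dataset(path: str, manifest: str = "dataset_versions.json") -> dict:
    """Fingerprint a dataset file and record it so experiments are reproducible."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    entry = {
        "dataset": path,
        "sha256": digest,                                    # changes whenever the data does
        "versioned_at": datetime.now(timezone.utc).isoformat(),
    }
    manifest_path = Path(manifest)
    history = json.loads(manifest_path.read_text()) if manifest_path.exists() else []
    history.append(entry)                                    # keep the full version history
    manifest_path.write_text(json.dumps(history, indent=2))
    return entry
```

Because the hash changes whenever the underlying data changes, any drift in model behavior can be traced back to a specific version of the training or retrieval data.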

Just as when choosing foundation models to build with, orchestration teams need to consider transparency and openness. While some closed-source orchestration systems have numerous advantages, open-source platforms can offer benefits that some enterprises value, such as increased visibility into decision-making systems.

Open-source platforms like MLflow, LangChain and Grafana give enterprises granular, flexible ways to instrument and monitor agents and models. Enterprises can choose to develop their AI pipeline through a single end-to-end platform, such as Datadog, or utilize various interconnected tools from AWS.
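
As a small illustration of that kind of tracking, the sketch below logs a dataset version and baseline runtime numbers for a run using MLflow's tracking API. The run name, parameter values and metric names are assumptions made up for the example.

```python
import mlflow

# A minimal tracking sketch using MLflow's logging API; the run name,
# parameter values and metric names are illustrative assumptions.
with mlflow.start_run(run_name="agent-baseline"):
    mlflow.log_param("dataset_version", "v12")            # tie the run to a dataset version
    mlflow.log_param("model", "acme-support-agent-ft")    # hypothetical fine-tuned model
    mlflow.log_metric("p95_latency_ms", 420.0)            # baseline runtime number
    mlflow.log_metric("eval_pass_rate", 0.93)             # hypothetical evaluation score
```

Runs logged this way remain queryable later, so teams can compare live performance against the recorded baseline.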

Another consideration for enterprises is to plug in a system that maps agent and application responses to compliance tools or responsible AI policies. AWS and Microsoft both offer services that track AI tools and how closely they adhere to guardrails and other policies set by the user.
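
Neither vendor's API is reproduced here; as a generic illustration, the Python sketch below runs an agent's response through a simple pattern-based policy check before releasing it. The blocked patterns and function names are stand-in assumptions, not any provider's actual guardrail service.

```python
import re

# Hypothetical guardrail: block responses that leak sensitive identifiers
# or internal markers. The patterns below are stand-ins, not a real policy.
BLOCKED_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),       # US SSN-like strings
    re.compile(r"(?i)\binternal use only\b"),   # internal-document marker
]


def passes_policy(response: str) -> bool:
    """Return False if the response matches any configured guardrail pattern."""
    return not any(p.search(response) for p in BLOCKED_PATTERNS)


def release_or_escalate(response: str) -> str:
    # Route violations to human review instead of returning them to the user.
    if passes_policy(response):
        return response
    return "[withheld: response flagged for compliance review]"
```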

Kiley said one consideration for enterprises when building these reliable pipelines is choosing a more transparent system. For Kiley, having no visibility into how AI systems work is a nonstarter.

“Regardless of what the use case or even the industry is, you’re going to have those situations where you have to have flexibility, and a closed system is not going to work. There are providers out there that have great tools, but it’s sort of a black box. I don’t know how it’s arriving at these decisions. I don’t have the ability to intercept or interject at points where I might want to,” he said.

Join the conversation at VB Transform

I’ll be leading an editorial roundtable at VB Transform 2025 in San Francisco, June 24-25, called “Best practices to build orchestration frameworks for agentic AI,” and I’d love to have you join the conversation. Register today.


