Runtime vs Periodic Certification: Why Annual Audits Fail for AI Agents

Traditional software certification assumes that what you audit is what runs in production. AI agents break that assumption completely. An agent certified as safe today may behave materially differently tomorrow — same code, same deployment, different behaviour. This isn't a bug. It's how agents work.

The Certification Model We Inherited

For twenty years, software compliance has followed a predictable rhythm: audit, certify, operate, re-audit. SOC 2 reviews happen annually. ISO certifications have three-year cycles with surveillance audits. PCI DSS assessments are annual. The entire apparatus assumes a crucial property: determinism. Software version 2.4.1 behaves the same way on the day of the audit as it does six months later.

This model works brilliantly for traditional software. Deploy a version, test it, certify it, run it. The code doesn't change between audits unless someone pushes a new release — and new releases trigger new testing.

Now apply this model to an AI agent. The agent is deployed. It passes the audit. And then, between audits, its behaviour changes. Not because anyone deployed new code. Because that's what agents do.

Five Reasons Agents Drift Between Audits

1. Non-Deterministic Outputs

Ask the same agent the same question twice and you may get different answers. This isn't a defect — it's a design property. Temperature settings, sampling strategies, and context window state all introduce controlled randomness. A financial agent that correctly identifies a fraudulent transaction pattern today might miss a similar pattern tomorrow because its context window contained different recent examples.

Auditors test a representative sample of interactions. But with non-deterministic systems, no sample can predict all future outputs. The audit captures a distribution of behaviours at a point in time. The distribution shifts continuously.
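The effect of temperature on output variability can be seen in a minimal sketch of temperature-scaled softmax sampling. The logits here are made up for illustration; real models sample over tens of thousands of tokens, but the principle is the same: identical inputs need not produce identical outputs.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Sample one token index from temperature-scaled softmax probabilities."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cumulative = 0.0
    for token, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return token
    return len(probs) - 1

logits = [2.0, 1.5, 0.3]   # hypothetical model scores for three candidate tokens
rng = random.Random()       # unseeded: each run may differ
samples = [sample_token(logits, temperature=1.0, rng=rng) for _ in range(10)]
# At temperature 1.0 the samples vary between runs; as temperature approaches
# zero, sampling collapses toward the single highest-scoring token.
```

The same distribution that gives agents fluent, varied language is the distribution an auditor can only ever sample from, never exhaust.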

2. Foundation Model Updates

Agent behaviour depends on the underlying foundation model. When Anthropic, OpenAI, or Google updates a model — which happens regularly, sometimes without version bumps — every agent built on that model may behave differently. The deployer didn't change anything. The agent's code is identical. But the outputs have shifted because the substrate changed.

AIUC-1 addresses this with quarterly technical testing, which is more responsive than annual audits. But quarters are still 90 days. A model update on Day 1 means 89 days of potentially altered behaviour before the next assessment.

3. Retrieval and Context Drift

Agents that use RAG (Retrieval-Augmented Generation) base their responses on external knowledge. When the knowledge base is updated — new documents added, old ones modified, embeddings re-indexed — the agent's effective knowledge changes. A healthcare agent certified as providing accurate drug interaction information becomes unreliable the moment its pharmaceutical database includes an incorrect entry.

This isn't a model problem. It's a data problem. And data changes constantly.
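A toy retrieval sketch makes the mechanism concrete. This uses naive term-overlap scoring rather than embeddings, and the documents are invented, but it shows how a routine knowledge-base update changes an agent's effective answer with no code change at all.

```python
def retrieve(query_terms, knowledge_base):
    """Return the document sharing the most terms with the query (naive retrieval)."""
    def overlap(doc):
        return len(set(query_terms) & set(doc.split()))
    return max(knowledge_base, key=overlap)

query = ["warfarin", "aspirin", "interaction"]

kb_v1 = ["warfarin interacts with aspirin", "ibuprofen dosing guidance"]
answer_v1 = retrieve(query, kb_v1)

# A routine update adds a document that now scores highest for the same query:
kb_v2 = kb_v1 + ["warfarin aspirin interaction entry superseded see erratum"]
answer_v2 = retrieve(query, kb_v2)
# Same agent, same code, same query: the grounding document has silently changed.
```

If the newly indexed entry is wrong, every response grounded in it is wrong, regardless of what the certification said about the model.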

4. Behavioural Degradation Over Time

Research published in January 2026 documented a phenomenon that practitioners had long suspected: agent performance degrades with extended use. In controlled studies, agents showed up to 46% behavioural degradation over 500 sustained interactions. Accuracy decreased, hallucination rates increased, and adherence to safety guidelines weakened — gradually, without any obvious trigger.

An annual audit cannot detect gradual degradation. Even quarterly testing may miss a slow decline if each quarter's starting point is "good enough." By the time degradation crosses a compliance threshold, it may have been non-compliant for weeks.
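The difference between a point-in-time check and continuous monitoring can be sketched with simulated daily accuracy scores. The degradation curve and threshold here are illustrative, not drawn from the cited research; the point is that a rolling average flags the decline mid-quarter, while the audit-day snapshot sees nothing.

```python
def quarterly_check(scores, threshold):
    """Point-in-time audit: look only at the score on assessment day."""
    return scores[-1] >= threshold

def continuous_check(scores, threshold, window=7):
    """Runtime monitoring: return the first day a rolling average dips below threshold."""
    for day in range(window, len(scores) + 1):
        avg = sum(scores[day - window:day]) / window
        if avg < threshold:
            return day
    return None  # never crossed the threshold

# Simulated daily accuracy degrading slowly from 0.95 over a 90-day quarter.
scores = [0.95 - 0.0017 * day for day in range(90)]
threshold = 0.90

audit_pass = quarterly_check(scores[:1], threshold)  # day-0 audit: compliant
first_flag = continuous_check(scores, threshold)     # rolling average crosses mid-quarter
```

In this simulation the agent passes its audit on day 0 and is flagged around day 34, roughly eight weeks before the next quarterly assessment would run.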

5. Adversarial Evolution

New jailbreak techniques, prompt injection methods, and adversarial attacks emerge weekly. The OWASP MCP Top 10, published earlier this year, catalogued attack classes that didn't exist when many agents were last audited. An agent that was robust against known attacks at audit time may be vulnerable to attacks discovered the following month.

AIUC-1's quarterly adversarial testing across "more than a thousand enterprise risk scenarios" is the most rigorous periodic assessment available. But the attack surface evolves between quarters.

What Periodic Certification Gets Right

None of this means periodic certification is useless. It solves genuine problems that runtime attestation cannot: it evaluates design, documentation, and organisational controls before an agent ever reaches production, and it carries the weight of independent third-party review.

The standards bodies — ISO, CSA, AIUC — have built something valuable and necessary. The question is not "do we need periodic certification?" The question is "is periodic certification sufficient on its own?"

The Gap: What Happens Between Audits

Between one audit and the next, an AI agent may process millions of interactions, make thousands of autonomous decisions, and access sensitive data across multiple systems. During that period, the organisation operates on an assumption: the agent is still compliant.

For deterministic software, this assumption is safe. For AI agents, it's a leap of faith.

The Core Problem

Traditional certification answers: "Was this agent compliant when we checked?"

The question that matters: "Is this agent compliant right now?"

This gap isn't theoretical. Consider the scenarios described above: a foundation model update shifts an agent's outputs the day after an audit; a knowledge-base re-index changes what a RAG agent effectively knows; a newly published jailbreak bypasses guardrails that were robust at assessment time.

In each case, the agent had a valid certification. In each case, it wasn't operating within its certified parameters.

Continuous Attestation: Filling the Gap

Runtime attestation doesn't replace periodic certification. It fills the gap between audits with continuous evidence.

The model is straightforward:

  1. Instrument the agent. An SDK captures structured evidence during operation — every LLM call, tool invocation, retrieval query, human escalation, error, and configuration change becomes a signed, timestamped event in a hash chain.
  2. Evaluate against frameworks. Evidence is automatically graded against compliance frameworks — EU AI Act Article 12, Singapore MGF, AIUC-1 risk domains, or any custom framework. Each framework defines what "compliant" looks like in terms of observable behaviour.
  3. Issue time-limited certificates. Compliant agents receive a cryptographically signed certificate. Critically, the certificate expires — typically within 24 hours. The agent must re-attest with fresh evidence to maintain its compliance status.
  4. Enable verification. Any party — another agent, a service, a regulator, a customer — can verify the certificate independently. The cryptographic signature proves the certificate was issued by a trusted attestation authority. The expiry proves it reflects recent behaviour.
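The four steps above can be sketched end to end. This is a simplified illustration, not AgentApproved's implementation: it uses a symmetric HMAC where a real attestation authority would use asymmetric signatures, and the key, agent ID, and event fields are invented. It shows the essential mechanics of a tamper-evident hash chain and a certificate that expires.

```python
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"attestation-authority-demo-key"  # hypothetical authority key

def append_event(chain, event):
    """Step 1: record a timestamped event whose hash covers the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else "genesis"
    body = {"event": event, "ts": time.time(), "prev": prev_hash}
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append(body)
    return body

def issue_certificate(agent_id, chain, ttl_seconds=86400):
    """Step 3: sign a certificate over the chain head; it expires after ttl_seconds."""
    payload = json.dumps({
        "agent": agent_id,
        "chain_head": chain[-1]["hash"],
        "expires": time.time() + ttl_seconds,
    }, sort_keys=True)
    sig = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": sig}

def verify_certificate(cert):
    """Step 4: any party can check the signature and that the certificate is fresh."""
    expected = hmac.new(SIGNING_KEY, cert["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, cert["sig"]):
        return False
    return json.loads(cert["payload"])["expires"] > time.time()

chain = []
append_event(chain, {"type": "llm_call", "model": "example-model"})
append_event(chain, {"type": "tool_invocation", "tool": "search"})
cert = issue_certificate("agent-42", chain)
ok = verify_certificate(cert)  # valid now; fails once the 24h TTL elapses
```

Because each event's hash includes its predecessor, rewriting history invalidates every later entry, and because the certificate expires, a stale attestation fails verification on its own.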

This creates a fundamentally different compliance dynamic. Instead of "certify once, assume compliance," it's "prove compliance continuously, or lose your certificate."

What NIST Calls This

The concept isn't new to security. NIST has advocated for continuous monitoring since SP 800-137 (Information Security Continuous Monitoring) was published in 2011. The framework explicitly replaces "single point-in-time assessments" with monitoring "at a frequency sufficient to support risk-based security decisions."

A security control assessment and risk determination process, otherwise static between authorizations, is thus transformed into a dynamic process that supports timely risk response actions and cost-effective, ongoing authorizations.

— NIST SP 800-137

NIST SP 800-53 control CA-7 requires "ongoing control assessments" and "ongoing monitoring of system and organization-defined metrics." The NIST AI Risk Management Framework (AI 100-1) extends this principle to AI with its MEASURE function, calling for tools to "analyze, assess, benchmark, and monitor AI risk."

Runtime attestation for AI agents is the natural application of these existing principles to the new reality of non-deterministic, autonomous systems.

The Three-Layer Trust Stack

The mature approach to agent trust will combine all three layers: identity (who is this agent?), certification (has it been vetted against a standard?), and runtime attestation (is it operating within its certified parameters right now?).

Each layer answers a question the others cannot. Identity without certification tells you who the agent is but not whether it's been vetted. Certification without runtime attestation tells you the agent was compliant at audit time but not whether it still is. Runtime attestation without identity tells you something is behaving well but not what it is.

The companies and standards bodies building each layer are not competitors. They're building different floors of the same building.

What Comes Next

The NIST AI Agent Standards Initiative, launched in February 2026, is actively convening the industry to address these gaps. Their April listening sessions on barriers to AI adoption in healthcare, finance, and education will surface the specific scenarios where periodic certification falls short.

The NCCoE's concept paper on AI Agent Identity and Authorisation asks the right foundational questions: "what exactly is the agent, who delegated authority to it, what can it do, and how should its actions be logged and constrained?" Runtime attestation answers the follow-up question those foundations enable: "and is it doing what it should?"

We are still early. The standards are being written. The frameworks are being defined. The opportunity — and the responsibility — is to get the architecture right: identity at the base, certification in the middle, and continuous runtime attestation at the top. Not because any single layer is insufficient, but because agents deserve a trust stack as sophisticated as they are.

Start Building the Third Layer

AgentApproved provides runtime attestation across EU AI Act, Singapore MGF, AIUC-1, and more. Add continuous compliance to your trust stack.
