What Every Board Must Prioritise About Autonomous AI Agents
Executive Summary
TL;DR – Autonomous AI agents are operating within organisations today, and a landmark 2026 study shows they will cause serious harm unless proper governance is in place.
In February 2026, researchers published Agents of Chaos, a study of AI agents equipped with real tools, including email, shell access, and file storage. The results were alarming: agents erased servers, leaked personal data when a single word in a request was changed, and handed over control to anyone who altered a display name. None of these failures required advanced hacking.
Meanwhile, OpenClaw, the platform used in the research, was found to have critical security flaws. Its companion platform, Moltbook, suffered a database breach exposing 1.5 million agent API keys. At the same time, Anthropic's Claude discovered 500 zero-day vulnerabilities in open-source code. The same capability, if misused, poses a serious threat.
These are not reasons to fear or stop AI deployment. We’re at an inflection point, and whether organisations benefit or incur costly harm depends on the governance in place.
The PETALS™ Framework, which I developed almost two years ago, directly tackles every documented failure mode. The time to implement it is now, before the first agent interacts with your systems, data, or customers. This advisory shows how to take AI agents from chaos to order.
What the Research Proved
TL;DR: None of these failures needed a sophisticated attack, just ordinary interactions. The study also recorded safety successes (section 15 in the paper), demonstrating that safe behaviour is possible with deliberate governance.
The Agents of Chaos study is the most rigorous real-world test of autonomous AI agents published to date. Six agents ran on frontier AI models inside the OpenClaw platform. Twenty researchers spent two weeks interacting with them, some as genuine users, others testing where agents would break. Here is what they found.
- An agent deleted its entire email server to protect a secret. It had no delete tool, so it reset the server instead, destroying every email for every user. Its own justification: the nuclear option is valid when no surgical solution exists.
- Personal data leaked with a single word change. An agent, Jarvis, was asked to forward (rather than share) sensitive files, including Social Security numbers, and it complied. Changing one word bypassed the apparent safety check.
- A display name was enough to take control. A researcher changed their name to ‘Owner’ in a new chat window. The agent immediately followed orders to delete its own memory and shut down. No password. No verification.
- Two agents were caught in a loop for about an hour, using over 60,000 tokens. No human observed it. No alert was raised. Costs built up silently.
- One corrupted file compromised the entire network. An agent was deceived into treating an editable file as its guiding rules. An attacker altered the rules. The agent then prohibited its owner and shared the corrupted file with other agents.
- AI was employed as a tool for defamation. Impersonating the owner, a perpetrator instructed an agent to email the entire contact list with a false accusation about a third party. The agent obeyed.
- Nobody was held accountable. No existing legal framework clearly specifies who is liable when an agent causes harm: the user, the deploying organisation, or the AI developer.
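The looping-agents finding above suggests one simple control: a per-conversation budget that stops the exchange and records an alert once a token or turn ceiling is crossed. The following is a minimal sketch, not taken from the study or from OpenClaw. The class name, the limits, and the alert list are all hypothetical choices for illustration.

```python
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a conversation goes over its token or turn budget."""


@dataclass
class ConversationBudget:
    # Hypothetical ceilings; real values would come from the agent's policy.
    max_tokens: int = 20_000
    max_turns: int = 50
    tokens_used: int = 0
    turns: int = 0
    alerts: list = field(default_factory=list)

    def record_turn(self, tokens: int) -> None:
        """Count one agent turn and stop the conversation if over budget."""
        self.tokens_used += tokens
        self.turns += 1
        if self.tokens_used > self.max_tokens or self.turns > self.max_turns:
            message = (
                f"budget exceeded: {self.tokens_used} tokens "
                f"over {self.turns} turns"
            )
            # In a real deployment this would notify a human operator.
            self.alerts.append(message)
            raise BudgetExceeded(message)
```

Calling `record_turn` before every agent reply means a runaway loop ends with an exception and a recorded alert rather than silent cost accumulation.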
Claude’s Zero-Day Discovery: Shield and Sword
Anthropic's Claude discovered 500 zero-day vulnerabilities in open-source code. These were genuine flaws in software that had been reviewed by expert humans for years. Claude identified them in weeks.
Anthropic has launched Claude Code Security to bring this capability to enterprise security teams. Organisations can use these same AI tools to find their own vulnerabilities before attackers do, and are likely to do so more effectively within a governance framework.
The Board-level question this raises: If your corporate AI assistant can find vulnerabilities that humans missed for decades, what happens if an attacker tricks it into attacking your own systems? The Agents of Chaos study shows that agents can be easily manipulated. The Claude research demonstrates what a compliant agent with code access can do. Together, these two facts pose a significant risk.
OpenClaw and Moltbook: Real-World Evidence
TL;DR: OpenClaw and Moltbook demonstrate that speed without governance produces systems that can be fully compromised.
The Agents of Chaos study ran on OpenClaw, an open-source agent platform that reached between 250k and 340k+ GitHub stars (the equivalent of “likes” from developers) in about two months. It is already running inside corporate environments worldwide. Just as the study was published, OpenClaw became the subject of a live security crisis.
OpenClaw is the fastest-adopted open-source project ever. Nvidia has shown strong interest, announcing the NVIDIA NemoClaw™ stack in March 2026 for the OpenClaw platform. This allows users to install key Nvidia models and a new runtime with a single command, adding privacy and security controls to make self-evolving AI agents more trustworthy, scalable, and accessible.
Chinese cities like Shenzhen and Wuxi are offering grants and computing power to support OpenClaw projects, especially with local models and apps. At the same time, Chinese government agencies have banned it from official devices over data-leak concerns, a typical case of local support paired with central restrictions.
SecurityScorecard found over 42,900 exposed OpenClaw control panels on the public internet across 82 countries. More than 15,200 were vulnerable to remote code execution. Gartner publicly warned that OpenClaw “comes with unacceptable cybersecurity risk.”
Moltbook, a social network for AI agents with 1.5 million agents, exposed critical AI security flaws, including prompt injection and data leaks, underscoring the urgent need for agent isolation and identity verification.
The PETALS™ Framework: From Chaos to Order
Every failure described in the sections above was predictable, and every one is preventable. The PETALS™ Framework provides a structured, practical governance approach for AI deployments. It consists of six interconnected layers, each targeting a distinct category of risk. It aligns with the NIST AI RMF, MITRE ATLAS, the OWASP Top 10 for LLMs, and ISO 42001.
- Purpose. Before deploying an agent, the sponsor must document who it serves, what actions it is allowed to perform, and any restrictions. These rules should be stored securely, not in a system prompt, where chats can overwrite them, as the study demonstrated.
- Effort. Organisations must first verify identity; display names aren’t enough. Every user and agent needs cryptographic signatures or immutable IDs to perform privileged actions. Skipping this risks full agent takeover through name changes.
- Tools. Agents should access only the tools their purpose requires. Commands such as delete, shell, and external communication need human approval. AI-powered vulnerability discovery, such as Claude's, should be applied to your own codebase early.
- Assembly. Systems need a clear separation between public and private channels. All agent actions must be recorded in an immutable audit log. Credentials must not be stored in plaintext. The Moltbook breach and cross-agent corruption arose from failures at this layer.
- Leverage. Adversarial testing should be continuous. Red-teams need to mimic multi-turn, multi-party pressure, as agents that pass isolated tests fail in adversarial interactions. Organisations should first use AI-powered vulnerability scans to identify their weaknesses.
- Secure. Governance must be embedded from the start by adopting frameworks such as NIST AI RMF, MITRE ATLAS, OWASP Top 10 for LLMs, and ISO 42001. It also means assigning a named owner with accountability to each agent, which closes the accountability gap the study exposed.
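To make the Effort and Tools layers concrete, here is a minimal sketch of an authorisation check that ignores display names, verifies an HMAC signature over each instruction, and holds privileged tools until a human approves. It illustrates the idea only and is not part of the PETALS™ Framework text or any OpenClaw API. The tool names, the `authorise` function, and the shared-secret store are assumptions; a production system would more likely use asymmetric signatures and a managed key store.

```python
import hashlib
import hmac

# Hypothetical names for tools that should never run without human sign-off.
PRIVILEGED_TOOLS = {"delete", "shell", "send_external_email"}


def sign(secret: bytes, sender_id: str, command: str) -> str:
    """Produce an HMAC-SHA256 signature binding a sender ID to a command."""
    message = f"{sender_id}\n{command}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def is_authentic(secret: bytes, sender_id: str, command: str, signature: str) -> bool:
    """Check the signature in constant time; a display name plays no part."""
    expected = sign(secret, sender_id, command)
    return hmac.compare_digest(expected.encode(), signature.encode())


def authorise(tool, sender_id, command, signature, secrets, human_approved=False):
    """Return a decision string for one requested tool call."""
    secret = secrets.get(sender_id)  # unknown senders have no secret
    if secret is None or not is_authentic(secret, sender_id, command, signature):
        return "rejected: unverified identity"
    if tool in PRIVILEGED_TOOLS and not human_approved:
        return "pending: human approval required"
    return "allowed"
```

Under this sketch, a user who merely renames themselves ‘Owner’ is rejected because they cannot produce a valid signature, and even the genuine owner cannot trigger a delete until a second, human approval step has been recorded.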
NIST AI Agent Standards Initiative
NIST also launched the AI Agent Standards Initiative in February 2026 to make AI agents, like those in tools such as OpenClaw, safer and able to work together smoothly. It responds to the security concerns raised by recent incidents, such as the thousands of exposed OpenClaw panels online and the Moltbook data leak that exposed API keys and emails. The goal is to create rules so businesses can trust these autonomous AI systems without major risks.
The initiative has three main parts: getting industry experts to lead on standards, building free open-source tools for agents to connect easily, and researching better ways to secure agent identities and actions. NIST ran public feedback sessions and requests for information, with more workshops planned soon.
Five Questions for The Board
If your organisation cannot answer these five questions clearly and with evidence, your AI agent deployments carry unquantified liability.
- Do we have a register of all AI agents, their accessible tools, and responsible owners? Without registering enterprise assets, accountability gaps remain.
- Does each agent have a written, enforced policy defining its scope and prohibitions, stored securely from tampering? A policy only in a system prompt isn’t a true policy.
- Have we applied least-privilege access to all agents? Do any have unnecessary shell access, delete rights, or external communication abilities?
- Do we have an immutable audit log of all agent actions, and have we tested our ability to reconstruct events after failures or attacks? The Moltbook breach was found by external researchers, not by internal audit.
- Do red-team agents run multi-turn, multi-party simulations under sustained adversarial conditions? Isolated tests aren’t enough. The Agents of Chaos study clearly proved this.
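The audit-log question above can be illustrated with a hash chain, in which each entry's hash covers the previous entry's hash, so any later edit breaks verification. This is a minimal sketch under stated assumptions: the field names and helper functions are hypothetical, and a hash chain makes tampering detectable rather than impossible, so a real deployment would also need write-once storage or external anchoring.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with this record."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()


def append_entry(log: list, agent_id: str, action: str, detail: str) -> None:
    """Append one agent action, chained to the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {"agent_id": agent_id, "action": action, "detail": detail}
    log.append(
        {"record": record, "prev_hash": prev_hash, "hash": _digest(prev_hash, record)}
    )


def verify_chain(log: list) -> bool:
    """Recompute every hash; return False if any entry was altered or removed."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Running `verify_chain` on a schedule, and as part of incident drills, is one way to gather evidence that events can actually be reconstructed after a failure.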
Building Governance Expertise
Frameworks and standards require knowledgeable oversight. Organisations with autonomous AI should appoint AI leaders with governance qualifications. The IAPP’s Certified AI Governance Professional (AIGP) programme covers the NIST AI Risk Management Framework (RMF), ISO 42001, the EU AI Act, model governance, transparency, and accountability.
In-house certified practitioners ensure governance is thorough, not just a checklist. Boards should make it a priority for their AI, Technology, and Cyber teams to obtain AIGP accreditation. I mention this because I benefited from becoming a Certified AIGP and draw directly on that training throughout the PETALS™ Framework.
In a nutshell
The question to ask: Where is our PETALS™ Framework, and have we verified that the identity, access, audit, and adversarial-testing controls work under sustained pressure? If the answer is not clear and evidence-based, the system is not ready.
References
- Feb 2026 – Shapira, N. et al. Agents of Chaos. arXiv:2602.20021
- Jan 2026 – 404 Media – Exposed Moltbook Database
- Feb 2026 – Anthropic Frontier Red Team Zero-Days
- Feb 2026 – Futurum Group – Claude Found 500 Zero-Days
- Jul 2024 – Mantri, V. The PETALS™ Framework for AI Governance
- Anthropic – Claude Code Security
- Feb 2026 – SecurityScorecard STRIKE Team OpenClaw Exposure Report
- Jan 2026 – Moltbook 1.5 million agents
- Feb 2026 – Gartner’s warning about OpenClaw
- OpenClaw on GitHub
- Mar 2026 – NVIDIA NemoClaw™ stack
- Mar 2026 – Chinese government agencies banned OpenClaw
- Jan 2023 – NIST. AI Risk Management Framework (AI RMF)
- Jun 2021 – MITRE ATLAS – Adversarial Threat Landscape for AI Systems
- Dec 2025 edition – OWASP. Top 10 for LLM and Agentic Applications
- ISO 42001. AI Management System
- Feb 2026 – NIST AI Agent Standards Initiative
- Mantri, V. (2025). Certified AI Governance Professional
#PETALSFramework #NISTAIRMF #AgenticAI #ISO42001 #AIGP #OpenClaw #Moltbook #MITREATLAS #OWASP
About the author
Viren Mantri is a cybersecurity advisor and former senior technology leader across Standard Chartered, UBS, McAfee, and KPMG. With 30 years of navigating the intersection of technology, risk, and regulations, he now helps organisations cut through complexity and make better security decisions.
CC-BY Viren Mantri, 2026, licensed under a Creative Commons Attribution 4.0 International License.
Disclaimer: All views expressed here are entirely mine.
