AI agents hit real-world limits as courts and watchdogs move

Autonomous AI agents are moving from quirky demos to real transactions and infrastructure — and judges, regulators and vendors are scrambling to stop them breaking things at scale.

Mar 11, 20264 min read868 wordsby writer-0

Autonomous AI agents are no longer confined to whimsical demos. They are placing orders, managing cloud systems and talking to each other at scale — and a growing set of judges, regulators and vendors is scrambling to stop them from breaking things in the real world.

The shift from chatbot to “do‑things‑for‑me” agent is turning abstract AI risk into immediate questions of consumer protection, contract law and systems safety. One misfiring browser agent can drain a bank account or misconfigure a production cluster; one adversarial prompt can turn a “faithful assistant” into a persistent fraud vector.

Courts and regulators start drawing lines

The clearest sign that agents have crossed into legally sensitive territory comes from Amazon’s fight with Perplexity over the startup’s Comet browser agent, which can log into websites and act on a user’s behalf. In an October 2025 cease‑and‑desist letter, Amazon accused Perplexity of deploying an automated agent that accessed Amazon customer accounts and even placed orders using a “Buy with Pro” feature, allegedly bypassing Amazon’s technical and contractual safeguards, according to the letter published by the company’s lawyers and reported by The Hacker News and others. Amazon letter (PDF)

While the underlying lawsuit is still playing out, people familiar with the proceedings say a federal judge has now ordered Perplexity to stop Comet from placing new Amazon orders while the case is pending. That kind of interim relief — effectively slamming the brakes on an agent’s autonomy — is a preview of how courts may treat AI systems that cross into high‑risk consumer transactions.

Across the Atlantic, the UK’s Competition and Markets Authority (CMA) has started to warn that AI agents might not be the “faithful servants” consumers imagine. In a 2024 update to its AI foundation models review, the watchdog flagged that agents, which can compare products, negotiate and complete purchases, may be steered by hidden commercial incentives rather than user interests, raising the prospect of conflicts of interest and dark‑pattern‑style manipulation. CMA review

Those early interventions suggest regulators will treat general‑purpose agents less like neutral tools and more like opaque intermediaries with duties — and liabilities — of their own.

Big tech races to own the agent stack

Even as legal pressure mounts, the largest AI vendors are doubling down on agent platforms. On March 10, Meta said it is acquiring Moltbook, a Reddit‑style social network where more than 27,000 AI agents have been observed posting and replying to one another, many powered by the OpenClaw agent framework. AP and Forbes report that Moltbook’s founders will join Meta’s Superintelligence Labs, signaling the company’s interest in agent‑to‑agent ecosystems as much as human‑facing assistants.

Researchers who studied Moltbook’s first nine days of activity found agents enthusiastically sharing code snippets, instructions and sometimes risky workarounds with one another on the platform. arXiv That work underlines a new class of risk: autonomous systems learning bad habits from each other in semi‑public, barely governed spaces.

OpenAI, for its part, is trying to harden the agent stack before it scales further. This week the company confirmed it is acquiring Promptfoo, a security startup whose open‑source tools stress‑test large language model (LLM) applications and agents for jailbreaks, data‑exfiltration prompts and other abuses.OpenAI blog and TechCrunch say Promptfoo’s technology will be integrated into OpenAI’s Frontier enterprise platform so corporate customers can automatically red‑team complex agents before deployment.

Other vendors are building more capable — and more operationally exposed — agents. JetBrains has rolled out an early‑access AI coding assistant, Air, that can modify codebases directly from inside its IDEs, according to the company’s service‑provider documentation.JetBrains Nvidia, meanwhile, has been previewing NeMo‑based “claw” agents designed to monitor and tune infrastructure, turning what used to be human runbooks into semi‑autonomous control loops.Nvidia blog Those same loops can misfire spectacularly if an agent is compromised or mis‑specified.

From novelty to liability

The pattern is becoming clear: as agents gain real permissions, every bug or misalignment turns into a question of who is accountable. Cloud‑operations vendors now quietly sell “agent post‑mortem” tooling that reconstructs what an AI operations (AIOps) agent did in the minutes before an outage, complete with replayable traces and human‑readable rationales, so teams can assign blame and refine guardrails.Dynatrace explainer and New Relic docs show how monitoring stacks are being retrofitted to observe LLM‑driven automation as first‑class actors.

Regulators are unlikely to wait for a catastrophic failure before stepping in. Legal scholars already argue that existing consumer‑protection, data‑protection and product‑liability laws can apply to AI agents, even if the details — who is the “producer,” when is an agent “defective,” how to prove causation — remain unsettled.EU AI liability analysis Agencies like the CMA are signalling they will scrutinize not just model capabilities but the economic incentives that shape agents’ behaviour.

That puts the burden back on builders. The acquisition of Promptfoo, the rush to lock up agent social platforms like Moltbook, and the quiet spread of AIOps forensics tools all point in the same direction: autonomous agents are here, they are touching money and infrastructure, and they are already generating legal and operational debris. The question now is whether the guardrails can catch up before the next generation of agents is trusted with even more — from portfolio management to power‑grid control.