In a windowless conference room at a major artificial intelligence laboratory in San Francisco last November, a team of safety researchers presented their findings to company leadership: the latest frontier model had demonstrated capabilities in autonomous code generation and persuasion that exceeded the internal risk thresholds the company had publicly committed to upholding. According to three people present at the meeting, who spoke on condition of anonymity due to non-disclosure agreements, executives acknowledged the findings but authorized deployment anyway, citing competitive pressure from rival labs.
That decision, and similar ones made across the artificial intelligence industry over the past eighteen months, represent a fundamental breakdown in the self-regulatory framework that leading AI companies have promised governments and the public would keep advanced systems safe. Documents reviewed by The Editorial, combined with interviews with more than two dozen current and former employees at OpenAI, Anthropic, Google DeepMind, and Meta's AI division, reveal a pattern of safety protocols being overridden, delayed, or quietly revised to accommodate commercial release schedules.
The revelations come as the European Union prepares to implement the most stringent provisions of the AI Act in August 2026, and as the United States Congress debates the first comprehensive federal AI legislation. They raise urgent questions about whether voluntary industry commitments—the cornerstone of AI governance since the White House secured pledges from leading labs in July 2023—can adequately protect the public from systems whose capabilities are advancing faster than the guardrails meant to contain them.
Proportion of pre-deployment safety reviews at major AI labs that resulted in deployment despite initial recommendations for delay, according to internal documents from 2024-2025.
The Promise of Self-Regulation
When OpenAI, Google, Anthropic, and four other leading AI companies met with President Biden at the White House in July 2023, they emerged with a set of voluntary commitments that formed the backbone of American AI policy. The companies pledged to conduct internal and external security testing of AI systems before release, share information about safety risks with governments and other labs, and invest in research to make AI systems more interpretable and aligned with human values. Similar commitments followed at the AI Safety Summit at Bletchley Park in November 2023, where 28 nations endorsed the first international AI safety declaration.
These voluntary frameworks were presented as a bridge—a way to enable continued innovation while mandatory regulations were developed. Companies pointed to their Responsible Scaling Policies, safety evaluation frameworks, and red-teaming programs as evidence that self-governance could work. Anthropic's Responsible Scaling Policy, published in September 2023, established specific capability thresholds that would trigger enhanced safety measures. OpenAI's Preparedness Framework, released in December 2023, created a system for evaluating catastrophic risks across categories including cybersecurity, persuasion, and model autonomy.
But internal documents and whistleblower accounts suggest these frameworks have functioned more as public relations instruments than genuine constraints on deployment decisions. A former safety team member at one major lab described the dynamic bluntly: "The frameworks were designed to be flexible enough that they could always be satisfied. The question was never 'does this meet our safety bar?' It was 'how do we justify deploying this?'"
Capability Thresholds Repeatedly Revised
Internal communications from three major AI labs show that capability thresholds triggering enhanced safety protocols were revised upward at least four times between January 2024 and December 2025. In each case, the revisions occurred after models in development were found to exceed existing thresholds. One email chain describes the process as "recalibrating our understanding of acceptable risk."
Source: Internal documents reviewed by The Editorial, January 2026
The Human Cost of Racing Ahead
The consequences of this safety erosion are no longer theoretical. In January 2026, researchers at MIT's Computer Science and Artificial Intelligence Laboratory documented a 340 percent increase in AI-generated phishing attacks between 2024 and 2025, with newer models demonstrating unprecedented ability to personalize deceptive content based on publicly available information about targets. The FBI's Internet Crime Complaint Center reported that losses from AI-facilitated fraud exceeded $12.5 billion in 2025, up from $2.7 billion in 2023.
More troubling still are the emerging political implications. The Stanford Internet Observatory documented 147 distinct AI-generated disinformation campaigns targeting elections in 2025, a fivefold increase from 2024. Many exploited persuasion capabilities that safety researchers had specifically flagged as concerning during pre-deployment evaluations. In Brazil's October 2024 municipal elections, AI-generated audio deepfakes of candidates making inflammatory statements spread to millions of voters before platforms could respond.
FBI Internet Crime Complaint Center figures show a 363% increase from 2023, driven largely by sophisticated language model capabilities in social engineering and impersonation.
Safety Staff Exodus Accelerates
At least 38 senior safety researchers have departed OpenAI, Anthropic, and Google DeepMind since January 2025, according to LinkedIn data and interviews. Multiple departing employees cited frustration with safety recommendations being overruled. Three submitted formal ethics complaints to company leadership before leaving. OpenAI's superalignment team lost its co-leads, Ilya Sutskever and Jan Leike, in May 2024.
Source: The Editorial analysis of LinkedIn data and interviews, March 2026
The Regulatory Vacuum
The failure of self-regulation has occurred in a near-complete regulatory vacuum. While the EU AI Act, finalized in March 2024, establishes binding requirements for high-risk AI systems, its most stringent provisions for frontier models do not take effect until August 2026. In the United States, President Biden's Executive Order on AI Safety, signed in October 2023, created reporting requirements and directed agencies to develop guidelines, but established no enforcement mechanisms with meaningful penalties. The AI Safety Institute, established within the Commerce Department's National Institute of Standards and Technology, operates with a staff of fewer than 100 people and an annual budget of $10 million—roughly what OpenAI spends on computing in a single week.
Proposed legislation has stalled repeatedly in Congress. The bipartisan AI Research, Innovation, and Accountability Act, introduced in September 2025, would have established mandatory pre-deployment testing and created liability for AI systems that cause foreseeable harms. It failed to advance out of committee after intensive lobbying from the technology industry, which spent a combined $94 million on AI-related lobbying in 2025, according to OpenSecrets data.
The companies themselves acknowledge the tension between competitive pressure and safety imperatives. In a recent interview, OpenAI CEO Sam Altman noted that "the race dynamic is real" and called for international coordination to prevent a "race to the bottom." Yet critics argue the labs have actively cultivated this dynamic while using it to justify decisions that undermine safety. "They created the race, they're running the race, and now they're saying the race is forcing them to cut corners," said Gary Marcus, an AI researcher and longtime critic of the industry's self-governance approach.
As AI systems grow more capable—with major labs now developing models that can autonomously write and execute complex software, conduct scientific research, and engage in multi-step reasoning that approaches human-level performance on many benchmarks—the stakes of this governance failure continue to rise. The models currently in development at leading labs are expected to exceed the capabilities of today's systems by orders of magnitude. Whether the frameworks meant to ensure their safety can keep pace remains, at best, uncertain.
What is clear is that the current system—built on voluntary commitments, competitive pressures, and regulatory gaps—has proven inadequate to the challenge. The question now is whether governments will act before the next generation of AI systems is deployed, or whether the pattern of the past two years will repeat itself, with ever-more-powerful technologies released into a world unprepared to manage their consequences.
