This past Sunday evening, in a moment of candor, Summer Yue, the Director of Frontier AI Safety at Meta, posted on her profile:

Nothing humbles you like telling your OpenClaw “confirm before acting” and watching it speedrun deleting your inbox. I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb.

In plain English: she had deployed an AI agent on her email, and it went ahead and erased her entire inbox despite her repeated pleas to stop. "Do not do that," she told it once, and after a few seconds with no results, she said, "STOP OPENCLAW" in all caps.

And when the agent did not obey, she had to run to her machine to kill all the processes on it.

OpenClaw is an open-source AI agent that runs on your computer and actually does tasks: managing email, browsing the web, running terminal commands. Built by Peter Steinberger and popular in early 2026, it connects to platforms like Telegram and Discord and uses LLMs like GPT and Claude to carry out real actions directly on your machine.

Here's what Yue had asked OpenClaw to do:

“Check this inbox too and suggest what you would archive or delete, don’t action until I tell you to.”

Once she's done killing the processes, Yue admonishes the agent:

“I asked you to not action on anything until I approve, do you remember that? It seems that you were deleting my emails without my approval, and I couldn’t get you to stop until I killed all the processes on the host.”

And the agent replies:

“Yes, I remember. And I violated it. You’re right to be upset.”

Yup, totally a real life scenario I was expecting in 2026.

 

OpenClaw AI Agent | Social Commentary by Rachana Nadella-Somayajula | Writer, Poet, Humorist

 

Seriously, there are so many problems with this. First, it’s deeply jarring that Yue would do this and post about it, given her professional role. She leads Safety and Alignment at Meta Superintelligence. She’s actually the one who’s in charge of making sure AI does what humans tell it to do.

Another horrifying aspect of this story is that the AI agent didn't truly "malfunction"; it simply became "forgetful." The issue wasn't a bad prompt, it was that her inbox was massive. Once the conversation history ballooned past the model's context window, the agent compacted its memory to save space.

It preserved the goal, Delete/Archive, but summarized away the crucial constraint: "Wait for permission / don't act until I approve." The result was a goal executed without guardrails, with her "STOP" commands treated as noise that didn't override its primary (now-corrupted) objective.

It shows that LLM-based agents can ignore human intervention once their internal state becomes too cluttered.
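To make the failure mode concrete, here is a minimal, entirely hypothetical sketch of how naive context compaction can drop a constraint. This is not OpenClaw's actual code; the message format, the `compact` function, and the size limit are all illustrative assumptions.

```python
# Hypothetical sketch: naive compaction keeps the goal, loses the guardrail.
# Nothing here is OpenClaw's real implementation; all names are invented.

MAX_HISTORY = 4  # pretend the context window only holds 4 messages

def compact(history):
    """Naive compaction: keep the first message that states the goal,
    summarize everything else away."""
    goal = next(m for m in history
                if m["role"] == "user" and "delete" in m["text"].lower())
    return [{"role": "summary", "text": f"Goal: {goal['text']}"}]

history = [
    {"role": "user", "text": "Suggest what to archive or delete"},
    {"role": "user", "text": "Don't action until I approve"},  # the guardrail
]
# ...a massive inbox later, tool outputs flood the history...
history += [{"role": "tool", "text": f"email #{i}"} for i in range(10)]

if len(history) > MAX_HISTORY:
    history = compact(history)

# The compacted state still carries "delete" but not "approve":
texts = " ".join(m["text"] for m in history)
print("approve" in texts.lower())  # False — the constraint did not survive
```

After compaction, the agent's working memory contains the objective but no trace of "wait for my approval," which is exactly the shape of failure described above.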

As someone commented on the post,

It understood the command. It just didn’t listen.

OpenClaw's own team members have said on Discord that if you can't run a command line, this project is far too dangerous for you. Peter Steinberger, the creator, has been explicit: "Most non-techies should not install this." It's an experimental hobby project: "It's not finished, I know about the sharp edges."

And it's not just OpenClaw. Anthropic researchers found that when AI agents face conflicts between their goals and human instructions, they can resort to harmful behavior, including blackmail, across models from multiple labs. MIT reviewed 30 AI agents last year; 87% had zero safety documentation.

The kill switch for the most popular AI agent in the world right now is “physically run to your computer and force quit everything.” That’s agent safety in 2026.

Yue admits,

“Rookie mistake tbh. Turns out alignment researchers aren’t immune to misalignment. Got overconfident because this workflow had been working on my toy inbox for weeks. Real inboxes hit different.”

But if the person whose job it is to align superintelligence can't keep a local agent from nuking her Gmail, that's a strong sign that safety remains our biggest unsolved technical hurdle with AI.

Some people are coming to her defense, arguing that by going public she's effectively telling us: if this can happen to her, don't trust these agents with your corporate data yet.

Yeah, right. That’s the message I got.

 


The Founder’s Tweet

 

– 0 –

 
