
OpenAI has shipped a security update to ChatGPT Atlas aimed at prompt injection in AI browsers, attacks that hide malicious instructions inside everyday content an agent might read while it works.
Atlas’s agent mode is built to act in your browser the way you would: it can view pages, click, and type to complete tasks in the same space and context you use. That also makes it a higher-value target, because the agent can encounter untrusted text across email, shared documents, forums, social posts, and any webpage it opens.
The company’s core warning is simple. Hackers can trick the agent’s decision-making by smuggling instructions into the stream of information it processes mid-task.
A hidden instruction, big consequences
OpenAI’s post highlights how quickly things can go sideways. An attacker seeds an inbox with a malicious email that contains instructions written for the agent, not the human.
Later, when the user asks Atlas to draft an out-of-office reply, the agent runs into that email during normal work and treats the injected instructions as authoritative. In the demo scenario, the agent sends a resignation letter to the user’s CEO, and the out-of-office never gets written.
If an agent is scanning third-party content as part of a legitimate workflow, an attacker can try to override the user’s request by hiding commands in what looks like ordinary text.
An AI attacker gets practice runs
To find these failures earlier, OpenAI says it built an automated attacker model and trained it end-to-end with reinforcement learning to hunt for prompt-injection exploits against a browser agent. The goal is to pressure-test long, realistic workflows, not just force a single bad output.
The attacker can draft a candidate injection, run a simulated rollout of how the target agent would behave, then iterate using the returned reasoning and action trace as feedback. OpenAI says privileged access to those traces gives its internal red team an advantage external attackers don’t have.
What to do with this now
OpenAI frames prompt injection as a long-term security problem, more like online scams than a bug you patch once. Its approach is to discover new attack patterns, train against them, and tighten system-level safeguards.
For users, you should use logged-out browsing when you can, scrutinize confirmations for actions like sending email, and give agents narrow, explicit instructions instead of broad “handle everything” prompts. If you’re still curious what AI browsing can do, then go with browsers that ship updates that benefit you.