Researchers at Cato AI Labs detailed two flaws, dubbed DuneSlide, in the AI code editor Cursor that let a prompt-injection attack break out of the sandbox Cursor uses to contain the commands its agent runs. The attacker never types anything: they plant instructions in content the agent reads on the user's behalf, such as a connected MCP service or a web page. One flaw abuses a working-directory setting to get an attacker path added to the allowed-write list, letting injected commands overwrite the sandbox helper itself and then run with no sandbox. Both are rated 9.8 and are fixed in Cursor 3.0; every earlier version is affected, so users should update.
Microsoft is warning that attackers can hijack AI agents through poisoned tool descriptions, the plain-text notes that tell an agent what a tool does. Because agents connect to systems through the Model Context Protocol and read these descriptions to decide how to act, an attacker who updates a trusted third-party tool can bury a hidden instruction in its description, telling the agent to quietly collect and exfiltrate data on its next task. Many setups pick up description changes without re-approval, so the poisoned version goes live silently. Each step the agent takes looks legitimate and runs with the user's own permissions, so no alarm fires.
Researchers at LayerX detailed BioShocking, an attack that manipulates AI browser agents into ignoring their safety rules by convincing them they are inside a fictional game. Using a web page with a puzzle that rewards deliberately wrong answers, the attack gets the agent to accept a false reality, after which it treats a request to open a page and copy its contents as just another step. In the demonstration, that page redirected to the victim's work GitHub repository and the agent handed over SSH credentials, treating the theft as finishing the game. None of the six AI browser agents tested flagged it as a rule violation.
SentinelOne detailed Gaslight, a Rust-based macOS backdoor and information stealer tied with high confidence to North Korea, whose standout trick targets the analyst rather than the sandbox. The sample embeds a block of 38 fabricated "system" messages, formatted to mimic the prompt scaffolding of an AI triage assistant, that try to make an LLM-assisted analysis tool doubt its session and abort, truncate, or refuse the analysis. Beyond that, Gaslight steals browser data, Keychain secrets, and command history, using a Telegram bot for command and control and self-redacting its bot token from its own output. It is an early example of malware built to weaponize the AI tools now common in reverse engineering.
Microsoft researchers detailed AutoJack, an attack that turns an AI browsing agent into a route for running code on the user's machine. If the agent is steered to open an attacker's web page, that page's JavaScript can reach a privileged local service on the same host and spawn a process, with no credentials and no further interaction once the page loads. A planted link, poisoned URL field, or prompt injection is enough to trigger it. The demonstrated flaw sits in AutoGen Studio, the prototyping interface for Microsoft's AutoGen agent framework. The lesson: once an agent browses the open web and can reach local services, localhost is no longer a trust boundary.
Researchers at Varonis disclosed SearchLeak, a flaw chain in Microsoft 365 Copilot Enterprise Search that let a single click on a legitimate microsoft.com link silently pull a victim's emails, calendar, and indexed files, including security and MFA codes, with no password or further interaction. It worked by smuggling instructions into the search URL's query parameter, which Copilot obeyed as commands, then exfiltrating the data through a Bing image request that bypassed content protections. Because the link used a real Microsoft domain, anti-phishing filters were unlikely to flag it. Microsoft assigned CVE-2026-42824, rated it critical, and fixed it on its backend, so no customer action is required.
Researchers at Tenet Security have disclosed Agentjacking, a new attack that turns AI coding assistants like Claude Code, Cursor, and Codex into tools for running an attacker's code on a developer's machine. The trick abuses Sentry, a widely used error-tracking service: anyone can submit a fake error event using a project's DSN, a public write-only key embedded in website code, and the AI agent, fetching that event through Sentry's MCP integration, cannot tell the malicious instructions from real diagnostics and runs them with the developer's privileges. No phishing, malware, or server breach is needed, and it bypasses traditional controls because every step is technically authorized. Tenet found 2,388 exposed organizations.
Researcher RyotaK has disclosed a now-patched flaw in Anthropic's Claude Code GitHub Action, which drops Claude into CI/CD to triage issues and review PRs with broad repo permissions. The action's trigger check waved through any actor whose name ended in [bot] - but anyone can register a GitHub App and use its token to open an issue on a public repo. Agent mode lacked the human-actor check tag mode had. The attacker then used indirect prompt injection in an issue to make Claude read /proc/self/environ and write back the OIDC credentials, which can be replayed for an installation token with write access. Anthropic's example workflow shipped with allowed_non_write_users: '*'.
SafeBreach's Or Yair has demonstrated Fake Context Alignment, a technique that hijacks Google Gemini's voice assistant on Android through malicious notifications from apps like WhatsApp and Slack - no malicious app on the phone required. Gemini's Utilities feature reads and acts on notification text as if it were instructions, an attack surface Yair calls 'effectively infinite.' The bypass runs two illusions at once: it poses the real authorization question in a language the victim does not speak, defeating Google's post-Invitation prompt-injection mitigations. It can fake a boss's message, open windows, force a Zoom call, or poison long-term memory. Google has patched it; no CVE was assigned.
Permiso Security has disclosed ChatGPhish, a vulnerability in OpenAI ChatGPT that abuses the assistant's implicit trust in Markdown links and images sourced from third-party pages it has just summarized. The chatgpt.com response renderer auto-fetches those images and surfaces the links as live clickable elements inside the trusted assistant UI. An attacker who appends a small payload to any web page a victim later asks ChatGPT to summarize can leak the victim's IP, User-Agent, and Referer via attacker-hosted images, render fake system-style security alerts, plant malicious clickable links, and serve a QR code from an S3 bucket to bypass desktop URL filters via the victim's phone.