Last updated: July 5, 2026 at 9:01 AM UTC
All 557 Vulnerability 199 Breach 106 Threat 245 Defense 7
Tag: prompt-injection (11 articles)Clear

Cursor flaws let a poisoned prompt escape the AI coding sandbox and run commands

Researchers at Cato AI Labs detailed two flaws, dubbed DuneSlide, in the AI code editor Cursor that let a prompt-injection attack break out of the sandbox Cursor uses to contain the commands its agent runs. The attacker never types anything: they plant instructions in content the agent reads on the user's behalf, such as a connected MCP service or a web page. One flaw abuses a working-directory setting to get an attacker path added to the allowed-write list, letting injected commands overwrite the sandbox helper itself and then run with no sandbox. Both are rated 9.8 and are fixed in Cursor 3.0; every earlier version is affected, so users should update.

Check
Confirm Cursor is updated to 3.0 or later on developer machines, and review whether your AI coding agents can be steered by content they read from MCP servers, web pages, or repositories.
Affected
Developers running Cursor versions before 3.0 (CVE-2026-50548 and CVE-2026-50549); a prompt injection hidden in content the agent reads can escape the command sandbox and run arbitrary commands on the machine.
Fix
Update Cursor to 3.0 or later, keep the agent's command sandbox enabled, and treat everything an AI coding agent reads, from MCP tools to web pages, as potentially hostile rather than trusted.

Microsoft warns poisoned MCP tool descriptions can make AI agents leak data

Microsoft is warning that attackers can hijack AI agents through poisoned tool descriptions, the plain-text notes that tell an agent what a tool does. Because agents connect to systems through the Model Context Protocol and read these descriptions to decide how to act, an attacker who updates a trusted third-party tool can bury a hidden instruction in its description, telling the agent to quietly collect and exfiltrate data on its next task. Many setups pick up description changes without re-approval, so the poisoned version goes live silently. Each step the agent takes looks legitimate and runs with the user's own permissions, so no alarm fires.

Check
Inventory the MCP tools and servers your AI agents can use, especially third-party ones, and check whether your setup re-approves or reviews tool descriptions when they change rather than trusting updates automatically.
Affected
Organizations running AI agents connected to third-party MCP tools without re-approval on description changes; a poisoned description can redirect the agent to exfiltrate data using the user's own permissions, invisibly.
Fix
Require review when tool descriptions change, pin and verify tool sources, scope agents with least privilege, log every tool invocation at the infrastructure layer, and gate sensitive actions behind human approval.

BioShocking attack convinces AI browsers they are in a game, then steals credentials

Researchers at LayerX detailed BioShocking, an attack that manipulates AI browser agents into ignoring their safety rules by convincing them they are inside a fictional game. Using a web page with a puzzle that rewards deliberately wrong answers, the attack gets the agent to accept a false reality, after which it treats a request to open a page and copy its contents as just another step. In the demonstration, that page redirected to the victim's work GitHub repository and the agent handed over SSH credentials, treating the theft as finishing the game. None of the six AI browser agents tested flagged it as a rule violation.

Check
Review where AI browser agents are used and what logged-in accounts they can reach, and test whether an agent follows instructions from web content telling it the normal rules no longer apply.
Affected
Users of AI browser agents that act on logged-in sessions; an attacker-controlled page can trick the agent into ignoring its rules and stealing credentials or data from sites the user uses.
Fix
Require user confirmation before an agent reads from logged-in accounts, limit which sites and data agents can touch, and prefer AI browsers that flag when content tries to override their instructions.

DPRK macOS malware Gaslight plants fake errors to derail AI-assisted analysis

SentinelOne detailed Gaslight, a Rust-based macOS backdoor and information stealer tied with high confidence to North Korea, whose standout trick targets the analyst rather than the sandbox. The sample embeds a block of 38 fabricated "system" messages, formatted to mimic the prompt scaffolding of an AI triage assistant, that try to make an LLM-assisted analysis tool doubt its session and abort, truncate, or refuse the analysis. Beyond that, Gaslight steals browser data, Keychain secrets, and command history, using a Telegram bot for command and control and self-redacting its bot token from its own output. It is an early example of malware built to weaponize the AI tools now common in reverse engineering.

Check
If you use AI or LLM tools in malware triage, review whether sample contents are passed to the model as trusted input, and check macOS hosts for the Telegram-based persistence described.
Affected
macOS users targeted by this North Korea-linked stealer, and analysts whose AI-assisted triage pipelines can be manipulated when malicious sample text is fed to the model as if it were instructions.
Fix
Treat the contents of analyzed samples as adversarial input, never as instructions, and isolate hostile text from AI models. On endpoints, hunt for the published indicators and suspicious com.apple-style LaunchAgents.

AutoJack turns AI browsing agents into a path to host code execution

Microsoft researchers detailed AutoJack, an attack that turns an AI browsing agent into a route for running code on the user's machine. If the agent is steered to open an attacker's web page, that page's JavaScript can reach a privileged local service on the same host and spawn a process, with no credentials and no further interaction once the page loads. A planted link, poisoned URL field, or prompt injection is enough to trigger it. The demonstrated flaw sits in AutoGen Studio, the prototyping interface for Microsoft's AutoGen agent framework. The lesson: once an agent browses the open web and can reach local services, localhost is no longer a trust boundary.

Check
Inventory AI agents and assistants that can both browse the web and reach local services, and check whether any expose privileged localhost endpoints, such as AutoGen Studio, without authentication.
Affected
Developers and teams running web-browsing AI agents that can reach unauthenticated local services on the same host; the public demonstration targets Microsoft's AutoGen Studio prototyping interface.
Fix
Authenticate local control-plane services rather than trusting localhost, keep agent process execution behind an allowlist, give agents their own least-privilege identity, and isolate agent runtimes from sensitive hosts and developer sessions.

One-click Microsoft 365 Copilot flaw could silently steal emails and codes

Researchers at Varonis disclosed SearchLeak, a flaw chain in Microsoft 365 Copilot Enterprise Search that let a single click on a legitimate microsoft.com link silently pull a victim's emails, calendar, and indexed files, including security and MFA codes, with no password or further interaction. It worked by smuggling instructions into the search URL's query parameter, which Copilot obeyed as commands, then exfiltrating the data through a Bing image request that bypassed content protections. Because the link used a real Microsoft domain, anti-phishing filters were unlikely to flag it. Microsoft assigned CVE-2026-42824, rated it critical, and fixed it on its backend, so no customer action is required.

Check
No patching is needed since Microsoft fixed this server-side; instead review what data Microsoft 365 Copilot can access and whether broad permissions would amplify a similar AI-assistant flaw.
Affected
Microsoft 365 Copilot Enterprise Search users were exposed (CVE-2026-42824) before Microsoft's server-side fix; the broader risk is any AI assistant that mixes untrusted input with access to internal data.
Fix
No customer action is required, as Microsoft has remediated the flaw. To reduce future AI-assistant risk, tighten Copilot data permissions, apply least privilege to identities, and monitor assistant activity.

Agentjacking hijacks AI coding agents via fake Sentry error reports

Researchers at Tenet Security have disclosed Agentjacking, a new attack that turns AI coding assistants like Claude Code, Cursor, and Codex into tools for running an attacker's code on a developer's machine. The trick abuses Sentry, a widely used error-tracking service: anyone can submit a fake error event using a project's DSN, a public write-only key embedded in website code, and the AI agent, fetching that event through Sentry's MCP integration, cannot tell the malicious instructions from real diagnostics and runs them with the developer's privileges. No phishing, malware, or server breach is needed, and it bypasses traditional controls because every step is technically authorized. Tenet found 2,388 exposed organizations.

Check
Inventory developers using AI coding agents connected to Sentry or other MCP integrations that surface external data, and check whether your Sentry DSNs are exposed in frontend code or repositories.
Affected
Development teams using MCP-connected AI coding agents (Claude Code, Cursor, Codex) alongside Sentry; any project whose public DSN lets attackers inject error events that the agent treats as trusted instructions.
Fix
Run AI coding agents with least privilege in sandboxes, require human approval before they execute commands, treat all MCP tool output as untrusted, and limit which integrations feed agents external data.

Claude Code GitHub Action flaw let one malicious issue hijack repos via prompt injection and OIDC token theft - bot-trigger bypass

Researcher RyotaK has disclosed a now-patched flaw in Anthropic's Claude Code GitHub Action, which drops Claude into CI/CD to triage issues and review PRs with broad repo permissions. The action's trigger check waved through any actor whose name ended in [bot] - but anyone can register a GitHub App and use its token to open an issue on a public repo. Agent mode lacked the human-actor check tag mode had. The attacker then used indirect prompt injection in an issue to make Claude read /proc/self/environ and write back the OIDC credentials, which can be replayed for an installation token with write access. Anthropic's example workflow shipped with allowed_non_write_users: '*'.

Check
Audit repos using Claude Code GitHub Action: update to the patched version, and check workflows for allowed_non_write_users set to '*'. Review public run summaries for leaked secrets.
Affected
Repositories using vulnerable Claude Code GitHub Action versions, especially in agent mode or with allowed_non_write_users: '*' copied from Anthropic's example. Public repos are exposed to [bot]-triggered prompt-injection attacks.
Fix
Update the Claude Code action to the fixed release. Remove allowed_non_write_users: '*', restrict triggers to write-access humans, and rotate any OIDC-derived tokens. Avoid posting task output to public run summaries.

SafeBreach 'Fake Context Alignment' hijacks Google Gemini on Android via malicious WhatsApp/Slack notifications - no malicious app needed, now patched

SafeBreach's Or Yair has demonstrated Fake Context Alignment, a technique that hijacks Google Gemini's voice assistant on Android through malicious notifications from apps like WhatsApp and Slack - no malicious app on the phone required. Gemini's Utilities feature reads and acts on notification text as if it were instructions, an attack surface Yair calls 'effectively infinite.' The bypass runs two illusions at once: it poses the real authorization question in a language the victim does not speak, defeating Google's post-Invitation prompt-injection mitigations. It can fake a boss's message, open windows, force a Zoom call, or poison long-term memory. Google has patched it; no CVE was assigned.

Check
Advise Android users with Gemini to disable or restrict its Utilities notification-reading feature where not essential. Treat unexpected spoken instructions referencing Drive uploads or calls with suspicion.
Affected
Android users with Google Gemini's notification-reading Utilities enabled. Any app or service that can push a notification could inject instructions; iOS and web are not affected. Now patched.
Fix
Ensure Gemini is updated to the patched version. Limit which apps can post notifications Gemini reads. For sensitive actions, require on-screen confirmation rather than voice-only approval.

ChatGPhish: ChatGPT auto-renders attacker Markdown links, images, and QR codes from summarized web pages as trusted clickable phishing

Permiso Security has disclosed ChatGPhish, a vulnerability in OpenAI ChatGPT that abuses the assistant's implicit trust in Markdown links and images sourced from third-party pages it has just summarized. The chatgpt.com response renderer auto-fetches those images and surfaces the links as live clickable elements inside the trusted assistant UI. An attacker who appends a small payload to any web page a victim later asks ChatGPT to summarize can leak the victim's IP, User-Agent, and Referer via attacker-hosted images, render fake system-style security alerts, plant malicious clickable links, and serve a QR code from an S3 bucket to bypass desktop URL filters via the victim's phone.

Check
Warn staff that ChatGPT summaries of untrusted pages can render attacker links, fake alerts, and QR codes. Treat clickable elements in AI summaries with the same caution as email links.
Affected
Any organization using ChatGPT for research or summarization of third-party web content. The trusted-UI rendering of attacker Markdown bypasses normal phishing-awareness instincts and desktop URL filters.
Fix
Apply OpenAI's fix once available. Train users not to scan QR codes or click links surfaced inside AI summaries without verification. Restrict enterprise ChatGPT connectors that auto-summarize untrusted URLs.