Indirect Prompt Injection

Sovereign AI

Indirect prompt injection is an attack in which an adversary plants malicious instructions inside external content that a large language model later ingests as part of its context — rather than typing the attack directly into the chat box. When an AI agent retrieves a web page, reads an email, opens a document, or pulls a record during tool use, any attacker-controlled text in that source can hijack the model's behavior. It is the stealthier, more dangerous sibling of direct prompt injection: the victim never knowingly invites the attacker in. NIST and the OWASP Gen AI Security Project both rank prompt injection as the top risk for LLM applications, and the indirect variant is the half that quietly turns a helpful assistant into a confused deputy.

How the payload hides

The malicious instructions do not need to be human-readable to work, because the model reads the raw text, not the rendered page. Common concealment tricks include white text on a white background, zero-width Unicode characters, text tucked into HTML comments or image alt attributes, and instructions buried deep in a long document where a human reviewer is unlikely to look. When a retrieval-augmented pipeline pulls that content into the prompt, the model has no built-in way to distinguish the developer's instructions, the user's request, and the attacker's smuggled commands — they all arrive as the same undifferentiated token stream.

What an attacker can do with it

Once parsed, a payload can exfiltrate conversation history, coax the agent into unauthorized tool calls, quietly rewrite its goals, or plant instructions that persist into later turns. Documented zero-click exploits against production assistants have chained retrieval and tool use to leak data with no user action beyond asking an ordinary question. As agents gain the ability to browse, send messages, and touch real systems, the blast radius grows: an injection is no longer just "the model said something wrong," it can become "the model did something wrong on your behalf."

A particularly nasty class of attack uses the agent's own outbound capabilities as the exfiltration channel. Hidden text can instruct the model to encode secrets from earlier in the conversation into a URL it then fetches, or into an image it renders, so private data leaves through a request that looks perfectly ordinary in the logs. Because the instruction and the leak are both dressed as normal assistant behavior, nothing obviously alarming happens on screen — which is what makes the indirect variant so well suited to quiet, persistent compromise rather than a noisy one-off.

Reducing the blast radius

There is no single patch; defense in depth is the only durable answer. Treat all retrieved content as untrusted input, keep instruction channels separated from data channels, constrain which tools an agent may call and with what privileges, and require explicit human confirmation for anything sensitive or irreversible. Structured red-teaming before deployment surfaces these paths early. This threat also sits alongside other input-manipulation attacks — see data poisoning and adversarial examples for related ways untrusted data subverts a model.

Why sovereign AI operators should care

If you self-host models to keep your data off someone else's servers, indirect injection is the attack that most directly threatens that goal, because it turns your own trusted assistant into the leak. The same content you feed it for convenience — your notes, your inbox, a scraped page — is the vector. The mitigation is cultural as much as technical: assume anything your agent reads could be trying to instruct it, give the agent the least authority that still gets the job done, and log its actions so a hijack is visible after the fact. Running the stack yourself does not eliminate the risk, but it puts every relevant control — network egress, tool permissions, and audit — in your hands rather than a vendor's.

Indirect prompt injection is an attack in which an adversary plants malicious instructions inside external content that a large language model later ingests as part…

Explore the Full Glossary

Browse all Bitcoin mining terms from A to Z. Whether you are a beginner or expert, deepen your understanding of the mining ecosystem.

Mining Glossary

ASIC Miner Database

Compare 500+ miners with real-time profitability data, home mining scores, and detailed specs.

Compare Miners