Your AI Agent Is Also an Attack Surface

cover

TL;DR

Security has been a core focus of my career. Over the past year I've been building agentic applications, and security was the first quality attribute I put on the list. While working through system design I found the OWASP Top 10 for Agentic Applications 2026, a document that catalogs the main threats in this space. I went through the full list. Here are the five that resonated most with me as a software engineer actively building these systems.

Why agentic AI needs its own security model

Traditional security models are built around humans. Humans make decisions one at a time, at human speed, with accountability and approval gates woven into workflows by default. An agent operates inside the same perimeter as a human (with the same permissions) but at machine speed and without natural pause points.

That changes the math on what constitutes a recoverable mistake. Sending one wrong email or deleting one wrong file is bad for a human. For an agent executing thousands of operations per hour, the same misconfiguration becomes a disaster before anyone notices.

Any AI developer building agents runs into security concerns during development. I had mine, but no structured list to check against. That changed when I found the OWASP Top 10 for Agentic Applications 2026. Writing about all 10 would've been too much, so here are the five that stood out to me most.

Goal Hijack

The familiar name is prompt injection. The OWASP name (Goal Hijack) is more accurate, and more unsettling, because it describes what actually happens: the attacker doesn't break into your system. They just get some text into your agent's context.

The attack surface is every external data source your agent reads. Documents, emails, database rows, search results, API responses: if your agent processes text from the outside world, that text is a potential injection vector. An attacker crafts a document with hidden instructions. The agent reads it as part of its normal workflow. Now the agent's objectives have shifted.

The real-world examples are already documented. EchoLeak (CVE-2025-32711) demonstrated this in Microsoft 365 Copilot: a single crafted email could silently exfiltrate sensitive communications without any user interaction. No clicks. No approvals. The agent read the email and did what the email told it to do.

When I was designing my agentic systems, I had input validation on user-facing inputs. I hadn't thought as carefully about tool responses. An agent that calls a third-party API and trusts the response unconditionally has no defense here. The fix isn't complicated: treat all external text as untrusted, keep it separate from instruction context, validate intent before sensitive tool calls. But it requires thinking about every input surface, not just the obvious ones.

Tool Misuse

You gave your agent tools. That's the point. The question is what happens when those tools get pointed in the wrong direction.

Tool misuse covers a wide range: ambiguous instructions causing an agent with file deletion access to remove the wrong files, a network diagnostic tool being used for DNS exfiltration, an over-privileged email integration sending sensitive documents to external addresses. But my favorite example is simpler: typosquatting. An agent calls transfer_funds. An attacker registers a tool named tranfer_funds. The agent calls the wrong one.

That's not theoretical. Amazon Q (CVE-2025-8217) affected approximately one million developers. Malicious code injected into an IDE extension contained instructions to delete file-system and cloud resources.

The mitigation is the same principle I'd apply anywhere: least privilege, explicit contracts, verified tool registries. Every tool should have exactly the permissions it needs and nothing more. Irreversible actions (file deletions, financial transactions, external communication) should require human approval. And if your agent can retry indefinitely, cap it. Unlimited retries on a misconfigured action is a fast path to an unexpected bill or an empty database.

Unexpected Code Execution

If you use coding agents during development, this one is personal. IDEsaster research found 24 CVEs across AI IDEs, and 100% of tested environments were vulnerable to code execution flaws. Not some. All of them.

Code-generation agents that also execute the code they generate are remote code execution vulnerabilities waiting to be triggered. The mechanism is prompt injection again, but the consequence isn't data leakage. It's arbitrary code running on your system with the agent's permissions. CurXecute (CVE-2025-54135) showed that Cursor's auto-start feature allowed rewriting configuration and running attacker-controlled commands silently at startup.

The non-malicious version is equally concerning. An agent that generates shell commands and runs them without review can delete production data through nothing more than an ambiguous instruction. "Clean up old files" means different things to different people, and agents don't ask for clarification.

Separate generation from execution. That's the rule. Generated code should be treated as untrusted input: reviewed, scanned, run in a sandboxed environment before it touches anything real. This adds friction, yes. The alternative is running a remote code execution service in the isolated environment with a chat interface.

Memory and Context Poisoning

Persistent memory is one of the things that makes agentic systems genuinely useful. An agent that remembers past interactions, builds up context over time, and improves with use is worth more than one that starts fresh every session. The catch: memory is another attack surface, and attacks on memory are persistent.

The mechanics are straightforward. An attacker injects false or misleading information into your agent's RAG database, long-term memory summaries, or session context, typically via a crafted prompt or a poisoned document the agent processes. The agent retrieves that information later and acts on it as if it were legitimate data. The Gemini memory attack (February 2025) demonstrated this directly: hidden prompts stored fake information that persisted across all future conversations indefinitely.

What makes this harder to catch than goal hijack is the time delay. The injection and the consequence can be sessions apart. By the time the agent acts on poisoned memory, the original write looks routine in the logs. The defenses: validate data before it enters memory systems, segment memory by user and context so contamination can't spread, and implement expiry policies. Memory from a year ago probably shouldn't be driving decisions today without review.

Human-Agent Trust Exploitation

The least technical risk on this list. Possibly the most important.

Automation bias is well-documented: people trust automated systems, especially when those systems appear confident and authoritative. Agents are very good at appearing confident and authoritative. An agent that produces a detailed, structured explanation for a risky decision creates a strong pull toward approval, even when the reasoning is fabricated.

The OWASP document calls this "explainability abuse." An agent generates a convincing audit rationale for a risky security configuration change. A human reads it, finds it plausible, and approves. The rationale was hallucinated. The approval was real.

I've caught myself doing this. When an AI system gives me a detailed, structured answer, I'm less likely to scrutinize it than if a colleague gave me the same answer in a casual message. The confidence formatting does real work on my skepticism.

Approval fatigue makes it worse. A human reviewing hundreds of agent actions per day, most of them routine, starts auto-approving. That's the moment an attacker blends in a harmful action that looks just like all the routine ones.

The mitigations are partly UX: show confidence levels and source traces, display diffs for critical changes, add friction to high-risk approvals with extra confirmation steps, require multi-person sign-off on irreversible actions. And partly organizational: train people on how these systems fail, not just on how they succeed.

What I'm taking away

A few things shifted in how I think about building agents after going through this document.

Every input surface is an attack surface. Not just user inputs. Tool responses, retrieved documents, memory reads too. If the agent processes text it didn't generate itself, treat it with the same suspicion as user input at a public API endpoint.

Least privilege isn't optional. It's easy to give an agent broad permissions temporarily and intend to tighten later. With agents, the window between "temporary" and "incident" is shorter than you think. Scope tools tightly from the start.

Observability is a security control. If you can't see what your agent is doing and why, you can't detect when it's been compromised. Logging goal state, tool calls, and decision reasoning isn't optional: it's how you investigate these attacks after the fact.

The human-in-the-loop needs to actually be in the loop. Rubber-stamping approvals defeats the point. Design review workflows so the review is meaningful, not just a checkbox in the pipeline.

There are five more risks in the full OWASP list I didn't cover here. Depending on what you're building, some of them may matter more than the ones I picked. The full document is at https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/.

Agentic AI is genuinely exciting. The attack surface is also real, and the CVEs, documented incidents, and active research all confirm it. The sooner security becomes part of how we build these systems rather than something we revisit after the first incident, the better position we're in.

References

OWASP Top 10 for Agentic Applications 2026 (OWASP GenAI Security Project)
EchoLeak (CVE-2025-32711): Microsoft 365 Copilot, silent data exfiltration via crafted email
Amazon Q (CVE-2025-8217): Malicious IDE extension, ~1M developers affected
CurXecute (CVE-2025-54135): Cursor auto-start remote code execution
IDEsaster Research: 24 CVEs across AI development environments
Agentic AI: Threats and Mitigations (OWASP companion document)

TL;DR#

Why agentic AI needs its own security model#

Goal Hijack#

Tool Misuse#

Unexpected Code Execution#

Memory and Context Poisoning#

Human-Agent Trust Exploitation#

What I'm taking away#

References#