Title

Threat Modeling AI Apps

Why Threat Modeling Still Matters
When Prevention Isn’t Possible
Threat Modeling for Detection Engineering
A Practical Threat Modeling Workflow
Using the Microsoft Threat Modeling Tool
Worked Example: Agentic AI Support Assistant
Closing Thoughts

Why Threat Modeling Still Matters

Threat modeling is just structured thinking about how a system can be abused, and at a practical level we’re trying to answer a few simple questions:

What are we protecting?
How can an attacker reach it?
What can go wrong at each step?
What controls do we have (or need)?
If we can’t prevent it cleanly, how do we detect and respond quickly?

For detection engineers this is huge, because instead of writing detections from random ideas (or only from previous incidents), we can build detections from known attack paths in our own environment and get better coverage with fewer “interesting but not relevant” detection rules.

When Prevention Isn’t Possible

“Prevent everything” sounds great on paper, but production systems are messy.

A few common reasons preventive controls fall short:

Business friction: hard blocks can break critical user workflows.
Legacy constraints: older systems can’t support modern controls quickly.
Third-party dependencies: we don’t fully control external services.
Operational tradeoffs: strict controls can create reliability pain.
Model uncertainty: LLM behavior is probabilistic, so hard prevention isn’t always deterministic.

This gets even more real in agentic AI systems - we should absolutely use guardrails, system prompts, policy checks, and least privilege, but teams still need to prepare for bypass attempts, indirect prompt injection, tool abuse, and data exfiltration patterns that slip past prevention. That’s where detection engineering acts as a compensating control.

Threat Modeling for Detection Engineering

A good model should create detection work, not just documentation.

One pattern that works well:

Detection engineering pipeline: Threat, Attack Path, Telemetry, Detection Theory, Detection Logic, Response Guidelines

If you can fill in each stage you have something actionable.

Example:

Threat: indirect prompt injection through untrusted retrieved content, such as malicious instructions embedded on a fetched website.
Attack path: attacker plants malicious instructions on a compromised website, agent ingests it, and then performs unintended sensitive actions.
Telemetry: HTTP request/fetch logs (URLs visited by the agent), retrieved content/payload logs, prompt and completion metadata, tool invocation audit logs, session identity and auth events.
Detection theory: using the telemetry we have, how can we detect the attack narrative at various stages of the attacker’s path?
Detection logic: alert when anomalous or malicious instruction-like content is retrieved and sensitive tool calls follow soon after.
Response: revoke credentials/tokens if needed, quarantine source, investigate timeline for RCA and blast radius.

Prioritization That Actually Helps

I like a simple framework that’s easy to follow:

Impact
Likelihood
Preventability gap
Detectability with current telemetry

High impact plus high likelihood plus low preventability is the sweet spot for incentive to get detections out ASAP.

A Practical Threat Modeling Workflow

Keep it lightweight and repeatable - a model you actually use beats a perfect model you never update.

1) Define scope and critical assets

Pick one clear workflow or feature boundary.

For AI apps, common assets include:

internal documents and user data
API keys and service credentials
tool/action permissions
system prompts and policy context
downstream systems (ticketing, email, CRM, code repos)

2) Diagram system and data flows

Map how data and actions move:

user input
retrieval (RAG/vector/search)
model runtime
tool calls
external integrations
logging/monitoring pipeline

This is where attack paths become obvious.

3) Identify trust boundaries

A trust boundary is any point in the system where the level of trust changes between two components. In other words, it’s where data or control crosses from one security context into another, like when an unauthenticated internet user’s input reaches your backend, or when your agent runtime calls a privileged tool that can read customer records. Every time data crosses a trust boundary, we should be asking: “what could go wrong if this input is malicious or this component is compromised?”

Trust boundaries are critical for detection engineering because they’re natural instrumentation points - if you can log and inspect traffic at each boundary crossing, you have visibility into where attacks transition from one stage to the next.

Typical boundaries for agentic systems include:

internet user to app backend (untrusted input enters the system)
app to retrieval/indexed content sources (content from external or shared sources could be poisoned)
app to LLM provider (prompts and completions cross an external API boundary)
agent runtime to privileged tools (the agent escalates from reasoning to action with real-world impact)
app to customer data stores (access to sensitive data that could be exfiltrated)

4) Enumerate threats with STRIDE (plus AI-specific abuse)

STRIDE still works well, and for AI systems we layer on additional abuse patterns:

STRIDE categories plus AI-specific threat categories: Direct Prompt Injection, Indirect Prompt Injection, Jailbreak Attempts, Context Poisoning, Tool Invocation Hijacking, Data Exfiltration via Output

5) Decide controls (preventive and detective)

For each threat:

what can we prevent now?
what can we partially or are very limited in preventing?
what must we detect and respond to?

Make detection outputs explicit and avoid leaving this as a vague “SOC will monitor it” note.

6) Prioritize and assign owners

Turn findings into backlog items with owners across:

app engineering
platform/security engineering
detection engineering
SOC/IR

Revisit the model when architecture changes, tools are added, or major incidents occur.

Using the Microsoft Threat Modeling Tool

The Microsoft Threat Modeling Tool is a great option because it’s free, structured, and fast to adopt. It gives teams a good out-of-the-box solution to start visualizing an application’s architecture.

Practical Setup Flow

Create a model and lock scope to one workflow.
Build the DFD (entities, processes, stores, flows).
Add trust boundaries.
Generate threats (the tool does this automatically based on your DFD) and triage the results, since not every generated threat will be relevant to your environment.
Tag each threat as Prevent, Detect, Respond, or Accept.
Export and convert high-priority findings into implementation tickets.

Keeping Signal High

model reality, not ideal architecture
split giant systems into smaller models
involve app owners in triage
prioritize high-impact abuse cases first

Worked Example: Agentic AI Support Assistant

Let’s use a simple scenario that mirrors a lot of real-world internal AI deployments.

Scenario

An internal support assistant has agentic capabilities:

employees chat through a web app
app uses an LLM and retrieval from internal KB docs
agent can call tools:
- search_kb
- create_ticket
- read_customer_record
- send_email_summary
app logs prompts, responses, tool calls, and auth events

Architecture diagram of the agentic AI support assistant showing data flows and trust boundaries

Threat Focus: Prompt Injection Flavors

Direct prompt injection

User attempts to override policy directly in chat input (for example, asking the model to ignore controls or expose hidden instructions).

Detection ideas:

detect jailbreak/bypass language patterns
alert on high-risk prompt score plus sensitive tool invocation
baseline repeated bypass attempts by user/session/source

Indirect prompt injection

Malicious instructions are embedded in retrieved documents and influence tool selection or output behavior.

Detection ideas:

scan retrieved chunks for instruction-like markers
correlate risky retrieval with privileged tool calls
alert when post-retrieval behavior deviates from baseline

Context poisoning / memory abuse

Attacker attempts to persist malicious context so that future responses and agent decisions drift from intended behavior.

Detection ideas:

monitor writes to memory/state with policy-aware rules
alert on abrupt shifts in agent decision patterns
version and diff memory/state artifacts for suspicious changes

Tool invocation hijacking

Prompt manipulation nudges the agent into technically valid but abusive tool usage, like bulk reading customer records under the guise of a summary request.

Detection ideas:

detect unusual tool chains (sequence + timing)
alert on role-to-action mismatches
detect argument volume anomalies for sensitive tools

Data exfiltration through model output

The model is coerced into disclosing sensitive context or data returned by tool calls in its response to the user.

Detection ideas:

run DLP-like checks on model output streams
alert on credential-like, token-like, or PII-heavy responses
correlate exfil indicators with earlier sensitive tool access

Example Detection Backlog

Detect risky retrieval-to-tool chain in a short time window. An attacker plants malicious instructions in a document or webpage that gets retrieved by the agent, and shortly after the agent invokes a sensitive tool it wouldn’t normally call in that context.
Detect bypass campaign behavior across users/sessions. A user (or multiple users) repeatedly attempts prompt injection or jailbreak variations in a short period, probing for a technique that slips past guardrails.
Detect anomalous privileged tool usage by role. A user’s session triggers tool calls that don’t match their role or typical usage pattern, like a help desk employee’s session bulk-reading customer records it normally never touches.
Detect sensitive data exfil indicators in model output. The model’s response contains credential-like strings, PII patterns, or unusually large volumes of sensitive data that suggest the prompt coerced it into leaking context or tool-returned information.
Detect suspicious memory poisoning followed by behavior drift. An attacker writes or modifies persistent agent memory/state in a way that causes future sessions to behave differently, such as silently skipping policy checks or routing actions to unintended targets.
Detect anomalous command execution on the underlying infrastructure. An attacker chains prompt injection into tool calls that lead to suspicious command execution on the underlying host or platform, indicating activity like persistence mechanisms or data exfiltration.

Tie Each Detection to Response

Detection without response is only half a control, so map each detection rule to concrete actions:

terminate suspicious sessions
revoke or rotate impacted credentials
temporarily disable high-risk tools (likely hard to do without a very high severity incident + CSO)

Closing Thoughts

Threat modeling for AI systems doesn’t need to be heavyweight to be useful, and done well it gives detection engineering a clear map of where attackers can push, where prevention is weak, and where monitoring needs to be strongest.

If you can’t fully prevent a path today, that doesn’t mean you’re stuck - it means we define the path, instrument it properly, and build response-ready detections that reduce blast radius.

Posted 01/19/2026

ThreatModeling

Threat Modeling AI Apps

Table of contents

Why Threat Modeling Still Matters

When Prevention Isn’t Possible

Threat Modeling for Detection Engineering

Prioritization That Actually Helps

A Practical Threat Modeling Workflow

1) Define scope and critical assets

2) Diagram system and data flows

3) Identify trust boundaries

4) Enumerate threats with STRIDE (plus AI-specific abuse)

5) Decide controls (preventive and detective)

6) Prioritize and assign owners

Using the Microsoft Threat Modeling Tool

Practical Setup Flow

Keeping Signal High

Worked Example: Agentic AI Support Assistant

Scenario

Threat Focus: Prompt Injection Flavors

Direct prompt injection

Indirect prompt injection

Context poisoning / memory abuse

Tool invocation hijacking

Data exfiltration through model output

Example Detection Backlog

Tie Each Detection to Response

Closing Thoughts