Free sample lesson: Lesson 1.2, What's different about securing AI systems

Opening hook

A lot of AI security looks like traditional security. Authentication, authorization, network segmentation, encryption at rest and in transit, audit logging, secrets management. All of it still applies. What is genuinely different is a smaller set of properties unique to AI systems that produce failure modes traditional security was not designed to address. This lesson is about those differences specifically. The traditional security work is assumed; the new work is the focus.

Core teaching

The first principle: AI systems do not have a clean instruction-data boundary. In traditional software, code is code and data is data, and the runtime knows the difference. In an AI system, instructions and data are both natural language tokens flowing through the same context window. A prompt injection attack works because the model cannot reliably distinguish "this is a system instruction the developer wrote" from "this is text in a document the user uploaded that asks me to ignore previous instructions." Simon Willison has written extensively about why this is fundamentally hard (Willison, on prompt injection, ongoing). This single property is the source of the largest class of AI-specific vulnerabilities.

The second principle: model outputs are non-deterministic. Same input, different output. This breaks the traditional security verification model. A traditional vulnerability scan finds an issue, you patch, you run the scan again, you confirm the fix. An AI-system patch (changed system prompt, deployed input filter, swapped to a fine-tuned model) requires statistical evaluation: does the fix reduce the failure rate, and does it introduce regressions elsewhere? Module 7 of Course 2 covers eval-driven development. The same discipline applies to security work.

The third principle: training data is part of the attack surface. Traditional software has a code attack surface (vulnerabilities in code) and a data attack surface (data injection, SQL injection). AI systems add a training data attack surface. Data poisoning, where malicious examples in training data alter model behavior, is a real attack class. Backdoor attacks, where the model behaves normally except on a specific trigger, have been demonstrated in research. The supply chain for training data (where it came from, how it was filtered) is a security concern that traditional software does not have in the same way (MITRE ATLAS, ongoing).

The fourth principle: the model is itself an attack surface. Adversarial examples, demonstrated by Goodfellow et al. in 2014 for image classifiers, generalize to language models in different forms (Goodfellow et al., Explaining and Harnessing Adversarial Examples, 2014). Adversarial inputs to models can produce specific malicious outputs while looking benign. Model extraction attacks attempt to reconstruct a model's behavior or weights through query patterns. Membership inference attacks attempt to determine whether specific data was in the training set. Each of these is a class of attack that does not have a direct analog in traditional software.

The fifth principle: agents change the threat model. An LLM that only generates text has a contained blast radius: bad outputs are bad text. An agent that can call APIs, execute code, browse the web, and modify systems has a blast radius that scales with the tools it has access to. The mature AI security engineer thinks about agents the way a traditional security engineer thinks about privileged automation: capability boundaries, audit logging, kill switches, monitoring. But the agent's "decision making" is a probabilistic model that can be steered through prompt injection or unexpected inputs. Module 9 of this course covers securing agents in depth.

The sixth principle: defense in depth has new layers. Traditional defense in depth includes network, host, application, identity, and data layers. AI systems add new layers: input validation specific to AI (filtering for injection patterns), output validation (checking model output for harmful content or instruction leakage), model behavior monitoring (detecting drift or jailbreak patterns), and content provenance (citation, grounding, watermarking). A mature AI application has all of these layers, with each one designed to catch the attacks the others miss. OWASP LLM Top 10 covers the practical implementation guidance for many of these layers (OWASP, 2025).

The seventh principle: the model vendor is part of your security posture. When you build on Anthropic Claude, OpenAI GPT-4, Google Gemini, or any hosted model, the vendor's security and safety work is part of your stack. Their refusal training catches some attacks before they reach you. Their content filters catch some outputs before they leave you. Their published vulnerability research informs your defenses. Vendor model updates change your application's security profile. This is unlike most other vendor relationships in security: the security-relevant changes happen continuously, not just at version bumps. Lesson 1.3 covers the frontier-lab-vs-enterprise role contrast, which is partly about whether you are building the safety stack or relying on it.

The eighth principle: hallucination is a security concern. A model that confidently generates wrong information is a vulnerability when that information drives decisions. A code-generation model that hallucinates a function signature can introduce bugs. A medical-information model that hallucinates a drug interaction can cause harm. A legal-research model that hallucinates a case citation can mislead. The AI security engineer treats hallucination as a class of failure to defend against, not just a quality issue. Grounding, citation, retrieval, and confidence calibration are mitigations.

The ninth principle: the supply chain extends to model weights. Traditional software supply chain security covers source code, dependencies, build systems, and artifacts. AI extends it to training data sources, fine-tuning datasets, and model weights themselves. A model weight artifact downloaded from a public model hub may contain a backdoor. The hub itself may have been compromised. Verifying model artifacts (cryptographic signatures, reproducible training where feasible) is a practice the field is still developing.

The tenth principle: the threat actor is sometimes the user. In traditional software, the user is usually the one being protected. In AI systems, the user is sometimes the attacker, especially in consumer products: jailbreak attempts to get the model to produce harmful content, prompt injection attempts to extract system prompts or training data, abuse patterns that violate terms of service. The security model has to handle adversarial users without breaking the experience for legitimate ones. This is a familiar tension from anti-abuse work but takes specific forms in AI products.

AI-specific application

For the security engineer building defenses against AI-specific attacks in 2026, three operational priorities matter most.

Priority one: prompt injection defense as default. Any AI system that processes external content (documents, web pages, emails, user-provided text) has prompt injection exposure. The defense is layered: input filtering, separation of instructions and data through structural prompts, system prompt hardening, output validation. None of these alone is sufficient. The combination raises the cost of attack meaningfully. Module 3 of this course goes deep on prompt injection.

Priority two: agent capability boundaries. Any agent that takes actions has a capability boundary that defines what it can and cannot do. The boundary is enforced at the tool layer, not just the prompt layer. An agent that has access to read-only file operations cannot delete files even if a prompt convinces it to try. This is enforcement, not request. The traditional principle of least privilege applies; the implementation looks different in agent systems.

Priority three: red teaming as a continuous practice. Red team exercises are not a quarterly checkpoint. They are an ongoing function that probes deployed systems for failure modes, documents findings, and grows the test suite. Apollo Research, Anthropic Frontier Red Team, and the broader AI red-team community publish material that practitioners can study. Module 10 of this course covers AI red teaming as a discipline.

Practice exercises

Map an AI system to the OWASP LLM Top 10. Pick an AI system you work on or know well. For each of the ten entries, identify whether the system has exposure and what mitigations exist. Note gaps.
Trace a prompt injection through one application. Pick an application that processes external content. Sketch the data flow from external content to model context. Identify every point where injection could occur and what would catch it.
Inventory an agent's capability boundary. Pick an agent (Claude with computer use, GitHub Copilot, Cursor, or another). List the tools it has access to. For each, note the worst-case action it could take if prompt-injected. Identify which tools enforce boundaries vs which rely on prompt-level constraints.

Knowledge check

Question 1. What is the deepest reason prompt injection is hard to defend against? a) Models are slow b) AI systems do not have a clean instruction-data boundary; both flow through the context window as natural language [correct] c) Models are too small d) Prompts are too long
Question 2. Why is statistical evaluation required for AI security fixes? a) Tradition b) Because model outputs are non-deterministic, so fix verification requires measuring failure rate reduction and checking for regressions, not point-checking [correct] c) Because vendors require it d) Because regulators require it
Question 3. What is the AI-specific addition to the supply chain attack surface? a) None b) Training data sources, fine-tuning datasets, and model weights, including artifacts downloaded from public model hubs [correct] c) Hardware d) Cloud regions
Question 4. Why does the agent threat model differ from the standalone language model threat model? a) Agents are smaller b) Agents take actions through tools, so the blast radius scales with tool access, requiring capability boundaries enforced at the tool layer [correct] c) Agents are deterministic d) Agents do not use models
Question 5. What new layers does defense in depth gain in AI applications? a) None b) AI-specific input validation, output validation, model behavior monitoring, content provenance [correct] c) Hardware encryption d) Network segmentation
Question 6. Why is hallucination a security concern, not just a quality issue? a) It is not b) Confidently wrong outputs that drive decisions create real harms (bug introduction, medical misadvice, legal misdirection), so they are a failure class to defend against [correct] c) Hallucinations cost money d) Hallucinations confuse vendors
Question 7. Why is the user sometimes the threat actor in AI systems? a) Users are malicious b) Especially in consumer products, jailbreaks, prompt injection, and abuse patterns come from users; the security model has to handle adversarial users without breaking the experience for legitimate ones [correct] c) Users have access to source code d) Users own the model

Slide deck outline

Title slide: "Lesson 1.2, What's different about securing AI systems"
Hook: traditional security still applies; this lesson covers what is new
The instruction-data boundary problem (Willison)
Why prompt injection is fundamentally hard
Non-deterministic outputs and statistical evaluation
Training data as attack surface
Data poisoning and backdoor attacks (MITRE ATLAS)
Adversarial examples (Goodfellow et al., 2014)
Model extraction and membership inference
Agents change the threat model
Capability boundaries at the tool layer
New layers in defense in depth
The model vendor as part of the security stack
Vendor updates as continuous changes to security posture
Hallucination as a security concern
Supply chain for model weights and training data
Adversarial users in consumer products
The OWASP LLM Top 10 mapping exercise
Practical priorities: injection, boundaries, red teaming
Common AI security mistakes
Citations: OWASP, MITRE, Willison, Goodfellow
Practice exercises summary
Transition to Lesson 1.3

Reference reading

OWASP Top 10 for LLM Applications (2025): https://owasp.org/www-project-top-10-for-large-language-model-applications/
Willison, S., prompt injection writing: https://simonwillison.net/tags/prompt-injection/
Goodfellow, I., et al., Explaining and Harnessing Adversarial Examples, 2014: https://arxiv.org/abs/1412.6572
MITRE ATLAS: https://atlas.mitre.org/

Transition

The differences are clear. Where the differences play out depends heavily on whether you are working at a frontier lab or in an enterprise. Lesson 1.3 contrasts the two roles in detail.

Opening hook

Core teaching

AI-specific application

For the security engineer building defenses against AI-specific attacks in 2026, three operational priorities matter most.

Practice exercises

Map an AI system to the OWASP LLM Top 10. Pick an AI system you work on or know well. For each of the ten entries, identify whether the system has exposure and what mitigations exist. Note gaps.
Trace a prompt injection through one application. Pick an application that processes external content. Sketch the data flow from external content to model context. Identify every point where injection could occur and what would catch it.
Inventory an agent's capability boundary. Pick an agent (Claude with computer use, GitHub Copilot, Cursor, or another). List the tools it has access to. For each, note the worst-case action it could take if prompt-injected. Identify which tools enforce boundaries vs which rely on prompt-level constraints.

Knowledge check

Question 1. What is the deepest reason prompt injection is hard to defend against? a) Models are slow b) AI systems do not have a clean instruction-data boundary; both flow through the context window as natural language [correct] c) Models are too small d) Prompts are too long
Question 2. Why is statistical evaluation required for AI security fixes? a) Tradition b) Because model outputs are non-deterministic, so fix verification requires measuring failure rate reduction and checking for regressions, not point-checking [correct] c) Because vendors require it d) Because regulators require it
Question 3. What is the AI-specific addition to the supply chain attack surface? a) None b) Training data sources, fine-tuning datasets, and model weights, including artifacts downloaded from public model hubs [correct] c) Hardware d) Cloud regions
Question 4. Why does the agent threat model differ from the standalone language model threat model? a) Agents are smaller b) Agents take actions through tools, so the blast radius scales with tool access, requiring capability boundaries enforced at the tool layer [correct] c) Agents are deterministic d) Agents do not use models
Question 5. What new layers does defense in depth gain in AI applications? a) None b) AI-specific input validation, output validation, model behavior monitoring, content provenance [correct] c) Hardware encryption d) Network segmentation
Question 6. Why is hallucination a security concern, not just a quality issue? a) It is not b) Confidently wrong outputs that drive decisions create real harms (bug introduction, medical misadvice, legal misdirection), so they are a failure class to defend against [correct] c) Hallucinations cost money d) Hallucinations confuse vendors
Question 7. Why is the user sometimes the threat actor in AI systems? a) Users are malicious b) Especially in consumer products, jailbreaks, prompt injection, and abuse patterns come from users; the security model has to handle adversarial users without breaking the experience for legitimate ones [correct] c) Users have access to source code d) Users own the model

Slide deck outline

Title slide: "Lesson 1.2, What's different about securing AI systems"
Hook: traditional security still applies; this lesson covers what is new
The instruction-data boundary problem (Willison)
Why prompt injection is fundamentally hard
Non-deterministic outputs and statistical evaluation
Training data as attack surface
Data poisoning and backdoor attacks (MITRE ATLAS)
Adversarial examples (Goodfellow et al., 2014)
Model extraction and membership inference
Agents change the threat model
Capability boundaries at the tool layer
New layers in defense in depth
The model vendor as part of the security stack
Vendor updates as continuous changes to security posture
Hallucination as a security concern
Supply chain for model weights and training data
Adversarial users in consumer products
The OWASP LLM Top 10 mapping exercise
Practical priorities: injection, boundaries, red teaming
Common AI security mistakes
Citations: OWASP, MITRE, Willison, Goodfellow
Practice exercises summary
Transition to Lesson 1.3

Reference reading

OWASP Top 10 for LLM Applications (2025): https://owasp.org/www-project-top-10-for-large-language-model-applications/
Willison, S., prompt injection writing: https://simonwillison.net/tags/prompt-injection/
Goodfellow, I., et al., Explaining and Harnessing Adversarial Examples, 2014: https://arxiv.org/abs/1412.6572
MITRE ATLAS: https://atlas.mitre.org/

Transition

The differences are clear. Where the differences play out depends heavily on whether you are working at a frontier lab or in an enterprise. Lesson 1.3 contrasts the two roles in detail.

Lesson 1.2, What's different about securing AI systems

Opening hook

Core teaching

AI-specific application

Practice exercises

Knowledge check

Slide deck outline

Reference reading

Transition

That was one lesson. The course has 74.

Get cybersecurity career insights delivered weekly

Lesson 1.2, What's different about securing AI systems

Opening hook

Core teaching

AI-specific application

Practice exercises

Knowledge check

Slide deck outline

Reference reading

Transition

That was one lesson. The course has 74.

Get cybersecurity career insights delivered weekly