Opening hook
AI security engineering is a discipline that did not exist as a defined role in 2020 and is now a hiring priority at frontier labs and at many large companies deploying AI. The work spans threat modeling AI systems, building defenses against adversarial attacks, securing agents that take actions in the world, and integrating AI safety practices into product engineering. This lesson maps the landscape so that the rest of the course is anchored to a clear picture of the field.
Core teaching
The first principle: AI security engineering is a synthesis discipline. It draws from traditional application security, ML systems engineering, adversarial ML research, and AI safety. None of those parent disciplines is sufficient on its own. A traditional appsec engineer who has not engaged with adversarial ML cannot threat-model a model extraction attack. An ML researcher who has not done production security cannot ship an input validation pipeline. The role exists because the synthesis is rare and increasingly necessary.
The second principle: the landscape is shaped by frameworks and practitioner literature. The reference frameworks include the OWASP Top 10 for LLM Applications, the OWASP ML Security Top 10, MITRE ATLAS for the adversarial threat landscape facing AI systems, the NIST AI RMF for risk management, and the responsible scaling policies published by frontier labs (OWASP, ongoing; MITRE ATLAS, ongoing; NIST AI RMF, 2023; Anthropic, Responsible Scaling Policy, 2023). The practitioner literature includes Simon Willison's prompt injection writing, Riley Goodside's adversarial prompt research, Apollo Research's deception studies, and the published red-team work from frontier labs. The field is young enough that staying current with primary sources is both feasible and necessary.
The third principle: there are five major categories of AI security work. First, threat modeling AI systems: applying STRIDE and similar frameworks to AI architectures, identifying failure modes, documenting trust boundaries. Second, building defenses against attacks: prompt injection mitigation, jailbreak resistance, adversarial example defenses, model extraction protection. Third, securing AI infrastructure: input validation pipelines, output validation, content filtering, audit logging, sandboxing for code-executing agents. Fourth, red teaming: structured adversarial testing of AI systems. Fifth, AI safety mechanisms: refusal training validation, capability evaluations, alignment auditing.
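To make the threat-modeling category concrete, here is a minimal sketch assuming a hypothetical retrieval-augmented assistant that reads web pages and can call a tool. The component names, trust-zone labels, and review prompts are illustrative assumptions, not a standard taxonomy. The sketch lists data flows, keeps only those that cross a trust boundary, and pairs each crossing with every STRIDE category so a reviewer walks through all of them.

```python
from dataclasses import dataclass
from enum import Enum


class Stride(Enum):
    SPOOFING = "Spoofing"
    TAMPERING = "Tampering"
    REPUDIATION = "Repudiation"
    INFO_DISCLOSURE = "Information disclosure"
    DENIAL_OF_SERVICE = "Denial of service"
    ELEVATION = "Elevation of privilege"


@dataclass(frozen=True)
class Flow:
    source: str
    dest: str
    source_zone: str  # hypothetical trust-zone label of the sender
    dest_zone: str    # hypothetical trust-zone label of the receiver


# Hypothetical architecture. Zones: "untrusted", "app", "model", "tools".
FLOWS = [
    Flow("user", "api_gateway", "untrusted", "app"),
    Flow("web_page", "retriever", "untrusted", "app"),
    Flow("api_gateway", "retriever", "app", "app"),
    Flow("retriever", "llm", "app", "model"),
    Flow("llm", "tool_executor", "model", "tools"),
]

# Illustrative AI-specific review questions, one per STRIDE category.
PROMPTS = {
    Stride.SPOOFING: "Can the sender pose as a trusted principal, e.g. text imitating system instructions?",
    Stride.TAMPERING: "Can data on this flow be altered to steer the model, e.g. injected instructions or poisoned documents?",
    Stride.REPUDIATION: "Is this flow logged well enough to reconstruct who caused a given model action?",
    Stride.INFO_DISCLOSURE: "Can the receiver leak secrets, system prompts, or other users' data back across the boundary?",
    Stride.DENIAL_OF_SERVICE: "Can the sender exhaust tokens, context window, or compute budget?",
    Stride.ELEVATION: "Can content on this flow trigger actions the sender is not authorized to perform, e.g. tool calls?",
}


def crossings(flows):
    """Return only the flows whose endpoints sit in different trust zones."""
    return [f for f in flows if f.source_zone != f.dest_zone]


def review_checklist(flows):
    """Pair every boundary-crossing flow with each STRIDE category."""
    return [(f, cat, PROMPTS[cat]) for f in crossings(flows) for cat in Stride]


if __name__ == "__main__":
    for flow, cat, question in review_checklist(FLOWS):
        print(f"{flow.source} -> {flow.dest} [{cat.value}] {question}")
```

The design choice is deliberate: same-zone flows (here, gateway to retriever) are skipped so that review effort concentrates on trust boundaries, which is where prompt injection and tool misuse enter.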
The fourth principle: the role differs between frontier labs and enterprises. Frontier labs (Anthropic, OpenAI, Google DeepMind, and others) have large security teams that work directly on the models and the safety infrastructure around them. The work involves reading research, contributing to safety papers, building red-team automation, and building the security stack that ships with the model. Enterprises (large companies deploying AI) have smaller AI security teams that focus on application-level defenses, vendor risk assessment, AI usage policy, and detection of AI-related threats. Both are real roles, but the skill mixes and day-to-day work differ. Lesson 1.3 covers the contrast in detail.
The fifth principle: AI security is engineering responsibility, not just policy. Many organizations treat AI security as a governance and compliance topic with a small policy team. This is insufficient. Real defenses against prompt injection, jailbreaks, model extraction, and adversarial examples are engineering systems: input validation pipelines, monitoring infrastructure, evaluation suites, runtime defenses. The team that ships these is an engineering team, often security engineering specifically. Policy without engineering is theater.
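As a minimal sketch of what "defenses are engineering systems" means in code, the pipeline below wraps a model call with an input check, an output check, and a structured audit record. It assumes a hypothetical `call_model` callable standing in for whatever model client an application uses; the regex patterns, length limit, and key format are placeholder assumptions. Pattern matching alone is a weak defense against prompt injection, so treat this as the skeleton that classifiers, privilege separation, and monitoring would plug into, not as a working defense.

```python
import json
import re
import time
from dataclasses import dataclass, field
from typing import Callable, List

# Naive placeholder heuristics; real deployments need much more than regexes.
SUSPICIOUS_INPUT = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your )?system prompt", re.IGNORECASE),
]
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{20,}")  # hypothetical API-key shape
MAX_INPUT_CHARS = 8000  # hypothetical limit


@dataclass
class Verdict:
    allowed: bool
    reasons: List[str] = field(default_factory=list)


def check_input(text: str) -> Verdict:
    """Flag over-long inputs and inputs matching known injection phrasings."""
    reasons = []
    if len(text) > MAX_INPUT_CHARS:
        reasons.append("input_too_long")
    reasons += [f"pattern:{p.pattern}" for p in SUSPICIOUS_INPUT if p.search(text)]
    return Verdict(not reasons, reasons)


def check_output(text: str) -> Verdict:
    """Flag outputs that look like they contain a leaked credential."""
    reasons = ["secret_like_token"] if SECRET_PATTERN.search(text) else []
    return Verdict(not reasons, reasons)


def guarded_call(user_input: str, call_model: Callable[[str], str], audit_log: list) -> str:
    """Run input check, model call, and output check; append one JSON audit line."""
    record = {"ts": time.time(), "input_len": len(user_input)}
    verdict = check_input(user_input)
    record["input_reasons"] = verdict.reasons
    if not verdict.allowed:
        record["outcome"] = "blocked_input"
        audit_log.append(json.dumps(record))
        return "Request blocked by input policy."
    output = call_model(user_input)
    verdict = check_output(output)
    record["output_reasons"] = verdict.reasons
    record["outcome"] = "allowed" if verdict.allowed else "blocked_output"
    audit_log.append(json.dumps(record))
    return output if verdict.allowed else "Response withheld by output policy."
```

Every call, allowed or blocked, produces an audit line. That record is what monitoring and incident response later depend on, which is why logging sits inside the pipeline rather than beside it.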
The sixth principle: the threat landscape is evolving. New attack patterns emerge regularly. In 2024-2025 the field saw indirect prompt injection mature as a threat against AI assistants reading external content, agent misuse patterns develop as code-executing agents proliferated, and jailbreak techniques against frontier models continue evolving despite safety training. The AI security engineer in 2026 is reading new findings monthly and updating defenses accordingly. Static security postures fail.
The seventh principle: red teaming is foundational. The strongest AI security teams have a continuous red-teaming function that probes deployed AI systems for failure modes. The probes are documented, mitigations are deployed, and the test suite grows. Apollo Research, Anthropic's Frontier Red Team, OpenAI's red-teaming program, and several specialized vendors (HiddenLayer, Robust Intelligence, Lakera) have published meaningful work in this space. The discipline of structured red teaming for AI systems is one of the highest-impact capabilities the field has produced.
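The loop described above (document probes, deploy mitigations, grow the test suite) can be sketched as a small regression harness. This is a minimal sketch under stated assumptions: `target` is a hypothetical callable wrapping the deployed system, the probe ID scheme is invented for illustration, and the keyword-based `looks_like_refusal` judge is a crude stand-in for the human review or model-graded evaluation that real red-teaming programs use.

```python
from dataclasses import dataclass
from typing import Callable, List

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "not able to help")


def looks_like_refusal(response: str) -> bool:
    """Crude placeholder judge; real programs use human or model grading."""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


@dataclass
class Probe:
    probe_id: str         # hypothetical identifier scheme, e.g. "RT-001"
    technique: str        # e.g. "role-play jailbreak", "indirect injection"
    prompt: str
    expect_refusal: bool  # True if a safe system should decline


@dataclass
class Result:
    probe: Probe
    response: str
    passed: bool


def run_suite(probes: List[Probe], target: Callable[[str], str]) -> List[Result]:
    """Send every probe to the target and record whether behavior matched expectation."""
    results = []
    for probe in probes:
        response = target(probe.prompt)
        passed = looks_like_refusal(response) == probe.expect_refusal
        results.append(Result(probe, response, passed))
    return results


def add_probe(suite: List[Probe], probe: Probe) -> None:
    """Each new finding becomes a permanent regression test; IDs must be unique."""
    if any(p.probe_id == probe.probe_id for p in suite):
        raise ValueError(f"duplicate probe id {probe.probe_id}")
    suite.append(probe)
```

Including benign controls with `expect_refusal=False` matters: a mitigation that makes the system refuse everything would otherwise pass the suite while breaking the product.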
The eighth principle: AI safety and AI security overlap. Safety (the model behaves as intended without producing harmful outputs) and security (the system resists adversarial use) are distinct concepts, but the work overlaps in production. A model's tendency to decline harmful instructions, instilled by refusal training, is a safety property; an attacker circumventing it is a security failure. Constitutional AI principles inform both. The mature AI security engineer has working knowledge of safety techniques and contributes to engineering at the safety-security boundary.
The ninth principle: the regulatory surface is real. The EU AI Act applies obligations to high-risk AI systems. The NIST AI RMF provides a voluntary but increasingly expected framework for AI risk management (NIST AI RMF, 2023). Sectoral regulations in healthcare, finance, and employment add specific obligations. The AI security engineer is not the lead on regulatory work but is a key contributor: the technical implementations that make compliance demonstrable are engineering deliverables. Course 6 covers the governance side in depth.
The tenth principle: the career landscape is favorable for the people doing the work. Demand outstrips supply. Compensation at frontier labs has been competitive with senior software engineering, with significant equity. Enterprise AI security engineering compensation has been at typical security engineering levels, with upward pressure. These figures change quickly, so verify current numbers against primary sources. The field is hiring hard.
AI-specific application
For the security engineer transitioning into AI security in 2026, three operational realities matter.
Reality one: technical depth is required. Candidates with only hand-waving knowledge of "AI security", without a working understanding of how models behave, what tokenization is, why prompts produce specific outputs, and how attacks work mechanically, do not get hired at frontier labs and increasingly do not get hired at enterprises either. Course 2 (AI Engineering Mastery) builds that depth from first principles. Engineers transitioning into AI security should build this foundation in parallel with the security-specific material.
Reality two: published work matters. Practitioners with public writing, conference talks, or open-source contributions in AI security are better positioned than those without. The field rewards visible work because it is small and reputation-dense. Building public artifacts (a blog post analyzing a published attack, a reproduction of a defense technique, contributions to the OWASP Top 10 for LLM Applications) compounds over time.
Reality three: read the primary sources. The OWASP LLM Top 10 document, the MITRE ATLAS framework, the responsible scaling policies from frontier labs, and the published research from Anthropic, OpenAI, and Google DeepMind safety teams are the canon. AI security engineers read this material directly, not summaries. The investment of reading the source material once pays off across many engagements.
Practice exercises
Read the OWASP Top 10 for LLM Applications (2025), all ten entries. Take notes on which you have seen in practice and which are unfamiliar. Identify three you want to study in more depth.
Map MITRE ATLAS to one AI system you know. Pick a public AI product (ChatGPT, Claude, Cursor, GitHub Copilot, or your own product). Walk through the ATLAS tactics and identify which apply, with a one-sentence justification per applicable tactic.
Identify a primary source you should read. Pick one published paper or report in AI security from a frontier lab or recognized practitioner (Anthropic, OpenAI Safety, Google DeepMind, Apollo Research, or Simon Willison). Read it. Write a 200-word summary in your own words.
Knowledge check
Question 1. Why is AI security engineering a synthesis discipline? a) It is not b) Because it draws from traditional appsec, ML systems engineering, adversarial ML research, and AI safety, with no parent discipline sufficient on its own [correct] c) Because regulators require synthesis d) Because vendors require synthesis
Question 2. What are the major reference frameworks for AI security? a) None exist b) OWASP LLM Top 10, OWASP ML Security Top 10, MITRE ATLAS, NIST AI RMF, and the responsible scaling policies from frontier labs [correct] c) Only OWASP d) Only NIST
Question 3. What are the five major categories of AI security work? a) Compliance, audit, training, reporting, vendor management b) Threat modeling, building defenses, securing infrastructure, red teaming, AI safety mechanisms [correct] c) Hardware, software, network, cloud, endpoint d) Identify, protect, detect, respond, recover
Question 4. What is the difference between AI security at a frontier lab and at an enterprise? a) No difference b) Frontier labs work directly on models and the safety stack; enterprises focus on application-level defenses, vendor risk, usage policy, and AI threat detection [correct] c) Frontier labs do less work d) Enterprises do more research
Question 5. Why is "AI security is engineering responsibility, not just policy" the correct framing? a) It is not b) Because real defenses against prompt injection, jailbreaks, and adversarial attacks are engineering systems requiring input validation, monitoring, evaluation, and runtime defenses, not just written policies [correct] c) Because regulators say so d) Because policy is unimportant
Question 6. Why is red teaming foundational to AI security? a) Tradition b) Because structured adversarial probing surfaces failure modes that safety training and static design reviews miss, and the discipline is one of the highest-impact capabilities the field has produced [correct] c) Vendors require it d) Auditors require it
Question 7. Why does primary source reading matter in this field? a) Vanity b) Because the field is young, the canon is finite, and the practitioners doing the most influential work read the sources directly rather than relying on summaries [correct] c) Because secondary sources are illegal d) Because regulators require it
Slide deck outline
- Title slide: "Lesson 1.1: The AI security engineering landscape"
- Hook: a discipline that did not exist five years ago
- AI security engineering as a synthesis discipline
- The parent disciplines: appsec, ML systems, adversarial ML, AI safety
- Reference frameworks overview
- OWASP LLM Top 10 (2025)
- OWASP ML Security Top 10
- MITRE ATLAS
- NIST AI RMF
- Frontier lab responsible scaling policies
- The five categories of AI security work
- Threat modeling AI systems
- Building defenses against attacks
- Securing AI infrastructure
- Red teaming
- AI safety mechanisms
- Frontier lab vs enterprise role contrast
- AI security as engineering responsibility
- The evolving threat landscape
- Career landscape and compensation
- Citations: OWASP, MITRE, NIST, Anthropic
- Practice exercises summary
- Transition to Lesson 1.2
Reference reading
- OWASP Top 10 for LLM Applications (2025): https://owasp.org/www-project-top-10-for-large-language-model-applications/
- OWASP ML Security Top 10: https://owasp.org/www-project-machine-learning-security-top-10/
- MITRE ATLAS: https://atlas.mitre.org/
- NIST AI Risk Management Framework: https://www.nist.gov/itl/ai-risk-management-framework
- Anthropic Responsible Scaling Policy: https://www.anthropic.com/news/anthropics-responsible-scaling-policy
- Simon Willison on prompt injection: https://simonwillison.net/tags/prompt-injection/
Transition
You now have a map of the field. The next lesson zooms in on what is specifically different about securing AI systems compared with traditional software. Lesson 1.2 covers the unique properties of AI systems that produce new failure modes.