Range Scenario · crucible · 30 min
Prompt Injection Detection: Classify Eight Inputs
This cybersecurity training scenario simulates a working incident. Eight user inputs hit your customer-service AI in the last hour. Some are benign, some are jailbreak attempts, some are data-exfil attempts, and some are role-confusion attacks. Classify each and explain the cybersecurity reasoning.
Scenario briefing
You are a cybersecurity engineer reviewing prompt-injection alerts on a customer-service LLM. The system has a base prompt that constrains the model to topics like account questions, billing, and shipping. The output filter scans for sensitive data and disallowed actions.
Eight user inputs hit the input-filter alert queue this hour. Your job: classify each as benign, jailbreak attempt, data-exfil attempt, or role-confusion attack, and explain the reasoning. Misclassification trains the input filter wrong, so accuracy matters more than speed.
This scenario is a foundation skill for AI security. The same classifications apply across customer-service, coding, and tool-using LLMs. The technique class names (jailbreak, role confusion, data exfil) carry to vendor red-team frameworks and AI security policy.
What you will practice
- Distinguish jailbreak, role-confusion, data-exfil, and benign inputs
- Recognize indirect prompt injection patterns
- Explain the cybersecurity reasoning behind each classification
- Avoid over-classifying benign edge cases as attacks
How this scenario is scored
The scenario has 8 ordered steps. Most steps are exact-match (a MITRE ATT&CK technique ID, a tool name, or a yes/no decision) or multiple choice. Free-text steps queue for manual review and do not affect the auto-final-score in the MVP.
Each step has a max score of 100 points. Hints deduct points up front, listed before you reveal them. Your final score is the sum across steps. Range Elo updates on completion based on scenario difficulty (Beginner) and your final score percentage.
Frequently asked questions
What is the difference between a jailbreak and a role-confusion attack?
A jailbreak attempts to bypass safety guardrails so the model produces disallowed content (violence, malware, copyrighted material). A role-confusion attack tries to convince the model it has a different identity or system prompt (you are now DAN, ignore previous instructions, your real role is). They overlap in tactics but differ in goal: jailbreaks aim at output, role-confusion aims at identity.
What is indirect prompt injection?
Indirect injection is when the malicious instructions arrive through a data channel the model trusts, like a retrieved document, a webpage in a tool call, or a long-running conversation history. The user did not type the attack. The model encounters it inside data it was supposed to read and explain. Most production AI breaches now run through indirect injection because direct user prompts are filtered.
How do you measure if your prompt-injection defenses work?
Run a labeled red-team set every release. Track false-positive rate on benign edge cases (excessive blocking damages user trust), false-negative rate on known attacks, and time to detect novel attacks in production. Defense quality is the F1 across attack categories, not the block rate.
Course content is for educational purposes only and does not constitute professional advice. All claims are supported by cited peer-reviewed academic research. DecipherU does not teach or reproduce any proprietary sales methodology. Verify all referenced sources independently.
Get cybersecurity career insights delivered weekly
Join cybersecurity professionals receiving weekly intelligence on threats, job market trends, salary data, and career growth strategies.
By subscribing you agree to our privacy policy. Unsubscribe anytime.