Range Scenario · crucible · 40 min
Training Data Poisoning: Detect and Mitigate
This cybersecurity training scenario simulates a working incident. A cybersecurity fine-tune corpus came from public scraped data plus a vendor feed. Detect poisoning attempts, design the mitigation, set the model-release gate.
Scenario briefing
You are an AI cybersecurity engineer at Example AI Lab. The team is fine-tuning an open-weight model on a 4 million-document cybersecurity corpus. 70 percent comes from public web scrape, 25 percent from a paid threat-intel feed, 5 percent from internal incident write-ups.
The vendor feed had a contract-renewal dispute last quarter and access was briefly given to a third-party data broker. You are asked to design the poisoning-detection pipeline before the next release.
This scenario tests OWASP LLM03:2025 Training Data Poisoning, the MITRE ATLAS technique AML.T0010 Poisoning Training Data, and practical detection methods. Sources: OWASP LLM Top 10 (2025), MITRE ATLAS, Carlini et al. 2024 'Poisoning Web-Scale Training Datasets is Practical'.
What you will practice
- Map training data poisoning to OWASP LLM03 and ATLAS AML.T0010
- Design provenance, deduplication, and adversarial-eval gates
- Choose a poisoning-detection method appropriate to corpus scale
- Set release-gate criteria that block poisoned models
How this scenario is scored
The scenario has 6 ordered steps. Most steps are exact-match (a MITRE ATT&CK technique ID, a tool name, or a yes/no decision) or multiple choice. Free-text steps queue for manual review and do not affect the auto-final-score in the MVP.
Each step has a max score of 100 points. Hints deduct points up front, listed before you reveal them. Your final score is the sum across steps. Range Elo updates on completion based on scenario difficulty (Advanced) and your final score percentage.
Frequently asked questions
What is a poisoning attack on training data?
Poisoning is the insertion of carefully chosen documents into the training set to bias model behavior at deployment. Attacks include backdoor triggers (specific phrases that cause the model to output attacker-chosen text), data inversion (planting false facts about the attacker), and refusal poisoning (making the model refuse legitimate queries about the attacker).
What does Carlini et al. 2024 demonstrate?
Their paper 'Poisoning Web-Scale Training Datasets is Practical' shows that an attacker with modest resources can purchase expired domains hosting documents in widely used training corpora and replace the content. The attack is cheap, scalable, and currently undetected by most pipelines. Defense requires content-hash provenance and adversarial sampling.
What does MITRE ATLAS AML.T0010 cover?
AML.T0010 in MITRE ATLAS is Poison Training Data, sitting in the Initial Access tactic for adversarial ML. It covers any modification of training data with adversarial intent. ATLAS pairs each technique with mitigations and case studies, creating an adversarial-ML analog to ATT&CK.
Course content is for educational purposes only and does not constitute professional advice. All claims are supported by cited peer-reviewed academic research. DecipherU does not teach or reproduce any proprietary sales methodology. Verify all referenced sources independently.
Get cybersecurity career insights delivered weekly
Join cybersecurity professionals receiving weekly intelligence on threats, job market trends, salary data, and career growth strategies.
By subscribing you agree to our privacy policy. Unsubscribe anytime.