You are an AI cybersecurity engineer at Example AI Lab. The team is fine-tuning an open-weight model on a 4-million-document cybersecurity corpus: 70 percent comes from a public web scrape, 25 percent from a paid threat-intelligence feed, and 5 percent from internal incident write-ups.
The vendor feed went through a contract-renewal dispute last quarter, during which access was briefly granted to a third-party data broker. You are asked to design the poisoning-detection pipeline before the next release.
This scenario tests OWASP LLM04:2025 Data and Model Poisoning, the MITRE ATLAS technique AML.T0020 Poison Training Data, and practical detection methods. Sources: OWASP Top 10 for LLM Applications (2025), MITRE ATLAS, and Carlini et al., 'Poisoning Web-Scale Training Datasets is Practical' (IEEE S&P 2024).
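Before answering, it may help to have a concrete picture of what one detection stage could look like. The sketch below is a hypothetical first-pass check, not the canonical solution: it flags n-grams that are heavily over-represented in the suspect vendor feed relative to the rest of the corpus, a crude way to surface repeated trigger phrases injected by a poisoner. The function name, thresholds, and sample strings are all illustrative assumptions.

```python
import re
from collections import Counter

def ngrams(text, n=4):
    # Lowercase, keep alphanumeric tokens, emit sliding n-grams.
    toks = re.findall(r"[a-z0-9]+", text.lower())
    return zip(*(toks[i:] for i in range(n)))

def suspicious_ngrams(feed_docs, baseline_docs, n=4, min_count=3, ratio=10.0):
    """Flag n-grams over-represented in the suspect feed versus the
    baseline corpus -- a crude trigger-phrase scan (illustrative only)."""
    feed = Counter(g for d in feed_docs for g in ngrams(d, n))
    base = Counter(g for d in baseline_docs for g in ngrams(d, n))
    flagged = []
    for gram, count in feed.items():
        # +1 smoothing so unseen baseline n-grams don't divide by zero.
        if count >= min_count and count / (base.get(gram, 0) + 1) >= ratio:
            flagged.append((" ".join(gram), count))
    return sorted(flagged, key=lambda item: -item[1])
```

A real pipeline would layer further stages on top (near-duplicate hashing, per-source embedding outlier detection, provenance checks on the broker-access window), but a frequency-ratio scan like this is cheap enough to run over all 4 million documents first.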
You get one ordered pass through every step, with no time limit. Each answer is scored against the canonical solution.
Taking a hint reduces the points you can earn for that step. Free-text answers are queued for manual review.