You are an AI cybersecurity engineer at Example AI Lab. The team is fine-tuning an open-weight model on a 4-million-document cybersecurity corpus: 70 percent comes from a public web scrape, 25 percent from a paid threat-intelligence feed, and 5 percent from internal incident write-ups.
The vendor feed went through a contract-renewal dispute last quarter, during which access was briefly granted to a third-party data broker. You are asked to design the poisoning-detection pipeline before the next release.
This scenario tests OWASP LLM04:2025 Data and Model Poisoning, the MITRE ATLAS technique AML.T0020 Poison Training Data, and practical detection methods. Sources: OWASP Top 10 for LLM Applications (2025), MITRE ATLAS, and Carlini et al., 'Poisoning Web-Scale Training Datasets is Practical' (IEEE S&P 2024).
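Before answering, it may help to have a concrete picture of what one detection stage could look like. The sketch below is a hypothetical first-pass check, not the canonical solution: it flags n-grams that are heavily over-represented in the suspect vendor feed relative to the rest of the corpus, a crude way to surface repeated trigger phrases injected by a poisoner. The function name, thresholds, and sample strings are all illustrative assumptions.

```python
import re
from collections import Counter

def ngrams(text, n=4):
    # Lowercase, keep alphanumeric tokens, emit sliding n-grams.
    toks = re.findall(r"[a-z0-9]+", text.lower())
    return zip(*(toks[i:] for i in range(n)))

def suspicious_ngrams(feed_docs, baseline_docs, n=4, min_count=3, ratio=10.0):
    """Flag n-grams over-represented in the suspect feed versus the
    baseline corpus -- a crude trigger-phrase scan (illustrative only)."""
    feed = Counter(g for d in feed_docs for g in ngrams(d, n))
    base = Counter(g for d in baseline_docs for g in ngrams(d, n))
    flagged = []
    for gram, count in feed.items():
        # +1 smoothing so unseen baseline n-grams don't divide by zero.
        if count >= min_count and count / (base.get(gram, 0) + 1) >= ratio:
            flagged.append((" ".join(gram), count))
    return sorted(flagged, key=lambda item: -item[1])
```

A real pipeline would layer further stages on top (near-duplicate hashing, per-source embedding outlier detection, provenance checks on the broker-access window), but a frequency-ratio scan like this is cheap enough to run over all 4 million documents first.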
You get one ordered pass through every step, with no time limit. Each answer is scored against the canonical solution.
Taking a hint reduces the points you can earn for that step. Free-text answers are queued for manual review.