You are an AI cybersecurity engineer at Example Defense Tech. Your image classifier reviews photos uploaded to the platform for safety violations. A red-team report shows that adversarially perturbed images (epsilon-bounded L-infinity perturbations) bypass the classifier with a 78 percent attack success rate.
Design the defense stack and the evaluation gate that catches regressions. The model team has access to adversarial training, randomized smoothing, and input preprocessing.
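One way to picture the evaluation gate is a check that recomputes attack success rate on a fixed adversarial test set for every candidate model and fails the release if robustness or clean accuracy slips. The sketch below is illustrative only: the function names, the thresholds (max_asr, min_clean_acc, tolerance), and the per-example boolean inputs are assumptions, not part of the scenario or its canonical solution.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class GateResult:
    attack_success_rate: float
    clean_accuracy: float
    passed: bool


def attack_success_rate(clean_correct: Sequence[bool], adv_correct: Sequence[bool]) -> float:
    """Fraction of correctly classified clean inputs that the attack flips."""
    attackable = [adv for clean, adv in zip(clean_correct, adv_correct) if clean]
    if not attackable:
        raise ValueError("no correctly classified clean examples to attack")
    return sum(1 for adv in attackable if not adv) / len(attackable)


def evaluation_gate(
    clean_correct: Sequence[bool],
    adv_correct: Sequence[bool],
    baseline_asr: float,
    max_asr: float = 0.20,        # hypothetical absolute ceiling
    min_clean_acc: float = 0.90,  # hypothetical clean-accuracy floor
    tolerance: float = 0.02,      # hypothetical allowed drift over the baseline
) -> GateResult:
    """Pass only if robustness and clean accuracy both hold up."""
    if len(clean_correct) != len(adv_correct) or not clean_correct:
        raise ValueError("inputs must be non-empty and the same length")
    asr = attack_success_rate(clean_correct, adv_correct)
    clean_acc = sum(clean_correct) / len(clean_correct)
    passed = (
        asr <= max_asr
        and asr <= baseline_asr + tolerance  # regression check against the last release
        and clean_acc >= min_clean_acc
    )
    return GateResult(asr, clean_acc, passed)
```

Under these assumptions, a model whose attack success rate is still near the red-team figure fails the absolute ceiling, and a model that is robust in absolute terms still fails if it is noticeably worse than the previously released baseline.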
This scenario tests MITRE ATLAS AML.T0015 (Evade ML Model), the fast gradient sign method (FGSM) baseline from Goodfellow et al. 2014, 'Explaining and Harnessing Adversarial Examples', and a practical defense stack. Sources: Goodfellow et al. 2014; Madry et al. 2018, 'Towards Deep Learning Models Resistant to Adversarial Attacks'; MITRE ATLAS; Carlini & Wagner 2017.
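For orientation, the single-step attack from Goodfellow et al. moves each input coordinate by epsilon in the direction of the sign of the loss gradient, which keeps the perturbation inside an L-infinity ball of radius epsilon. The sketch below applies that step to a toy logistic-regression model in pure Python; the model, weights, and helper names are assumptions chosen for illustration and stand in for the platform's real classifier.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability of the positive class under a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm(w: Sequence[float], b: float, x: Sequence[float], y: int, eps: float) -> List[float]:
    """One FGSM step: x + eps * sign(d loss / d x), clipped to the pixel range [0, 1]."""
    p = predict_proba(w, b, x)
    # For binary cross-entropy on a logistic model, d loss / d x_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]


# Toy example (hypothetical weights and input).
w, b = [2.0, -2.0], 0.0
x, y = [0.6, 0.4], 1
x_adv = fgsm(w, b, x, y, eps=0.2)
```

In this toy case the clean input is classified as positive, while the perturbed input falls on the other side of the decision boundary even though no coordinate moved by more than epsilon.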
One ordered pass through every step. No clock. Each answer is scored against the canonical solution.
Hints reduce the points you can earn for that step. Free-text steps queue for manual review.