Adversarial Examples: Image Classifier Under Attack

Cybersecurity for AI · 6 steps

Briefing

You are an AI cybersecurity engineer at Example Defense Tech. Your image classifier reviews photos uploaded to the platform for safety violations. A red-team report shows that adversarially perturbed images (epsilon-bounded L-infinity perturbations) bypass the classifier with a 78 percent attack success rate.

Design the defense stack and the evaluation gate that catches regressions. The model team has access to adversarial training, randomized smoothing, and input preprocessing.

This scenario tests your grasp of MITRE ATLAS technique AML.T0015 (Evade ML Model), the baseline attack from Goodfellow et al. 2014, 'Explaining and Harnessing Adversarial Examples', and the practical defense stack. Sources: Goodfellow et al. 2014; Madry et al. 2018, 'Towards Deep Learning Models Resistant to Adversarial Attacks'; MITRE ATLAS; Carlini & Wagner 2017.
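
To make "epsilon-bounded L-infinity perturbation" concrete, here is a minimal sketch of a single fast gradient sign method (FGSM) step in the spirit of Goodfellow et al. 2014. It assumes a toy logistic-regression classifier rather than the platform's real image model. The names `predict` and `fgsm_linear`, the weights, and the epsilon value are all hypothetical and chosen only for illustration.

```python
import math


def predict(x, w, b):
    """Probability of the positive class for a toy logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def fgsm_linear(x, y, w, b, eps):
    """One FGSM step: move every feature by eps in the loss-increasing direction.

    For cross-entropy loss on this model the input gradient is (p - y) * w,
    so only the sign of each gradient component is needed.
    """
    p = predict(x, w, b)
    grad = [(p - y) * wi for wi in w]
    signs = [(g > 0) - (g < 0) for g in grad]
    # Clip to [0, 1] so the result is still a valid pixel vector.
    return [min(1.0, max(0.0, xi + eps * s)) for xi, s in zip(x, signs)]


# Hypothetical three-pixel "image" that the model labels as positive (y = 1).
x = [0.6, 0.4, 0.7]
w, b = [2.0, -1.0, 1.5], -1.0
x_adv = fgsm_linear(x, y=1, w=w, b=b, eps=0.3)
print(predict(x, w, b), predict(x_adv, w, b))
```

No feature moves by more than `eps`, which is what the L-infinity bound means. PGD, as described by Madry et al. 2018, repeats smaller steps of this kind and projects back into the epsilon ball after each one.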

How Crucible mode works

One ordered pass through every step. No clock. Each answer is scored against the canonical solution.

Hints reduce the points you can earn for that step. Free-text steps queue for manual review.

What you will practice

01 Distinguish FGSM, PGD, and C&W attacks
02 Apply adversarial training, randomized smoothing, and input preprocessing
03 Set evaluation thresholds that include adversarial robustness
04 Frame the defense as raising attacker cost, not eliminating risk
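
As a rough illustration of an evaluation gate that includes adversarial robustness, the sketch below blocks a release when clean accuracy drops, when attack success rate exceeds a ceiling, or when it regresses against the previous baseline. The `EvalResult` and `gate` names and every threshold value are assumptions made for this example. They are not the scenario's canonical solution.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    clean_accuracy: float       # accuracy on unperturbed images
    attack_success_rate: float  # fraction of adversarial images that evade


def gate(result, baseline, min_clean_acc=0.95, max_asr=0.30,
         max_asr_regression=0.02):
    """Return a list of failure reasons; an empty list means the gate passes."""
    failures = []
    if result.clean_accuracy < min_clean_acc:
        failures.append("clean accuracy below floor")
    if result.attack_success_rate > max_asr:
        failures.append("attack success rate above ceiling")
    if result.attack_success_rate - baseline.attack_success_rate > max_asr_regression:
        failures.append("attack success rate regressed against baseline")
    return failures


# The 78 percent figure comes from the red-team report in the briefing;
# the clean accuracy value is hypothetical.
undefended = EvalResult(clean_accuracy=0.96, attack_success_rate=0.78)
print(gate(undefended, baseline=undefended))
```

The ceiling is set above zero on purpose: the gate measures whether attacks have become harder, not whether they have become impossible.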
