Range Scenario · crucible · 35 min
Model Theft: Extraction Attack via Public API
This cybersecurity training scenario simulates a live incident. A competitor is querying your cybersecurity classifier API at scale, paying the listed per-classification price, and using the responses to reconstruct a shadow model. Detect the attack, then design the rate and pricing controls.
Scenario briefing
You are a cybersecurity engineer at Example AI Co. Your proprietary cybersecurity classifier (trained on years of in-house labeled data) is exposed via public API. The pricing is $1 per 1,000 classifications.
An anonymous customer has been calling the API at 60,000 classifications per day for 14 days, paying about $840 in API fees, against a model that cost millions to train. Their query distribution looks suspiciously like model-extraction probing: synthetic inputs designed to cover the input space rather than real-world data.
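The fee figure follows directly from the numbers in the briefing. A short Python check of the arithmetic, using only the price and volumes stated above (the variable names are illustrative):

```python
# Figures taken from the scenario briefing.
PRICE_PER_1000 = 1.00     # USD per 1,000 classifications
QUERIES_PER_DAY = 60_000
DAYS = 14

total_queries = QUERIES_PER_DAY * DAYS
api_spend = total_queries / 1000 * PRICE_PER_1000

print(f"total queries: {total_queries:,}")   # total queries: 840,000
print(f"API spend: ${api_spend:,.2f}")       # API spend: $840.00
```

At flat per-call pricing, the attacker collects 840,000 labeled examples for the cost of a modest monthly subscription, which is why pricing is treated here as a security control.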
This scenario tests OWASP LLM10:2025 Unbounded Consumption (the 2025 entry that covers model extraction; the 2023 list named LLM10 Model Theft), MITRE ATLAS technique AML.T0024 Exfiltration via ML Inference API, and the design of detection and defense. Sources: OWASP Top 10 for LLM Applications (2025), MITRE ATLAS, Tramèr et al. 2016 'Stealing Machine Learning Models via Prediction APIs'.
What you will practice
- Map model extraction to OWASP LLM10 and ATLAS AML.T0024
- Detect extraction-pattern queries against benign customer queries
- Apply rate limiting, output noise, and watermarking
- Decide between economic deterrence and active blocking
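One way to approach the detection objective is to compare each customer's traffic against a benign baseline. The sketch below is a minimal illustration, not the scenario's reference solution: it assumes you can bucket each query (here by predicted class, with hypothetical class names) and flags a client whose bucket distribution diverges from the baseline by more than a hypothetical threshold. Input-space-covering probes tend to look far more uniform than real-world traffic.

```python
import math
from collections import Counter

BINS = ["phishing", "malware", "benign", "spam"]  # hypothetical classes
THRESHOLD = 0.1  # hypothetical cutoff; tune it on real traffic

def distribution(samples, bins=BINS):
    """Relative frequency per bin, with add-one smoothing to avoid zeros."""
    counts = Counter(samples)
    total = len(samples) + len(bins)
    return [(counts.get(b, 0) + 1) / total for b in bins]

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, at most 1)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2

def is_suspicious(client_samples, baseline_samples, threshold=THRESHOLD):
    """True when the client's distribution is far from the baseline."""
    score = js_divergence(distribution(client_samples),
                          distribution(baseline_samples))
    return score > threshold

# Made-up traffic for illustration only.
baseline = ["benign"] * 80 + ["spam"] * 12 + ["phishing"] * 6 + ["malware"] * 2
normal_client = ["benign"] * 75 + ["spam"] * 15 + ["phishing"] * 7 + ["malware"] * 3
probing_client = ["benign"] * 25 + ["spam"] * 25 + ["phishing"] * 25 + ["malware"] * 25

print("normal flagged:", is_suspicious(normal_client, baseline))
print("probing flagged:", is_suspicious(probing_client, baseline))
```

A single divergence score is a weak signal on its own. In practice it would likely be combined with volume, input novelty, and account age before any blocking decision.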
How this scenario is scored
The scenario has 6 ordered steps. Most steps are exact-match (a MITRE ATT&CK technique ID, a tool name, or a yes/no decision) or multiple choice. Free-text steps queue for manual review and do not affect the automatically computed final score in the MVP.
Each step has a max score of 100 points, so the maximum total is 600. Hints deduct points, and each hint's cost is listed before you reveal it. Your final score is the sum across steps. Range Elo updates on completion based on scenario difficulty (Advanced) and your final score percentage.
Frequently asked questions
What is a model extraction attack?
An attacker queries a deployed model via API to reconstruct a shadow model that mimics it. The attacker pays for inference but ends up with a near-equivalent model they did not train. Tramèr et al. 2016 demonstrated the attack on commercial ML APIs and showed it works against decision trees, logistic regression, and neural networks.
What is MITRE ATLAS AML.T0024?
AML.T0024 is Exfiltration via ML Inference API in MITRE ATLAS. It covers attackers using legitimate API calls to extract proprietary model parameters or behavior. The technique sits in the Exfiltration tactic for adversarial ML and pairs with mitigations like rate limiting, query monitoring, and output perturbation.
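Two of the mitigations named above, rate limiting and output perturbation, can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the `DailyQuota` class, the `coarsen` helper, the quota of 10,000 calls, and the 0.1 rounding step are all hypothetical and not part of ATLAS or any product. Coarsening returns only the top label and a rounded confidence, which gives an attacker less information per query than full-precision scores, although label-only extraction remains possible.

```python
from collections import defaultdict

DAILY_QUOTA = 10_000   # hypothetical per-key daily cap
SCORE_STEP = 0.1       # hypothetical rounding granularity

class QuotaExceeded(Exception):
    """Raised when an API key goes over its daily allowance."""

class DailyQuota:
    def __init__(self, limit=DAILY_QUOTA):
        self.limit = limit
        self.used = defaultdict(int)  # (api_key, day) -> calls so far

    def charge(self, api_key, day, n=1):
        """Record n calls, or raise if that would exceed the cap."""
        key = (api_key, day)
        if self.used[key] + n > self.limit:
            raise QuotaExceeded(f"{api_key} exceeded {self.limit} calls on {day}")
        self.used[key] += n

def coarsen(scores, step=SCORE_STEP):
    """Return only the top label and its confidence rounded to `step`."""
    label = max(scores, key=scores.get)
    rounded = round(round(scores[label] / step) * step, 10)
    return {"label": label, "confidence": rounded}

quota = DailyQuota()
quota.charge("key-123", "2025-01-01")
print(coarsen({"malicious": 0.8731, "benign": 0.1269}))
```

With a 10,000-call cap, the 60,000-per-day pattern from the briefing would be cut off after the first sixth of each day's traffic, at the cost of also constraining legitimate high-volume customers.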
What is output watermarking?
Output watermarking adds a small, statistically detectable signature to the API output. If a competitor's shadow model later reproduces that signature, you have statistical evidence that it was trained on data extracted from your API. Watermarking does not prevent extraction; it supplies evidence that can support a legal claim and acts as a deterrent.
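A minimal sketch of one possible label-watermarking scheme follows. It is an assumption-laden illustration, not a description of any specific product: a secret key (hypothetical) selects roughly 1% of inputs via HMAC, and for those inputs the API returns a key-derived label instead of the model's label. A model trained on the API's outputs should reproduce those key-derived labels far more often than an independent model, which would be expected to match them only at about chance level.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key
WATERMARK_RATE = 0.01                       # hypothetical fraction of inputs
LABELS = ["benign", "malicious"]            # hypothetical label set

def _digest(text):
    return hmac.new(SECRET_KEY, text.encode("utf-8"), hashlib.sha256).digest()

def is_watermarked(text):
    """Deterministically select about WATERMARK_RATE of all inputs."""
    fraction = int.from_bytes(_digest(text)[:4], "big") / 2**32
    return fraction < WATERMARK_RATE

def watermark_label(text):
    """Key-derived label returned for watermarked inputs."""
    return LABELS[_digest(text)[4] % len(LABELS)]

def serve(text, model_label):
    """What the API returns: the model's label unless the input is marked."""
    return watermark_label(text) if is_watermarked(text) else model_label

def watermark_agreement(suspect_model, probes):
    """Share of marked probes where the suspect matches the watermark."""
    marked = [p for p in probes if is_watermarked(p)]
    if not marked:
        return None
    hits = sum(suspect_model(p) == watermark_label(p) for p in marked)
    return hits / len(marked)

# Illustration: a "shadow model" that perfectly memorized the API's answers.
probes = [f"sample-{i}" for i in range(5000)]
shadow = lambda text: serve(text, "benign")
print("agreement:", watermark_agreement(shadow, probes))
```

The trade-off is accuracy: every watermarked response is potentially a wrong answer served to a paying customer, so the rate has to stay small.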
Course content is for educational purposes only and does not constitute professional advice. All claims are supported by cited peer-reviewed academic research. DecipherU does not teach or reproduce any proprietary sales methodology. Verify all referenced sources independently.