Model Theft: Extraction Attack via Public API

Cybersecurity for AI · 6 steps

Briefing

You are a cybersecurity engineer at Example AI Co. Your proprietary cybersecurity classifier (trained on years of in-house labeled data) is exposed through a public API. Pricing is $1 per 1,000 classifications.

An anonymous customer has been calling the API at 60,000 classifications per day for 14 days, paying about $840 in API fees, against a model that cost millions to train. Their query distribution looks suspiciously like model-extraction probing: synthetic inputs designed to cover the input space rather than real-world data.
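The arithmetic behind those figures can be checked in a few lines. This is only a worked calculation using the numbers stated in the briefing; the variable names are illustrative.

```python
# Figures taken from the briefing; names are illustrative only.
PRICE_PER_1000 = 1.00      # USD per 1,000 classifications
QUERIES_PER_DAY = 60_000
DAYS = 14

# Every paid query returns a label, so the customer is effectively
# buying a labeled dataset at the API price.
total_queries = QUERIES_PER_DAY * DAYS
total_cost = total_queries / 1000 * PRICE_PER_1000

print(f"{total_queries:,} labeled examples for ${total_cost:,.2f}")
# prints "840,000 labeled examples for $840.00"
```

At that price, each labeled example costs the customer a tenth of a cent.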

This scenario tests OWASP LLM10 Model Theft (the numbering used in the 2023 list; the 2025 list folds model theft into LLM10:2025 Unbounded Consumption), MITRE ATLAS technique AML.T0024 Exfiltration via ML Inference API, and the design of detection and defense. Sources: OWASP LLM Top 10 (2025), MITRE ATLAS, Tramèr et al. 2016 'Stealing Machine Learning Models via Prediction APIs'.
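On the detection side, one way to make "inputs designed to cover the input space" measurable is a coverage score: the fraction of grid cells in a feature space that a customer's queries touch. The sketch below is a minimal illustration, not the scenario's canonical solution. It assumes queries have already been projected to two features scaled to [0, 1), and the function names, the traffic generators, and the 0.8 threshold are all hypothetical.

```python
import random

def grid_coverage(points, bins=10):
    """Fraction of cells in a bins x bins grid over [0, 1)^2 hit by at least one point."""
    occupied = {
        (min(int(x * bins), bins - 1), min(int(y * bins), bins - 1))
        for x, y in points
    }
    return len(occupied) / (bins * bins)

def looks_like_probing(points, threshold=0.8):
    """Flag traffic whose coverage exceeds a threshold (assumed value, tune on real data)."""
    return grid_coverage(points) > threshold

def clamp(value):
    """Keep a coordinate inside [0, 1)."""
    return min(max(value, 0.0), 0.999)

rng = random.Random(0)

# Hypothetical benign traffic: real-world inputs cluster around a few typical cases.
centers = [(0.2, 0.3), (0.7, 0.6), (0.5, 0.8)]
benign = []
for _ in range(2000):
    cx, cy = rng.choice(centers)
    benign.append((clamp(rng.gauss(cx, 0.03)), clamp(rng.gauss(cy, 0.03))))

# Hypothetical probing traffic: synthetic inputs spread uniformly over the space.
probing = [(rng.random(), rng.random()) for _ in range(2000)]

print("benign coverage: ", grid_coverage(benign))
print("probing coverage:", grid_coverage(probing))
```

A production detector would need a baseline built from real customer traffic and a feature space suited to the classifier's inputs. The point here is only that clustered and space-filling query sets separate cleanly on a simple statistic.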

How Crucible mode works

One ordered pass through every step. No clock. Each answer scores against the canonical solution.

Hints reduce the points you can earn for that step. Free-text steps queue for manual review.

What you will practice

01 Map model extraction to OWASP LLM10 and ATLAS AML.T0024
02 Detect extraction-pattern queries against benign customer queries
03 Apply rate limiting, output noise, and watermarking
04 Decide between economic deterrence and active blocking
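Two of the defenses named above can be sketched briefly: a per-key daily quota as a simple form of rate limiting, and coarsened output that returns only the top label with a rounded confidence. This is a minimal sketch under stated assumptions. The class and function names, the quota policy, and the rounding precision are hypothetical, and watermarking is not covered here.

```python
from collections import defaultdict

class DailyQuota:
    """Per-key daily query budget (hypothetical policy, not taken from the scenario)."""

    def __init__(self, limit):
        self.limit = limit
        self.used = defaultdict(int)  # (api_key, day) -> queries served so far

    def allow(self, api_key, day):
        """Return True and count the query if the key is still under its daily limit."""
        key = (api_key, day)
        if self.used[key] >= self.limit:
            return False
        self.used[key] += 1
        return True

def coarsen(probabilities, decimals=1):
    """Return only the top label and a rounded confidence instead of the full score vector."""
    label = max(probabilities, key=probabilities.get)
    return {"label": label, "confidence": round(probabilities[label], decimals)}

quota = DailyQuota(limit=2)
decisions = [quota.allow("key-123", "2025-01-01") for _ in range(3)]
print(decisions)  # third call on the same day is refused

response = coarsen({"malicious": 0.8731, "benign": 0.1269})
print(response)
```

Coarsening reduces the information each query leaks, and a quota raises the time needed to collect a training set. Neither stops a patient attacker alone, which is why the scenario pairs them with detection and the deterrence-versus-blocking decision.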
