You are an SRE running the public cybersecurity LLM endpoint at Example Inference Inc. The endpoint accepts prompts of up to 200,000 tokens and bills per token.
An anonymous user has been submitting 195,000-token prompts at 8 requests per minute for the past 6 hours. That traffic has cost $4,200, pushed latency for legitimate users from 1.2s to 6.8s, and saturated the GPU pool.
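A quick back-of-the-envelope check makes the scale of the abuse concrete. The request rate, prompt size, duration, and cost are taken from the scenario above; the per-million-token price is derived from them, not stated in the source.

```python
# Load and cost arithmetic for the incident figures above.
PROMPT_TOKENS = 195_000      # tokens per request (from the scenario)
REQUESTS_PER_MIN = 8         # sustained request rate (from the scenario)
HOURS = 6                    # duration of the abuse (from the scenario)
COST_USD = 4_200             # billed cost over that window (from the scenario)

requests = REQUESTS_PER_MIN * 60 * HOURS            # total requests
total_tokens = requests * PROMPT_TOKENS             # total prompt tokens ingested
price_per_million = COST_USD / (total_tokens / 1e6) # implied $/1M tokens

print(f"{requests} requests, {total_tokens:,} prompt tokens")
print(f"implied price: ${price_per_million:.2f} per million tokens")
```

At roughly 2,880 requests and over half a billion prompt tokens in six hours, a single unauthenticated identity is consuming a disproportionate share of capacity, which is exactly what per-identity limits are meant to catch.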
This scenario tests OWASP LLM10:2025 Unbounded Consumption (the 2025 successor to Model Denial of Service), the design of input-size limits, per-identity rate limits, and cost guardrails. Sources: OWASP Top 10 for LLM Applications (2025), AWS Well-Architected Reliability Pillar.
You make one ordered pass through every step; there is no time limit. Each answer is scored against the canonical solution.
Hints reduce the points you can earn for that step. Free-text steps queue for manual review.