You are an SRE running the public cybersecurity LLM endpoint at Example Inference Inc. The endpoint accepts prompts of up to 200,000 tokens and bills per token.
An anonymous user has been submitting 195,000-token prompts at 8 requests per minute for the past 6 hours. That traffic has cost $4,200, pushed latency for legitimate users from 1.2s to 6.8s, and saturated the GPU pool.
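A quick back-of-the-envelope check makes the scale of the abuse concrete. The request rate, prompt size, duration, and cost are taken from the scenario above; the per-million-token price is derived from them, not stated in the source.

```python
# Load and cost arithmetic for the incident figures above.
PROMPT_TOKENS = 195_000      # tokens per request (from the scenario)
REQUESTS_PER_MIN = 8         # sustained request rate (from the scenario)
HOURS = 6                    # duration of the abuse (from the scenario)
COST_USD = 4_200             # billed cost over that window (from the scenario)

requests = REQUESTS_PER_MIN * 60 * HOURS            # total requests
total_tokens = requests * PROMPT_TOKENS             # total prompt tokens ingested
price_per_million = COST_USD / (total_tokens / 1e6) # implied $/1M tokens

print(f"{requests} requests, {total_tokens:,} prompt tokens")
print(f"implied price: ${price_per_million:.2f} per million tokens")
```

At roughly 2,880 requests and over half a billion prompt tokens in six hours, a single unauthenticated identity is consuming a disproportionate share of capacity, which is exactly what per-identity limits are meant to catch.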
This scenario tests OWASP LLM10:2025 Unbounded Consumption (the 2025 successor to Model Denial of Service), the design of input-size limits, per-identity rate limits, and cost guardrails. Sources: OWASP Top 10 for LLM Applications (2025), AWS Well-Architected Reliability Pillar.
You make one ordered pass through every step; there is no time limit. Each answer is scored against the canonical solution.
Hints reduce the points you can earn for that step. Free-text steps queue for manual review.