You are an AppSec engineer reviewing a cybersecurity AI assistant for a 4,000-employee firm. The assistant calls tools: run_sql(query), read_file(path), send_slack(channel, text), send_email(to, subject, body). The tools execute server-side after the LLM emits the call.
Review each tool definition. Some are well-designed; others are broken by design (parameters too permissive, no scoping to the calling user, missing confirmation gates). Fix what is broken and document the pattern for future tools.
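An added example, not part of the scenario's own code, showing one way to enforce the three properties the review asks for (tight parameters, scoping to the calling user, confirmation gates) in a server-side dispatcher that sits between the model's emitted call and the real tool. The sandbox root, query names, Slack/email domain and helper names are assumptions.

```python
from dataclasses import dataclass
from pathlib import Path

# Constructed policy layer: every model-emitted call passes through check_call()
# before execution. The caller identity comes from the session, never the model.
FILE_ROOT = Path("/srv/assistant/users")        # assumed per-user sandbox root
SAFE_QUERIES = {                                 # named queries replace free-form SQL
    "open_tickets": "SELECT id, title FROM tickets WHERE owner = ? AND state = 'open'",
    "my_assets": "SELECT hostname FROM assets WHERE owner = ?",
}
SIDE_EFFECTING = {"send_slack", "send_email"}    # need a human "yes" before running
MAIL_DOMAIN = "@example.com"                     # assumed firm domain

@dataclass
class ToolCall:
    user: str                 # authenticated caller, bound server-side
    name: str
    args: dict
    confirmed: bool = False   # set only by the UI after the user approves

class PolicyError(Exception):
    pass

def check_call(call: ToolCall) -> dict:
    """Return the concrete arguments to execute, or raise PolicyError."""
    if call.name in SIDE_EFFECTING and not call.confirmed:
        raise PolicyError(f"{call.name} requires explicit user confirmation")
    if call.name == "run_sql":
        # The model may only choose a query name; the owner filter is added here.
        sql = SAFE_QUERIES.get(call.args.get("query_name"))
        if sql is None:
            raise PolicyError("unknown query name")
        return {"sql": sql, "params": (call.user,)}
    if call.name == "read_file":
        # Resolve relative to the caller's own directory and reject escapes.
        root = (FILE_ROOT / call.user).resolve()
        target = (root / call.args.get("path", "")).resolve()
        if root not in target.parents:
            raise PolicyError("path escapes the caller's directory")
        return {"path": str(target)}
    if call.name == "send_email":
        if not call.args.get("to", "").endswith(MAIL_DOMAIN):
            raise PolicyError("recipient outside the firm's mail domain")
        return dict(call.args)
    if call.name == "send_slack":
        return dict(call.args)
    raise PolicyError("unknown tool")

def is_allowed(call: ToolCall) -> bool:
    """Convenience wrapper: True if the policy would let the call execute."""
    try:
        check_call(call)
        return True
    except PolicyError:
        return False
```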
This scenario tests OWASP LLM07:2025 Insecure Plugin Design (renamed in 2025 to System Prompt Leakage in some treatments; tool-design coverage continues across LLM07 and LLM08). Sources: OWASP LLM Top 10 (2025), NIST AI RMF GenAI Profile.
One ordered pass through every step. No clock. Each answer scores against the canonical solution.
Hints reduce the points you can earn for that step. Free-text steps queue for manual review.