You are a Tier 1 cybersecurity SOC analyst. Your team rolled out an LLM-based copilot last quarter. The copilot reads alert payloads, drafts a one-paragraph summary, and recommends a priority. Your job is to verify the copilot output before it lands in the queue manager's view.
Twelve alerts hit overnight. The copilot has already drafted summaries for all of them. You have 30 minutes to triage. The trap: the copilot occasionally invents indicators (hallucinated hostnames, fabricated MITRE technique IDs), and it tends to over-prioritize benign automation.
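A format check alone cannot confirm a technique exists, but it cheaply catches malformed IDs like the ones a copilot sometimes invents. A minimal sketch, assuming real MITRE ATT&CK technique IDs follow the `T####` pattern with an optional `.###` sub-technique suffix (the helper name and sample IDs are illustrative):

```python
import re

# MITRE ATT&CK technique IDs: "T" + 4 digits, optional ".###" sub-technique.
# Note this only catches malformed IDs; a well-formed but nonexistent ID
# (e.g. "T9999") would still need a lookup against the ATT&CK catalog.
TECHNIQUE_ID = re.compile(r"^T\d{4}(?:\.\d{3})?$")

def flag_suspect_ids(ids):
    """Return the IDs whose format alone marks them as suspect."""
    return [i for i in ids if not TECHNIQUE_ID.match(i)]

# "TA0001" is a tactic ID, not a technique; "T105" is too short.
print(flag_suspect_ids(["T1059.001", "TA0001", "T105"]))  # → ['TA0001', 'T105']
```

A second pass would resolve the surviving IDs against the live ATT&CK catalog, since a syntactically valid ID can still be fabricated.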
This scenario tests prompt construction, output verification, and the discipline of treating LLM output as a draft, not a verdict.
One ordered pass through every step. No clock. Each answer is scored against the canonical solution.
Hints reduce the points you can earn for that step. Free-text steps queue for manual review.