You are a senior cybersecurity detection engineer. Your team's ML platform generates candidate detection rules from labeled telemetry. Your job as reviewer is to render a ship, iterate, or kill verdict on each rule based on precision, recall, false-positive cost, and analyst load.
Ten candidate rules sit in your queue. The platform reports precision and recall on the validation set, but real-world precision often degrades: production traffic carries a far lower base rate of true positives than the labeled validation data, so the same false-positive rate produces many more false alerts per true one. Your call lives with the SOC for months. Bad rules eat analyst hours and burn trust.
This scenario tests detection engineering judgment plus enough ML literacy to read a confusion matrix and recognize when high recall hides a high false-positive cost. Each step asks for a verdict on a specific rule with realistic metrics.
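The base-rate effect behind that risk can be sketched with hypothetical numbers (the counts and prevalences below are illustrative, not from the scenario): a rule that looks precise on a balanced validation set can collapse to single-digit precision when attacks are rare in production.

```python
# Sketch with hypothetical numbers: why validation precision can collapse
# in production when the base rate of true positives is much lower.

def precision(tp: float, fp: float) -> float:
    """Fraction of alerts that are true attacks."""
    return tp / (tp + fp)

def recall(tp: float, fn: float) -> float:
    """Fraction of true attacks that alert."""
    return tp / (tp + fn)

# Validation set: 1,000 events, 100 true attacks (10% base rate).
val_tp, val_fn, val_fp, val_tn = 90, 10, 20, 880
tpr = recall(val_tp, val_fn)          # true-positive rate: 0.90
fpr = val_fp / (val_fp + val_tn)      # false-positive rate: ~0.022
print(f"validation precision: {precision(val_tp, val_fp):.3f}")   # ~0.818

# Production: same rule, same TPR/FPR, but attacks are 0.1% of 100,000 events.
prod_pos, prod_neg = 100, 99_900
prod_tp = tpr * prod_pos              # ~90 expected true alerts
prod_fp = fpr * prod_neg              # ~2,220 expected false alerts
print(f"production precision: {precision(prod_tp, prod_fp):.3f}")  # ~0.039
```

Recall stays at 0.90 in both settings; only the alert mix changes, which is exactly the trap of judging a rule by recall alone.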
You get one ordered pass through every step, with no clock. Each answer scores against the canonical solution.
Hints reduce the points you can earn for that step. Free-text steps queue for manual review.