Alert Storms Overview
An alert storm happens when a single root cause (e.g., a database going down) triggers many different alerts — API timeouts, connection refused errors, query failures, and so on. Without storm detection, each unique alert title would dispatch its own agent, leading to multiple agents fixing symptoms instead of the root cause.
How Storm Detection Works
Storm detection is purely count-based. No AI is involved in the detection step: it simply counts alerts per relay within a time window.
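The counting step can be pictured as a sliding-window counter keyed by relay. The sketch below is illustrative only: `recordAlert`, `arrivalsByRelay`, and the in-memory `Map` are hypothetical stand-ins (the real check presumably queries a database), and "exceeds the threshold" is read as strictly greater than.

```typescript
// Hypothetical sketch: count alerts per relay inside a sliding time window.
interface StormConfig {
  stormThreshold: number; // alert count that must be exceeded to declare a storm
  stormWindowSeconds: number; // length of the sliding window
}

// Arrival timestamps (ms) per relay ID; an in-memory stand-in for real storage.
const arrivalsByRelay = new Map<string, number[]>();

// Records one alert and returns true if the relay is now in a storm.
function recordAlert(relayId: string, nowMs: number, cfg: StormConfig): boolean {
  const windowStart = nowMs - cfg.stormWindowSeconds * 1000;
  // Keep only arrivals still inside the window, then add the new one.
  const recent = (arrivalsByRelay.get(relayId) ?? []).filter((t) => t > windowStart);
  recent.push(nowMs);
  arrivalsByRelay.set(relayId, recent);
  // Assumption: a storm starts when the count is strictly above the threshold.
  return recent.length > cfg.stormThreshold;
}
```

Because the decision depends only on timestamps and a relay ID, the alert's content never needs to be inspected at this stage.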
Normal Flow (No Storm)
When alerts arrive at a normal rate:
- Alert arrives via webhook
- Relay rules execute
- Agent rule dispatches a coding agent
- Agent investigates and creates a PR
Storm Flow
When a burst of alerts arrives:
- First 2 alerts dispatch agents immediately (configurable via `maxImmediateDispatches`)
- Once the alert count exceeds the threshold within the time window, a storm is detected
- Subsequent alerts are held — no additional agents are dispatched
- After a debounced delay, an AI triage step analyzes all storm alerts
- The triage identifies the root cause alert
- A single agent is dispatched for the root cause with full storm context
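The "debounced delay" before triage can be sketched as a quiet-period check: every held alert restarts the wait, and triage only runs once no new alert has arrived for the full delay. Everything named here (`StormCollector`, `triageDelayMs`, `takeIfQuiet`, the `Alert` shape) is hypothetical and not taken from the product.

```typescript
// Hypothetical sketch of holding storm alerts until a quiet period elapses.
interface Alert {
  id: string;
  title: string;
  receivedAtMs: number;
}

class StormCollector {
  private held: Alert[] = [];
  private lastArrivalMs: number | null = null;
  private triageDelayMs: number;

  constructor(triageDelayMs: number) {
    this.triageDelayMs = triageDelayMs; // assumed name for the debounce delay
  }

  // Hold an alert instead of dispatching an agent for it.
  hold(alert: Alert): void {
    this.held.push(alert);
    this.lastArrivalMs = alert.receivedAtMs; // each arrival restarts the quiet period
  }

  // Returns every held alert once the quiet period has elapsed, otherwise null.
  takeIfQuiet(nowMs: number): Alert[] | null {
    if (this.lastArrivalMs === null) return null;
    if (nowMs - this.lastArrivalMs < this.triageDelayMs) return null;
    const batch = this.held;
    this.held = [];
    this.lastArrivalMs = null;
    return batch; // this batch would be handed to the AI triage step
  }
}
```

Debouncing this way means triage sees the whole burst rather than whatever happened to arrive in the first few seconds.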
Storm Lifecycle
| Status | Description |
|---|---|
| `collecting` | Storm detected, alerts are being collected |
| `triaging` | AI is analyzing the alerts to find the root cause |
| `dispatched` | Root cause identified, agent dispatched |
| `resolved` | Storm resolved (future enhancement) |
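One way to model these statuses is a string-literal union with a transition map. This is a sketch under the assumption that a storm moves through the statuses in the order listed; the source does not state the allowed transitions, and `nextStatus` and `advance` are hypothetical names.

```typescript
// Status names come from the lifecycle list; the linear ordering is an assumption.
type StormStatus = "collecting" | "triaging" | "dispatched" | "resolved";

const nextStatus: Record<StormStatus, StormStatus | null> = {
  collecting: "triaging",
  triaging: "dispatched",
  dispatched: "resolved", // "resolved" is described as a future enhancement
  resolved: null, // terminal
};

// Moves a storm to its next status, or throws if it is already terminal.
function advance(status: StormStatus): StormStatus {
  const next = nextStatus[status];
  if (next === null) {
    throw new Error(`storm is already ${status}`);
  }
  return next;
}
```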
Relationship to Alert Deduplication
Storm detection is a layer above the existing same-title dedup:
- Same-title dedup prevents duplicate agents for alerts with identical (normalized) titles
- Storm detection prevents excessive agents when many different alert titles fire from a shared root cause
Both checks run in sequence. Storm detection runs first — if it holds an alert, the same-title dedup is skipped entirely.
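The ordering of the two checks can be sketched as a single decision function. The names (`decide`, `normalizeTitle`, `activeTitles`) and the normalization rule (trim, lowercase, collapse whitespace) are assumptions for illustration; the actual normalization is not described here.

```typescript
// Hypothetical sketch: storm check first, same-title dedup second.
type Decision = "held_by_storm" | "deduplicated" | "dispatch";

// Assumed normalization; the real rule may differ.
function normalizeTitle(title: string): string {
  return title.trim().toLowerCase().replace(/\s+/g, " ");
}

function decide(title: string, stormHoldsAlert: boolean, activeTitles: Set<string>): Decision {
  // Storm detection runs first; a held alert never reaches the dedup check.
  if (stormHoldsAlert) return "held_by_storm";

  // Same-title dedup: skip dispatch if an agent already covers this title.
  const key = normalizeTitle(title);
  if (activeTitles.has(key)) return "deduplicated";

  activeTitles.add(key);
  return "dispatch";
}
```

Note that a held alert does not register its title in `activeTitles`, which mirrors the dedup step being skipped entirely.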
Configuration
Storm behavior is configured per agent rule. See Tuning Storm Detection for configuration options.
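```ts
const rule = {
  ruleType: "agent",
  config: {
    agentType: "devin",
    integrationId: "int_abc123",
    stormThreshold: 5, // alert count that must be exceeded to detect a storm
    stormWindowSeconds: 60, // time window for counting alerts
    maxImmediateDispatches: 2, // alerts that dispatch agents immediately
  },
};
```
Fail-Open Design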
Storm detection is designed to never block alerts:
- If the storm detection database query fails, the alert proceeds to normal dispatch
- If the AI triage fails, the earliest high-severity alert is selected as root cause
- Existing alert routing, notifications, and acknowledgment are unaffected by storm detection
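The two fallbacks above might look roughly like the sketch below. All names (`isStormOrFailOpen`, `fallbackRootCause`, the `StormAlert` shape and its `severity` values) are hypothetical. The count query is shown as synchronous for brevity, where a real database call would be awaited, and falling back to the earliest alert when none is high-severity is an assumption the source does not cover.

```typescript
// Hypothetical alert shape; field names and severity values are assumptions.
interface StormAlert {
  id: string;
  severity: "low" | "medium" | "high";
  receivedAtMs: number;
}

// Fail open: if the count query throws, report "no storm" so the alert
// proceeds to normal dispatch.
function isStormOrFailOpen(countAlertsInWindow: () => number, stormThreshold: number): boolean {
  try {
    return countAlertsInWindow() > stormThreshold;
  } catch {
    return false;
  }
}

// Used when AI triage fails: pick the earliest high-severity alert.
function fallbackRootCause(alerts: StormAlert[]): StormAlert | undefined {
  // Copy before sorting so the caller's array is left untouched.
  const byArrival = [...alerts].sort((a, b) => a.receivedAtMs - b.receivedAtMs);
  // Assumption: with no high-severity alert, use the earliest alert overall.
  return byArrival.find((a) => a.severity === "high") ?? byArrival[0];
}
```

In both paths a failure degrades to ordinary behavior rather than dropping or blocking an alert.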