Tuning Storm Detection
Storm detection is configured per agent rule, so each relay can have independent thresholds tuned to its alert volume and criticality.
Configuration Options
stormThreshold
The number of alerts within the time window that triggers a storm.
- Type: number
- Default: 5
- Range: 2 to 50
{ stormThreshold: 5 }
stormWindowSeconds
The time window (in seconds) used to count alerts for storm detection.
- Type: number
- Default: 60
- Range: 10 to 300
{ stormWindowSeconds: 60 }
maxImmediateDispatches
The number of agents dispatched immediately before storm hold kicks in. These first agents start working right away while the storm is still being evaluated.
- Type: number
- Default: 2
- Range: 0 to 5
{ maxImmediateDispatches: 2 }
Set to 0 to hold all dispatches during a storm. Set to a higher number if you want more agents working in parallel before triage.
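The three options can be read together as a small, validated settings object. The sketch below is illustrative only: `StormConfig`, `resolveStormConfig`, and `isStorm` are hypothetical names, not part of the product's API. It also assumes that out-of-range values are rejected rather than clamped, and that a storm begins once the alert count in the trailing window reaches `stormThreshold`.

```typescript
// Hypothetical shape for the three storm settings documented above.
interface StormConfig {
  stormThreshold: number;
  stormWindowSeconds: number;
  maxImmediateDispatches: number;
}

// Defaults and ranges are taken from the option descriptions above.
const STORM_DEFAULTS: StormConfig = {
  stormThreshold: 5,
  stormWindowSeconds: 60,
  maxImmediateDispatches: 2,
};

const STORM_RANGES: Record<keyof StormConfig, readonly [number, number]> = {
  stormThreshold: [2, 50],
  stormWindowSeconds: [10, 300],
  maxImmediateDispatches: [0, 5],
};

// Merge overrides onto the defaults and reject out-of-range values.
// Assumption: invalid values are rejected; the real system might clamp them instead.
function resolveStormConfig(overrides: Partial<StormConfig> = {}): StormConfig {
  const config: StormConfig = { ...STORM_DEFAULTS, ...overrides };
  for (const key of Object.keys(STORM_RANGES) as (keyof StormConfig)[]) {
    const [min, max] = STORM_RANGES[key];
    const value = config[key];
    if (!Number.isFinite(value) || value < min || value > max) {
      throw new RangeError(`${key} must be between ${min} and ${max}, got ${value}`);
    }
  }
  return config;
}

// Assumption: a storm starts once the number of alerts in the trailing
// window (exclusive start, inclusive end) reaches stormThreshold.
function isStorm(alertTimesSeconds: number[], nowSeconds: number, config: StormConfig): boolean {
  const windowStart = nowSeconds - config.stormWindowSeconds;
  const recent = alertTimesSeconds.filter((t) => t > windowStart && t <= nowSeconds);
  return recent.length >= config.stormThreshold;
}
```

For example, `resolveStormConfig({ stormThreshold: 15, stormWindowSeconds: 120 })` keeps the default `maxImmediateDispatches` of 2, while `resolveStormConfig({ stormThreshold: 1 })` throws because 1 is below the documented minimum of 2.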
Examples
High-Traffic Relay
For a relay that receives many alerts normally, increase the threshold to avoid false storm detection:
{
  agentType: "devin",
  integrationId: "int_abc123",
  stormThreshold: 15,
  stormWindowSeconds: 120,
  maxImmediateDispatches: 3,
}
Critical Relay
For a relay where every alert matters and storms should be detected quickly:
{
  agentType: "cursor",
  integrationId: "int_def456",
  repository: "https://github.com/org/critical-service",
  stormThreshold: 3,
  stormWindowSeconds: 30,
  maxImmediateDispatches: 1,
}
Debounced Triage Window
When a storm is detected, the triage job doesn’t fire immediately. Instead, it uses a debounced delay to wait for more alerts to arrive:
- Each new alert reschedules the triage job to 15 seconds after that alert arrives
- The maximum delay is capped at 90 seconds from when the storm was first detected
- This ensures late-arriving alerts (which may include the actual root cause) are included in triage
For example, if a storm is detected at T=0:
- Alert at T=5s pushes triage to T=20s
- Alert at T=10s pushes triage to T=25s
- Alert at T=20s pushes triage to T=35s
- No more alerts arrive, so triage fires at T=35s
If alerts keep arriving, triage fires no later than T=90s.
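The debounce arithmetic can be sketched as a small function. This is an illustration, not the actual scheduler: `triageFireTime` and both constant names are hypothetical. It assumes that triage is first scheduled 15 seconds after the storm is detected (the text above only states how later alerts move it), and that alerts arriving after triage has fired no longer affect it.

```typescript
const DEBOUNCE_SECONDS = 15;  // each alert moves triage to 15s after its arrival
const MAX_DELAY_SECONDS = 90; // hard cap, measured from storm detection

// Returns the time (in seconds) at which triage fires.
// Assumption: the initial schedule is stormDetectedAt + DEBOUNCE_SECONDS.
function triageFireTime(stormDetectedAt: number, alertTimes: number[]): number {
  const deadline = stormDetectedAt + MAX_DELAY_SECONDS;
  let fireAt = Math.min(stormDetectedAt + DEBOUNCE_SECONDS, deadline);

  // Process alerts in arrival order without mutating the caller's array.
  for (const t of [...alertTimes].sort((a, b) => a - b)) {
    if (t >= fireAt) break; // triage already fired before this alert arrived
    fireAt = Math.min(t + DEBOUNCE_SECONDS, deadline);
  }
  return fireAt;
}
```

With the storm detected at T=0 and alerts at 5s, 10s, and 20s, `triageFireTime(0, [5, 10, 20])` returns 35, matching the walkthrough above. With an alert every 10 seconds indefinitely, the result is capped at 90.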
AI Triage
After the debounce window, AI analyzes all collected storm alerts. The triage considers:
- Timing — earlier alerts are more likely to be the root cause
- Infrastructure level — lower-level alerts (database, network) rank higher than application-level alerts
- Severity — higher severity alerts with specific error details are more informative
- Error specificity — alerts with stack traces, connection errors, or specific error codes are preferred
- Causal relationships — patterns where one failure causes others (e.g., DB down causing API timeouts)
The identified root cause alert is then dispatched to a coding agent with the full storm context, so the agent understands it needs to fix the underlying issue rather than a symptom.
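To make a few of these signals concrete, the sketch below orders alerts with a fixed, deterministic comparison. It is not how the AI triage works: the real triage weighs the signals with a model, whereas this stand-in uses an arbitrary priority order (infrastructure level, then error specificity, then severity, then arrival time). The `StormAlert` type, its fields, and `rankRootCauseCandidates` are all hypothetical.

```typescript
// Hypothetical alert shape; field names are assumptions for illustration.
interface StormAlert {
  id: string;
  receivedAt: number; // seconds since the storm was detected
  layer: "infrastructure" | "application";
  severity: number; // higher means more severe
  hasSpecificError: boolean; // stack trace, connection error, or error code present
}

// Orders alerts from most to least likely root cause under a fixed heuristic.
// Assumption: this priority order is arbitrary and chosen only for the example.
function rankRootCauseCandidates(alerts: StormAlert[]): StormAlert[] {
  const layerRank = { infrastructure: 0, application: 1 };
  return [...alerts].sort(
    (a, b) =>
      layerRank[a.layer] - layerRank[b.layer] || // lower-level alerts first
      Number(b.hasSpecificError) - Number(a.hasSpecificError) || // specific errors first
      b.severity - a.severity || // higher severity first
      a.receivedAt - b.receivedAt // earlier alerts first
  );
}
```

Given a database-down alert and the API timeout alerts it caused, this ordering places the database alert first, which mirrors the "DB down causing API timeouts" pattern described above.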