Tuning Storm Detection

Storm detection is configured per agent rule, so each relay can have independent thresholds tuned to its alert volume and criticality.

stormThreshold

The number of alerts that must arrive within the time window to trigger a storm.

  • Type: number
  • Default: 5
  • Range: 2 to 50
{
  stormThreshold: 5
}

stormWindowSeconds

The time window (in seconds) used to count alerts for storm detection.

  • Type: number
  • Default: 60
  • Range: 10 to 300
{
  stormWindowSeconds: 60
}

maxImmediateDispatches

The number of agents dispatched immediately before storm hold kicks in. These first agents start working right away while the storm is still being evaluated.

  • Type: number
  • Default: 2
  • Range: 0 to 5
{
  maxImmediateDispatches: 2
}

Set to 0 to hold all dispatches during a storm. Set to a higher number if you want more agents working in parallel before triage.
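The interaction between these three settings can be sketched with a simple sliding-window count. This is a minimal illustration, not the product's implementation; the interface and function names below are invented for the example:

```typescript
// Hypothetical sketch of storm detection; names are illustrative only.
interface StormConfig {
  stormThreshold: number;         // alerts within the window that trigger a storm
  stormWindowSeconds: number;     // sliding window length in seconds
  maxImmediateDispatches: number; // agents dispatched before the hold applies
}

// True when the alert timestamps (epoch seconds) falling inside the
// window ending at `now` meet or exceed the threshold.
function isStorm(alertTimes: number[], now: number, cfg: StormConfig): boolean {
  const windowStart = now - cfg.stormWindowSeconds;
  const inWindow = alertTimes.filter((t) => t > windowStart && t <= now);
  return inWindow.length >= cfg.stormThreshold;
}

// Whether the nth alert of a storm (1-based) is dispatched immediately
// or held for triage.
function shouldDispatchImmediately(alertIndex: number, cfg: StormConfig): boolean {
  return alertIndex <= cfg.maxImmediateDispatches;
}
```

With the defaults, five alerts inside a 60-second window count as a storm, and only the first two alerts are dispatched before the hold applies.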

For a relay that receives many alerts normally, increase the threshold to avoid false storm detection:

{
  agentType: "devin",
  integrationId: "int_abc123",
  stormThreshold: 15,
  stormWindowSeconds: 120,
  maxImmediateDispatches: 3,
}

For a relay where every alert matters and storms should be detected quickly:

{
  agentType: "cursor",
  integrationId: "int_def456",
  repository: "https://github.com/org/critical-service",
  stormThreshold: 3,
  stormWindowSeconds: 30,
  maxImmediateDispatches: 1,
}

When a storm is detected, the triage job doesn’t fire immediately. Instead, it uses a debounced delay to wait for more alerts to arrive:

  • Each new alert pushes the triage job forward by 15 seconds
  • The maximum delay is capped at 90 seconds from when the storm was first detected
  • This ensures late-arriving alerts (which may include the actual root cause) are included in triage

For example, if a storm is detected at T=0:

  • Alert at T=5s pushes triage to T=20s
  • Alert at T=10s pushes triage to T=25s
  • Alert at T=20s pushes triage to T=35s
  • No more alerts arrive — triage fires at T=35s

If alerts keep arriving, triage still fires no later than T=90s.
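The debounce rule reduces to a pure function of when the storm started and when the latest alert arrived. The constants come from the text above; the function name is hypothetical:

```typescript
// Hypothetical sketch of the debounced triage schedule.
// Constants are taken from the documented behavior.
const DEBOUNCE_PUSH_SECONDS = 15; // each new alert pushes triage forward by this much
const MAX_DELAY_SECONDS = 90;     // hard cap, measured from storm detection

// Both arguments are in seconds relative to storm detection (T=0).
// Returns when triage should fire.
function nextTriageTime(stormStart: number, latestAlert: number): number {
  const pushed = latestAlert + DEBOUNCE_PUSH_SECONDS;
  const cap = stormStart + MAX_DELAY_SECONDS;
  return Math.min(pushed, cap);
}
```

This reproduces the worked example: an alert at T=20s schedules triage for T=35s, while an alert at T=85s is capped at T=90s.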

After the debounce window closes, the AI analyzes all collected storm alerts. The triage considers:

  1. Timing — earlier alerts are more likely to be the root cause
  2. Infrastructure level — lower-level alerts (database, network) rank higher than application-level alerts
  3. Severity — higher severity alerts with specific error details are more informative
  4. Error specificity — alerts with stack traces, connection errors, or specific error codes are preferred
  5. Causal relationships — patterns where one failure causes others (e.g., DB down causing API timeouts)
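The actual triage is performed by an AI model, but the ranking signals above can be illustrated with a hand-rolled score. The field names and weights here are invented for the example and are not the product's schema:

```typescript
// Hypothetical scoring sketch of the triage signals; illustrative only.
interface StormAlert {
  receivedAt: number;  // epoch seconds; earlier alerts are more likely root causes
  layer: "database" | "network" | "application";
  severity: number;    // higher = more severe
  hasStackTrace: boolean; // stands in for error specificity
}

function rootCauseScore(alert: StormAlert, stormStart: number): number {
  let score = 0;
  // 1. Timing: earlier alerts score higher.
  score += Math.max(0, 60 - (alert.receivedAt - stormStart));
  // 2. Infrastructure level: lower layers outrank application alerts.
  score += alert.layer === "application" ? 0 : 30;
  // 3. Severity.
  score += alert.severity * 10;
  // 4. Error specificity.
  if (alert.hasStackTrace) score += 20;
  return score;
}

// Pick the highest-scoring alert as the candidate root cause.
function pickRootCause(alerts: StormAlert[], stormStart: number): StormAlert {
  return alerts.reduce((best, a) =>
    rootCauseScore(a, stormStart) > rootCauseScore(best, stormStart) ? a : best);
}
```

In the "DB down causing API timeouts" pattern from the list, the earlier, lower-level database alert outscores the later application-level timeout, so it is the one handed to the coding agent.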

The identified root cause alert is then dispatched to a coding agent with the full storm context, so the agent understands it needs to fix the underlying issue rather than a symptom.