Agent gateways fan out to models, vector stores, and bespoke HTTP tools; when any upstream wobbles, naive retries amplify the outage by multiplying traffic to a dependency that is already sick. In 2026, production OpenClaw stacks pair exponential backoff with circuit breakers that flip from closed to open once a bounded error budget is spent, then probe cautiously in a half-open state before admitting full load again. This article covers practical thresholds, half-open cadence, the Prometheus metrics you should export, how breakers coordinate with 429 responses that carry Retry-After, how they complement per-tenant token budgets, and how read-timeout hang diagnostics and synthetic health checks keep operators from mistaking breaker state for dependency health. Read it alongside the guides on provider 429 Retry-After handling, token budget tool throttles, gateway read-timeout hang diagnostics, and gateway health monitoring and uptime, so that retries, budgets, timeouts, and probes tell one coherent story.
Pricing context: rehearsing gateway behavior on a dedicated Mac mini from MacHTML near $16.9 per day is cheaper than an incident when half-open storms re-damage an upstream already recovering from overload.
Why retries without breakers fail operations
Retries assume transient faults. When an upstream enters sustained failure (database lock storms, certificate expiry, or regional packet loss), each retry adds load exactly where the system is already suffocating. Gateways see amplified fan-out because a single agent turn may invoke six tools in parallel; if each tool makes three attempts at 200ms intervals, a single user gesture can enqueue 18 failing calls. Circuit breakers cap that damage by converting repeated failures into fast local errors until a cooling period elapses.
OpenClaw deployments also mix synchronous HTTP tools with streaming model calls. Breakers must be scoped per dependency, not per process: opening the breaker for a flaky CRM search API should not block a healthy calculator tool that shares the same worker pool unless they contend for identical sockets. Namespace breaker keys with provider id, region, and credential fingerprint to avoid accidental coupling.
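A minimal sketch of per-dependency scoping, assuming breakers live in an in-process registry keyed by provider id, region, and a hashed credential fingerprint. `BreakerKey`, `Registry`, and `fingerprint` are hypothetical names, the stand-in breaker only tracks a failure count, and truncating the SHA-256 digest to 12 hex characters is an arbitrary choice.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// BreakerKey scopes a breaker to one upstream identity.
type BreakerKey struct {
	Provider    string
	Region      string
	Fingerprint string // hash of the credential, never the secret itself
}

// fingerprint hashes a credential so keys can be logged and used as labels
// without leaking the secret.
func fingerprint(credential string) string {
	sum := sha256.Sum256([]byte(credential))
	return hex.EncodeToString(sum[:])[:12]
}

// breakerState is a stand-in for a real breaker.
type breakerState struct{ failures int }

// Registry hands out one breaker per key, so a flaky CRM search API and a
// healthy calculator tool never share state.
type Registry struct {
	mu       sync.Mutex
	breakers map[BreakerKey]*breakerState
}

func NewRegistry() *Registry {
	return &Registry{breakers: make(map[BreakerKey]*breakerState)}
}

// Get returns the breaker for key, creating it on first use.
func (r *Registry) Get(key BreakerKey) *breakerState {
	r.mu.Lock()
	defer r.mu.Unlock()
	b, ok := r.breakers[key]
	if !ok {
		b = &breakerState{}
		r.breakers[key] = b
	}
	return b
}

func main() {
	reg := NewRegistry()
	crm := BreakerKey{"crm_search", "eu-west", fingerprint("token-a")}
	calc := BreakerKey{"calculator", "eu-west", fingerprint("token-a")}
	reg.Get(crm).failures = 5
	fmt.Println(reg.Get(crm).failures, reg.Get(calc).failures) // 5 0
}
```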
Closed, open, and half-open lifecycles
In the closed state all requests pass through to upstream with normal instrumentation. After failures cross a threshold, the breaker opens: subsequent calls fail fast with a structured error that agents can interpret as “dependency unavailable,” enabling higher-level planners to switch strategies. After a configured sleep—often derived from historical p95 recovery time—the breaker enters half-open and admits a trickle of probes. Successful probes close the circuit; failures re-open it with a longer backoff multiplier.
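```go
package breaker
import "time"
type BreakerState int
const Closed, Open, HalfOpen BreakerState = 0, 1, 2

type Breaker struct { // not goroutine-safe: callers serialize access
	state       BreakerState
	openedAt    time.Time     // when the breaker last opened
	cooldown    time.Duration // fail-fast period before probing
	probeBudget int           // probes still allowed while half-open
}

func (b *Breaker) Allow(now time.Time) bool {
	switch b.state {
	case Open:
		if now.Sub(b.openedAt) < b.cooldown {
			return false // still cooling down: fail fast
		}
		b.state, b.probeBudget = HalfOpen, 1
		fallthrough // spend the single probe on this call
	case HalfOpen:
		if b.probeBudget == 0 {
			return false // a probe is already in flight
		}
		b.probeBudget--
	}
	return true // closed, or this call is the admitted probe
}
```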
Failure thresholds and sliding windows
Pick thresholds from observed traffic, not defaults copied from blog posts. A pragmatic starting point for HTTP tools is a sliding window of 30 seconds with at least 20 attempts recorded before evaluating ratios. Open the breaker on either 5 consecutive transport failures or when more than 60 percent of sampled calls return 5xx within the window. Consecutive counters catch total outages quickly, while ratio counters catch elevated but partial degradation.
For streaming model routes, classify failures carefully: a mid-stream reset after tokens were delivered might be a billing dispute, not a breaker-worthy incident. Maintain separate counters for “headers never arrived” versus “stream aborted after 40 percent completion.”
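A sketch of keeping the two streaming failure modes apart, assuming the gateway knows whether response headers arrived and can estimate expected output tokens. `StreamCounters` and `ObserveFailure` are hypothetical, and the policy that only pre-header failures count toward the breaker is an assumption of this example rather than a rule from OpenClaw.

```go
package main

import "fmt"

// StreamCounters keeps "headers never arrived" apart from "stream aborted
// after partial delivery" so the two never fold into one error rate.
type StreamCounters struct {
	NoHeaders       int       // request sent but no response headers arrived
	AbortedMidway   int       // stream reset after some tokens arrived
	AbortedProgress []float64 // completion fraction of each aborted stream
}

// ObserveFailure records one failed streaming call and reports whether it
// should count toward the breaker (assumed policy: pre-header failures only).
func (c *StreamCounters) ObserveFailure(headersReceived bool, delivered, expected int) bool {
	if !headersReceived {
		c.NoHeaders++
		return true
	}
	c.AbortedMidway++
	progress := 0.0
	if expected > 0 {
		progress = float64(delivered) / float64(expected)
	}
	c.AbortedProgress = append(c.AbortedProgress, progress)
	return false
}

func main() {
	var c StreamCounters
	fmt.Println(c.ObserveFailure(false, 0, 0))                   // true: headers never arrived
	fmt.Println(c.ObserveFailure(true, 400, 1000))               // false: aborted at 40 percent
	fmt.Println(c.NoHeaders, c.AbortedMidway, c.AbortedProgress) // 1 1 [0.4]
}
```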
Half-open probes and concurrency caps
Half-open is the riskiest phase: too many probes recreate the thundering herd you tried to escape. Cap probes at one in-flight call per dependency, started no more often than every 250ms with up to 50ms of jitter, and require two consecutive successes from different worker threads before fully closing. If your gateway runs 12 workers, serialize probes through a leader-election channel or a small Redis cell so that each worker and host does not launch its own independent probes.
Prometheus metrics hooks and SLO wiring
Export at minimum: breaker_state as a gauge labeled by dependency, breaker_transitions_total counter with labels for from and to states, breaker_rejections_total for fast fails while open, and breaker_probe_latency_ms histogram with bounded buckets between 25ms and 2s. Pair these with existing request histograms so dashboards show whether rising latency preceded transitions.
```text
# HELP breaker_state 0=closed 1=half_open 2=open
breaker_state{dep="crm_search"} 2
breaker_rejections_total{dep="crm_search"} 184
```
Coordinating with HTTP 429 and Retry-After
When an upstream returns 429 with Retry-After: 120, treat that as authoritative cooling: keep the breaker open for at least 120 seconds from that response, even if your local window would have attempted half-open sooner. Record a dedicated metric, breaker_retry_after_skips_total, so SREs can see how often provider throttles, rather than hard failures, extended an outage. The provider 429 guide covers this coordination in detail.
Token budgets and per-tool throttles
Token budgets limit agent spend; breakers isolate dependency health. They should exchange signals: when a breaker opens for model provider A, decrement effective call budget for plans that insist on provider A so orchestration shifts to provider B without burning user-visible quota on guaranteed failures. The token budget throttle article shows how to wire counters without double-charging partial streams.
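A minimal sketch of that signal, assuming plans are pinned to a single provider and the planner walks them in preference order. It takes the strongest form of the idea, treating a plan's effective budget as zero while its provider's breaker is open, and leaves the stored budget untouched so nothing is charged for calls that were never sent. `Plan`, `effectiveBudget`, and `choose` are hypothetical names.

```go
package main

import "fmt"

// Plan is a hypothetical orchestration candidate pinned to one provider.
type Plan struct {
	Name     string
	Provider string
	Budget   int // remaining call budget for this plan
}

// effectiveBudget hides the budget of plans whose provider breaker is open;
// only the planner's view changes, not the stored budget.
func effectiveBudget(p Plan, open map[string]bool) int {
	if open[p.Provider] {
		return 0
	}
	return p.Budget
}

// choose returns the first plan with effective budget left, in preference order.
func choose(plans []Plan, open map[string]bool) (Plan, bool) {
	for _, p := range plans {
		if effectiveBudget(p, open) > 0 {
			return p, true
		}
	}
	return Plan{}, false
}

func main() {
	plans := []Plan{
		{Name: "primary", Provider: "provider-a", Budget: 40},
		{Name: "fallback", Provider: "provider-b", Budget: 25},
	}
	open := map[string]bool{"provider-a": true}
	p, _ := choose(plans, open)
	fmt.Println(p.Name) // fallback
}
```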
Read timeouts and hang diagnostics
Breakers should not open solely because a legitimate long-running tool exceeded a tight read timeout. Differentiate context deadline exceeded from reset-by-peer. Hang diagnostics in the read-timeout guide help you classify slow tools versus dead sockets so half-open probes use realistic read deadlines.
Health checks versus breaker gauges
Synthetic health checks hit fixed canary endpoints; breakers observe real user traffic mixes. Keep both: a green health route can lie when only rare code paths hit a broken shard, while an open breaker might reflect a transient spike even when canaries pass. Align alert routes so pages fire when health monitors and breaker gauges disagree for more than 3 minutes.
Matrix: breaker versus throttle versus queue
| Mechanism | Primary goal | Signals used | User-visible effect |
|---|---|---|---|
| Circuit breaker | Protect upstream | 5xx ratio, consecutive errors | Fast fail with structured retry guidance |
| Token throttle | Protect budget | Tokens per minute, tenant tier | Graceful degradation to cheaper models |
| Request queue | Smooth bursts | Queue depth, age | Increased latency, not hard errors |
macOS LaunchAgent rehearsal patterns
Rehearse breaker transitions with launchctl timers that replay captured traffic files at 10 RPS for 5 minutes, then inject fault scripts that return synthetic 503s. macOS networking stacks differ slightly from Linux containers regarding keepalive defaults; validating on a cloud Mac catches half-open timers that assume Linux TCP user timeout values.
Numbered rollout checklist
1. Instrument each upstream with distinct breaker keys and dependency labels.
2. Wire Prometheus gauges and verify that Grafana boards show state transitions within 15 seconds.
3. Integrate Retry-After parsing from the 429 guide before enabling automatic half-open probes.
4. Align probe read deadlines with production P99 latency plus a 40 percent safety margin.
5. Run a chaos test that opens breakers for two dependencies simultaneously to validate isolation.
6. Document agent-facing error codes so planners switch tools without user-visible crashes.
7. Schedule a weekly rehearsal on a rented Mac mini with production-like LaunchAgents.
8. Review token throttle interactions to prevent budget starvation during breaker cooldown.
FAQ
Should GraphQL and REST share one breaker?
Split them if resolvers hit different shards; shared breakers hide partial failures.
Can half-open use the same pool as traffic?
Yes, but mark probes with a distinct user agent or header so observability filters can exclude them from SLO calculations if needed.
What about WebSocket tools?
Track message-level failures separately; opening on the first ping timeout causes flapping when Wi-Fi users roam.
Operating resilient gateways is a rehearsal problem as much as an algorithm problem. A Mac mini rented from MacHTML for about $16.9 per day gives operators a macOS-shaped environment to validate LaunchAgent-driven fault injection, half-open timing, and Prometheus scrape overhead before changes reach production.