Blue/green deploys and weekly dependency bumps only stay boring when your OpenClaw gateway can disappear without truncating half-finished tool calls. In 2026, operators expect a graceful shutdown path that mirrors cloud-native patterns: flip readiness so load balancers stop routing new traffic, stop the accept loop, drain in-flight HTTP and streaming tool invocations under explicit deadlines, then exit with code zero while metrics and logs flush. This article defines readiness versus liveness semantics, how to bound drain windows without starving long CRM exports, how shutdown interacts with circuit breakers and dead-letter queues, how to coordinate with read-timeout diagnostics, and how health monitoring should expose the draining state to synthetic probes.
Pricing anchor: rehearsing SIGTERM choreography on a dedicated Mac mini from MacHTML near $16.9 per day is cheaper than a deploy where launchd escalates to SIGKILL after 30 seconds while twelve tool calls still hold database locks.
Signals, launchd ExitTimeOut, and OS defaults
On macOS, launchd sends SIGTERM to your gateway process when an operator runs launchctl bootout or when a dependency upgrade restarts the job. Unless ExitTimeOut is raised in the job's plist, launchd escalates to SIGKILL after a system-defined grace window. Many templates assume roughly 30 seconds, but always verify the value for your job with launchctl print gui/$UID/<job-label>; the service target takes the job's label, not the plist filename. Linux containers use different defaults; mixing the two without rehearsal guarantees surprises.
Your gateway should install signal handlers that set an atomic shutting_down flag and hand off the rest of the work: closing idle keep-alive connections and starting a drain timer in a thread-safe way. Avoid heavy work inside the signal handler itself. Have the handler enqueue an event to the main loop, and let the main loop close connections and start the timer so TLS state stays consistent.
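As a minimal sketch of that hand-off, assuming an asyncio-based gateway (the source does not say what the gateway is written in), the handler below does nothing except schedule a callback on the event loop. ShutdownCoordinator and install_handlers are hypothetical names used only for illustration.

```python
import asyncio
import signal


class ShutdownCoordinator:
    """Hypothetical helper: owns the shutting_down flag and the drain deadline."""

    def __init__(self, drain_seconds):
        self.drain_seconds = drain_seconds
        self.shutting_down = False
        self.deadline = None  # loop.time() value at which the drain must end

    def begin(self, loop):
        # Runs as an ordinary event-loop callback, not inside the raw handler,
        # so it is safe to touch connection and TLS state from here.
        if self.shutting_down:
            return  # a repeated SIGTERM must not extend the deadline
        self.shutting_down = True
        self.deadline = loop.time() + self.drain_seconds


def install_handlers(loop, coordinator):
    # loop.add_signal_handler (Unix only) defers the callback to the loop.
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, coordinator.begin, loop)
```

Keeping begin idempotent matters in practice because supervisors and impatient operators often send SIGTERM more than once.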
Document two budgets: external (what the load balancer believes) and internal (what your worker pool needs). If the internal budget exceeds the external one, extend the balancer deregistration delay or shrink tool timeouts before you change shutdown code.
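The comparison between the two budgets is simple arithmetic, and writing it down makes the shortfall explicit. The sketch below uses a hypothetical DrainBudgets type; the 10 second margin mirrors the safety margin from the rollout checklist later in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrainBudgets:
    external_s: float  # what the load balancer allows (deregistration delay)
    internal_s: float  # what the worker pool needs to finish in-flight work

    def shortfall_s(self):
        """Seconds by which the internal budget exceeds the external one."""
        return max(0.0, self.internal_s - self.external_s)

    def exit_timeout_s(self, margin_s=10.0):
        """Candidate launchd grace window: internal drain plus a safety margin."""
        return self.internal_s + margin_s
```

A non-zero shortfall is the cue to extend the deregistration delay or shrink tool timeouts before touching shutdown code.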
Readiness versus liveness during drain
Liveness should stay true while the process can still make progress; readiness should flip false the moment you intend to reject new work. Many teams invert these out of convenience and cause cascading failures: if readiness never flips, Kubernetes or any TCP front end keeps scheduling new sessions onto a host that is already returning 503, which amplifies retries.
Expose a /readyz endpoint that returns 503 with JSON {"draining":true,"in_flight":14,"deadline_ms":12000} so synthetic monitors from your health stack can chart drain curves. Pair with the uptime guidance in gateway health monitoring.
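A framework-agnostic sketch of that payload follows. The function name readyz_response is hypothetical, and the 200 response for the non-draining case is an assumption, since the article only specifies the draining body.

```python
import json


def readyz_response(draining, in_flight, deadline_ms):
    """Return (status_code, json_body) for a /readyz probe."""
    body = json.dumps(
        {"draining": draining, "in_flight": in_flight, "deadline_ms": deadline_ms}
    )
    # 503 tells the balancer to stop routing; the body lets monitors chart the drain.
    return (503 if draining else 200), body
```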
Stopping the accept loop without reset storms
Closing the listening socket immediately may RST in-flight TLS handshakes. Prefer toggling an application-level gate that responds 503 with Retry-After: 5 while existing connections finish their current request. HTTP/2 needs careful sequencing: tell clients to stop opening streams (send GOAWAY, or advertise zero max concurrent streams), wait for open streams to complete, then close the connection.
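The application-level gate can be as small as the sketch below. AdmissionGate is a hypothetical name; the idea is that request handlers consult it before starting new work, while requests that are already running never see it.

```python
from dataclasses import dataclass


@dataclass
class AdmissionGate:
    """Hypothetical gate consulted before each new request is dispatched."""

    retry_after_s: int = 5
    draining: bool = False

    def check(self):
        """Return None to admit the request, or (status, headers) to reject it."""
        if not self.draining:
            return None
        return 503, {"Retry-After": str(self.retry_after_s)}
```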
For WebSocket tool bridges, send an application-level server_shutdown message with a client-side backoff hint before closing the socket so agents do not thundering-herd the next pod.
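One possible shape for that message is sketched here. The field names type and retry_after_ms are assumptions rather than an OpenClaw protocol definition, and the jitter is there so clients do not all reconnect in the same instant.

```python
import json
import random


def server_shutdown_message(base_backoff_ms, jitter_ms, rng=random):
    """Build a shutdown notice with a jittered client-side backoff hint."""
    backoff_ms = base_backoff_ms + rng.randint(0, jitter_ms)  # inclusive bounds
    return json.dumps({"type": "server_shutdown", "retry_after_ms": backoff_ms})
```

Send the message, give the client a moment to read it, and only then close the socket.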
Worker pools, per-tool deadlines, and streaming
Partition drain budgets: reserve 60% of wall time for standard REST tools, 30% for long-poll or streaming model fan-out, and 10% for administrative hooks like metrics scrape completion. If a single tenant monopolizes workers, enforce per-tenant concurrency caps even during shutdown so other tenants’ tool calls can still complete.
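For a concrete feel of the split, this sketch applies the 60/30/10 partition to a total drain window. The bucket names are hypothetical labels for the three categories above.

```python
def partition_drain_budget(total_s):
    """Split a drain window into the 60/30/10 shares described above."""
    shares = {"rest_tools": 0.60, "streaming": 0.30, "admin_hooks": 0.10}
    return {name: total_s * share for name, share in shares.items()}
```

With a 45 second window this yields roughly 27 seconds for REST tools, 13.5 for streaming, and 4.5 for administrative hooks.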
When model streaming is active, stop accepting new turns but allow the current SSE chunk pump to finish until either bytes complete or a sub-deadline of 20 seconds elapses—whichever comes first. Log partial completions with structured error SHUTDOWN_TRUNCATED_STREAM so product teams can tune prompts.
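A minimal sketch of the bounded chunk pump, assuming an asyncio gateway where chunks is an async iterator and send is an async callable (both hypothetical interfaces):

```python
import asyncio


async def pump_with_deadline(chunks, send, deadline_s=20.0):
    """Forward chunks until the stream ends or the sub-deadline elapses."""

    async def pump():
        async for chunk in chunks:
            await send(chunk)

    try:
        # wait_for cancels the pump if it runs past the sub-deadline.
        await asyncio.wait_for(pump(), timeout=deadline_s)
        return "complete"
    except asyncio.TimeoutError:
        # The caller should log this code as a structured error.
        return "SHUTDOWN_TRUNCATED_STREAM"
```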
Load balancer hooks and Retry-After headers
After readiness flips false, wait at least one balancer health-check interval—often 5 seconds—before stopping accept. Double the wait in multi-region setups where DNS TTLs still point stragglers at the old instance. Always emit consistent Retry-After values so client-side SDKs align with your idempotency story.
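The wait rule reduces to a one-line calculation. The function name is hypothetical and the 5 second default is the typical interval mentioned above, not a universal value.

```python
def pre_stop_wait_s(health_check_interval_s=5.0, multi_region=False):
    """Seconds to keep accepting after readiness flips false."""
    # One full health-check interval, doubled when DNS stragglers are expected.
    return health_check_interval_s * (2 if multi_region else 1)
```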
When using layer-4 pass-through load balancers, TCP connections that already reached the gateway may still enqueue new HTTP requests on reused keep-alive sockets. Track per-connection generation counters: after shutdown begins, reject new requests on old sockets with Connection: close while still finishing the active request on that socket.
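One way to picture the generation counter, as a sketch with hypothetical names: each connection is stamped at accept time, shutdown bumps the global generation, and any new request arriving on an older connection is refused with Connection: close. A request that was admitted before the bump is never re-checked, so it finishes normally.

```python
class ConnectionGenerations:
    """Hypothetical tracker for keep-alive sockets that predate shutdown."""

    def __init__(self):
        self.generation = 0

    def stamp(self):
        # Call at accept time and store the value on the connection object.
        return self.generation

    def begin_shutdown(self):
        self.generation += 1

    def reject_headers(self, conn_generation):
        """Return None to serve the request, or headers for a 503 rejection."""
        if conn_generation < self.generation:
            return {"Connection": "close", "Retry-After": "5"}
        return None
```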
Metrics, logs, and post-shutdown audits
Emit a gauge gateway_in_flight_total that scrapers can read during drain plus a counter gateway_shutdown_events_total{result} labeled clean, deadline_exceeded, or forced_kill. Histogram gateway_shutdown_duration_seconds should span from readiness flip to process exit so SLO dashboards show whether your 45 second budget is realistic week over week.
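The sketch below models the counter and the duration samples in plain Python so the bookkeeping is visible; a real gateway would route these through its metrics client instead. ShutdownMetrics is a hypothetical name. Note that a process cannot record its own forced_kill, so that label would have to come from a supervisor or the next start-up.

```python
import time
from collections import Counter

RESULTS = ("clean", "deadline_exceeded", "forced_kill")


class ShutdownMetrics:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._flip_at = None
        self.events = Counter()  # stands in for gateway_shutdown_events_total{result}
        self.durations = []  # samples for gateway_shutdown_duration_seconds

    def readiness_flipped(self):
        self._flip_at = self._clock()

    def process_exiting(self, result):
        if result not in RESULTS:
            raise ValueError(f"unknown result label: {result}")
        self.events[result] += 1
        if self._flip_at is not None:
            # Duration spans readiness flip to process exit.
            self.durations.append(self._clock() - self._flip_at)
```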
Structured logs should include the macOS equivalent of Kubernetes' terminationGracePeriodSeconds that you configured (the launchd ExitTimeOut value), the observed in_flight peak, and the slowest tool name. Archive those lines next to deploy tickets so future regressions compare apples to apples.
Alert if clean drops below 99% across seven days while deploy frequency rises—often that means tool timeouts crept upward or a new vendor SDK ignores context cancellation.
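Expressed as code, the alert condition might look like the following sketch. The function names and the week-over-week deploy comparison are assumptions about how "deploy frequency rises" would be measured.

```python
def clean_ratio(events):
    """Fraction of shutdowns labeled clean; events maps result label to count."""
    total = sum(events.values())
    return 1.0 if total == 0 else events.get("clean", 0) / total


def should_alert(events_7d, deploys_this_week, deploys_last_week, threshold=0.99):
    # Fire only when drains degrade while deploy frequency is rising.
    degraded = clean_ratio(events_7d) < threshold
    return degraded and deploys_this_week > deploys_last_week
```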
Canary weight shifts during rolling restarts
Rolling restarts amplify shutdown frequency. If each instance restarts hourly, operators experience constant partial drains; combine that with aggressive canary traffic shifts and you may never reach steady state. Cap concurrent draining instances to at most 20% of the fleet unless synthetic traffic proves headroom.
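The 20% cap translates into a small helper. Integer arithmetic avoids rounding surprises, and the floor of one instance is an assumption so that small fleets can still deploy at all.

```python
def max_concurrent_draining(fleet_size, cap_percent=20):
    """Upper bound on instances allowed to drain at the same time."""
    return max(1, fleet_size * cap_percent // 100)
```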
Automate rollback when error budgets burn during drain: if five-minute 5xx rates exceed 1% while any instance advertises draining, snap traffic back to the last known-good weight table and page the deploy owner.
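A sketch of the rollback trigger, with hypothetical parameter names; the counts are assumed to cover the trailing five-minute window.

```python
def should_rollback(errors_5xx, total_requests, any_draining, threshold=0.01):
    """True when the 5xx rate exceeds the threshold while an instance is draining."""
    if not any_draining or total_requests == 0:
        return False
    return errors_5xx / total_requests > threshold
```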
Circuit breaker coordination
During drain you want fewer probes, not more. Coordinate with circuit breaker settings: lengthen half-open intervals or force breakers open for dependencies that are already unhealthy so workers are not wasted on doomed upstream calls while precious seconds tick down.
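As an illustration of that coordination, the sketch below adjusts a hypothetical BreakerSettings record when drain starts. The stretch factor of 4 is an arbitrary placeholder, not a recommended value.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BreakerSettings:
    half_open_interval_s: float
    forced_open: bool = False


def adjust_for_drain(settings, dependency_healthy, stretch=4.0):
    """Return breaker settings to use while the gateway is draining."""
    if not dependency_healthy:
        # Do not spend drain seconds probing an upstream that is already failing.
        return replace(settings, forced_open=True)
    # Healthy dependency: probe less often if the breaker does trip mid-drain.
    return replace(
        settings, half_open_interval_s=settings.half_open_interval_s * stretch
    )
```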
DLQ producers and idempotent enqueue
If shutdown races with failure handlers that enqueue to the DLQ, ensure enqueue operations are idempotent or deduplicated; otherwise a duplicate enqueue after a TCP retry may leave operators triaging the same failure twice. See the dead-letter queue design guide for envelope fields that survive process exit.
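A minimal in-memory sketch of deduplicated enqueue follows. The idempotency_key field is an assumed envelope field, and a real queue would need the seen-key set to be durable so it survives the very process exit being discussed.

```python
import hashlib
import json


class DedupingDLQ:
    """Hypothetical DLQ wrapper that drops repeated envelopes."""

    def __init__(self):
        self._seen = set()
        self.items = []

    @staticmethod
    def key_for(envelope):
        explicit = envelope.get("idempotency_key")
        if explicit:
            return str(explicit)
        # Fall back to a content hash of the canonicalized envelope.
        canonical = json.dumps(envelope, sort_keys=True).encode("utf-8")
        return hashlib.sha256(canonical).hexdigest()

    def enqueue(self, envelope):
        """Return True if stored, False if it was a duplicate."""
        key = self.key_for(envelope)
        if key in self._seen:
            return False
        self._seen.add(key)
        self.items.append(envelope)
        return True
```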
Matrix: hard stop versus graceful drain
| Scenario | Hard SIGKILL | Graceful drain |
|---|---|---|
| In-flight DB transaction | Risk partial commit | Wait or explicit rollback hook |
| Streaming model tokens | Truncated mid-sentence | Bounded wait + structured truncation flag |
| Deploy latency | Faster | +5–45s typical |
| Operator trust | Low | High when metrics prove drains |
Numbered rollout checklist
1. Add /readyz draining JSON and dashboard it.
2. Align ExitTimeOut with internal drain math plus a 10s safety margin.
3. Integration-test SIGTERM while holding at least 50 synthetic tool calls.
4. Verify HTTP/2 GOAWAY ordering under h2load.
5. Postmortem template includes drain timeline CSV attachment.
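The integration-test item can be approximated in-process before wiring up real signal delivery. The sketch below assumes an asyncio gateway and stands in asyncio.sleep for tool calls; drain and rehearse are hypothetical names, and a full test would still send an actual SIGTERM to the running process.

```python
import asyncio


async def drain(tasks, deadline_s):
    """Wait for in-flight tasks up to deadline_s, then cancel stragglers."""
    if not tasks:
        return "clean"
    _done, pending = await asyncio.wait(tasks, timeout=deadline_s)
    for task in pending:
        task.cancel()
    # Let cancelled tasks unwind so nothing is left running at exit.
    await asyncio.gather(*pending, return_exceptions=True)
    return "clean" if not pending else "deadline_exceeded"


async def rehearse(n_calls=50, call_s=0.01, deadline_s=1.0):
    """Hold n_calls synthetic tool calls in flight, then drain them."""
    tasks = [asyncio.create_task(asyncio.sleep(call_s)) for _ in range(n_calls)]
    return await drain(tasks, deadline_s)
```

The returned label lines up with the clean and deadline_exceeded results used for the shutdown counter, which keeps the rehearsal and the production metric comparable.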
FAQ
Should systemd KillMode affect OpenClaw?
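Yes, on mixed macOS and Linux fleets, so document the differences. KillMode=control-group (the default) signals every process in the unit's control group, while KillMode=mixed sends SIGTERM only to the main process and the later SIGKILL to all remaining processes in the group. The choice changes which child processes receive signals.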
Can we drain forever?
No—always bound; infinite drain masks stuck workers. Use hang diagnostics to classify slow calls.
Do Windows dev laptops matter?
Only if you ship Windows gateways; macOS rehearsal still helps the majority of design partners using MacBooks.
Graceful shutdown is where reliability culture meets measurable seconds. A Mac mini rented from MacHTML for roughly $16.9 per day gives you native macOS signal delivery, Apple Silicon headroom for soak tests, and SSH/VNC access to watch launchd timelines while you tune drain budgets before touching customer traffic.
Rehearse OpenClaw gateway shutdown on cloud Mac mini
Mirror production ExitTimeOut, h2load failure modes, and readiness flips on real macOS networking before your next blue/green cutover.