Restarting a single OpenClaw gateway bound to one port is fast, but rude: in-flight tool calls, half-streamed model tokens, and webhook acknowledgements can vanish mid-response. In 2026, teams that care about operator trust run traffic drains—they keep the old binary alive while shifting new TCP sessions to a fresh build on a second loopback port, then retire the old process only after idle timers fire. This article documents a pragmatic nginx weighted upstream pattern on macOS, numeric defaults that survived finance audits, and validation hooks aligned with openclaw doctor. Pair it with reverse proxy hardening for TLS headers and tunnel placement, and with LaunchAgent recovery when a drain fails and you must fall back to hard restarts.
You will get a step table, concrete timeouts, curl probes, and failure injection ideas suitable for a 24/7 gateway on rented Apple Silicon.
Why hard restarts fail product expectations
Users forgive 500 errors less than they forgive slow answers. A restart that drops active keep-alive sockets looks like a mysterious “assistant went offline” banner even when health checks pass milliseconds later. Draining preserves TCP semantics: existing connections finish while new ones land on the green binary.
Finance and compliance teams also care: abrupt termination can duplicate billing events if clients auto-retry without idempotency keys. A 120-second drain window costs almost nothing compared to reconciling duplicate tool invocations.
Topology: blue, green, and nginx in the middle
Bind blue to 127.0.0.1:8787 and green to 127.0.0.1:8788 (example ports). Public traffic never hits either port directly; nginx listens on 443 and proxies to an upstream group containing both loopback addresses. During steady state, blue carries all traffic. During rollout, shift weight toward green in 10-point increments while watching error budgets. nginx rejects weight=0, so a 0 in the table below means the server is marked down (or removed from the upstream block) rather than given a zero weight.
| Phase | Blue weight | Green weight | Operator focus |
|---|---|---|---|
| Steady | 100 | 0 | Baseline metrics |
| Canary | 90 | 10 | Error rate parity |
| Half | 50 | 50 | Latency histograms |
| Complete | 0 | 100 | Drain blue sockets |
nginx configuration sketch
Keep configs boring—two upstream servers, least_conn or ip_hash if sticky sessions matter for your channel adapters, and explicit proxy headers so OpenClaw logs retain original host data.
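upstream openclaw_gateway {
    least_conn;
    # Canary phase: 90% of new connections to blue, 10% to green.
    server 127.0.0.1:8787 weight=90 max_fails=3 fail_timeout=10s;  # blue
    server 127.0.0.1:8788 weight=10 max_fails=3 fail_timeout=10s;  # green
}
server {
    listen 443 ssl;
    # ssl_certificate and ssl_certificate_key are omitted from this sketch.
    location / {
        proxy_pass http://openclaw_gateway;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        # Forward the original host and client address to the gateway.
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_read_timeout 300s;
    }
}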
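Reload nginx with nginx -t && nginx -s reload. The reload itself is sub-second: the master starts new workers with the new weights, and old workers finish their in-flight requests before exiting. Idle keep-alive connections on old workers are closed, and long-lived streams keep an old worker alive until they end or worker_shutdown_timeout (if set) expires. Still announce maintenance windows if your worker recycle policy is aggressive.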
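The weight steps in the phase table can be rendered by a small script instead of hand-editing the upstream block. The following is a minimal sketch, assuming the example ports from this article and the same max_fails and fail_timeout values as the config above; the function names are illustrative, and a weight of 0 is rendered as `down` because nginx does not accept `weight=0`.

```python
BLUE = "127.0.0.1:8787"   # example port from the article
GREEN = "127.0.0.1:8788"  # example port from the article


def server_line(addr: str, weight: int) -> str:
    """Render one upstream server line; a weight of 0 becomes `down`."""
    if weight < 0:
        raise ValueError("weight must be non-negative")
    if weight == 0:
        # nginx rejects weight=0, so a zero-traffic server is marked down.
        return f"    server {addr} down;"
    return f"    server {addr} weight={weight} max_fails=3 fail_timeout=10s;"


def render_upstream(blue_weight: int, green_weight: int) -> str:
    """Render the openclaw_gateway upstream block for one rollout phase."""
    if blue_weight == 0 and green_weight == 0:
        raise ValueError("at least one color must carry traffic")
    lines = [
        "upstream openclaw_gateway {",
        "    least_conn;",
        server_line(BLUE, blue_weight),
        server_line(GREEN, green_weight),
        "}",
    ]
    return "\n".join(lines) + "\n"
```

Calling `render_upstream(90, 10)` yields the canary block; writing the result to an included file and running `nginx -t && nginx -s reload` is left to the operator's own tooling.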
Timeouts that must match end-to-end
Align proxy_read_timeout with the gateway’s upstream model idle ceiling. A common mismatch is 300s at nginx but 60s inside the gateway—users see truncated streams while logs blame the model. Document a single source of truth spreadsheet shared with SRE and vendor support.
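A timeout mismatch like the one described above can be caught mechanically. This sketch, under stated assumptions, extracts every `proxy_read_timeout` from nginx config text and compares it with a gateway idle ceiling supplied by the caller; how that ceiling is configured inside the gateway is not covered here, so it is passed in as a plain number of seconds. Only single-value times such as `300s` or `5m` are handled.

```python
import re

# Multipliers for a subset of nginx time suffixes; no suffix means seconds.
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_nginx_time(value: str) -> float:
    """Convert a simple nginx time value such as '300s' or '5m' to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)?", value.strip())
    if not match:
        raise ValueError(f"unsupported time value: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit or "s"]


def proxy_read_timeouts(conf_text: str) -> list:
    """Return every proxy_read_timeout in the config text, in seconds."""
    pattern = r"^\s*proxy_read_timeout\s+([^;]+);"
    found = re.findall(pattern, conf_text, flags=re.MULTILINE)
    return [parse_nginx_time(value) for value in found]


def timeout_mismatches(conf_text: str, gateway_idle_seconds: float) -> list:
    """List nginx read timeouts that differ from the gateway's idle ceiling."""
    return [t for t in proxy_read_timeouts(conf_text) if t != gateway_idle_seconds]
```

A non-empty result from `timeout_mismatches` is a reasonable signal to fail a pre-deploy check and update the shared timeout record.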
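Streaming HTTP responses such as server-sent events need proxy_buffering off on that location; otherwise nginx holds tokens in its proxy buffers (a handful of 4 KB or 8 KB buffers by default) and clients see apparent stalls. WebSockets additionally need the Upgrade and Connection headers forwarded to the upstream, which the Connection "" line in the sketch above does not do.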
When draining, set fail_timeout low enough to excise a bad green build quickly, but not so low that flaky health checks flap during CPU spikes on M4 hosts running parallel agents.
Validation, canaries, and rollback
Before shifting the first 10%, run synthetic chat probes against green directly on loopback with curl (add --http2-prior-knowledge only if the gateway speaks cleartext HTTP/2) or your vendor SDK smoke test. Compare median time-to-first-token against blue; a regression beyond 8% should halt the rollout.
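The halt rule can be written down as a small gate so the decision is not made by eye. This is a minimal sketch: it assumes the probe has already collected time-to-first-token samples in milliseconds for each color, and the function names are illustrative.

```python
from statistics import median


def ttft_regression(blue_ms, green_ms) -> float:
    """Relative change of green's median time-to-first-token versus blue."""
    blue_median = median(blue_ms)
    if blue_median <= 0:
        raise ValueError("blue median must be positive")
    return (median(green_ms) - blue_median) / blue_median


def should_halt(blue_ms, green_ms, threshold: float = 0.08) -> bool:
    """True when green's median is more than `threshold` slower than blue's."""
    return ttft_regression(blue_ms, green_ms) > threshold
```

For example, blue samples with a median of 410 ms and green samples with a median of 450 ms give a regression of roughly 9.8%, which trips the 8% gate.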
Log correlation identifiers across nginx access logs and OpenClaw structured logs so you can prove which binary served a failing request. Without IDs, postmortems devolve into opinions.
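On the nginx side, the built-in `$request_id` and `$upstream_addr` variables are enough to attribute a request to a color. The sketch below assumes a hypothetical access log format of `$request_id|$upstream_addr|$status` and the example ports from this article; matching the same ID in the gateway's own logs would additionally require forwarding it in a header, which is not shown.

```python
# Example loopback ports from the article mapped to rollout colors.
COLORS = {"127.0.0.1:8787": "blue", "127.0.0.1:8788": "green"}


def color_by_request(access_log_lines) -> dict:
    """Map each request ID to the color that served the final response."""
    result = {}
    for line in access_log_lines:
        parts = line.strip().split("|")
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed format
        request_id, upstream_addr, _status = parts
        # When nginx retries, $upstream_addr lists every attempt separated
        # by commas; the last address is the one that answered.
        last_addr = upstream_addr.split(",")[-1].strip()
        result[request_id] = COLORS.get(last_addr, "unknown")
    return result
```

Looking up a failing request ID in the returned mapping shows which binary produced the response, including cases where nginx retried on the other color.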
Rollback means restoring blue to full weight, marking green down, reloading nginx, and then killing the green process. Practice this monthly so muscle memory exists during incidents.
macOS process ownership and ports
Run each gateway under its own LaunchAgent label so launchctl kickstart -k targets the correct PID family. Keep separate StandardOutPath files per color to avoid interleaved JSON lines that break log parsers.
Verify open sockets with lsof -nP -iTCP:8787 -sTCP:LISTEN after each shift; zombie listeners happen when developers hot-reload without releasing the port.
Rehearse the entire cutover on hardware that matches production. A cloud Mac mini from MacHTML—commonly near $16.9 per day—gives you Apple Silicon thermals, real launchd behavior, and SSH for scripted nginx edits without risking your laptop’s sleep settings.
Observability while weights move
Export three counters every 10 seconds during a cutover: active upstream connections per color, 5xx ratio per color, and p95 time-to-first-token from your synthetic probe. Plot them on the same axis so humans detect divergence instantly.
Annotate Grafana with the nginx reload timestamp and the Git SHA of the green binary. Future you will thank present you when correlating a latency bump with a specific weight step instead of guessing.
Alert if green’s 5xx rate exceeds blue by more than 0.5 percentage points for longer than 3 minutes while green carries any traffic—auto-rollback hooks are worth the YAML complexity.
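The alert condition can be expressed as a small predicate that an auto-rollback hook evaluates on each scrape. This is a sketch under stated assumptions: samples arrive in time order with a timestamp in seconds, 5xx ratios are fractions between 0 and 1, and the `Sample` type and function name are illustrative rather than part of any monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float             # seconds since the cutover started
    blue_5xx: float      # blue 5xx ratio, 0.0 to 1.0
    green_5xx: float     # green 5xx ratio, 0.0 to 1.0
    green_requests: int  # requests green served in this interval


def should_roll_back(samples, max_gap: float = 0.005,
                     max_duration: float = 180.0) -> bool:
    """True when green's 5xx ratio exceeds blue's by more than `max_gap`
    (0.5 percentage points) continuously for longer than `max_duration`
    seconds while green is carrying traffic."""
    breach_start = None
    for s in samples:
        breaching = s.green_requests > 0 and (s.green_5xx - s.blue_5xx) > max_gap
        if not breaching:
            breach_start = None  # any healthy sample resets the window
            continue
        if breach_start is None:
            breach_start = s.t
        if s.t - breach_start > max_duration:
            return True
    return False
```

Resetting the window on any healthy sample keeps a single noisy interval from triggering a rollback, at the cost of reacting more slowly to an error rate that hovers around the threshold.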
Security posture during dual listeners
Two loopback ports double the accidental exposure risk if someone fat-fingers 0.0.0.0. Automate a post-deploy check that greps listening sockets and fails CI when public interfaces appear.
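One way to automate that check is to parse the output of `lsof -nP -iTCP -sTCP:LISTEN` and flag any gateway port bound to a non-loopback address. The sketch below is a pure parser, so it can be unit-tested in CI; it assumes the usual lsof column layout in which the address precedes a trailing `(LISTEN)` token, and the function names are illustrative.

```python
import ipaddress


def is_loopback(host: str) -> bool:
    """True for loopback IPs such as 127.0.0.1 or [::1]; False for '*'."""
    try:
        return ipaddress.ip_address(host.strip("[]")).is_loopback
    except ValueError:
        return False  # wildcards and hostnames are treated as exposed


def exposed_listeners(lsof_output: str, ports) -> list:
    """Return listening addresses on `ports` that are not loopback-only."""
    exposed = []
    for line in lsof_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[-1] != "(LISTEN)":
            continue  # header line or a non-listening socket
        address = fields[-2]
        host, _, port = address.rpartition(":")
        if port.isdigit() and int(port) in ports and not is_loopback(host):
            exposed.append(address)
    return exposed
```

A CI step could capture the lsof output on the host, pass it to `exposed_listeners` with `{8787, 8788}`, and fail the job when the returned list is non-empty.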
Secrets rotation must update both colors before traffic shifts; otherwise green might boot with stale API keys and silently inherit traffic. Store rotation tickets in the same runbook as the weight table.
Keep admin-only debug routes disabled on both binaries; attackers scanning during cutovers look for mismatched feature flags between versions.
Multi-tenant gateways and noisy neighbors
When several teams share one host, label per-tenant rate limits inside OpenClaw config and verify they match nginx limit_req zones. Draining does not fix fairness bugs—if green accidentally doubles concurrency, shift traffic back and inspect ulimit -n on both processes.
Schedule large cutovers outside overlapping cron jobs that snapshot disks; APFS copy-on-write spikes can add 30–80 ms to tail latencies independent of nginx weights.
Finally, capture a short screen recording of your Grafana board during the first production drain; onboarding videos beat prose when training the next on-call rotation.
FAQ
Does draining eliminate all errors?
No—application bugs still surface, but you remove a whole class of transport resets.
Can I use Unix domain sockets instead of TCP ports?
Yes, nginx upstream supports unix: paths; the same weighting logic applies.
What about mTLS to upstream models?
Terminate or pass-through consistently on both colors; mixed configs cause confusing partial TLS handshakes in access logs.
When gateways run continuously, the economics of dedicated hardware matter. A Mac mini pairs low idle power with enough multicore headroom for dual-color processes during cutovers, and macOS matches the automation assumptions many OpenClaw operators already documented. Renting through MacHTML avoids procurement delays, keeps VNC handy for visual checks of menu-bar status icons, and lets you spin an isolated staging host for nginx reload practice while production stays untouched—elastic capacity without buying a second machine you only need during release weeks.