OpenClaw gateway port binding on macOS in 2026: EADDRINUSE diagnostics with lsof, launchd plist checks, and health-probe alignment

MacHTML Lab · 2026.05.11 · 30 min read

Nothing stalls an OpenClaw gateway rollout faster than Error: listen EADDRINUSE: address already in use. In 2026 teams still confuse “the port is taken” with “the model API is down,” waste hours rotating API keys, and never notice that a zombie Node process from yesterday’s smoke test still owns TCP 8787. macOS adds wrinkles: launchd may restart a job before the old socket fully drains, engineers run multiple profiles on one Mac mini, and synthetic health checks often curl 127.0.0.1 while the gateway binds to a LAN address. This guide gives a repeatable triage path—lsof first, plist parity second, probe alignment third—and ties the workflow to doctor diagnostics, first-run LaunchAgent smoke tests, and gateway health monitoring so operators stop guessing.

Treat port collisions as capacity incidents: document which host interface, which PID, and which LaunchAgent label last acquired the socket so postmortems stay short.

Symptoms that look like upstream failures

Clients see connection refused, curl reports empty responses, or dashboards show “gateway offline” even while CPU is idle. Before blaming model vendors, verify whether the HTTP server ever entered LISTEN. If the process crashes during startup, logs may truncate before printing the friendly banner—only the errno survives. Collect stderr from StandardErrorPath and correlate timestamps with launchctl print state transitions.
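To separate "nothing is listening" from "something is dropping packets", a single TCP connect attempt is often enough. The sketch below is a minimal illustration and not part of OpenClaw; the function name classify_connect and the 2-second default timeout are assumptions chosen for the example.

```python
import socket


def classify_connect(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify one TCP connect attempt.

    'listening' -> a server accepted the handshake (it reached LISTEN)
    'refused'   -> the host answered with RST, so nothing owns the port
    'timeout'   -> no answer at all, which points at a firewall or routing
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "listening"
    except ConnectionRefusedError:
        return "refused"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except OSError as exc:
        # Anything else (unreachable network, bad address) is reported raw.
        return f"error:{exc.errno}"


if __name__ == "__main__":
    print(classify_connect("127.0.0.1", 8787))
```

A result of "refused" while the LaunchAgent claims to be running suggests the process died before binding, which is the cue to read StandardErrorPath rather than rotate API keys.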

When multiple engineers share a rented Mac mini, stale npm run dev sessions from ad-hoc testing are the usual culprit; they rarely appear in production Grafana because they never registered metrics.

lsof recipes that answer in one minute

lsof -nP -iTCP:8787 -sTCP:LISTEN

The -n flag avoids DNS lookups that slow scans; -P prints numeric ports. If output lists a node PID you do not recognize, capture ps -p PID -o args= before killing—operations teams hate mystery processes. When nothing listens yet clients still fail, check whether they target HTTPS while the gateway only speaks HTTP on that port.

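For IPv6 listeners, filter on the TYPE column with | grep IPv6 or run a second pass with lsof -nP -i6TCP:8787 -sTCP:LISTEN. With -n, lsof prints a wildcard bind as *:8787 for both address families, so the TYPE column is what tells them apart. Some Node stacks default to dual-stack sockets, which can appear as a single IPv6 entry that also accepts IPv4, or as separate IPv4 and IPv6 entries when the framework binds each family itself.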
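When the lsof snapshot needs to land in a ticket or a log, parsing it into records keeps the PID, address family, and bound address together. The following is a rough sketch that assumes the default column layout of lsof (COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE/OFF, NODE, NAME); the Listener type and parse_lsof_listeners name are made up for this example.

```python
from typing import NamedTuple


class Listener(NamedTuple):
    command: str
    pid: int
    user: str
    family: str   # value of the TYPE column, e.g. "IPv4" or "IPv6"
    address: str  # value of the NAME column, e.g. "*:8787"


def parse_lsof_listeners(text: str) -> list[Listener]:
    """Extract LISTEN rows from `lsof -nP -iTCP:<port> -sTCP:LISTEN` output."""
    listeners = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have nine columns plus the trailing "(LISTEN)" state;
        # the header row and anything malformed are skipped.
        if len(parts) < 10 or parts[0] == "COMMAND" or parts[-1] != "(LISTEN)":
            continue
        listeners.append(
            Listener(parts[0], int(parts[1]), parts[2], parts[4], parts[8])
        )
    return listeners
```

Feeding it the text captured from lsof yields one record per listening socket, so an unexpected second PID or an IPv6-only entry stands out immediately.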

LaunchAgent ProgramArguments and port flags

Plists frequently pass --port twice: once from a wrapper script and once from the agent template copied between environments. With most argument parsers the last flag wins silently, which makes diffing configs painful. Keep a single source of truth: either environment variables in EnvironmentVariables or explicit arguments, not both. After edits, run launchctl bootout gui/$UID/label followed by launchctl bootstrap gui/$UID path so the old job, and with it the old socket, is torn down before the new job starts.

Validate that WorkingDirectory points at the checkout that actually contains the package.json you think is running; mismatched cwd plus npx can spawn a second listener on the default port.
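These plist checks are easy to automate before a bootstrap. The sketch below uses Python's plistlib to flag a repeated --port flag, a port defined in both places, and a missing WorkingDirectory. The environment variable name PORT and the function name audit_port_config are assumptions for illustration; substitute whatever variable the gateway actually reads.

```python
import plistlib


def audit_port_config(plist_bytes: bytes) -> list[str]:
    """Return human-readable findings for a LaunchAgent plist."""
    plist = plistlib.loads(plist_bytes)
    args = plist.get("ProgramArguments", [])
    env = plist.get("EnvironmentVariables", {})

    # Collect every value passed via "--port N" or "--port=N".
    port_values = []
    for i, arg in enumerate(args):
        if arg == "--port" and i + 1 < len(args):
            port_values.append(args[i + 1])
        elif arg.startswith("--port="):
            port_values.append(arg.split("=", 1)[1])

    findings = []
    if len(port_values) > 1:
        findings.append(f"--port passed {len(port_values)} times: {port_values}")
    # "PORT" is an assumed variable name, not confirmed by the gateway docs.
    if port_values and "PORT" in env:
        findings.append("port set in both ProgramArguments and EnvironmentVariables")
    if "WorkingDirectory" not in plist:
        findings.append("WorkingDirectory not set")
    return findings
```

Running it against the plist file contents (for example the bytes read from the agent's path under ~/Library/LaunchAgents) gives an empty list when the configuration has a single source of truth.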

127.0.0.1 versus 0.0.0.0 versus LAN IP

Binding 127.0.0.1 protects against casual LAN scanning but breaks health checks issued from another host—even another VM on the same hypervisor. Binding 0.0.0.0 accepts everything and demands firewall discipline. Binding a specific office IP breaks when DHCP renews. Document the chosen contract in the runbook and mirror it in openclaw doctor expectations.

Health probes that lie

Synthetic monitors that curl http://127.0.0.1:8787/readyz stay green while remote users cannot reach 10.0.40.12:8787 because route tables or a split-tunnel VPN exclude that subnet. Align probe source IPs with real user paths or tunnel through the same bastion. Export a metric gateway_bind_interface at startup so Grafana shows drift between environments.
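The mismatch between bind address, probe target, and user-facing address can be checked mechanically. This is a simplified sketch using the standard ipaddress module; audit_probe is a hypothetical helper, and it deliberately ignores routing, VPN policy, and dual-stack behavior, which still need a real probe from the user's network.

```python
import ipaddress


def audit_probe(bind_host: str, probe_host: str, user_host: str) -> list[str]:
    """Compare the gateway bind address with probe and user target addresses."""
    bind = ipaddress.ip_address(bind_host)
    probe = ipaddress.ip_address(probe_host)
    user = ipaddress.ip_address(user_host)

    def served(target) -> bool:
        # A wildcard bind (0.0.0.0 or ::) answers on every local address;
        # a specific bind only answers on that exact address.
        return bind.is_unspecified or bind == target

    warnings = []
    if not served(probe):
        warnings.append("probe target is not served by the bind address")
    if not served(user):
        warnings.append("user-facing address is not served by the bind address")
    if probe != user:
        warnings.append("probe and users hit different addresses; green probe proves little")
    return warnings


if __name__ == "__main__":
    for warning in audit_probe("127.0.0.1", "127.0.0.1", "10.0.40.12"):
        print(warning)
```

With a loopback bind, a loopback probe, and users on 10.0.40.12, the helper reports both that users cannot be served and that the probe is checking a different path.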

Decision matrix

Scenario | Preferred bind | Watch item
Single engineer laptop lab | 127.0.0.1 | Remember to flip before remote demos
Shared Mac mini behind corp firewall | LAN IP + allow list | Coordinate with DHCP reservations
Public edge with reverse proxy | loopback + nginx front | Keep gateway off public interfaces entirely

TIME_WAIT and rapid reloads

Automated deploy scripts that restart every 30 seconds during debugging can exhaust ephemeral ports or leave connections from the previous process in TIME_WAIT for up to 60 seconds on busy gateways. Insert a deliberate 5-second sleep between bootout and bootstrap, or temporarily shift admin ports by +1 while validating hotfix branches.
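An alternative to a fixed sleep is to poll until the port can actually be bound, then bootstrap. The sketch below is one way to do that under stated assumptions: the helper names are invented, it only covers IPv4, and it sets SO_REUSEADDR on the test socket on the assumption that the gateway's server socket does the same, so lingering TIME_WAIT connections alone do not count as "busy". There is still a small race between the check and the real bind.

```python
import socket
import time


def port_is_free(host: str, port: int) -> bool:
    """Try to bind the port briefly; True means no listener currently owns it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            probe.bind((host, port))
        except OSError:
            return False
        return True


def wait_for_port_free(host: str, port: int,
                       timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Poll until the port is bindable or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if port_is_free(host, port):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Run between `launchctl bootout` and `launchctl bootstrap`.
    print("free" if wait_for_port_free("127.0.0.1", 8787) else "still busy")
```

A deploy script can fail fast with a clear message when the function returns False instead of letting launchd loop on EADDRINUSE.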

Application Firewall gotchas

macOS may prompt to allow incoming connections for each new Node binary path. Denying the prompt yields successful local binds yet remote SYN timeouts that resemble port conflicts. Standardize binary locations or sign consistently so operators are not spammed with dialogs during upgrades.

Multiple gateways on one Mac

Blue-green experiments need distinct ports 8787 and 8788 plus unique LaunchAgent labels. Document port bands per tenant: for example 8700–8799 reserved for OpenClaw, 8800–8899 for mock upstreams. Without a spreadsheet, weekend contractors will pick whatever looks free.
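The port-band policy can live in code next to the deploy scripts instead of a spreadsheet. A minimal sketch, using the example bands from the paragraph above; the tenant keys and the validate_port helper are illustrative names only.

```python
from typing import Optional

# Example bands from the text: 8700-8799 for OpenClaw, 8800-8899 for mocks.
PORT_BANDS = {
    "openclaw": range(8700, 8800),
    "mock-upstream": range(8800, 8900),
}


def validate_port(tenant: str, port: int, taken: set[int]) -> Optional[str]:
    """Return an error message if the assignment breaks policy, else None."""
    band = PORT_BANDS.get(tenant)
    if band is None:
        return f"unknown tenant {tenant!r}"
    if port not in band:
        return f"port {port} is outside the {tenant} band {band.start}-{band.stop - 1}"
    if port in taken:
        return f"port {port} is already assigned"
    return None


if __name__ == "__main__":
    assigned = {8787}  # blue is live
    print(validate_port("openclaw", 8788, assigned))  # green candidate
    print(validate_port("openclaw", 8800, assigned))  # wrong band
```

Calling it from the script that renders LaunchAgent plists rejects an out-of-band port before anything is bootstrapped.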

When running side-by-side staging and production on one physical host—which rental Mac minis tempt you to do—apply separate macOS users or at minimum separate log directories so lsof output stays interpretable.

launchctl kickstart and stale listeners

After freeing a port, prefer launchctl kickstart -k gui/$UID/com.example.openclaw so launchd kills the running instance before restarting it instead of waiting politely. Without -k, a hung middleware thread can keep the FD open even though the parent logged “shutdown complete.” Capture the first page of launchctl print gui/$UID/com.example.openclaw before and after to prove the job really restarted: the state should be running again, under a new PID.
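Comparing the two launchctl print captures by eye is error-prone, so a small parser can reduce each one to state and PID. This sketch assumes the output contains lines of the form "state = running" and "pid = 1234"; the launchctl print format is not a stable interface, so treat the regular expression and the summarize_job name as assumptions to verify on the target macOS version.

```python
import re

# Matches indented "state = ..." and "pid = ..." lines (assumed format).
_FIELD = re.compile(r"^\s*(state|pid)\s*=\s*(\S.*?)\s*$")


def summarize_job(print_output: str) -> dict:
    """Pull the first 'state' and 'pid' fields out of `launchctl print` text."""
    summary = {"state": None, "pid": None}
    for line in print_output.splitlines():
        match = _FIELD.match(line)
        if not match:
            continue
        key, value = match.groups()
        if summary[key] is not None:
            continue  # keep the first occurrence; nested blocks may repeat keys
        if key == "pid":
            summary[key] = int(value) if value.isdigit() else None
        else:
            summary[key] = value
    return summary
```

Diffing the two summaries shows at a glance whether the state changed and whether the PID is new, which is the evidence a postmortem needs.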

Ephemeral client ports and outbound storms

Heavy tool fan-out opens thousands of outbound connections; macOS may exhaust its ephemeral range while inbound LISTEN still succeeds. The resulting client-side errors (typically EADDRNOTAVAIL, or EADDRINUSE when the client binds a source port explicitly) are easy to misread as a listener conflict, and operators then chase the wrong listener. Track the net.inet.ip.portrange.first and net.inet.ip.portrange.last defaults, and watch sysctl net.inet.ip.portrange.hifirst when tuning CI runners colocated with the gateway.
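To put a number on the ephemeral range, the sysctl output can be parsed and the range size computed. A short sketch follows; the helper names are invented, and it assumes sysctl prints one "name: value" pair per line and that net.inet.ip.portrange.last is available alongside the first key named above.

```python
def parse_sysctl(text: str) -> dict[str, int]:
    """Parse `sysctl net.inet.ip.portrange` style output into integers."""
    values = {}
    for line in text.splitlines():
        name, sep, raw = line.partition(":")
        raw = raw.strip()
        if sep and raw.isdigit():
            values[name.strip()] = int(raw)
    return values


def ephemeral_port_count(values: dict[str, int],
                         prefix: str = "net.inet.ip.portrange") -> int:
    """Number of ports between .first and .last, inclusive."""
    return values[f"{prefix}.last"] - values[f"{prefix}.first"] + 1


if __name__ == "__main__":
    import subprocess
    out = subprocess.run(["sysctl", "net.inet.ip.portrange"],
                         capture_output=True, text=True).stdout
    print(ephemeral_port_count(parse_sysctl(out)))
```

Comparing that count with the peak number of concurrent outbound connections, including those lingering in TIME_WAIT, indicates how close a CI runner is to exhausting the range.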

Structured logging for bind failures

Emit JSON logs with event="bind_failed", errno, attempted host, port, and process argv hash. When postmortems replay logs, they should not need SSH to rediscover basics. Pair errno 48 (EADDRINUSE) with the lsof snapshot command in the same log line so newer responders copy-paste the right diagnostic.
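The log shape described above might look like the following sketch. It is written in Python for illustration rather than in the gateway's own runtime; try_bind, the field names beyond event and errno, and the 12-character hash prefix are assumptions, not an OpenClaw API.

```python
import errno
import hashlib
import json
import socket
import sys
from typing import Optional


def try_bind(host: str, port: int, argv: Optional[list] = None):
    """Bind and listen; on failure return (None, json_log_line)."""
    argv = sys.argv if argv is None else argv
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
        return sock, None
    except OSError as exc:
        sock.close()
        record = {
            "event": "bind_failed",
            "errno": exc.errno,  # 48 on macOS for EADDRINUSE
            "errno_name": errno.errorcode.get(exc.errno, "UNKNOWN"),
            "host": host,
            "port": port,
            # Short hash of argv so logs can be grouped without leaking flags.
            "argv_sha256": hashlib.sha256("\0".join(argv).encode()).hexdigest()[:12],
            # Ready-to-paste diagnostic for whoever picks up the alert.
            "diagnostic": f"lsof -nP -iTCP:{port} -sTCP:LISTEN",
        }
        return None, json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    server, log_line = try_bind("127.0.0.1", 8787)
    if log_line:
        print(log_line, file=sys.stderr)
        sys.exit(1)
```

Because the diagnostic command travels in the same record as the errno, a responder reading the log aggregator can run the right lsof invocation without opening the runbook.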

Why Linux CI misses macOS bind races

Containers restart quickly and often share network namespaces differently than GUI sessions on macOS. Treat Linux CI as compile-time checks; still run a smoke bind on Apple hardware before merging plist changes. A Mac mini rental closes that loop for roughly the cost of one engineer-hour per day.

FAQ

Does macOS reuse ports instantly?

No; expect brief waits after high-churn restarts.

Why does health pass while users fail?

Interfaces differ between probe and client.

Is 0.0.0.0 safer than loopback?

It is broader exposure, not safer—pair with firewall rules.

When should you rent a Mac mini?

When you must reproduce macOS bind semantics outside laptops.

Port fights are boring but expensive: every false escalation burns cross-team goodwill. A rented Mac mini on Apple Silicon from MacHTML—about $16.9 per day—gives you the same launchd lifecycle, socket defaults, and firewall prompts your production gateway sees, without shipping hardware to every contractor. Spin it up for a release week, capture lsof evidence, then shut it down when the backlog clears.

Quiet thermals help when you are SSH’d in running repeated bind tests without annoying desk neighbors.
