Every OpenClaw gateway that shells out to MCP-style tools eventually hits EMFILE, not because the operating system is cruel but because parallelism multiplied by pipes, sockets, and log files exhausts the per-process file descriptor table faster than operators expect. In 2026, pair a bounded worker semaphore with honest ulimit -n headroom, then reconcile those numbers with the spend controls in OpenClaw token budgets and tool throttles and with the triage loop in openclaw doctor gateway diagnostics, so concurrency spikes never masquerade as mysterious upstream outages.
This article gives concrete guardrails: default to 8 concurrent child processes per gateway worker on an 8 GiB Mac mini, reserve 256 descriptors for the HTTP stack and metrics, and rehearse changes on rented Apple hardware at roughly $16.9 per day before touching production tenants.
Process model and hidden FD consumers
Each subprocess inherits three standard streams, which are often duplicated when you wire PTYs. On top of that come sockets to the model provider, Prometheus scrapes, optional WebSocket fan-out, rotated log files, and SQLite or Redis connections. A naive 32-way parallel fan-out can therefore allocate over 1,000 descriptors before user tools even open files.
Worker semaphores and fair queuing
Centralize a counting semaphore around the tool executor: acquire before exec, release in defer paths including signal cancellation. Prefer FIFO fairness within a tenant while allowing weighted priorities for admin repair jobs so stuck marketing tenants do not block incident bridges.
| Profile | Parallel cap | Rationale |
|---|---|---|
| Filesystem reads | 12 | SSD queue depth sweet spot on Apple NVMe |
| Network crawlers | 6 | Each TLS socket consumes extra FDs |
| CPU compilers | 2× perf cores | Thermal headroom dominates throughput |
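The acquire-before-exec, release-on-every-exit pattern described above can be sketched with `asyncio`. This is a minimal illustration, not OpenClaw's actual executor: `BoundedToolExecutor` and `demo_peak` are hypothetical names, the tool is a stand-in coroutine rather than a real subprocess, and weighted priorities for admin jobs are not modeled.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class BoundedToolExecutor:
    """Caps how many tool invocations run at once (hypothetical helper)."""

    def __init__(self, max_parallel: int) -> None:
        self._sem = asyncio.Semaphore(max_parallel)
        self.running = 0  # tools currently holding a slot
        self.peak = 0     # highest concurrency observed so far

    async def run(self, tool: Callable[[], Awaitable[T]]) -> T:
        # Take a slot before the tool starts; callers beyond the cap wait here.
        await self._sem.acquire()
        self.running += 1
        self.peak = max(self.peak, self.running)
        try:
            return await tool()
        finally:
            # Executes on success, on failure, and on task cancellation,
            # so a slot is never leaked.
            self.running -= 1
            self._sem.release()


async def demo_peak(jobs: int = 32, cap: int = 8) -> int:
    """Submit `jobs` fake tools and report the peak concurrency reached."""
    executor = BoundedToolExecutor(max_parallel=cap)

    async def fake_tool() -> None:
        await asyncio.sleep(0.01)  # stand-in for a child process

    await asyncio.gather(*(executor.run(fake_tool) for _ in range(jobs)))
    return executor.peak


if __name__ == "__main__":
    print("peak concurrency:", asyncio.run(demo_peak()))
```

Keeping the release in a `finally` block is what makes the cap trustworthy under cancellation; a per-profile variant would hold one such executor per row of the table.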
Soft versus hard limits on macOS
Run launchctl limit maxfiles before and after edits. Typical interactive shells show a soft default of 256, while servers need a soft limit of 10,240 with an identical hard ceiling. Document both numbers in your runbook.
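From inside the gateway process itself, the same soft and hard values can be read with Python's standard `resource` module. The sketch below is illustrative: `fd_headroom` is a hypothetical helper, and the 256-descriptor reserve is the figure quoted in the introduction.

```python
import resource


def current_nofile_limits() -> tuple[int, int]:
    """Return (soft, hard) RLIMIT_NOFILE for the calling process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fd_headroom(soft_limit: int, in_use: int, reserved: int = 256) -> int:
    """Descriptors left for tool subprocesses after the fixed reserve."""
    return soft_limit - reserved - in_use


if __name__ == "__main__":
    soft, hard = current_nofile_limits()
    print(f"soft={soft} hard={hard}")
    print("headroom with 40 descriptors open:", fd_headroom(soft, in_use=40))
```

A negative headroom at the default soft limit of 256 is a quick signal that the reserve alone already consumes the whole table.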
LaunchAgent plist SoftResourceLimits
Ship SoftResourceLimits with NumberOfFiles aligned to your semaphore math plus 20% slack for short bursts. Pair with ThrottleInterval so crash loops do not hammer exec while limits are misconfigured.
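One way to keep the plist in step with the semaphore math is to generate it. The sketch below uses Python's `plistlib`; the label, binary path, and `ThrottleInterval` value are placeholders rather than OpenClaw defaults, and the seven descriptors per tool are an assumption borrowed from the capacity formula later in the article.

```python
import math
import plistlib


def launch_agent_plist(max_parallel: int, fds_per_tool: int,
                       reserved: int = 256, slack: float = 0.20) -> bytes:
    """Build a LaunchAgent plist whose FD limit follows the semaphore cap."""
    budget = max_parallel * fds_per_tool + reserved
    number_of_files = math.ceil(budget * (1 + slack))  # add 20% burst slack
    job = {
        "Label": "com.example.openclaw-gateway",                  # hypothetical
        "ProgramArguments": ["/usr/local/bin/openclaw-gateway"],  # hypothetical
        "SoftResourceLimits": {"NumberOfFiles": number_of_files},
        "ThrottleInterval": 30,  # seconds between respawns; illustrative value
    }
    return plistlib.dumps(job)


if __name__ == "__main__":
    print(launch_agent_plist(max_parallel=8, fds_per_tool=7).decode())
```

Deriving `NumberOfFiles` from the cap means a concurrency change and its limit change land in the same commit.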
Detecting descriptor leaks quickly
Sample lsof -p $GATEWAY_PID every 60 seconds during soak tests and diff counts for CLOSE_WAIT sockets. Alert when open descriptors exceed 70% of soft limit for more than 5 minutes.
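The alert rule reduces to a small check over the sampled counts. The function below is a sketch with a hypothetical name; it assumes the samples are open-descriptor counts collected at a fixed interval, however you gather them.

```python
from typing import Iterable


def sustained_fd_alert(samples: Iterable[int], soft_limit: int,
                       threshold: float = 0.70, interval_s: int = 60,
                       sustain_s: int = 300) -> bool:
    """True once usage stays above threshold * soft_limit for more than sustain_s.

    `samples` are descriptor counts taken every `interval_s` seconds.
    """
    limit = threshold * soft_limit
    over_since = None  # index of the first sample in the current breach
    for i, count in enumerate(samples):
        if count > limit:
            if over_since is None:
                over_since = i
            if (i - over_since) * interval_s > sustain_s:
                return True
        else:
            over_since = None  # breach ended; start over
    return False
```

With 60-second sampling, seven consecutive breaching samples are needed before the elapsed time exceeds five minutes, which filters out short bursts.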
IO-bound versus CPU-bound tool profiles
Tag tools in manifests with io_bound or cpu_bound so the scheduler can apply different semaphores—mixing them under one cap starves latency-sensitive shell utilities behind long ffmpeg jobs.
Queue depth metrics and backpressure
Export gateway_tool_queue_depth as a gauge and alert when the 95th percentile exceeds 50 pending jobs for longer than 10 minutes. Surface queue position in structured logs so support can reassure users without SSH access.
Rollout checklist
- Snapshot current ulimit -n from production gateway parent PID.
- Lower concurrency in staging first; measure p95 tool latency.
- Increase soft limits only after proving no leak regression for 72 hours.
- Re-run doctor probes verifying channel connectivity.
Multi-tenant isolation on shared Mac mini
When several teams share one gateway host, partition semaphores per Unix group or environment namespace so a runaway automation tenant cannot exhaust the global pool—allocate at least 30% headroom reserved for interactive sessions.
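A simple even split shows how the reserve works out in slots. This is a sketch only: `partition_slots` and the `interactive` key are hypothetical, and a real deployment might weight tenants rather than divide evenly.

```python
def partition_slots(total_slots: int, tenants: list[str],
                    reserve_pct: int = 30) -> dict[str, int]:
    """Split a global concurrency pool into per-tenant caps.

    Holds back at least `reserve_pct` percent for interactive sessions and
    divides the remainder evenly, so no single tenant can drain the pool.
    """
    # Integer ceiling keeps the reserve at or above the requested share.
    reserved = (total_slots * reserve_pct + 99) // 100
    shareable = total_slots - reserved
    per_tenant = shareable // len(tenants) if tenants else 0
    caps = {name: per_tenant for name in tenants}
    caps["interactive"] = reserved  # assumes no tenant uses this name
    return caps


if __name__ == "__main__":
    print(partition_slots(20, ["marketing", "platform"]))
```

Each cap would then back its own semaphore, keyed by Unix group or environment namespace.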
Benchmark harness expectations
Synthetic harnesses should ramp concurrency in steps of +4 every 5 minutes while recording CPU package power and fan duty cycle; Apple Silicon throttles sooner when GPU resident sets collide with tool workloads.
Coordination with token throttles and upstream 429s
High parallelism amplifies upstream rate limits documented in token throttle guidance; when 429s spike, temporarily lower concurrency before increasing backoff timers so user-visible latency improves in both dimensions.
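One way to lower concurrency temporarily is an additive-increase, multiplicative-decrease cap. That policy is an assumption chosen for illustration, not something the token throttle guidance prescribes, and `AdaptiveCap` is a hypothetical name.

```python
class AdaptiveCap:
    """Concurrency cap that shrinks on upstream 429s (AIMD-style sketch)."""

    def __init__(self, ceiling: int, floor: int = 1) -> None:
        self.ceiling = ceiling
        self.floor = floor
        self.current = ceiling

    def on_response(self, status: int) -> int:
        """Update the cap from one upstream HTTP status and return it."""
        if status == 429:
            # Multiplicative decrease: halve, but never drop below the floor.
            self.current = max(self.floor, self.current // 2)
        elif self.current < self.ceiling:
            # Additive increase: win back one slot per non-429 response.
            self.current += 1
        return self.current
```

The scheduler would consult `current` before admitting a new tool, so the cap recovers on its own once the 429s stop.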
Observability and on-call playbooks
Attach exemplars to histograms showing concurrent tool count versus FD usage so incident commanders can tell “too many tools” from “leaking sockets” within the first five minutes of a page.
Security interactions with sandboxed tools
Sandbox profiles that duplicate file descriptors for IPC may double-count against limits—validate Seatbelt profiles on staging gateways before enforcing stricter caps in production.
Documentation debt and config drift
Maintain a single Markdown table checked into the repo listing default semaphore values per tool family; stale docs cause operators to raise limits instead of fixing leaks.
vnode pressure and temporary directories
Tools that create thousands of scratch files under /var/folders can exhaust vnode caches even when FD tables look healthy. Monitor the vnode count during CI (on macOS, sysctl kern.num_vnodes against kern.maxvnodes; vfs.numvnodes is the FreeBSD name) and cap temp file creation with per-job quotas of 10,000 files unless a manifest explicitly opts into bulk extraction.
kqueue watchers and long-poll loops
Gateways that watch workspace directories with kqueue allocate one descriptor per watched path. Collapse recursive watches into a single root with user-space filtering to avoid multiplying handles when repositories exceed 5,000 tracked files.
gRPC streams and HTTP/2 multiplexing
Each multiplexed stream still consumes window buffers; keep concurrent outbound streams below 100 per upstream connection to avoid SETTINGS frame churn that spikes CPU on M-series efficiency cores.
Redis connection pools
Centralized queues often open a Redis connection per worker thread—pool to 32 shared connections maximum on an 8 GiB host and verify TLS session resumption so handshakes do not multiply FD usage during reconnect storms.
Upgrade windows and rolling restarts
During rolling deploys, the overlap of old and new binaries briefly doubles descriptor usage; extend soft limits by 15% for the maintenance window or shrink concurrency for two scrape intervals to keep EMFILE away.
Support macros for customer-visible errors
When users see “too many open files,” respond with canned steps referencing openclaw doctor, semaphore caps, and the exact plist keys you ship. This reduces duplicate tickets by roughly 40% in mature deployments.
Capacity planning spreadsheet starter
Model worst-case descriptors as (workers × tools_parallel × (3 pipes + 2 logs + 2 sockets)) + fixed_overhead. For an 8-worker gateway with parallelism 8, the per-tool term alone is 8 × 8 × 7 = 448 descriptors, so a total near 1,500 before user workloads implies roughly 1,000 descriptors of fixed overhead. Add a 25% buffer before choosing plist ceilings. Revisit quarterly because each new integration channel adds long-lived connections.
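The formula is easy to turn into a spreadsheet cell or a few lines of code. In the sketch below the function names are hypothetical, and `fixed_overhead` stays a parameter because the text gives no value for it; the example plugs in the 256-descriptor reserve from the introduction purely as an assumption.

```python
def worst_case_fds(workers: int, tools_parallel: int, fixed_overhead: int,
                   pipes: int = 3, logs: int = 2, sockets: int = 2) -> int:
    """Worst-case descriptor count from the capacity formula."""
    per_tool = pipes + logs + sockets
    return workers * tools_parallel * per_tool + fixed_overhead


def plist_ceiling(worst_case: int, buffer_pct: int = 25) -> int:
    """Add the percentage buffer, rounding up to a whole descriptor."""
    return (worst_case * (100 + buffer_pct) + 99) // 100


if __name__ == "__main__":
    base = worst_case_fds(workers=8, tools_parallel=8, fixed_overhead=256)
    print("worst case:", base)                 # 8 * 8 * 7 + 256 = 704
    print("with 25% buffer:", plist_ceiling(base))
```

Measure the real fixed overhead on an idle gateway before relying on the result, since long-lived connections dominate that term.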
Finally, align finance approvals: raising hard limits without code changes often masks debt—pair every limit bump with a ticket that closes actual leaks so budgets stay honest quarter over quarter and auditors see traceable remediation.
Publish these numbers beside your SLO dashboard so product managers understand why a flashy “double concurrency” request might be rejected during freeze windows near major launches, compliance audits, holiday traffic spikes, or vendor maintenance windows.
Apple Silicon Mac mini rentals through MacHTML give you macOS-accurate launchd inheritance, realistic pipe buffering, and quiet sustained load—ideal for proving semaphore math before Black Friday traffic. At about $16.9 per day, finance teams treat capacity experiments as operational expense instead of CapEx, while engineers still get root-level introspection over file tables.
Elastic rental windows also let you clone a production-like gateway onto isolated hardware when debugging FD regressions without risking shared staging clusters that other squads depend on.