Should limits match JSON Schema maxLength?

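Not exactly. Keep schema maxLength and maxItems tighter than the gateway's byte caps so clients get precise schema errors. Keep the byte caps themselves low enough that grossly oversized payloads fail at the gateway before expensive parsing and validation paths run.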

What about streaming tool output?

Apply per-chunk backpressure: pause upstream reads when internal buffers exceed high-water marks tied to your Mac gateway memory budget.
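Here is a minimal sketch of byte-based backpressure. It assumes an asyncio-style worker sits between the upstream reader and the downstream writer. The hypothetical `ByteBudgetQueue` counts buffered bytes rather than items, and `put()` suspends the upstream reader while the total sits at or above the high-water mark. The 2 MiB default is illustrative, so size it from your own memory budget.

```python
import asyncio
from collections import deque


class ByteBudgetQueue:
    """Hypothetical queue bounded by total buffered bytes, not item count."""

    def __init__(self, high_water: int = 2 * 1024 * 1024) -> None:
        self.high_water = high_water
        self._chunks = deque()
        self._buffered = 0
        self._cond = asyncio.Condition()

    @property
    def buffered(self) -> int:
        return self._buffered

    async def put(self, chunk: bytes) -> None:
        async with self._cond:
            # Backpressure: the upstream reader waits here until the consumer
            # drains the buffer below the high-water mark. One chunk may
            # overshoot the mark, so keep chunks small relative to it.
            await self._cond.wait_for(lambda: self._buffered < self.high_water)
            self._chunks.append(chunk)
            self._buffered += len(chunk)
            self._cond.notify_all()

    async def get(self) -> bytes:
        async with self._cond:
            await self._cond.wait_for(lambda: len(self._chunks) > 0)
            chunk = self._chunks.popleft()
            self._buffered -= len(chunk)
            # Wake any producer paused on the high-water mark.
            self._cond.notify_all()
            return chunk
```

Create the queue inside the running event loop (for example, in the coroutine passed to `asyncio.run`) so the condition variable binds to that loop on older Python versions.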

Do circuit breakers interact with oversized bodies?

Repeated 413 or parse failures can trip sliding-window error budgets—coordinate thresholds so operators distinguish abuse from misconfigured clients.

OpenClaw Gateway Request Limits: JSON Depth and Tool Output Caps (2026)

Uncapped inbound bodies are the cheapest denial-of-wallet vector for LLM gateways: a single one-megabyte JSON blob can force hundreds of thousands of tokens of tokenizer work before your JSON Schema validation layer ever rejects it. In 2026, ship explicit maximum HTTP entity sizes, JSON parse depth ceilings, and per-tool stdout caps at the OpenClaw gateway edge. Then wire those numbers to the same manifests your validators already consume, so misconfigured clients meet one coherent policy surface. Pair the limits with circuit breaker thresholds so repeated violations trip protective backoff instead of silently saturating CPU.

This guide anchors numbers you can defend in incident reviews:

- a 4 MiB default maximum for inbound JSON in interactive sessions
- a nesting depth of 32 unless tools declare deeper trees
- a 512 KiB truncated preview for filesystem reads unless explicitly elevated
- rehearsal costs of roughly $16.90 per day on a rented Apple Silicon Mac mini

Why gateways must reject early

Defense-in-depth stacks TLS termination, reverse proxies, and application parsers. But the moment an oversized payload reaches your tokenizer, you have already spent electricity and latency budget. Enforcing byte ceilings at the gateway socket layer with the http.MaxBytesReader pattern from Go's net/http (or an equivalent in your stack) ensures attackers pay the round-trip cost without occupying model slots. Track rejection reasons separately from upstream HTTP 502 noise so Grafana tiles stay interpretable.
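In Python, an equivalent of the MaxBytesReader pattern is a read loop that never buffers more than one byte past the ceiling. This minimal sketch assumes a file-like request stream. `BodyTooLarge` and the `REJECTIONS` counter are hypothetical names; the counter stands in for a metric tracked apart from upstream 502s.

```python
import io
from collections import Counter
from typing import BinaryIO

# Hypothetical rejection metric, kept separate from upstream 502 counts.
REJECTIONS: Counter = Counter()


class BodyTooLarge(Exception):
    """Raised as soon as the inbound body crosses the byte ceiling."""


def read_capped(stream: BinaryIO, limit: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read a request body but never buffer more than limit + 1 bytes.

    Same idea as Go's http.MaxBytesReader: the caller fails on the first
    byte past the ceiling instead of after the whole body has arrived.
    """
    buf = bytearray()
    while True:
        # Ask for at most one byte beyond the limit.
        chunk = stream.read(min(chunk_size, limit + 1 - len(buf)))
        if not chunk:
            return bytes(buf)
        buf.extend(chunk)
        if len(buf) > limit:
            REJECTIONS["body_too_large"] += 1
            raise BodyTooLarge(f"request body exceeds {limit} bytes")


if __name__ == "__main__":
    try:
        read_capped(io.BytesIO(b"x" * 5_000_000), limit=4_194_304)
    except BodyTooLarge as exc:
        print(exc, dict(REJECTIONS))
```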

HTTP body and Content-Length ceilings

Terminate requests whose Content-Length exceeds your policy before reading the rest of the body. For chunked uploads that carry no length, count bytes into a bounded buffer as they arrive, and abort once 4,194,304 bytes (4 MiB) accrue on default interactive lanes. Document larger allowances only for batch import endpoints protected by separate authentication and queue depth metrics.

Lane	Max body	Rationale
Interactive chat	4 MiB	Fits rich tool JSON without inviting archive dumps
Automation webhook	16 MiB	Batched events with administrator approval
Emergency operator override	64 MiB	Requires signed header plus audit log entry
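Here is a sketch of the Content-Length precheck driven by the lane table. Several details are assumptions: the lane keys, the 400 and 403 responses for unknown lanes and unverified overrides, and the `override_verified` flag. That flag is assumed to be set only after the signed header has been checked and the audit entry written. Requests without a Content-Length still need a streaming byte cap.

```python
from typing import Optional

MiB = 1024 * 1024

# Byte ceilings from the lane table; the lane names are illustrative keys.
LANE_MAX_BODY = {
    "interactive": 4 * MiB,
    "automation_webhook": 16 * MiB,
    "emergency_override": 64 * MiB,
}


def precheck_content_length(lane: str, content_length: Optional[int],
                            override_verified: bool = False) -> Optional[int]:
    """Return an HTTP status to reject with, or None to continue reading.

    `override_verified` is a hypothetical flag set only after the signed
    override header has been validated and the audit log entry written.
    """
    if lane == "emergency_override" and not override_verified:
        return 403
    limit = LANE_MAX_BODY.get(lane)
    if limit is None:
        return 400  # unknown lane: fail closed
    if content_length is None:
        return None  # chunked upload: the streaming byte cap still applies
    if content_length > limit:
        return 413  # reject before reading any of the body
    return None
```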

JSON depth, array length, and key counts

Parsers should refuse trees deeper than 32 levels, or any node holding more than 4096 array elements, unless a tool-specific waiver exists in versioned manifests. Pair depth checks with a maximum of 256 distinct keys per object to blunt prototype-pollution-style gadgets masquerading as configuration blobs.

{
  "error": "parse_depth_exceeded",
  "limit_depth": 32,
  "observed_depth": 41,
  "request_id": "b7f2…",
  "hint": "Flatten configuration arrays or split payloads across multiple tool calls"
}
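Here is a minimal sketch of these checks in Python, assuming the body has already passed the byte cap. `nesting_depth` does a linear pre-scan of the raw text, so an over-deep tree is refused before `json.loads` builds it. Width and key counts are checked on the parsed tree. The `object_keys_exceeded` and `array_length_exceeded` codes are hypothetical companions to `parse_depth_exceeded`, and tool-specific waivers are not modeled.

```python
import json
from typing import Any, Optional

MAX_DEPTH = 32
MAX_ARRAY_LEN = 4096
MAX_KEYS = 256


def nesting_depth(text: str) -> int:
    """Peak {/[ nesting in raw JSON text, ignoring brackets inside strings."""
    depth = peak = 0
    in_string = escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            depth += 1
            peak = max(peak, depth)
        elif ch in "}]":
            depth -= 1
    return peak


def width_violation(root: Any) -> Optional[dict]:
    """Iteratively walk a parsed tree and report the first oversized node."""
    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            # json.loads keeps the last duplicate key, so len() counts distinct keys.
            if len(node) > MAX_KEYS:
                return {"error": "object_keys_exceeded", "limit_keys": MAX_KEYS,
                        "observed_keys": len(node)}
            stack.extend(node.values())
        elif isinstance(node, list):
            if len(node) > MAX_ARRAY_LEN:
                return {"error": "array_length_exceeded", "limit_length": MAX_ARRAY_LEN,
                        "observed_length": len(node)}
            stack.extend(node)
    return None


def check_json(text: str, request_id: str) -> Optional[dict]:
    """Return a structured 400 body, or None when the payload is within policy."""
    depth = nesting_depth(text)
    if depth > MAX_DEPTH:
        return {
            "error": "parse_depth_exceeded",
            "limit_depth": MAX_DEPTH,
            "observed_depth": depth,
            "request_id": request_id,
            "hint": "Flatten configuration arrays or split payloads across multiple tool calls",
        }
    violation = width_violation(json.loads(text))  # malformed JSON raises JSONDecodeError
    if violation is not None:
        violation["request_id"] = request_id
    return violation
```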

Tool output truncation and streaming

Filesystem and shell tools can emit megabytes of accidental stdout. Wrap subprocess pipes with readers that switch to discard mode after 524,288 bytes unless the tool declared large_output:true in its contract. When streaming model tokens upward, respect TCP window pressure: pause tool reads when internal queues exceed 2 MiB combined RAM per worker on an 8 GiB Mac mini gateway.
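Here is a sketch of the discard-mode reader. It assumes tool stdout arrives as a blocking binary pipe such as `subprocess.Popen(...).stdout`. `drain_capped` keeps the first 524,288 bytes. After that, it keeps reading and counting but drops the data, so the child process never stalls on a full pipe. The `large_output` parameter is a hypothetical stand-in for the contract flag.

```python
import subprocess
import sys
from typing import BinaryIO, Tuple

PREVIEW_LIMIT = 524_288  # 512 KiB


def drain_capped(pipe: BinaryIO, large_output: bool = False,
                 limit: int = PREVIEW_LIMIT, chunk_size: int = 64 * 1024) -> Tuple[bytes, int]:
    """Keep the first `limit` bytes, then switch to discard mode.

    The pipe is still read to EOF after the cap, so the child never blocks on a
    full pipe buffer; discarded bytes are only counted. `large_output` stands
    in for the tool contract's large_output:true declaration.
    """
    kept = bytearray()
    total = 0
    while True:
        chunk = pipe.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if large_output:
            kept.extend(chunk)
        elif len(kept) < limit:
            kept.extend(chunk[: limit - len(kept)])
        # Otherwise: discard mode, the chunk is dropped.
    return bytes(kept), total


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "print('x' * 1_000_000)"]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        preview, seen = drain_capped(proc.stdout)
    print(f"kept {len(preview)} of {seen} bytes")
```

The returned total lets the gateway tell the model or UI how much output was withheld without retaining it.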

Structured error bodies for 413 and 400

Use stable machine-readable error codes, echo the policy limits, and never echo more than 200 UTF-8 characters of attacker-controlled input. Include retry_after_ms only when overload, not abuse, is the root cause.
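Here is a sketch of an error-body builder that follows these rules. The `excerpt` field name and the function signature are hypothetical. Python string slicing counts code points, which matches a 200-character echo ceiling.

```python
import json
from typing import Optional

ECHO_LIMIT_CHARS = 200


def error_body(code: str, limits: dict, request_id: str, excerpt: str = "",
               overload: bool = False, retry_after_ms: Optional[int] = None) -> str:
    """Serialize a 413/400 body with a stable code and echoed policy limits.

    Attacker-controlled input is clipped to 200 code points before echoing,
    and retry_after_ms is emitted only for overload, never for abuse.
    """
    body = {"error": code, **limits, "request_id": request_id}
    if excerpt:
        body["excerpt"] = excerpt[:ECHO_LIMIT_CHARS]
    if overload and retry_after_ms is not None:
        body["retry_after_ms"] = retry_after_ms
    # json.dumps escapes quotes and control characters, so the echoed
    # excerpt stays inert inside the JSON string.
    return json.dumps(body)
```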

Aligning limits with schema manifests

When your gateway already validates arguments via JSON Schema, keep maxLength and maxItems values tighter than the transport caps so validation fails fast and honestly: clients see schema errors instead of opaque socket resets.
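One way to keep the two layers honest is a CI lint over your schema manifests. This sketch assumes the 4 MiB interactive cap and the 4096-element parser width used elsewhere in this guide. It treats a maxLength as too loose when its worst-case UTF-8 size reaches the byte cap. That size is taken as 4 bytes per code point, ignoring JSON escaping, which can expand a string further. The thresholds and function names are illustrative.

```python
from typing import Any, Iterator, List, Tuple

TRANSPORT_CAP_BYTES = 4 * 1024 * 1024  # interactive lane
MAX_ARRAY_LEN = 4096                   # parser width ceiling
UTF8_MAX_BYTES_PER_CHAR = 4


def walk_limits(schema: Any, path: str = "$") -> Iterator[Tuple[str, str, int]]:
    """Yield (path, keyword, value) for every maxLength / maxItems in a schema."""
    if isinstance(schema, dict):
        for keyword in ("maxLength", "maxItems"):
            if isinstance(schema.get(keyword), int):
                yield path, keyword, schema[keyword]
        for key, child in schema.items():
            yield from walk_limits(child, f"{path}.{key}")
    elif isinstance(schema, list):
        for i, child in enumerate(schema):
            yield from walk_limits(child, f"{path}[{i}]")


def loose_limits(schema: Any) -> List[str]:
    """List schema limits that are not tighter than the transport caps."""
    problems = []
    for path, keyword, value in walk_limits(schema):
        if keyword == "maxLength" and value * UTF8_MAX_BYTES_PER_CHAR >= TRANSPORT_CAP_BYTES:
            problems.append(f"{path}: maxLength {value} can reach the body cap")
        if keyword == "maxItems" and value > MAX_ARRAY_LEN:
            problems.append(f"{path}: maxItems {value} exceeds parser width {MAX_ARRAY_LEN}")
    return problems
```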

macOS buffer defaults and LaunchAgent implications

Processes started by launchd get macOS kernel pipe buffers, which are sized differently from those in Linux containers. Rehearsing on bare macOS exposes cases where a tool's stdout pipe fills faster than your goroutines drain it. That forces honest tuning of high-water marks before production.
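To see the difference rather than assume it, measure the pipe buffer directly on each platform. This Unix-only sketch fills a fresh non-blocking pipe until the kernel refuses more data. Run it on the macOS host and inside your Linux container, then compare. Results can depend on write size, so set the `chunk` parameter to roughly match how your tools write.

```python
import os


def pipe_capacity(chunk: int = 1024) -> int:
    """Fill a fresh non-blocking pipe and report how many bytes it absorbed.

    This approximates how much a tool can write to stdout before its writes
    would block because nobody is draining the other end.
    """
    read_fd, write_fd = os.pipe()
    try:
        os.set_blocking(write_fd, False)
        total = 0
        payload = b"\0" * chunk
        while True:
            try:
                total += os.write(write_fd, payload)  # may be a partial write
            except BlockingIOError:
                return total  # kernel buffer is full
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(f"pipe capacity: {pipe_capacity()} bytes")
```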

Abuse patterns and rate-adjacent controls

Payload attacks rarely arrive as single giant posts. More often they are thousands of just-below-limit requests that still exhaust JSON parsers. Track rejection_rate_413 and rejection_rate_parse as first-class metrics. Warn when either exceeds 5% over a five-minute window, and automatically engage token throttling for the offending API key at 20%. Pair this with per-tenant credit balances so marketing tenants cannot starve platform operations through noisy campaigns.
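Here is a per-key sliding-window sketch of these thresholds. It assumes one tracker instance per rejection reason, so 413s and parse failures are counted separately. The class name and the injectable clock are illustrative. In a multi-worker deployment, the counters would live in shared storage rather than process memory.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple

WINDOW_S = 300.0      # five minutes
WARN_RATE = 0.05      # 5% rejections: warn
THROTTLE_RATE = 0.20  # 20% rejections: throttle the API key


class RejectionTracker:
    """Per-API-key sliding window over (timestamp, rejected) outcomes."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._events: Dict[str, Deque[Tuple[float, bool]]] = defaultdict(deque)
        self._rejected: Dict[str, int] = defaultdict(int)

    def record(self, api_key: str, rejected: bool) -> str:
        now = self._clock()
        events = self._events[api_key]
        events.append((now, rejected))
        self._rejected[api_key] += rejected
        # Evict outcomes that fell out of the five-minute window.
        while events and events[0][0] <= now - WINDOW_S:
            _, old = events.popleft()
            self._rejected[api_key] -= old
        rate = self._rejected[api_key] / len(events)
        if rate >= THROTTLE_RATE:
            return "throttle"
        if rate >= WARN_RATE:
            return "warn"
        return "ok"
```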

Log samples for forensics but never store full rejected bodies: keep a SHA-256 digest and the first 1,024 bytes under encryption keys isolated from application logs, expiring after 72 hours unless legal hold applies.
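Here is a sketch of the forensic summary record. It assumes the caller passes the bytes actually received, which, behind a byte cap, is never the full oversized body. Field names are hypothetical. Encrypting the prefix under an isolated key is deliberately left out, because it depends on your key-management stack.

```python
import hashlib
import time
from typing import Optional

PREFIX_BYTES = 1024
RETENTION_S = 72 * 3600  # 72 hours


def forensic_record(body: bytes, request_id: str, legal_hold: bool = False,
                    now: Optional[float] = None) -> dict:
    """Summarize a rejected body without retaining it in full.

    The prefix must still be encrypted under a key isolated from application
    logs before it is written anywhere; that step is not shown here.
    """
    created = time.time() if now is None else now
    return {
        "request_id": request_id,
        "sha256": hashlib.sha256(body).hexdigest(),
        "size_bytes": len(body),
        "prefix": body[:PREFIX_BYTES],  # encrypt before persisting
        "expires_at": None if legal_hold else created + RETENTION_S,
    }
```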

SLO table for limiter decisions

Signal	Target	Action
Median parse time	< 3 ms on M4 for 16 KiB JSON	Alert if > 6 ms for 15 minutes
413 ratio	< 0.3% steady state	Page if > 2% after deploy
Truncated tool rows	< 1% of tool calls	Ask owners to paginate tools
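The "alert if > 6 ms for 15 minutes" row implies a sustained-breach rule rather than a single-sample trigger. This sketch assumes time-ordered (timestamp, median_ms) samples, such as one per scrape interval. The function name and the full-coverage requirement are design assumptions.

```python
from typing import Sequence, Tuple


def sustained_breach(samples: Sequence[Tuple[float, float]], threshold_ms: float = 6.0,
                     duration_s: float = 15 * 60) -> bool:
    """True when every sample in the trailing window exceeds the threshold.

    `samples` are (unix_seconds, median_parse_ms) pairs in time order. The
    history must cover the whole window, so a short burst of high readings
    with too little history does not fire the alert.
    """
    if not samples:
        return False
    end = samples[-1][0]
    start = end - duration_s
    window = [ms for ts, ms in samples if ts >= start]
    covered = samples[0][0] <= start
    return covered and all(ms > threshold_ms for ms in window)
```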

Rollout sequencing for teams nervous about breakage

Shadow mode: log would-have-rejected counts without failing requests for 48 hours.
Canary tenants: enforce caps for internal workspaces first.
Progressive tightening: drop interactive max body from 8 MiB to 4 MiB across weekly trains.
Post-deploy verification: compare Prometheus scrape deltas against baseline CPU.
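The first three steps can share one switch. This hypothetical `RolloutPolicy` counts would-have-rejected requests in every mode, so shadow metrics keep flowing after enforcement starts. Progressive tightening then means lowering `max_body` on each train.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass
class RolloutPolicy:
    """Hypothetical rollout switch covering shadow mode and canary tenants."""
    max_body: int
    mode: str = "shadow"  # "shadow" observes only; "canary" enforces for canaries; "enforce" for all
    canary_tenants: FrozenSet[str] = frozenset()
    would_reject: Counter = field(default_factory=Counter)

    def should_reject(self, tenant: str, body_len: int) -> bool:
        if body_len <= self.max_body:
            return False
        self.would_reject[tenant] += 1  # shadow metric, counted in every mode
        if self.mode == "enforce":
            return True
        if self.mode == "canary":
            return tenant in self.canary_tenants
        return False  # shadow: log only, never fail the request
```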

Document numeric defaults beside environment variables so Terraform modules and LaunchAgent plist fragments stay synchronized—nothing erodes trust faster than staging accepting payloads production rejects.
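Here is a sketch of a single loader that keeps each default beside its override variable. With one loader, the same names can appear in a LaunchAgent plist's EnvironmentVariables dictionary and in Terraform. The OPENCLAW_* variable names are hypothetical, so substitute whatever your gateway actually reads.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class GatewayLimits:
    max_body_bytes: int
    max_json_depth: int
    max_tool_output_bytes: int


# Defaults documented beside the (hypothetical) variable names that override them.
DEFAULTS = {
    "OPENCLAW_MAX_BODY_BYTES": 4_194_304,
    "OPENCLAW_MAX_JSON_DEPTH": 32,
    "OPENCLAW_MAX_TOOL_OUTPUT_BYTES": 524_288,
}


def load_limits(env: Mapping[str, str] = os.environ) -> GatewayLimits:
    def read(name: str) -> int:
        raw = env.get(name)
        value = DEFAULTS[name] if raw is None else int(raw)  # malformed values raise ValueError
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
        return value

    return GatewayLimits(
        max_body_bytes=read("OPENCLAW_MAX_BODY_BYTES"),
        max_json_depth=read("OPENCLAW_MAX_JSON_DEPTH"),
        max_tool_output_bytes=read("OPENCLAW_MAX_TOOL_OUTPUT_BYTES"),
    )
```

Failing loudly on a zero or malformed value is a deliberate choice: a silently ignored override is exactly how staging drifts from production.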

WebAssembly and native extension paths

Some teams offload JSON parsing to WASM modules or SIMD-accelerated libraries. Keep gateway-wide caps below whatever those accelerators promise at peak clock. Even Apple Silicon throttles under sustained SIMD-heavy (NEON) kernels when chassis temperature readings climb above 85 °C during summer workloads.

If you compile parsers as universal binaries for arm64, benchmark cold versus warm caches: first-request latency can spike 40% when instruction caches miss after deploying new builds—surface that jitter in dashboards so limit tightening never coincides with hidden regressions.
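One rough way to surface that jitter is to time the first parse separately from the warm median. This sketch uses Python's json module on a payload of about 16 KiB to match the SLO row. It only approximates instruction-cache effects. Run it in a fresh process right after each deploy, and compare against the same measurement on the previous build.

```python
import json
import statistics
import time
from typing import Tuple


def sample_payload(target_bytes: int = 16 * 1024) -> str:
    """Build a flat JSON array of at least target_bytes for repeatable runs."""
    rows, size = [], 0
    while size < target_bytes:
        row = {"id": len(rows), "name": f"row-{len(rows)}", "tags": ["a", "b", "c"]}
        rows.append(row)
        size += len(json.dumps(row)) + 2  # item plus its ", " separator / brackets
    return json.dumps(rows)


def cold_and_warm_ms(payload: str, runs: int = 200) -> Tuple[float, float]:
    """Return (first-call ms, median ms of the remaining calls); runs must be >= 2."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        json.loads(payload)
        timings.append((time.perf_counter() - start) * 1000)
    return timings[0], statistics.median(timings[1:])
```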

Versioned contracts with mobile and desktop clients

Expose limits through a machine-readable /gateway/policy document, versioned alongside each gateway semver release, so Flutter, Electron, or Swift clients can negotiate capabilities without brittle hard-coded constants. Include max_tool_output_preview_bytes so UX designers know when to paginate tables rendered from tool output.
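Here is a sketch of what such a document might contain, along with a client-side pagination check. Only max_tool_output_preview_bytes comes from the text above. The other field names and the version string are hypothetical, and a Flutter or Swift client would perform the same comparison in its own language.

```python
import json

POLICY_VERSION = "1.4.0"  # hypothetical; bump with each gateway semver release


def gateway_policy() -> str:
    """Serialize a hypothetical /gateway/policy document for clients to fetch."""
    return json.dumps({
        "gateway_version": POLICY_VERSION,
        "max_body_bytes": {"interactive": 4_194_304, "automation_webhook": 16_777_216},
        "max_json_depth": 32,
        "max_array_length": 4096,
        "max_object_keys": 256,
        "max_tool_output_preview_bytes": 524_288,
    }, sort_keys=True)


def needs_pagination(policy_json: str, tool_output_bytes: int) -> bool:
    """Client-side check: paginate when output exceeds the advertised preview."""
    return tool_output_bytes > json.loads(policy_json)["max_tool_output_preview_bytes"]
```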

Shared state and Redis-backed counters

When multiple gateway workers enforce quotas, centralize rolling counters in Redis with five-second TTL windows for burst detection and hourly hashes for sustained abuse. Use Lua scripts so increment-and-check stays atomic—race conditions here manifest as flaky 429 storms that confuse legitimate retries.
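Here is a sketch of the atomic increment-and-check, assuming a fixed TTL window per key. The Lua source is what Redis would execute; Redis runs a script without interleaving other commands. The commented registration line uses redis-py's `register_script`, and the key naming shown there is hypothetical. `LocalWindowModel` is an in-process model of the same semantics for tests, not a replacement for the shared counter.

```python
import threading
import time
from typing import Callable, Dict, Tuple

# Loaded via EVAL/EVALSHA. Redis runs the whole script without interleaving
# other commands, which keeps increment-and-check atomic across workers.
INCR_AND_CHECK_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
if current > tonumber(ARGV[2]) then
  return 0
end
return 1
"""
# With redis-py (hypothetical key naming):
#   check = client.register_script(INCR_AND_CHECK_LUA)
#   allowed = check(keys=["quota:burst:<api_key>"], args=[5, limit]) == 1


class LocalWindowModel:
    """In-process model of the script's semantics, for tests and single-worker dev."""

    def __init__(self, ttl_s: float, limit: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s, self.limit, self._clock = ttl_s, limit, clock
        self._counts: Dict[str, Tuple[int, float]] = {}
        self._lock = threading.Lock()

    def allow(self, key: str) -> bool:
        with self._lock:  # stands in for Redis's serialized script execution
            now = self._clock()
            count, expires = self._counts.get(key, (0, now + self.ttl_s))
            if now >= expires:  # window expired, like the EXPIREd key vanishing
                count, expires = 0, now + self.ttl_s
            count += 1
            self._counts[key] = (count, expires)
            return count <= self.limit
```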

Choose eviction policies carefully: dropping quota keys during memory pressure is preferable to dropping authentication tokens, but document the trade-off so on-call engineers know quotas may relax briefly during cluster failover drills.

Finally, rehearse coordinated upgrades: when OpenAI or Anthropic raises context windows overnight, resist blindly raising gateway ceilings—instead reassess tokenizer budgets and confirm observability panels still turn green after policy bumps.

Runbooks should link directly to this policy table, Grafana board IDs, and the current semver tag so new responders never hunt tribal knowledge in chat scrollback during a live incident—ship updates whenever defaults shift.

That small discipline keeps multi-region gateways aligned week to week.

Renting Apple Silicon Mac mini capacity through MacHTML keeps TLS stacks, decoder libraries, and pipe semantics aligned with the macOS gateways many teams already run for customers, rather than forcing Linux-only assumptions that mis-size buffers. Silent thermals and predictable single-thread uplift matter when you measure parser microseconds under sustained 413 storms. The operating expense is roughly $16.90 per day, compared with depreciating owned hardware.

Elastic rental windows also let you clone production limit profiles into an isolated tenant for red-team exercises—burst abusive payloads validate counters without risking shared staging clusters.

Load-test gateway limits on real macOS

Provision a cloud Mac mini to mirror parser limits, LaunchAgent logging, and TLS behavior before promoting policy tightenings.
