Uncapped inbound bodies are the cheapest denial-of-wallet vector for LLM gateways: a single megabyte JSON blob can force megatokens of tokenizer work before your JSON Schema validation layer ever rejects it. In 2026, ship explicit maximum HTTP entity sizes, JSON parse depth ceilings, and per-tool stdout caps at the OpenClaw gateway edge—then wire those numbers to the same manifests your validators already consume so misconfigured clients meet one coherent policy surface. Pair limits with circuit breaker thresholds so repeated violations trip protective backoff instead of silently saturating CPU.
This guide anchors numbers you can defend in incident reviews: a 4 MiB default maximum inbound JSON body for interactive sessions, depth 32 for nested objects unless tools declare deeper trees, a 512 KiB truncated preview for filesystem reads unless explicitly elevated, and rehearsal economics near $16.90 per day on a rented Apple Silicon Mac mini.
Why gateways must reject early
Defense-in-depth stacks TLS termination, reverse proxies, and application parsers, but the moment an oversized payload reaches your tokenizer, you have already spent electricity and latency budget. Enforcing byte ceilings at the gateway socket layer with conservative MaxBytesReader patterns (Go's http.MaxBytesReader, or an equivalent in your stack) ensures attackers pay the round-trip cost without occupying model slots. Track rejection reasons separately from upstream HTTP 502 noise so Grafana tiles stay interpretable.
HTTP body and Content-Length ceilings
Reject requests whose declared Content-Length exceeds your policy before reading any of the body. For chunked uploads without a declared length, count bytes into a bounded buffer as they arrive and abort once 4,194,304 bytes (4 MiB) accrue on default interactive lanes. Document larger allowances only for batch import endpoints protected by separate authentication and queue depth metrics.
| Lane | Max body | Rationale |
|---|---|---|
| Interactive chat | 4 MiB | Fits rich tool JSON without inviting archive dumps |
| Automation webhook | 16 MiB | Batched events with administrator approval |
| Emergency operator override | 64 MiB | Requires signed header plus audit log entry |
JSON depth, array length, and key counts
Parsers should refuse trees deeper than 32 levels or wider than 4,096 array slots per node unless a tool-specific waiver exists in versioned manifests. Pair depth checks with a maximum of 256 distinct keys per object to bound hashing work on wide objects and to shrink the surface for prototype-pollution-style gadgets masquerading as configuration blobs.
```json
{
  "error": "parse_depth_exceeded",
  "limit_depth": 32,
  "observed_depth": 41,
  "request_id": "b7f2…",
  "hint": "Flatten configuration arrays or split payloads across multiple tool calls"
}
```
Tool output truncation and streaming
Filesystem and shell tools can emit megabytes of accidental stdout. Wrap subprocess pipes with readers that switch to discard mode after 524,288 bytes (512 KiB) unless the tool declared large_output:true in its contract. When streaming model tokens upward, respect TCP window pressure: pause tool reads once a worker's internal queues hold more than 2 MiB combined on an 8 GiB Mac mini gateway.
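A sketch of the discard-after-cap behaviour for subprocess stdout, assuming tools run through Go's os/exec. cappedBuffer, runTool, and elevatedLimit are hypothetical names, and the 2 MiB per-worker queue budget is not modelled here.

```go
package main

import (
	"bytes"
	"os/exec"
)

// defaultPreviewBytes is the 512 KiB preview from the policy text.
const defaultPreviewBytes = 512 << 10 // 524,288 bytes

// cappedBuffer keeps the first limit bytes of tool output and discards the
// rest. Write always reports len(p) so the subprocess pipe keeps draining.
type cappedBuffer struct {
	buf       bytes.Buffer
	limit     int
	total     int64 // bytes the tool actually produced
	truncated bool
}

func (c *cappedBuffer) Write(p []byte) (int, error) {
	c.total += int64(len(p))
	if room := c.limit - c.buf.Len(); room > 0 {
		if len(p) > room {
			c.buf.Write(p[:room])
			c.truncated = true
		} else {
			c.buf.Write(p)
		}
	} else if len(p) > 0 {
		c.truncated = true // discard mode
	}
	return len(p), nil
}

// runTool captures at most the preview budget of stdout unless the tool's
// contract declares large_output:true (largeOutput here); even then it gets a
// finite, explicitly elevated cap rather than an unbounded buffer.
func runTool(largeOutput bool, elevatedLimit int, name string, args ...string) (*cappedBuffer, error) {
	limit := defaultPreviewBytes
	if largeOutput {
		limit = elevatedLimit
	}
	out := &cappedBuffer{limit: limit}
	cmd := exec.Command(name, args...)
	cmd.Stdout = out // os/exec copies the pipe into out on its own goroutine
	err := cmd.Run()
	return out, err
}
```

Returning len(p) from Write even after the cap matters: a short write would end os/exec's copy goroutine with io.ErrShortWrite, after which nothing drains the pipe and the tool can stall on a full buffer.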
Structured error bodies for 413 and 400
Use stable machine-readable error codes and echo the policy limits, but never echo attacker-controlled slices longer than 200 UTF-8 characters. Include retry_after_ms only when overload, not abuse, is the root cause.
Aligning limits with schema manifests
When your gateway already validates arguments via JSON Schema, keep maxLength and maxItems values tighter than the transport caps so validation fails fast but honestly: clients see schema errors instead of opaque socket resets.
macOS buffer defaults and LaunchAgent implications
Processes started by launchd on macOS get different pipe buffer sizes than processes in Linux containers. Rehearsing on bare macOS exposes cases where stdout fills faster than your goroutines drain it, forcing honest tuning of high-water marks before production.
Abuse patterns and rate-adjacent controls
Payload attacks rarely arrive as single giant posts; more often they are thousands of just-below-limit requests that still exhaust JSON parsers. Track rejection_rate_413 and rejection_rate_parse as first-class metrics: warn when either exceeds 5% over a five-minute window, and automatically engage token throttling for the offending API key at 20%. Pair this with per-tenant credit balances so marketing tenants cannot starve platform operations through noisy campaigns.
Log samples for forensics, but never store full rejected bodies: keep a SHA-256 digest and the first 1,024 bytes, encrypted under keys isolated from application logs, and expire them after 72 hours unless a legal hold applies.
SLO table for limiter decisions
| Signal | Target | Action |
|---|---|---|
| Median parse time | < 3 ms on M4 for 16 KiB JSON | Alert if > 6 ms for 15 minutes |
| 413 ratio | < 0.3% steady state | Page if > 2% after deploy |
| Truncated tool rows | < 1% of tool calls | Ask owners to paginate tools |
Rollout sequencing for teams nervous about breakage
- Shadow mode: log would-have-rejected counts without failing requests for 48 hours.
- Canary tenants: enforce caps for internal workspaces first.
- Progressive tightening: drop interactive max body from 8 MiB to 4 MiB across weekly trains.
- Post-deploy verification: compare Prometheus scrape deltas against baseline CPU.
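The shadow and canary stages above reduce to one switch per cap. A minimal Go sketch, with hypothetical type and tenant names: in shadow mode oversize requests are counted as would-have-rejected but still served, while canary tenants get real rejections.

```go
package main

import "sync/atomic"

type enforcementMode int

const (
	modeShadow  enforcementMode = iota // count would-have-rejected, still serve
	modeEnforce                        // actually reject
)

// limiter gates one cap (for example, body bytes) and exports both counters.
type limiter struct {
	limit             int64
	wouldHaveRejected atomic.Int64
	rejected          atomic.Int64
}

// modeFor enforces for canary (internal) tenants and shadows everyone else.
func modeFor(tenant string, canaries map[string]bool) enforcementMode {
	if canaries[tenant] {
		return modeEnforce
	}
	return modeShadow
}

// allow reports whether a request of size n proceeds under the given mode.
func (l *limiter) allow(n int64, mode enforcementMode) bool {
	if n <= l.limit {
		return true
	}
	if mode == modeShadow {
		l.wouldHaveRejected.Add(1)
		return true
	}
	l.rejected.Add(1)
	return false
}
```

Progressive tightening is then a config change to limit between weekly trains, with the two counters exported as the shadow-mode evidence.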
Document numeric defaults beside their environment variables so Terraform modules and LaunchAgent plist fragments stay synchronized; nothing erodes trust faster than staging accepting payloads that production rejects.
WebAssembly and native extension paths
Some teams offload JSON parsing to WASM modules or SIMD-accelerated libraries. Size gateway-wide caps against the sustained throughput those accelerators deliver, not their peak-clock figures: even Apple Silicon throttles under sustained SIMD-heavy (NEON) kernels when chassis readings climb above 85 °C during summer workloads.
If you compile parsers for arm64, natively or as universal binaries, benchmark cold versus warm caches: first-request latency can spike 40% when instruction caches miss after deploying new builds. Surface that jitter in dashboards so limit tightening never coincides with hidden regressions.
Versioned contracts with mobile and desktop clients
Expose limits through a machine-readable /gateway/policy document, versioned alongside the gateway's semver releases, so Flutter, Electron, or Swift clients negotiate capabilities without brittle hard-coded constants. Include max_tool_output_preview_bytes so UX designers know when to paginate tables rendered from tool output.
Shared state and Redis-backed counters
When multiple gateway workers enforce quotas, centralize rolling counters in Redis with five-second TTL windows for burst detection and hourly hashes for sustained abuse. Use Lua scripts so increment-and-check stays atomic; race conditions here manifest as flaky 429 storms that confuse legitimate retries.
Choose eviction policies carefully: dropping quota keys during memory pressure is preferable to dropping authentication tokens, but document the trade-off so on-call engineers know quotas may relax briefly during cluster failover drills.
Finally, rehearse coordinated upgrades: when OpenAI or Anthropic raises context windows overnight, resist blindly raising gateway ceilings; instead, reassess tokenizer budgets and confirm observability panels still turn green after policy bumps.
Runbooks should link directly to this policy table, Grafana board IDs, and the current semver tag so new responders never hunt for tribal knowledge in chat scrollback during a live incident. Update them whenever defaults shift.
That small discipline keeps multi-region gateways aligned week to week.
Renting Apple Silicon Mac mini capacity through MacHTML keeps TLS stacks, decoder libraries, and pipe semantics aligned with the macOS gateways many teams already run, rather than forcing Linux-only assumptions that mis-size buffers. Silent thermals and predictable single-thread uplift matter when you measure parser microseconds under sustained 413 storms, at roughly $16.90 per day in operating expense versus depreciating owned hardware.
Elastic rental windows also let you clone production limit profiles into an isolated tenant for red-team exercises, where bursts of abusive payloads validate counters without risking shared staging clusters.
Load-test gateway limits on real macOS
Provision a cloud Mac mini to mirror parser limits, LaunchAgent logging, and TLS behavior before promoting policy tightenings.