Does failover fix 429 rate limits automatically?

Routing helps only when a secondary provider has separate quota. Pair failover with retry-after budgets documented in your gateway config.

How often should I rerun doctor after changing models?

After every openclaw.json routing edit and before promoting LaunchAgent plists to production.

OpenClaw model failover and provider routing on macOS (2026)

Operators running OpenClaw on a macOS gateway in 2026 often configure a single premium model, and production halts the moment Anthropic returns HTTP 429, OpenAI answers 503, or a regional outage takes down the only provider behind your API key. Model failover and explicit provider routing turn those hard stops into bounded degradations: a documented chain of secondary models, compatible tool schemas, and retry budgets that respect Retry-After headers. This tutorial shows how to express that chain in ~/.openclaw/openclaw.json, keep secrets out of git, and validate behavior with openclaw doctor before you touch production LaunchAgent plists. Pair it with 429 retry discipline, JSON and environment profiles, upstream circuit breakers, and doctor diagnostics so that routing, throttling, and observability stay aligned.

You will leave with a failover decision matrix, a copy-ready configuration skeleton, numeric guardrails (3 providers max in the hot path, 30 s ceiling per provider attempt, 120 s total turn budget), and a staging checklist you can run on a rented Apple Silicon Mac mini for about $16.9 per day on published MacHTML pricing.

Symptoms that mean you need failover

Channels keep answering “thinking…” while logs show repeated provider errors. Tool calls stall mid-turn because the gateway exhausted retries against a single vendor. Finance sees surprise bills when operators manually swap API keys at 2 a.m. instead of promoting a tested secondary route. Each pattern is a signal that routing logic—not model quality—is the bottleneck.

Failover is not “use the cheapest model always.” It is a controlled degradation path: premium model for customer-facing turns, mid-tier model for internal HTML/CSS audits, local Ollama only for non-destructive summaries when cloud APIs are down—each tier documented and probed.

Failover chain design principles

Keep the hot path short: at most three providers, each probed weekly with the same tool schema your HTML/CSS agents use. Never fail open to a model that cannot call write tools if production workflows require file edits—degrade to read-only summaries instead. Log which provider served each turn with a correlation id so finance can attribute spend spikes.
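Per-turn attribution can be as simple as one JSON log line per turn. A minimal sketch, assuming you emit these lines yourself from a wrapper or log shipper; the field names (`correlation_id`, `provider`, `model`) and both helper functions are hypothetical, not an OpenClaw log format.

```python
import json
import uuid
from collections import Counter


def turn_log_line(model_ref, correlation_id=None):
    """Build one JSON log line recording which provider served a turn.

    model_ref uses the provider/model form from openclaw.json, for example
    "openai/gpt-4.1". The log schema is a hypothetical example.
    """
    provider, _, model = model_ref.partition("/")
    return json.dumps({
        # A fresh UUID when the caller has no request id to propagate.
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "provider": provider,
        "model": model,
    })


def turns_per_provider(lines):
    """Count served turns per provider from JSON log lines."""
    return Counter(json.loads(line)["provider"] for line in lines)
```

Joining these counts with each provider's billing export by date is usually enough to show finance which tier absorbed a spike.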

Separate staging and production chains even if they share hardware; a typo in staging JSON should not demote production traffic to an experimental local model.

openclaw.json routing skeleton

Store routing config beside your git-tracked HTML repository, not inside it. A pragmatic pattern keeps provider names in openclaw.json while API keys live in ~/.openclaw/.env with chmod 600. Document the primary model id, an ordered fallback list, per-provider timeouts, and optional cost ceilings.

{
  "agents": {
    "default": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-20250514",
        "fallbacks": [
          "openai/gpt-4.1",
          "google/gemini-2.5-pro"
        ]
      },
      "providers": {
        "anthropic": { "timeoutMs": 30000 },
        "openai": { "timeoutMs": 30000 },
        "google": { "timeoutMs": 30000 }
      }
    }
  }
}

Treat the snippet as structural guidance—field names evolve between OpenClaw releases, so diff your file against the release notes before paste-deploying. After edits, run openclaw doctor and a one-line chat probe per provider.
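The numeric guardrails from the introduction (at most three models in the hot path, 30 s per attempt, 120 s per turn) can be checked mechanically before the doctor run. A minimal sketch that reads the field layout of the skeleton (`agents.default.model`, `agents.default.providers.*.timeoutMs`); those names may differ in your OpenClaw release, so adjust the lookups, and note that `check_routing` is a hypothetical helper.

```python
MAX_HOT_PATH = 3          # primary + fallbacks
MAX_ATTEMPT_MS = 30_000   # per-provider attempt ceiling
MAX_TURN_MS = 120_000     # total turn budget


def check_routing(config):
    """Return a list of guardrail violations (empty list means OK).

    Assumes the skeleton's field names; the turn check sums attempt
    timeouts only and ignores any Retry-After waits on top of them.
    """
    agent = config["agents"]["default"]
    chain = [agent["model"]["primary"], *agent["model"].get("fallbacks", [])]
    problems = []
    if len(chain) > MAX_HOT_PATH:
        problems.append(f"hot path has {len(chain)} models; max is {MAX_HOT_PATH}")
    total_ms = 0
    for ref in chain:
        provider = ref.split("/", 1)[0]
        timeout = agent.get("providers", {}).get(provider, {}).get("timeoutMs")
        if timeout is None:
            problems.append(f"{provider}: no timeoutMs set")
            continue
        if timeout > MAX_ATTEMPT_MS:
            problems.append(f"{provider}: timeoutMs {timeout} exceeds {MAX_ATTEMPT_MS}")
        total_ms += timeout
    if total_ms > MAX_TURN_MS:
        problems.append(f"summed attempt timeouts {total_ms} ms exceed {MAX_TURN_MS}")
    return problems
```

Feed it the parsed file, for example `check_routing(json.loads(Path("~/.openclaw/openclaw.json").expanduser().read_text()))`, and treat any returned string as a blocker for the change ticket.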

Provider selection matrix

Tier	When to use	Risk
Primary (premium)	Customer-facing Slack/Telegram turns	Quota exhaustion during launches
Secondary (mid)	Internal audits, non-urgent fixes	Tool schema drift vs primary
Tertiary (local)	Read-only summaries when cloud APIs down	Cannot match cloud reasoning depth
Disabled	Destructive tools during outage	Ops pressure to “just enable it”
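Encoding the matrix as data lets a wrapper refuse destructive tools once traffic has degraded, which removes the "just enable it" pressure from the 2 a.m. decision. A minimal sketch: the tier names come from the table, while the tool names (`read_file`, `list_directory`, `write_patch`, `delete_file`) and `allowed_tools` are hypothetical placeholders for whatever your gateway publishes.

```python
from enum import Enum


class Tier(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"
    TERTIARY = "tertiary"


READ_ONLY = frozenset({"read_file", "list_directory"})
DESTRUCTIVE = frozenset({"write_patch", "delete_file"})  # hypothetical names


def allowed_tools(tier, published, outage=False):
    """Return the subset of published tools a turn may call on this tier.

    - Tertiary (local) models are limited to read-only tools.
    - Primary and secondary get every published tool, except that
      destructive tools stay disabled while an outage is declared.
    """
    tools = frozenset(published)
    if tier is Tier.TERTIARY:
        return tools & READ_ONLY
    if outage:
        return tools - DESTRUCTIVE
    return tools
```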

429, 503, and retry-after coordination

Failover is not a substitute for respecting rate limits. When the primary returns 429, honor Retry-After up to your documented ceiling—often 30 seconds for chat—before advancing to secondary providers. For 503 storms, pair routing with the gateway circuit breaker so you do not hammer a degraded endpoint. See the dedicated 429 retry article for backoff tables that match macOS timers.
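Retry-After arrives either as delta-seconds or as an HTTP-date, so the "wait or advance" decision needs both forms. A minimal sketch using the standard library's `email.utils.parsedate_to_datetime`; the 30 s ceiling mirrors the text, and `retry_after_seconds` and `plan_after_429` are hypothetical helpers. Here a missing or unparseable header advances immediately; your 429 backoff table may prefer a short fixed wait instead.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

RETRY_CEILING_S = 30.0  # documented chat ceiling


def retry_after_seconds(header, now=None):
    """Parse a Retry-After value into seconds, or None if absent/invalid."""
    if not header:
        return None
    header = header.strip()
    if header.isdigit():                  # delta-seconds form, e.g. "12"
        return float(header)
    try:                                  # HTTP-date form
        when = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:               # "-0000" dates parse as naive
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def plan_after_429(header, ceiling=RETRY_CEILING_S, now=None):
    """Return ("wait", seconds) on the primary, or ("advance", 0.0)."""
    wait = retry_after_seconds(header, now)
    if wait is not None and wait <= ceiling:
        return ("wait", wait)
    return ("advance", 0.0)
```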

Cap total turn time near 120 seconds so Slack threads do not look hung; return a human-readable degradation message instead of looping across five providers.
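One way to enforce both budgets is to compute a single turn deadline and hand each provider the smaller of its attempt cap and the time left. A minimal sketch, assuming each provider is wrapped as a hypothetical `call(prompt, timeout_s)` callable that returns text or raises `ProviderError`; the callable itself must honor the timeout it is given.

```python
import time

DEGRADED_MESSAGE = (
    "All model providers are busy right now; your request was not completed. "
    "Please retry in a few minutes."
)


class ProviderError(Exception):
    """Raised by a provider callable on 429, 503, timeout, and similar."""


def run_turn(chain, prompt, turn_budget_s=120.0, attempt_cap_s=30.0,
             clock=time.monotonic):
    """Try each (name, call) pair in order within one overall turn deadline.

    Returns (provider_name, reply), or (None, DEGRADED_MESSAGE) once the
    chain or the turn budget is exhausted.
    """
    deadline = clock() + turn_budget_s
    for name, call in chain[:3]:          # hot path: at most three providers
        remaining = deadline - clock()
        if remaining <= 0:
            break                         # budget spent; stop looping
        try:
            return name, call(prompt, min(attempt_cap_s, remaining))
        except ProviderError:
            continue                      # advance to the next provider
    return None, DEGRADED_MESSAGE
```

The injectable `clock` keeps the budget logic testable without real sleeps.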

Tool-schema compatibility across models

Every model in the chain must accept the same JSON tool definitions your gateway publishes. A secondary model that rejects browser tools will break HTML/CSS pipelines mid-flight. Maintain a staging script that runs three canned tool calls—read file, write patch, list directory—against each provider before you add it to production fallbacks.
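The three canned calls fit in a small harness that reports pass/fail per provider. A minimal sketch: `ask_for_tool(tool_name, args)` is a hypothetical hook you would wire to your own client against the staging gateway port, returning the `(name, args)` the model actually emitted or raising on a schema rejection; the argument shapes are placeholders.

```python
from dataclasses import dataclass

# The three canned calls from the text; argument shapes are hypothetical.
CANNED_CALLS = [
    ("read_file", {"path": "index.html"}),
    ("write_patch", {"path": "styles.css", "patch": "/* smoke test */"}),
    ("list_directory", {"path": "."}),
]


@dataclass
class SmokeResult:
    provider: str
    passed: list
    failed: list

    @property
    def ok(self):
        return not self.failed


def smoke_test(provider, ask_for_tool):
    """Check that a provider emits each canned tool call exactly."""
    result = SmokeResult(provider, [], [])
    for name, args in CANNED_CALLS:
        try:
            got = ask_for_tool(name, args)
        except Exception as exc:  # schema rejected, HTTP error, and so on
            result.failed.append(f"{name}: {exc}")
            continue
        if got == (name, args):
            result.passed.append(name)
        else:
            result.failed.append(f"{name}: got {got!r}")
    return result
```

Only providers whose result reports `ok` belong in the production fallback list.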

When a provider lacks vision but your primary uses images, strip image parts before failover or route image tasks to a vision-capable secondary only.
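Stripping images before failover is a pure transformation over the message list. A minimal sketch, assuming chat messages shaped as `{"role": ..., "content": str | list[part]}` with typed parts; the image part types listed (`image`, `image_url`, `input_image`) cover common vendor formats but are assumptions, as are both helper names.

```python
IMAGE_PART_TYPES = {"image", "image_url", "input_image"}


def has_images(messages):
    """True if any message carries an image content part."""
    return any(
        isinstance(part, dict) and part.get("type") in IMAGE_PART_TYPES
        for msg in messages
        if isinstance(msg.get("content"), list)
        for part in msg["content"]
    )


def strip_images(messages, placeholder="[image omitted: fallback model lacks vision]"):
    """Return a copy of messages with image parts replaced by a text note."""
    cleaned = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            parts = [
                {"type": "text", "text": placeholder}
                if isinstance(part, dict) and part.get("type") in IMAGE_PART_TYPES
                else part
                for part in content
            ]
            msg = {**msg, "content": parts}   # new dict; original untouched
        cleaned.append(msg)
    return cleaned
```

Check `has_images` first: if it is true and the task needs the image, route to the vision-capable secondary instead of stripping.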

Doctor probes and synthetic chats

After routing edits, capture openclaw doctor --json to the change ticket. Send synthetic messages that force each fallback: temporarily revoke the primary key in staging, confirm secondary answers within 10 seconds, restore keys, and verify primary resumes without restart. Watch gateway logs for provider labels on each turn.
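Capturing the doctor report can be scripted so every routing change leaves the same artifact. A minimal sketch that shells out to `openclaw doctor --json`, the command the text names; it assumes the command prints a single JSON document on stdout and exits 0, so confirm that against your release (if it exits non-zero on warnings, drop `check=True` and record the return code). `capture_doctor` and the `doctor-<timestamp>.json` naming are hypothetical.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def capture_doctor(ticket_dir, cmd=("openclaw", "doctor", "--json")):
    """Run the doctor command and save its JSON output for a change ticket.

    Writes doctor-<UTC timestamp>.json under ticket_dir and returns the path.
    Raises CalledProcessError on a non-zero exit and ValueError
    (json.JSONDecodeError) if stdout is not valid JSON.
    """
    proc = subprocess.run(list(cmd), capture_output=True, text=True,
                          check=True, timeout=120)
    report = json.loads(proc.stdout)      # fail loudly on non-JSON output
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out = Path(ticket_dir) / f"doctor-{stamp}.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(report, indent=2, sort_keys=True))
    return out
```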

Align probes with staging vs production profiles so you never test fallbacks against production secrets on a developer laptop.

Staging rollout checklist

Export current openclaw.json and .env fingerprints (not secret values) to git-ignored storage.
Add secondary provider keys with separate billing alerts.
Run tool-call smoke tests per provider on the staging gateway port.
Simulate primary outage by invalidating one key; confirm fallback within SLA.
Promote LaunchAgent plist only after 24 hours of clean staging metrics.
Document rollback: single JSON revert + launchctl kickstart steps.
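The first checklist step can be scripted so fingerprints are comparable between runs. A minimal sketch using only the standard library: it hashes openclaw.json whole, records a truncated SHA-256 per .env key instead of the value, and refuses to read a .env that is not mode 600. The helper names are hypothetical, and the KEY=VALUE parsing is simpler than a full dotenv parser.

```python
import hashlib
import stat
from pathlib import Path


def file_fingerprint(path):
    """SHA-256 of a file's bytes: detects drift without storing contents."""
    return hashlib.sha256(Path(path).expanduser().read_bytes()).hexdigest()


def env_fingerprints(path):
    """Map each KEY in a .env file to a short hash of its value.

    Raises PermissionError unless the file mode is exactly 0o600.
    Records which keys exist and whether a value changed, never the value.
    """
    p = Path(path).expanduser()
    mode = stat.S_IMODE(p.stat().st_mode)
    if mode != 0o600:
        raise PermissionError(f"{p} has mode {oct(mode)}; expected 0o600 (chmod 600)")
    prints = {}
    for raw in p.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        prints[key.strip()] = hashlib.sha256(value.strip().encode()).hexdigest()[:12]
    return prints
```

Unsalted hashes are acceptable here only because API keys are long random strings; for low-entropy values, switch to `hmac` with a locally held key.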

For teams adding Zhipu GLM as a cost-aware secondary, see our GLM (Z.AI) provider guide for zai/* model refs and Coding Plan endpoints.

FAQ

Should failover models use the same tool definitions?

Yes—test tool calls on every provider before adding it to the chain.

Does failover fix 429 automatically?

Only if the secondary has separate quota; otherwise backoff is still required.

How often should I rerun doctor?

After every routing or secret change, and weekly on production cron.

Renting an Apple Silicon Mac mini through MacHTML keeps provider routing tests on the same WebKit and Node builds your executives use—not a Linux container approximating macOS. Nodes expose SSH for scripted chat probes and optional VNC when Keychain prompts block API calls. Idle power commonly sits near 6–12 W, so a week-long failover drill costs less than an overnight outage where every channel bot answers “provider error.”

Published pricing near $16.9 per day beats buying idle metal between releases. When the drill ends, stop the instance; your routing tables remain in git while CapEx does not accumulate across 36 months.

