# Offline Mode
How CoreSDK behaves when the control plane or sidecar is unreachable: `CORESDK_FAIL_MODE=open|closed`, an HMAC-verified local cache, and no application code changes required.
Available in Phase 1b. Offline mode relies on the sidecar daemon's local cache; Phase 1a (Rust crate only) users access this functionality via `coresdk-resilience` directly.
CoreSDK's sidecar daemon maintains a local HMAC-verified cache of policies, JWT signing keys, and configuration. When the control plane becomes unreachable — due to a network partition, a rolling deploy, or a cloud outage — the sidecar switches to this cache automatically. Your application continues to authenticate requests and evaluate policies without any code changes.
## How it works
```text
Normal operation
──────────────────────────────────────────────────
Application → Sidecar → Control plane
                 ↑
        writes to local cache
        (HMAC-SHA256 signed)

Offline / partitioned
──────────────────────────────────────────────────
Application → Sidecar → ✗ Control plane (unreachable)
                 ↑
        reads from local cache
        (signature verified on every read)
        logs warning every sync interval
```

The sidecar detects a partition when a sync attempt times out or returns a non-2xx response. From that point it operates entirely from the local cache until the control plane becomes reachable again, at which point it re-syncs automatically without a restart.
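The detection rule (a timeout or non-2xx response marks a partition; a later success refreshes the cache and ends it) can be sketched as a single sync step. The names `fetch_bundle` and the plain-dict cache are illustrative stand-ins, not the sidecar's actual internals:

```python
def sync_once(fetch_bundle, cache: dict) -> bool:
    """One sync attempt against the control plane.

    `fetch_bundle` returns (http_status, blob) on a completed request
    and raises TimeoutError when the attempt times out. Returns True
    when the control plane is reachable, False when the sidecar should
    keep operating from the local cache.
    """
    try:
        status, blob = fetch_bundle()
    except TimeoutError:
        return False                  # sync timed out → partitioned
    if not 200 <= status < 300:
        return False                  # non-2xx response → partitioned
    cache["bundle"] = blob            # success → refresh cache, re-synced
    return True
```

Note that a failed sync never touches the cache: the last known-good bundle keeps serving until a successful sync replaces it.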
## What the cache contains
| Data | Used for |
|---|---|
| JWT public keys (JWKS) | Verifying inbound JWTs |
| Rego policy bundle | Policy evaluation |
| SDK configuration | Feature flags, rate limits, tenant config |
| Tenant roster | Multi-tenancy isolation |
All four categories continue to work in offline mode.
## Cache integrity
HMAC keys are distributed to the sidecar via the mTLS-authenticated channel — never written to config files or environment variables. Every cache read verifies the HMAC-SHA256 signature of the stored blob. A tampered or corrupted entry is rejected and the fail mode is applied.
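The per-read check can be illustrated with Python's standard `hmac` module. The key handling and blob layout here are simplified assumptions, not the sidecar's actual on-disk format:

```python
import hashlib
import hmac

def sign_entry(key: bytes, blob: bytes) -> bytes:
    """HMAC-SHA256 tag computed when a cache entry is written."""
    return hmac.new(key, blob, hashlib.sha256).digest()

def verify_entry(key: bytes, blob: bytes, tag: bytes) -> bool:
    """Recompute the tag on every read; the constant-time comparison
    rejects tampered or corrupted entries."""
    return hmac.compare_digest(sign_entry(key, blob), tag)
```

A failed verification rejects the entry, and the configured fail mode is applied.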
## Configuring fail mode
`CORESDK_FAIL_MODE` controls what happens when the sidecar itself is unreachable from the application process (as distinct from the control plane being partitioned).
| Mode | Behavior |
|---|---|
| `open` (default) | Requests pass through; the partition is recorded in telemetry |
| `closed` | Requests are rejected with `503 Service Unavailable` |
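The two modes reduce to one branch in the request path. A minimal sketch — the name `check_with_sidecar` and the exception types are illustrative, not CoreSDK's API:

```python
def authorize(check_with_sidecar, fail_mode: str = "open") -> bool:
    """Apply fail-mode semantics when the sidecar is unreachable.

    `check_with_sidecar` returns the normal True/False auth decision
    and raises ConnectionError when the sidecar cannot be reached.
    """
    try:
        return check_with_sidecar()
    except ConnectionError:
        if fail_mode == "closed":
            # surfaces to the caller as 503 Service Unavailable
            raise PermissionError("sidecar unreachable; failing closed")
        return True  # fail open: pass through, record in telemetry
```

Note that fail mode only governs the sidecar-unreachable case; a normal deny from the sidecar is rejected in either mode.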
Set via environment variable (no code change required):

```shell
export CORESDK_FAIL_MODE=closed
```

Or in `SDKConfig`:
```python
from coresdk import CoreSDKClient, SDKConfig

_sdk = CoreSDKClient(SDKConfig(
    sidecar_addr="127.0.0.1:50051",
    tenant_id="acme",
    service_name="orders-api",
    fail_mode="closed",  # "open" (default) or "closed"
))
```

Or via `SDKConfig.from_env()`, which reads `CORESDK_FAIL_MODE` automatically:

```python
_sdk = CoreSDKClient(SDKConfig.from_env())
```

## Choosing the right fail mode
Use open (the default) for services where availability outweighs strict security enforcement — public read APIs, health checks, internal tooling. Use closed for surfaces that process financial transactions, modify sensitive data, or are subject to compliance requirements where unauthenticated access is never acceptable.
## Cache persistence across restarts
The cache is written to disk on every successful sync and survives sidecar restarts. If the sidecar starts while the control plane is unreachable, it loads the last known-good cache and begins serving immediately.
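Crash-safe persistence of this kind is typically done with a write-to-temp-then-rename pattern. The sketch below assumes a JSON file layout purely for illustration; it is not the sidecar's real on-disk format:

```python
import json
import os
import tempfile

def persist_cache(path: str, entries: dict) -> None:
    """Write the cache atomically: a temp file in the same directory,
    then a rename, so a crash mid-write never leaves a truncated
    cache on disk."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp_file:
            json.dump(entries, tmp_file)
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp_path)
        raise
```

The rename is what guarantees a restart always finds either the previous cache or the new one, never a half-written file.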
## Sidecar warning logs during a partition
Every sync interval (default 30 seconds, configurable via `CORESDK_SYNC_INTERVAL_SECONDS`) the sidecar emits a structured warning:
```text
level=warn msg="control plane unreachable — operating from cache"
  partition_duration_seconds=142
  cache_age_seconds=142
  cache_valid=true
  policies_cached=4
  jwks_cached=2
  next_retry_in_seconds=30
```

These are emitted at WARN level. Configure your log aggregator to alert on `control plane unreachable` for extended partitions.
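For alerting, the `key=value` fields parse with a few lines of Python. This parser is a convenience sketch, not part of CoreSDK:

```python
import shlex

def parse_kv_log(line: str) -> dict:
    """Split a structured key=value log line; shlex honors quoted values."""
    fields = {}
    for token in shlex.split(line):
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return fields
```

An aggregator rule could then fire when `partition_duration_seconds` exceeds a chosen threshold rather than on the first transient blip.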
## Testing offline behavior locally
**Step 1 — seed the cache**

```shell
coresdk-sidecar start --log-level=debug
# Wait for: level=info msg="sync complete" policies=4 jwks=2
```

**Step 2 — simulate a partition**
```shell
# macOS
echo "block drop out proto tcp from any to api.coresdk.io" \
  | sudo pfctl -ef -

# Linux
sudo iptables -A OUTPUT -d api.coresdk.io -j DROP

# or stop the local control plane
docker compose stop control-plane
```

**Step 3 — verify auth still works from cache**
```shell
curl -H "Authorization: Bearer $VALID_JWT" http://localhost:8080/api/orders
# → 200 OK (JWT verified from cached JWKS)

curl http://localhost:7700/status | jq .
# → { "partitioned": true, "cache_valid": true, ... }
```

**Step 4 — restore and confirm re-sync**
```shell
sudo pfctl -d                                       # macOS
# or
sudo iptables -D OUTPUT -d api.coresdk.io -j DROP   # Linux

curl http://localhost:7700/status | jq .partitioned
# → false
```

### Testing closed fail mode
```shell
CORESDK_FAIL_MODE=closed python -m your_app &

# Stop the sidecar process (not the control plane)
coresdk-sidecar stop

curl http://localhost:8080/api/orders
# → 503 Service Unavailable
```

## Environment variable reference
| Variable | Default | Description |
|---|---|---|
| `CORESDK_FAIL_MODE` | `open` | `open` or `closed` |
| `CORESDK_SIDECAR_ADDR` | `127.0.0.1:50051` | Sidecar address |
| `CORESDK_SYNC_INTERVAL_SECONDS` | `30` | Control plane sync interval |
| `CORESDK_SIDECAR_PORT` | `7700` | Sidecar status HTTP port |
## Next steps
- **Resilience Primitives** — HMAC cache details, circuit breaker, retry
- **Error Handling** — how `503` and auth failures surface to callers
- **TLS & mTLS** — HMAC keys distributed over the mTLS channel