Offline Mode

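How CoreSDK behaves when the control plane or the sidecar is unreachable. The sidecar serves from an HMAC-verified local cache during a control plane partition, and CORESDK_FAIL_MODE (open or closed) decides what happens when the sidecar itself cannot be reached. No application code changes are required.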

Available in Phase 1b. Offline mode relies on the sidecar daemon's local cache. Phase 1a (Rust crate only) users access this via coresdk-resilience directly.

CoreSDK's sidecar daemon maintains a local HMAC-verified cache of policies, JWT signing keys, and configuration. When the control plane becomes unreachable — due to a network partition, a rolling deploy, or a cloud outage — the sidecar switches to this cache automatically. Your application continues to authenticate requests and evaluate policies without any code changes.

How it works

Normal operation
──────────────────────────────────────────────────
Application → Sidecar → Control plane

         writes to local cache
         (HMAC-SHA256 signed)

Offline / partitioned
──────────────────────────────────────────────────
Application → Sidecar → ✗ Control plane (unreachable)

         reads from local cache
         (signature verified on every read)
         logs warning every sync interval

The sidecar detects a partition when a sync attempt times out or returns a non-2xx response. From that point it operates entirely from the local cache until the control plane becomes reachable again, at which point it re-syncs automatically without a restart.
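The detection rule above can be sketched as a small state machine. This is a minimal illustration, not the sidecar's actual implementation; `SyncResult` and `PartitionTracker` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncResult:
    """Outcome of one sync attempt (hypothetical shape)."""
    status: Optional[int]            # HTTP status code, or None if the attempt timed out
    payload: Optional[dict] = None   # fresh data returned on success


class PartitionTracker:
    """Tracks whether requests should be served from the last known-good cache."""

    def __init__(self) -> None:
        self.partitioned = False
        self.cache: Optional[dict] = None

    def on_sync(self, result: SyncResult) -> None:
        ok = result.status is not None and 200 <= result.status < 300
        if ok:
            # Successful sync: refresh the cache and leave partition mode.
            self.cache = result.payload
            self.partitioned = False
        else:
            # Timeout or non-2xx response: keep the existing cache untouched.
            self.partitioned = True

    def read(self) -> Optional[dict]:
        # Reads always come from the cache, whether partitioned or not.
        return self.cache
```

Because a successful sync simply clears the flag, recovery needs no restart, which matches the behavior described above.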

What the cache contains

| Data | Used for |
| --- | --- |
| JWT public keys (JWKS) | Verifying inbound JWTs |
| Rego policy bundle | Policy evaluation |
| SDK configuration | Feature flags, rate limits, tenant config |
| Tenant roster | Multi-tenancy isolation |

All four categories continue to work in offline mode.

Cache integrity

HMAC keys are distributed to the sidecar via the mTLS-authenticated channel — never written to config files or environment variables. Every cache read verifies the HMAC-SHA256 signature of the stored blob. A tampered or corrupted entry is rejected and the fail mode is applied.
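To make the verify-on-read step concrete, here is a minimal sketch using Python's standard `hmac` module. The function names and the way the signature is stored alongside the blob are assumptions for illustration; the sidecar's on-disk format is not documented here.

```python
import hashlib
import hmac
from typing import Optional


def sign_blob(key: bytes, blob: bytes) -> bytes:
    """Compute the HMAC-SHA256 signature stored next to a cache entry."""
    return hmac.new(key, blob, hashlib.sha256).digest()


def read_verified(key: bytes, blob: bytes, signature: bytes) -> Optional[bytes]:
    """Return the blob only if its signature matches; None means reject the entry."""
    expected = sign_blob(key, blob)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading bytes of a forged signature were correct.
    if not hmac.compare_digest(expected, signature):
        return None
    return blob
```

A `None` result corresponds to the "tampered or corrupted entry is rejected" case, after which the caller would apply the configured fail mode.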

Configuring fail mode

CORESDK_FAIL_MODE controls what happens when the sidecar itself is unreachable from the application process (distinct from the control plane being partitioned).

| Mode | Behavior |
| --- | --- |
| open (default) | Requests pass through; the partition is recorded in telemetry |
| closed | Requests are rejected with 503 Service Unavailable |
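The two modes amount to a single decision taken when the application cannot reach the sidecar. The sketch below models that decision; `Decision` and `decide_when_sidecar_unreachable` are hypothetical names, not part of the CoreSDK API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    allow: bool                      # whether the request continues to the handler
    status: Optional[int] = None     # HTTP status returned when the request is rejected
    record_telemetry: bool = False   # whether the event is recorded in telemetry


def decide_when_sidecar_unreachable(fail_mode: str) -> Decision:
    """Map a fail mode to the behavior listed in the table above."""
    if fail_mode == "open":
        # Fail open: let the request through and record the event.
        return Decision(allow=True, record_telemetry=True)
    if fail_mode == "closed":
        # Fail closed: reject with 503 Service Unavailable.
        return Decision(allow=False, status=503)
    raise ValueError(f"unknown fail mode: {fail_mode!r}")
```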

Set via environment variable (no code change required):

export CORESDK_FAIL_MODE=closed

Or in SDKConfig:

from coresdk import CoreSDKClient, SDKConfig

_sdk = CoreSDKClient(SDKConfig(
    sidecar_addr="127.0.0.1:50051",
    tenant_id="acme",
    service_name="orders-api",
    fail_mode="closed",  # "open" (default) or "closed"
))

Or via SDKConfig.from_env() which reads CORESDK_FAIL_MODE automatically:

_sdk = CoreSDKClient(SDKConfig.from_env())

Choosing the right fail mode

Use open (the default) for services where availability outweighs strict security enforcement — public read APIs, health checks, internal tooling. Use closed for surfaces that process financial transactions, modify sensitive data, or are subject to compliance requirements where unauthenticated access is never acceptable.

Cache persistence across restarts

The cache is written to disk on every successful sync and survives sidecar restarts. If the sidecar starts while the control plane is unreachable, it loads the last known-good cache and begins serving immediately.
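A write-on-sync, load-on-start cycle might look like the sketch below. The JSON format, file path handling, and the atomic-rename technique are assumptions chosen for illustration; the real sidecar also verifies the HMAC signature on load, which is omitted here.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_cache(path: Path, snapshot: dict) -> None:
    """Write the snapshot atomically so a crash cannot leave a half-written file."""
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(snapshot, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace swaps the file in one step on the same filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_cache(path: Path) -> Optional[dict]:
    """Load the last known-good snapshot, or None if it is missing or unreadable."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
```

Calling `save_cache` after every successful sync and `load_cache` at startup gives the "serve immediately from the last known-good cache" behavior described above.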

Sidecar warning logs during a partition

Every sync interval (default 30 seconds, configurable via CORESDK_SYNC_INTERVAL_SECONDS) the sidecar emits a structured warning:

level=warn msg="control plane unreachable — operating from cache"
  partition_duration_seconds=142
  cache_age_seconds=142
  cache_valid=true
  policies_cached=4
  jwks_cached=2
  next_retry_in_seconds=30

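These warnings are emitted at WARN level. To catch extended partitions, configure your log aggregator to alert on the message text "control plane unreachable", for example when partition_duration_seconds exceeds a threshold you choose.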
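If your log aggregator supports custom rules, the alert condition could be expressed roughly as follows. This sketch assumes the warning's fields arrive as `key=value` pairs as shown above; `parse_fields`, `should_alert`, and the 300-second threshold are illustrative choices, not CoreSDK defaults.

```python
import shlex
from typing import Dict


def parse_fields(line: str) -> Dict[str, str]:
    """Split a key=value log record into a dict, honoring quoted values."""
    fields: Dict[str, str] = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def should_alert(line: str, max_partition_seconds: int = 300) -> bool:
    """Alert only on partition warnings that have lasted at least the threshold."""
    fields = parse_fields(line)
    if fields.get("level") != "warn":
        return False
    if not fields.get("msg", "").startswith("control plane unreachable"):
        return False
    return int(fields.get("partition_duration_seconds", "0")) >= max_partition_seconds
```

Alerting on duration rather than on every warning avoids paging for brief partitions that resolve within a sync interval or two.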

Testing offline behavior locally

Step 1 — seed the cache

coresdk-sidecar start --log-level=debug
# Wait for: level=info msg="sync complete" policies=4 jwks=2

Step 2 — simulate a partition

# macOS
echo "block drop out proto tcp from any to api.coresdk.io" \
  | sudo pfctl -ef -

# Linux
sudo iptables -A OUTPUT -d api.coresdk.io -j DROP

# or stop the local control plane
docker compose stop control-plane

Step 3 — verify auth still works from cache

curl -H "Authorization: Bearer $VALID_JWT" http://localhost:8080/api/orders
# → 200 OK (JWT verified from cached JWKS)

curl http://localhost:7700/status | jq .
# → { "partitioned": true, "cache_valid": true, ... }

Step 4 — restore and confirm re-sync

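sudo pfctl -d          # macOS (this disables pf entirely)
# or
sudo iptables -D OUTPUT -d api.coresdk.io -j DROP   # Linux
# or, if you stopped the local control plane
docker compose start control-plane

# Re-sync may take up to one sync interval (default 30 seconds)
curl http://localhost:7700/status | jq .partitioned
# → false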
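To script this check instead of re-running curl by hand, a small poller can wait for the status endpoint to report recovery. This sketch assumes the `/status` response is JSON with a boolean `partitioned` field, as in the output above; the function names, the 90-second timeout, and the 5-second poll interval are arbitrary illustrative choices.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_status(url: str = "http://localhost:7700/status") -> dict:
    """Fetch and decode the sidecar status document."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_for_resync(
    fetch: Callable[[], dict] = fetch_status,
    timeout_seconds: float = 90.0,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until partitioned is false; return False if the timeout elapses first."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            if fetch().get("partitioned") is False:
                return True
        except OSError:
            pass  # status endpoint not reachable yet; try again
        if time.monotonic() >= deadline:
            return False
        sleep(poll_seconds)
```

The `fetch` and `sleep` parameters exist so the loop can be exercised in tests without a running sidecar.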

Testing closed fail mode

CORESDK_FAIL_MODE=closed python -m your_app &

# Stop the sidecar process (not the control plane)
coresdk-sidecar stop

curl http://localhost:8080/api/orders
# → 503 Service Unavailable

Environment variable reference

| Variable | Default | Description |
| --- | --- | --- |
| CORESDK_FAIL_MODE | open | open or closed |
| CORESDK_SIDECAR_ADDR | 127.0.0.1:50051 | Sidecar address |
| CORESDK_SYNC_INTERVAL_SECONDS | 30 | Control plane sync interval |
| CORESDK_SIDECAR_PORT | 7700 | Sidecar status HTTP port |
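For reference, the sketch below resolves these four variables with the defaults from the table. It is a stand-in for illustration only: `EnvSettings` and `load_settings` are hypothetical names and do not describe how `SDKConfig.from_env()` is implemented.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class EnvSettings:
    fail_mode: str = "open"
    sidecar_addr: str = "127.0.0.1:50051"
    sync_interval_seconds: int = 30
    sidecar_port: int = 7700


def load_settings(env: Mapping[str, str] = os.environ) -> EnvSettings:
    """Read the documented variables, falling back to the documented defaults."""
    fail_mode = env.get("CORESDK_FAIL_MODE", "open")
    if fail_mode not in ("open", "closed"):
        raise ValueError(f"CORESDK_FAIL_MODE must be 'open' or 'closed', got {fail_mode!r}")
    return EnvSettings(
        fail_mode=fail_mode,
        sidecar_addr=env.get("CORESDK_SIDECAR_ADDR", "127.0.0.1:50051"),
        sync_interval_seconds=int(env.get("CORESDK_SYNC_INTERVAL_SECONDS", "30")),
        sidecar_port=int(env.get("CORESDK_SIDECAR_PORT", "7700")),
    )
```

Rejecting unknown fail modes at startup is a design choice made in this sketch: a typo such as `close` would otherwise silently fall back to whatever the default behavior is.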
