Multi-Tenant RAG System
Build AskAcme — a production-ready RAG API serving multiple enterprise customers from one deployment, with CoreSDK enforcing JWT auth, tenant isolation, RBAC, and PII gating.
Building a Secure Multi-Tenant RAG System
What you'll build: AskAcme — a Retrieval-Augmented Generation (RAG) API that serves three enterprise
customers from one FastAPI deployment. Each customer's documents are isolated, roles control who can
read vs. ingest vs. delete, and PII documents are hidden behind an explicit pii_access role.
CoreSDK's middleware validates every request and exposes the JWT claims; the tenant, role, and PII checks are short helpers built on those claims.
The Story
Acme Intelligence has three enterprise customers:
| Customer | Knowledge Base | Users |
|---|---|---|
| acme-corp | Financial reports, board decks | CFO, analysts |
| globex | Engineering wikis, runbooks | Devs, SREs |
| initech | HR policies, employee handbook | HR, all staff |
They share one deployment. The challenge: every customer's data must be invisible to every other
customer — even if someone sends the wrong token. Certain documents (salary data, SSNs) must be
hidden unless the user has a specific pii_access role.
CoreSDK enforces:
- JWT authentication on every route
- Tenant isolation: tenant_id comes from JWT claims, not from the request body
- Role-based access for write operations (editor, admin)
- PII document gating at retrieval time
- Fail-open mode — service stays up if the sidecar restarts
Architecture
Browser / API Client
│ Authorization: Bearer <JWT>
▼
┌──────────────────────────────────┐
│ FastAPI App │
│ │
│ CoreSDKMiddleware │ ← validates JWT on every request
│ │ │
│ ▼ │
│ /query /ingest /documents │
│ │ │
│ ▼ │
│ VectorStore (per-tenant ns) │ ← tenant_id from JWT claims
│ │ │
│ ▼ │
│ LLM (swap in Claude / GPT-4) │
└──────────────────────────────────┘
│ gRPC :50051
▼
┌─────────────────┐
│ CoreSDK Sidecar│ ← JWT validation, policy eval, audit
└─────────────────┘
The key insight: tenant_id comes from JWT claims, not from the request body. A user cannot query
a different tenant by changing a URL parameter, because the sidecar cryptographically validates the token and
the middleware extracts the tenant from it.
Prerequisites
pip install "coresdk[fastapi]" uvicorn httpx
For a local sidecar (optional; the service runs in fail-open mode without one):
docker run -p 50051:50051 ghcr.io/coresdk-dev/sidecar:latest
Quickstart
1. Clone the example
git clone https://github.com/coresdk-dev/examples
cd examples/python
2. Run with MockSDK (no sidecar needed)
The fastest way to start. MockSDK stands in for the real sidecar — perfect for local dev and CI.
CORESDK_USE_MOCK=true python 07_multitenant_rag.py
AskAcme RAG — End-to-End Test Suite
==================================================
▸ Ops Endpoints
[PASS] GET /healthz → 200 (bypasses auth)
[PASS] POST /query without token → 401 Missing Authorization header
▸ acme-corp — Financial Knowledge Base
[PASS] Query 'revenue growth' → 200 sources=['Q4 2024 Financial Report', ...]
[PASS] Query 'salary' without pii_access → PII doc hidden
[PASS] List docs → acme-corp only (2 visible, 1 PII hidden)
▸ Tenant Isolation
[PASS] acme-corp cannot see globex runbook
[PASS] initech cannot see globex architecture doc
▸ Role-Based Access Control
[PASS] POST /ingest without 'editor' role → 403
[PASS] DELETE without 'admin' role → 403
17/17 passed
3. Run as a live server
CORESDK_USE_MOCK=true uvicorn 07_multitenant_rag:app --reload --port 8000
Try it:
# Health check — no auth required
curl http://localhost:8000/healthz
# Query without token → 401
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{"query": "revenue growth"}'
# {"type":"...unauthorized","title":"Unauthorized","status":401}
# Query with token (fail-open, any token accepted without sidecar)
curl -X POST http://localhost:8000/query \
-H "Authorization: Bearer acme-token" \
-H "X-Tenant-ID: acme-corp" \
-H "Content-Type: application/json" \
-d '{"query": "revenue growth"}'
# {"tenant_id":"acme-corp","query":"revenue growth","answer":"...","sources":[...]}
# Different tenant — sees only their own docs
curl -X POST http://localhost:8000/query \
-H "Authorization: Bearer globex-token" \
-H "X-Tenant-ID: globex" \
-H "Content-Type: application/json" \
-d '{"query": "database failover"}'
# {"sources":[{"id":"globex-002","title":"Incident Runbook: Database Failover",...}]}
4. Connect to a real sidecar
export CORESDK_SIDECAR_ADDR=localhost:50051
export CORESDK_FAIL_MODE=open # or "closed" to deny on sidecar error
export CORESDK_JWKS_URI=https://your-idp.example/.well-known/jwks.json
uvicorn 07_multitenant_rag:app --reload
With a real sidecar, tenant_id comes from the validated JWT, not the X-Tenant-ID header.
Code Walkthrough
Step 1 — One line of middleware
from fastapi import FastAPI
from coresdk import SDK
from coresdk.middleware.fastapi import CoreSDKMiddleware

sdk = SDK.from_env()  # reads CORESDK_SIDECAR_ADDR, CORESDK_FAIL_MODE, etc.
app = FastAPI()
app.add_middleware(
    CoreSDKMiddleware,
    sdk=sdk,
    # Ops and docs routes skip authentication
    exclude_paths=["/healthz", "/readyz", "/docs", "/openapi.json"],
)
Every request that hits a protected route has already been validated by the sidecar.
JWT claims — including tenant_id and roles — are attached to request.state.coresdk_user.
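The later steps call a get_claims(request) helper that the walkthrough does not show. A minimal sketch of what it might look like follows. It assumes request.state.coresdk_user is either a dict or a simple object with claim attributes; the actual shape is not specified here, so treat both branches as assumptions. A SimpleNamespace stands in for the FastAPI Request so the snippet runs on its own.

```python
from types import SimpleNamespace
from typing import Any


def get_claims(request: Any) -> dict:
    """Return the caller's claims as a plain dict, or {} when none are attached."""
    user = getattr(request.state, "coresdk_user", None)
    if user is None:
        return {}  # e.g. fail-open with no validated token
    if isinstance(user, dict):
        return dict(user)
    # Assumption: otherwise the middleware attached an object with attributes.
    attrs = getattr(user, "__dict__", {})
    return {k: v for k, v in attrs.items() if not k.startswith("_")}


# Stand-ins for FastAPI Request objects
authenticated = SimpleNamespace(
    state=SimpleNamespace(
        coresdk_user={"sub": "alice", "tenant_id": "acme-corp", "roles": ["editor"]}
    )
)
anonymous = SimpleNamespace(state=SimpleNamespace())

print(get_claims(authenticated))
print(get_claims(anonymous))
```

Returning an empty dict rather than None lets callers write claims.get("roles", []) without a separate None check.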
Step 2 — Tenant from claims, not the request
from fastapi import Request

def get_tenant(request: Request, claims: dict | None = None) -> str:
    if claims:
        t = claims.get("tenant_id", "")
        if t:
            return t
    # Fallback when no validated claims are present (fail-open or mock runs).
    # The header is caller-supplied, so treat this path as dev-only.
    return request.headers.get("X-Tenant-ID", "unknown")
In production, claims["tenant_id"] is cryptographically signed in the JWT.
An attacker cannot change it by modifying a header or URL parameter.
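That guarantee holds only while the tenant comes from validated claims. One way to make the header fallback explicit is to refuse it unless the caller opts in. The sketch below is framework-free and takes the headers as a plain mapping; resolve_tenant, TenantResolutionError, and allow_header_fallback are hypothetical names, not part of CoreSDK.

```python
from typing import Any, Mapping, Optional


class TenantResolutionError(Exception):
    """Raised when no trustworthy tenant can be determined."""


def resolve_tenant(
    claims: Optional[Mapping[str, Any]],
    headers: Mapping[str, str],
    allow_header_fallback: bool = False,
) -> str:
    """Prefer the signed tenant_id claim; use the header only when opted in."""
    tenant = (claims or {}).get("tenant_id")
    if isinstance(tenant, str) and tenant:
        return tenant
    if allow_header_fallback:
        # Caller-supplied value: acceptable for local development only.
        header_tenant = headers.get("X-Tenant-ID", "")
        if header_tenant:
            return header_tenant
    raise TenantResolutionError("no tenant_id in validated claims")


# The claim wins even when a conflicting header is sent.
print(resolve_tenant({"tenant_id": "acme-corp"}, {"X-Tenant-ID": "globex"}))
```

In a route handler the error would typically be mapped to a 401 or 403 response. Real HTTP header lookups are case-insensitive, which a plain dict is not, so pass the framework's header object rather than a copied dict.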
Step 3 — Scoped vector store search
@app.post("/query")
async def query(body: dict, request: Request):
    claims = get_claims(request)
    tenant_id = get_tenant(request, claims)  # ← from JWT
    has_pii_access = "pii_access" in claims.get("roles", [])
    docs = store.search(
        tenant_id,  # ← only searches this tenant's namespace
        body["query"],
        top_k=3,
        include_pii=has_pii_access,
    )
    answer = llm(body["query"], docs)
    return {"tenant_id": tenant_id, "answer": answer, "sources": [...]}
The vector store never searches across tenants. Even a crafted request can't cross namespace boundaries.
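The store object itself is not shown in this walkthrough. As a rough sketch of the idea, the in-memory store below keeps one document list per tenant and scores by keyword count. The InMemoryStore class, the Document fields, and the sample content are assumptions made for illustration; the example's real store may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    id: str
    title: str
    content: str
    tags: list[str] = field(default_factory=list)
    contains_pii: bool = False


class InMemoryStore:
    """Keyword search over per-tenant document lists (no embeddings)."""

    def __init__(self) -> None:
        self._by_tenant: dict[str, list[Document]] = {}

    def add(self, tenant_id: str, doc: Document) -> None:
        self._by_tenant.setdefault(tenant_id, []).append(doc)

    def search(
        self, tenant_id: str, query: str, top_k: int = 3, include_pii: bool = False
    ) -> list[Document]:
        terms = query.lower().split()
        scored = []
        # Only this tenant's list is ever read, so other tenants are unreachable.
        for doc in self._by_tenant.get(tenant_id, []):
            if doc.contains_pii and not include_pii:
                continue
            text = f"{doc.title} {doc.content}".lower()
            score = sum(text.count(term) for term in terms)
            if score > 0:
                scored.append((score, doc))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:top_k]]


store = InMemoryStore()
# Placeholder content, invented for this sketch
store.add("acme-corp", Document("acme-001", "Q4 2024 Financial Report", "Revenue growth summary."))
store.add("acme-corp", Document("acme-003", "Employee Salary Data 2024", "Salary table.", contains_pii=True))
store.add("globex", Document("globex-002", "Incident Runbook: Database Failover", "Database failover steps."))

print([d.id for d in store.search("globex", "database failover")])
print([d.id for d in store.search("acme-corp", "database failover")])
```

Because the tenant ID selects the list before any matching happens, a query string cannot reach another tenant's documents no matter what it contains.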
Step 4 — Role-based write access
from fastapi import Depends, HTTPException, Request

def require_role(role: str):
    """Build a dependency that rejects callers whose claims lack `role`."""
    def _dep(request: Request):
        claims = get_claims(request)
        if role not in claims.get("roles", []):
            raise HTTPException(status_code=403, detail={
                "type": "https://askai.example/errors/forbidden",
                "title": "Forbidden",
                "status": 403,
                "detail": f"Role '{role}' required. Your roles: {claims.get('roles')}",
            })
        return claims
    return _dep

# Ingest: requires the editor role (an admin token must also carry editor)
@app.post("/ingest")
async def ingest(body: dict, request: Request,
                 claims: dict = Depends(require_role("editor"))):
    ...

# Delete: requires admin only
@app.delete("/documents/{doc_id}")
async def delete_document(doc_id: str, request: Request,
                          claims: dict = Depends(require_role("admin"))):
    ...
Roles come from the validated JWT claims, not from a database lookup on every request.
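Note that require_role("editor") checks for one exact role, so an admin token passes /ingest only if it also lists editor. If the intent is "editor or admin", a set-intersection check is one option. The sketch below is framework-free; require_any_role and Forbidden are hypothetical names, not part of CoreSDK or the example.

```python
from typing import Any, Iterable, Mapping


class Forbidden(Exception):
    """Raised when the caller holds none of the allowed roles."""


def require_any_role(claims: Mapping[str, Any], allowed: Iterable[str]) -> None:
    """Pass silently if the caller holds at least one allowed role."""
    held = set(claims.get("roles") or [])
    allowed_set = set(allowed)
    if held.isdisjoint(allowed_set):
        raise Forbidden(f"one of {sorted(allowed_set)} required")


# An admin without the editor role is still allowed to ingest.
require_any_role({"roles": ["admin"]}, ["editor", "admin"])

try:
    require_any_role({"roles": ["viewer"]}, ["editor", "admin"])
except Forbidden as exc:
    print(exc)
```

Inside a FastAPI dependency, the Forbidden exception would be translated into the same 403 problem response that require_role returns.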
Step 5 — PII gating at retrieval
# Flag sensitive documents at ingest time
Document(
    "acme-003",
    "Employee Salary Data 2024",
    "Alice Johnson salary $210,000. SSN: 123-45-6789...",
    tags=["hr", "confidential"],
    contains_pii=True,  # ← hidden unless caller has pii_access role
)

# At search time: filter before returning results
has_pii_access = "pii_access" in claims.get("roles", [])
docs = store.search(tenant_id, query, include_pii=has_pii_access)
A user querying "employee salary" without pii_access gets an empty result, not a 403.
This prevents enumeration attacks (a 403 would tell the attacker the document exists).
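The enumeration argument can be checked directly: for a caller without pii_access, the response should be identical whether or not a matching PII document exists. A small sketch, using a hypothetical search_response helper and plain dicts in place of the real store:

```python
def search_response(docs: list[dict], query: str, has_pii_access: bool) -> dict:
    """Return matching titles, silently skipping PII docs the caller can't see."""
    hits = [
        d["title"]
        for d in docs
        if query.lower() in d["title"].lower()
        and (has_pii_access or not d["contains_pii"])
    ]
    return {"status": 200, "sources": hits}


tenant_with_pii_doc = [{"title": "Employee Salary Data 2024", "contains_pii": True}]
tenant_without_doc: list[dict] = []

hidden = search_response(tenant_with_pii_doc, "salary", has_pii_access=False)
absent = search_response(tenant_without_doc, "salary", has_pii_access=False)

# Same status and same body: the caller learns nothing about existence.
print(hidden == absent)
```

Response bodies are only one channel. Timing or result-count differences could still leak information in a real system, and this sketch does not address them.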
Step 6 — Fail-open vs fail-closed
CORESDK_FAIL_MODE=open # sidecar down → allow (default, keeps service running)
CORESDK_FAIL_MODE=closed # sidecar down → deny all (use in high-security environments)
With fail_mode=open: if the sidecar is unreachable during a deploy or network blip,
the service stays up with empty claims: no roles, and no tenant_id from the JWT, so get_tenant
falls back to the caller-supplied X-Tenant-ID header.
With fail_mode=closed: any sidecar outage returns 401 to every request. Choose this
when data leakage is a higher risk than downtime.
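The two modes amount to a single branch on the error path. The sketch below models that decision in isolation. The authorize function, the SidecarUnavailable exception, and the (status, claims) return shape are hypothetical and only mirror the behaviour described above, not CoreSDK's internals.

```python
from typing import Callable, Optional


class SidecarUnavailable(Exception):
    """Stand-in for a transport error while contacting the sidecar."""


def authorize(
    token: str,
    validate: Callable[[str], Optional[dict]],
    fail_mode: str = "open",
) -> tuple[int, dict]:
    """Return (status, claims) for a request, honouring the fail mode."""
    if fail_mode not in ("open", "closed"):
        raise ValueError(f"unknown fail mode: {fail_mode!r}")
    try:
        claims = validate(token)
    except SidecarUnavailable:
        if fail_mode == "closed":
            return 401, {}  # deny everything while the sidecar is down
        return 200, {}      # proceed, but with no roles and no tenant claim
    if claims is None:
        return 401, {}      # sidecar reachable, token rejected
    return 200, claims


def sidecar_down(token: str) -> Optional[dict]:
    raise SidecarUnavailable("connection refused")


print(authorize("any-token", sidecar_down, fail_mode="open"))
print(authorize("any-token", sidecar_down, fail_mode="closed"))
```

In open mode the request proceeds with empty claims, so every role check downstream fails while reads fall back to whatever tenant the header names. That is the trade-off to weigh when choosing a mode.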
Testing Strategy
Unit tests with MockSDK (no sidecar)
from fastapi import FastAPI
from coresdk.middleware.fastapi import CoreSDKMiddleware
from coresdk.testing import MockSDK
from coresdk._types import AuthDecision, Claims

sdk = MockSDK(default_allow=True)

# Give a specific token real claims (role + tenant)
sdk.set_token_decision("admin-token", AuthDecision(
    allowed=True,
    claims=Claims(
        sub="alice",
        tenant_id="acme-corp",
        roles=["admin", "editor", "pii_access"],
        exp=9999999999,  # far-future Unix timestamp (9999999 would be in 1970)
    ),
    reason="",
))

app = FastAPI()
app.add_middleware(CoreSDKMiddleware, sdk=sdk)
Integration tests with real sidecar
CORESDK_SIDECAR_ADDR=localhost:50051 pytest tests/
Built-in test harness
CORESDK_USE_MOCK=true python 07_multitenant_rag.py
# Runs 17 assertions across all three tenant personas
Swap in a Real LLM
The example ships with a keyword-matching stub. Replace fake_llm() with any provider:
# Anthropic Claude
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def llm(query: str, docs: list[Document]) -> str:
    context = "\n\n".join(f"[{d.title}]\n{d.content}" for d in docs)
    msg = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        system=f"Answer using only this context:\n{context}",
        messages=[{"role": "user", "content": query}],
    )
    return msg.content[0].text

# OpenAI GPT-4o (alternative to the Claude version above; define only one llm())
from openai import OpenAI

oai = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm(query: str, docs: list[Document]) -> str:
    context = "\n\n".join(f"[{d.title}]\n{d.content}" for d in docs)
    resp = oai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    # message.content can be None, so fall back to an empty string
    return resp.choices[0].message.content or ""
Swap in a Real Vector Store
# Pinecone: tenant isolation via namespace
import os

import pinecone

pc = pinecone.Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("askai")

def ingest(tenant_id: str, doc: Document, embedding: list[float]):
    index.upsert(
        vectors=[{
            "id": doc.id,
            "values": embedding,
            "metadata": {"title": doc.title, "pii": doc.contains_pii},
        }],
        namespace=tenant_id,  # ← one namespace per tenant = isolation
    )

def search(tenant_id: str, query_embedding: list[float], include_pii: bool):
    return index.query(
        vector=query_embedding,
        top_k=3,
        namespace=tenant_id,  # ← never crosses tenant boundary
        include_metadata=True,  # return titles along with the matches
        # No filter when PII is allowed; otherwise match only non-PII vectors
        filter=None if include_pii else {"pii": {"$eq": False}},
    )
Environment Variables
| Variable | Default | Description |
|---|---|---|
| CORESDK_SIDECAR_ADDR | localhost:50051 | gRPC sidecar address |
| CORESDK_FAIL_MODE | open | open = allow on sidecar error, closed = deny |
| CORESDK_TENANT_ID | default | Default tenant (overridden by JWT) |
| CORESDK_JWKS_URI | (none) | IdP JWKS endpoint for real JWT validation |
| CORESDK_ENV | production | development enables insecure gRPC channel |
| CORESDK_USE_MOCK | false | true uses MockSDK, no sidecar needed |
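For application code that needs the same settings, for example to log the active fail mode at startup, the variables above can be read into a small typed object. This is a sketch only: CoreSDKSettings and load_settings are hypothetical names, and SDK.from_env() does its own parsing, which may differ in detail. The defaults are taken from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CoreSDKSettings:
    sidecar_addr: str
    fail_mode: str
    tenant_id: str
    jwks_uri: Optional[str]
    env: str
    use_mock: bool


def load_settings(environ: Mapping[str, str] = os.environ) -> CoreSDKSettings:
    """Read CORESDK_* variables, applying the documented defaults."""
    fail_mode = environ.get("CORESDK_FAIL_MODE", "open").lower()
    if fail_mode not in ("open", "closed"):
        raise ValueError(f"CORESDK_FAIL_MODE must be open or closed, got {fail_mode!r}")
    return CoreSDKSettings(
        sidecar_addr=environ.get("CORESDK_SIDECAR_ADDR", "localhost:50051"),
        fail_mode=fail_mode,
        tenant_id=environ.get("CORESDK_TENANT_ID", "default"),
        jwks_uri=environ.get("CORESDK_JWKS_URI") or None,
        env=environ.get("CORESDK_ENV", "production"),
        use_mock=environ.get("CORESDK_USE_MOCK", "false").lower() == "true",
    )


# Passing a dict instead of os.environ keeps the example deterministic.
settings = load_settings({"CORESDK_FAIL_MODE": "closed", "CORESDK_USE_MOCK": "true"})
print(settings)
```

Rejecting unknown fail modes at startup avoids silently running in the default open mode because of a typo.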
Full Source
The complete runnable example is at
examples/python/07_multitenant_rag.py.
git clone https://github.com/coresdk-dev/examples
cd examples/python
CORESDK_USE_MOCK=true python 07_multitenant_rag.py