Multi-Tenant RAG System

Build AskAcme — a production-ready RAG API serving multiple enterprise customers from one deployment, with CoreSDK enforcing JWT auth, tenant isolation, RBAC, and PII gating.

Building a Secure Multi-Tenant RAG System

What you'll build: AskAcme, a Retrieval-Augmented Generation (RAG) API that serves three enterprise customers from one FastAPI deployment. Each customer's documents are isolated, roles control who can read, ingest, or delete, and PII documents are hidden unless the caller holds an explicit pii_access role. CoreSDK validates every request with a single middleware line; the route handlers then apply the validated claims.

The Story

Acme Intelligence has three enterprise customers:

Customer    Knowledge Base                    Users
acme-corp   Financial reports, board decks    CFO, analysts
globex      Engineering wikis, runbooks       Devs, SREs
initech     HR policies, employee handbook    HR, all staff

They share one deployment. The challenge: every customer's data must be invisible to every other customer, even when a caller holding one tenant's token asks for another tenant's documents. Certain documents (salary data, SSNs) must also stay hidden unless the user has a specific pii_access role.

CoreSDK enforces:

  • JWT authentication on every route
  • Tenant isolation — tenant_id from JWT claims, not from the request body
  • Role-based access for write operations (editor, admin)
  • PII document gating at retrieval time
  • Fail-open mode — service stays up if the sidecar restarts

Architecture

Browser / API Client
       │  Authorization: Bearer <JWT>
       ▼
┌──────────────────────────────────┐
│         FastAPI App              │
│                                  │
│  CoreSDKMiddleware               │  ← validates JWT on every request
│       │                          │
│       ▼                          │
│  /query  /ingest  /documents     │
│       │                          │
│       ▼                          │
│  VectorStore (per-tenant ns)     │  ← tenant_id from JWT claims
│       │                          │
│       ▼                          │
│  LLM (swap in Claude / GPT-4)    │
└──────────────────────────────────┘
       │  gRPC :50051
       ▼
┌─────────────────┐
│  CoreSDK Sidecar│  ← JWT validation, policy eval, audit
└─────────────────┘

The key insight: tenant_id comes from JWT claims, not from the request body. A user cannot query a different tenant by changing a URL parameter — the sidecar cryptographically validates the token and the middleware extracts the tenant.

Prerequisites

pip install "coresdk[fastapi]" uvicorn httpx

For a local sidecar (optional — service runs in fail-open without one):

docker run -p 50051:50051 ghcr.io/coresdk-dev/sidecar:latest

Quickstart

1. Clone the example

git clone https://github.com/coresdk-dev/examples
cd examples/python

2. Run with MockSDK (no sidecar needed)

The fastest way to start. MockSDK stands in for the real sidecar — perfect for local dev and CI.

CORESDK_USE_MOCK=true python 07_multitenant_rag.py

AskAcme RAG — End-to-End Test Suite
==================================================

▸ Ops Endpoints
  [PASS]  GET /healthz → 200 (bypasses auth)
  [PASS]  POST /query without token → 401  Missing Authorization header

▸ acme-corp — Financial Knowledge Base
  [PASS]  Query 'revenue growth' → 200  sources=['Q4 2024 Financial Report', ...]
  [PASS]  Query 'salary' without pii_access → PII doc hidden
  [PASS]  List docs → acme-corp only (2 visible, 1 PII hidden)

▸ Tenant Isolation
  [PASS]  acme-corp cannot see globex runbook
  [PASS]  initech cannot see globex architecture doc

▸ Role-Based Access Control
  [PASS]  POST /ingest without 'editor' role → 403
  [PASS]  DELETE without 'admin' role → 403

17/17 passed

3. Run as a live server

CORESDK_USE_MOCK=true uvicorn 07_multitenant_rag:app --reload --port 8000

Try it:

# Health check — no auth required
curl http://localhost:8000/healthz

# Query without token → 401
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "revenue growth"}'
# {"type":"...unauthorized","title":"Unauthorized","status":401}

# Query with token (fail-open, any token accepted without sidecar)
curl -X POST http://localhost:8000/query \
  -H "Authorization: Bearer acme-token" \
  -H "X-Tenant-ID: acme-corp" \
  -H "Content-Type: application/json" \
  -d '{"query": "revenue growth"}'
# {"tenant_id":"acme-corp","query":"revenue growth","answer":"...","sources":[...]}

# Different tenant — sees only their own docs
curl -X POST http://localhost:8000/query \
  -H "Authorization: Bearer globex-token" \
  -H "X-Tenant-ID: globex" \
  -H "Content-Type: application/json" \
  -d '{"query": "database failover"}'
# {"sources":[{"id":"globex-002","title":"Incident Runbook: Database Failover",...}]}

4. Connect to a real sidecar

export CORESDK_SIDECAR_ADDR=localhost:50051
export CORESDK_FAIL_MODE=open        # or "closed" to deny on sidecar error
export CORESDK_JWKS_URI=https://your-idp.example/.well-known/jwks.json

uvicorn 07_multitenant_rag:app --reload

With a real sidecar, tenant_id comes from the validated JWT — not the X-Tenant-ID header.

Code Walkthrough

Step 1 — One line of middleware

from coresdk import SDK
from coresdk.middleware.fastapi import CoreSDKMiddleware

sdk = SDK.from_env()  # reads CORESDK_SIDECAR_ADDR, CORESDK_FAIL_MODE, etc.

app = FastAPI()
app.add_middleware(
    CoreSDKMiddleware,
    sdk=sdk,
    exclude_paths=["/healthz", "/readyz", "/docs", "/openapi.json"],
)

Every request that hits a protected route has already been validated by the sidecar. JWT claims — including tenant_id and roles — are attached to request.state.coresdk_user.
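
The route handlers in this walkthrough call a get_claims(request) helper that is not shown. A minimal sketch of what such a helper might look like follows, assuming the middleware stores either a dict or a plain object on request.state.coresdk_user; the exact type is an assumption, and SimpleNamespace stands in for a real FastAPI request so the snippet runs on its own.

```python
from types import SimpleNamespace


def get_claims(request) -> dict:
    """Return the claims the middleware attached, or {} if there are none."""
    state = getattr(request, "state", None)
    user = getattr(state, "coresdk_user", None)
    if user is None:
        return {}
    # Assumption: the attached value is either a dict or an object with attributes.
    if isinstance(user, dict):
        return dict(user)
    return dict(vars(user))


# Stand-in for a request that passed through the middleware.
authed = SimpleNamespace(state=SimpleNamespace(
    coresdk_user={"sub": "alice", "tenant_id": "acme-corp", "roles": ["editor"]}
))
# Stand-in for a request on an excluded path such as /healthz.
anonymous = SimpleNamespace(state=SimpleNamespace())

print(get_claims(authed)["tenant_id"])  # acme-corp
print(get_claims(anonymous))            # {}
```

Returning an empty dict rather than None keeps later calls such as claims.get("roles", []) safe when no claims are present.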

Step 2 — Tenant from claims, not the request

def get_tenant(request: Request, claims: dict | None = None) -> str:
    if claims:
        t = claims.get("tenant_id", "")
        if t:
            return t
    # Fail-open fallback: header only valid in dev mode
    return request.headers.get("X-Tenant-ID", "unknown")

In production, claims["tenant_id"] is cryptographically signed in the JWT. An attacker cannot change it by modifying a header or URL parameter.

Step 3 — Tenant-scoped retrieval

@app.post("/query")
async def query(body: dict, request: Request):
    claims = get_claims(request)
    tenant_id = get_tenant(request, claims)  # ← from JWT

    has_pii_access = "pii_access" in claims.get("roles", [])
    docs = store.search(
        tenant_id,            # ← only searches this tenant's namespace
        body["query"],
        top_k=3,
        include_pii=has_pii_access,
    )
    answer = llm(body["query"], docs)
    return {"tenant_id": tenant_id, "answer": answer, "sources": [...]}

The vector store never searches across tenants. Even a crafted request can't cross namespace boundaries.
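
The store itself is not shown in this walkthrough. As a rough sketch of the idea, the toy class below keeps one document list per tenant and matches on keywords instead of embeddings. TenantStore, its methods, the simplified Document fields, and the sample document contents are all hypothetical; only the search signature mirrors the call above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    id: str
    title: str
    content: str
    contains_pii: bool = False


class TenantStore:
    """Toy store: one list of documents per tenant namespace."""

    def __init__(self) -> None:
        self._namespaces: dict[str, list[Document]] = defaultdict(list)

    def add(self, tenant_id: str, doc: Document) -> None:
        self._namespaces[tenant_id].append(doc)

    def search(self, tenant_id: str, query: str, top_k: int = 3,
               include_pii: bool = False) -> list[Document]:
        # Only the caller's namespace is read; there is no cross-tenant loop.
        terms = query.lower().split()
        hits = []
        for doc in self._namespaces.get(tenant_id, []):
            if doc.contains_pii and not include_pii:
                continue
            text = f"{doc.title} {doc.content}".lower()
            score = sum(term in text for term in terms)
            if score:
                hits.append((score, doc))
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in hits[:top_k]]


store = TenantStore()
store.add("globex", Document("globex-002", "Incident Runbook: Database Failover",
                             "Promote the replica."))
store.add("acme-corp", Document("acme-001", "Q4 2024 Financial Report",
                                "Revenue growth details."))

print([d.id for d in store.search("globex", "database failover")])     # ['globex-002']
print([d.id for d in store.search("acme-corp", "database failover")])  # []
```

Since tenant_id selects the list before any matching happens, a query can only ever rank documents from the caller's own namespace.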

Step 4 — Role-based write access

def require_role(role: str):
    def _dep(request: Request):
        claims = get_claims(request)
        if role not in claims.get("roles", []):
            raise HTTPException(status_code=403, detail={
                "type": "https://askai.example/errors/forbidden",
                "title": "Forbidden",
                "status": 403,
                "detail": f"Role '{role}' required. Your roles: {claims.get('roles')}",
            })
        return claims
    return _dep

# Ingest: requires the editor role (require_role checks one role, so admins need editor too)
@app.post("/ingest")
async def ingest(body: dict, request: Request,
                 claims: dict = Depends(require_role("editor"))):
    ...

# Delete: requires admin only
@app.delete("/documents/{doc_id}")
async def delete_document(doc_id: str, request: Request,
                           claims: dict = Depends(require_role("admin"))):
    ...

Roles come from the validated JWT claims — not from a database lookup on every request.
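
require_role as written checks for exactly one role. If a route should accept any of several roles (for example editor or admin), one possible variant of the check is sketched below as a plain function, so it runs without FastAPI. The has_any_role name and the sample claims are hypothetical.

```python
def has_any_role(claims: dict, *allowed: str) -> bool:
    """True if the caller holds at least one of the allowed roles."""
    held = set(claims.get("roles") or [])
    return not held.isdisjoint(allowed)


editor = {"sub": "bob", "roles": ["editor"]}
admin = {"sub": "alice", "roles": ["admin"]}
viewer = {"sub": "carol", "roles": []}

print(has_any_role(editor, "editor", "admin"))  # True
print(has_any_role(admin, "editor", "admin"))   # True
print(has_any_role(viewer, "editor", "admin"))  # False
print(has_any_role({}, "editor"))               # False
```

Inside a dependency, the same test would replace the single "role not in claims" check before raising the 403.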

Step 5 — PII gating at retrieval

# Flag sensitive documents at ingest time
Document(
    "acme-003",
    "Employee Salary Data 2024",
    "Alice Johnson salary $210,000. SSN: 123-45-6789...",
    tags=["hr", "confidential"],
    contains_pii=True,  # ← hidden unless caller has pii_access role
)

# At search time — filter before returning results
has_pii_access = "pii_access" in claims.get("roles", [])
docs = store.search(tenant_id, query, include_pii=has_pii_access)

A user querying "employee salary" without pii_access gets an empty result — not a 403. This prevents enumeration attacks (a 403 would tell the attacker the document exists).
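
A small sketch of why the empty result matters: from the caller's side, a hidden PII document and a query that matches nothing produce the same response. The visible and respond helpers, the dict-shaped documents, and the analyst role are assumptions for illustration, not part of the example app.

```python
def visible(docs: list[dict], roles: list[str]) -> list[dict]:
    """Drop PII-flagged documents unless the caller holds pii_access."""
    if "pii_access" in roles:
        return docs
    return [d for d in docs if not d["contains_pii"]]


def respond(matches: list[dict], roles: list[str]) -> dict:
    """Build the response body; an empty source list is still a 200."""
    sources = [d["id"] for d in visible(matches, roles)]
    return {"status": 200, "sources": sources}


salary_matches = [{"id": "acme-003", "contains_pii": True}]
no_matches: list[dict] = []

# Without pii_access, a hidden document and a missing one look the same.
print(respond(salary_matches, ["analyst"]))  # {'status': 200, 'sources': []}
print(respond(no_matches, ["analyst"]))      # {'status': 200, 'sources': []}
print(respond(salary_matches, ["analyst", "pii_access"]))
# {'status': 200, 'sources': ['acme-003']}
```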

Step 6 — Fail-open vs fail-closed

CORESDK_FAIL_MODE=open    # sidecar down → allow (default, keeps service running)
CORESDK_FAIL_MODE=closed  # sidecar down → deny all (use in high-security environments)

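With fail_mode=open: if the sidecar is unreachable during a deploy or network blip, the service stays up with empty claims (no roles, no tenant isolation from JWT). In that state get_tenant falls back to the caller-supplied X-Tenant-ID header, so the tenant is no longer cryptographically bound to the caller.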

With fail_mode=closed: any sidecar outage returns 401 to every request. Choose this when data leakage is a higher risk than downtime.
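
The two modes can be pictured as a single branch around the sidecar call. The sketch below is an assumption about the general shape of that decision, not CoreSDK's actual implementation; authorize and the two stub validators are hypothetical names.

```python
from typing import Callable


def authorize(validate: Callable[[str], dict], token: str,
              fail_mode: str) -> tuple[int, dict]:
    """Return (status, claims) for a request, honoring the fail mode."""
    try:
        return 200, validate(token)
    except ConnectionError:
        if fail_mode == "closed":
            return 401, {}
        # Fail-open: let the request through, but with no claims at all.
        return 200, {}


def sidecar_up(token: str) -> dict:
    return {"tenant_id": "acme-corp", "roles": ["editor"]}


def sidecar_down(token: str) -> dict:
    raise ConnectionError("sidecar unreachable")


print(authorize(sidecar_up, "t", "closed"))
# (200, {'tenant_id': 'acme-corp', 'roles': ['editor']})
print(authorize(sidecar_down, "t", "open"))    # (200, {})
print(authorize(sidecar_down, "t", "closed"))  # (401, {})
```

Note that fail-open never invents claims: role-gated routes still return 403 because the roles list is empty.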

Testing Strategy

Unit tests with MockSDK (no sidecar)

from coresdk.testing import MockSDK
from coresdk._types import AuthDecision, Claims

sdk = MockSDK(default_allow=True)

# Give a specific token real claims (role + tenant)
sdk.set_token_decision("admin-token", AuthDecision(
    allowed=True,
    claims=Claims(
        sub="alice",
        tenant_id="acme-corp",
        roles=["admin", "editor", "pii_access"],
        exp=9999999,
    ),
    reason="",
))

app.add_middleware(CoreSDKMiddleware, sdk=sdk)

Integration tests with real sidecar

CORESDK_SIDECAR_ADDR=localhost:50051 pytest tests/

Built-in test harness

CORESDK_USE_MOCK=true python 07_multitenant_rag.py
# Runs 17 assertions across all three tenant personas

Swap in a Real LLM

The example ships with a keyword-matching stub. Replace fake_llm() with any provider:

# Anthropic Claude
import anthropic
client = anthropic.Anthropic()

def llm(query: str, docs: list[Document]) -> str:
    context = "\n\n".join(f"[{d.title}]\n{d.content}" for d in docs)
    msg = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        system=f"Answer using only this context:\n{context}",
        messages=[{"role": "user", "content": query}],
    )
    return msg.content[0].text

# OpenAI GPT-4o
from openai import OpenAI
oai = OpenAI()

def llm(query: str, docs: list[Document]) -> str:
    context = "\n\n".join(f"[{d.title}]\n{d.content}" for d in docs)
    resp = oai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content

Swap in a Real Vector Store

# Pinecone — tenant isolation via namespace
import os

import pinecone
pc = pinecone.Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("askai")

def ingest(tenant_id: str, doc: Document, embedding: list[float]):
    index.upsert(
        vectors=[{
            "id": doc.id,
            "values": embedding,
            "metadata": {"title": doc.title, "pii": doc.contains_pii},
        }],
        namespace=tenant_id,  # ← one namespace per tenant = isolation
    )

def search(tenant_id: str, query_embedding: list[float], include_pii: bool):
    return index.query(
        vector=query_embedding,
        top_k=3,
        namespace=tenant_id,  # ← never crosses tenant boundary
        filter={} if include_pii else {"pii": False},
    )

Environment Variables

Variable               Default           Description
CORESDK_SIDECAR_ADDR   localhost:50051   gRPC sidecar address
CORESDK_FAIL_MODE      open              open = allow on sidecar error, closed = deny
CORESDK_TENANT_ID      default           Default tenant (overridden by JWT)
CORESDK_JWKS_URI       (none)            IdP JWKS endpoint for real JWT validation
CORESDK_ENV            production        development enables insecure gRPC channel
CORESDK_USE_MOCK       false             true uses MockSDK, no sidecar needed

Full Source

The complete runnable example is at examples/python/07_multitenant_rag.py.

git clone https://github.com/coresdk-dev/examples
cd examples/python
CORESDK_USE_MOCK=true python 07_multitenant_rag.py
