Kubernetes Multi-Tenancy Architecture: Namespaces, vCluster, and Hard Isolation

Every platform team eventually hits the same wall. You run one Kubernetes cluster, ten teams want to ship onto it, and the finance spreadsheet says you cannot afford ten clusters. So you reach for namespaces, hand out RBAC, and call it multi-tenancy. Six months later a tenant installs a CRD that collides with another team’s operator, a runaway Job starves the scheduler, and a security review asks the uncomfortable question: can tenant A read tenant B’s secrets? This is the central tension of Kubernetes multi-tenancy architecture. Shared infrastructure is cheap and operationally simple, but isolation is exactly the property a shared cluster does not give you for free. The arrival of platform engineering as a discipline, plus the economics of sharing expensive GPU nodes, has made this decision sharper and more expensive to get wrong than it was even two years ago.

This post is an architecture decision record. It maps the isolation spectrum from soft to hard to dedicated, names the dimensions you actually have to defend, and gives you a decision matrix instead of a vendor pitch.

What this covers: the isolation spectrum, the control-plane leakage problem, soft-tenancy primitives, vCluster and Capsule, kernel-level sandboxing, and a model-selection matrix with concrete “choose X when” rules.

Context and Background

Multi-tenancy in Kubernetes means running workloads from multiple distinct trust domains — teams, customers, environments — on shared infrastructure, while preventing them from interfering with or observing each other beyond what policy allows. The hard part is that Kubernetes was not designed as a multi-tenant operating system. It was designed as a single-cluster orchestrator for a cooperative set of workloads, and multi-tenancy has been bolted on through layers of admission control, policy, and increasingly, virtualization.

There are three broad postures. Soft multi-tenancy assumes tenants are semi-trusted — internal teams who will not deliberately attack each other, but who make mistakes. Here a single cluster is partitioned with namespaces, RBAC, quotas, and network policy. Hard multi-tenancy assumes tenants are untrusted or hostile, or that a compromise of one tenant must not reach another. This demands stronger boundaries: virtual control planes, sandboxed runtimes, sometimes dedicated nodes. Dedicated clusters sit at the far end — one cluster per tenant, full separation, and a correspondingly full operational and financial bill.

A single label like “isolation” is useless because isolation is not one property. It decomposes into at least five dimensions that you have to reason about separately:

- Control-plane isolation: can a tenant see or mutate another tenant’s API objects, or break the shared API server?
- Network isolation: can pods reach across tenant boundaries?
- Node and kernel isolation: does a container escape on a shared node compromise neighbours?
- Data isolation: secrets, persistent volumes, and etcd contents.
- Noisy-neighbour or resource isolation: can one tenant exhaust CPU, memory, or scheduler attention?

A model that nails network isolation can be wide open on kernel isolation. The official Kubernetes multi-tenancy documentation frames these as separable concerns for exactly this reason, and the Kubernetes Multi-Tenancy Working Group spent years building tooling around each axis. This also connects directly to how you build a self-service platform, because tenancy and self-service are two sides of the same platform contract. See our take on Crossplane composition functions for an internal developer platform.

The Multi-Tenancy Isolation Spectrum

The right model is the weakest one that still satisfies your threat model. If tenants are trusted internal teams, namespaces plus guardrails are usually enough; if tenants are untrusted or run arbitrary code, you climb toward virtual control planes and sandboxed runtimes; only adversarial or compliance-bound separation justifies dedicated clusters. Every step up the spectrum buys isolation and costs money, operational overhead, and shared-resource efficiency.

Figure 1: The isolation spectrum runs from a single-tenant cluster, through soft multi-tenancy (namespace-per-tenant on shared API server, nodes, and kernel), to hard multi-tenancy (virtual control planes and sandboxed runtimes), and finally dedicated clusters with hardware-level separation. Each rightward step strengthens the boundary and raises cost.

The mistake teams make is treating this as a binary — “are we multi-tenant or not?” — when it is a continuum, and you can sit at different points for different dimensions. A common and sensible production posture is soft control-plane tenancy (shared API server with strong RBAC) combined with hard kernel tenancy (gVisor or Kata on a dedicated node pool for untrusted code). Mixing layers is not a hack; it is the point. Let us walk the three sub-problems that the spectrum hides.

Namespace soft-tenancy primitives

Namespaces are the unit of soft tenancy. A namespace is a scoping mechanism for names and a target for RBAC, quotas, and policy — but on its own it isolates almost nothing. Out of the box, two namespaces share the API server, the network, the nodes, and the kernel. Isolation comes from the primitives you layer on top.

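Start with RBAC. Each tenant gets a Role scoped to its namespace and a RoleBinding granting its group access. The discipline is to never hand tenants ClusterRoleBindings, because cluster-scoped permissions are precisely the leak. You can still reference a ClusterRole, such as the built-in edit role, from a namespaced RoleBinding; the grant then applies only inside that namespace.

Next, contain resource consumption with ResourceQuota and LimitRange. A quota caps aggregate consumption per namespace. A LimitRange supplies default requests and limits, so a tenant who forgets to set them does not get unbounded pods. Once a quota covers CPU and memory, it also prevents pods that lack requests and limits from being rejected outright.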

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-blue-quota
  namespace: team-blue
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 64Gi
    limits.cpu: "40"
    limits.memory: 128Gi
    pods: "100"
    persistentvolumeclaims: "20"
    services.loadbalancers: "2"
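Here is a minimal sketch of a companion LimitRange and a namespaced Role and RoleBinding for the same tenant. The names (team-blue-defaults, tenant-developer, the team-blue group) and the default values are hypothetical. The Role deliberately leaves out networkpolicies, resourcequotas, and limitranges, so the tenant cannot edit away the guardrails the platform applied.

```yaml
# Defaults for containers that omit requests/limits, plus a per-container ceiling.
apiVersion: v1
kind: LimitRange
metadata:
  name: team-blue-defaults
  namespace: team-blue
spec:
  limits:
    - type: Container
      defaultRequest:        # applied when a container sets no requests
        cpu: 250m
        memory: 256Mi
      default:               # applied when a container sets no limits
        cpu: "1"
        memory: 1Gi
      max:                   # no single container may exceed this
        cpu: "4"
        memory: 8Gi
---
# Namespaced permissions for day-to-day work. NetworkPolicies, quotas and
# LimitRanges are deliberately absent, so the tenant cannot remove its guardrails.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tenant-developer
  namespace: team-blue
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps", "secrets", "persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get"]
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["batch"]
    resources: ["jobs", "cronjobs"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# Bind the namespaced Role to the tenant's group: no ClusterRoleBinding involved.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tenant-developer-binding
  namespace: team-blue
subjects:
  - kind: Group
    name: team-blue          # hypothetical group name from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: tenant-developer
  apiGroup: rbac.authorization.k8s.io
```

These values are arbitrary, but they must stay ordered so that defaultRequest is at most default, which is at most max. Otherwise the API server rejects the LimitRange.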

Network isolation is not the default. Kubernetes networking is flat: every pod can reach every other pod unless a NetworkPolicy says otherwise. Even then, policies only take effect if your CNI enforces them. Calico and Cilium do; Flannel on its own does not. The choice matters, which is why we wrote a CNI comparison of Calico, Cilium, Flannel, and Multus. The non-negotiable baseline is a default-deny ingress policy per tenant namespace, followed by explicit allow rules.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-blue
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: team-blue
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}
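NetworkPolicies are additive, so each explicit allow rule opens exactly one more path on top of the deny baseline. The sketch below admits traffic from an ingress controller on port 8080. It assumes the controller runs in a namespace called ingress-nginx and that the tenant's web pods carry a hypothetical app: web label. Current Kubernetes versions set the kubernetes.io/metadata.name label on every namespace automatically, which makes it a convenient selector.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress-controller
  namespace: team-blue
spec:
  podSelector:
    matchLabels:
      app: web                 # hypothetical label on the tenant's web pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx   # assumed controller namespace
      ports:
        - protocol: TCP
          port: 8080           # hypothetical container port
```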

Finally, enforce a security baseline. Pod Security Admission (PSA), the built-in successor to PodSecurityPolicy, applies one of three profiles — privileged, baseline, restricted — at the namespace level via labels. For untrusted tenants the restricted profile is the floor; it blocks privilege escalation, host namespaces, and most dangerous capabilities. PSA is coarse, though, so most serious platforms add a policy engine — Kyverno or OPA Gatekeeper — to express rules PSA cannot: require specific labels, block :latest images, force registries, mandate resource limits. Figure 3 shows these guardrails stacked.
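Here is a minimal sketch of both layers for one tenant:

- PSA is switched on with namespace labels.
- A Kyverno ClusterPolicy rejects untagged or :latest images.

The policy follows the shape of Kyverno's disallow-latest-tag sample as we understand it. It checks only spec.containers, not init or ephemeral containers. It also uses the spec-level validationFailureAction field, which newer Kyverno releases deprecate in favour of a per-rule setting.

```yaml
# PSA: enforce the restricted profile on the tenant namespace; warn and audit
# at the same level so violations surface in kubectl output and audit logs.
apiVersion: v1
kind: Namespace
metadata:
  name: team-blue
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
---
# Kyverno: reject Pods whose containers use an untagged or :latest image.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-image-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "An image tag is required."
        pattern:
          spec:
            containers:
              - image: "*:*"
    - name: disallow-latest
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "The mutable :latest tag is not allowed."
        pattern:
          spec:
            containers:
              - image: "!*:latest"
```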

Figure 3: The soft-tenancy guardrail stack applied to a tenant namespace — RBAC scopes access, ResourceQuota and LimitRange contain consumption, a default-deny NetworkPolicy fences traffic, Pod Security Admission enforces the runtime baseline, and Kyverno or OPA adds the policy rules PSA cannot express. All five must be present before tenant workloads are admitted.

Control-plane isolation limits

Here is where soft tenancy hits its ceiling, and it is the single most important thing to understand before you over-trust namespaces. The API server is shared. Every tenant’s objects live in the same etcd, are served by the same API server, and are subject to the same set of admission webhooks and CRDs. Three concrete leaks follow.

First, cluster-scoped resources have no namespace. CRDs, ClusterRoles, PriorityClasses, StorageClasses, IngressClasses, webhook configurations, and PersistentVolumes are global. If tenant A is allowed to create a CRD, that CRD’s kind now exists for everyone. If two tenants both want their own version of a widgets.example.com CRD at different schema versions, they cannot have it — there is exactly one CRD object cluster-wide, and last write wins. This is the CRD-conflict problem, and it is fatal for any tenancy model where tenants install their own operators.
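To make the first leak concrete, here is a minimal CRD for the widgets.example.com kind used above; the size field is a hypothetical schema. With scope: Namespaced, individual Widget objects are namespaced, but the CustomResourceDefinition itself has no namespace. If a second tenant applies a different v1 schema, it overwrites this same object.

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # No namespace field: the definition is cluster-scoped even though the
  # Widget objects it defines are namespaced. Name must be <plural>.<group>.
  name: widgets.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:          # hypothetical field
                  type: integer
```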

Second, a misbehaving admission webhook is a cluster-wide outage. A tenant-owned validating webhook with failurePolicy: Fail that goes down can block API writes for the whole cluster, including the control plane’s own operations.
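If you allow tenant webhooks at all, one way to shrink that blast radius is to have the platform team register them with narrow rules and a namespaceSelector. The platform team has to do this, not the tenant, because webhook configurations are cluster-scoped.

In the sketch below, the service name, path, and webhook name are hypothetical, and the caBundle is assumed to be injected separately. With failurePolicy: Fail, an outage of this webhook blocks only CREATE and UPDATE of widgets in team-blue, rather than writes across the cluster.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: team-blue-widget-validator
webhooks:
  - name: widgets.team-blue.example.com   # must be a fully qualified name
    clientConfig:
      service:
        namespace: team-blue
        name: widget-validator             # hypothetical Service
        path: /validate                    # hypothetical path
      # caBundle omitted: assumed to be injected by your certificate tooling.
    rules:
      - apiGroups: ["example.com"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["widgets"]
        scope: "Namespaced"
    # Only requests for objects in the tenant's own namespace reach this webhook.
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: team-blue
    failurePolicy: Fail
    sideEffects: None
    admissionReviewVersions: ["v1"]
    timeoutSeconds: 5
```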

Third, resource discovery leaks metadata. Even with airtight RBAC on objects, the existence of certain cluster-scoped names, node information, and API group versions is visible. Namespaces partition object names; they do not virtualize the API surface.

This is the precise gap that virtual control planes — vCluster — exist to close, and we will get to how shortly.

Node and kernel isolation

The deepest boundary is the kernel. By default, containers from every tenant share the host kernel via the same container runtime. A container-escape vulnerability — a runc CVE, a kernel privilege-escalation bug — lets a compromised pod break out and reach every other pod scheduled on that node, regardless of namespace, RBAC, or network policy. None of the soft primitives touch this, because they all operate above the kernel.

Three mitigations exist, in increasing strength. Dedicated node pools with taints and tolerations keep a tenant’s pods on nodes only that tenant uses, so an escape is contained to that tenant’s blast radius rather than the whole fleet. It is cheap to reason about and works with any runtime, but it fragments capacity and weakens bin-packing — the very efficiency you went multi-tenant to get. gVisor (runsc) interposes a user-space kernel between the container and the host, intercepting syscalls so the real kernel is rarely touched directly; strong isolation, with a syscall-heavy performance tax. Kata Containers runs each pod in a lightweight microVM with its own guest kernel, giving near-VM isolation at higher memory and startup cost. For genuinely untrusted code — multi-tenant SaaS running customer containers — kernel isolation is not optional, and it composes with everything above it. For a deeper treatment of the hardware-rooted end of this, see our piece on confidential containers on Kubernetes.
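Here is a minimal sketch of the gVisor option combined with a dedicated pool. It assumes the nodes in that pool:

- have runsc installed and registered with containerd under the handler name runsc,
- are labelled pool=sandboxed,
- are tainted sandboxed=true:NoSchedule.

The label, taint, UID, and image are hypothetical. The RuntimeClass's scheduling block adds the nodeSelector and toleration to every pod that selects it, so tenants only need to set runtimeClassName. The pod's securityContext is there so it still passes the PSA restricted profile.

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc          # must match the runtime handler configured in containerd
scheduling:
  # Merged into every pod that uses this RuntimeClass.
  nodeSelector:
    pool: sandboxed
  tolerations:
    - key: sandboxed
      operator: Equal
      value: "true"
      effect: NoSchedule
---
apiVersion: v1
kind: Pod
metadata:
  name: untrusted-job
  namespace: team-blue
spec:
  runtimeClassName: gvisor
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001                             # hypothetical non-root UID
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/tenant/app:1.0   # hypothetical image
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
      resources:
        requests:
          cpu: 500m
          memory: 512Mi
        limits:
          cpu: "1"
          memory: 1Gi
```

For Kata Containers the pattern is the same. Only the handler changes, to whatever name your Kata installation registers with the container runtime.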

vCluster, Capsule, and a Decision Matrix

Once soft tenancy’s control-plane ceiling becomes the binding constraint, two families of tooling answer it differently. Capsule keeps the single shared control plane but adds a tenant abstraction on top; vCluster gives each tenant its own virtual control plane. They solve overlapping but distinct problems.

Capsule is a multi-tenancy operator. It introduces a Tenant custom resource that groups namespaces under an owner and enforces policies across them — a tenant can self-serve new namespaces within limits, and Capsule auto-applies network policies, quotas, RBAC, and admission constraints to every namespace in the tenant. It is, in effect, a productized, opinionated version of the soft-tenancy stack from the previous section, plus namespace self-service. Crucially, it still uses one shared API server, so it does not solve the CRD-conflict or cluster-scoped-resource problem. It makes soft tenancy manageable at scale; it does not make it hard.
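Here is a minimal Capsule Tenant sketch. It assumes the v1beta2 Tenant API; field names have shifted between Capsule releases, so check them against the version you run. The Tenant:

- makes the hypothetical team-blue group the owner,
- caps the tenant at five namespaces,
- pins its pods to a node pool,
- applies one quota across all of the tenant's namespaces combined.

Owners then create namespaces with ordinary kubectl create namespace calls, and Capsule attaches them to the tenant.

```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: blue
spec:
  owners:
    - name: team-blue        # hypothetical group that owns the tenant
      kind: Group
  namespaceOptions:
    quota: 5                 # at most five namespaces in this tenant
  nodeSelector:
    pool: tenant-blue        # pin all tenant pods to a dedicated pool
  resourceQuotas:
    scope: Tenant            # one budget shared across all tenant namespaces
    items:
      - hard:
          requests.cpu: "20"
          requests.memory: 64Gi
          pods: "100"
```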

Figure 2: A vCluster runs a real but lightweight Kubernetes API server (k3s or upstream Kubernetes, depending on the configured distro) inside a pod in the host cluster, backed by its own etcd or SQLite store. The tenant’s kubectl talks only to this virtual API server. A syncer component watches the virtual cluster and copies the pods (and the resources they need) down into a single host namespace, where the host scheduler places them on shared nodes. Status flows back up. The tenant sees a private cluster; the host sees ordinary pods.

This is the architectural move that matters. With vCluster, each tenant gets what looks like a dedicated cluster:

- their own API server,
- their own backing store,
- their own view of namespaces,
- critically, their own CRDs and cluster-scoped resources.

Tenant A can install widgets.example.com v1 and tenant B can install it at v2, because each lives in a separate virtual control plane. Cluster-admin in a vCluster grants nothing on the host API server; it only grants admin over the virtual cluster. The pods that admin creates still run on host nodes, however. The CRD-conflict problem, the cluster-scoped-leak problem, and the “tenant wants cluster-admin” problem all dissolve.

What vCluster does not isolate by default is the kernel. The syncer copies real pods down to the host, where they run as ordinary containers on shared nodes with the shared host kernel. So vCluster gives you strong control-plane isolation and weak-by-default kernel isolation. That is why the serious hard-multi-tenancy pattern is vCluster plus Kata or gVisor plus dedicated node pools: a virtual control plane for the API surface and a sandboxed runtime for the kernel. Here is a minimal vCluster values sketch, using the vcluster.yaml layout introduced in v0.20; check field names against your version:

# vcluster values.yaml sketch
controlPlane:
  distro:
    k8s:
      enabled: true
  backingStore:
    etcd:
      embedded:
        enabled: true
sync:
  toHost:
    pods:
      enabled: true
    services:
      enabled: true
  fromHost:
    nodes:
      enabled: true
      selector:
        labels:
          pool: tenant-blue
policies:
  podSecurityStandard: restricted

The decision matrix below maps the isolation models against the dimensions that actually drive the choice. Read it as: pick the leftmost (cheapest) row whose isolation columns clear your threat model.

| Model | Control-plane isolation | Network isolation | Kernel isolation | Cost | Ops overhead | Blast radius |
|---|---|---|---|---|---|---|
| Plain namespace | Weak (shared API, shared CRDs) | Opt-in via NetworkPolicy | None (shared kernel) | Lowest | Low | Whole cluster |
| HNC (hierarchical namespaces) | Weak (still shared API) | Opt-in, inherited | None | Lowest | Low–medium | Whole cluster |
| Capsule | Weak–medium (policy-enforced, shared API) | Auto-applied per tenant | None | Low | Medium | Whole cluster |
| vCluster | Strong (own API, own CRDs) | Per virtual cluster | None by default (add Kata/gVisor) | Medium | Medium–high | Host node fleet |
| Dedicated cluster | Full | Full | Full (own nodes) | Highest | High | Single tenant only |

Hierarchical Namespace Controller (HNC) deserves a mention: it lets you nest namespaces in a parent-child tree so policy, RBAC, and quotas propagate downward. It is excellent for organizing a single soft-tenant cluster — org → team → service — and reducing the toil of applying the same RBAC and network policy to dozens of namespaces. But it shares the control plane, so it sits in the same weak-control-plane bucket as plain namespaces. Use it for ergonomics, not for hard isolation.
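Here is a sketch of the HNC workflow, assuming the v1alpha2 API; the namespace names are hypothetical.

- A SubnamespaceAnchor created in a parent namespace makes HNC create a child namespace of the same name.
- The cluster-wide HNCConfiguration, a singleton that must be named config, lists which kinds beyond the default RBAC objects to propagate.

Propagating NetworkPolicies this way pushes the parent's default-deny baseline into every child.

```yaml
apiVersion: hnc.x-k8s.io/v1alpha2
kind: SubnamespaceAnchor
metadata:
  name: team-blue-payments   # name of the child namespace HNC will create
  namespace: team-blue       # parent namespace
---
apiVersion: hnc.x-k8s.io/v1alpha2
kind: HNCConfiguration
metadata:
  name: config               # HNC only honours this singleton name
spec:
  resources:
    # Copy NetworkPolicies from each parent into its descendants.
    - group: networking.k8s.io
      resource: networkpolicies
      mode: Propagate
```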

Trade-offs, Gotchas, and What Goes Wrong

Every model has a failure mode that bites in production, usually months after rollout when the original architect has moved on. Figure 4 sketches the selection flow, but the flow only works if you know where each path leaks.

Figure 4: A selection flow — start from who your tenants are (trusted vs untrusted), branch on whether they need their own CRDs or cluster scope, then on whether kernel isolation is required, and land on namespaces+HNC+Capsule, vCluster, vCluster+Kata, or dedicated clusters.

Cluster-scoped leaks in soft tenancy. The recurring incident: someone binds a tenant to a ClusterRole cluster-wide to “make a thing work,” and now that tenant can list secrets or nodes across the whole cluster. Audit for ClusterRoleBindings pointing at tenant subjects; there should be almost none. PSA never inspects RBAC, and a policy engine such as Kyverno only catches an over-broad grant if you have written a rule for it. RBAC is its own attack surface.

CRD conflicts. Two operators, one CRD name, two schemas — one of them silently breaks on the next upgrade. If your tenants install Helm charts that bundle CRDs, you are one chart bump away from a cross-tenant outage. This is the clearest signal you have outgrown namespaces and need vCluster.

vCluster sync limits. The syncer is powerful but not transparent. Not every resource is synced by default, webhooks and certain cluster-scoped behaviours need explicit configuration, and high object churn in a virtual cluster adds reconciliation load on the host. vCluster also concentrates blast radius at the host: a host-node compromise still reaches every tenant’s pods unless you added kernel sandboxing. Virtual control planes isolate the API, not the silicon.
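As an illustration of that opt-in model, assuming the vcluster.yaml layout from v0.20 onward: Ingress objects created inside a virtual cluster are not copied to the host unless you enable that. Tenants also cannot see the host's IngressClasses unless those are synced up. As we understand the defaults, both settings below are off out of the box; check them against your release.

```yaml
sync:
  toHost:
    # Copy Ingress objects created in the virtual cluster down to the host.
    ingresses:
      enabled: true
  fromHost:
    # Expose the host's IngressClasses inside the virtual cluster so tenant
    # Ingresses can reference them.
    ingressClasses:
      enabled: true
```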

Cost sprawl with dedicated clusters. “Just give everyone a cluster” sounds clean until you are running 60 clusters, each with its own control-plane spend, ingress, monitoring stack, and patch cadence. Fleet management (Cluster API, Fleet, Argo CD across clusters) becomes a full-time platform investment. Dedicated clusters trade one hard problem (isolation) for another (fleet ops at scale).

Policy drift. Whatever model you pick, the guardrails are only as good as their enforcement. Namespaces created out-of-band without the default-deny NetworkPolicy, quotas that were “temporarily” raised, PSA labels missing on a new namespace — drift accumulates. Enforce creation through a controller (Capsule, HNC, or an admission policy that requires the baseline) rather than trusting humans to remember.
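One way to enforce the baseline at creation time is a Kyverno generate rule. In the sketch below, the policy name and the excluded namespaces are assumptions, and Kyverno's background controller must be granted RBAC to create NetworkPolicies. Every new namespace gets the default-deny ingress policy. Setting synchronize: true makes Kyverno recreate the policy if someone deletes it.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-deny-ingress
spec:
  rules:
    - name: default-deny-ingress
      match:
        any:
          - resources:
              kinds:
                - Namespace
      # Leave system namespaces alone.
      exclude:
        any:
          - resources:
              names:
                - kube-system
                - kube-public
                - kube-node-lease
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny-ingress
        namespace: "{{request.object.metadata.name}}"
        synchronize: true   # keep the generated policy in place if it is deleted or edited
        data:
          spec:
            podSelector: {}
            policyTypes:
              - Ingress
```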

Practical Recommendations

Choose plain namespaces + RBAC + quota + NetworkPolicy + PSA when tenants are trusted internal teams, none need their own CRDs or cluster-scoped resources, and a kernel escape reaching neighbours is an acceptable (low-likelihood) risk. This covers the majority of internal platforms.

Choose Capsule or HNC on top of that when you have many namespaces and tenants and the toil of consistently applying guardrails is the real problem — i.e., you need namespace self-service and policy automation, not stronger isolation.

Choose vCluster when tenants need their own CRDs, cluster-admin, or operators, or when control-plane blast radius (a tenant’s webhook taking down everyone) is unacceptable — but you still want to share expensive nodes, especially GPUs.

Choose vCluster + Kata/gVisor + dedicated node pools when tenants run untrusted or arbitrary code and a kernel escape must not cross tenant boundaries.

Choose dedicated clusters when compliance, data residency, hostile-tenant threat models, or hard regulatory separation demand it — and you have the fleet-ops maturity to run them.

Checklist before you commit:
1. Write the threat model first: who are the tenants, and what is the worst case?
2. Enumerate the five isolation dimensions and mark which ones your threat model actually requires.
3. Pick the cheapest model that clears them.
4. Enforce guardrails through a controller, not documentation.
5. Re-audit ClusterRoleBindings and cluster-scoped resources quarterly.
6. Plan the kernel-isolation story explicitly; do not let “shared nodes” be an accident.

Frequently Asked Questions

Is a namespace enough for multi-tenancy?

For trusted internal teams, a namespace plus RBAC, ResourceQuota, NetworkPolicy, and Pod Security Admission is a reasonable soft-tenancy boundary. It is not enough when tenants are untrusted, need their own CRDs or cluster-scoped resources, or when a kernel escape or shared-API-server outage crossing tenant boundaries is unacceptable. A namespace isolates names; it does not isolate the API server, the kernel, or the network by default.

What is the difference between soft and hard multi-tenancy?

Soft multi-tenancy assumes semi-trusted tenants (internal teams who err but do not attack) and partitions one shared cluster with namespaces and policy. Hard multi-tenancy assumes untrusted or hostile tenants and demands strong boundaries — virtual control planes, sandboxed runtimes, sometimes dedicated nodes — so that a compromise of one tenant cannot reach another. The dividing line is your trust assumption about tenants, not a specific tool.

How does vCluster isolate tenants?

vCluster runs a real but lightweight Kubernetes API server (k3s or upstream Kubernetes, depending on the configured distro) inside a pod in the host cluster, with its own etcd or SQLite backing store. Each tenant’s kubectl talks only to this virtual API server. That gives tenants their own CRDs, cluster-scoped resources, and even cluster-admin without touching the host. A syncer copies the resulting pods down to a host namespace, where the shared scheduler runs them. It isolates the control plane strongly but shares the host kernel by default.

Does vCluster provide kernel-level isolation?

No, not by default. The syncer creates tenant pods as ordinary pods in a host namespace. The host scheduler then places them on shared host nodes running the shared host kernel, so a container escape can still reach other tenants on the same node. For kernel isolation, combine vCluster with a sandboxed runtime and, typically, dedicated node pools via taints and tolerations. The sandboxed runtime can be Kata Containers (a microVM per pod) or gVisor (a user-space kernel).

When should I use dedicated clusters instead?

Use dedicated clusters when compliance, data residency, hostile-tenant threat models, or hard regulatory separation require full hardware-level isolation, and when you have the fleet-management maturity (Cluster API, GitOps across clusters, centralized observability) to operate many clusters. The trade-off is the highest cost and operational overhead of any model — you replace the isolation problem with a fleet-ops-at-scale problem.

What is Capsule and how is it different from vCluster?

Capsule is a multi-tenancy operator that adds a Tenant custom resource over the shared control plane, automating namespace self-service, RBAC, quotas, and network policies per tenant. vCluster instead gives each tenant a separate virtual control plane. Capsule makes soft tenancy manageable at scale but does not solve CRD conflicts or cluster-scoped leaks; vCluster does, because each tenant’s API surface is independent.

By Riju
