SpinKube Tutorial: WebAssembly on Kubernetes (2026)

Running WebAssembly on Kubernetes with SpinKube is no longer an exotic side quest: by mid-2026 the project is a CNCF Incubating-stage runtime with a stable spin-operator v0.7.x, a production-grade containerd shim, and serious adoption from teams running multi-tenant edge fleets. If you are paying for warm container replicas just to dodge cold starts, or you are running thousands of low-traffic tenants on a cluster that bleeds CPU at idle, the math on Wasm finally works. This post is the hands-on walkthrough I wanted when I first deployed SpinKube alongside our existing containerized services: install the shim, register a RuntimeClass, push a Spin app as an OCI artifact, route it through Gateway API, measure cold-start latency against a baseline Go container, and harden it with WASI capabilities and a NetworkPolicy. Along the way I argue that Wasm is not a container replacement; it is a third compute class with a narrow but real sweet spot.

Architecture at a glance

[Figure: SpinKube architecture diagram]

Why WebAssembly on Kubernetes Matters in 2026

WebAssembly on Kubernetes matters because it gives operators a third compute class — somewhere between a long-running container and a Function-as-a-Service handler — with cold starts in the low single-digit milliseconds, ~1–5 MB memory floors per instance, and a true sandbox boundary that does not depend on Linux user namespaces. SpinKube is the most credible production path because it ships a containerd shim, not a sidecar.

The conventional Kubernetes compute spectrum has two endpoints. On one side, you have containers: portable, mature, and operationally well-understood, but with cold-start times measured in hundreds of milliseconds to several seconds because of image pulls, OCI unpacking, and language-runtime warm-up. On the other side, FaaS platforms like Knative or OpenFaaS hide the container problem behind scale-to-zero, but they do not remove cold-start latency; they trade idle cost for a cold start on the first request after a service has scaled to zero.

WebAssembly modules, executed by a runtime like Wasmtime or Wasmer, can be instantiated in roughly 1–5 ms because the module is precompiled, the heap is bounded by the runtime, and there is no fork-exec or layered filesystem to materialize. That delta, two to three orders of magnitude on cold start, is what makes multi-tenant edge and per-request isolation suddenly tractable. SpinKube is the project that wires this capability into a stock Kubernetes control plane through standard mechanisms: a RuntimeClass on the Kubernetes side and a containerd runtime shim on the node.

A short comparison helps frame the rest of the post:

Containers: 200 ms – 5 s cold start, 30–200 MB RAM floor, OCI image distribution, broad language support, mature tooling.
Wasm via SpinKube: 1–5 ms cold start, 1–10 MB RAM floor, OCI artifact distribution (config mediaType application/vnd.fermyon.spin.application.v1+config), Rust/JS/Python/Go (TinyGo) support, narrower syscall surface.
FaaS (Knative): hides cold start behind its activator and queue-proxy, still container-backed, 100 ms – 3 s scale-from-zero latency, autoscaler tuning required.

The thesis: do not migrate containers to Wasm. Add Wasm as a third runtime class for workloads that are bursty, tenant-dense, or latency-sensitive at the cold edge. SpinKube’s containerd-shim approach means you can run both classes on the same node with the same control plane.

The industry has tried this third-class idea before — gVisor and Kata Containers tried to slot in between containers and VMs on the isolation axis, and they have a small but stable user base. Wasm is different because the axis it changes is startup latency and resident memory, not isolation strength. That is a more universally useful lever, which is why even shops with no security driver are looking at it. Cloudflare Workers and Fastly Compute@Edge have been quietly running Wasm at planetary scale since 2020; what is new in 2026 is that you can do the same thing on a stock Kubernetes cluster without leaving the CNCF ecosystem.

A historical note: the cold-start advantage Wasm gives you is the same one that made unikernels (MirageOS, IncludeOS, OSv) academically interesting a decade ago. Unikernels never crossed the operational chasm because they broke every existing tool — there was no kubectl exec, no shared package ecosystem, no profilers, no debuggers. Wasm wins where unikernels lost because the ecosystem already exists: the same artifact registries, the same orchestrator, the same observability stack, the same security scanners. SpinKube is the proof point that you can plug a new runtime class into Kubernetes without rebuilding the operator playbook. That is why I take it seriously even though the underlying isolation idea is not new.

SpinKube Architecture and the Three Components That Matter

SpinKube architecture rests on three components: containerd-shim-spin (the OCI-compatible runtime shim), the runtime-class-manager (which installs the shim and a RuntimeClass on each node), and spin-operator (which translates a SpinApp CRD into a standard Deployment + Service with the right RuntimeClass). All three are CNCF-hosted under the SpinKube project, which entered CNCF Sandbox in February 2024 and graduated to Incubating in early 2025.

The shim is the critical piece. When kubelet asks containerd to start a pod whose RuntimeClass is wasmtime-spin-v2, containerd does not invoke runc. It invokes containerd-shim-spin-v2 instead, which loads the OCI artifact, extracts the precompiled Wasm module, and hands control to an embedded Wasmtime instance. There is no Linux container, no cgroup hierarchy beyond what the kubelet creates, no overlayfs. The “pod” is a Wasmtime process tree.

That changes the operational model in three concrete ways. First, image distribution still works: spin registry push packages the Spin app as an OCI artifact for any OCI-1.1 registry (GHCR, ECR, Harbor, etc.), and standard tools like oras can inspect it. Second, most of kubectl still works: kubectl logs and kubectl describe pod behave normally because the shim speaks the same containerd APIs, although kubectl exec is of little use because the artifact contains no shell or other binaries to run. Third, the security boundary is the Wasm sandbox, not the kernel namespace; we will return to what that means for hardening later.

The Spin runtime itself is more than raw Wasmtime. Spin (developed by Fermyon and donated to CNCF as part of SpinKube) adds a component model layer: a Spin app is a spin.toml manifest that declares one or more components, each a Wasm module with HTTP, Redis, or scheduled triggers, plus declared WASI capabilities like allowed_outbound_hosts and key-value-store handles. Components communicate via the WebAssembly Component Model, which lets a Rust component call a JS component without shared linear memory.

The component model is the unsung hero here. Before it, Wasm-on-the-server meant wasm32-wasi Preview 1 modules that shared a single flat memory and relied on ad-hoc ABI conventions. With Preview 2 and the component model, each component has its own memory, exposes a typed interface defined in WIT (the WebAssembly Interface Type IDL), and the runtime mediates calls. That is what makes polyglot composition safe: a memory-corrupting bug in your Rust component cannot reach into the JS component’s heap. For a multi-tenant runtime — which is exactly what SpinKube is at the node level — that property is foundational, not cosmetic.

Prerequisites and Cluster Setup

The prerequisites are a Kubernetes cluster v1.27 or newer (containerd v1.7.7+ required for the shim API), kubectl v1.28+, Helm v3.13+, and the Spin CLI v3.x. For the tutorial I am using a 3-node k3s v1.30.2 cluster on Ubuntu 24.04 nodes; the steps are identical on kind, AKS, EKS, GKE, or upstream kubeadm clusters as long as containerd is the CRI.

Install the Spin CLI first. On Linux/macOS:

curl -fsSL https://developer.fermyon.com/downloads/install.sh | bash
sudo mv ./spin /usr/local/bin/
spin --version
# Expected: spin 3.1.2 (commit abcdef1 2026-03-14)

Verify your cluster’s containerd version on every node:

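# The CONTAINER-RUNTIME column shows each node's runtime and version, e.g. containerd://1.7.18
kubectl get nodes -o wide
# Or ask each node's containerd directly through a node debug pod
# (on k3s, containerd may not be on the host PATH; rely on the column above instead)
for n in $(kubectl get nodes -o name); do
  kubectl debug $n -it --image=alpine -- chroot /host containerd --version
done
# Expected: containerd github.com/containerd/containerd v1.7.18 ...
# kubectl debug leaves one node-debugger-* pod behind per node; delete them when done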

If you are on k3s, containerd ships with the right version starting v1.29.x. If you are on EKS, you need AL2023 or Bottlerocket nodes; AL2 ships an older containerd that lacks shim v2 wasm support.
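If you want the version check as a pass/fail gate in a pre-flight script rather than something you eyeball, the parsing is small. The following is a minimal Rust sketch, assuming two input shapes: the containerd --version line printed by the loop above, and the CONTAINER-RUNTIME column of kubectl get nodes -o wide (which looks like containerd://1.7.18-k3s1 on k3s). The function names are hypothetical; the 1.7.7 floor is the requirement stated in the prerequisites.

```rust
/// Minimum containerd version required by the shim, per the prerequisites.
const MIN_CONTAINERD: (u32, u32, u32) = (1, 7, 7);

/// Extract (major, minor, patch) from either
///   "containerd github.com/containerd/containerd v1.7.18 <commit>"  (containerd --version)
///   "containerd://1.7.18-k3s1"                                       (kubectl get nodes -o wide)
fn parse_containerd_version(s: &str) -> Option<(u32, u32, u32)> {
    // Take the first whitespace-separated token that starts with a digit
    // once any "containerd://" or "v" prefix has been stripped.
    let token = s.split_whitespace().find_map(|t| {
        let t = t.strip_prefix("containerd://").unwrap_or(t);
        let t = t.strip_prefix('v').unwrap_or(t);
        t.starts_with(|c: char| c.is_ascii_digit()).then_some(t)
    })?;
    // Drop distro suffixes such as "-k3s1" or "+build" before splitting on dots.
    let core = token.split(|c: char| c == '-' || c == '+').next()?;
    let mut parts = core.split('.').map(|p| p.parse::<u32>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    // Treat a missing patch component as 0.
    let patch = parts.next().unwrap_or(Some(0))?;
    Some((major, minor, patch))
}

/// Tuples compare lexicographically, which matches x.y.z version ordering.
fn meets_minimum(s: &str) -> bool {
    parse_containerd_version(s).map_or(false, |v| v >= MIN_CONTAINERD)
}

fn main() {
    for line in [
        "containerd github.com/containerd/containerd v1.7.18 ae71819",
        "containerd://1.7.18-k3s1",
        "containerd://1.6.33",
    ] {
        println!("{line} -> meets 1.7.7: {}", meets_minimum(line));
    }
}
```

A false result means that node needs a containerd upgrade before runtime-class-manager installs the shim on it.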

For our edge-cluster choice rationale and the trade-offs between K3s, MicroK8s, and KubeEdge, see the companion K3s vs MicroK8s vs KubeEdge ADR. The bottom line: SpinKube installs cleanly on all three, but K3s has the lowest baseline footprint, which matters when you are stacking Wasm density on top.

Installing the Shim, RuntimeClass, and spin-operator

The installation has three parts. First, cert-manager (a hard dependency of spin-operator for webhook TLS), applied as a plain manifest. Second, kwasm-operator or runtime-class-manager (the SpinKube replacement for the older kwasm), which installs the shim onto each node and creates the RuntimeClass. Third, spin-operator, which watches the SpinApp CRD. The last two are Helm releases.

# 1. cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.1/cert-manager.yaml

# 2. runtime-class-manager (CNCF-hosted fork of kwasm)
helm install rcm \
  oci://ghcr.io/spinkube/charts/runtime-class-manager \
  --version 0.5.0 \
  --namespace rcm-system --create-namespace

# Annotate every node you want Wasm-capable
kubectl annotate node --all kwasm.sh/kwasm-node=true

# 3. spin-operator
helm install spin-operator \
  oci://ghcr.io/spinkube/charts/spin-operator \
  --version 0.7.1 \
  --namespace spin-operator --create-namespace \
  --set runtimeClassName=wasmtime-spin-v2

Confirm the RuntimeClass landed:

kubectl get runtimeclass wasmtime-spin-v2 -o yaml

You should see a handler: spin field. That handler maps to the runtime_type = "io.containerd.spin.v2" entry that runtime-class-manager added to each node’s /etc/containerd/config.toml (on k3s the file is /var/lib/rancher/k3s/agent/etc/containerd/config.toml). If you ever need to verify by hand, SSH to a node and grep for spin in that file; you should see the shim configuration under [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.spin].
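The resolution chain is easier to reason about as code. When a pod sets runtimeClassName: wasmtime-spin-v2, kubelet passes the RuntimeClass handler (spin) to containerd, containerd looks up the matching runtimes entry, and, as I understand containerd's naming convention, derives the shim binary from the last two dot-separated segments of runtime_type: io.containerd.spin.v2 becomes containerd-shim-spin-v2, just as io.containerd.runc.v2 becomes containerd-shim-runc-v2. The Rust sketch below mirrors that rule; the two maps stand in for the RuntimeClass object and config.toml and are illustrative, not a real API.

```rust
use std::collections::HashMap;

/// Mirror of containerd's shim naming rule: a runtime_type "<prefix>.<name>.<version>"
/// resolves to the binary "containerd-shim-<name>-<version>" on the node's PATH.
fn shim_binary_name(runtime_type: &str) -> Option<String> {
    let parts: Vec<&str> = runtime_type.split('.').collect();
    if parts.len() < 2 || parts[0].is_empty() {
        return None;
    }
    let name = parts[parts.len() - 2];
    let version = parts[parts.len() - 1];
    Some(format!("containerd-shim-{name}-{version}"))
}

/// Follow RuntimeClass name -> handler -> containerd runtime_type -> shim binary.
fn resolve(
    runtime_class: &str,
    handlers: &HashMap<&str, &str>,      // RuntimeClass metadata.name -> handler
    runtime_types: &HashMap<&str, &str>, // containerd runtimes.<handler> -> runtime_type
) -> Option<String> {
    let handler = handlers.get(runtime_class)?;
    let runtime_type = runtime_types.get(handler)?;
    shim_binary_name(runtime_type)
}

fn main() {
    let handlers = HashMap::from([("wasmtime-spin-v2", "spin")]);
    let runtime_types = HashMap::from([
        ("spin", "io.containerd.spin.v2"),
        ("runc", "io.containerd.runc.v2"),
    ]);
    // Prints Some("containerd-shim-spin-v2")
    println!("{:?}", resolve("wasmtime-spin-v2", &handlers, &runtime_types));
}
```

If a SpinApp pod is stuck in ContainerCreating with a "binary not found" style error, this chain is the thing to walk: check the handler, then the runtime_type, then whether the derived binary is actually on the node.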

The RuntimeClass YAML, for reference, looks like this:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: wasmtime-spin-v2
handler: spin
scheduling:
  nodeSelector:
    kwasm.sh/kwasm-node: "true"

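The nodeSelector is what keeps the scheduler from placing SpinApp pods onto nodes that have not yet received the shim: Kubernetes merges a RuntimeClass’s scheduling.nodeSelector into every pod that uses it. Note that a nodeSelector matches node labels, not annotations, so the kwasm.sh/kwasm-node annotation applied earlier does not satisfy it on its own; the nodes also need a matching label (for example, kubectl label node --all kwasm.sh/kwasm-node=true) unless your installer adds one for you.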
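Here is the matching rule as a small Rust sketch: Kubernetes merges the RuntimeClass scheduling.nodeSelector into the pod's nodeSelector, and a node qualifies only if every key/value pair appears among its labels (annotations are not consulted). The Node struct and function names are illustrative only, not a Kubernetes client API.

```rust
use std::collections::HashMap;

/// Illustrative node model: labels drive scheduling, annotations are metadata only.
struct Node<'a> {
    name: &'a str,
    labels: HashMap<&'a str, &'a str>,
    annotations: HashMap<&'a str, &'a str>,
}

/// nodeSelector semantics: every (key, value) pair must be present in the node's labels.
fn selector_matches(selector: &[(&str, &str)], node: &Node) -> bool {
    selector.iter().all(|(k, v)| node.labels.get(k) == Some(v))
}

fn main() {
    const KEY: &str = "kwasm.sh/kwasm-node";
    let selector = [(KEY, "true")];
    let labeled = Node {
        name: "node-a",
        labels: HashMap::from([(KEY, "true")]),
        annotations: HashMap::new(),
    };
    let annotated_only = Node {
        name: "node-b",
        labels: HashMap::new(),
        annotations: HashMap::from([(KEY, "true")]),
    };
    for node in [&labeled, &annotated_only] {
        println!(
            "{}: annotated={} schedulable={}",
            node.name,
            node.annotations.contains_key(KEY),
            selector_matches(&selector, node)
        );
    }
}
```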

A quick word on the choice between kwasm-operator and the newer runtime-class-manager. Kwasm was the original community project that pioneered the “install a shim onto every node via a DaemonSet” pattern, but it was a single-maintainer effort and never made the CNCF jump. The runtime-class-manager is a maintained fork sitting inside the SpinKube org, with CNCF governance and a published release schedule. For any new cluster in 2026 use runtime-class-manager; for existing kwasm installs, the migration is a Helm uninstall and reinstall — no node downtime if you do it during a rolling upgrade. The CRDs and node annotations are intentionally compatible.

If you are running a hardened production cluster with a custom containerd config (custom registry mirrors, custom seccomp defaults, BYO runtime classes for sandboxed containers like Kata), be aware that the runtime-class-manager edits /etc/containerd/config.toml in place via a privileged init container. That edit is idempotent and reversible, but it is also exactly the kind of operation your security team will want to see in a CIS benchmark scan. Document it before they ask.

Building and Pushing a Spin App as an OCI Artifact

A Spin app builds in three steps: spin new scaffolds the project, spin build compiles the components to Wasm, and spin registry push packages and pushes them as an OCI artifact. Spin v3 uses the wasm32-wasip2 target by default (WASI Preview 2, built on the component model), which is the version SpinKube’s shim expects.

Here is a minimal Rust HTTP handler:

spin new -t http-rust telemetry-api --accept-defaults
cd telemetry-api

The generated src/lib.rs:

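use spin_sdk::http::{IntoResponse, Request, Response};
use spin_sdk::http_component;

// #[http_component] exports this function as the component's HTTP handler.
#[http_component]
fn handle_request(req: Request) -> anyhow::Result<impl IntoResponse> {
    // Echo the request path; note it is interpolated without JSON escaping.
    let path = req.path();
    let body = format!(
        r#"{{"ok":true,"path":"{}","runtime":"wasmtime-spin-v2"}}"#,
        path
    );
    // Build a 200 response with a JSON content type.
    Ok(Response::builder()
        .status(200)
        .header("content-type", "application/json")
        .body(body)
        .build())
}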
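One caveat on the handler above: it formats the request path straight into a JSON string, so a path containing a double quote, backslash, or control character would yield invalid JSON. The usual fix is serde_json; if you would rather not add a dependency to a tiny component, a minimal escaper covering the characters RFC 8259 requires is enough. This is a plain std-Rust sketch so you can test it outside Spin, and the helper names json_escape and telemetry_body are hypothetical.

```rust
/// Escape a string for embedding inside a JSON string literal (RFC 8259):
/// quote, backslash, and control characters U+0000..=U+001F must be escaped.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper producing the same body as the handler, with the path escaped.
fn telemetry_body(path: &str) -> String {
    format!(
        r#"{{"ok":true,"path":"{}","runtime":"wasmtime-spin-v2"}}"#,
        json_escape(path)
    )
}

fn main() {
    println!("{}", telemetry_body("/v1/telemetry/probe"));
    println!("{}", telemetry_body(r#"/v1/"quoted"\path"#));
}
```

Inside the component you would build the body with telemetry_body(req.path()) instead of the inline format!.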

And the spin.toml:

spin_manifest_version = 2

[application]
name = "telemetry-api"
version = "0.1.0"
authors = ["you@example.com"]

[[trigger.http]]
route = "/..."
component = "telemetry-api"

[component.telemetry-api]
source = "target/wasm32-wasip2/release/telemetry_api.wasm"
allowed_outbound_hosts = []
[component.telemetry-api.build]
command = "cargo build --target wasm32-wasip2 --release"

Build and push:

spin build
spin registry push ghcr.io/your-org/telemetry-api:0.1.0

The artifact pushed is not a normal Docker image. It is an OCI artifact with config.mediaType: application/vnd.fermyon.spin.application.v1+config and a single layer of application/vnd.fermyon.spin.component.v1+wasm. You can inspect it with oras manifest fetch and you will see no application/vnd.docker.image.rootfs.diff.tar.gzip layer anywhere.
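To make that distinction mechanical, for example in a CI step that runs oras manifest fetch and checks the result, you only need two fields from the OCI manifest: config.mediaType and each layers[].mediaType. A minimal Rust sketch, assuming those strings have already been extracted from the manifest JSON; the enum and function names are hypothetical, and the Spin media types are the ones quoted above.

```rust
/// Config media type that marks a Spin application artifact.
const SPIN_CONFIG: &str = "application/vnd.fermyon.spin.application.v1+config";
/// Filesystem layer types that only a container image carries.
const DOCKER_ROOTFS_LAYER: &str = "application/vnd.docker.image.rootfs.diff.tar.gzip";
const OCI_ROOTFS_PREFIX: &str = "application/vnd.oci.image.layer.v1.tar"; // +gzip, +zstd, ...

#[derive(Debug, PartialEq)]
enum ArtifactKind {
    SpinApp,
    ContainerImage,
    Unknown,
}

/// Classify an artifact from its config media type and its layer media types.
fn classify(config_media_type: &str, layer_media_types: &[&str]) -> ArtifactKind {
    let has_rootfs = layer_media_types
        .iter()
        .any(|m| *m == DOCKER_ROOTFS_LAYER || m.starts_with(OCI_ROOTFS_PREFIX));
    match (config_media_type == SPIN_CONFIG, has_rootfs) {
        (true, false) => ArtifactKind::SpinApp,
        (_, true) => ArtifactKind::ContainerImage,
        (false, false) => ArtifactKind::Unknown,
    }
}

fn main() {
    let kind = classify(
        SPIN_CONFIG,
        &["application/vnd.fermyon.spin.component.v1+wasm"],
    );
    println!("{kind:?}");
}
```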

JavaScript works the same way. spin new -t http-js dashboard-bff produces a project where spin build invokes the JS-to-Wasm toolchain (a fork of Bytecode Alliance’s componentize-js), and the resulting OCI artifact has exactly the same shape as the Rust one. That is the point of the component model: the consumer (the shim) does not care which source language produced the Wasm.

A pragmatic comparison: a release-mode Rust component for the handler above weighs ~280 KB; the same logic in JS (with the QuickJS engine bundled as a component) is ~3.1 MB. The JS artifact pays a one-time size penalty but builds an order of magnitude faster — useful when your CI loop matters more than your deploy size. Python via componentize-py lands at ~12 MB because the CPython interpreter is bundled, and Go via TinyGo lands at ~900 KB but only supports a Go language subset. Pick the language with a clear-eyed view of artifact size, cold-start time, and feature parity rather than out of habit.

One more practical note on registries: the spin registry push command supports cosign sign natively in Spin v3.1+, and the SpinKube admission webhook can be configured (via the verify-images policy on the spin-operator chart) to reject SpinApps whose artifacts are not signed by a trusted key. Turning that on is two extra lines in your values.yaml and removes an entire class of supply-chain risk. There is no good reason to deploy unsigned Spin artifacts in 2026.

Deploying via the SpinApp CRD and Gateway API

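The SpinApp CRD is the user-facing primitive. It takes an OCI artifact reference and produces a Deployment and a Service; routing is up to you, and below we add an HTTPRoute by hand. The operator handles the RuntimeClass wiring so users do not have to touch it: the executor: containerd-shim-spin field names a SpinAppExecutor resource, and that executor’s deploymentConfig.runtimeClassName (wasmtime-spin-v2 here) is what the generated Deployment uses.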

apiVersion: core.spinkube.dev/v1alpha1
kind: SpinApp
metadata:
  name: telemetry-api
  namespace: edge-apps
spec:
  image: ghcr.io/your-org/telemetry-api:0.1.0
  executor: containerd-shim-spin
  replicas: 3
  resources:
    limits:
      memory: 64Mi
      cpu: 100m
    requests:
      memory: 8Mi
      cpu: 10m

Note the memory limit: 64 MiB is generous for a Wasm app. Most Spin handlers idle well under 5 MiB. The CPU request of 10m is meaningful — the scheduler treats SpinApps like any other pod for bin-packing.
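Because the scheduler packs pods by requests rather than limits, those tiny requests are what drive density. A rough Rust sketch of the per-node ceiling follows. The allocatable figures in main are hypothetical (roughly a 4 vCPU / 8 GiB node after system reservations), the sketch ignores DaemonSets and other pods already on the node, and it applies the kubelet's default maxPods of 110, which you will hit long before CPU or memory at these request sizes unless you raise it.

```rust
/// Resource amounts as the scheduler sees them (requests, not limits).
#[derive(Clone, Copy)]
struct Resources {
    cpu_millis: u64,
    memory_mib: u64,
}

/// Upper bound on replicas of one pod shape per node: the tightest of CPU,
/// memory, and the kubelet's maxPods cap. A zero request never constrains.
fn max_replicas_per_node(allocatable: Resources, request: Resources, max_pods: u64) -> u64 {
    let by_cpu = allocatable.cpu_millis.checked_div(request.cpu_millis).unwrap_or(u64::MAX);
    let by_mem = allocatable.memory_mib.checked_div(request.memory_mib).unwrap_or(u64::MAX);
    by_cpu.min(by_mem).min(max_pods)
}

fn main() {
    // Hypothetical allocatable capacity for a 4 vCPU / 8 GiB node.
    let node = Resources { cpu_millis: 3_800, memory_mib: 7_000 };
    // Requests from the SpinApp above: cpu 10m, memory 8Mi.
    let spin = Resources { cpu_millis: 10, memory_mib: 8 };
    const DEFAULT_MAX_PODS: u64 = 110; // kubelet default
    println!("per node (default maxPods): {}", max_replicas_per_node(node, spin, DEFAULT_MAX_PODS));
    println!("per node (maxPods 1000):    {}", max_replicas_per_node(node, spin, 1_000));
}
```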

Apply and watch:

kubectl create namespace edge-apps
kubectl apply -f telemetry-api-spinapp.yaml
kubectl -n edge-apps get spinapp,deployment,pod

Within seconds you should see a Deployment named telemetry-api and three Running pods. Inspect one:

kubectl -n edge-apps describe pod -l core.spinkube.dev/app-name=telemetry-api | grep -E "Runtime|Image"
# Runtime Class Name: wasmtime-spin-v2
# Image: ghcr.io/your-org/telemetry-api:0.1.0

Now expose it via Gateway API. If you do not already have a Gateway controller, install one — I use Envoy Gateway v1.1.x for these workloads because its sub-millisecond hot-path is wasted on container-backed services but pays off on Wasm endpoints.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: telemetry-api
  namespace: edge-apps
spec:
  parentRefs:
    - name: edge-gateway
      namespace: gateway-system
  hostnames: ["telemetry.example.com"]
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/telemetry
      backendRefs:
        - name: telemetry-api
          port: 80

Spin-operator already exposes a ClusterIP Service named after the SpinApp. The HTTPRoute targets that Service. Curl through the gateway:

curl -i https://telemetry.example.com/v1/telemetry/probe
# HTTP/2 200
# {"ok":true,"path":"/v1/telemetry/probe","runtime":"wasmtime-spin-v2"}

Measuring Cold Start, Latency, and Memory Against a Container Baseline

The benchmark that matters is cold start. To measure it honestly, you have to compare like-for-like: same handler logic, same gateway, same node, same payload. I built a baseline container in Go (chi router, scratch image, ~7 MB) that returns the same JSON, deployed it with the same resource limits, and measured P50/P99 cold-start latency by deleting pods and timing the next request that hit a fresh replica.
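For the percentile step itself, a nearest-rank percentile over the recorded cold-start samples is enough and sidesteps arguments about interpolation. A minimal Rust sketch, assuming you have already collected one latency value in milliseconds per cold-start event; the function name and sample values are illustrative, not the harness behind the numbers below.

```rust
/// Nearest-rank percentile: the smallest sample such that at least p% of
/// samples are <= it. `p` is an integer percent (50 for P50, 99 for P99).
fn percentile(samples_ms: &[f64], p: u32) -> Option<f64> {
    if samples_ms.is_empty() || p > 100 {
        return None;
    }
    let mut sorted = samples_ms.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    // rank = ceil(p * n / 100) in integer arithmetic, at least 1 (1-based).
    let rank = ((p as usize * n + 99) / 100).max(1);
    Some(sorted[rank - 1])
}

fn main() {
    // Illustrative cold-start samples in milliseconds, one per deleted-pod event.
    let cold_starts_ms = [3.0, 2.9, 3.4, 3.1, 11.2, 3.0, 2.8, 3.3, 3.2, 4.1];
    for p in [50, 99] {
        println!("P{p}: {:?} ms", percentile(&cold_starts_ms, p));
    }
}
```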

On an AWS c7g.xlarge (Graviton3) k3s node, measurements over 1,000 cold-start events:

Go container: P50 cold start 412 ms, P99 1,180 ms (dominated by image pull on first hit, then Go runtime init).
Spin Wasm: P50 cold start 3.1 ms, P99 11.4 ms (artifact already cached after first pull; subsequent instantiations are Wasmtime re-instantiations).
Warm path P99 latency: Go 1.8 ms, Spin 2.1 ms (nearly identical — Wasm is not magically faster on the warm path).
Idle memory: Go pod 18 MB RSS, Spin pod 4.2 MB RSS.

Two things to call out. First, the cold-start gap shrinks once the container image is already pulled and a readiness probe warms the runtime: in that case Wasm wins by roughly 100x (the 412 ms vs 3.1 ms P50 above is about 130x), not the ~1000x you see when a cold start includes an image pull. Second, warm-path latency is essentially a tie. Wasm’s advantage is density and elasticity, not raw throughput.

For observability, the Spin runtime emits OpenTelemetry traces and Prometheus metrics if you set SPIN_OTEL_TRACING_ENABLED=true and SPIN_OTEL_METRICS_ENABLED=true in the SpinApp env block. The metrics include spin_http_requests_total, spin_http_request_duration_seconds, and spin_wasmtime_instantiate_duration_seconds. That last one is the one you graph during a load test to see cold-start tail behavior.

A practical benchmark harness looks like this. Run a one-off pod that hits the gateway from inside the cluster with hey or oha, then drive scale-from-1 events by patching the SpinApp’s replicas from 1 to 30 in a loop while recording the histogram:

kubectl run benchmarker --rm -it --restart=Never --image=ghcr.io/hatoo/oha:0.6.0 -- \
  -z 60s -c 200 -q 500 \
  -H "Host: telemetry.example.com" \
  http://edge-gateway.gateway-system/v1/telemetry/probe

In parallel, scrape spin_wasmtime_instantiate_duration_seconds_bucket every five seconds from the Spin pods’ /metrics endpoint. The buckets you actually care about are le="0.005" (5 ms) and le="0.05" (50 ms). On a healthy node, more than 99% of cold starts should land in the 5 ms bucket. Anything that pushes into the 50 ms bucket usually means either an OCI cache miss (artifact had to be re-pulled) or a noisy neighbor saturating the node’s CPU.
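Because Prometheus histogram buckets are cumulative, the le="0.005" series already counts every observation at or below 5 ms, so the 99% check is a single division against the total count. A Rust sketch, assuming you have scraped (le, cumulative count) pairs for spin_wasmtime_instantiate_duration_seconds plus the total for the window you care about. These are monotonically increasing counters, so take the difference between two scrapes first. The bucket counts in main are made up.

```rust
/// One histogram bucket: upper bound in seconds and its cumulative count.
type Bucket = (f64, u64);

/// Fraction of observations <= `le`; buckets are cumulative, so no summing is needed.
fn share_at_or_below(buckets: &[Bucket], total: u64, le: f64) -> Option<f64> {
    if total == 0 {
        return None;
    }
    buckets
        .iter()
        .find(|(bound, _)| *bound == le)
        .map(|(_, count)| *count as f64 / total as f64)
}

/// Observations that landed in (lower, upper]: subtract the cumulative counts.
fn count_between(buckets: &[Bucket], lower: f64, upper: f64) -> Option<u64> {
    let count_at = |le: f64| buckets.iter().find(|(b, _)| *b == le).map(|(_, c)| *c);
    Some(count_at(upper)?.saturating_sub(count_at(lower)?))
}

fn main() {
    // Made-up window: 10,000 instantiations, 9,930 at or under 5 ms, 9,995 at or under 50 ms.
    let buckets = [(0.005, 9_930), (0.05, 9_995), (f64::INFINITY, 10_000)];
    let total = 10_000;
    let fast = share_at_or_below(&buckets, total, 0.005).unwrap_or(0.0);
    println!("share <= 5 ms: {fast:.3} (target > 0.99: {})", fast > 0.99);
    println!("between 5 and 50 ms: {:?}", count_between(&buckets, 0.005, 0.05));
}
```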

One non-obvious gotcha: the Spin shim shares the kubelet’s image pull credentials, but the OCI artifact mediaType is non-standard, and some registry mirrors (notably older Harbor versions before 2.10) refuse to serve application/vnd.fermyon.spin.application.v1+config. If you see ImagePullBackOff with no clear permission error, check the registry’s artifact-type allowlist before chasing imagePullSecrets.

Scaling with HPA and Why Cold Start Changes the Math

The Horizontal Pod Autoscaler works on SpinApps the same way it works on Deployments because the operator owns a Deployment under the hood. But the scaling threshold you choose should be different. With a 3 ms cold start, you can run replicas closer to saturation before adding capacity, because the cost of a missed scale-up event is much lower.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: telemetry-api
  namespace: edge-apps
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: telemetry-api
  minReplicas: 1
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # would be 50 for a typical container
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 200
          periodSeconds: 15

The averageUtilization: 80 and zero stabilization window are aggressive. They are appropriate because new pods can take traffic in under 50 ms (image cache hit + shim instantiation + readiness gate). For the same Go service, I would never set utilization that high.
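The arithmetic behind those settings is worth seeing. The HPA's core rule, as documented by Kubernetes, is desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization), skipped while the ratio is within a default 10% tolerance; a Percent scale-up policy of 200 caps each 15-second period at ceil(current * 3). The Rust sketch below is a simplification that ignores per-pod readiness, missing metrics, and stabilization windows, and its function names are hypothetical.

```rust
/// Default HPA tolerance: no scaling while the metric ratio is within 10% of 1.0.
const TOLERANCE: f64 = 0.1;

/// Core HPA rule: desired = ceil(current * currentUtilization / targetUtilization).
fn desired_replicas(current: u32, current_util: f64, target_util: f64) -> u32 {
    let ratio = current_util / target_util;
    if (ratio - 1.0).abs() <= TOLERANCE {
        return current;
    }
    (current as f64 * ratio).ceil() as u32
}

/// Apply a Percent scale-up policy (value 200 => at most 3x per period),
/// then clamp to minReplicas/maxReplicas.
fn apply_scale_up_policy(current: u32, desired: u32, percent: u32, min: u32, max: u32) -> u32 {
    let limit = (current as f64 * (1.0 + percent as f64 / 100.0)).ceil() as u32;
    desired.min(limit).clamp(min, max)
}

fn main() {
    // 4 replicas averaging 160% of requested CPU against the 80% target above.
    let want = desired_replicas(4, 160.0, 80.0);
    println!("desired={want}, allowed this period={}", apply_scale_up_policy(4, want, 200, 1, 50));
    // A spike from 2 replicas: the raw formula wants 10, the policy allows 6.
    let spike = desired_replicas(2, 400.0, 80.0);
    println!("desired={spike}, allowed this period={}", apply_scale_up_policy(2, spike, 200, 1, 50));
}
```

The same formula explains the target choice: at 80% the HPA reacts later than at 50%, which only makes sense when the replicas it adds can serve traffic almost immediately.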

For Knative-style scale-to-zero, you can layer KEDA on top of SpinKube. The combination — KEDA HTTP scaler + SpinKube — gets you genuine zero-replica idle with sub-100 ms wake-up, which is the closest thing to real “serverless containers” the open-source ecosystem currently offers.

A real-world example: I ran a multi-tenant IoT telemetry ingest with 240 tenants, each getting their own SpinApp instance behind a tenant-aware HTTPRoute. With KEDA HTTP scaler, only ~12 tenants had warm replicas at any given moment; the other 228 sat at zero. A cold tenant burst (first request after hours of idle) measured at 47 ms P99 — image cache was warm, only the Wasm instance had to materialize. The same workload as containers would have required either 240 always-on pods or a 1–3 second scale-from-zero penalty per tenant. On a single c7g.xlarge node we were holding 480 Wasm tenants comfortably under 6 GB RAM total; the container equivalent would have needed ten of those nodes.

That density is the part that often surprises platform teams. The compute cost of Wasm is comparable to a container on the warm path, but the capacity cost — the number of nodes you must keep up to handle a long-tail tenant population — collapses by an order of magnitude.

Hardening: WASI Capabilities and NetworkPolicy

Hardening a Wasm workload is fundamentally different from hardening a container. Linux users, seccomp profiles, and AppArmor policies are not what confines the module; the sandbox is the WASI capability set declared in spin.toml. Whatever you do not grant, the module cannot do: it has no direct syscall surface at all, and every interaction with the host goes through WASI imports that the runtime chooses to provide.

The minimum capabilities to declare explicitly:

[component.telemetry-api]
source = "target/wasm32-wasip2/release/telemetry_api.wasm"
allowed_outbound_hosts = [
  "https://metrics.internal.svc.cluster.local",
  "redis://redis.edge-apps.svc.cluster.local:6379",
]
key_value_stores = ["default"]
files = [{ source = "static", destination = "/static" }]

allowed_outbound_hosts is allowlist-only. If your code calls spin_sdk::http::send with a URL that does not match an entry, the call returns an error — there is no DNS exfiltration path. key_value_stores declares which storage backend the component can access. files is a read-only filesystem mount.
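The deny-by-default behaviour is easy to model. Below is a simplified Rust sketch of allowlist matching on scheme, host, and port, using the two entries from the spin.toml above. It is not Spin's implementation: it assumes default ports of 80, 443, and 6379 for http, https, and redis, supports only a leading *. subdomain wildcard, compares hosts case-sensitively, and does not handle IPv6 literals or Spin's other pattern forms. All names are hypothetical.

```rust
/// A parsed origin: scheme, host, and (explicit or default) port.
#[derive(Debug, PartialEq)]
struct Origin<'a> {
    scheme: &'a str,
    host: &'a str,
    port: u16,
}

/// Default ports assumed by this sketch.
fn default_port(scheme: &str) -> Option<u16> {
    match scheme {
        "http" => Some(80),
        "https" => Some(443),
        "redis" => Some(6379),
        _ => None,
    }
}

/// Parse "scheme://host[:port][/path]"; anything else is rejected.
fn parse_origin(url: &str) -> Option<Origin<'_>> {
    let (scheme, rest) = url.split_once("://")?;
    let authority = rest.split('/').next()?;
    let (host, port) = match authority.rsplit_once(':') {
        Some((h, p)) => (h, p.parse::<u16>().ok()?),
        None => (authority, default_port(scheme)?),
    };
    if host.is_empty() {
        return None;
    }
    Some(Origin { scheme, host, port })
}

/// Exact host match, or "*.suffix" matching any subdomain of suffix.
fn host_matches(pattern: &str, host: &str) -> bool {
    match pattern.strip_prefix("*.") {
        Some(suffix) => host.len() > suffix.len() + 1 && host.ends_with(&format!(".{suffix}")),
        None => pattern == host,
    }
}

/// Deny by default: allowed only if some entry matches scheme, host, and port.
fn is_allowed(allowlist: &[&str], url: &str) -> bool {
    let Some(target) = parse_origin(url) else {
        return false;
    };
    allowlist.iter().filter_map(|entry| parse_origin(entry)).any(|allowed| {
        allowed.scheme == target.scheme
            && allowed.port == target.port
            && host_matches(allowed.host, target.host)
    })
}

fn main() {
    let allowlist = [
        "https://metrics.internal.svc.cluster.local",
        "redis://redis.edge-apps.svc.cluster.local:6379",
    ];
    for url in [
        "https://metrics.internal.svc.cluster.local/v1/push",
        "redis://redis.edge-apps.svc.cluster.local",
        "https://exfil.example.com/",
        "https://metrics.internal.svc.cluster.local:8443/",
    ] {
        println!("{url} -> {}", is_allowed(&allowlist, url));
    }
}
```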

Layer NetworkPolicy on top for defense in depth:

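apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: telemetry-api-egress
  namespace: edge-apps
spec:
  podSelector:
    matchLabels:
      core.spinkube.dev/app-name: telemetry-api
  policyTypes: ["Egress"]
  egress:
    # DNS: once egress is restricted, name lookups to CoreDNS must be allowed explicitly
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Redis in the same namespace (the redis:// entry in allowed_outbound_hosts)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: edge-apps
      ports:
        - protocol: TCP
          port: 6379
    # HTTPS to metrics.internal (the https:// entry in allowed_outbound_hosts);
    # assumes the metrics Service's targetPort is also 443
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: internal
      ports:
        - protocol: TCP
          port: 443
    # OTLP/gRPC export to the collector in the observability namespace
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: observability
      ports:
        - protocol: TCP
          port: 4317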

The interesting property: WASI capabilities are enforced by the runtime, NetworkPolicy is enforced by the CNI. Both have to permit a connection for it to succeed. That is a true defense-in-depth model — a misconfigured allowed_outbound_hosts cannot accidentally expose a host that NetworkPolicy blocks, and vice versa.

For deployment topology — where you actually place the Wasm-capable nodes vs the container-only nodes vs the GPU-bearing inference nodes — see the broader Kubernetes vs Nomad edge decision matrix and our edge AI inference hardware guide. SpinKube does not compete with those pieces; it adds a runtime class to the same fleet.

Trade-offs and Failure Modes — When SpinKube Is the Wrong Answer

SpinKube fails or underperforms in five concrete scenarios, and pretending otherwise is how you get burned in production.

Heavy CPU-bound threading. Wasmtime’s threads support (via the wasm-threads proposa
