Cilium Tetragon Runtime Security: eBPF Hands-On (2026)

Cilium Tetragon runtime security puts a fully programmable enforcement layer directly inside the Linux kernel. It catches malicious process execution, unauthorized file writes, and suspicious network connections before they complete — not after the damage is done. This post gives you a working, step-by-step guide to deploying Tetragon in a Kubernetes cluster, writing your first TracingPolicy, and blocking real attack patterns in 2026.

What this covers:
– Why runtime security needs eBPF instead of auditd or seccomp
– How Tetragon works: eBPF hooks, TracingPolicy CRDs, and in-kernel enforcement
– A full hands-on walkthrough: block writes to /etc and detect suspicious exec calls
– Tetragon vs Falco: trade-offs, kernel requirements, and operational overhead
– Practical recommendations and an FAQ

Context: Why Runtime Security Needs eBPF

Traditional runtime security approaches share a painful bottleneck. They sit in userspace and wait for the kernel to hand them information about events that have already occurred.

The auditd and seccomp ceiling

auditd sends syscall records to userspace over a netlink socket. By the time your SIEM receives a write event, the write has already committed to disk. Seccomp is faster — it blocks syscalls before they execute — but it operates at process startup via a static allowlist baked into a container’s seccomp profile. It cannot reason about runtime context: which file is being opened, what the calling binary’s parent is, or whether a network connection targets a known-bad IP.

Neither tool gives you Kubernetes-level context. They see PIDs and syscall numbers. They cannot tell you that PID 8192 is a sidecar in the payments namespace making an outbound TCP connection to an unusual IP.

What eBPF changes

eBPF programs run inside the kernel’s verified execution environment. They attach to kprobes, tracepoints, and LSM hooks with negligible overhead compared to copying events across the kernel–userspace boundary. They can read process metadata, file path dentries, and network socket state right where the event happens. The kernel verifier guarantees they cannot crash the kernel or loop forever.

Tetragon uses this to do something no userspace tool can match: it can fire a SIGKILL or override a syscall return value from inside the kernel, in the same execution path as the event, before the operation succeeds. This is in-kernel enforcement, not post-hoc alerting.

The eBPF ecosystem has matured rapidly. The eBPF.io foundation now catalogs over 80 production eBPF projects, and the CNCF Security Technical Advisory Group lists eBPF-based runtime enforcement as a Tier 1 cloud-native security control for 2026.

How Tetragon Works

Tetragon is a sub-project of Cilium, a CNCF graduated project, and was originally developed by Isovalent (now part of Cisco). It runs as a DaemonSet on every Kubernetes node and ships a set of eBPF programs that hook into the kernel alongside a userspace agent that aggregates, enriches, and exports events.

Figure 1 — Tetragon architecture. eBPF programs run in the kernel and write events to a ring buffer. The Tetragon agent reads the buffer, enriches events with pod and namespace context, and exports structured JSON over gRPC. TracingPolicy CRDs configure which kernel hooks are active.

eBPF Hooks and the Process Lifecycle

Tetragon attaches eBPF programs at four primary hook sites.

kprobes and kretprobes intercept the entry and return of specific kernel functions. A kprobe on security_file_open captures every file-open attempt before the VFS layer grants access. A kprobe on tcp_connect captures outbound connection attempts with the socket’s destination IP and port in scope; a kretprobe on the same function adds the return value.

Tracepoints attach to stable kernel trace events like sched_process_exec and sched_process_exit. These are preferred over kprobes for process lifecycle events because they use a stable ABI that does not break across minor kernel versions.

LSM hooks (Linux Security Module hooks, available from kernel 5.7+ with CONFIG_BPF_LSM) allow eBPF programs to sit in the kernel’s main security decision path. This is the hook site that enables Tetragon’s hardest enforcement actions: returning an error code directly from the LSM hook denies the operation atomically.

The agent maintains a process cache keyed by PID and mount namespace. Every event is enriched with the originating binary path, command-line arguments, UID/GID, capability set, and — critically for Kubernetes — the pod name, namespace, and container ID. This enrichment happens in the agent, not the eBPF program, to stay within the verifier’s complexity budget.

Figure 2 — Process and event lifecycle. When a process executes, the eBPF program looks up the parent context in the process cache. The enriched event flows to the Tetragon agent and is exported as a structured JSON message containing binary path, arguments, pod identity, and namespace.

The TracingPolicy CRD

Every behavior Tetragon observes or enforces is expressed as a TracingPolicy Kubernetes custom resource. This is the interface between cluster operators and the eBPF layer.

A TracingPolicy specifies:
– Which kernel function or tracepoint to hook (kprobes, tracepoints, or lsmhooks)
– Which arguments to capture (by index and type: int, string, file, sock)
– Which selectors must match before an action fires (binary name, namespace, file prefix, UID, capabilities)
– Which action to take on a match: Post (observe only), Sigkill (terminate the process in-kernel), or Override (return a specific errno)

Here is a minimal but complete TracingPolicy that observes all execve calls cluster-wide:

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: observe-all-exec
spec:
  kprobes:
    - call: "sys_execve"
      syscall: true
      args:
        - index: 0
          type: "string"
      selectors:
        - matchActions:
            - action: Post

Apply it with:

kubectl apply -f observe-all-exec.yaml

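Tetragon loads the matching eBPF programs into the kernel, configures them from the policy, and starts emitting ProcessKprobe events for every execve call. (ProcessExec and ProcessExit events are emitted by default, even with no policy loaded.) No agent restart is required.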

Observability vs In-Kernel Enforcement

This distinction matters operationally. Tetragon supports two fundamentally different operating modes, and you can mix them in the same cluster.

Observe mode (action: Post) emits structured events for every matched operation. The process continues normally. Use this mode during initial rollout to understand your baseline before writing blocking rules. The structured output is JSON and maps directly to the OpenTelemetry semantic conventions for security events.

Enforce mode (action: Sigkill or action: Override) acts in the kernel execution path. Sigkill sends SIGKILL to the offending process immediately, before the syscall returns to userspace. Override causes the syscall to return the specified errno (e.g., EPERM) — the process sees a failed call but is not terminated. Use Override when you want to deny an operation without killing the container (useful for applications that handle permission errors gracefully).
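
As a hedged sketch of Override, the snippet below writes a policy that makes an execve of /usr/bin/nc fail with EPERM instead of killing the caller. The policy and file name deny-nc-exec are made up for illustration, and the sketch assumes a kernel built with CONFIG_BPF_KPROBE_OVERRIDE. argError is the value the syscall returns, so -1 corresponds to -EPERM.

```shell
# Write an illustrative Override policy to the current directory.
cat > deny-nc-exec.yaml <<'EOF'
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: deny-nc-exec
spec:
  kprobes:
    - call: "sys_execve"
      syscall: true
      args:
        - index: 0
          type: "string"
      selectors:
        - matchArgs:
            - index: 0
              operator: "Equal"
              values:
                - "/usr/bin/nc"
          matchActions:
            - action: Override
              argError: -1   # returned to the caller; -1 is -EPERM
EOF
```

Apply it with kubectl apply -f deny-nc-exec.yaml. A shell that tries to run nc should then see a permission error while the shell itself keeps running.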

For more on how Tetragon integrates with Cilium’s broader networking security model, see the Cilium service mesh sidecarless eBPF deep dive and the Cilium 1.17 service mesh tutorial.

Installing Tetragon

Tetragon installs as a Helm chart from the official Cilium repository. The chart deploys the DaemonSet, the CRD definitions, and an optional Prometheus metrics endpoint.

Prerequisites:
– Kubernetes 1.27 or newer
– Linux kernel 4.19 minimum; 5.3 or newer strongly recommended for BTF (BPF Type Format) support, which eliminates kernel-header build dependencies
– Helm 3.x
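
A quick way to check a node against these prerequisites is sketched below. The function names are illustrative. The BTF check assumes the kernel exposes /sys/kernel/btf/vmlinux, and the LSM check assumes securityfs is mounted at /sys/kernel/security.

```shell
# kernel_at_least MAJOR MINOR [RELEASE]
# Succeeds when RELEASE (default: the running kernel) is at least MAJOR.MINOR.
kernel_at_least() {
  want_major=$1
  want_minor=$2
  release=${3:-$(uname -r)}
  have_major=${release%%.*}        # "5.15.0-91-generic" -> "5"
  rest=${release#*.}               # -> "15.0-91-generic"
  have_minor=${rest%%[!0-9]*}      # -> "15"
  [ "$have_major" -gt "$want_major" ] ||
    { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Print a short report for the node this runs on.
tetragon_prereq_report() {
  kernel_at_least 4 19 && echo "kernel >= 4.19: yes" || echo "kernel >= 4.19: no"
  kernel_at_least 5 3  && echo "kernel >= 5.3:  yes" || echo "kernel >= 5.3:  no"
  [ -r /sys/kernel/btf/vmlinux ] && echo "BTF exposed:    yes" || echo "BTF exposed:    no"
  grep -qw bpf /sys/kernel/security/lsm 2>/dev/null \
    && echo "BPF LSM active: yes" || echo "BPF LSM active: no"
  return 0
}

tetragon_prereq_report
```

Run it on each node (for example through a debug pod or SSH) before installing. A "no" on the BPF LSM line only matters if you plan to use LSM-based policies.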

Add the Helm repo and install:

helm repo add cilium https://helm.cilium.io
helm repo update

# exportFilename is a bare file name; the chart writes it under its
# export directory (default /var/run/cilium/tetragon).
helm install tetragon cilium/tetragon \
  --namespace kube-system \
  --set tetragon.grpc.address=localhost:54321 \
  --set tetragon.exportFilename=tetragon.log \
  --set tetragon.enableK8sAPI=true \
  --version 1.2.0

Verify the DaemonSet is running on all nodes:

kubectl rollout status daemonset/tetragon -n kube-system
kubectl get pods -n kube-system -l app.kubernetes.io/name=tetragon

Install the tetra CLI for querying the local agent gRPC endpoint:

TETRAGON_VERSION=1.2.0
curl -LO "https://github.com/cilium/tetragon/releases/download/v${TETRAGON_VERSION}/tetra-linux-amd64.tar.gz"
tar -xzf tetra-linux-amd64.tar.gz
mv tetra /usr/local/bin/

Stream live events from any node:

tetra getevents -o compact

You will immediately see process exec and exit events for everything running on the node, enriched with pod and namespace metadata.
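
If installing the CLI on every node is impractical, the agent container also carries the tetra binary, so one option is to run it through kubectl exec. The wrapper below is a convenience sketch: the function name tetra_events is made up, and it assumes the default DaemonSet and container name (tetragon) from the Helm chart.

```shell
# Run `tetra getevents` inside one Tetragon agent pod, passing through extra flags.
tetra_events() {
  kubectl exec -n kube-system ds/tetragon -c tetragon -- \
    tetra getevents -o compact "$@"
}
```

For example, tetra_events --pods test-pod limits the stream to one pod. Note that ds/tetragon resolves to a single agent pod, so you only see events from that pod's node.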

Hands-On: Detect and Block

This section walks through two concrete enforcement scenarios: blocking writes to /etc from non-privileged pods, and killing any process that executes a known-suspicious binary.

Scenario 1: Block Writes to /etc

Attackers who gain code execution inside a container often attempt to modify /etc/passwd, /etc/hosts, or /etc/cron.d. Even with a read-only root filesystem, a misconfigured volume mount can expose /etc as writable.

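The following TracingPolicy hooks security_file_open (an LSM hook point) and terminates any process that opens a file under /etc, unless the process holds CAP_SYS_ADMIN or runs in the host PID namespace. Two caveats apply. First, security_file_open receives only the file, not the requested access mode, so this selector matches read-only opens as well as writes. Roll it out with action: Post first, and hook security_file_permission with a MAY_WRITE mask check if you need to match writes only. Second, matchNamespaces filters on Linux namespaces (Pid, Mnt, Net, and so on), not Kubernetes namespaces. To exempt kube-system, scope the policy with a TracingPolicyNamespaced resource or a podSelector instead.

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: block-etc-writes
spec:
  kprobes:
    - call: "security_file_open"
      syscall: false
      args:
        - index: 0
          type: "file"
      selectors:
        - matchArgs:
            - index: 0
              operator: "Prefix"
              values:
                - "/etc/"
          matchCapabilities:
            - type: Effective
              operator: "NotIn"
              values:
                - "CAP_SYS_ADMIN"
          # Linux PID namespace filter: skip processes in the host namespace.
          matchNamespaces:
            - namespace: Pid
              operator: "NotIn"
              values:
                - "host_ns"
          matchActions:
            - action: Sigkill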

Apply and test:

kubectl apply -f block-etc-writes.yaml

# In a test pod without CAP_SYS_ADMIN:
kubectl run test-pod --image=busybox --restart=Never -- sh -c "echo test > /etc/passwd"

# The process is killed in-kernel. The pod exits with a non-zero code.
kubectl logs test-pod
# Output will show nothing — the write never reached the buffer.

Verify the enforcement event:

tetra getevents -o compact | grep block-etc-writes
# Example output:
# 🔴 SIGKILL test-pod/busybox /bin/sh /etc/passwd (block-etc-writes)

Scenario 2: Detect and Kill Reverse Shells

Reverse shells often exec into /bin/bash, /bin/sh, or common tools like nc, ncat, or python3 with arguments that include -e, -c, or an IP address. This TracingPolicy matches any execve of nc, ncat, or python3 from a process outside the host PID namespace and kills the calling process immediately:

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: block-reverse-shell-tools
spec:
  kprobes:
    - call: "sys_execve"
      syscall: true
      args:
        - index: 0
          type: "string"
      selectors:
        - matchArgs:
            - index: 0
              operator: "Equal"   # matches if the path equals any listed value
              values:
                - "/bin/nc"
                - "/usr/bin/nc"
                - "/bin/ncat"
                - "/usr/bin/ncat"
                - "/usr/bin/python3"
          # matchNamespaces filters Linux namespaces, not Kubernetes ones.
          # To exempt kube-system or monitoring, scope the policy with
          # TracingPolicyNamespaced or a podSelector.
          matchNamespaces:
            - namespace: Pid
              operator: "NotIn"
              values:
                - "host_ns"
          matchActions:
            - action: Sigkill

This is deliberately aggressive. In practice, adjust the binary list and scope the policy with a podSelector (matchLabels) or a TracingPolicyNamespaced resource so that it covers production workloads, not developer or CI namespaces where you may legitimately use these tools.

Understanding the Enforcement Decision Flow

Figure 3 — In-kernel enforcement decision flow. The eBPF program evaluates every kernel event against the loaded TracingPolicy selectors. If no selector matches, the event is optionally logged with Post and the call proceeds. If a selector matches, the configured action (Sigkill or Override) fires in the kernel execution path before the syscall returns.

The critical insight is the position of enforcement. The SIGKILL is delivered inside the kernel, not by a userspace daemon polling a log stream. The latency between the event and the enforcement action is measured in nanoseconds, not milliseconds. A credential-scraping process that lives for 50 ms to dump /etc/shadow and exit has no window to operate.

Reading Tetragon Events Programmatically

Tetragon exports events as newline-delimited JSON on its export file or via the gRPC API. A ProcessKprobe event for the block-etc-writes policy looks like:

{
  "process_kprobe": {
    "process": {
      "exec_id": "a1b2c3...",
      "pid": 18234,
      "uid": 1000,
      "binary": "/bin/sh",
      "arguments": "-c echo test > /etc/passwd",
      "pod": {
        "namespace": "default",
        "name": "test-pod",
        "container": { "name": "busybox", "image": "busybox:latest" }
      }
    },
    "function_name": "security_file_open",
    "args": [{ "file_arg": { "path": "/etc/passwd" } }],
    "action": "KPROBE_ACTION_SIGKILL"
  }
}

This structured output integrates directly with Elasticsearch, Loki, or any SIEM that accepts JSON. The Kubernetes enrichment — namespace, pod name, container name, image — is present on every event, eliminating the correlation gymnastics required when joining kernel audit logs with Kubernetes metadata post-hoc.
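
Because each line is a standalone JSON object, a line-oriented tool is enough for quick triage before events reach the SIEM. The sketch below assumes jq is installed and that events follow the shape shown above. The function name kill_summary and the default export path are illustrative.

```shell
# kill_summary [FILE]
# Print one tab-separated line per SIGKILL enforcement event:
# namespace, pod, binary, hooked function.
kill_summary() {
  jq -r '
    select(.process_kprobe.action == "KPROBE_ACTION_SIGKILL")
    | .process_kprobe
    | [ (.process.pod.namespace // "-"),
        (.process.pod.name // "-"),
        .process.binary,
        .function_name ]
    | @tsv
  ' "${1:-/var/run/cilium/tetragon/tetragon.log}"
}
```

Events from host processes carry no pod object, which is why the namespace and pod columns fall back to "-".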

Tetragon vs Falco: Trade-offs

Both Tetragon and Falco are CNCF projects. Both provide runtime security observability for Kubernetes workloads. They differ significantly in architecture, enforcement capability, and operational profile.

Figure 4 — Tetragon vs Falco comparison. Tetragon enforces at the kernel layer with in-kernel SIGKILL and Override actions. Falco detects in userspace and delegates enforcement to external sidekick integrations. Both share CNCF lineage, open-source licensing, and Kubernetes workload context enrichment.

Kernel Version Requirements

Tetragon requires at minimum Linux 4.19 for basic kprobe support. For production use, 5.3 or newer is strongly recommended because it enables BTF (BPF Type Format), which provides CO-RE (Compile Once, Run Everywhere) portability across kernel versions without needing kernel headers at runtime. LSM-based enforcement actions require 5.7+ with CONFIG_BPF_LSM enabled.

Falco’s eBPF probe also requires a recent kernel (4.14+ for the eBPF driver), but its kernel module driver has broader compatibility back to 3.x kernels. For teams running older nodes — common in IoT edge deployments or long-lifecycle enterprise clusters — Falco’s driver flexibility is a genuine advantage.

Performance Overhead

Direct comparison numbers depend heavily on workload type, event volume, and which hooks are active. Qualitatively:

– Tetragon’s eBPF programs run in the kernel execution path with no user–kernel boundary crossing for each event. Overhead per observed syscall is very low: benchmarks from the Isovalent/Cisco team show single-digit microsecond hook latency in representative workloads.
– Falco’s eBPF probe copies event data to userspace for every syscall in scope. Under high-syscall workloads (e.g., busy database servers performing thousands of read/write calls per second), this copy cost accumulates. Falco’s kernel module is slightly faster than its eBPF driver but introduces kernel module maintenance risk.

Neither tool has zero overhead. Enabling broad kprobe coverage across all syscalls will measurably impact performance. Start with targeted policies and expand based on measured impact.

Policy Language and Expressiveness

Falco rules use a powerful YAML/condition DSL that supports complex boolean expressions, lists, and macros. The community maintains a large Falco rules library covering hundreds of attack patterns. This is a meaningful operational advantage for teams that want day-one coverage without writing custom policies.

Tetragon’s TracingPolicy CRD is lower-level and more expressive at the kernel function level. You can hook any kernel function by name, capture raw argument values, and apply fine-grained selectors. This power comes with responsibility: you need to know which kernel function to hook for the behavior you want to detect. The Tetragon docs at tetragon.io/docs provide a growing library of example policies, but it is still smaller than Falco’s ecosystem.

Enforcement Capability

This is the sharpest difference. Tetragon enforces in-kernel. Falco detects in userspace and relies on Falcosidekick integrations to trigger external actions (Lambda functions, webhooks, kubectl delete). The round-trip from detection to action in a Falco+Falcosidekick pipeline is measured in hundreds of milliseconds to seconds. For a fast-moving exploit that exfiltrates data in under a second, that window matters.

Teams that need deterministic in-kernel enforcement should use Tetragon. Teams that prioritize rule ecosystem breadth and integration flexibility can start with Falco and consider adding Tetragon for the highest-sensitivity enforcement use cases.

Operating Both Together

Tetragon and Falco are not mutually exclusive. Some production teams run Falco for its broad rule library and SIEM integrations and layer Tetragon on top for specific high-value enforcement policies. You pay the overhead of two agents, but you gain defense in depth: an in-kernel kill from Tetragon plus an alert and ticket from Falco+Falcosidekick gives both immediate containment and a durable audit trail.

For a broader view of zero-trust principles that complement runtime enforcement, see zero-trust architecture for industrial OT and IoT.

Practical Recommendations

Use this checklist when operationalizing Tetragon in a production cluster.

Before deploy:
– [ ] Confirm node kernel versions. Aim for 5.3+ across the fleet. If nodes run 4.x kernels, test each policy against that kernel version in staging.
– [ ] Identify your most sensitive namespaces. Apply enforcement policies there first, observe-only everywhere else.
– [ ] Export the baseline process inventory: run action: Post on sys_execve for 48 hours. Build an allowlist of expected binaries per namespace.
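
The baseline step can be sketched with jq over the export file. This assumes exec events appear as process_exec objects with the same process and pod fields as the event example earlier in this post. The function name exec_inventory and the "host" placeholder for non-pod processes are illustrative choices.

```shell
# exec_inventory FILE
# Count executions per (namespace, binary) pair, most frequent first.
exec_inventory() {
  jq -r '
    select(.process_exec != null)
    | .process_exec.process
    | [ (.pod.namespace // "host"), .binary ]
    | @tsv
  ' "$1" | sort | uniq -c | sort -rn
}
```

Review the tail of the output as well as the head: rarely executed binaries are the ones most likely to surprise you once enforcement is on.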

Policy authoring:
– [ ] Always start with action: Post. Validate events appear as expected before switching to action: Sigkill.
– [ ] Scope every enforcement policy with a podSelector (matchLabels) or a TracingPolicyNamespaced resource. Global blocking policies for common binaries will break CI pipelines and debug workflows.
– [ ] Pin Tetragon CRD versions. TracingPolicy spec fields have changed between minor versions. Test policy YAML against the target CRD version before upgrading.
– [ ] Keep enforcement policies small and single-purpose. One TracingPolicy per behavior pattern is easier to audit and roll back than a monolithic policy.
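
One way to back the scoping rule with a pre-merge check is a small lint over the policy files. The sketch below is text-based and deliberately crude. It assumes a policy counts as scoped when it either uses kind TracingPolicyNamespaced or contains a podSelector block, and the function name is made up.

```shell
# unscoped_enforcers FILE...
# Print every policy file that contains an enforcement action but no scoping.
unscoped_enforcers() {
  for f in "$@"; do
    if grep -Eq 'action:[[:space:]]*"?(Sigkill|Override)' "$f" &&
       ! grep -Eq 'kind:[[:space:]]*TracingPolicyNamespaced|podSelector:' "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
```

In CI, fail the job when unscoped_enforcers policies/*.yaml prints anything.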

Operations:
– [ ] Pipe Tetragon JSON events to your SIEM. Add a filter on process_kprobe.action == KPROBE_ACTION_SIGKILL as a high-priority alert. Every kill is a confirmed enforcement event, not a detection guess.
– [ ] Track tetragon_process_cache_misses_total in Prometheus. A rising miss rate means Tetragon is seeing processes without parent context, which degrades enrichment quality.
– [ ] Maintain a staging cluster that mirrors production kernel versions. New Tetragon versions and new TracingPolicies should both pass staging validation before production rollout.
– [ ] Document every enforcement policy in your runbook: what it blocks, what the expected false-positive rate is, and who owns it.

Incident response:
– [ ] Configure Tetragon’s gRPC export to feed your SOC SIEM in real time. ProcessExec + ProcessKprobe + ProcessExit events together reconstruct a full process tree for forensic analysis.
– [ ] When investigating a kill event, use tetra getevents --pod <name> to pull the full event sequence for that pod. The enriched JSON gives you binary path, arguments, parent PID, and file paths — everything you need to determine whether the kill was legitimate or a false positive.
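
For the forensic reconstruction described above, a jq filter over the export file can approximate a per-pod timeline. This is a sketch that assumes jq is available, that each event carries a top-level time field, and that exec, kprobe, and exit events use the process_exec, process_kprobe, and process_exit keys. The name pod_timeline is made up.

```shell
# pod_timeline FILE POD
# Print time, event type, PID, binary, and arguments for one pod, in file order.
pod_timeline() {
  jq -r --arg pod "$2" '
    (.process_exec // .process_kprobe // .process_exit) as $e
    | select($e != null and $e.process.pod.name == $pod)
    | [ .time,
        (if .process_exec then "exec" elif .process_kprobe then "kprobe" else "exit" end),
        $e.process.pid,
        $e.process.binary,
        ($e.process.arguments // "") ]
    | @tsv
  ' "$1"
}
```

Usage: pod_timeline /var/run/cilium/tetragon/tetragon.log test-pod. To link parents and children into a tree, the process objects should also carry exec_id and parent_exec_id fields that you can add to the output.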

FAQ

Does Tetragon work without Cilium CNI?

Yes. Tetragon is a standalone project. It does not require Cilium as the cluster CNI. The DaemonSet installs independently, and TracingPolicy CRDs work with any CNI. Using Cilium CNI alongside Tetragon enables additional network policy integration and shared eBPF map access, but it is not a prerequisite.

What is the minimum kernel version for in-kernel enforcement?

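Sigkill enforcement relies on the bpf_send_signal helper, which landed in kernel 5.3, so treat 5.3 as the practical floor for kill actions even though Tetragon itself loads on 4.19. Override (errno injection) on syscalls needs a kernel built with CONFIG_BPF_KPROBE_OVERRIDE. Overriding security_* hooks and using LSM-based policies both need 5.7+, the latter with CONFIG_BPF_LSM=y. Check with grep CONFIG_BPF_LSM /boot/config-$(uname -r). BTF-based CO-RE portability requires 5.3+.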

Can Tetragon policies be namespace-scoped, not cluster-wide?

Yes. TracingPolicy is a cluster-scoped CRD, but selectors within the policy can restrict enforcement to specific Kubernetes namespaces, pod labels, and container names. Tetragon 1.x also introduced TracingPolicyNamespaced, a namespace-scoped variant of the CRD that lets namespace owners manage their own policies without cluster-admin privileges.
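
As an illustration, the sketch below writes a namespaced, observe-only policy. The payments namespace, the app: checkout label, and the file name are hypothetical. The spec is assumed to mirror the cluster-wide TracingPolicy, with podSelector narrowing it to matching pods.

```shell
# Write an illustrative namespaced policy to the current directory.
cat > observe-exec-payments.yaml <<'EOF'
apiVersion: cilium.io/v1alpha1
kind: TracingPolicyNamespaced
metadata:
  name: observe-exec-payments
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: checkout
  kprobes:
    - call: "sys_execve"
      syscall: true
      args:
        - index: 0
          type: "string"
      selectors:
        - matchActions:
            - action: Post
EOF
```

Apply it with kubectl apply -f observe-exec-payments.yaml. Only pods in payments that carry the label generate events from this policy.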

How does Tetragon handle multi-container pods?

The Tetragon agent resolves container identity from the PID’s cgroup namespace and cross-references the Kubernetes API. Each event carries the container name and image as enrichment fields. In a multi-container pod, events from the init container, the main container, and any sidecars are individually tagged. You can write selectors that scope a policy to a specific container name within a pod.

Will a TracingPolicy that fires SIGKILL cause Kubernetes to restart the pod?

That depends on the pod’s restartPolicy. For restartPolicy: Always (the default for Deployment pods), Kubernetes will restart the container after a SIGKILL just as it would for any other container crash. If you want the pod to stay down after enforcement fires, use restartPolicy: Never for the workload, or pair Tetragon enforcement with a NetworkPolicy to isolate the pod while you investigate. For immediate containment in an incident, follow up with kubectl delete pod after the kill event.

Is Tetragon production-ready in 2026?

Tetragon is part of the Cilium project, which holds CNCF graduated status, and is running in production at multiple large-scale Kubernetes deployments. Isovalent/Cisco ships it as part of the Cilium Enterprise platform. The 1.x release line has a stable CRD API. The main operational risks are policy-authoring complexity and the need to validate policies against specific kernel versions. Treat it as production-ready for teams with eBPF operational experience and staging validation processes in place.
