Grafana Alloy: An OpenTelemetry Collector Tutorial
This Grafana Alloy tutorial is for the engineer who already runs an OpenTelemetry pipeline and wants to know whether Alloy is worth adopting, and how to wire it up without re-learning observability from scratch. Alloy is Grafana’s distribution of the OpenTelemetry Collector. It bundles the upstream Collector’s receivers, processors, and exporters into a single binary, then wraps them in a component-graph configuration language that also speaks native Prometheus scraping, Loki log shipping, and Pyroscope profiling. If you have been running Grafana Agent in flow mode, you have effectively already been running early Alloy — it is the supported successor, and Agent reached end of life in November 2025.
The practical question is not “what is Alloy” but “what changes operationally when I move from a vanilla otelcol binary or a Grafana Agent to Alloy.” That is what this guide answers, with config you can paste and adapt.
What this covers: Alloy’s lineage, its component pipeline model, a worked metrics-plus-logs-plus-traces config, Kubernetes deployment and clustering, the trade-offs against the upstream Collector, and the failure modes that bite in production.
Context: Alloy vs Grafana Agent vs OTel Collector
Three things commonly get conflated, so it is worth nailing down the relationships before any config.
The OpenTelemetry Collector is the upstream, vendor-neutral CNCF project. It defines the receiver-processor-exporter pipeline model and ships components for OTLP, Prometheus, Jaeger, Kafka, and dozens of backends. You build a distribution by selecting components — the otelcol-contrib build is the kitchen-sink version. Its config is YAML: you declare components, then list them in pipelines to define dataflow.
Grafana Agent was Grafana’s earlier telemetry agent. It had two modes: a static mode and a “flow” mode introduced in 2023 that used a component graph and the River configuration syntax. Flow mode was the prototype for what became Alloy.
Grafana Alloy, released as 1.0 in April 2024, is the rename-and-graduation of Agent flow mode. It is a full OpenTelemetry Collector distribution — every otelcol.* component maps to an upstream Collector component — plus first-class prometheus.*, loki.*, pyroscope.*, and discovery.* components for the Grafana ecosystem. So Alloy is simultaneously a conformant OTel Collector and a Prometheus/Loki/Pyroscope agent in one process.
The headline difference from vanilla otelcol is the config language. Alloy does not use the Collector’s YAML pipelines. It uses a component graph expressed in an HCL-derived syntax (formerly called River, now just “the Alloy configuration syntax”), where you reference one component’s exported output as another component’s input. If you have written Terraform, the mental model transfers directly. This is a real divergence: an upstream Collector YAML file will not run on Alloy unmodified, and vice versa. The components underneath are the same code; the wiring language is not.
For a deeper treatment of the upstream model that Alloy builds on, see our OpenTelemetry Collector architecture and pipelines guide.
Alloy’s component pipeline model
Alloy’s runtime is a directed graph of components. Each component does one job (receive OTLP, batch telemetry, scrape a target, write to a backend) and exposes typed outputs that other components consume by reference. There is no separate “pipeline” section as in upstream YAML; the pipeline is the set of references between components. Alloy resolves the graph at load time and refuses to load a config that contains a cycle or an unresolved reference.

Telemetry from instrumented apps and hosts enters one OTLP receiver, passes through shared processor components, then fans out to per-signal exporters writing to Prometheus, Loki, and Tempo.
Components, arguments, and exports
A component is a block with a type, a label, and a body. The type (otelcol.receiver.otlp) selects behaviour; the label (default) makes it addressable. Inside the block you set arguments (inputs) and read exported fields (outputs). Wiring happens through expressions: you point one component at another’s exported field, and Alloy infers the dependency edge.

Each Alloy component takes arguments as inputs and publishes exported fields, which downstream components reference to form the pipeline graph.
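As a minimal sketch of an exported field feeding another component's argument: local.file exports a field named content, and prometheus.remote_write reads it by reference. The file path, username, and URL below are placeholders chosen for illustration, not values from this guide.

```alloy
// local.file exports "content", the current contents of the file.
local.file "api_key" {
  filename  = "/var/lib/alloy/secrets/api-key" // hypothetical path
  is_secret = true                             // mark the content as a secret
}

// Referencing local.file.api_key.content is what creates the graph edge.
prometheus.remote_write "example" {
  endpoint {
    url = "http://mimir:9009/api/v1/push" // placeholder URL

    basic_auth {
      username = "example-user" // placeholder
      password = local.file.api_key.content
    }
  }
}
```

No ordering is declared anywhere. Alloy works out that the remote-write component depends on the file component from the expression alone.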
For OTel-style signals the convention is explicit output blocks. A receiver does not “export” a pipe; instead you tell the receiver which downstream components should receive each signal type. Consider this minimal trace path (illustrative, trimmed for clarity):
// Receive OTLP over gRPC and HTTP
otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }
  http {
    endpoint = "0.0.0.0:4318"
  }
  output {
    traces = [otelcol.processor.batch.default.input]
  }
}

// Batch before export
otelcol.processor.batch "default" {
  output {
    traces = [otelcol.exporter.otlphttp.tempo.input]
  }
}

// Export to a Tempo OTLP endpoint
otelcol.exporter.otlphttp "tempo" {
  client {
    endpoint = "http://tempo:4318"
  }
}
Read it top-down to follow the references: the receiver’s output.traces points at otelcol.processor.batch.default.input, and the batch processor’s output.traces points at otelcol.exporter.otlphttp.tempo.input. Those .input fields are exported by each component, and Alloy turns the references into graph edges. This is the whole model: there is no third place where you declare the pipeline order.
Signal types are wired independently
A subtle but important property: metrics, logs, and traces are separate wires. A receiver’s output block has metrics, logs, and traces lists, and you can send each to a different chain. That is why a single OTLP receiver can feed a Prometheus remote-write path, a Loki write path, and a Tempo path simultaneously without any of them touching the others. You are not configuring “a pipeline”; you are configuring three independent dataflows that happen to share an entry point.
Native Grafana components alongside otelcol
Beyond otelcol.*, Alloy ships components that have no upstream Collector equivalent. prometheus.scrape pulls metrics from targets discovered by discovery.kubernetes; prometheus.remote_write ships them to Mimir. loki.source.kubernetes tails pod logs; loki.write ships them. Crucially, you can bridge between worlds: otelcol.exporter.prometheus converts OTLP metrics into the prometheus.* format so they can be handed to prometheus.remote_write. That bridge is what makes Alloy a genuine superset rather than two agents stapled together.
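A rough sketch of the native components described above, wired without any otelcol.* component involved. The backend URLs are placeholders, and a real deployment would normally put relabeling between discovery and scraping to narrow the target set; that is omitted here to keep the shape visible.

```alloy
// Discover every pod the Alloy service account is allowed to see.
discovery.kubernetes "pods" {
  role = "pod"
}

// Scrape the discovered targets and forward samples to remote write.
prometheus.scrape "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [prometheus.remote_write.mimir.receiver]
}

// Tail logs for the same targets through the Kubernetes API.
loki.source.kubernetes "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [loki.write.default.receiver]
}

prometheus.remote_write "mimir" {
  endpoint {
    url = "http://mimir:9009/api/v1/push" // placeholder URL
  }
}

loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push" // placeholder URL
  }
}
```

Note that the Prometheus and Loki components wire with forward_to and .receiver rather than output blocks and .input, which is why the bridge exporters exist.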
Walkthrough: building a metrics+logs+traces pipeline
Let us build a single Alloy config that accepts OTLP for all three signals and routes each to the right Grafana backend. This mirrors the most common adoption pattern: applications already emit OTLP, and you want Alloy to be the gateway that normalises and forwards.

One OTLP receiver splits the three signals into independent processor chains, each ending at its backend: metrics to Mimir via remote write, logs to Loki, traces to Tempo with tail sampling.
Step 1: the OTLP receiver and signal split
The receiver is the single entry point. We define all three output lists here, each pointing at the head of its chain.
otelcol.receiver.otlp "default" {
  grpc { endpoint = "0.0.0.0:4317" }
  http { endpoint = "0.0.0.0:4318" }
  output {
    metrics = [otelcol.processor.batch.metrics.input]
    logs    = [otelcol.processor.resourcedetection.default.input]
    traces  = [otelcol.processor.batch.traces.input]
  }
}
Note the separate batch processors for metrics and traces — they have different latency and size characteristics, so it is common to tune them independently rather than share one.
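To make the independent tuning concrete, here is a hedged sketch of the two batch processors with different settings. The timeout and send_batch_size values are illustrative assumptions, not recommendations, and each block stands in for the bare definition of the same name in the later steps rather than sitting alongside it.

```alloy
// Metrics tolerate more latency, so a longer flush interval is plausible.
otelcol.processor.batch "metrics" {
  timeout         = "10s" // illustrative value
  send_batch_size = 8192  // illustrative value

  output {
    metrics = [otelcol.exporter.prometheus.to_mimir.input]
  }
}

// Traces feed a sampler that waits on whole traces, so flush sooner.
otelcol.processor.batch "traces" {
  timeout         = "1s" // illustrative value
  send_batch_size = 1024 // illustrative value

  output {
    traces = [otelcol.processor.tail_sampling.default.input]
  }
}
```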
Step 2: the metrics path to Mimir
Metrics get batched, converted from OTLP to Prometheus format, then remote-written. The converter component is the bridge mentioned earlier.
otelcol.processor.batch "metrics" {
  output {
    metrics = [otelcol.exporter.prometheus.to_mimir.input]
  }
}

// Convert OTLP metrics into prometheus.* series
otelcol.exporter.prometheus "to_mimir" {
  forward_to = [prometheus.remote_write.mimir.receiver]
}

prometheus.remote_write "mimir" {
  endpoint {
    url = "http://mimir:9009/api/v1/push"
    // queue_config tuning omitted for brevity
  }
}
The forward_to argument is how the otelcol-to-prometheus handoff works: the exporter emits Prometheus-native series and forwards them to prometheus.remote_write.mimir.receiver. From here on you are in Prometheus-land, which means standard remote-write queue tuning, relabeling, and WAL semantics apply.
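For reference, the queue tuning left out above lives in a queue_config block inside endpoint. The sketch below shows where it goes; the numbers are illustrative assumptions to adjust against your own backend, and the block would replace the simpler prometheus.remote_write definition rather than sit next to it.

```alloy
prometheus.remote_write "mimir" {
  endpoint {
    url = "http://mimir:9009/api/v1/push"

    queue_config {
      capacity             = 10000 // samples buffered per shard (illustrative)
      max_shards           = 50    // upper bound on parallel senders (illustrative)
      max_samples_per_send = 2000  // samples per request (illustrative)
      batch_send_deadline  = "5s"  // flush a partial batch after this long
    }
  }
}
```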
Step 3: the logs path to Loki
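Logs benefit from resource detection, which attaches host, Kubernetes, or cloud metadata to each record as resource attributes; the env and system detectors used here cover only environment variables and the local host. Then we batch and hand off to a Loki exporter.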
otelcol.processor.resourcedetection "default" {
  detectors = ["env", "system"]
  output {
    logs = [otelcol.processor.batch.logs.input]
  }
}

otelcol.processor.batch "logs" {
  output {
    logs = [otelcol.exporter.loki.default.input]
  }
}

// Bridge OTLP logs into loki.* format
otelcol.exporter.loki "default" {
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}
The same bridge pattern recurs: otelcol.exporter.loki converts OTLP log records into the Loki line format and forwards them to a loki.write component, which owns the actual HTTP push, batching, and backpressure.
Step 4: the traces path with tail sampling
Traces are where processors earn their keep. We batch, then apply tail sampling to keep all errors and slow traces while sampling the rest. Tail sampling is illustrative here — the policies below are a starting point, not a recommendation for your traffic.
otelcol.processor.batch "traces" {
  output {
    traces = [otelcol.processor.tail_sampling.default.input]
  }
}

otelcol.processor.tail_sampling "default" {
  // Illustrative policies: tune to your trace volume
  policy {
    name = "keep-errors"
    type = "status_code"
    status_code { status_codes = ["ERROR"] }
  }
  policy {
    name = "keep-slow"
    type = "latency"
    latency { threshold_ms = 500 }
  }
  policy {
    name = "probabilistic-rest"
    type = "probabilistic"
    probabilistic { sampling_percentage = 10 }
  }
  output {
    traces = [otelcol.exporter.otlphttp.tempo.input]
  }
}

otelcol.exporter.otlphttp "tempo" {
  client { endpoint = "http://tempo:4318" }
}
A real caveat: tail sampling needs to see a complete trace, so all spans of a trace must reach the same Alloy instance. In a clustered deployment that means you either run tail sampling on a single-replica tier or use trace-ID-aware load balancing in front of the sampling tier. Getting this wrong silently drops spans and produces broken traces — a classic gotcha covered in the trade-offs section.
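One way to get trace-ID-aware load balancing is the otelcol.exporter.loadbalancing component, run on the tier in front of the samplers. The sketch below assumes a headless Kubernetes Service named alloy-sampling in an observability namespace and plaintext traffic inside the cluster; both are hypothetical and should be swapped for your own topology.

```alloy
// Front tier: accept OTLP and route by trace ID.
otelcol.receiver.otlp "ingest" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }

  output {
    traces = [otelcol.exporter.loadbalancing.to_samplers.input]
  }
}

otelcol.exporter.loadbalancing "to_samplers" {
  // Spans sharing a trace ID are sent to the same backend instance.
  routing_key = "traceID"

  resolver {
    dns {
      // Hypothetical headless Service resolving to the sampling replicas.
      hostname = "alloy-sampling.observability.svc.cluster.local"
    }
  }

  protocol {
    otlp {
      client {
        tls {
          insecure = true // assumption: no TLS between tiers
        }
      }
    }
  }
}
```

The sampling tier behind it then runs the otelcol.processor.tail_sampling config from this step unchanged, because each replica now sees whole traces.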
Step 5: validate before you ship
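Before you deploy, lint the config and then start Alloy against it in a non-production environment (alloy run launches the real process; it is not a dry run):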
alloy fmt config.alloy
alloy run --stability.level=generally-available config.alloy
alloy fmt catches syntax errors and prints the normalised formatting to stdout (pass -w to rewrite the file in place); alloy run will refuse to start and print the offending component if the graph has an unresolved reference or a cycle. Alloy’s built-in UI on port 12345 (/graph) renders the live component graph, which is the fastest way to confirm your references resolved the way you intended.
Deployment and scaling
Alloy ships as a single static binary with no runtime dependencies, which keeps deployment simple across the three common topologies: bare binary, Kubernetes DaemonSet, and clustered gateway.

A two-tier Kubernetes topology: per-node DaemonSet agents collect locally and forward OTLP to a clustered gateway tier that handles aggregation, sampling, and backend writes.
Binary and systemd
For a VM or edge host, the binary is the whole story. Run it as a systemd unit pointing at your config, and Alloy exposes its UI and metrics on port 12345. This is the right model for the IoT and digital-twin edge cases where you want one lightweight collector per gateway box, scraping local exporters and forwarding upstream. For why the Grafana stack fits edge observability, see our OpenTelemetry vs Prometheus and Loki edge observability ADR.
# Typical systemd-managed invocation
alloy run /etc/alloy/config.alloy \
  --server.http.listen-addr=0.0.0.0:12345 \
  --storage.path=/var/lib/alloy/data
Kubernetes DaemonSet
On Kubernetes, the standard pattern is a DaemonSet so one Alloy pod runs per node, collecting node-local logs and metrics and receiving OTLP from co-located workloads. The official Helm chart (grafana/alloy) supports this; you set the controller type and mount the config from a ConfigMap. A DaemonSet keeps log collection node-local (each agent only tails its own node’s pods via loki.source.kubernetes), which avoids the cross-node traffic of a centralised log scraper.
The two-tier pattern adds a separate gateway Deployment. DaemonSet agents do cheap local collection and forward OTLP to the gateway; the gateway does the expensive work — tail sampling, cardinality reduction, remote-write batching — at a controlled replica count. This isolates per-node resource pressure from aggregation logic and gives you one place to tune backend connections.
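The agent side of that two-tier pattern can be sketched as follows. The gateway address is a hypothetical in-cluster Service name, and plaintext gRPC between tiers is an assumption for brevity.

```alloy
// DaemonSet agent: receive OTLP from workloads on this node.
otelcol.receiver.otlp "local" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }

  output {
    metrics = [otelcol.processor.batch.local.input]
    logs    = [otelcol.processor.batch.local.input]
    traces  = [otelcol.processor.batch.local.input]
  }
}

// Cheap local work only: batch, then hand everything to the gateway.
otelcol.processor.batch "local" {
  output {
    metrics = [otelcol.exporter.otlp.gateway.input]
    logs    = [otelcol.exporter.otlp.gateway.input]
    traces  = [otelcol.exporter.otlp.gateway.input]
  }
}

otelcol.exporter.otlp "gateway" {
  client {
    // Hypothetical Service fronting the gateway Deployment.
    endpoint = "alloy-gateway.observability.svc.cluster.local:4317"

    tls {
      insecure = true // assumption: no TLS between tiers
    }
  }
}
```

The gateway then runs the full walkthrough config, with its own OTLP receiver as the entry point.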
Clustering
Alloy clustering lets a set of replicas coordinate so that scrape targets and certain workloads are sharded across them automatically, rather than every replica doing all the work. You enable it with --cluster.enabled=true and a peer-discovery mechanism; components that opt in (notably prometheus.scrape) then distribute targets across the cluster using consistent hashing. When a replica joins or leaves, targets rebalance.
Clustering is what makes the gateway tier horizontally scalable for metrics scraping. It does not magically solve trace tail sampling, because tail sampling needs whole traces co-located — clustering shards targets, not trace assembly. Keep those two scaling problems separate in your head.
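On the config side, the opt-in is a clustering block on the component. A minimal sketch, assuming the process is already started with --cluster.enabled=true and that the remote-write URL is a placeholder:

```alloy
discovery.kubernetes "pods" {
  role = "pod"
}

prometheus.scrape "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [prometheus.remote_write.mimir.receiver]

  // Opt this component in to target sharding across cluster peers.
  clustering {
    enabled = true
  }
}

prometheus.remote_write "mimir" {
  endpoint {
    url = "http://mimir:9009/api/v1/push" // placeholder URL
  }
}
```

Every replica loads the same config; the cluster decides which replica scrapes which target.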
Trade-offs and what goes wrong
Alloy is not a free upgrade. The honest accounting:
Config portability is gone. This is the biggest adoption cost. Your existing otelcol YAML does not run on Alloy, and an Alloy config does not run on a vanilla Collector. If you value the ability to switch collector distributions freely, Alloy’s syntax ties you to the Grafana ecosystem. The components are upstream code, but the wiring is Alloy-specific syntax. Teams standardised on OTel YAML across heterogeneous vendors should weigh this carefully.
The component model has a learning curve. Engineers fluent in Collector YAML have to relearn where the pipeline lives. The most common early mistake is forgetting that the pipeline is the output/forward_to references, not a declared list — people define components and wonder why no data flows because nothing references them. Alloy’s /graph UI is the antidote; check it first when telemetry goes missing.
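The mistake described above tends to look like this in practice. It is a sketch using the endpoint names from the earlier trace example; the commented-out line shows the broken variant.

```alloy
otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }

  output {
    // Mistake: with an empty list the receiver accepts spans
    // but forwards them nowhere.
    // traces = []

    // Fix: reference the exporter's exported input field.
    traces = [otelcol.exporter.otlphttp.tempo.input]
  }
}

// Defining the exporter is not enough; something must point at its .input.
otelcol.exporter.otlphttp "tempo" {
  client {
    endpoint = "http://tempo:4318"
  }
}
```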
Tail sampling and clustering interact badly if you are naive. As noted, sharding targets across a cluster is fine for scraping but breaks tail sampling, which needs complete traces on one instance. The fix is a dedicated, trace-ID-load-balanced sampling tier — extra topology you have to design for, not a default.
The otelcol-to-prometheus/loki bridges add a conversion hop. Routing OTLP metrics through otelcol.exporter.prometheus into prometheus.remote_write means an OTLP-to-Prometheus translation, which has known edge cases around delta-versus-cumulative temporality and exemplar handling. If you only ever speak OTLP end to end, sending directly with otelcol.exporter.otlphttp to an OTLP-native backend avoids the conversion entirely. Choose the bridge only when your backend is Prometheus-protocol.
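For comparison, an OTLP-only metrics path skips the bridge altogether. This sketch assumes the backend accepts OTLP over HTTP at the placeholder base URL shown; check your backend's documentation for the actual path.

```alloy
otelcol.processor.batch "metrics" {
  output {
    // No otelcol.exporter.prometheus hop: metrics stay OTLP end to end.
    metrics = [otelcol.exporter.otlphttp.metrics_backend.input]
  }
}

otelcol.exporter.otlphttp "metrics_backend" {
  client {
    // Assumed OTLP base URL; the exporter appends the per-signal path.
    endpoint = "http://mimir:9009/otlp"
  }
}
```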
Cardinality still bites. Alloy does not protect you from a badly instrumented service. Resource detection and relabeling can add labels; without limits they inflate cardinality just as fast as any other collector. Budget for prometheus.relabel rules and metric filtering in the gateway tier.
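A prometheus.relabel stage between the bridge exporter and remote write is one place to enforce that budget. In the sketch below the dropped label and the metric-name pattern are hypothetical examples, and the exporter block replaces the Step 2 definition of the same name.

```alloy
// Send bridged metrics through relabeling before remote write.
otelcol.exporter.prometheus "to_mimir" {
  forward_to = [prometheus.relabel.cardinality.receiver]
}

prometheus.relabel "cardinality" {
  forward_to = [prometheus.remote_write.mimir.receiver]

  // Drop a high-churn label (hypothetical label name).
  rule {
    action = "labeldrop"
    regex  = "pod_template_hash"
  }

  // Drop whole series by metric name (hypothetical pattern).
  rule {
    source_labels = ["__name__"]
    regex         = "http_server_debug_.*"
    action        = "drop"
  }
}

prometheus.remote_write "mimir" {
  endpoint {
    url = "http://mimir:9009/api/v1/push"
  }
}
```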
For the kernel-level signals Alloy can ingest but not generate, pair it with eBPF tooling — see our guide on eBPF observability and kernel tracing for APM.
Practical recommendations
A short, opinionated checklist distilled from the above:
- Adopt Alloy if you are already on the Grafana stack (Mimir, Loki, Tempo, Pyroscope). The native prometheus.*, loki.*, and pyroscope.* components plus the unified config make it the path of least resistance. If you are multi-vendor and value OTLP portability, staying on a vanilla otelcol-contrib build may serve you better.
- Use the two-tier DaemonSet-plus-gateway topology for anything beyond a single node. Cheap local collection at the edge, expensive aggregation centrally.
- Isolate tail sampling onto its own tier with trace-ID load balancing. Never let clustering shard your trace traffic.
- Prefer OTLP end to end when your backend speaks it; reach for the Prometheus/Loki conversion bridges only when the backend requires the Prometheus protocol.
- Lint with alloy fmt, smoke-test with alloy run, and verify with the /graph UI before every deploy. The graph view turns “where did my data go” into a thirty-second check.
- Set cardinality limits in the gateway with prometheus.relabel and metric filtering; Alloy will not do it for you.
- Pin a stability level. Use --stability.level=generally-available in production so experimental components cannot silently sneak into a critical pipeline.
Alloy earns its place when you want one binary doing OTLP, Prometheus, Loki, and profiling with a single config and a real component graph behind it. The cost is the Alloy-specific configuration syntax and a model you have to learn. For Grafana-stack shops, that trade is usually worth it.
FAQ
Is Grafana Alloy a replacement for Grafana Agent?
Yes. Alloy is the official successor to Grafana Agent. Agent flow mode was effectively early Alloy, and Grafana declared Agent end of life in November 2025. New deployments should use Alloy, and existing Agent flow-mode configs migrate with minimal changes since they share the component model and configuration syntax. Static-mode Agent users have a larger migration because static mode has no direct Alloy equivalent.
Can Alloy replace the OpenTelemetry Collector?
Functionally yes for most use cases: Alloy is a full Collector distribution, and its otelcol.* components wrap the upstream Collector’s components. The catch is configuration: Alloy uses its own component-graph syntax rather than the Collector’s YAML pipelines, so configs are not portable between them. If you need vendor-neutral, portable Collector config, the upstream binary fits better.
Does Alloy support metrics, logs, and traces in one pipeline?
Yes, and they are wired independently. A single OTLP receiver exposes separate metrics, logs, and traces output lists, and you route each signal to its own processor-and-exporter chain. One Alloy process can simultaneously feed metrics to Mimir, logs to Loki, and traces to Tempo from the same entry point without the three paths interfering.
What configuration language does Grafana Alloy use?
Alloy uses a component-based configuration syntax derived from HCL, originally branded River and now simply called the Alloy configuration syntax. You declare components with arguments and reference their exported fields to build the pipeline graph. It is closer to Terraform than to YAML, and Alloy resolves the reference graph at load time to determine dataflow.
How does Alloy clustering work?
Clustering lets multiple Alloy replicas coordinate and shard work — primarily scrape targets — across the set using consistent hashing. Enable it with --cluster.enabled=true plus peer discovery. Components like prometheus.scrape opt in and distribute targets automatically, rebalancing when replicas join or leave. Note that clustering shards targets, not trace assembly, so it does not by itself make tail sampling scalable.
Should I run Alloy as a DaemonSet or a Deployment on Kubernetes?
Both, in a two-tier pattern. Run a DaemonSet so one agent per node handles local log and metric collection close to the source, and a separate clustered Deployment as a gateway for aggregation, tail sampling, and backend writes. The DaemonSet keeps collection node-local and cheap; the gateway centralises the expensive work at a controlled replica count.
Further reading
- Grafana Alloy documentation — official component reference and configuration syntax guide.
- OpenTelemetry Collector documentation — the upstream model Alloy builds on.
- OpenTelemetry Collector architecture and pipelines
- OpenTelemetry vs Prometheus and Loki edge observability ADR
- eBPF observability and kernel tracing for APM
