Backstage 1.30 IDP Setup: Templates, Catalog, TechDocs Production Guide

Last Updated: May 16, 2026

Most teams that adopt Backstage hit the same wall around month three: the developer-loved demo on a laptop refuses to graduate into a production internal developer platform. The frontend talks to a SQLite database that nukes itself on every pod restart, the scaffolder writes to a single engineer’s personal GitHub token, TechDocs renders from local files that no one else can see, and the “auth” is a dev-mode shim that anyone on the corporate VPN can log into. By Backstage 1.30 the platform finally has all the primitives to run a real, stateful, multi-tenant IDP — but only if you bypass the create-app starter’s defaults and wire each subsystem the way the upstream maintainers actually run it at Spotify. This Backstage IDP production setup 2026 guide walks through that wiring end-to-end: the new backend system, PostgreSQL-backed catalog, GitHub-integrated scaffolder, S3-backed TechDocs, OIDC auth with sign-in resolvers and RBAC, and a Kubernetes plugin that surfaces live workloads. Every snippet is taken from a stack we run in anger — including a working docker-compose.yml you can clone today.

Backstage 1.30: What Changed and Why It Matters for an IDP

If you last looked at Backstage in the 1.18–1.20 era, two things have flipped under your feet, and ignoring either of them in 2026 will saddle you with deprecated plumbing inside a year.

1. The new backend system is now the only backend system. The legacy createServiceBuilder/PluginEnvironment pattern — every plugin in packages/backend/src/plugins/*.ts, hand-wiring logger, database, cache, tokenManager, and friends — was deprecated through the 1.24–1.28 cycle and is fully retired in the 1.30 release line. The replacement is createBackend() from @backstage/backend-defaults, which uses dependency injection (coreServices) and module-style plugin registration. The result is a packages/backend/src/index.ts that is roughly 20 lines instead of 200, with every plugin self-registering its database, scheduler, and auth needs. If your codebase still has a plugins/ directory full of createPlugin(env) factories, you are on borrowed time — migrate before 1.32 drops the compatibility shims entirely.

2. The auth backend has been rewritten around @backstage/plugin-auth-backend-module-* modules and explicit sign-in resolvers. The 1.30 release tightens what was already a hard rule in 1.28: you can no longer rely on the default email-matching resolver in production, because it crashes loudly when a token’s email claim is missing or unverified, and silently grants access to the wrong user when two IdPs share a domain. You must register a resolver explicitly — usually the built-in emailMatchingUserEntityProfileEmail resolver plus a fallback — and you must back it with a Catalog that actually contains your User and Group entities.

Beyond those two, 1.30 ships first-class OpenTelemetry instrumentation through @backstage/backend-defaults, a stable Permission Framework that can finally express resource-typed conditions without custom code, and an updated Kubernetes plugin that supports the KubernetesAuthProvider interface so you can swap from service-account tokens to AWS IAM/EKS, GCP Workload Identity, or aks-aad-token without rewriting the plugin. The combined message is clear: Backstage is no longer a “starter project you fork and customize” — it is a platform you assemble out of versioned modules.

The architecture this lands you in looks like the diagram below: a thin React SPA, a Node 20 backend composed of plugin modules, PostgreSQL as the single source of state, S3 (or compatible) for built TechDocs sites, and Git as the source of truth for catalog YAML and scaffolder templates.

[Diagram: Backstage 1.30 high-level architecture]

Production Architecture: Frontend, Backend, Catalog, Auth, DB

The Backstage docs default to a “one container, two processes, SQLite” layout that is fine for a laptop and dangerous in production. A real production topology has at least five separable concerns: stateless frontend pods, stateless backend pods (which scale horizontally), a stateful PostgreSQL, an object store for TechDocs HTML, and an external identity provider. Let’s break each apart.

[Diagram: Backstage production deployment topology]

Frontend and Backend Pods

The frontend is a static React bundle (yarn workspace app build) served by an Nginx sidecar. Treat it like any other SPA: gzip + brotli, immutable hashed assets, a short-TTL index.html so you can ship a release by replacing one file. The backend is a Node 20 server that exposes both the API (/api/*) and, when needed, the frontend bundle (packages/backend can serve the SPA in single-pod deployments — but in production you should run them as separate Deployments so an OOM in catalog ingestion doesn’t blank the UI).

Run at least three backend replicas behind a Kubernetes Service. The backend is mostly stateless but holds in-memory caches for permission decisions and catalog refresh schedules; below three replicas you lose all rolling-update headroom during deploys. Replication is safe because the built-in scheduler service coordinates through Postgres advisory locks (pg_try_advisory_lock), so exactly one pod runs each scheduled task — catalog refresh, TechDocs sync, group ingestion — at a time.
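Under the hood this is ordinary Postgres advisory locking. A conceptual sketch with the pg driver (illustrative only, not Backstage's actual scheduler internals):

import { Client } from 'pg';

// Run `work` on exactly one replica: pg_try_advisory_lock returns true for
// the first session that asks and false for everyone else, without blocking.
async function runExclusive(lockId: number, work: () => Promise<void>) {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    const { rows } = await client.query(
      'SELECT pg_try_advisory_lock($1) AS locked',
      [lockId],
    );
    if (rows[0].locked) {
      try {
        await work();
      } finally {
        await client.query('SELECT pg_advisory_unlock($1)', [lockId]);
      }
    }
  } finally {
    await client.end();
  }
}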

PostgreSQL: the Real Source of State

Use PostgreSQL 16 (RDS, CloudSQL, Aurora — managed, not self-hosted unless you already have a DBA team). By default Backstage creates one logical database per plugin — backstage_plugin_catalog, backstage_plugin_scaffolder, backstage_plugin_auth, and so on, nine to twelve in a stack like this one (set pluginDivisionMode: schema if you prefer schemas inside a single database). Give it 2 vCPU / 8 GB RAM as a baseline; the heavy table is the catalog’s refresh_state, which can reach tens of thousands of rows with reasonable catalog sizes, and the scaffolder’s tasks table if you do not aggressively prune completed tasks. Run a read replica in a different AZ as your failover target, and back up with PITR enabled.

Provision a dedicated DB user per plugin only if your security model demands it. The official recommendation as of 1.30 is one Backstage role with CREATE privilege on the database; the plugins manage their own schemas via Knex migrations. Mixing per-plugin users complicates migrations without much real benefit.

Auth, Permission, and the Catalog Coupling

This is the part most teams underestimate. Backstage auth resolves an OIDC token claim to a Catalog User entity — not to a database row, not to an LDAP group, not to a JWT scope. If your Catalog does not contain that user, the sign-in fails or, worse, falls back to a guest identity. So your ingestion pipeline has to run before your first user logs in: an OrgEntityProvider against Azure AD, Okta, or LDAP that creates a User entity per employee and a Group entity per team, on a schedule (15-minute refresh is sane).

The Permission backend then evaluates rules like “users in Group platform-team may read any Component entity owned by their group” against that catalog. If you skip Group ingestion you cannot write any meaningful policy — every rule degrades to “allow if signed in”, which is to say no policy at all.
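Expressed in code, that rule is a conditional decision. If you later outgrow the CSV-based RBAC plugin used in Step 5, the TypeScript equivalent looks like this, a sketch using the catalog's condition helpers (the class name is ours, and a custom policy replaces the RBAC plugin's policy rather than supplementing it):

import {
  AuthorizeResult,
  isPermission,
  PolicyDecision,
} from '@backstage/plugin-permission-common';
import {
  PermissionPolicy,
  PolicyQuery,
  PolicyQueryUser,
} from '@backstage/plugin-permission-node';
import { catalogEntityReadPermission } from '@backstage/plugin-catalog-common/alpha';
import {
  catalogConditions,
  createCatalogConditionalDecision,
} from '@backstage/plugin-catalog-backend/alpha';

// "May read Component entities owned by one of your groups": anything else
// falls through to allow-if-signed-in, which is where you started.
export class OwnershipPolicy implements PermissionPolicy {
  async handle(
    request: PolicyQuery,
    user?: PolicyQueryUser,
  ): Promise<PolicyDecision> {
    if (isPermission(request.permission, catalogEntityReadPermission)) {
      return createCatalogConditionalDecision(
        request.permission,
        catalogConditions.isEntityOwner({
          claims: user?.info.ownershipEntityRefs ?? [],
        }),
      );
    }
    return { result: AuthorizeResult.ALLOW };
  }
}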

Step 1 – Bootstrapping with the New Backend System

The new backend system makes the entrypoint almost embarrassingly small. After npx @backstage/create-app@latest (which scaffolds the 1.30 layout by default), here is the complete packages/backend/src/index.ts for a production stack:

// packages/backend/src/index.ts
import { createBackend } from '@backstage/backend-defaults';

const backend = createBackend();

// Core plugins — order does not matter; the backend resolves the DI graph.
backend.add(import('@backstage/plugin-app-backend/alpha'));
backend.add(import('@backstage/plugin-proxy-backend/alpha'));

// Catalog with the GitHub org and entity providers.
backend.add(import('@backstage/plugin-catalog-backend/alpha'));
backend.add(import('@backstage/plugin-catalog-backend-module-github/alpha'));
backend.add(import('@backstage/plugin-catalog-backend-module-github-org/alpha'));
backend.add(import('@backstage/plugin-catalog-backend-module-scaffolder-entity-model'));

// Auth: OIDC with our resolver.
backend.add(import('@backstage/plugin-auth-backend'));
backend.add(import('@backstage/plugin-auth-backend-module-oidc-provider'));
backend.add(import('@backstage/plugin-auth-backend-module-github-provider'));
backend.add(import('./modules/auth/oidcSignInResolver')); // local module

// Permissions and RBAC.
backend.add(import('@backstage/plugin-permission-backend/alpha'));
backend.add(import('@backstage-community/plugin-rbac-backend'));

// Scaffolder + actions.
backend.add(import('@backstage/plugin-scaffolder-backend/alpha'));
backend.add(import('@backstage/plugin-scaffolder-backend-module-github'));
backend.add(import('@backstage/plugin-scaffolder-backend-module-cookiecutter'));

// TechDocs with S3 publisher, external builder.
backend.add(import('@backstage/plugin-techdocs-backend/alpha'));

// Kubernetes plugin.
backend.add(import('@backstage/plugin-kubernetes-backend/alpha'));

// Search across catalog + TechDocs.
backend.add(import('@backstage/plugin-search-backend/alpha'));
backend.add(import('@backstage/plugin-search-backend-module-catalog/alpha'));
backend.add(import('@backstage/plugin-search-backend-module-techdocs/alpha'));
backend.add(import('@backstage/plugin-search-backend-module-pg/alpha'));

backend.start();

Why this works. Each backend.add(import(...)) registers a module, not an instance. The DI container resolves coreServices.logger, coreServices.database, coreServices.scheduler, etc. lazily, so plugins can declare their needs without you wiring them. What fails if you skip the new pattern: every legacy plugins/*.ts file you keep has to import @backstage/backend-common (deprecated), and starting in 1.32 those imports will throw at boot.

The companion package.json for packages/backend contains the install list — keep it minimal, keep every module imported in index.ts represented here, and install all @backstage/* packages from the same release line (core plugins pin to 1.30.x; module packages version independently) or you will hit type-mismatch errors at runtime:

{
  "dependencies": {
    "@backstage/backend-defaults": "^1.30.0",
    "@backstage/plugin-app-backend": "^1.30.0",
    "@backstage/plugin-auth-backend": "^1.30.0",
    "@backstage/plugin-auth-backend-module-oidc-provider": "^0.4.0",
    "@backstage/plugin-auth-backend-module-github-provider": "^0.3.0",
    "@backstage/plugin-catalog-backend": "^1.30.0",
    "@backstage/plugin-catalog-backend-module-github": "^0.10.0",
    "@backstage/plugin-catalog-backend-module-github-org": "^0.5.0",
    "@backstage/plugin-permission-backend": "^0.10.0",
    "@backstage-community/plugin-rbac-backend": "^4.0.0",
    "@backstage/plugin-scaffolder-backend": "^1.30.0",
    "@backstage/plugin-scaffolder-backend-module-github": "^0.7.0",
    "@backstage/plugin-techdocs-backend": "^1.30.0",
    "@backstage/plugin-kubernetes-backend": "^0.20.0",
    "@backstage/plugin-search-backend": "^1.10.0",
    "@backstage/plugin-search-backend-module-catalog": "^0.5.0",
    "@backstage/plugin-search-backend-module-techdocs": "^0.5.0",
    "@backstage/plugin-search-backend-module-pg": "^0.7.0",
    "pg": "^8.11.0"
  }
}

Boot it locally with yarn workspace backend start, and you should see one PG migration log per plugin followed by Listening on :7007. If you see database service "default" used with no host info, your app-config.yaml is being ignored — check NODE_ENV and the --config flag.

Step 2 – Software Catalog: Entity Kinds, YAML, Ingestion

The Catalog is the heart of any IDP. Everything else — scaffolder, TechDocs, K8s view, ownership, permissions — pivots on whether your catalog is accurate, freshly ingested, and consistent. You will care about five core kinds — Component, API, System, Domain, and Resource — plus the org graph (User, Group). The diagram below shows how they compose.

[Diagram: Software catalog entity model]

A real-world entity file for the checkout-api service looks like this — committed as catalog-info.yaml to the same repo as the code:

# catalog-info.yaml at the root of the checkout-api repo
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: checkout-api
  description: REST API for the customer checkout flow
  annotations:
    github.com/project-slug: acme/checkout-api
    backstage.io/techdocs-ref: dir:.
    backstage.io/kubernetes-id: checkout-api
    backstage.io/kubernetes-namespace: payments
    pagerduty.com/integration-key: ${PD_KEY_CHECKOUT}
    sonarqube.org/project-key: acme:checkout-api
  tags:
    - java
    - kafka
    - tier-1
  links:
    - url: https://grafana.example.com/d/checkout/checkout-overview
      title: Grafana - Checkout Overview
      icon: dashboard
spec:
  type: service
  lifecycle: production
  owner: group:default/payments-team
  system: checkout
  providesApis:
    - checkout-rest-v2
  consumesApis:
    - fraud-grpc
  dependsOn:
    - resource:default/checkout-pg-db
    - component:default/shared-lib
---
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: checkout-rest-v2
spec:
  type: openapi
  lifecycle: production
  owner: group:default/payments-team
  system: checkout
  definition:
    $text: ./openapi.yaml

Why this works. The github.com/project-slug annotation lets the GitHub plugin show PRs and releases. The backstage.io/techdocs-ref ties the entity to its docs directory. The kubernetes-id annotation is what the K8s plugin uses to filter pods. The $text reference inlines the OpenAPI file at ingestion time, so the API Explorer renders a live spec without you maintaining two copies. What fails if you skip annotations: every plugin tab on the entity page either disappears or shows “no data” — annotations are the wiring.
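On the frontend, those annotations are exactly what each entity-page tab tests for before rendering. The guard is typically a one-line predicate, as in this sketch (the helper name is ours; the usage mirrors the stock EntityPage.tsx pattern):

import { Entity } from '@backstage/catalog-model';

// Used as <EntitySwitch.Case if={isKubernetesAvailable}> in EntityPage.tsx:
// no backstage.io/kubernetes-id annotation, no Kubernetes tab.
export const isKubernetesAvailable = (entity: Entity): boolean =>
  Boolean(entity.metadata.annotations?.['backstage.io/kubernetes-id']);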

Configure ingestion in app-config.yaml:

catalog:
  rules:
    - allow: [Component, System, API, Resource, Location, Domain, User, Group, Template]
  providers:
    githubOrg:
      production:
        id: acme-org
        githubUrl: https://github.com
        orgs: ['acme']
        schedule:
          frequency: { minutes: 15 }
          timeout: { minutes: 10 }
    github:
      production:
        organization: 'acme'
        catalogPath: '/catalog-info.yaml'
        filters:
          branch: 'main'
          repository: '.*'
        schedule:
          frequency: { minutes: 5 }
          timeout: { minutes: 3 }
  locations: []   # everything comes from providers; no static URLs

The githubOrg provider walks your GitHub organization and creates User + Group entities from members and teams. The github provider scans every repo on main for catalog-info.yaml and ingests whatever it finds. Refresh on a 5-minute cadence for component discovery and 15 minutes for org metadata — anything tighter and you hit GitHub’s secondary rate limits at scale.

The trap most teams fall into: they enable discovery without enforcing schema validation on the YAML. Add catalog.rules.allow to whitelist kinds, run a CI check (yarn backstage-cli repo lint plus a custom YAML schema validator) on every PR that touches a catalog-info.yaml, and refuse to merge if validation fails. Without it, one developer fat-fingering kind: componnent will leak a dangling entity into the catalog and nothing will ever clean it up.
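A minimal sketch of that CI-side validator, assuming js-yaml and a file list passed in from the diff (e.g. git diff --name-only origin/main -- '**/catalog-info.yaml'):

// scripts/validate-catalog-info.ts
import { readFileSync } from 'fs';
import { loadAll } from 'js-yaml';

const ALLOWED_KINDS = new Set([
  'Component', 'System', 'API', 'Resource', 'Location',
  'Domain', 'User', 'Group', 'Template',
]);

let failed = false;
for (const file of process.argv.slice(2)) {
  // catalog-info.yaml may hold several documents separated by ---
  for (const doc of loadAll(readFileSync(file, 'utf8')) as any[]) {
    if (!doc) continue;
    if (!ALLOWED_KINDS.has(doc.kind)) {
      console.error(`${file}: unknown kind "${doc.kind}" (typo?)`);
      failed = true;
    }
    if (!/^[a-z0-9-]+$/.test(doc.metadata?.name ?? '')) {
      console.error(`${file}: metadata.name missing or not lowercase-hyphenated`);
      failed = true;
    }
  }
}
process.exit(failed ? 1 : 0);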

Step 3 – Scaffolder Templates: Self-Serve Repository Creation

The Scaffolder is where Backstage stops being a documentation site and starts being a platform. A template takes a form submission from an engineer and runs an ordered list of actions — generate code from a Cookiecutter, push to a new GitHub repo, register the result in the Catalog, optionally open a PR against an infra repo. Here is a working template that creates a TypeScript service, sets up CI, and registers it:

# templates/typescript-service/template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: typescript-service
  title: TypeScript Service
  description: Scaffold a TypeScript microservice with Dockerfile, CI, and catalog entry
  tags: [typescript, recommended]
spec:
  owner: group:default/platform-team
  type: service

  parameters:
    - title: Service basics
      required: [name, description, owner]
      properties:
        name:
          title: Name
          type: string
          pattern: '^[a-z0-9-]+$'
          ui:autofocus: true
          ui:help: 'Lowercase, hyphenated. Used for repo + image + namespace.'
        description:
          title: Description
          type: string
        owner:
          title: Owner
          type: string
          ui:field: OwnerPicker
          ui:options:
            allowedKinds: [Group]
        system:
          title: System
          type: string
          ui:field: EntityPicker
          ui:options:
            catalogFilter:
              kind: System

    - title: Repository
      required: [repoUrl]
      properties:
        repoUrl:
          title: Repository Location
          type: string
          ui:field: RepoUrlPicker
          ui:options:
            allowedHosts: [github.com]
            allowedOwners: [acme]

  steps:
    - id: fetch
      name: Fetch skeleton
      action: fetch:cookiecutter
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          description: ${{ parameters.description }}
          owner: ${{ parameters.owner }}
          system: ${{ parameters.system }}

    - id: publish
      name: Publish to GitHub
      action: publish:github
      input:
        repoUrl: ${{ parameters.repoUrl }}
        description: ${{ parameters.description }}
        defaultBranch: main
        repoVisibility: private
        requireCodeOwnerReviews: true
        protectDefaultBranch: true
        deleteBranchOnMerge: true
        topics: ['typescript', 'service', 'tier-2']

    - id: register
      name: Register in Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: '/catalog-info.yaml'

    - id: open-infra-pr
      name: Open PR against infra repo
      action: publish:github:pull-request
      input:
        repoUrl: github.com?repo=infra&owner=acme
        branchName: add-${{ parameters.name }}
        title: 'Add ${{ parameters.name }} to fleet manifests'
        description: 'Generated by Backstage scaffolder for ${{ parameters.name }}.'
        sourcePath: ./infra-overlay
        targetPath: ./services/${{ parameters.name }}

  output:
    links:
      - title: Repository
        url: ${{ steps.publish.output.remoteUrl }}
      - title: Open in Catalog
        icon: catalog
        entityRef: ${{ steps.register.output.entityRef }}
      - title: Infrastructure PR
        url: ${{ steps['open-infra-pr'].output.pullRequestUrl }}

Why this works. The four steps run strictly in order: publish consumes the workspace that fetch produced, register consumes publish’s output (it needs the repo URL), and open-infra-pr goes last so the repository exists before the infra PR references it. The RepoUrlPicker and OwnerPicker widgets enforce that the engineer can only pick legal values — allowedHosts: [github.com] and allowedKinds: [Group] are not cosmetic; they are guardrails that prevent the scaffolder from being abused to write to arbitrary destinations. What fails if you skip the pickers: a curious engineer types github.com/acme-evil/private-fork, gets a 403 from the GitHub token, and you spend a Friday triaging “scaffolder is broken”. (Note the bracket syntax in the output block: hyphenated step ids like open-infra-pr must be accessed as steps['open-infra-pr'], because a bare hyphen parses as subtraction in the templating expressions.)
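When the published action modules do not cover a step, custom actions are small TypeScript factories. A sketch (the acme:announce id and the webhook idea are illustrative, not part of the stack above):

import { createTemplateAction } from '@backstage/plugin-scaffolder-node';

export const createAnnounceAction = () =>
  createTemplateAction<{ serviceName: string }>({
    id: 'acme:announce',
    schema: {
      input: {
        type: 'object',
        required: ['serviceName'],
        properties: {
          serviceName: { type: 'string', description: 'Name of the new service' },
        },
      },
    },
    async handler(ctx) {
      // ctx carries a logger, the task workspace path, and the typed input.
      ctx.logger.info(`Announcing ${ctx.input.serviceName} to the platform channel`);
      // e.g. POST to a chat webhook here; keep actions small and idempotent.
    },
  });

Register it on the scaffolder through the scaffolderActionsExtensionPoint in a local backend module, added with the same backend.add(...) line as everything else.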

Register the template by adding a Location entity that points at the template.yaml, or — and this is the cleaner way at scale — ingest the entire templates/ directory from a dedicated acme/backstage-templates repo via the same GitHub catalog provider. Templates are entities; treat them like every other source-controlled artifact.

Configure scaffolder behavior in app-config.yaml:

scaffolder:
  defaultAuthor:
    name: Backstage Bot
    email: backstage-bot@acme.com
  defaultCommitMessage: 'feat: scaffolded by backstage'
  concurrentTasksLimit: 5
  taskTimeoutJanitorFrequency: { minutes: 30 }
  taskWorkers: 3

concurrentTasksLimit: 5 and taskWorkers: 3 matter — without them a single buggy template can spin up dozens of concurrent jobs and exhaust your GitHub API quota across the org.

Step 4 – TechDocs with MkDocs and S3 Storage Backend

TechDocs is Backstage’s docs-as-code system: engineers write Markdown next to their code, MkDocs builds it, Backstage renders it. The defaults serve docs by building on-demand inside the backend pod, which is fine for ten components and falls over at fifty. The production pattern is external build, S3 publish, Backstage as read-only renderer — sometimes called “external builder, awsS3 publisher”. The diagram below traces the pipeline.

[Diagram: TechDocs build and serve pipeline]

Configure it in app-config.yaml:

techdocs:
  builder: 'external'
  generator:
    runIn: 'local'
  publisher:
    type: 'awsS3'
    awsS3:
      bucketName: 'acme-backstage-techdocs'
      region: 'us-east-1'
      credentials:
        accessKeyId: ${AWS_ACCESS_KEY_ID}
        secretAccessKey: ${AWS_SECRET_ACCESS_KEY}
      bucketRootPath: 'production/'

Why this works. builder: 'external' tells the backend “do not build docs in this pod — assume they are already in S3”. The CI side does the building: every push to main in a component repo triggers a GitHub Actions job that runs npx @techdocs/cli generate --source-dir . --output-dir site followed by npx @techdocs/cli publish --publisher-type awsS3 --storage-name acme-backstage-techdocs --entity default/Component/checkout-api. The CLI writes both the HTML and the techdocs_metadata.json file under the conventional prefix <namespace>/<kind>/<name>/. What fails if you skip the external builder: every docs view triggers an in-pod build, which spins up MkDocs, downloads Python deps, and serializes per-component-id behind a single Node process — at scale, the backend pod CPU pegs at 100 percent and the catalog refresh starts missing its schedule.

A minimal docs/mkdocs.yml in each repo:

site_name: 'checkout-api'
nav:
  - Home: index.md
  - API Reference: api.md
  - Runbook: runbook.md
  - ADRs:
      - 0001 Choose Kafka: adr/0001-kafka.md
plugins:
  - techdocs-core   # mkdocs-techdocs-core Python package; pip install it when building with --no-docker
markdown_extensions:
  - admonition
  - pymdownx.superfences
  - pymdownx.tabbed

The CI job (GitHub Actions excerpt):

- name: Build TechDocs
  run: npx @techdocs/cli generate --source-dir . --output-dir site --no-docker
- name: Publish to S3
  env:
    AWS_ACCESS_KEY_ID:     ${{ secrets.TECHDOCS_AWS_KEY }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.TECHDOCS_AWS_SECRET }}
  run: |
    npx @techdocs/cli publish \
      --publisher-type awsS3 \
      --storage-name acme-backstage-techdocs \
      --entity default/Component/${{ github.event.repository.name }}

For very large fleets, put CloudFront in front of the bucket with signed URLs and a 5-minute TTL — Backstage will fetch HTML through the backend (so auth still works), but each engineer’s browser caches the static assets globally.

Step 5 – Auth, RBAC, and Kubernetes Plugin

The last subsystem ties everything together. The auth flow has to do four things: (1) authenticate the user against your IdP, (2) resolve the token to a Catalog User, (3) issue a session, and (4) gate every subsequent API call through the Permission backend. The sequence is below.

[Diagram: Auth and RBAC flow]

The OIDC provider configuration in app-config.yaml:

auth:
  environment: production
  session:
    secret: ${SESSION_SECRET}
  providers:
    oidc:
      production:
        metadataUrl: https://acme.okta.com/.well-known/openid-configuration
        clientId: ${OKTA_CLIENT_ID}
        clientSecret: ${OKTA_CLIENT_SECRET}
        prompt: auto
        scope: 'openid profile email groups'
        signIn:
          resolvers:
            - resolver: emailMatchingUserEntityProfileEmail
            - resolver: emailLocalPartMatchingUserEntityName

The two-resolver list is the safety net: try strict email match first, fall back to local-part match (alice@acme.com -> User:default/alice) only if the first misses. Anything that misses both errors out — no guest fallthrough.
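When the built-in resolvers cannot express your mapping, you register the provider with a custom resolver instead; that is what the local ./modules/auth/oidcSignInResolver module from Step 1 is for. A sketch, assuming your User entities carry spec.profile.email (a custom resolver replaces the resolvers list in config rather than supplementing it):

// packages/backend/src/modules/auth/oidcSignInResolver.ts
import { createBackendModule } from '@backstage/backend-plugin-api';
import {
  authProvidersExtensionPoint,
  createOAuthProviderFactory,
} from '@backstage/plugin-auth-node';
import { oidcAuthenticator } from '@backstage/plugin-auth-backend-module-oidc-provider';

export default createBackendModule({
  pluginId: 'auth',
  moduleId: 'oidc-sign-in-resolver',
  register(reg) {
    reg.registerInit({
      deps: { providers: authProvidersExtensionPoint },
      async init({ providers }) {
        providers.registerProvider({
          providerId: 'oidc',
          factory: createOAuthProviderFactory({
            authenticator: oidcAuthenticator,
            async signInResolver(info, ctx) {
              const email = info.profile.email;
              if (!email) {
                // Fail loudly; never fall through to a guest identity.
                throw new Error('OIDC profile contained no email claim');
              }
              return ctx.signInWithCatalogUser({
                filter: { 'spec.profile.email': email },
              });
            },
          }),
        });
      },
    });
  },
});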

Permissions live in a policy file loaded by @backstage-community/plugin-rbac-backend:

permission:
  enabled: true
  rbac:
    policies-csv-file: /etc/backstage/rbac-policy.csv
    policyFileReload: true
    admin:
      users:
        - name: user:default/alice
      superUsers:
        - name: user:default/platform-admin

And rbac-policy.csv:

p, role:default/platform-team, catalog.entity.read, read, allow
p, role:default/platform-team, catalog.entity.create, create, allow
p, role:default/platform-team, kubernetes.proxy, use, allow
p, role:default/payments-team, scaffolder.template.parameter.read, read, allow
p, role:default/payments-team, scaffolder.action.execute, use, allow
g, group:default/platform-team, role:default/platform-team
g, group:default/payments-team, role:default/payments-team

Finally, the Kubernetes plugin reads pods, services, and deployments per entity. Configure it with one cluster entry per cluster — and prefer AWS IAM / GCP workload identity over long-lived service-account tokens:

kubernetes:
  serviceLocatorMethod: { type: 'multiTenant' }
  clusterLocatorMethods:
    - type: 'config'
      clusters:
        - name: prod-us-east-1
          url: https://EXAMPLE.eks.us-east-1.amazonaws.com
          authProvider: 'aws'
          skipTLSVerify: false
          skipMetricsLookup: false
        - name: prod-eu-west-1
          url: https://EXAMPLE.eks.eu-west-1.amazonaws.com
          authProvider: 'aws'

Why this works. The backstage.io/kubernetes-id annotation on the Component entity becomes a label selector: by default the plugin looks for a backstage.io/kubernetes-id label with the same value on the workloads, queries each cluster with the engineer-scoped IAM role, and returns only matching resources. What fails if you skip the IAM auth: you end up baking a long-lived cluster-admin SA token into Backstage, which then becomes the most valuable credential in your fleet.

Production Patterns: HA, Observability, Cost

A trio of things that the docs underplay.

High availability. Three or more backend replicas, anti-affinity by node, PodDisruptionBudget with maxUnavailable: 1. Postgres in a managed multi-AZ deployment. Run scheduled jobs only on one pod via advisory locks — the backend does this for you, but if you write a custom plugin, use coreServices.scheduler rather than setInterval. A setInterval in a custom plugin running on three pods will ingest your GitHub org three times concurrently and either rate-limit or duplicate-write entities.
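A minimal sketch of the correct shape for such a custom plugin, using coreServices.scheduler (the plugin and task ids are illustrative):

import {
  coreServices,
  createBackendPlugin,
} from '@backstage/backend-plugin-api';

export default createBackendPlugin({
  pluginId: 'org-sync',
  register(env) {
    env.registerInit({
      deps: {
        logger: coreServices.logger,
        scheduler: coreServices.scheduler,
      },
      async init({ logger, scheduler }) {
        // Backed by Postgres advisory locks: exactly one replica runs each
        // tick; the others skip it. No duplicate ingestion, no rate limits.
        await scheduler.scheduleTask({
          id: 'org-sync-refresh',
          frequency: { minutes: 15 },
          timeout: { minutes: 5 },
          fn: async () => {
            logger.info('org sync tick (exactly one pod executes this)');
          },
        });
      },
    });
  },
});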

Observability. Wire the @backstage/backend-defaults OpenTelemetry exporter at boot, point it at an OTel Collector, and ship traces to Tempo or Honeycomb. The most valuable spans are catalog.refresh, scaffolder.task, and auth.signIn — slowness in any of them is an outage in slow motion. Track these RED-style metrics: catalog refresh duration p95, scaffolder task duration p99, auth.signIn error rate. Alert on auth error rate > 1 percent over five minutes — it usually means your IdP changed a claim and your resolver is silently failing.
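The boot-time wiring is a few lines with the stock OpenTelemetry Node SDK, loaded before the backend starts (a sketch; the collector endpoint is an assumption to override per environment):

// packages/backend/src/instrumentation.ts; load via node --require before index.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
  serviceName: 'backstage-backend',
  traceExporter: new OTLPTraceExporter({
    // Assumed in-cluster collector address; override per environment.
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT ??
      'http://otel-collector:4318/v1/traces',
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();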

Cost. A reasonable IDP for an organization with 500 engineers and 2000 components runs on roughly: 3x backend pods (1 vCPU, 2 GB) + 2x frontend pods (0.5 vCPU, 256 MB) + RDS Postgres db.t4g.medium (2 vCPU, 4 GB) + an S3 bucket with < 10 GB of TechDocs + an OIDC IdP you already pay for. That is on the order of 200 to 400 USD per month, before CloudFront. Treat anything more expensive as a smell — usually it means catalog refresh has a runaway loop or TechDocs is being built in-pod.

Docker-compose for local + staging. The same shapes survive a smaller deployment. A working docker-compose.yml for an internal staging environment (and the file we use during onboarding to let new platform engineers tinker safely) looks like this:

version: '3.9'
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: backstage
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: backstage
    volumes: ['pgdata:/var/lib/postgresql/data']
    healthcheck:
      test: ['CMD', 'pg_isready', '-U', 'backstage']
      interval: 10s
      retries: 5

  minio:
    image: minio/minio:latest
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: backstage
      MINIO_ROOT_PASSWORD: ${MINIO_PASSWORD}
    volumes: ['miniodata:/data']
    ports: ['9000:9000', '9001:9001']

  backstage:
    image: ghcr.io/acme/backstage:1.30.0
    depends_on:
      postgres:  { condition: service_healthy }
      minio:     { condition: service_started }
    environment:
      POSTGRES_HOST: postgres
      POSTGRES_PORT: '5432'
      POSTGRES_USER: backstage
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      AWS_ACCESS_KEY_ID: backstage
      AWS_SECRET_ACCESS_KEY: ${MINIO_PASSWORD}
      AWS_ENDPOINT_URL_S3: http://minio:9000
      AWS_S3_FORCE_PATH_STYLE: 'true'
      OKTA_CLIENT_ID: ${OKTA_CLIENT_ID}
      OKTA_CLIENT_SECRET: ${OKTA_CLIENT_SECRET}
      SESSION_SECRET: ${SESSION_SECRET}
    ports: ['7007:7007']

volumes:
  pgdata: {}
  miniodata: {}

MinIO stands in for S3 in this stack — the AWS_ENDPOINT_URL_S3 override means the AWS SDK in the TechDocs publisher routes everything to MinIO with zero code change. Treat this as a staging artifact, not a production one: it gives the same shape (Postgres + object store + Backstage) so issues you find here reproduce in real environments, but it does not give you HA, anti-affinity, or replicated state. Migrate to the Kubernetes deployment described earlier the moment more than one team starts using it.

If you are extending this further into the cluster, see our companion guides on ArgoCD vs Flux for industrial GitOps fleets and on CNI choice for Kubernetes at the edge (Calico, Cilium, Flannel, Multus), both of which describe the substrate Backstage’s Kubernetes plugin will end up surfacing. For the observability pipeline specifically, the patterns in our OpenTelemetry Collector architecture guide drop in cleanly.

Trade-offs and Anti-patterns

Trade-off: monorepo vs polyrepo for catalog-info.yaml. Putting every entity in a single acme/catalog repo gives you one PR review point and easy bulk editing, at the cost of a constant merge-conflict tax. Co-locating each catalog-info.yaml with its code keeps ownership clean but requires the GitHub provider to scan every repo. We recommend co-location for components and a central repo only for Domain, System, and shared Group definitions.

Anti-pattern: one giant scaffolder template. Engineers want a single “create my whole service” wizard, and it is tempting to build one. Don’t. Templates that do more than 10 steps become unreviewable, untestable, and impossible to evolve. Compose: a typescript-service template that scaffolds the repo, a separate add-postgres-database template that opens an infra PR, a third create-grafana-dashboard template. Engineers chain them.

Anti-pattern: skipping the User/Group ingestion. Discussed above but worth restating — without User and Group entities, every permission rule degrades to “any signed-in user”. You will discover this the day an intern accidentally deletes a System entity.

Anti-pattern: storing GitHub PATs in app-config.yaml. Use a GitHub App. PATs are tied to a person; the day that person leaves, your scaffolder breaks. A GitHub App is owned by the org, scoped per-repo, and supports rotating secrets via your secret manager. For deeper service-mesh patterns at the network layer (which often sit underneath an IDP-managed fleet), our Cilium eBPF service mesh write-up covers identity-aware connectivity that complements Backstage’s K8s plugin. And if you are stretching Backstage to expose ML/IoT workloads, the federated learning IoT architecture guide shows the kinds of Component entities and ownership boundaries that work well.

FAQ

Q1. Do I need to migrate to the new backend system if my Backstage 1.28 deployment still works?
Yes. The legacy createServiceBuilder path is fully removed in 1.32, and most plugin modules published after April 2026 only ship exports compatible with createBackend(). Plan a one-sprint migration: move each plugins/*.ts to a single backend.add(import(...)) line in the new index.ts, then delete the plugins/ directory. The migration is mechanical and well documented at backstage.io.
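The mechanical shape of that migration for a single plugin, sketched (the legacy body is representative of the old pattern, not lifted from any particular repo):

import { createBackend } from '@backstage/backend-defaults';

// Before (<=1.29), packages/backend/src/plugins/proxy.ts hand-wired everything:
//
//   export default async function createPlugin(env: PluginEnvironment): Promise<Router> {
//     return await createRouter({ logger: env.logger, config: env.config });
//   }
//
// After (1.30), the same plugin is a single registration; delete plugins/proxy.ts.
const backend = createBackend();
backend.add(import('@backstage/plugin-proxy-backend/alpha'));
backend.start();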

Q2. Can I run Backstage without PostgreSQL — just SQLite — in production?
No. SQLite is single-writer; with two backend replicas you get write-lock contention and the scheduler service cannot acquire advisory locks. The Backstage maintainers explicitly warn against SQLite in any multi-replica deployment, which means any HA deployment. PostgreSQL (14 or newer) is the only production-supported database; this guide standardizes on 16.

Q3. How do I prevent the catalog from filling up with stale entities when a repo is deleted?
Set catalog.orphanStrategy: delete in app-config.yaml. By default the catalog only marks entities as orphaned and leaves them visible. With delete, an entity whose source Location disappears (because the repo was deleted or the YAML was removed) is removed from the catalog on the next refresh. Pair this with the strict ingestion rules from Step 2.

Q4. How does Backstage’s Permission framework compare to OPA/Gatekeeper?
They solve different problems. OPA evaluates Kubernetes admission decisions on cluster-side data. The Backstage Permission framework evaluates Backstage UI and API decisions on Catalog data — “can Alice view this Component”, “can Bob run this Template”. You usually want both. The Permission policies are evaluated server-side in the backend and can be expressed in CSV (RBAC) or TypeScript for resource-typed conditions (e.g., “allow read if spec.lifecycle == 'experimental'”).

Q5. Can Backstage be the source of truth for ownership, or should the catalog mirror an external system?
Mirror, don’t own. Treat your HRIS/IdP as the source of truth for User and Group entities, ingested via OrgEntityProvider. Let engineers own Component and System entities in Git, where changes go through PR review.
