Catena-X Automotive Data Space: Reference Architecture

A practical automotive data space architecture is no longer a research curiosity. It is the operating model that Volkswagen, BMW, Mercedes-Benz, BASF, Bosch, and several hundred suppliers now use to move carbon-footprint figures, traceability records, and quality data across company boundaries — without anyone surrendering control of their data. Catena-X is the first production data space to reach scale in manufacturing, and its design choices are quietly becoming the template for the broader Manufacturing-X movement.

The hard problem it solves is not data transport. SFTP and APIs have moved files between OEMs and suppliers for decades. The hard problem is sovereignty: how do you share a dataset, attach machine-enforceable usage rules to it, prove who you are without a central broker, and revoke access later — all while every participant runs their own software stack? Catena-X answers this with a federated, peer-to-peer model built on the Eclipse Dataspace Connector.

What this covers: the data-space concept and the Gaia-X/IDS lineage, the Catena-X reference architecture and its EDC connector internals, the core use cases, a step-by-step contract-negotiation walkthrough, and the trade-offs that bite teams in production.

Context: what a data space actually is

A data space is a federated network where participants exchange data under shared rules, retain control of their own data, and trust each other through verifiable identity rather than a central platform. There is no data lake in the middle. There is no hyperscaler holding everyone’s records. Each participant keeps data in their own systems and exposes it selectively through a connector that enforces policy at the point of exchange.

This concept did not emerge from automotive. It comes from the International Data Spaces Association (IDSA), which published the IDS Reference Architecture Model defining sovereignty, the connector pattern, and usage control. Gaia-X added a European federation layer: trust frameworks, compliance rules, and the policy that data infrastructure should not lock participants into a single cloud. Catena-X is the automotive vertical built on these foundations, and Manufacturing-X is the effort to generalize the same playbook across chemicals, aerospace, and discrete manufacturing.

Why does automotive need this badly? The industry runs on a digital thread that spans a dozen tiers — raw-material producers, chemical suppliers, cell manufacturers, module integrators, Tier-1s, and OEMs. Regulations like the EU Battery Regulation and the Carbon Border Adjustment Mechanism now demand cradle-to-grave data that no single company holds. A Tier-3 chemical input contributes to a battery cell’s product carbon footprint, which rolls up into the OEM’s vehicle-level number. Getting that figure today means hundreds of bilateral spreadsheet exchanges with no provenance and no usage control. A data space replaces that with a standardized, sovereign, auditable exchange.

The sovereignty guarantee rests on three pillars. First, decentralized identity — every participant holds verifiable credentials issued by trusted authorities, so identity is proven cryptographically, not by a login to someone’s portal. Second, usage policies travel with the data and are enforced by the connector. Third, contracts are negotiated machine-to-machine before any bytes move. These three ideas drive the entire reference architecture below.

The Catena-X reference architecture

The reference architecture is layered. At the top sits governance — the Catena-X Association defines standards, certification, and the rulebook every participant signs. Below that are federation services that make the network discoverable and trustworthy: identity and trust, discovery and registry, and verifiable-credential issuance. The participant layer is where OEMs, suppliers, and service providers operate their own connectors. At the bottom is the sovereign data-exchange layer, where the Eclipse Dataspace Connector and digital-twin representations actually move data.

Figure 1: The automotive data space architecture is a four-layer stack — governance and standards on top, federation services for identity and discovery, a participant layer of OEM and supplier connectors, and a sovereign data-exchange plane built on the Eclipse Dataspace Connector and AAS digital twins.

What makes this design federated rather than centralized is the absence of any data-bearing hub. The federation services hold no business data — they hold only the trust anchors and pointers needed to find and verify participants. All payloads flow connector-to-connector. That distinction is the whole point: it means no single operator can read, monetize, or hold hostage the data flowing through the network. It also means the architecture has no single point of failure for data, though the federation services do remain a coordination dependency.

The connector: Eclipse Dataspace Connector internals

The connector is the workhorse. Catena-X standardized on the Eclipse Dataspace Connector (EDC, a project since renamed Eclipse Dataspace Components), an open-source implementation of the Dataspace Protocol. The EDC splits cleanly into a control plane and a data plane, and understanding that split is essential to operating it well.

Figure 2: The EDC separates a control plane — catalog, contract negotiation, policy engine, and transfer-process manager — from a data plane that streams the actual payload, with a decentralized identity layer underpinning both.

The control plane handles metadata, negotiation, and policy. The catalog service publishes the offers a participant is willing to make: each offer is a dataset plus the policy that governs it. The contract-negotiation component runs a stateful handshake between two connectors until both agree on terms. The policy engine evaluates access and usage rules — for example, “only Tier-1 suppliers with a valid membership credential may pull this PCF dataset, and only for sustainability reporting.” The transfer-process manager coordinates the actual movement once a contract is signed.
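To make the policy engine's job concrete, here is a minimal Python sketch of evaluating the example rule quoted above. It is not the EDC's API or its policy language: the `Constraint`, `Offer`, and `evaluate` names, the claim keys, and the dataset identifier are all hypothetical, and real connectors express policies in an ODRL-based vocabulary with far richer operators than plain equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """One policy clause: the claim named `left_operand` must equal `right_operand`."""
    left_operand: str
    right_operand: str


@dataclass(frozen=True)
class Offer:
    """A catalog offer: a dataset identifier plus the constraints that govern it."""
    dataset_id: str
    constraints: tuple


def evaluate(offer: Offer, claims: dict) -> bool:
    """Return True only if every constraint is satisfied by the verified claims."""
    return all(claims.get(c.left_operand) == c.right_operand for c in offer.constraints)


pcf_offer = Offer(
    dataset_id="pcf-dataset",  # hypothetical identifier
    constraints=(
        Constraint("membership", "active"),
        Constraint("tier", "1"),
        Constraint("purpose", "sustainability-reporting"),
    ),
)

tier1_request = {"membership": "active", "tier": "1", "purpose": "sustainability-reporting"}
marketing_request = {**tier1_request, "purpose": "marketing"}

print(evaluate(pcf_offer, tier1_request))      # True
print(evaluate(pcf_offer, marketing_request))  # False
```

The point of the sketch is the shape of the decision: a request passes only when every clause of the offer's policy holds, so a single unmet clause (here, the stated purpose) is enough to refuse.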

The data plane is deliberately thin. The EDC data-plane framework hands off to protocol adapters — HTTP, cloud object storage, Kafka, and others — so the connector never has to become a universal data broker. It issues a short-lived endpoint and token, and the consumer pulls the data directly. This keeps large payloads out of the control plane entirely. Underpinning both planes is the identity layer, where decentralized identifiers and the identity-and-trust protocol let two connectors verify each other’s credentials without phoning a central authority on every request. The connector model has clear parallels to the Asset Administration Shell reference architecture, where standardized submodels expose asset data through a uniform interface.
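The short-lived token idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how the EDC issues tokens: the shared `SECRET`, the token layout, and the `issue_token` and `validate_token` helpers are hypothetical, and an HMAC stands in for the asymmetric signatures a real deployment would more likely use.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # hypothetical; a real deployment would use managed keys


def issue_token(agreement_id: str, ttl_seconds: int, now: float) -> str:
    """Return a signed token bound to an agreement, valid for ttl_seconds from `now`."""
    payload = json.dumps({"agreement": agreement_id, "exp": now + ttl_seconds}).encode()
    body = base64.urlsafe_b64encode(payload).decode()
    signature = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{signature}"


def validate_token(token: str, now: float) -> bool:
    """Accept the token only if the signature matches and it has not expired."""
    body, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False
    payload = json.loads(base64.urlsafe_b64decode(body))
    return now < payload["exp"]


token = issue_token("agreement-1", ttl_seconds=300, now=1000.0)
print(validate_token(token, now=1100.0))  # True: inside the validity window
print(validate_token(token, now=1300.0))  # False: expired
```

Passing `now` explicitly keeps the example deterministic; a real data plane would read the clock itself. The short expiry is what limits the damage if a token leaks.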

Identity, credentials, and the catalog

Trust in a data space is decentralized by design. There is no master directory of who-can-do-what. Instead, an OEM holds a membership credential proving it belongs to Catena-X, and possibly a business-partner credential proving a specific relationship with a supplier. These are verifiable credentials in the W3C sense — cryptographically signed, presented on demand, and verifiable offline against the issuer’s public key.

When a consumer connector requests another participant’s catalog, the provider checks the consumer’s presented credentials against the access policy before it even reveals which offers exist. This is subtle and important: the catalog itself is access-controlled, so a competitor cannot enumerate what data you hold. Two suppliers can both be members of the network and still see entirely different catalogs from the same OEM, because each sees only what their relationship credentials unlock. This credential-gated discovery is what makes the network safe to join without exposing your commercial relationships to everyone else on it.
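Credential-gated discovery can be illustrated with a small Python sketch. Everything here is a simplifying assumption: the `visible_offers` helper, the offer names, and the credential labels are hypothetical, and a real provider would verify signed credentials rather than compare strings.

```python
def visible_offers(catalog: dict, presented: set) -> list:
    """Return only the offers whose required credentials were all presented."""
    return sorted(
        offer for offer, required in catalog.items() if required <= presented
    )


# Hypothetical OEM catalog: offer name -> credentials required to see it.
oem_catalog = {
    "pcf-steel": {"membership", "partner:supplier-a"},
    "pcf-polymer": {"membership", "partner:supplier-b"},
    "member-notice": {"membership"},
}

supplier_a = {"membership", "partner:supplier-a"}
supplier_b = {"membership", "partner:supplier-b"}

print(visible_offers(oem_catalog, supplier_a))  # ['member-notice', 'pcf-steel']
print(visible_offers(oem_catalog, supplier_b))  # ['member-notice', 'pcf-polymer']
print(visible_offers(oem_catalog, set()))       # []
```

Both suppliers are network members, yet each sees a different catalog from the same OEM, and a party presenting no credentials sees nothing at all.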

The digital twin and AAS binding

Catena-X does not move raw, undocumented files. Data is structured according to standardized models, and many of those models are expressed as Asset Administration Shell submodels. A battery’s digital twin, a part’s traceability record, and a material’s PCF figure all follow agreed schemas registered in a Digital Twin Registry. This is what turns point-to-point exchange into an interoperable thread: a consumer knows exactly how to parse a PartTypeInformation or Pcf submodel because the model is standardized network-wide. The same digital-thread discipline underpins a robust digital-thread PLM architecture, and the principle of one authoritative semantic model echoes the unified namespace architecture for industrial IoT.
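What a shared, versioned model buys the consumer can be sketched as a simple structural check in Python. The registry contents below are invented for illustration: the version string and the field names `partId`, `kgCo2e`, and `boundary` are hypothetical and are not the fields of the actual Pcf submodel.

```python
# Hypothetical model registry: (model name, version) -> required field names.
REQUIRED_FIELDS = {
    ("Pcf", "1.0.0"): {"partId", "kgCo2e", "boundary"},
}


def validation_errors(model: str, version: str, payload: dict) -> list:
    """List structural problems; an empty list means the payload fits the model."""
    required = REQUIRED_FIELDS.get((model, version))
    if required is None:
        return [f"unknown model: {model} {version}"]
    return sorted(f"missing field: {name}" for name in required - set(payload))


complete = {"partId": "P-1", "kgCo2e": 12.5, "boundary": "cradle-to-gate"}
partial = {"partId": "P-1"}

print(validation_errors("Pcf", "1.0.0", complete))  # []
print(validation_errors("Pcf", "1.0.0", partial))
# ['missing field: boundary', 'missing field: kgCo2e']
print(validation_errors("Pcf", "9.9.9", complete))  # ['unknown model: Pcf 9.9.9']
```

A check like this only confirms structure. It says nothing about whether the values were computed under the same assumptions on both sides.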

Core use cases

The architecture earns its complexity through the use cases it unlocks. Each maps a regulatory or operational need onto sovereign data exchange.

Figure 3: Five anchor use cases sit on the Catena-X data space — PCF exchange feeding sustainability reporting, traceability enabling targeted recalls, quality data driving root-cause analysis, the battery passport satisfying EU compliance, and demand-capacity management improving supply resilience.

Product carbon footprint (PCF) exchange is the flagship. Instead of estimating supplier emissions with industry averages, an OEM requests the actual primary PCF figure for each purchased part, which the supplier in turn computed from its own upstream PCF data. The figures roll up the supply chain as real, provenance-backed numbers. Primary data of this kind is what lets an OEM meet CBAM and corporate sustainability reporting obligations without falling back on guesswork.
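The roll-up itself is simple arithmetic, which a short Python sketch makes explicit. The `Part` structure and every number below are made up for illustration, and the sketch ignores quantities, units, and allocation rules that a real PCF calculation has to handle.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """A part with the emissions added at its own tier plus its purchased inputs."""
    name: str
    own_kg_co2e: float
    inputs: list = field(default_factory=list)


def rolled_up_pcf(part: Part) -> float:
    """Own emissions plus the rolled-up footprint of every input, recursively."""
    return part.own_kg_co2e + sum(rolled_up_pcf(p) for p in part.inputs)


# Illustrative values only, not real emission figures.
cathode = Part("cathode material", 12.0)
cell = Part("battery cell", 8.0, [cathode])
module = Part("battery module", 5.0, [cell, cell])

print(rolled_up_pcf(cell))    # 20.0 = 8.0 + 12.0
print(rolled_up_pcf(module))  # 45.0 = 5.0 + 2 * 20.0
```

In the data space, each recursive step corresponds to a separate sovereign exchange: the module maker never sees the cathode producer's data, only the cell maker's already rolled-up figure.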

Traceability lets a manufacturer follow a part or batch across company boundaries. When a defective component is identified, the OEM can scope a recall to the exact serialized parts affected rather than recalling a model year. The data stays with each owner; only the linked twin references travel, so traceability does not require centralizing everyone’s BOM.
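Scoping a recall amounts to following those linked references upward from the defective part. The Python sketch below assumes a hypothetical, already-assembled `built_into` map and made-up serial numbers; in the data space each link would be resolved by asking the owning participant rather than by reading one shared table.

```python
from collections import deque

# Hypothetical links: serialized part -> the assemblies it was built into.
built_into = {
    "cell-001": ["module-A"],
    "cell-002": ["module-A"],
    "cell-003": ["module-B"],
    "module-A": ["vehicle-1"],
    "module-B": ["vehicle-2"],
}


def affected(defective: str, links: dict) -> set:
    """Breadth-first walk up the assembly links from a defective part."""
    seen = set()
    queue = deque([defective])
    while queue:
        current = queue.popleft()
        for parent in links.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(affected("cell-003", built_into)))  # ['module-B', 'vehicle-2']
```

Only the one vehicle that contains the defective cell is in scope, which is the difference between a targeted recall and recalling a model year.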

Quality and field data flow in the other direction: OEMs share anonymized field-failure data back to suppliers so root-cause analysis happens against real fleet behavior. Demand and capacity management shares forecast and capacity signals to absorb shocks before they cascade. And the battery passport is becoming the highest-stakes case as the EU Battery Regulation takes effect; the regulatory and data-model detail is covered in depth in the battery passport and EU regulation guide.

Walkthrough: a contract negotiation and data exchange flow

The clearest way to understand the architecture is to trace one exchange end to end. Suppose a Tier-1 supplier (the consumer) needs the PCF figure for a material from a chemical producer (the provider). Both run an EDC.

Figure 4: The exchange flows from catalog request, through credential verification and policy-gated catalog return, into a stateful contract negotiation, and finally a data-plane transfer where the consumer pulls the sovereign payload using a short-lived endpoint and token.

First, the consumer’s connector requests the provider’s catalog. The provider does not answer blindly — it asks the identity service to verify the consumer’s presented credentials. Once the credentials check out, the provider returns a catalog filtered to exactly the offers this consumer is entitled to see. Each offer carries its policy inline.

Next, the consumer initiates contract negotiation against a chosen offer. This is a stateful, multi-message handshake. The provider evaluates its access policy against the consumer’s verified attributes — is this party a network member, does it hold the right business-partner relationship, is the intended usage permitted? If every clause passes, both connectors record a signed contract agreement with a unique agreement ID. Nothing has moved yet; this is purely the legal-and-technical agreement to move it.
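The stateful handshake can be pictured as a small state machine. The Python sketch below uses state names modeled on the Dataspace Protocol's contract-negotiation states, but the transition table is a simplified assumption and the `Negotiation` class is hypothetical; the protocol specification remains the authority on which messages are legal in which state.

```python
from enum import Enum


class State(Enum):
    REQUESTED = "REQUESTED"
    OFFERED = "OFFERED"
    ACCEPTED = "ACCEPTED"
    AGREED = "AGREED"
    VERIFIED = "VERIFIED"
    FINALIZED = "FINALIZED"
    TERMINATED = "TERMINATED"


# Simplified transition table (an assumption, not a copy of the specification).
ALLOWED = {
    State.REQUESTED: {State.OFFERED, State.AGREED, State.TERMINATED},
    State.OFFERED: {State.REQUESTED, State.ACCEPTED, State.TERMINATED},
    State.ACCEPTED: {State.AGREED, State.TERMINATED},
    State.AGREED: {State.VERIFIED, State.TERMINATED},
    State.VERIFIED: {State.FINALIZED, State.TERMINATED},
    State.FINALIZED: set(),
    State.TERMINATED: set(),
}


class Negotiation:
    """Tracks one negotiation and rejects out-of-order steps."""

    def __init__(self) -> None:
        self.state = State.REQUESTED  # the consumer's request opens the negotiation
        self.history = [self.state]

    def advance(self, target: State) -> None:
        if target not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {target.name}")
        self.state = target
        self.history.append(target)


negotiation = Negotiation()
for step in (State.AGREED, State.VERIFIED, State.FINALIZED):
    negotiation.advance(step)
print([s.name for s in negotiation.history])
# ['REQUESTED', 'AGREED', 'VERIFIED', 'FINALIZED']
```

Because both connectors track the same states, neither side can jump straight to a finalized agreement: calling `advance(State.FINALIZED)` on a fresh negotiation raises `ValueError`.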

Only now does the consumer request the actual transfer, referencing the agreement. The provider’s transfer-process manager provisions a data-plane endpoint and issues a short-lived access token. The consumer pulls the payload directly through the data plane. The control plane is out of the data path entirely — it set up the deal and stepped aside. Throughout, the policy that was agreed remains attached to the data, so usage control persists beyond the moment of transfer.

This flow repeats thousands of times a day across the network, almost always machine-to-machine with no human in the loop. The negotiation that once happened over email and legal review now completes in seconds because the rules are encoded as policy and the trust is encoded as credentials.

Trade-offs and what goes wrong

The model is elegant, but production teams hit real friction. The first is operational weight. Every participant must run, secure, and update a connector — a stateful service with a database, identity wallet, and network exposure. For an OEM with a platform team this is routine. For a Tier-3 supplier with five people in IT, standing up and maintaining an EDC is a genuine barrier. The ecosystem’s answer is connector-as-a-service offerings, but that reintroduces a dependency on intermediaries and partially erodes the peer-to-peer ideal.

Second, the spec moves fast. The Dataspace Protocol, the EDC codebase, and the Catena-X standards all evolve on independent release cycles. A connector that interoperated last quarter can break against a counterpart that upgraded. Teams underestimate the integration-testing burden of staying current across a heterogeneous network where everyone runs slightly different versions.

Third, semantic alignment is harder than transport. Getting two connectors to talk is the easy 20 percent. Getting both sides to agree on what a PCF figure actually means — system boundaries, allocation rules, biogenic carbon treatment — is the hard 80 percent, and the data space cannot enforce it. A perfectly transmitted number computed under different assumptions is still wrong. The standardized submodels help, but they constrain structure, not the modeling judgment behind the values.

Fourth, usage control is a promise, not a guarantee. Policies are enforced by the consumer’s connector. Once data lands in a consumer’s system, technical enforcement of “do not forward” depends on that consumer honoring the contract. The model is contractual and auditable, not a digital-rights-management cage. Treat usage policies as enforceable agreements backed by network governance, not as cryptographic impossibility.

Finally, the federation services are a soft centralization. While no business data passes through them, identity, discovery, and registry services are coordination dependencies. Their availability and governance matter, and concentration of control over them would undercut the sovereignty story. This is an area to watch as the network scales.

Practical recommendations

Start with one use case and one trading partner. PCF exchange is the most common entry point because it has a clear regulatory driver and a well-specified data model. Resist the urge to deploy a connector and then look for uses — let a concrete data need pull the architecture in.

Decide early whether you run your own connector or consume a managed one. If data sovereignty is core to your strategy and you have platform capacity, run the EDC yourself and own the identity wallet. If you are a smaller supplier, a reputable connector-as-a-service gets you onto the network faster. Just confirm how credentials and policy enforcement are handled, because that is where sovereignty lives or dies.

Invest disproportionately in semantic alignment. Before exchanging PCF or traceability data, align with your counterpart on the exact submodel version and the modeling assumptions behind the values. Budget more time for this than for the technical integration; it is the part that determines whether the data is trustworthy.

Build version-tolerance into your operations. Track the Dataspace Protocol and EDC release notes, maintain a staging connector for compatibility testing, and assume you will be upgrading several times a year. Treat connector operations as a product, not a project.

Finally, anchor your participation in the governance framework. The Catena-X rulebook and certification are what make the network trustworthy at scale; understand the obligations you are signing up to, particularly around usage policy and credential management, before you go live.

FAQ

Is a data space the same as a data lake or data marketplace?
No. A data lake centralizes data in one store under one operator. A data marketplace centralizes the transaction. A data space does neither — data stays with its owner, exchange is peer-to-peer through connectors, and trust comes from verifiable credentials rather than a platform. The defining property is sovereignty: you control your data, including machine-enforced usage rules that travel with it, even after it leaves your systems.

What is the difference between Catena-X, Gaia-X, and IDS?
They are layers of the same stack. IDS (from the International Data Spaces Association) defines the reference model for sovereign data exchange and the connector pattern. Gaia-X adds a European federation and compliance framework. Catena-X is the automotive-specific data space built on those foundations, with concrete standards, use cases, and a member association. Manufacturing-X generalizes the Catena-X approach to other industrial sectors.

Why did Catena-X choose the Eclipse Dataspace Connector?
The EDC is open source, vendor-neutral, and implements the Dataspace Protocol that the network standardized on. Open governance under the Eclipse Foundation matters for a multi-competitor network where no participant wants to depend on a rival’s proprietary software. The control-plane/data-plane separation also fits the model well, keeping large payloads out of the negotiation path and letting participants plug in their own storage and protocols.

Does sharing data in Catena-X mean losing control of it?
That is the problem the architecture is designed to prevent. Usage policies are negotiated before transfer and remain attached to the data, and access is gated by verifiable credentials. The honest caveat: technical enforcement after the data reaches the consumer depends on the consumer’s connector honoring the policy. So control is contractual, auditable, and governance-backed — strong, but not an absolute cryptographic guarantee against a bad actor.

How heavy is it to join the network as a small supplier?
Heavier than an API integration, lighter than it used to be. You need a connector, a managed identity wallet, and alignment on data models. Running your own EDC requires real platform capacity. Many smaller suppliers instead use a connector-as-a-service provider to get onto the network without operating the software themselves, accepting a managed dependency in exchange for far lower operational overhead.
