AI Model Supply Chain Security: SBOM, Signing, and Provenance for the ML Pipeline

Most teams treat a downloaded model like a static file. They are wrong. A set of weights is an executable link in your supply chain, and AI model supply chain security is now as load-bearing as the controls you already apply to npm, PyPI, and container images. The difference is that almost nobody has wired the same rigor into their ML pipeline.

When you pull a checkpoint from a public hub, you inherit its build history, its dependencies, and whatever someone embedded in its serialization format. You usually cannot see any of that. The model arrives as an opaque artifact, and your serving stack runs it with full trust.

This post is a practitioner’s playbook for closing that gap. We treat the model as a first-class artifact with a verifiable origin, a bill of materials, and an admission gate that refuses anything unsigned.

What this covers: model SBOMs and AI-BOM formats, safetensors versus pickle, signing and attestation with Sigstore and in-toto, SLSA provenance for models, gated registries with admission control, weight scanning, backdoor detection, and the trade-offs that bite you in production.

Context and Background

The modern ML pipeline pulls weights, tokenizers, and adapters from public registries like Hugging Face, plus internal model registries built on MLflow or cloud equivalents. Each pull is a trust decision that most pipelines make implicitly. That is the core problem.

The sharpest technical risk is serialization. Python’s pickle format, used by older torch.save checkpoints and by many .bin files, can execute arbitrary code during deserialization. A poisoned checkpoint can run a reverse shell the moment you call torch.load, unless the loader restricts unpickling (for example, torch.load with weights_only=True). This is not theoretical. Scanners on public hubs routinely flag pickle files that import os or subprocess during deserialization.
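To make the mechanism concrete, here is a minimal and deliberately harmless sketch of why unpickling amounts to code execution. The Payload class is a hypothetical stand-in for a poisoned object; a real attack would name something like os.system instead of eval on a constant.

```python
import pickle


class Payload:
    """Hypothetical stand-in for an object hidden inside a checkpoint."""

    def __reduce__(self):
        # pickle records "call this callable with these arguments" and
        # replays that call on load. eval on a constant keeps the demo benign.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading does not rebuild a Payload. It calls eval("6 * 7") and hands back
# whatever the callable returned.
result = pickle.loads(blob)
```

The loader never asks whether the callable is reasonable for a model file, which is why the defense has to sit before the load, not after it.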

Beyond code execution, you face poisoned and backdoored weights. An attacker can fine-tune a model so it behaves normally on benchmarks but emits attacker-chosen output when a trigger phrase appears. The weights look fine. The eval scores look fine. The backdoor is statistical, not syntactic.

Then there is the classic dependency surface. Typosquatting on model names, compromised maintainer accounts, malicious adapters, and tampered tokenizer configs all map onto familiar package-manager attacks. The OWASP Machine Learning Security Top 10 and the OWASP LLM Top 10 both call out supply chain risk explicitly, and the LLM list treats poisoned third-party models and datasets as a top-tier threat.

The dependency surface is wider than the weights file alone. A model repository typically ships a tokenizer, a config, generation parameters, and sometimes custom modeling code loaded with trust_remote_code=True. That flag is a remote-code-execution switch in plain sight. Enabling it runs arbitrary Python from the repo at load time, which means a compromised repo owns your process. Most teams flip it on to make a model “just work” and never revisit the decision.

There is also a temporal dimension that package managers handle and ML pipelines usually do not. Public model repos are mutable. A maintainer can push a new revision to the same name, and a pipeline that pulls by tag rather than commit hash silently picks up whatever is there today. The artifact you scanned last week may not be the artifact you load this week unless you pin by revision and verify the digest on every pull.
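A sketch of the digest check, using only the standard library. The function names are illustrative, and it assumes you recorded the expected SHA-256 at the time the artifact was reviewed and that you fetch by commit hash rather than by tag.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so multi-gigabyte weights never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(path: Path, expected_sha256: str) -> None:
    """Raise if the bytes on disk differ from the digest pinned at review time."""
    actual = sha256_file(path)
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"digest mismatch for {path.name}: got {actual}")
```

Calling verify_pinned on every pull, not just the first, is what turns a pinned revision into an enforced one.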

Compliance is catching up fast. The EU AI Act pushes documentation and traceability obligations onto high-risk systems, and NIST’s Secure Software Development Framework (SSDF) extends to AI artifacts through its SP 800-218A profile. If you ship models into regulated workflows, provenance stops being optional. For the broader agent-layer threat model, see our deep dive on agentic AI security and prompt injection. For the canonical risk catalog, start with the OWASP Machine Learning Security Top 10.

The uncomfortable truth is that the controls you need already exist. They were built for software supply chains, and they port cleanly to models once you stop treating weights as inert data.

It helps to name the attacker explicitly. Three profiles cover most realistic threats. The first is an opportunistic attacker who poisons a public artifact and waits for someone to pull it; the pickle-RCE and typosquatting cases fall here. The second is a targeted attacker who compromises a maintainer account or a CI identity to push a tampered model to a name you already trust. The third is an insider with registry access who swaps an artifact after it was approved. A serious architecture has an answer for all three, and the same four primitives (signing, attestation, transparency logging, and admission control) address each one at a different point in the chain.

The Provenance and Trust Architecture

A model trust architecture answers one question at serving time: do I have cryptographic evidence that this exact artifact came from a build I trust, passed the scans I require, and has not been swapped? You answer it by attaching a bill of materials, signing the artifact and its attestations, recording provenance through the lifecycle, and enforcing all of it at an admission gate before the model reaches a serving tier.

The pieces work as a chain. A model from any source is scanned, described by an SBOM, signed, anchored to a provenance store, promoted into a trusted registry, and admitted only after a policy check. Nothing reaches the serving tier on trust alone.

The design principle behind the whole architecture is that trust must be verifiable, not assumed, at every boundary an artifact crosses. Each hand-off, from intake to scan to promotion to load, either carries cryptographic evidence forward or it is a gap an attacker can exploit. The four building blocks are an inventory that says what the artifact is, signatures that say who produced it, attestations that say how it was built, and an enforcement point that refuses anything missing those properties. The sections below take each in turn.

Model SBOMs and the AI-BOM

A software bill of materials lists what is inside an artifact so you can answer “am I affected” when something goes wrong. The same idea applies to models. CycloneDX added a machine-learning extension, the ML-BOM, that describes model components, datasets, frameworks, and their relationships in one document.

A model SBOM captures the base model and its license, the training and fine-tuning datasets, the framework versions, the tokenizer, and any adapters or quantization steps. It records the artifact’s cryptographic digest so the document is bound to specific bytes. When a base model is later found to be poisoned, you query your SBOMs and find every downstream model that inherited it.

Treat the ML-BOM as data you generate at build time, not a form you fill in later. Emit it from the same pipeline step that packages the weights, sign it alongside the model, and store both. An SBOM nobody can verify is documentation, not a control. The CycloneDX project documents the ML-BOM schema and tooling if you want the field-level detail.
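As a rough illustration of emitting the document at build time, the sketch below assembles a minimal CycloneDX-style ML-BOM as a plain dictionary. It is not schema-validated. The function name and the example:parent-sha256 property name are assumptions made for illustration, and a real pipeline would more likely use CycloneDX tooling.

```python
import hashlib
import json


def build_ml_bom(model_name, model_version, weights, parent_sha256, license_id):
    """Describe one packaged model and bind the description to its bytes."""
    weights_sha256 = hashlib.sha256(weights).hexdigest()
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [
            {
                "type": "machine-learning-model",
                "bom-ref": f"model:{weights_sha256}",
                "name": model_name,
                "version": model_version,
                # The digest is what ties this document to specific bytes.
                "hashes": [{"alg": "SHA-256", "content": weights_sha256}],
                "licenses": [{"license": {"id": license_id}}],
                # Hypothetical property name used here to record lineage.
                "properties": [
                    {"name": "example:parent-sha256", "value": parent_sha256}
                ],
            }
        ],
    }


def serialize(bom: dict) -> bytes:
    """Stable byte form, so the SBOM can be hashed and signed with the model."""
    return json.dumps(bom, sort_keys=True, separators=(",", ":")).encode("utf-8")
```

Serializing with sorted keys gives the same bytes for the same content, which matters once the SBOM itself is signed.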

The lineage query is the payoff, so design for it. Store SBOMs in a queryable index keyed by component digest, not as loose files in object storage. When a popular base model is disclosed as poisoned, the question you must answer in minutes is “which of my deployed models descend from it.” That query only works if every fine-tune, merge, and quantization recorded its parent’s digest in the SBOM at build time. Retrofitting lineage after an incident is how teams discover their inventory has holes.
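The lineage query reduces to a graph walk once parent digests are recorded. The sketch below is an in-memory toy with hypothetical names. A production index would live in a database, but the shape of the query is the same.

```python
from collections import defaultdict, deque


class LineageIndex:
    """Maps each parent digest to the digests of artifacts built from it."""

    def __init__(self):
        self._children = defaultdict(set)

    def record(self, digest, parent_digests):
        """Call at build time for every fine-tune, merge, or quantization."""
        for parent in parent_digests:
            self._children[parent].add(digest)

    def descendants(self, poisoned_digest):
        """Breadth-first walk returning every artifact that inherits from it."""
        seen = set()
        queue = deque([poisoned_digest])
        while queue:
            current = queue.popleft()
            for child in self._children.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen
```

A merge simply records two parents, so a model that descends from a poisoned base through any path shows up in the result.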

A model SBOM also carries license and usage metadata that matters operationally. Many open-weight models ship under licenses with downstream restrictions, and a merge of two models can produce an artifact whose combined license is ambiguous. Recording license per component lets you flag conflicts before a restricted model reaches production, which is a compliance failure mode distinct from security but enforced through the same pipeline.

Safetensors Versus Pickle

The single highest-leverage control is format choice. Prefer safetensors for every model you can. The format stores only tensors and metadata, with no executable code path on load, which removes the deserialization-RCE class entirely. If a model ships only as a pickle, that is a finding, not a footnote.
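One way to see why the format has no code path is to parse it by hand. A safetensors file is an 8-byte little-endian header length, a JSON header, and then raw tensor bytes. The sketch below reads only the header from an in-memory blob. The function name is illustrative, and in practice you would use the safetensors library rather than this.

```python
import json
import struct


def read_safetensors_header(blob: bytes) -> dict:
    """Return {tensor_name: {dtype, shape, data_offsets}} without decoding tensors."""
    if len(blob) < 8:
        raise ValueError("too short to contain the 8-byte header length")
    (header_len,) = struct.unpack("<Q", blob[:8])
    data_start = 8 + header_len
    if data_start > len(blob):
        raise ValueError("declared header length exceeds file size")
    header = json.loads(blob[8:data_start].decode("utf-8"))
    data_len = len(blob) - data_start
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form string map
            continue
        begin, end = info["data_offsets"]
        if not 0 <= begin <= end <= data_len:
            raise ValueError(f"tensor {name!r} points outside the data section")
        tensors[name] = info
    return tensors
```

Everything the loader needs is JSON plus byte offsets. There is no opcode stream and no callable to resolve, which is the property that removes the deserialization-RCE class.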

Where you cannot avoid pickle, you scan before load and load in isolation. Tools like picklescan and ModelScan statically inspect the opcode stream for dangerous imports and calls. They are not perfect, but they catch the common reverse-shell and exfiltration patterns. The strategic move is to convert pickle artifacts to safetensors at ingestion, scan once, and never let the raw pickle reach a GPU node.
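The static-inspection idea can be sketched with the standard library's pickletools, which walks the opcode stream without executing it. This is a simplified illustration of the approach, not how picklescan or ModelScan are implemented, and the denylist is a small hypothetical sample.

```python
import pickletools

DANGEROUS_BUILTINS = {"eval", "exec", "compile", "__import__", "open", "getattr"}
# Module -> flagged names; None means every name in that module is flagged.
SUSPICIOUS = {
    "os": None, "posix": None, "nt": None, "subprocess": None, "socket": None,
    "builtins": DANGEROUS_BUILTINS,
    "__builtin__": DANGEROUS_BUILTINS,  # name used by older pickle protocols
}
STRING_OPCODES = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def find_suspicious_imports(blob: bytes) -> list[str]:
    """List flagged module.name references found in the opcode stream."""
    findings = []
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(blob):
        if opcode.name in STRING_OPCODES:
            recent_strings.append(arg)
            continue
        if opcode.name == "GLOBAL":
            # Older protocols carry "module name" inline as one argument.
            module, _, name = arg.partition(" ")
        elif opcode.name == "STACK_GLOBAL":
            # Newer protocols push module and name as the two preceding strings.
            if len(recent_strings) < 2:
                findings.append("<unresolved STACK_GLOBAL>")
                continue
            module, name = recent_strings[-2], recent_strings[-1]
        else:
            continue
        flagged = SUSPICIOUS.get(module.split(".")[0], set())
        if flagged is None or name in flagged:
            findings.append(f"{module}.{name}")
    return findings
```

A denylist like this is easy to evade, which is one reason the conversion-at-ingestion strategy matters more than any single scanner.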

This format discipline pays compounding dividends. Once your trusted tier is safetensors-only, your runtime threat model shrinks dramatically, and your admission policy gets simpler to reason about.

Be precise about what safetensors does and does not buy you. It removes the code-execution path during tensor deserialization. It does not vouch for the numbers inside, validate the accompanying config, or stop the trust_remote_code path that loads custom modeling files separately. A safetensors model with a malicious custom modeling module is still dangerous. The format is a necessary floor, not a ceiling, and the rest of the architecture handles the risks it leaves open.

Conversion itself deserves a guarded step. The act of converting a pickle to safetensors requires loading the pickle, which is the exact operation you are trying to avoid on production hardware. Run conversion in a disposable, network-isolated sandbox with no credentials mounted, treat that sandbox as compromised by default, and only the resulting safetensors file leaves it. This keeps the one unavoidable risky load contained to a throwaway environment.

Signing, Attestation, and SLSA for Models

Signing proves who produced an artifact and that it has not changed since. Sigstore’s keyless flow is the path of least resistance: cosign requests a short-lived certificate from Fulcio bound to an OIDC identity, signs the model digest, and records the signature in the Rekor transparency log. There are no long-lived signing keys to leak.

Attestations go further than signatures. An in-toto attestation is a signed statement about an artifact, such as “this digest was produced by this builder from these inputs.” SLSA (Supply-chain Levels for Software Artifacts) defines a provenance attestation format and a ladder of build-integrity levels. Applied to models, a SLSA provenance attestation records the training or packaging build, its inputs, and its builder identity, so a verifier can confirm the model came from your pipeline and not a laptop.
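For orientation, here is roughly what such a statement looks like as data. The sketch builds an unsigned in-toto Statement carrying a SLSA v1 provenance predicate. The function name, parameter names, and the example values in the usage are hypothetical, and in practice the statement is wrapped in a signed envelope by your signing tooling rather than emitted bare.

```python
def slsa_provenance_statement(artifact_name, artifact_sha256, builder_id,
                              build_type, dataset_sha256):
    """Unsigned in-toto statement: which builder produced which digest from what."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        # The subject is the artifact the claim is about, identified by digest.
        "subject": [
            {"name": artifact_name, "digest": {"sha256": artifact_sha256}}
        ],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {
            "buildDefinition": {
                "buildType": build_type,
                "externalParameters": {},
                # Inputs the build consumed, pinned by digest.
                "resolvedDependencies": [
                    {"name": "training-dataset",
                     "digest": {"sha256": dataset_sha256}}
                ],
            },
            "runDetails": {"builder": {"id": builder_id}},
        },
    }
```

A verifier reads builder.id to decide whether the model came from the pipeline or from somewhere else, and reads the subject digest to confirm the claim is about the bytes in hand.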

Provenance must span the lifecycle, not just the final upload. Bind attestations to the dataset hash, the training run, the evaluation results, and the packaging step. Each stage emits a signed statement that references the previous artifact’s digest, forming a verifiable chain from data to deployment. The SLSA specification defines the provenance model and the levels you can target incrementally.

Target SLSA levels incrementally rather than chasing the top of the ladder on day one. Level one is simply producing provenance at all, which most pipelines can reach in a sprint by emitting an attestation from the build job. Higher levels demand that provenance be generated by the build platform itself, hardened against tampering by the build’s own scripts, and produced on isolated, ephemeral build environments. For models, reaching the higher rungs usually means moving training and packaging onto a managed build service whose identity signs the provenance, so a leaked developer credential cannot forge it. Most teams get the majority of the value at the lower levels and climb deliberately.

The lifecycle attestation chain is what turns a pile of signatures into a story you can audit. A verifier walking the chain confirms that the deployed digest traces back through packaging to a training run, which traces back to a signed dataset, with each link signed by an identity you trust for that stage. If any link is missing or signed by an unexpected identity, verification fails closed. This is the difference between “we signed the final file” and “we can prove the whole pipeline that produced it.”
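The chain walk can be sketched as follows. It assumes each attestation's signature has already been verified cryptographically and reduced to a Link record. The Link fields, stage names, and identities are all illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    stage: str                   # e.g. "dataset", "training", "packaging"
    signer: str                  # identity taken from the verified signature
    output_digest: str           # artifact this stage produced
    input_digest: Optional[str]  # artifact it consumed; None at the root


class ChainError(Exception):
    """Raised on any missing, misordered, or untrusted link (fail closed)."""


def verify_chain(deployed_digest, links, trusted_signers, stage_order):
    """Walk backwards from the deployed digest to the root stage."""
    by_output = {link.output_digest: link for link in links}
    expected = deployed_digest
    for stage in reversed(stage_order):
        link = by_output.get(expected)
        if link is None:
            raise ChainError(f"no attestation produces {expected!r} ({stage})")
        if link.stage != stage:
            raise ChainError(f"expected stage {stage!r}, found {link.stage!r}")
        if link.signer not in trusted_signers.get(stage, ()):
            raise ChainError(f"{link.signer!r} is not trusted for {stage!r}")
        expected = link.input_digest
    if expected is not None:
        raise ChainError("chain does not terminate at the root stage")
```

Trust is scoped per stage on purpose: an identity allowed to sign packaging attestations should not be accepted as the signer of a dataset.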

The model registry becomes the trust boundary. A registry holds two tiers: an untrusted intake tier where anything can land, and a trusted tier that only accepts artifacts with valid signatures, attestations, and clean scans. Promotion between tiers is the control point, and it is enforced by admission policy rather than convention.

The transparency log under all of this is what makes the scheme robust against insider tampering. Rekor is append-only and publicly auditable, so a signature cannot be quietly created or backdated without leaving a record. A verifier checks not just that a signature is valid but that a matching entry exists in the log with a verifiable inclusion proof. This shifts your trust from “the storage system was not tampered with” to “the public log is consistent,” which is a far stronger property and the heart of why Sigstore-style signing beats bare key signing for supply chains.
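The inclusion proof itself is a short Merkle-path computation. The sketch below follows the verification procedure from the Certificate Transparency RFCs and assumes RFC 6962-style leaf and node hashing, which is an assumption about the log rather than something this post specifies. Real verifiers also check the signed tree head, which is omitted here.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    """Leaves are domain-separated from interior nodes by a 0x00 prefix."""
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """Interior nodes use a 0x01 prefix over the two child hashes."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(entry, leaf_index, tree_size, proof, root) -> bool:
    """Recompute the root from the entry and its sibling path, then compare."""
    if not 0 <= leaf_index < tree_size:
        return False
    fn, sn = leaf_index, tree_size - 1
    running = leaf_hash(entry)
    for sibling in proof:
        if sn == 0:
            return False  # proof is longer than the tree is deep
        if fn & 1 or fn == sn:
            # Current node is a right child, so the sibling sits on the left.
            running = node_hash(sibling, running)
            if not fn & 1:
                # Skip levels where this node has no sibling (ragged right edge).
                while fn and not fn & 1:
                    fn >>= 1
                    sn >>= 1
        else:
            running = node_hash(running, sibling)
        fn >>= 1
        sn >>= 1
    return sn == 0 and running == root
```

The verifier only needs the entry, a handful of sibling hashes, and a root it trusts. It never has to trust whoever stored the signature.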

Defenses Across the Lifecycle

Architecture is necessary but not sufficient. You need concrete controls at each stage, each one independently verifiable, so that a single bypass does not collapse the whole chain. The signing and verification flow is where most of the cryptographic work happens.

Keyless signing runs end to end as follows. The builder asks cosign to sign a model digest, Fulcio issues an identity-bound certificate, the signature lands in the Rekor transparency log with an inclusion proof, and the verifier later confirms the entry exists and matches policy. Because Rekor is append-only and public, a verifier does not need to trust the builder’s storage. It trusts the log.

Verification is where teams cut corners, so make it strict. A verifier checks four things: the signature is valid, the signing identity matches an allowlist, the artifact digest matches the bytes you are about to load, and the attestation satisfies your policy predicate. Skip any one and the control is theater. In practice you encode these checks in cosign verify-attestation plus a policy engine that evaluates the attestation payload.

The gated registry is where you turn verification into enforcement. No artifact reaches the serving tier without passing the admission gate.

Every promotion request fans out to signature verification, attestation verification, and scan-result checks, and a policy engine makes a single allow-or-block decision. A blocked promotion alerts and stops. This is the same pattern as a Kubernetes admission controller, and you can implement it as one for model-serving custom resources.

The policy engine is where this stops being a one-off script and becomes governable. Express the rules as code in a policy language such as Rego or the policy syntax your signing tooling ships with, version it in git, and require review to change it. A good policy asserts concrete predicates: the signer identity matches an allowlist, the attestation predicate type is the one you expect, the scan verdict is clean, the model format is safetensors, and the SBOM is present and signed. Because the policy is data, you can test it, dry-run it against historical promotions, and prove to an auditor exactly what was enforced on a given date.
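To show what "policy as data" buys you, here is a small sketch in Python rather than Rego. The Promotion fields, rule names, and the signer identity are hypothetical. The point is that the same function serves enforcement, tests, and dry runs against history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Promotion:
    """Facts a gate would gather before deciding (field names are illustrative)."""
    signer: str
    predicate_type: str
    scan_verdict: str
    model_format: str
    sbom_signed: bool


POLICY = {
    "allowed_signers": {"packaging-ci@example.com"},  # hypothetical identity
    "predicate_type": "https://slsa.dev/provenance/v1",
    "allowed_formats": {"safetensors"},
}


def violations(promotion: Promotion, policy: dict) -> list[str]:
    """Return the names of violated rules; an empty list means allow."""
    failed = []
    if promotion.signer not in policy["allowed_signers"]:
        failed.append("signer-not-allowlisted")
    if promotion.predicate_type != policy["predicate_type"]:
        failed.append("unexpected-predicate-type")
    if promotion.scan_verdict != "clean":
        failed.append("scan-not-clean")
    if promotion.model_format not in policy["allowed_formats"]:
        failed.append("format-not-allowed")
    if not promotion.sbom_signed:
        failed.append("sbom-missing-or-unsigned")
    return failed


def dry_run(history, policy):
    """Replay past promotions against a candidate policy before enforcing it."""
    return {index: violations(p, policy) for index, p in enumerate(history)}
```

Returning the list of violated rules, rather than a bare boolean, is what lets an alert or an audit record say exactly why a promotion was blocked.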

Keep the gate’s failure mode explicit and biased toward safety. If the verifier cannot reach the transparency log, cannot parse an attestation, or hits an unexpected condition, it should block, not pass. Fail-open gates are worse than no gate because they create a false sense of coverage. Pair the fail-closed default with good alerting so that a verification outage is visible and gets fixed, rather than silently waved through under load.
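A fail-closed wrapper is only a few lines, which makes it easy to get right once and reuse. In this sketch, check stands for whatever verification callable your gate runs. Both the name and its contract (return True to allow) are assumptions.

```python
import logging

logger = logging.getLogger("admission")


def admit(check, artifact_digest: str) -> bool:
    """Allow only on an explicit True; errors and odd results all block."""
    try:
        verdict = check(artifact_digest)
    except Exception:
        # Log with traceback so a verification outage is visible, then block.
        logger.exception("verification error for %s; blocking", artifact_digest)
        return False
    # Anything other than the literal True (None, truthy strings) is a block.
    return verdict is True
```

Comparing against the literal True is deliberate: a verifier that returns None because of an unhandled branch should not be read as approval.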

Scanning Weights and Detecting Backdoors

Scanning has two jobs. The first is catching code-execution payloads in serialized files, which picklescan and ModelScan handle for the pickle and Keras-Lambda classes. The second, harder job is detecting behavioral backdoors in the weights themselves.

Backdoor detection is an active research area with no silver bullet. Practical defenses include trigger-reverse-engineering methods that search for small input perturbations causing large output shifts, activation-clustering on a held-out set, and differential testing against a known-good reference model. Treat these as risk reducers, not guarantees, and weight them by how much you trust the source.

For generative models the detection problem is harder still, because the output space is open-ended and triggers can be rare token sequences no fuzzer will stumble on. The pragmatic posture is to lean on provenance to shrink the candidate set, then apply behavioral red-teaming against the specific harms you care about. Run a fixed suite of adversarial prompts, compare outputs against a trusted baseline of the same architecture, and alert on divergence. You are not proving the absence of a backdoor; you are raising the cost and likelihood of catching one, while provenance ensures you only ever run weights from a build you can name.
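The baseline comparison can be sketched with two callables and a similarity threshold. This assumes deterministic decoding for both models and uses a crude character-level similarity from difflib. The function name and the 0.8 threshold are illustrative, and a real suite would more likely use task-specific scoring.

```python
from difflib import SequenceMatcher


def divergent_prompts(candidate, baseline, prompts, min_similarity=0.8):
    """Return (prompt, similarity) pairs where the candidate strays from baseline."""
    flagged = []
    for prompt in prompts:
        ratio = SequenceMatcher(None, candidate(prompt), baseline(prompt)).ratio()
        if ratio < min_similarity:
            flagged.append((prompt, ratio))
    return flagged
```

A flagged prompt is a lead for a human to investigate, not a verdict. Benign fine-tuning also shifts outputs, so the threshold needs tuning against known-good models.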

Mapping Risks to Controls

The table below maps the main supply chain risks to the control that addresses each one. Use it as a coverage checklist when you audit a pipeline.

Risk	Primary control
Pickle deserialization RCE	safetensors-only tier plus picklescan/ModelScan
Tampered or swapped weights	cosign signature plus digest pinning
Unknown build origin	SLSA provenance attestation
Inherited poisoned base model	ML-BOM lineage query
Unverified promotion	gated registry admission control
Backdoored weights	backdoor scanning plus reference diffing
Malicious dependency or adapter	dependency provenance plus pinned hashes
Untrusted runtime load	sandboxed, network-isolated serving

Data, Dependency, and Runtime Defenses

Provenance is only as strong as its weakest input. Hash and sign your training datasets, and reference those hashes in the training attestation, so a swapped dataset breaks verification downstream. The same applies to dependencies: pin framework and library versions by hash, not floating tags, and capture them in the SBOM.
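One way to get a single hash for a multi-file dataset is to hash a sorted manifest of per-file digests. The layout of the manifest line and the function name below are assumptions. Any deterministic scheme works as long as the training attestation records which one was used.

```python
import hashlib
from pathlib import Path


def dataset_digest(root: Path) -> str:
    """Digest over (file hash, relative path) pairs in sorted path order."""
    files = sorted(
        (p.relative_to(root).as_posix(), p)
        for p in root.rglob("*")
        if p.is_file()
    )
    outer = hashlib.sha256()
    for rel, path in files:
        # read_bytes is fine for a sketch; stream large shards in practice.
        file_hash = hashlib.sha256(path.read_bytes()).hexdigest()
        outer.update(f"{file_hash}  {rel}\n".encode("utf-8"))
    return outer.hexdigest()
```

Because paths are part of the manifest, renaming or adding a shard changes the digest just as editing one does.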

Runtime isolation is your last line. Load models in a sandboxed process with no outbound network access, dropped capabilities, and a read-only filesystem where possible. Even if a poisoned artifact slips the gate, a sandbox limits the blast radius to a contained worker rather than your cluster. If you front models with a gateway, push verification there too, as we describe in the LLM gateway architecture pattern.

The verification should also re-run at load time, not only at promotion. A registry that was trusted yesterday can be compromised, and an artifact can be swapped after promotion if your storage is mutable. The serving node should verify the digest of the bytes it actually loaded against the digest in the signed attestation, as a final check immediately before the model is wired into the i

Figure 4: Model lifecycle provenance chain from training to deployment. Provenance attestations are generated at each stage – data curation, training, evaluation, and packaging – so the deployed artifact carries a verifiable, unbroken chain back to its inputs.
