AI Protein Design: How RFdiffusion Generates New Proteins

For sixty years, biologists could read proteins but not write them. They could sequence a natural protein, crystallize it, and slowly reverse-engineer what it did — but conjuring a brand-new protein that did something useful, starting from a blank page, was almost hopeless. AI protein design with RFdiffusion changed that. RFdiffusion is a generative diffusion model that grows a protein’s three-dimensional backbone out of pure noise, the same way an image generator paints a photo out of static. Feed it a target — a viral spike to neutralize, a metal ion to coordinate, a flat surface to grip — and it returns a folded shape that did not exist anywhere in nature or in any database. A companion network then chooses the amino acids, a structure predictor checks the work, and only then does anything touch a test tube. This article walks the full mechanism, from noise to bench, without the hand-waving.

What this covers: the design problem, RFdiffusion’s denoising mechanism, ProteinMPNN sequence design, AlphaFold-based validation, the wet-lab loop, real trade-offs, and how a team runs a campaign.

Context and Background

A protein is a chain of amino acids that folds into a precise three-dimensional shape, and that shape is almost everything — it determines whether a molecule binds a target, catalyzes a reaction, or does nothing at all. The “protein folding problem” asked how the one-dimensional sequence dictates the folded structure. For decades it resisted brute force: the number of possible conformations is astronomical, and physics-based simulation was too slow and too inaccurate to find the right one.

The breakthrough came in 2020 when DeepMind’s AlphaFold2 predicted folded structures from sequence with near-experimental accuracy (Jumper et al., Nature 2021). That solved prediction — given a sequence, what shape does it take? But prediction is only half the story, and arguably the easier half for engineers. The harder, more valuable question is the inverse: given a shape or a function you want, what protein should you build? This is de novo protein design — designing molecules from scratch rather than borrowing and tweaking natural ones.

Why is design the harder direction? Prediction has a comforting property: every natural protein you are asked about already folds, because evolution selected it to. The predictor only has to recognize patterns that nature already validated billions of times over. Design has no such safety net. The space of possible amino-acid chains is hyper-astronomical: a modest 100-residue protein has twenty raised to the hundredth power possible sequences, a number with 131 digits (about 1.3 × 10^130), and the overwhelming majority of them fold into nothing useful, or into a sticky aggregate, or never fold at all. A designer is searching for rare, functional needles in a haystack that has never been touched by natural selection. That is why for years the field could predict far better than it could create, and why a generative model that reliably produces foldable novel structure is such a leap.
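The size of that search space is easy to check directly. The short Python sketch below simply counts the chains that can be built from the 20 standard amino acids; the function name is illustrative and not taken from any library.

```python
def sequence_space_size(length: int, alphabet_size: int = 20) -> int:
    """Count distinct chains of `length` residues drawn from `alphabet_size` amino acids."""
    return alphabet_size ** length


n_sequences = sequence_space_size(100)   # 20 ** 100, held exactly as a Python int
n_digits = len(str(n_sequences))         # number of decimal digits
print(n_digits)  # 131
```

Because Python integers are exact, the digit count is read straight off the number rather than estimated from a logarithm.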

That inverse problem is the life’s work of the Baker lab at the University of Washington’s Institute for Protein Design, which spent two decades building Rosetta, a physics-and-statistics toolkit for assembling proteins fragment by fragment. Rosetta worked, but it was laborious and its success rates on hard targets were low. The deep-learning era let the same lab fold its structural knowledge — including the RoseTTAFold network, its answer to AlphaFold2 — into generative models. RFdiffusion is the headline result: a diffusion model that generates backbones on demand. It inherited RoseTTAFold’s learned sense of what real protein structure looks like and repurposed it from a predictor into a creator.

The distinction matters for everything that follows. Prediction asks “what does this sequence do?” Design asks “what should I make to get the behavior I want?” Conflating them is the single most common misunderstanding, and we will keep them separate throughout. A second point worth fixing early: prediction tools and design tools share machinery but serve opposite masters. AlphaFold2 and RoseTTAFold are predictors, yet RFdiffusion repurposes a predictor’s learned knowledge to create, and later the pipeline turns a predictor back into a judge. The same deep understanding of “what real protein structure looks like” gets used three different ways in one workflow — to generate, to assign sequence, and to validate. That reuse is not a coincidence; it is the reason the modern design stack came together so quickly once accurate structure prediction existed. For a sense of how computational and wet-lab biology now interlock, see our explainer on spatial transcriptomics pipelines, which faces the same in-silico-to-bench handoff from a different angle.

How RFdiffusion Designs a Protein

RFdiffusion designs a protein by starting from a cloud of random 3D coordinates and iteratively “denoising” them into a coherent, foldable backbone, optionally steered toward a target or motif. It outputs only the backbone — the chain’s geometric scaffold — and hands sequence design to a second network, ProteinMPNN. Validation by structure prediction follows. The full pipeline is shown below.

Figure 1. The de novo design pipeline. A design goal conditions RFdiffusion, which generates a backbone from noise. ProteinMPNN assigns amino-acid sequences to that backbone. AlphaFold2 or ESMFold predicts whether each sequence folds back to the intended shape; high-confidence, low-deviation designs proceed to gene synthesis and lab assays, while failures loop back for regeneration. Long description: a directed flow from design goal to RFdiffusion to backbone coordinates to ProteinMPNN to sequences to structure-prediction self-consistency check to a pass-fail gate that either advances to expression and assay or returns to regeneration.

The diffusion idea — start from noise, denoise to a backbone

Diffusion models learn by destruction. During training you take a real example — here, a real protein backbone from the Protein Data Bank — and gradually corrupt it by adding small amounts of random noise over many steps, until after enough steps it is indistinguishable from pure noise. The neural network’s job is to learn the reverse of each tiny corruption: given a slightly noisy structure, predict the slightly cleaner one it came from. Repeat that learned denoising step hundreds of times and you can walk all the way back from random noise to a clean, realistic structure. Image diffusion models do exactly this with pixels; RFdiffusion does it with atoms.

Concretely, a protein backbone can be represented as a set of “frames” — for each residue, a position in space plus an orientation describing how that residue’s local geometry is rotated. RFdiffusion adds noise to these frames, perturbing both the positions (translations) and the orientations (rotations), until the chain is a meaningless tangle. To generate something new, it inverts the process. It begins at step T with a fully noised set of frames — random positions, random orientations — and asks the network: given this noise, what does the final clean structure probably look like? Using that prediction it takes one small step toward order, producing a slightly less noisy backbone. It feeds that back in, predicts again, steps again. Over the denoising trajectory the chain visibly organizes itself: first a vague blob, then rough secondary structure, then crisp helices and sheets packed into a plausible fold.
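The forward (corrupting) half of this process can be sketched in a few lines. This is a toy under stated assumptions: it noises residue positions only, whereas RFdiffusion also perturbs each residue's orientation; the cosine-shaped schedule and the function names are chosen for illustration and are not claimed to match the published model; and coordinates are assumed to be centered and scaled to order one.

```python
import math
import random


def signal_fraction(t: int, total_steps: int) -> float:
    """Share of the clean signal kept at step t: 1.0 at t=0, about 0.0 at t=total_steps.

    A cosine-shaped schedule, assumed here for illustration only.
    """
    return math.cos(0.5 * math.pi * t / total_steps) ** 2


def add_noise(coords, t, total_steps, rng):
    """Corrupt clean residue positions to noise level t.

    Implements x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps with eps ~ N(0, 1).
    """
    a = signal_fraction(t, total_steps)
    keep, noise = math.sqrt(a), math.sqrt(1.0 - a)
    return [tuple(keep * c + noise * rng.gauss(0.0, 1.0) for c in xyz) for xyz in coords]


rng = random.Random(0)
clean = [(0.1 * i, 0.0, 0.0) for i in range(5)]   # toy five-residue "backbone"
half_noised = add_noise(clean, 25, 50, rng)        # partly corrupted
fully_noised = add_noise(clean, 50, 50, rng)       # essentially pure noise
```

At t=0 the structure is returned unchanged, and at the final step almost none of the original signal remains, which is the state generation starts from.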

It helps to be precise about why iterating beats a single shot. In principle you could train a network to leap from pure noise to a finished backbone in one step, but that turns out to be brittle: the mapping from “total chaos” to “specific valid protein” is too sharp to learn well, and one-shot guesses tend to be blurry averages of many possible proteins rather than any single coherent one. Diffusion sidesteps this by breaking an impossibly hard jump into hundreds of trivially easy ones. At each step the network is only asked to remove a little noise — to nudge a slightly messy structure toward a slightly cleaner one — which is a smooth, learnable function. The global structure emerges gradually as the consequence of many small, locally sensible corrections. This is the same reason image diffusion models render a photorealistic face by repeated refinement rather than painting it in a single pass.

There is also a useful way to think about what the model is really doing at each step. It is not memorizing and replaying training proteins; it has learned a score — roughly, the direction in which a noisy structure should move to become more protein-like. Following that direction repeatedly is a guided walk through structure space that ends on the manifold of realistic folds. Because the starting noise is random, every run lands somewhere different, which is exactly what you want from a generator: feed RFdiffusion the same instruction ten times and you get ten distinct, valid backbones to choose from, not ten copies of one answer. Diversity is a feature, because most of those backbones will later be discarded and you want a rich pool to filter.

Two practical knobs follow from this picture. The number of denoising steps trades speed for quality — more steps mean smaller, gentler corrections and generally cleaner output, at the cost of compute. A noise-scale setting controls how adventurous generation is: turn it down and designs hug familiar, conservative folds with high success odds; turn it up and the model explores stranger geometry, which can unlock novel function but lowers the hit rate. Designers tune these per campaign depending on whether they want safe, expressible proteins or bold, exploratory ones.
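A minimal sketch of the reverse loop shows where the two knobs enter. Everything here is a simplification made for illustration: `predict_clean` is a hypothetical stand-in for the trained network, the update rule re-noises the predicted clean structure to the next lower noise level, and only positions are handled. The parameter names `num_steps` and `noise_scale` are assumptions of this sketch, not RFdiffusion option names.

```python
import math
import random


def signal_fraction(t, total_steps):
    """Share of clean signal at step t (illustrative cosine-shaped schedule)."""
    return math.cos(0.5 * math.pi * t / total_steps) ** 2


def generate(predict_clean, n_residues, num_steps=50, noise_scale=1.0, seed=0):
    """Walk from pure noise (step T) down to a finished structure (step 0)."""
    rng = random.Random(seed)
    # Step T: every coordinate is pure Gaussian noise.
    x = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_residues)]
    for t in range(num_steps, 0, -1):
        x0_hat = predict_clean(x, t)                 # guess at the final clean structure
        a_prev = signal_fraction(t - 1, num_steps)   # signal share at the next level
        keep = math.sqrt(a_prev)
        spread = math.sqrt(1.0 - a_prev) * noise_scale   # lower scale = more conservative
        x = [[keep * c + spread * rng.gauss(0.0, 1.0) for c in res] for res in x0_hat]
    return x


TARGET = [[0.1 * i, 0.0, 0.0] for i in range(8)]


def toy_denoiser(x, t):
    """Placeholder that always returns one fixed answer; a real model uses x and t."""
    return TARGET


design = generate(toy_denoiser, n_residues=len(TARGET), num_steps=10, noise_scale=0.5)
```

More steps make each correction smaller, and a smaller `noise_scale` injects less randomness along the way. With the placeholder denoiser every run ends on the same structure; the diversity described above comes from a real network responding differently to different starting noise.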

Figure 2. The denoising trajectory. Generation starts from a cloud of pure Gaussian noise and runs the learned reverse process, passing through a blurry rough fold and emergent secondary structure to a clean backbone. Conditioning information — a fixed motif, a binding target, or a symmetry requirement — is injected at every step so the structure organizes around the constraint rather than drifting away from it. Long description: a left-to-right sequence of denoising steps from a noise cloud through intermediate folds to a finished backbone, with a conditioning input feeding into each step.

The crucial property is that RFdiffusion is built on RoseTTAFold, a structure-prediction network already trained on experimentally determined structures from the PDB. That pretraining means the denoiser does not have to relearn protein physics from nothing: it already “knows” what real backbones look like, which bond geometries are allowed, how secondary structure packs. The Baker lab fine-tuned this network for the denoising task, and that inherited structural prior is a large part of why RFdiffusion produces designs that are clean and “designable” rather than physically absurd (Watson et al., Nature 2023).

Conditioning — telling the model what you actually want

Unconditional generation — “give me any plausible protein” — is a useful demo but rarely the goal. The power of RFdiffusion is conditioning: steering generation toward a specific functional outcome. Three modes matter most.

Binder design. You provide the structure of a target — a receptor, a viral protein, a tumor antigen — and ask RFdiffusion to grow a new protein whose surface is complementary to a chosen patch of that target. The target is held fixed while the binder is denoised around it, so generation is explicitly shaped to produce a tight, geometrically matched interface. This is how de novo binders against challenging targets are made.

Motif scaffolding. Sometimes you already know the functional piece — a handful of residues that coordinate a metal, an epitope a vaccine must display, an active-site geometry — but you need a stable protein to hold those residues in exactly the right arrangement. You “pin” the motif’s coordinates and let RFdiffusion build a fresh scaffold around them, denoising everything except the fixed motif. The model invents a supporting structure that presents your functional residues in the correct geometry.

Symmetry. Many useful assemblies — nanoparticles, vaccine platforms, channels — are symmetric oligomers. RFdiffusion can enforce a symmetry group during generation so that the denoised subunits tile into a closed, symmetric complex rather than a random aggregate. The constraint is applied at every denoising step, just like the conditioning in Figure 2.
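One simple way to picture a symmetry constraint is to treat a single subunit as the only free object and regenerate the other copies by rotation. The sketch below does this for cyclic (C_n) symmetry about the z axis. It illustrates the geometry only; it is not claimed to be how RFdiffusion implements symmetry internally, and the function names are hypothetical.

```python
import math


def rotate_z(point, angle):
    """Rotate a 3D point about the z axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def symmetrize_cyclic(subunit, n_copies):
    """Build a C_n assembly: copy k is the subunit rotated by k * 360/n degrees."""
    step = 2.0 * math.pi / n_copies
    return [[rotate_z(p, k * step) for p in subunit] for k in range(n_copies)]


subunit = [(1.0, 0.0, 0.0), (2.0, 0.0, 1.0)]   # toy two-residue subunit
trimer = symmetrize_cyclic(subunit, 3)          # three copies, 120 degrees apart
```

Re-applying such a step after every denoising update keeps the copies related by exact rotations, so the assembly closes on itself instead of drifting.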

A fourth, increasingly important mode is partial diffusion, where you start not from pure noise but from an existing structure that has been partly noised, then denoise back. This lets you diversify a known fold — generating close cousins of a backbone that already works — instead of inventing from scratch, which is invaluable when you have a promising hit and want a family of variants around it.

In all four cases the conditioning is not a post-hoc filter; it is injected into the network at every step of the reverse process, so the structure assembles itself around the constraint rather than being generated freely and then checked. This is the single most important thing to understand about why these models are useful rather than merely impressive. A generator that only produces “some plausible protein” is a curiosity. A generator you can aim — at a specific viral epitope, a specific catalytic geometry, a specific symmetric assembly — is an engineering tool. Conditioning is what converts a party trick into a design platform, because real problems always come with constraints, and the constraints are the whole point.
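The difference between conditioning at every step and filtering afterward can be made concrete with a toy motif-scaffolding loop. In this sketch the motif coordinates are re-imposed before each update, so the (hypothetical) `denoise_step` always sees them. The helper names and the shrinking toy step are assumptions for illustration, not the model's actual update.

```python
def pin_motif(coords, motif):
    """Return a copy of coords with motif residues overwritten by fixed positions.

    `motif` maps residue index -> (x, y, z).
    """
    return [motif.get(i, xyz) for i, xyz in enumerate(coords)]


def denoise_with_motif(x, denoise_step, motif, num_steps):
    """Run the reverse process while holding the motif fixed at every step."""
    for t in range(num_steps, 0, -1):
        x = pin_motif(x, motif)   # the constraint is visible to the model at this step
        x = denoise_step(x, t)    # everything else is free to move
    return pin_motif(x, motif)


def toy_step(x, t):
    """Stand-in for a learned denoising step: pulls every point toward the origin."""
    return [tuple(0.9 * c for c in xyz) for xyz in x]


motif = {1: (5.0, 5.0, 5.0), 3: (6.0, 5.0, 5.0)}
start = [(float(i), 0.0, 0.0) for i in range(5)]
scaffold = denoise_with_motif(start, toy_step, motif, num_steps=10)
```

The pinned residues end exactly where they were specified, while the free residues move. A post-hoc filter would instead generate freely and then reject almost everything for missing the motif.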

ProteinMPNN — inverse folding from backbone to sequence

RFdiffusion gives you a shape, but a shape is not yet a protein — there are no amino acids assigned to the backbone positions. Choosing a sequence that will actually fold into a given backbone is the inverse folding problem, and it is solved by a separate network, ProteinMPNN (Dauparas et al., Science 2022).

ProteinMPNN is a message-passing graph neural network. It treats each residue position as a node and the spatial relationships between them as edges, then reasons over that geometric graph to predict, for every position, which amino acid best stabilizes the fold given its neighbors. It does this autoregressively — deciding residues in an order that lets each choice account for the ones already made — so the resulting sequence is internally consistent rather than a set of independent guesses that happen to clash. It is fast, it is remarkably good, and crucially it tends to produce sequences that are both more stable and more expressible than older physics-based methods — the proteins are more likely to actually fold and be soluble when made.

Why a separate network at all, rather than having RFdiffusion emit the sequence too? Because the two tasks have genuinely different shapes. Generating geometry is a continuous problem in 3D space; choosing amino acids is a discrete labeling problem over a graph. Splitting them lets each network use the representation that suits its job, and it makes the pipeline modular: you can swap in a better sequence designer without retraining the structure generator, run ProteinMPNN on backbones that came from any source, or impose sequence-level constraints — fix a residue, bias away from cysteines, match a desired amino-acid composition — without touching the geometry stage. Designers typically run ProteinMPNN many times per backbone, sampling a diverse set of candidate sequences at a chosen “temperature” that controls how adventurous the choices are, because some sequences will validate and express better than others and you want options. A single backbone might yield dozens of sequence candidates, only a few of which survive the validation gate. The division of labor is clean and deliberate: RFdiffusion handles geometry, ProteinMPNN handles chemistry. Each network is excellent at one job rather than mediocre at both.
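The role of the sampling temperature can be sketched with a toy autoregressive decoder. Here `toy_logits` is a hypothetical stand-in for the network's per-position scores, and the random decoding order and default temperature are assumptions of this sketch. The point is only how dividing scores by a temperature sharpens or flattens the choice at each position.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(logits, temperature):
    """Turn scores into probabilities; low temperature concentrates on the top score."""
    scaled = [value / temperature for value in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]   # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(logits_for_position, n_positions, temperature=0.1, seed=0):
    """Pick residues one at a time, each choice conditioned on those already made."""
    rng = random.Random(seed)
    order = list(range(n_positions))
    rng.shuffle(order)                               # decoding order
    chosen = {}
    for pos in order:
        logits = logits_for_position(pos, chosen)    # sees residues placed so far
        probs = softmax(logits, temperature)
        chosen[pos] = rng.choices(AMINO_ACIDS, weights=probs, k=1)[0]
    return "".join(chosen[i] for i in range(n_positions))


def toy_logits(pos, chosen):
    """Hypothetical scores: prefer L at even positions and K at odd ones."""
    preferred = "L" if pos % 2 == 0 else "K"
    return [5.0 if aa == preferred else 0.0 for aa in AMINO_ACIDS]


conservative = sample_sequence(toy_logits, 6, temperature=0.01)
adventurous = sample_sequence(toy_logits, 6, temperature=5.0, seed=1)
```

At a very low temperature the sampler almost always takes the top-scoring residue at each position; raising the temperature spreads probability across alternatives, which is how a designer obtains a diverse set of sequences for one backbone.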

From In Silico to the Bench

A designed sequence on a screen is a hypothesis, not a result. Before committing the time and cost of synthesis, designers run an in-silico gate; only the survivors go to the lab, and the lab results feed back into the next round. Figures 1 and 3 frame this loop.

The central computational filter is the self-consistency check, and it is the conceptual heart of the whole method. The idea is a closed loop. RFdiffusion designed a backbone. ProteinMPNN proposed a sequence for it. Now take only that sequence — throw away the backbone you started from — and hand it to an independent structure predictor, AlphaFold2 or ESMFold, that had no role in generating either. Ask the predictor a deceptively simple question: if you fold this sequence from scratch, what shape do you get? If the predictor independently rebuilds the very backbone you originally designed, you have strong evidence the design is real — three different models, trained for different purposes, all agree on the same structure. If the predictor folds it into something else, the design was a fantasy and you discard it. The validator never saw your intended answer, so its agreement is meaningful rather than circular.

Two numbers operationalize this. pLDDT is the predictor’s per-residue confidence, on a 0-to-100 scale; high pLDDT (commonly a threshold around 80–90 for design work) means the model is sure the structure folds as drawn rather than flopping into disordered chaos. RMSD (root-mean-square deviation) measures, in angstroms, how far the predicted fold sits from the target backbone after optimal alignment; low RMSD — typically under one to two angstroms for a confident pass — means the predicted shape matches the design closely. A design that scores high pLDDT and low RMSD is “self-consistent”: generated by one set of tools and independently confirmed by another. The logic holds precisely because the validator is independent of the designer. It is worth stressing the honest limit here, though — passing self-consistency means the design is internally coherent and probably foldable, not that it will work; it is a strong predictor of bench success, not a proof of it.
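The two-number gate can be written down as a small function. This is a sketch under assumptions: the two coordinate sets are taken to be already optimally superposed (for example by a Kabsch alignment done upstream, which is not shown), pLDDT is averaged over residues, and the default thresholds are example values picked from the ranges quoted above rather than fixed standards.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in angstroms between two equal-length, already superposed coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and the same length")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


def is_self_consistent(plddt_per_residue, designed, predicted,
                       min_plddt=85.0, max_rmsd=2.0):
    """Pass only if the predictor is confident AND its fold matches the design."""
    mean_plddt = sum(plddt_per_residue) / len(plddt_per_residue)
    return mean_plddt >= min_plddt and rmsd(designed, predicted) <= max_rmsd


designed = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
predicted = [(1.0, 0.0, 0.0), (4.8, 0.0, 0.0)]   # every atom shifted by 1 angstrom
passes = is_self_consistent([90.0, 92.0], designed, predicted)
```

Both conditions must hold: a confident prediction of the wrong fold fails on RMSD, and a loosely matching but low-confidence prediction fails on pLDDT.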

You filter aggressively. From thousands of generated backbones and tens of thousands of sequences, only those passing strict pLDDT and RMSD thresholds advance. For binders you add an interface check — does the predictor place the binder against the target with the intended geometry, and not merely fold into a nice shape that ignores the target entirely? A monomer can be beautifully self-consistent and still completely fail to bind, so binder campaigns score the predicted complex, not just the designed protein in isolation. Additional cheap filters pile on: radius of gyration to reject overly extended or floppy shapes, secondary-structure content to favor well-packed folds, and sometimes a quick check that the sequence does not contain obvious expression liabilities. This in-silico funnel is cheap relative to the bench, so designers are deliberately ruthless: it is far better to discard a hundred promising-looking designs than to spend weeks and reagents expressing a single protein that was never going to fold. The funnel typically narrows by orders of magnitude — generate ten thousand, order a few hundred, characterize a few dozen, publish a handful.
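Of the cheap filters mentioned, radius of gyration is the simplest to compute. The sketch below treats every residue as an equal point mass, which is a simplifying assumption, and the cutoff passed to `is_compact` is a hypothetical value that a team would calibrate for its own protein lengths.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of points from their centroid (equal weights)."""
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    mean_sq = sum(
        sum((p[i] - centroid[i]) ** 2 for i in range(3)) for p in coords
    ) / n
    return math.sqrt(mean_sq)


def is_compact(coords, max_rg):
    """Reject overly extended shapes; `max_rg` is a campaign-specific cutoff."""
    return radius_of_gyration(coords) <= max_rg


extended = [(3.8 * i, 0.0, 0.0) for i in range(20)]   # a fully stretched toy chain
keep_design = is_compact(extended, max_rg=12.0)
```

A fully stretched chain has a much larger radius of gyration than a packed fold of the same length, so even a loose cutoff removes designs that never collapsed into a globule.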

What survives goes to the wet lab, and this is where humility re-enters. Each designed amino-acid sequence is reverse-translated into a gene, synthesized, and cloned into an expression system (often E. coli for speed and cost), and the protein is produced and purified, usually via an affinity tag. The first thing you learn is brutal and binary: does it express as soluble protein at all, or does it dump into insoluble inclusion bodies? A meaningful fraction of designs fail right here, before any function is even testable, which is precisely the designability-versus-expressibility gap made concrete.
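Reverse translation itself is a table lookup. The sketch below maps each amino acid to one valid codon from the standard genetic code and appends a stop codon. The particular codon picked per residue is an arbitrary choice for illustration; real workflows typically use host-specific codon optimization, which is not modeled here.

```python
# One valid codon per amino acid (standard genetic code); choices are illustrative.
CODON_FOR = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP_CODON = "TAA"


def reverse_translate(protein: str) -> str:
    """Return a DNA coding sequence for `protein`, ending in a stop codon."""
    try:
        codons = [CODON_FOR[residue] for residue in protein.upper()]
    except KeyError as missing:
        raise ValueError(f"not a standard amino acid: {missing}") from None
    return "".join(codons) + STOP_CODON


gene = reverse_translate("MKW")
print(gene)  # ATGAAATGGTAA
```

Because most amino acids have several codons, many different genes encode the same protein; the table above fixes one option per residue to keep the example short.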

For the survivors, the assays depend on the goal. Size-exclusion chromatography reveals whether the protein is a clean monomer of the right size or a smear of aggregates. Circular dichroism reports whether the secondary-structure content matches the design and whether the fold is thermally stable. For binders, biolayer interferometry or surface plasmon resonance measure affinity directly — does the designed protein actually grab its target, and how tightly? Enzymes get activity assays measuring turnover. And the gold standard, when a design matters enough, is an experimental structure by X-ray crystallography or cryo-EM, laid over the computational design to see how closely the real atoms match the intended ones. When that overlay is tight, you have proof: a protein that existed only as noise-derived coordinates a few weeks earlier now sits, folded as designed, in a real structure. Every result — clean success, partial success, or outright failure — informs the next round, which is the build-test-learn loop in Figure 3.

Figure 3. The design-build-test loop. Computational design feeds into gene synthesis and expression, then into binding, folding, and function assays. Designs that meet the target become hits for further characterization; those that fail update the filters and conditioning for the next design round, so each campaign learns from its own bench data. Long description: a cycle from design to build to test to a decision gate; hits advance to characterization and a lead candidate, failures loop back to refine the design step.

The applications are concrete and increasingly real. De novo binders, proteins built to grip a specific target with high affinity, can act as research reagents, diagnostics, or therapeutic leads. They are an intriguing alternative to antibodies: smaller, often more thermostable, easier to manufacture, and designable against targets where raising a good antibody is hard. Enzymes designed around a scaffolded active site can in principle catalyze reactions for which no natural enzyme exists, a frontier for green chemistry, drug synthesis, and plastic degradation, although getting high catalytic rates remains one of the field’s hardest open problems (see the trade-offs section below). Vaccine immunogens use motif scaffolding to display a viral epitope on a stable, symmetric particle that presents the immune system with a clean, focused target, training a more precise antibody response than a whole, messy pathogen would. Beyond these headline cases, designers build symmetric nanomaterials and self-assembling cages for delivery and biosensing, and mini-protein scaffolds that serve as stable starting points for further engineering. The common thread is that all of them begin life as conditioned noise and graduate to the bench through the same funnel. The table below maps common design goals to the method and conditioning used.

Design task	Primary method	Conditioning used
De novo binder to a target	RFdiffusion + ProteinMPNN	Fixed target, interface hotspots
Enzyme / catalytic site	RFdiffusion + ProteinMPNN	Active-site motif scaffolding
Vaccine immunogen	RFdiffusion + ProteinMPNN	Epitope motif + symmetry
Symmetric nanoparticle	RFdiffusion + ProteinMPNN	Symmetry group constraint
Stable mini-protein scaffold	RFdiffusion + ProteinMPNN	Unconditional or length only

Trade-offs, Gotchas, and What Goes Wrong

The pipeline is powerful but not magic, and overstating it does the field a disservice. The first reality is the designability-versus-expressibility gap: a backbone can look beautiful and pass self-consistency yet still fail to express, fold, or stay soluble in a real cell. In-silico confidence is a strong filter, not a guarantee, which is exactly why the wet lab remains non-negotiable.

Second, experimental success rates on hard targets are modest. For well-behaved targets, success can be encouraging; for difficult binders — small molecules, flexible epitopes, tough interfaces — the fraction of designs that bind as intended can be low, and teams routinely screen dozens to hundreds of designs to get a handful of hits. Published headline numbers usually reflect favorable cases; campaign-level rates are lower.
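The need to screen many designs follows from simple probability. Assuming, purely for illustration, that each design is an independent trial with the same hit rate, the chance of at least one hit among n designs is 1 - (1 - p)^n. The 1% hit rate used below is a hypothetical number, not a measured one.

```python
import math


def prob_at_least_one_hit(hit_rate: float, n_designs: int) -> float:
    """Chance that at least one of n independent designs works."""
    return 1.0 - (1.0 - hit_rate) ** n_designs


def designs_needed(hit_rate: float, confidence: float) -> int:
    """Smallest n for which the chance of at least one hit reaches `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hit_rate))


one_plate = prob_at_least_one_hit(0.01, 96)   # a single 96-well plate at a 1% hit rate
n_for_95 = designs_needed(0.01, 0.95)
print(n_for_95)  # 299
```

At a 1% hit rate a single 96-design plate is far from a sure thing, and roughly three hundred designs are needed for a 95% chance of even one hit, which is why campaigns on hard targets are sized in the hundreds.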

Third, hallucination and over-optimistic prediction. A structure predictor can confidently report a fold that does not materialize at the bench, because the same biases that helped generation can fool validation: if the generator and validator share blind spots, they can agree on something that is wrong. Using a genuinely independent predictor mitigates this, which is why checking a design from the RoseTTAFold-derived RFdiffusion with AlphaFold2 or ESMFold is more convincing than validating it with the same family of model that generated it. But no in-silico agreement, however strong, is a substitute for the bench; it only changes the odds.

Fourth, structure is not function. RFdiffusion produces a static backbone, but real proteins move: catalysis, allostery, and signaling depend on dynamics that a single snapshot does not capture. A geometrically perfect active site may still be a poor catalyst if the protein’s motions are wrong, if it cannot release product, or if it lacks the precise electrostatics that make enzymes fast. This is exactly why designing high-activity enzymes remains harder than designing binders: a binder mostly needs a good static interface, while an enzyme needs choreography.

Fifth, compute and throughput. Generating, sequence-designing, and validating thousands of candidates is GPU-intensive, and the genuine bottleneck has shifted downstream, to synthesis, expression, and assay capacity. The models can now propose designs far faster than any lab can test them, which makes the in-silico filter not a convenience but a survival necessity.

Practical Recommendations

If a team set out to run a real design campaign today, the workflow would look less like a single clever prompt and more like an engineering funnel with a hard bench gate at the end. Start by defining the goal in structural terms: not “I want a binder” but “I want a protein that contacts these residues on this target with this geometry.” The sharper the structural specification, the better the conditioning works, because RFdiffusion can only aim at a target you have actually described. For binders this means identifying the precise epitope and, ideally, the hotspot residues you want contacted; for enzymes it means knowing the catalytic geometry you need scaffolded; for immunogens it means having the epitope structure in hand. Over-generate at every stage — hundreds to thousands of backbones, many ProteinMPNN sequences per backbone — because the cost of generation is trivial next to the cost of a wasted lab slot, and the funnel will discard most of what you make. Filter ruthlessly on independent self-consistency, and never let a design reach synthesis without it; the few minutes of GPU time a validation run costs is the cheapest insurance in the entire pipeline. Above all, plan the wet lab first: your throughput for expression and assays, not your GPU budget, sets how many designs you can truly test, so design the campaign backward from how many proteins you can realistically characterize. And treat the first round as calibration — its real product is not just hits but the failure patterns that tell you how to set thresholds and conditioning for round two.

Campaign checklist:

Specify the goal as explicit backbone geometry, motif, or interface — not a vague function.
Generate many backbones per target; do not bet on a single shape.
Sample multiple ProteinMPNN sequences per backbone and keep the diverse set.
Validate with an independent predictor (AlphaFold2 / ESMFold); gate on pLDDT and RMSD.
For binders, add an explicit interface / docking check, not just monomer confidence.
Rank and select a manageable shortlist sized to your actual assay throughput.
Express, purify, and run folding plus function assays; confirm hits structurally.
Feed every bench result — wins and failures — back into the next design round.
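Designing the campaign backward from bench capacity is a short calculation. The sketch below assumes a single overall in-silico pass rate and a fixed number of sequences per backbone; both inputs, and the example numbers, are hypothetical and would come from a team's own calibration round.

```python
import math


def backbones_to_generate(assay_slots: int, seqs_per_backbone: int,
                          in_silico_pass_rate: float) -> int:
    """Backbones needed so that about `assay_slots` sequences survive filtering."""
    sequences_needed = assay_slots / in_silico_pass_rate
    return math.ceil(sequences_needed / seqs_per_backbone)


# Example: 96 assay slots, 8 sequences sampled per backbone,
# and an assumed 2% of sequences passing the pLDDT and RMSD gate.
n_backbones = backbones_to_generate(96, 8, 0.02)
print(n_backbones)
```

Starting from the number of proteins the lab can characterize keeps generation and filtering sized to the real bottleneck rather than to the GPU budget.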

Frequently Asked Questions

What is RFdiffusion?
RFdiffusion is a generative diffusion model for protein structure, developed by the Baker lab. It builds a new protein backbone by starting from random 3D noise and iteratively denoising it into a coherent, foldable shape, optionally steered toward a target, motif, or symmetry. It generates geometry only; a separate network assigns the amino-acid sequence.

How is protein design different from AlphaFold?
AlphaFold solves prediction — given a sequence, predict its folded structure. Design solves the inverse — given a desired shape or function, produce a new protein that achieves it. RFdiffusion does design; AlphaFold is then reused as an independent validator to check whether a designed sequence folds back to the intended backbone.

What does ProteinMPNN do?
ProteinMPNN solves inverse folding: given a backbone with no amino acids assigned, it chooses a sequence likely to fold into that exact shape. It is a graph neural network that reasons over the spatial relationships between residues and tends to produce sequences that are stable and express well in the lab.

Can AI-designed proteins be made in the lab?
Yes. Designs that pass in-silico validation are encoded as DNA, synthesized, and expressed in systems like E. coli, then purified and tested with folding and function assays. Many designs still fail at the bench, so teams typically test multiple candidates per target, but real de novo proteins are routinely produced and validated.

What are de novo binders used for?
De novo binders are newly designed proteins built to grip a chosen target tightly. They serve as research reagents, diagnostic capture molecules, and therapeutic leads — for example, proteins designed to neutralize a virus or block a receptor — offering an alternative to antibodies that can be smaller and more stable.

Do designed proteins always work?
No. A design can pass every computational filter and still fail to express, fold, or function, because static structure does not capture solubility or molecular dynamics. Success rates on hard targets are modest, which is why aggressive in-silico filtering is paired with mandatory wet-lab testing and an iterative learn loop.

Why use a diffusion model instead of just predicting and tweaking natural proteins?
Tweaking natural proteins limits you to variations on shapes evolution already explored, which is fine when a natural starting point exists but useless when it does not — there is no natural binder for an arbitrary new target, and no natural enzyme for a reaction life never needed. A diffusion model generates genuinely novel backbones, including folds that do not appear in nature, so it can produce a custom protein for a target or function that has no evolutionary precedent. That open-ended generation, steered by conditioning, is exactly what de novo design requires.

What hardware and software does a design campaign need?
The generative steps run on GPUs — RFdiffusion, ProteinMPNN, and the AlphaFold2 or ESMFold validation are all neural networks, and a serious campaign generating thousands of candidates wants substantial GPU time. The tooling is largely open source and increasingly packaged into accessible pipelines, so the harder constraint is usually not the compute but the downstream wet lab: gene synthesis, protein expression, and the assay throughput needed to test what the models propose.
