RFdiffusion 2: How AI Now Designs Functional Proteins (2026)

In late 2022, designing a protein that would bind a chosen target with nanomolar affinity took years of directed evolution, library screening, and lucky breaks. By 2024, a graduate student with a GPU, the RFdiffusion code, and a $5,000 gene-synthesis budget could finish the computational design of a candidate binder in a weekend and have it tested at the bench within weeks. RFdiffusion 2 (sometimes called RFdiffusion-AllAtom or RF2-AA in the community) is the version that crossed the line from “design a backbone shape” to “design a backbone, a sequence, and a binding pocket shaped around the small molecule that sits in it, all in one pass.” This piece unpacks what changed, how the pipeline actually runs, the success rates being reported on real targets, and where the model still falls down.

If you only read one paragraph: RFdiffusion 2 is a diffusion model whose denoiser is a RoseTTAFold-AllAtom network, fine-tuned to invert a noising process over every heavy atom in a protein-ligand complex. Paired with ProteinMPNN for sequence assignment and AlphaFold 2/3 for in silico validation, it has converted de novo binder design from a research curiosity into a routine engineering workflow at a few dozen labs. The remaining open problems — function beyond binding, conformational dynamics, allostery, true negative-design specificity — are real, but the ones the model has solved are no longer in dispute.

Context: protein design before, during, and after AI

Before 2020, de novo protein design was largely a Rosetta story. You sampled backbones from idealized fragments, threaded sequences with energy-function-driven Monte Carlo, and validated by expression. Success on a hard target was a multi-year PhD thesis. The hit rates on novel binder campaigns sat in the 0.01 to 0.1 percent range — you’d order a thousand designs and hope one bound at micromolar.

Two changes broke that ceiling. First, AlphaFold 2 (Jumper et al., 2021) gave the field a reliable structure predictor — suddenly you could screen a million designed sequences in silico before ordering any DNA. Second, diffusion models matured in image generation, and the Baker Lab realized the same denoising machinery could generate protein backbones if you replaced pixels with 3D atomic coordinates and used a structure-aware backbone (RoseTTAFold) as the score network.

RFdiffusion v1, published by Watson et al. in Nature in July 2023, demonstrated this concretely. On binder design campaigns against influenza hemagglutinin, IL-7Rα, PD-L1, and a handful of other targets, RFdiffusion paired with ProteinMPNN (Dauparas et al., 2022) and AlphaFold 2 filtering produced experimentally validated binders at hit rates of 1 to 20 percent, two to three orders of magnitude better than prior methods. It was, by any honest reading, one of the largest single-paper jumps in the field’s history.

But v1 had a hard ceiling: it operated only on protein backbones. It could not see ligands, cofactors, metals, or nucleic acids. Enzyme design, arguably the highest-value application, required hand-grafting active sites after the fact, with predictably poor success. Membrane proteins and GPCRs, where the binding pocket is defined by both protein and lipid context, were nearly out of reach. A successor had to handle individual atoms, not just per-residue backbone frames. That successor, RFdiffusion-AllAtom (Krishna et al., 2024) together with the family of models the community now collectively calls “RFdiffusion 2,” is what made the difference.

How RFdiffusion 2 works

At a high level, RFdiffusion 2 is a denoising diffusion probabilistic model whose denoiser is a fine-tuned RoseTTAFold-AllAtom network. You start from random 3D coordinates — the protein equivalent of static noise — and iteratively denoise toward a structure that satisfies whatever constraints you specified.

Figure 1 — RFdiffusion 2 architecture. The denoiser is an SE(3)-equivariant RoseTTAFold-AllAtom backbone. Conditioning inputs (hotspots, symmetry, motif scaffolds, ligand pockets) steer denoising at every step.

Diffusion over backbone coordinates

The forward process is straightforward: take a real protein structure from the PDB, add increasing amounts of Gaussian noise to its Cα coordinates over T = 200 steps, and at step T you have something indistinguishable from random. Training the model means teaching a neural network to invert one step at a time — given x_t, predict x_{t-1}. The clever bit is the parameterization. RFdiffusion does not predict the noise residual directly; it predicts the clean structure x_0 at every step, then mixes that prediction with the current noisy state. This makes long-range geometric coherence easier to learn — at any point during sampling, the model has an explicit guess about what the final fold looks like, and that guess steers the rest of denoising.
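To make the x_0 parameterization concrete, here is a minimal NumPy sketch of the forward noising step and a single reverse step that mixes a predicted clean structure with the current noisy state. It uses the standard DDPM posterior mean on plain Euclidean coordinates. The linear beta schedule, its endpoints, and every function name are assumptions made for illustration, and the "network" is an oracle that simply returns the true structure. It is not the RFdiffusion implementation.

```python
import numpy as np

def make_schedule(T=200, beta_start=0.01, beta_end=0.07):
    """Linear beta schedule. The endpoints are illustrative assumptions."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def forward_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0): shrink the clean coords, add Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def reverse_step(x_t, x0_hat, t, betas, alphas, alpha_bars, rng):
    """One reverse step driven by the predicted clean structure x0_hat.

    The mean of q(x_{t-1} | x_t, x_0) is a weighted mix of the x_0
    prediction and the current noisy state x_t.
    """
    if t == 0:
        return x0_hat
    ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
    coef_x0 = np.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
    coef_xt = np.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
    mean = coef_x0 * x0_hat + coef_xt * x_t
    var = betas[t] * (1.0 - ab_prev) / (1.0 - ab_t)
    return mean + np.sqrt(var) * rng.standard_normal(x_t.shape)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((60, 3)) * 10.0      # toy 60-residue trace, not a real protein
betas, alphas, alpha_bars = make_schedule()

x_noisy = forward_noise(x0, len(betas) - 1, alpha_bars, rng)  # almost pure noise

x = rng.standard_normal(x0.shape)             # sampling starts from noise
for t in reversed(range(len(betas))):
    x0_hat = x0                               # oracle stand-in for the denoiser
    x = reverse_step(x, x0_hat, t, betas, alphas, alpha_bars, rng)
```

With a perfect oracle the loop lands exactly on x0. A trained denoiser replaces the oracle line, and its evolving guess of the final fold is what steers each step.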

The denoiser is SE(3)-equivariant, which is the formal way of saying that rotating or translating the input rotates or translates the output the same way. This is a hard constraint to bake into a neural network and it matters enormously for proteins, because the absolute coordinate frame is meaningless — only relative geometry counts. RoseTTAFold’s three-track architecture (1D sequence, 2D pair, 3D coordinates) already had equivariant frame updates, and RFdiffusion inherits them.
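Equivariance is easy to test numerically. The sketch below defines a toy update whose weights depend only on pairwise distances, then checks that transforming the input and applying the update gives the same result as applying the update and transforming the output. The update rule is a made-up stand-in, not RoseTTAFold's frame update, and all names are hypothetical.

```python
import numpy as np

def toy_equivariant_update(x):
    """Shift each point along distance-weighted directions to the other points.

    Weights depend only on pairwise distances (invariant), and the shift is a
    linear combination of relative vectors, so the map is SE(3)-equivariant.
    """
    diff = x[None, :, :] - x[:, None, :]      # diff[i, j] = x_j - x_i
    dist = np.linalg.norm(diff, axis=-1)      # (N, N), unchanged by rigid motion
    weights = np.exp(-dist)                   # any function of distance would do
    np.fill_diagonal(weights, 0.0)
    return x + 0.1 * (weights[:, :, None] * diff).sum(axis=1)

def random_rotation(rng):
    """Random proper rotation from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))               # fix column signs
    if np.linalg.det(q) < 0:                  # force det = +1 (no reflection)
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
coords = rng.standard_normal((20, 3)) * 5.0
rot = random_rotation(rng)
shift = rng.standard_normal(3) * 10.0

transform_then_update = toy_equivariant_update(coords @ rot.T + shift)
update_then_transform = toy_equivariant_update(coords) @ rot.T + shift
is_equivariant = np.allclose(transform_then_update, update_then_transform)
```

The same check, run against a real denoiser, is a cheap regression test for any change that touches coordinate handling.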

RoseTTAFold structure prior

The denoiser is not a freshly initialized network. It is a fine-tuned RoseTTAFold — the structure predictor Baker Lab published in 2021 as a contemporaneous and complementary alternative to AlphaFold 2. RoseTTAFold’s prior on what a real protein looks like — secondary-structure propensities, packing densities, hydrogen-bond geometries, disulfide patterns — comes essentially for free. Diffusion training only has to learn the trajectory from noise to structure, not the structure manifold itself.

This is also why RFdiffusion-style models generalize so well to constraints they weren’t explicitly trained on. The prior is strong enough that even when you give the model an unusual symmetry, an exotic hotspot pattern, or a topology nobody has built before, the denoiser tends to stay within the realm of plausible folds.

The training corpus matters too. RFdiffusion was fine-tuned on the same Protein Data Bank snapshot RoseTTAFold trained on — roughly 200,000 experimentally determined structures, filtered for resolution and sequence identity. That sounds like a lot, but the protein universe is vastly larger; the PDB is heavily biased toward soluble, well-behaved, crystallizable proteins, and that bias propagates into anything the model generates. A practical consequence is that designs tend to look like idealized cousins of natural folds: clean secondary structure, well-packed hydrophobic cores, hydrogen bonds in textbook geometries. That is usually a feature — designs that look “too natural” tend to express and fold reliably — but it is also why the model struggles with intrinsically disordered regions, very long loops, and unusual topologies.

All-atom support — the v2 leap

In v1, the network’s output was four atoms per residue (N, Cα, C, O). That is enough to define the backbone, but residue identities had to be assigned afterward by ProteinMPNN and the side chains built from that sequence. That worked when the design problem was “make a backbone that fits this target shape.” It broke when the problem became “make a backbone with a pocket that holds this specific small molecule in this specific orientation.”

RFdiffusion 2 (the AllAtom variant from Krishna et al., 2024, and its successors) denoises all heavy atoms simultaneously — backbone, side chains, ligand atoms, metal ions, nucleic acid atoms — within the same diffusion process. Conditioning on a ligand SMILES string and a desired pose gives you a backbone whose side chains are already positioned to make the right hydrogen bonds, salt bridges, and π-stacking contacts to that ligand. The output is a fully realized enzyme active site, not a backbone you then hope is graftable.

A naming note worth flagging: the community uses “RFdiffusion 2,” “RFdiffusion-AllAtom,” and “RF2-AA” somewhat interchangeably, and the Baker Lab’s own preprints sometimes treat them as distinct models with shared lineage. For the purposes of this article, “RFdiffusion 2” refers to the AllAtom family of models released after mid-2024, including the open-source RFdiffusionAA codebase and the closed-source variants used in Baker Lab’s enzyme-design papers.

The implementation detail that makes all-atom denoising tractable is the atom-frame representation. Rather than treating every atom as an independent point, the model groups atoms into rigid frames (backbone N-Cα-C, side chain χ angles, ligand rotatable bonds, metal coordination geometry) and predicts the frame parameters at each diffusion step. This drops the effective dimensionality by an order of magnitude relative to free-atom diffusion, keeps local geometric constraints (chirality, bond lengths) automatically satisfied, and lets the network share parameters across chemically similar substructures. The cost is a more elaborate noising schedule: rotations of rigid bodies live on SO(3) and need a specific noise distribution (the isotropic Gaussian on SO(3) used in FrameDiff and adopted here), not the standard Gaussian you would use on Euclidean coordinates.
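As an illustration of what a rigid frame buys you, the sketch below builds a backbone frame from N, Cα, and C by Gram-Schmidt orthogonalization and shows that an atom's coordinates in that frame do not change when the whole residue is rotated and translated. The axis convention, the function names, and the approximate idealized alanine coordinates are assumptions for the example. The model's own frame definitions may differ.

```python
import numpy as np

def backbone_frame(n, ca, c):
    """Return (R, t): R has the local axes as columns, t is the CA position."""
    e1 = (c - ca) / np.linalg.norm(c - ca)          # x axis along CA -> C
    u2 = (n - ca) - e1 * np.dot(e1, n - ca)         # remove the e1 component
    e2 = u2 / np.linalg.norm(u2)                    # y axis in the N-CA-C plane
    e3 = np.cross(e1, e2)                           # z axis completes the frame
    return np.stack([e1, e2, e3], axis=1), ca

def to_local(rot, origin, x):
    """Express a global point in the local frame: R^T (x - t)."""
    return (x - origin) @ rot

# Approximate idealized alanine coordinates in angstroms (illustrative values).
n = np.array([-0.525, 1.363, 0.000])
ca = np.array([0.000, 0.000, 0.000])
c = np.array([1.526, 0.000, 0.000])
cb = np.array([-0.529, -0.774, -1.205])

rot, origin = backbone_frame(n, ca, c)
cb_local = to_local(rot, origin, cb)

# Apply an arbitrary rigid motion to the whole residue.
a, b = np.deg2rad(40.0), np.deg2rad(-65.0)
rz = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
rx = np.array([[1.0, 0.0, 0.0], [0.0, np.cos(b), -np.sin(b)], [0.0, np.sin(b), np.cos(b)]])
g = rz @ rx
shift = np.array([12.0, -3.5, 7.25])

def move(p):
    return g @ p + shift

rot_moved, origin_moved = backbone_frame(move(n), move(ca), move(c))
cb_local_moved = to_local(rot_moved, origin_moved, move(cb))
```

Because the Cβ position is fixed in the local frame, chirality and bond lengths come along for free whenever the frame moves. Only the rotation and translation need to be noised and denoised.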

The design pipeline

A single RFdiffusion 2 forward pass produces one structure. A real campaign produces hundreds of thousands of structures, filters them through several models, and ends with a few dozen synthesized genes. The pipeline matters as much as the architecture.

Figure 2 — End-to-end design pipeline. RFdiffusion 2 generates backbones, ProteinMPNN assigns sequences, AlphaFold validates, and only the top fraction reach the wet lab.

Step 1 — RFdiffusion 2 generation. You specify the target structure (usually a PDB of the protein you want to bind), one or more hotspot residues on that target (the residues your designed binder should contact), a desired binder length (typically 60 to 100 residues for de novo binders), and any additional constraints — symmetry, motif scaffolds, ligand SMILES, secondary-structure bias. The model runs T = 50 to 200 denoising steps. A typical campaign generates 10,000 to 100,000 backbones in this step. On an H100, you can generate roughly 1 to 5 backbones per second depending on size and step count.

Step 2 — ProteinMPNN sequence assignment. RFdiffusion 2 emits coordinates, but the sequences it produces are not optimal — the AllAtom variant gives reasonable side chains, but ProteinMPNN, a graph neural network trained specifically on the inverse-folding problem (given a backbone, what sequence folds to it?), produces more thermostable and expressible sequences. You typically sample 2 to 8 sequences per backbone, multiplying the candidate count.

Step 3 — AlphaFold validation. This is the filter that actually carries the campaign. For every (backbone, sequence) pair, predict the structure with AlphaFold 2 or AlphaFold 3, compute three metrics: pLDDT (per-residue confidence, want above 80), pAE at the interface (want low, indicating high confidence in the relative pose of binder and target), and ipTM (interface predicted TM-score, want above 0.8). The standard cutoff for ordering DNA is “predicted structure within 2 Å RMSD of the designed backbone AND ipTM > 0.8 AND interface pAE < 10.” This pipeline typically passes 0.1 to 1 percent of candidates.
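The ordering cutoff described above is simple enough to write down as a predicate. This is a minimal sketch: the record type, field names, and the example values are hypothetical, and the thresholds are the ones quoted in the text. How each metric is extracted from a predictor's output is deliberately left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Prediction:
    design_id: str
    plddt: float            # mean pLDDT, 0 to 100
    iptm: float             # interface predicted TM-score, 0 to 1
    interface_pae: float    # mean interface pAE, angstroms
    rmsd_to_design: float   # predicted vs designed backbone RMSD, angstroms

def passes_filter(p, min_plddt=80.0, min_iptm=0.8, max_ipae=10.0, max_rmsd=2.0):
    """True if a prediction clears every cutoff quoted in the text."""
    return (
        p.plddt > min_plddt
        and p.iptm > min_iptm
        and p.interface_pae < max_ipae
        and p.rmsd_to_design <= max_rmsd
    )

# Made-up predictions, for illustration only.
predictions = [
    Prediction("d001", plddt=91.2, iptm=0.86, interface_pae=6.1, rmsd_to_design=1.1),
    Prediction("d002", plddt=88.0, iptm=0.74, interface_pae=7.9, rmsd_to_design=1.4),
    Prediction("d003", plddt=93.5, iptm=0.89, interface_pae=5.2, rmsd_to_design=3.8),
    Prediction("d004", plddt=76.0, iptm=0.83, interface_pae=9.0, rmsd_to_design=1.9),
]

to_order = [p.design_id for p in predictions if passes_filter(p)]
```

Keeping the thresholds as keyword arguments makes it easy to tighten them per campaign, for example raising the ipTM floor when too many designs pass.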

Step 4 — wet lab. Order 10 to 100 genes, express in E. coli or yeast, purify, and assay binding by bio-layer interferometry (BLI), surface plasmon resonance (SPR), or — for serious campaigns — structure determination by cryo-EM. Reported binder hit rates after this funnel sit between 1 and 30 percent depending on target difficulty.

The throughput math is brutal but doable. A two-person team running RFdiffusion 2 against a moderately hard target can go from “we picked the target on Monday” to “we have validated binders in hand” in 6 to 10 weeks, at a reagent cost of $20,000 to $80,000. That used to be a 3-year PhD project costing ten times as much.
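The funnel arithmetic is worth writing out once, because every stage multiplies. The sketch below plugs in values chosen from inside the ranges quoted for each step. The function name and the particular inputs are illustrative assumptions, not figures from any one published campaign.

```python
def funnel(n_backbones, seqs_per_backbone, pass_rate, n_ordered, hit_rate):
    """Expected counts at each stage of the design funnel."""
    candidates = n_backbones * seqs_per_backbone      # after ProteinMPNN
    passing = candidates * pass_rate                  # after AlphaFold filtering
    ordered = min(n_ordered, passing)                 # cannot order more than pass
    return {
        "candidates": candidates,
        "passing": passing,
        "ordered": ordered,
        "expected_hits": ordered * hit_rate,
    }

# Mid-range inputs picked from the ranges given in Steps 1 to 4.
stages = funnel(
    n_backbones=50_000,
    seqs_per_backbone=4,
    pass_rate=0.003,      # 0.3 percent clear the in silico cutoffs
    n_ordered=96,
    hit_rate=0.05,        # 5 percent of ordered genes bind
)

for name, value in stages.items():
    print(f"{name:>14}: {value:,.1f}")
```

The point of the exercise: a few hundred thousand candidates shrink to a single plate of genes and a handful of expected binders, so small changes in the filter pass rate or the bench hit rate dominate the outcome.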

A few practical gotchas the published papers under-emphasize.

Hotspot selection dominates outcome. If you pick the wrong target residues to bias the binder toward, the campaign is dead before it begins; most failed campaigns we have seen are hotspot-selection failures, not model failures. Hotspots should ideally be solvent-exposed, structurally rigid (low B-factor in the reference structure), and biologically meaningful for the function you want to disrupt or activate.

Designed-vs-predicted RMSD is necessary but not sufficient. A design can pass the standard 2 Å RMSD cutoff and still fail at the bench because AlphaFold’s confidence does not always reflect interface chemistry: it scores geometric plausibility, not energetic favorability. Many teams add a Rosetta interface-energy filter (ddG_cross) as a fourth metric to catch designs that look right but have poor shape complementarity or buried-polar penalties.

Expression and solubility are still real bottlenecks. Even sequences with perfect computational metrics fail roughly 20 to 40 percent of the time at the expression step, usually due to misfolding or aggregation in E. coli. Some groups now route designs through a small-batch yeast expression pre-screen before committing to large-scale purification.

Real-world results

Headline success rates only mean something when broken down by target class. RFdiffusion 2 has been benchmarked across roughly four families of design problems, with very different outcomes in each.

Figure 3 — Reported binder success rates across target classes. Easy targets see double-digit hit rates; GPCRs and enzyme active sites remain harder, but AllAtom narrows the gap.

Easy targets — flat hydrophobic surfaces, viral spike RBDs. Cao et al. (2022) on SARS-CoV-2 spike reported hit rates approaching 25 percent for some binders, and follow-on work with RFdiffusion has stayed in the 5 to 20 percent range. These targets reward AI design because the binding surface is large, well-defined, and tolerant of slightly imperfect docking.

Medium targets — cytokine receptors, soluble immunoreceptors. Bennett et al. (2023) and follow-ups have reported 2 to 15 percent hit rates against IL-7Rα, PD-L1, the insulin receptor ectodomain, and a battery of cytokine receptors. These targets require precise hotspot placement but have well-resolved structures and are accessible to standard expression and purification.

Hard targets — GPCRs, ion channels, membrane proteins. Without AllAtom, RFdiffusion v1 hit rates here were typically under 1 percent. GLP-1R, CXCR4, and similar receptors have small, highly polar binding pockets where backbone-only design simply does not capture enough of the relevant chemistry. With RFdiffusion 2’s all-atom denoising, recent reports place hit rates at 3 to 8 percent — still harder than soluble targets, but no longer hopeless.

Enzyme active sites. This is where RFdiffusion 2 unlocks a category v1 could not address. Designed serine hydrolases, retro-aldolases, and Kemp eliminases reported in 2025-2026 preprints achieve k_cat values of 1 to 100 per second, orders of magnitude better than the earliest Rosetta-designed enzymes of the late 2000s, though still typically below natural enzymes’ k_cat values.

A consistent finding across all four classes: AlphaFold filtering does the work. Designs that pass strict ipTM and interface pAE cutoffs experimentally validate at roughly 30x the rate of designs that fail those cutoffs. The diffusion model generates a hypothesis space; AlphaFold tells you which hypotheses are coherent.

Two caveats on the headline numbers are worth stating in plain English. First, “hit rate” definitions vary across papers — some count any detectable binding (often K_D in the high-micromolar range), others count only sub-100-nM binders, and a few count only binders that also pass an orthogonal cell-based functional assay. When comparing campaigns, always check what the denominator and the cutoff are. Second, the easy and medium target classes are partly easy because their reference structures are excellent — high-resolution X-ray or cryo-EM with well-resolved side chains in the region of interest. Targets where the only available structure is a homology model or a low-resolution cryo-EM map underperform consistently, regardless of which generative model is used. The unspoken prerequisite for an RFdiffusion 2 campaign is a target structure good enough to define a real hotspot.
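The first caveat is easy to see with numbers. The sketch below computes a "hit rate" from one set of measurements under three different definitions. The K_D values, the counts, and the function name are invented for illustration, and `None` marks a design with no detectable binding.

```python
import math

def hit_rate(kd_values_nm, cutoff_nm, denominator):
    """Fraction of `denominator` designs with a measured K_D at or below the cutoff."""
    hits = sum(1 for kd in kd_values_nm if kd is not None and kd <= cutoff_nm)
    return hits / denominator

# Invented example: 96 genes ordered, 21 expressed solubly, 7 showed any binding.
n_ordered = 96
measured_kd_nm = [4, 35, 60, 90, 800, 5_000, 40_000] + [None] * 14
n_expressed = len(measured_kd_nm)

any_binding_per_ordered = hit_rate(measured_kd_nm, math.inf, n_ordered)
sub_100nm_per_ordered = hit_rate(measured_kd_nm, 100, n_ordered)
sub_100nm_per_expressed = hit_rate(measured_kd_nm, 100, n_expressed)

print(f"any detectable binding / ordered: {any_binding_per_ordered:.1%}")
print(f"K_D <= 100 nM / ordered:          {sub_100nm_per_ordered:.1%}")
print(f"K_D <= 100 nM / expressed:        {sub_100nm_per_expressed:.1%}")
```

The same plate of designs supports three quite different headline numbers, which is why the cutoff and the denominator have to be read off before two papers are compared.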

A worked example: designing a binder against a kinase

To make this concrete, consider a hypothetical (but representative) campaign: design a 90-residue binder that recognizes the kinase domain of EGFR in its active conformation, with the goal of blocking dimerization. The team starts with the 1M17 PDB structure as the target and picks four hotspot residues on the dimer interface — leucines and phenylalanines that form the activation arm contact. They condition RFdiffusion 2 on the target plus hotspots, set length = 90, request 50 denoising steps, and generate 50,000 backbones overnight on a four-H100 node. ProteinMPNN samples four sequences per backbone, giving 200,000 candidates. AlphaFold 3 predicts each (binder, target) complex in a chunked batch over 36 GPU-hours; the team filters on ipTM > 0.82, interface pAE < 8, and a Rosetta interface-energy cutoff. About 600 designs pass.

A diversity filter (cluster by Cα RMSD, keep one representative per cluster) cuts that to 96 unique designs, sized to fit a single 96-well synthesis order. Genes arrive in 12 days from Twist; the team expresses in E. coli BL21(DE3) with a SUMO tag, purifies by IMAC and size-exclusion, and screens binding by BLI on a Sartorius Octet. Twenty-one of 96 designs express solubly. Of those, four bind EGFR with K_D below 100 nM, and one of those four binds at 4 nM — a hit rate of 4 percent on ordered designs, or 19 percent on expressed designs. The 4 nM binder is then characterized by cryo-EM, validated as a true dimer-interface blocker in a cell-based phospho-EGFR assay, and queued for in vivo work. End-to-end timeline: 11 weeks. Total reagent and CRO cost: ~$54,000.
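The diversity filter in this narrative can be sketched as greedy clustering: walk the designs from best to worst score and keep one only if it sits farther than a cutoff, in superposed Cα RMSD, from everything already kept. The sketch assumes equal-length designs, uses the Kabsch algorithm for superposition, and picks the highest-scoring member as each cluster's representative. Those choices, the 2 Å cutoff, and all names are assumptions, and the coordinates are random toy data.

```python
import numpy as np

def kabsch_rmsd(a, b):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0   # avoid reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T             # rotates a onto b
    return float(np.sqrt(((a @ rot.T - b) ** 2).sum() / len(a)))

def pick_representatives(designs, cutoff):
    """designs: list of (score, coords). Returns kept indices, best score first."""
    order = sorted(range(len(designs)), key=lambda i: designs[i][0], reverse=True)
    kept = []
    for i in order:
        if all(kabsch_rmsd(designs[i][1], designs[j][1]) > cutoff for j in kept):
            kept.append(i)
    return kept

rng = np.random.default_rng(1)
theta = np.deg2rad(30.0)
rz = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])

def near_copy(x):
    """Rotated, shifted, slightly jittered copy: same fold, different pose."""
    return x @ rz.T + 5.0 + 0.1 * rng.standard_normal(x.shape)

# Three toy 90-residue "folds", each present twice with different scores.
bases = [rng.standard_normal((90, 3)) * 10.0 for _ in range(3)]
designs = [
    (0.90, bases[0]), (0.70, near_copy(bases[0])),
    (0.85, bases[1]), (0.95, near_copy(bases[1])),
    (0.80, bases[2]), (0.60, near_copy(bases[2])),
]

representatives = pick_representatives(designs, cutoff=2.0)
```

Greedy selection is order-dependent but cheap, which matters when a few hundred passing designs have to be reduced to one plate. A full pairwise clustering would be the more careful alternative.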

This is a synthesized but representative narrative — actual published campaigns vary in scale, target, and reported metrics — but it captures the workflow most teams run in 2026.

What changed in v2

RFdiffusion 2 is not a single new model so much as a set of capability extensions over v1, all enabled by all-atom denoising.

Figure 4 — Capability comparison between RFdiffusion v1 and v2. AllAtom denoising unlocks ligand-aware, multi-state, and enzyme-design workflows.

All-atom generation. Side chains, ligands, metals, and nucleic acid atoms are denoised jointly with the backbone. This means pocket geometry and protein scaffold co-evolve during sampling, instead of pocket being grafted onto a fixed backbone after the fact.

Ligand-aware design. You can condition on a small-molecule SMILES and a desired pose. The model has been benchmarked on designing pockets for cofactors like NADH, heme, and FAD, and for small-molecule drugs. This makes RFdiffusion 2 one of the first practical tools for de novo enzyme and biosensor design.

Multi-state design. Some functional proteins must adopt at least two conformations — receptors that switch on ligand binding, allosteric enzymes, motor proteins. RFdiffusion 2 supports conditioning on multiple target structures simultaneously, asking the model to find a sequence whose energy landscape supports both states. This remains a research frontier rather than a routine workflow, but the v1 architecture could not even pose the problem.

Better symmetry and motif scaffolding. v2 inherits and improves v1’s a
