Executive Summary

Half of the 2024 Nobel Prize in Chemistry went to the creators of AlphaFold, for solving a half-century-old problem: predicting the three-dimensional shape a protein folds into from its amino acid sequence alone. Yet this achievement has a boundary that rarely gets mentioned. What AlphaFold draws is a protein's single most stable pose, in effect a still photo. This article looks at the distance between that still photo and the real protein.

A living protein never stops moving. It changes shape to relay signals, to recognize partners, to receive a drug. Its function comes from that motion. In June 2026, a roadmap paper co-written by 43 researchers confronted exactly this point: what deep learning solved is static structure prediction, while a quantitative understanding of how proteins move remains unsolved.

Why doesn't AlphaFold know motion? Not because the algorithm falls short. It's because the only thing we measured and turned into data was the still frame. The format of the data sets the limit on what patterns a model can learn. Pebblous has long argued that the data, not the model, builds the ceiling. Here that argument repeats, in the same shape, in the life sciences.

Key Figures

Source: arXiv:2606.08647, AlphaFold Protein Structure Database

AlphaFold's predictions sweep wide enough to cover nearly every known protein. Yet the experimental data the model actually learned from is more than a thousand times smaller, and all of it was captured in a single format: the still photo. That gap between scale and format, and the two blind spots a still-photo-only dataset leaves behind, are what the four numbers below carry.

  • 214 million AlphaFold predictions: predicted structures released in the database, covering nearly every known protein
  • ~180,000 PDB experimental structures: the experimental data AlphaFold trained on, mostly crystallized or frozen static structures
  • ~1/3 low-confidence residues: share of amino acids lacking atomic-level precision, mostly in flexible or disordered regions
  • 1.1% small-protein coverage: share of EMDB structures under 50 kDa, exposing the gap in dynamics data
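The scale gap described above can be checked with a line of arithmetic. The sketch below uses only the rounded figures quoted in this article (214 million predictions, roughly 180,000 experimental structures); the variable names are illustrative, and the inputs are approximations rather than exact database counts.

```python
# Rounded figures quoted in this article, not exact database counts.
alphafold_predictions = 214_000_000  # predicted structures in the AlphaFold database
pdb_structures = 180_000             # approximate experimental structures in the PDB

# Number of predicted structures per experimental structure.
ratio = alphafold_predictions / pdb_structures
print(f"Predictions per experimental structure: about {ratio:,.0f}")
```

With these inputs the ratio comes out a little under 1,200 to 1, which is where "more than a thousand times smaller" comes from.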

1. What AlphaFold Solved, and What It Left

Protein folding was one of biology's oldest riddles. A protein is built from a chain of amino acids strung in a line, and only when that chain folds itself into a particular three-dimensional shape does it begin to work. Could you compute that shape from the sequence alone? AlphaFold effectively answered the question that had stayed unsolved for 50 years, and because the accuracy of its answer rivaled experiment, the work led to a Nobel Prize.

What AlphaFold returns is the most stable form a protein takes: the single, lowest-energy structure. In photographic terms, the one best shot. For a drug researcher quickly checking the shape of a target protein, or gauging where an enzyme's active site sits, that one shot is often all they need. That's why more than two million researchers across 190 countries use the tool routinely.

What's left out is every moment that one shot can't hold: how the protein arrived at that shape, which region bends and how as it does its job, how its form shifts when it meets another molecule. None of this motion appears anywhere in AlphaFold's output. We've obtained the protein's portrait, but what the protein does while it's alive still has to be worked out separately.

▲ AlphaFold returns one structure at the energy minimum (left), while a living protein moves between multiple conformational states as it functions (right) | Pebblous original diagram

The point: AlphaFold solved the protein's still photo. What stays unsolved is the film. Knowing the structure and knowing the motion are different problems, and the latter is the problem of function.

2. Proteins Are Never Still

A protein's function almost always emerges from a change in shape. An enzyme closes like a mouth shutting as it grips a substrate; a receptor passes a change in shape inward when it catches a signal; a transport protein moves like a door opening and closing to let material through. A single frozen structure can't explain how any of this happens.

Hexokinase, an enzyme, is a good example. This protein has one shape when sugar is absent and another when it has grabbed onto sugar. When it meets sugar, its two-lobed structure clamps down like a pair of tongs, wrapping the sugar inside. But AlphaFold offers up just one of the two shapes, usually the open, sugar-free state, because that state appeared more often in the training data. The same protein's other face stays outside the prediction.

▲ Hexokinase shifts from an open form (left) to a closed form (right) when it grips glucose. AlphaFold typically predicts only the open state | Pebblous original diagram

The more extreme case is intrinsically disordered proteins (IDPs). These proteins have no fixed shape to begin with. Changing form depending on the situation is how they work. When AlphaFold hits such a region, it outputs a "low confidence" signal. It isn't so much that the model is wrong; the very framework of assuming one correct structure doesn't fit these proteins. A large share of living proteins are inherently floppy, yet our data and our models presume a solid statue.
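AlphaFold expresses this low-confidence signal as a per-residue score called pLDDT, on a scale from 0 to 100, and model files from the AlphaFold Protein Structure Database store it in the B-factor column of each atom record. The sketch below, a minimal illustration rather than an official tool, reads that column from PDB-format text and reports the share of residues below a cutoff. The cutoff of 70, the function names, and the six-residue sample are assumptions made for this example.

```python
def make_ca_line(serial, resseq, plddt):
    """Build one fixed-width PDB ATOM line for a C-alpha atom (hypothetical sample data)."""
    return (f"ATOM  {serial:5d}  CA  ALA A{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")

def plddt_per_residue(pdb_text):
    """Return one pLDDT value per residue, read from the C-alpha atom of each residue."""
    scores = []
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name, columns 61-66 the B-factor field.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores

def low_confidence_fraction(scores, threshold=70.0):
    """Share of residues whose pLDDT falls below the threshold (assumed cutoff)."""
    if not scores:
        raise ValueError("no C-alpha atoms found")
    return sum(s < threshold for s in scores) / len(scores)

# Invented six-residue example: three confident residues, three low-confidence ones.
sample_plddt = [95.1, 88.0, 62.5, 41.3, 91.7, 35.0]
sample_pdb = "\n".join(
    make_ca_line(i + 1, i + 1, p) for i, p in enumerate(sample_plddt)
)

scores = plddt_per_residue(sample_pdb)
fraction = low_confidence_fraction(scores)
print(f"Low-confidence share: {fraction:.0%}")
```

Run on a real model file instead of the invented sample, the same function gives a rough view of how much of a protein falls into the flexible or disordered category for a chosen cutoff.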

The point: A protein's function comes from motion. One protein moves between several shapes, and some proteins have no fixed shape at all. A still photo captures only one of those moments.

3. Why AlphaFold Can't See Motion

The reason AlphaFold doesn't know motion lies in what the model looked at and learned from. Its training data is the roughly 180,000 experimental structures piled up in the Protein Data Bank (PDB). And once you look at how those structures were made, the problem becomes clear. X-ray crystallography pins a protein into a crystal lattice to image it; cryo-electron microscopy (cryo-EM) flash-freezes a protein and photographs the halted state. Both are methods that stop the motion first, then take the picture.

On top of that, these structures are usually an average over a vast number of molecules. They show not the shape an individual protein held at a given instant, but the shape those molecules settle into on average. So PDB data is a record of "what it looks like," not of "how it moves." AlphaFold learned its patterns from this pile of still photos, so its output can only be a still photo too. A model can't conjure a kind of data that isn't there.
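A toy simulation makes the gap between "what it looks like" and "how it moves" concrete. In the sketch below, two hypothetical proteins each flip between an open state (0) and a closed state (1); one flips rarely, the other often. The two-state model, the switching probabilities, the seeds, and every name in the code are assumptions chosen for illustration, not measurements of any real protein.

```python
import random

def simulate_two_state(switch_prob, n_steps, seed):
    """Simulate a symmetric two-state Markov chain: 0 = open, 1 = closed."""
    rng = random.Random(seed)
    state = 0
    trajectory = []
    for _ in range(n_steps):
        # At every time step the molecule flips state with a fixed probability.
        if rng.random() < switch_prob:
            state = 1 - state
        trajectory.append(state)
    return trajectory

def closed_fraction(trajectory):
    """Time-averaged occupancy of the closed state."""
    return sum(trajectory) / len(trajectory)

def count_transitions(trajectory):
    """Number of open/closed switches along the trajectory."""
    return sum(a != b for a, b in zip(trajectory, trajectory[1:]))

N_STEPS = 400_000
slow = simulate_two_state(switch_prob=0.001, n_steps=N_STEPS, seed=1)
fast = simulate_two_state(switch_prob=0.1, n_steps=N_STEPS, seed=2)

for name, traj in (("slow", slow), ("fast", fast)):
    print(f"{name}: closed fraction = {closed_fraction(traj):.2f}, "
          f"transitions = {count_transitions(traj)}")
```

Both trajectories spend close to half their time in each state, so an average over time or over many molecules looks about the same for the two. The transition counts differ by orders of magnitude, and that difference is the kind of information an averaged still photo discards.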

Static structure and dynamics differ in the very shape of the chain that runs from measurement to prediction. On the static side, crystal and freeze imaging becomes 180,000 PDB structures, and those structures lead on to AlphaFold's predictions — three links connected without a break. On the dynamics side, the measurement techniques clearly exist, yet their results never gather into a single, unified training dataset, so the chain already snaps at the second link. The very format the model would learn from is empty.

▲ For static structure the measure→data→predict chain holds (top); for dynamics the data format itself is empty, so the chain breaks (bottom) | Pebblous original diagram

It's not that no method exists to capture motion. Single-molecule FRET (smFRET) observes the conformational switching of individual proteins in real time, nuclear magnetic resonance (NMR) catches atomic-level jitter, and molecular dynamics (MD) simulation computes motion over short stretches of time. The trouble is that each of these techniques peers into a different window of time, and they don't overlap. The data is sparse and fragmented, and there is still no unified dynamics database paired with the PDB structures.

The point: AlphaFold's ceiling is the format of the data, not the algorithm. Because we turned only still photos into data, the model can only learn as far as the still photo.

4. Where the 2026 Roadmap Points

The roadmap paper "Protein Dynamics Beyond Structure Prediction," released in June 2026, is a kind of consensus statement co-written by 43 researchers. They make the case that protein dynamics is by nature a stochastic, time-varying process, and so cannot be described by static coordinates alone. It is a declaration that the next science has to begin where structure prediction ends.

The core of the direction the paper lays out is not algorithms but data. The proposal: use single-molecule techniques to record how individual molecules move as time-series data, and unify the heterogeneous data coming from different time windows and measurement methods into a single dataset. Pairing static structures with dynamics measurements would let a model learn, for the first time, the format we might call "motion."
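One way to picture such a unified format is a record that keeps a static structure and its dynamics measurements side by side. The sketch below is purely hypothetical: the roadmap does not prescribe a schema, and the class names, field names, identifier, time steps, and sample values are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class DynamicsTrace:
    """One time series from a single measurement method (hypothetical schema)."""
    method: str           # e.g. "smFRET", "NMR", "MD"
    time_step_s: float    # spacing between samples, in seconds
    values: List[float]   # observable sampled over time; units depend on the method

    @property
    def duration_s(self) -> float:
        """Total time covered by this trace, in seconds."""
        return self.time_step_s * len(self.values)

@dataclass
class UnifiedEntry:
    """A static structure paired with every dynamics trace recorded for it."""
    structure_id: str
    traces: List[DynamicsTrace] = field(default_factory=list)

    def methods(self) -> List[str]:
        """Sorted list of the measurement methods present in this entry."""
        return sorted({t.method for t in self.traces})

    def time_window_s(self) -> Tuple[float, float]:
        """Return (finest sampling step, longest trace duration) across all traces."""
        if not self.traces:
            raise ValueError("entry has no dynamics traces")
        finest = min(t.time_step_s for t in self.traces)
        longest = max(t.duration_s for t in self.traces)
        return finest, longest

# Invented example: a picosecond-step simulation and a millisecond-step experiment.
entry = UnifiedEntry(
    structure_id="EXAMPLE-0001",
    traces=[
        DynamicsTrace("MD", 1e-12, [0.31, 0.33, 0.30, 0.35]),
        DynamicsTrace("smFRET", 1e-3, [0.2, 0.8, 0.8, 0.2, 0.8]),
    ],
)

print(entry.methods())
print(entry.time_window_s())
```

The point of the design is that traces from different methods share one container and one time unit, so a model or a curator can see at a glance which time windows a given protein has data for and which it lacks.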

In other words, the bottleneck for the next breakthrough is not a cleverer neural network but the absence of data to train on. Just as 180,000 still photos made static structure prediction possible, the next stage of dynamics prediction opens up only once we measure proteins in motion over time and gather those measurements abundantly and in a consistent format. Where the roadmap points is not a new model architecture but a new format of data.

▲ The 43-author roadmap's prescription: gather smFRET, NMR, and MD measurements into a unified dynamics database, pair with static structures, and train a dynamics AI | Based on arXiv:2606.08647, Pebblous original diagram

The point: The 43-author roadmap's prescription converges on data. Measuring protein motion as time series, and unifying scattered measurements into a single format, is the premise for the next science.

5. The Next Breakthrough Comes from Measurement

For anyone who works with data, the lesson to take from AlphaFold's story is plain. Even a Nobel-grade model can't cross the line drawn by the format of its training data. AlphaFold getting smarter won't produce a film, because there is no film in the data. A model's ceiling is set in advance not inside the model, but in what that model looked at and learned from.

So the life sciences' next AI revolution most likely won't come from a bigger model. It will come from new ways of measuring that turn a protein's motion over time into data. What we measure, and in what format we record it, determines what a model can learn next. Innovation in measurement precedes innovation in algorithms. The data has to exist first before the model can see what comes after.

Here the AI-Ready Data principle that Pebblous has long argued for is confirmed again, this time in the life sciences. If the data isn't prepared in a format an AI can learn from, the model cannot see beyond that boundary. Just as the still photo of the protein built the ceiling of one field, in any field that ceiling is drawn not by the size of the model but by the format of the data. The person who makes the next breakthrough is not the one who writes a better algorithm, but the one who turns into data what has not yet become data.

In closing: AlphaFold solved the still photo almost to perfection. It's just that no one has yet filmed and gathered the moving picture. The next breakthrough depends not on a bigger model, but on what we measure and turn into data.

References

Primary Source

  • 1. Griffié, J., Volpe, G., Olsson, S., Pereira, J. B. et al. (2026). "Protein Dynamics Beyond Structure Prediction." arXiv preprint. arXiv:2606.08647. (43-author roadmap; main source for this article)

Related Academic Literature

  • 2. AlQuraishi, M. (2024). "AlphaFold2: Why It Works and Its Implications for Understanding the Relationships of Protein Sequence, Structure, and Function." Briefings in Bioinformatics. (AlphaFold2 conformational dynamics limitations)
  • 3. Zheng, L. et al. (2024). "Advantages and Limitations of AlphaFold in Structural Biology." The Protein Journal. (PDB training bias and AlphaFold structural limitations)
  • 4. "Advancements in characterization of protein dynamics with machine learning." npj Soft Matter. (2026). (ML approaches to protein dynamics characterization)