Robust AI Vision From Data Order, Not Volume

Executive Summary

For the first few months of life, a human infant lives in a blurry, nearly black-and-white world. Visual acuity, color vision, and the ability to tell light from dark all mature gradually over that period. When we train an AI vision model, we do the opposite: we pour in sharp, full-color images at full resolution from day one. A 2026 study in Nature Machine Intelligence suggests that this difference may have been the problem all along. The fix was not more data. It was a different order: feeding the model images in the sequence in which a baby comes to see the world.

The team created no new data. They filtered existing images so the training set moved from blurry to sharp, from grayscale to color, from low contrast to high — reproducing an infant's developmental order as a curriculum, nothing more. The result was the strongest shape bias reported to date, together with an eye that held up better against image corruption and adversarial attack. In a related study, shuffling that order at random pulled the effect all the way back to baseline.

The Pebblous blog has argued more than once that clean data is not the same as usable data. This study pushes that argument one step further. Refining data until it is usable turns out not to be enough; what you expose, and in what order, is what builds robustness. It is a case for reading data quality not as a state (clean) but as a trajectory (curriculum).

• 22%: standard CNN shape bias. The other 78% leans on texture, which is fragile to corruption and attack.
• 19.8% → 29.1%: shape bias with CATDiet, +9.3 pts once the model is fed in infant order.
• 25.0% → 18.8%: corruption error (mCE) on CIFAR-10-C, improved by the VAC curriculum.
• Shuffle → 0: the effect vanishes. A random order regresses to baseline; the order itself is the point.

1. The Chronic Flaw in AI Vision: It Sees Texture, Not Shape

When we say an AI recognizes a cat, we tend to assume the model is attending to a cat's shape. Often it isn't. A standard CNN trained on ImageNet relies more on surface texture than on silhouette when it recognizes an object. It responds most readily to high-frequency patterns: the fine grain of cat fur, the repeating stripes of a zebra.

Geirhos and colleagues showed this cleanly in 2019. Show a CNN an image with the shape of a cat but the texture of elephant skin, and it answers "elephant." Show the same picture to a person, and most say "cat." When shape and texture conflict, people trust shape and machines trust texture. This is texture bias.

The numbers make the gap plain. A ResNet-50 trained on standard ImageNet has a shape bias of just 22%, meaning the other 78% rides on texture. Even a large Vision Transformer (ViT) sits at only around 55%, well short of the overwhelming shape-first preference people show.

Just how wide that gap is becomes clear when the texture is stripped away. On images that keep the shape cue but remove the surface pattern, people held 76% accuracy while the CNN collapsed to 28%. People keep recognizing objects by shape once the pattern is gone; the machine, with no texture left to lean on, loses its way.

▲ Shape recognition accuracy after texture removal — humans hold 76% while the CNN collapses to 28% | Original Pebblous diagram (reinterpreting Geirhos et al. 2019)
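The shape-bias percentages in this section come from cue-conflict tests like the cat-with-elephant-skin image. Following the definition in Geirhos et al. (2019), shape bias is the fraction of shape answers among the trials where the model answered with either the shape class or the texture class, so a ResNet-50 at 22% picks the texture class in the other 78% of those trials. A minimal sketch; the data class, function name, and toy counts are illustrative, not taken from any benchmark toolkit:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CueConflictTrial:
    shape_label: str    # class implied by the silhouette, e.g. "cat"
    texture_label: str  # class implied by the surface texture, e.g. "elephant"
    prediction: str     # the model's top-1 answer


def shape_bias(trials: list[CueConflictTrial]) -> float:
    """Share of shape decisions among trials decided by shape or by texture.

    Trials whose two cues name the same class are skipped, and trials whose
    prediction matches neither cue do not count toward the denominator.
    """
    shape_hits = texture_hits = 0
    for t in trials:
        if t.shape_label == t.texture_label:
            continue
        if t.prediction == t.shape_label:
            shape_hits += 1
        elif t.prediction == t.texture_label:
            texture_hits += 1
    decided = shape_hits + texture_hits
    if decided == 0:
        raise ValueError("no trial was decided by either cue")
    return shape_hits / decided


# Toy counts chosen to reproduce a 22% shape bias: 11 shape answers,
# 39 texture answers, and 10 answers that match neither cue.
trials = (
    [CueConflictTrial("cat", "elephant", "cat")] * 11
    + [CueConflictTrial("cat", "elephant", "elephant")] * 39
    + [CueConflictTrial("cat", "elephant", "dog")] * 10
)
print(f"shape bias: {shape_bias(trials):.0%}")  # prints "shape bias: 22%"
```

Answers that match neither cue (the "dog" trials) are left out of the ratio, which is why the metric measures a preference between the two cues rather than overall accuracy.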

An eye that leans on texture can post high accuracy on paper and still buckle in the field. Let an image blur slightly, pick up noise, or gain compression artifacts, and performance drops sharply. It also falls easily to adversarial attacks — tiny pixel perturbations, invisible to the human eye, engineered to induce misclassification. Step even a little outside the training distribution and it breaks. An eye that has memorized surface patterns loses its footing the moment those patterns wobble.

Texture bias did not fade on its own as we added more data. We have spent years scaling models and datasets, yet shape-based vision and robustness never simply followed from scale. So researchers are changing the question: not how much to feed a model, but how to feed it.

2. Why a Baby Sees Only Black and White at First

A newborn's eyes are very different from an adult's. Acuity is a blurry 20/400 to 20/600, and the cone cells that separate colors are not yet mature, so the world is close to black, white, and gray. The ability to tell light from dark, called contrast sensitivity, is low too. For the first few weeks, a baby responds only to large, high-contrast patterns.

These constraints lift in sequence over several months. By one to two months, red starts to register; by two to three months, color and contrast grow together. By four to six months, acuity climbs to around 20/40 and color vision approaches an adult's. This progression — from blurry grayscale to sharp color, from low contrast to high — varies little from one child to the next.

The table below lays out that sequence along three axes: acuity, color vision, and contrast sensitivity.

Age	Acuity	Color vision	Contrast sensitivity
Birth	20/400–20/600 (very blurry)	Black, white, gray only	Very low (only high-contrast patterns)
1–2 months	Improving	Red begins to register	Low
2–3 months	Improving	Red and green strengthen	Improving
4–6 months	~20/40	Near adult	Greatly improved
12 months	Adult level	Well developed	Still maturing

▲ Infant visual development trajectory — acuity, color vision, and contrast mature in sequence. Color vision's delayed onset (~1–2 months) is characteristic | Original Pebblous diagram
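Snellen fractions can be hard to picture. Under the common clinical convention that 20/20 vision resolves detail about 1 arcminute across, which corresponds to a grating of roughly 30 cycles per degree, the finest resolvable grating scales inversely with the Snellen denominator. A small sketch applying that rule of thumb to the acuity column of the table (an approximation, not a measurement from the cited studies):

```python
def snellen_to_cpd(denominator: float, numerator: float = 20.0) -> float:
    """Approximate finest resolvable grating (cycles/degree) for acuity numerator/denominator.

    Assumes the usual rule of thumb: 20/20 means a 1-arcmin minimum angle of
    resolution (MAR), and one grating cycle spans two such angles.
    """
    mar_arcmin = denominator / numerator  # minimum angle of resolution, in arcmin
    return 60.0 / (2.0 * mar_arcmin)      # 60 arcmin per degree, 2 MARs per cycle


for age, denom in [("Birth (20/600)", 600), ("Birth (20/400)", 400),
                   ("4-6 months", 40), ("Adult (20/20)", 20)]:
    cpd = snellen_to_cpd(denom)
    print(f"{age:<16} ~{cpd:4.1f} cycles/degree ({30.0 / cpd:.0f}x coarser than 20/20)")
```

By this estimate a newborn resolves detail 20 to 30 times coarser than an adult, which gives a sense of scale for the acuity axis a developmental curriculum tries to reproduce.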

For a long time this blurry start was read as mere immaturity, a defect to outgrow as fast as possible. The recent view is different: a low-resolution, low-contrast, grayscale start may in fact be a developmental scaffold. In a blurry world with the detail erased, there is no fine texture to cling to. What remains is the big picture: global shape. The brain gets a firm hold on that shape first, and only then descends into detail.

Blur looks as if it only takes information away, but what it really does is steer what the learner attends to. When the shortcut of texture is blocked, learning grabs the sturdier cue of shape first. There was a reason, it turns out, for a baby's eyes to open slowly.
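The claim that blur hides texture but keeps shape can be checked directly, because a Gaussian blur is a low-pass filter: it suppresses fine, high-frequency patterns far more than large, low-frequency outlines. A minimal NumPy sketch on a synthetic image, assuming a 32-pixel square as the "shape" and a one-pixel checkerboard as the "texture" (both invented for illustration, not stimuli from the cited work):

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel, truncated at three standard deviations."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with wrap-around borders; output keeps img's shape."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="wrap")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


size = 64
i, j = np.indices((size, size))
# "Shape": a large bright square, a low-frequency outline.
shape = ((i >= 16) & (i < 48) & (j >= 16) & (j < 48)).astype(float)
# "Texture": a one-pixel checkerboard, the highest frequency the grid can hold.
texture = 0.25 * np.where((i + j) % 2 == 0, 1.0, -1.0)
image = shape + texture

blurred = blur(image, sigma=2.0)
# Blur is linear, so each component can be measured on its own.
texture_left = blur(texture, sigma=2.0).std() / texture.std()
shape_corr = np.corrcoef(blurred.ravel(), shape.ravel())[0, 1]
print(f"texture amplitude remaining: {texture_left:.1e}")
print(f"correlation of blurred image with the square: {shape_corr:.2f}")
```

Almost none of the checkerboard survives the blur, while the blurred image still correlates strongly with the square's silhouette: the fine pattern is gone, the global shape is not.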

3. The Developmental Visual Diet: Designing Order, Not Data

The Developmental Visual Diet (DVD), published by Zejin Lu and colleagues in Nature Machine Intelligence, ports this developmental order straight into an AI training curriculum. The premise is a simple inversion: instead of feeding sharp color images in bulk from day one, show the images in the order in which a baby comes to see the world.

The key is that they created no new data. Synthesizing decades of developmental-psychology research, the team filtered existing images to match each developmental stage, moving three axes together along the infant's trajectory.

• Acuity: start blurry and grow sharper (the trajectory from 20/400 at birth toward an adult's 20/20).
• Contrast sensitivity: a gradual shift from low contrast to high.
• Color vision: begin desaturated and grayscale, then move gradually toward full color.

▲ DVD curriculum attribute exposure by phase — color starts nearly closed in the early phase and only opens fully in the late phase | Original Pebblous diagram (reinterpreting Lu et al. 2026)
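The three axes can be pictured as one function from training progress to filter settings. The sketch below is hypothetical: the endpoints (20/400 to 20/20 acuity, contrast from 0.2 to 1.0), the blur-per-arcminute scale, and the choice to hold color back until 40% of training are assumptions chosen to mirror the description, not the parameters Lu et al. used. It computes the blur width and applies the color and contrast steps with NumPy; any Gaussian filter can do the blurring itself.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class DietStage:
    snellen_denominator: float  # acuity expressed as 20/X
    blur_sigma_px: float        # Gaussian blur width standing in for that acuity
    contrast: float             # 0 = flat gray, 1 = original contrast
    saturation: float           # 0 = grayscale, 1 = full color


def diet_at(progress: float) -> DietStage:
    """Map training progress in [0, 1] to one point on a hypothetical infant-like trajectory."""
    p = min(max(progress, 0.0), 1.0)
    # Acuity: geometric path from 20/400 to 20/20 (equal-ratio steps).
    denom = 400.0 * (20.0 / 400.0) ** p
    # Hypothetical scale: 0.15 px of blur per arcmin of MAR beyond the adult 1 arcmin.
    sigma = max(0.0, 0.15 * (denom / 20.0 - 1.0))
    # Contrast: linear ramp from 20% to full.
    contrast = 0.2 + 0.8 * p
    # Color opens late: fully gray until 40% of training, then a linear ramp.
    saturation = min(1.0, max(0.0, (p - 0.4) / 0.6))
    return DietStage(denom, sigma, contrast, saturation)


def apply_color_and_contrast(rgb: np.ndarray, stage: DietStage) -> np.ndarray:
    """Desaturate an (H, W, 3) float image toward luminance, then compress contrast."""
    luma = rgb @ np.array([0.299, 0.587, 0.114])          # (H, W) luminance
    out = luma[..., None] + stage.saturation * (rgb - luma[..., None])
    out = out.mean() + stage.contrast * (out - out.mean())  # scale around the mean
    return np.clip(out, 0.0, 1.0)


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    s = diet_at(p)
    print(f"p={p:.2f}  acuity 20/{s.snellen_denominator:.0f}  sigma={s.blur_sigma_px:.2f}px  "
          f"contrast={s.contrast:.2f}  saturation={s.saturation:.2f}")
```

Driving all three axes from a single progress value keeps them moving together, which is the point of the diet: no stage shows sharp detail, full contrast, and full color before the model has had time with coarser inputs.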

The same idea is maturing in several labs at once. IIT Delhi's VAC (Visual Acuity Curriculum) applies a strong Gaussian blur early in training and gradually lifts it as training proceeds. It treats the first 20% of training as a "deprivation period" of maximum blur, and mixes earlier blur levels back in to prevent forgetting. NTU Singapore's CATDiet handles saturation, resolution, and temporal continuity together, warming up on an infant diet for the first 30% of training before switching to standard augmentation.
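VAC's schedule, as described here, has two parts: a fixed-blur deprivation period followed by a gradual release, plus replay of earlier blur levels. A minimal sketch under stated assumptions: the 20% deprivation share comes from the text, while the maximum blur width (4 px), the linear decay, and the 20% chance of replaying a uniformly drawn earlier level are hypothetical choices, not Raj et al.'s published settings.

```python
import random


def vac_sigma(progress: float, max_sigma: float = 4.0, deprivation: float = 0.2) -> float:
    """Scheduled blur width: hold at max_sigma through the deprivation period,
    then decay linearly to zero (sharp images) by the end of training."""
    p = min(max(progress, 0.0), 1.0)
    if p <= deprivation:
        return max_sigma
    return max_sigma * (1.0 - (p - deprivation) / (1.0 - deprivation))


def sample_sigma(progress: float, rng: random.Random, replay_prob: float = 0.2,
                 max_sigma: float = 4.0, deprivation: float = 0.2) -> float:
    """Blur width for one training image.

    Usually the scheduled value; with probability replay_prob, a stronger blur
    drawn uniformly from the levels already passed, so earlier stages are revisited.
    """
    current = vac_sigma(progress, max_sigma, deprivation)
    if rng.random() < replay_prob:
        return rng.uniform(current, max_sigma)
    return current


rng = random.Random(0)
for progress in (0.0, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"progress {progress:.1f}: scheduled sigma {vac_sigma(progress):.2f}, "
          f"sampled {sample_sigma(progress, rng):.2f}")
```

Replaying only blurs at or above the current level is one reading of "mixing earlier blur back in": the model keeps seeing coarse inputs it has already learned from while the schedule moves toward sharp images.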

The common thread is clear. None of them changed the dataset; they designed the order in which the model sees it.

4. Order Builds Shape: The Results

For changing nothing but the order, the results were large. The model trained with DVD showed the strongest shape bias reported to date. It surpassed the previous state of the art on abstract shape recognition and held up better against both image corruption and adversarial attack. Across several measures of robustness, it moved a step closer to human judgment.

The numbers from related work point the same way. CATDiet lifted shape bias from a baseline of 19.8% to 29.1% (+9.3 pts) and cut the error rate on the CO3D corruption benchmark from 86% to 72%. VAC reduced the corruption error (mCE) on CIFAR-10-C from 25.03% to 18.78%, a drop of 6.25 points, or about a quarter of the baseline error. Neither gain came from feeding in more images.
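Corruption error and the size of each gain are simple arithmetic, and spelling them out guards against confusing point changes with relative changes. A sketch with hypothetical helper names: mean corruption error averages top-1 error over corruption types and severity levels, while the original ImageNet-C definition (Hendrycks and Dietterich) instead divides each corruption's summed error by a reference model's. Which variant the VAC paper reports is an assumption left open here.

```python
from typing import Optional


def mean_corruption_error(errors: dict[str, list[float]],
                          reference: Optional[dict[str, list[float]]] = None) -> float:
    """Mean corruption error over corruption types.

    errors maps each corruption type (e.g. "gaussian_noise") to the top-1 error
    at each severity level. Without a reference, each type's error is the plain
    average over severities; with one, it is the summed error divided by the
    reference model's summed error (ImageNet-C style).
    """
    per_type = []
    for name, levels in errors.items():
        if reference is None:
            per_type.append(sum(levels) / len(levels))
        else:
            per_type.append(sum(levels) / sum(reference[name]))
    return sum(per_type) / len(per_type)


def change(before: float, after: float) -> tuple[float, float]:
    """Absolute change in points, and change relative to the starting value."""
    return after - before, (after - before) / before


# Figures reported in the text.
for label, before, after in [
    ("CATDiet shape bias (%)", 19.8, 29.1),
    ("CATDiet CO3D corruption error (%)", 86.0, 72.0),
    ("VAC CIFAR-10-C mCE (%)", 25.03, 18.78),
]:
    points, relative = change(before, after)
    print(f"{label}: {points:+.2f} pts ({relative:+.0%} relative)")
```

The VAC improvement works out to 6.25 points, roughly 25% of the starting error; the CATDiet shape-bias gain of 9.3 points is close to half of its baseline.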

▲ DVD curriculum results — both shape bias and corruption robustness improve, with no additional images | Original Pebblous diagram (reinterpreting CATDiet/VAC results)

The most striking piece is the counter-experiment. When the CATDiet team reversed or randomly shuffled the feeding order, the gains disappeared and performance returned to baseline. Same images, same filters, same volume — but scramble the order and the effect was gone. It is the cleanest evidence that the order itself is the cause.
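The counter-experiment works because every condition draws on the same stages and only their sequence changes. A minimal sketch of how such control orders could be built, assuming five hypothetical stage labels borrowed from the development table; the CATDiet team's actual stage definitions and shuffling procedure may differ.

```python
import random

# Hypothetical stage labels, ordered from newborn-like to adult-like inputs.
STAGES = ["birth", "1-2 months", "2-3 months", "4-6 months", "12 months"]


def stage_order(condition: str, seed: int = 0) -> list[str]:
    """Return the sequence of diet stages for one experimental condition."""
    if condition == "developmental":
        order = list(STAGES)
    elif condition == "reversed":
        order = list(reversed(STAGES))
    elif condition == "shuffled":
        order = list(STAGES)
        random.Random(seed).shuffle(order)  # seeded, so the control is reproducible
    else:
        raise ValueError(f"unknown condition: {condition!r}")
    # Control check: same stages (hence same images and filters), only reordered.
    if sorted(order) != sorted(STAGES):
        raise RuntimeError("every condition must reuse exactly the same stages")
    return order


for condition in ("developmental", "reversed", "shuffled"):
    print(f"{condition:>13}: {' -> '.join(stage_order(condition))}")
```

Holding the stage contents fixed is what lets a difference in outcome be attributed to order alone.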

A biological trace showed up too. The curve along which the CATDiet model absorbed information as it trained overlapped with how synaptic density develops in the primary visual cortex (V1) of the macaque monkey. Imitate the infant's developmental order, and even the trace of that development seems to follow.

Lu's team sums up the conclusion in a sentence: robust AI vision is built not by how much a model learns, but by how it is guided to learn. That points along an entirely different axis from the past strategy of simply growing the volume of data.

5. Data Quality: From a State to a Trajectory

Here we return to the perspective of the people who work with data. We usually treat data quality as a state: are there missing values, are the labels correct, have duplicates and noise been removed? These questions make data "usable." Yet the developmental-diet experiments show that refining data into a usable state does not, by itself, bring robustness along: in the CATDiet counter-experiment, the very same clean data lost its effect once the order in which it was shown was scrambled.

So a second question joins the first: not only what you put in, but in what order you expose it. The table below places three ways of looking at data side by side.

Lens	Core question	Limit / implication
Volume (scale)	How much do we feed?	Scaling alone never closed the texture-bias and robustness gap.
State (clean)	How clean is it?	Even clean, without an order it did not yield a shape-seeing eye.
Trajectory (curriculum)	In what order do we show it?	Order builds robustness. The direction this research points.

This shift is not confined to images. In large language model (LLM) training, too, curriculum strategies that save high-quality, high-difficulty data for later in training are taking hold. How data is arranged can matter as much as which data is selected. And the DVD family got its results without buying or making new data, simply by transforming the images it already had into blurry, desaturated versions. By drawing robustness out of data already on hand, it also speaks to the conversation about data efficiency.

I want to leave readers with one question. Does our data pipeline manage only what goes in, and leave the order of exposure untended? A pipeline's ordering is usually set for convenience, not performance. DVD is a reminder that the ordering itself is something to design.

Editor's Note

What Pebblous has emphasized in talking about AI-Ready Data comes down to the same point. The goal is not to clean data once and be done; how you prepare it, and in what flow you handle it, is what changes a model's outcome. The DVD research adds one more piece of evidence for that claim, from independent work outside our own. A perspective that widens data quality from a matter of state to a matter of trajectory will, we think, prove its worth more and more often.


References

Key Paper

1. Lu, Z., Thorat, S., Cichy, R. M., & Kietzmann, T. C. (2026). "Adopting a human developmental visual diet yields robust, shape-based AI vision." Nature Machine Intelligence. DOI: 10.1038/s42256-026-01228-6. (arXiv:2507.03168)

Related & Background Work

2. Raj, A., Prajaapat, K., Gandhi, T., & Arora, C. (2025). "Mimicking Human Visual Development for Learning Robust Image Representations." arXiv:2512.14360.
3. Cai, Y., Lin, Q., Nunna, B. S., & Zhang, M. (2025). "Learning to See Through a Baby's Eyes: Early Visual Diets Enable Robust Visual Intelligence in Humans and Machines." arXiv:2511.14440.
4. Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F. A., & Brendel, W. (2019). "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness." ICLR 2019.