An AI Co-Scientist Hypothesizes Only Within the Papers It Can Read

Pebblous Data Communication Team

Executive Summary

On 19 May 2026, two AI research agents appeared in Nature together: Google DeepMind's Co-Scientist and FutureHouse's Robin. One pits hypotheses against each other in a tournament; the other autonomously picked a drug candidate for a blinding eye disease. Yet the two systems share a constraint that rarely gets named: both can form hypotheses only from papers that are already open.

A Nature editorial published in the same issue put the point in one line: we do not yet know whether greater efficiency equates to greater insight. That AI reads faster and combines more does not mean it inherits the insight humans draw from failure and detour. And the range in which that efficiency operates is already narrow: sixty-four percent of biomedical papers sit behind paywalls, beyond the reach of both systems.

The ceiling of automated discovery is set not by how clever the model is, but by the boundary of the data it can read. So who is left to widen that boundary — to build the data that does not yet exist, and to open the data that stays closed?

The achievement and the limit of both systems live together in four numbers: the share that open literature holds, the share locked outside it, and what Robin actually produced inside that narrow field of view.

36%

Open-access share

Of the ~39M papers indexed in PubMed, the share freely available

64%

Behind paywalls

Biomedical papers AI research agents cannot reach

10 wks

Robin drug repurposing

Time to scan 10 diseases using only open literature

0 papers

Prior proposals

No existing paper had proposed ripasudil for dry AMD (dAMD)

1

Two AI Researchers, Published the Same Day

Co-Scientist is built from six specialized agents on top of Gemini. One agent generates hypotheses, another critiques them, and another ranks them, all working in concert. The hypotheses compete in an Elo tournament: as the agents simulate a kind of scientific debate among themselves, the surviving hypotheses rise to the top. Researchers at Calico Life Sciences went on to confirm one of the system's integrated stress response hypotheses in the lab.
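To make the tournament idea concrete, here is a minimal sketch of how Elo ratings can rank competing hypotheses. It uses the standard Elo update rule; the K-factor of 32, the starting rating of 1200, the hypothesis labels, and the match outcomes are all illustrative assumptions, not details taken from the Co-Scientist paper.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one pairwise comparison."""
    expected_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_a)
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


def run_tournament(ratings: dict, matches: list, k: float = 32.0) -> dict:
    """Apply a sequence of (winner, loser) results to a copy of the ratings."""
    ratings = dict(ratings)
    for winner, loser in matches:
        ratings[winner], ratings[loser] = update_elo(
            ratings[winner], ratings[loser], a_won=True, k=k
        )
    return ratings


# Hypothetical hypotheses and hypothetical debate outcomes (winner, loser).
initial = {"H1": 1200.0, "H2": 1200.0, "H3": 1200.0}
matches = [("H1", "H2"), ("H1", "H3"), ("H3", "H2")]

final = run_tournament(initial, matches)
ranking = sorted(final, key=final.get, reverse=True)
print(ranking)  # ['H1', 'H3', 'H2']
```

In this sketch the judge that decides each match is left out entirely; in a system like the one described, that decision would come from the agents' simulated debate rather than from a fixed list of results.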

Robin divides the roles more sharply. Crow summarizes the literature and proposes experiments, Falcon writes in-depth technical reports, and Finch analyzes raw data such as RNA-seq. Given dry age-related macular degeneration (dry AMD), a condition affecting roughly 196 million people worldwide, the system narrowed about 400 papers down to 30 candidates and singled out ripasudil, a glaucoma drug, as a candidate for a new indication. No prior study had linked the drug to dry AMD.

The two systems differ in character, but they draw the raw material for their hypotheses from the same well. Co-Scientist leans on web search and open data such as ChEMBL and UniProt. Robin reads only open-access literature. Both run on top of repositories anyone can open — arXiv, PubMed Central, Semantic Scholar. Neither sees inside subscription databases.

▲ Both systems draw hypothesis material from the same open-access repository. The remaining 64% lies beyond their field of view. | Pebblous original diagram (Fig. 1 reinterpretation)

What the two systems showed is clear. Read the open literature quickly and recombine it in new ways, and you can find connections people had missed. Linking ripasudil to dry AMD is the proof. But every piece of that connection was already scattered across papers that were open to begin with.

2

The Line the Editorial Drew: Efficiency vs. Insight

Nature ran an editorial alongside the two papers in the same issue. It was titled "Why AI cannot do good science without humans," and its core sentence reads:

"AI systems might offer greater efficiency in some instances, but we don't yet know whether greater efficiency equates to greater insight."

In other words, efficiency gains may be real in some cases, but whether they bring deeper insight remains an open question. In the same vein, the editorial added that human wisdom, empathy, and even messiness are as much a part of progress as process and efficiency.

What that sentence points to is more than sentiment. Human scientists learn something even from failed experiments, dead ends, and attempts that never became papers. Yet much of that learning is never recorded in the published literature. What an AI research agent reads is only the results that survived to publication. The wisdom of abandoned attempts sits outside the corpus from the start.

3

The Data Boundary Is the Ceiling of Discovery

When we discuss the performance of an AI research agent, we usually look at the model's reasoning ability — bigger models, smarter tournaments, more refined division of labor among agents. But the real ceiling the two systems revealed together lies elsewhere: the boundary of the data the system can read.

The numbers show that boundary. About 39 million biomedical papers are indexed in PubMed, but only about 14 million of them (36%) are freely available. The remaining 64% sit behind subscriptions or paywalls. The open share also varies widely by field: astronomy and tropical medicine clear 80%, while pharmacology and chemical engineering fall below 10%. When AI tries to find a new drug, it ends up knocking on the door of one of the most tightly closed fields.

▲ The pharmacology field that AI uses most for drug discovery has an open-access rate of just 8%. | Pebblous original diagram (Fig. 2 reinterpretation)
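The headline shares follow directly from the two rounded counts quoted above. The short calculation below is only a consistency check on those figures; the variable names are illustrative, and the inputs are the article's approximate values rather than exact database counts.

```python
# Approximate figures quoted in the article (rounded, not exact counts).
total_indexed = 39_000_000      # ~39M biomedical papers indexed in PubMed
freely_available = 14_000_000   # ~14M of them freely available

open_share = freely_available / total_indexed
closed_share = 1.0 - open_share
closed_count = total_indexed - freely_available

print(f"open share:   {open_share:.1%}")                  # open share:   35.9%
print(f"closed share: {closed_share:.1%}")                # closed share: 64.1%
print(f"closed count: ~{closed_count / 1e6:.0f}M papers") # closed count: ~25M papers
```

Rounded to whole percentages, these give the 36% and 64% used throughout, and they imply roughly 25 million papers outside the reach of an agent limited to open literature.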

Three kinds of data lie outside this boundary: published papers locked behind paywalls, unpublished negative results and failed experiments, and proprietary clinical data held by companies. For the two systems, this territory is not data that is hard to reach — it is data that does not exist at all. You cannot even form a hypothesis about what you cannot see.

So what Co-Scientist and Robin found is the discovery the open literature allowed. The discovery held inside the locked literature remains locked. Making the model smarter does not raise this ceiling. To raise it, you have to widen the readable data itself.
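A toy example can illustrate why a smarter model does not lift this ceiling. The sketch below is not how Co-Scientist or Robin work; it is a deliberately simple linker that proposes a connection between two concepts whenever they share an intermediate concept but never appear in the same paper. The papers, concept names, and access flags are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Paper:
    paper_id: str
    concepts: frozenset
    open_access: bool


def linked_pairs(papers):
    """Concept pairs that co-occur in at least one paper (sorted tuples)."""
    pairs = set()
    for paper in papers:
        for a, b in combinations(sorted(paper.concepts), 2):
            pairs.add((a, b))
    return pairs


def bridge_hypotheses(papers):
    """Pairs never mentioned together but joined through a shared concept."""
    direct = linked_pairs(papers)
    neighbors = {}
    for a, b in direct:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    found = set()
    for around in neighbors.values():
        for a, c in combinations(sorted(around), 2):
            if (a, c) not in direct:
                found.add((a, c))
    return found


# Hypothetical corpus: the paper linking the pathway to the disease is paywalled.
corpus = [
    Paper("P1", frozenset({"drug_x", "pathway_y"}), open_access=True),
    Paper("P2", frozenset({"pathway_y", "disease_z"}), open_access=False),
    Paper("P3", frozenset({"disease_z", "symptom_w"}), open_access=True),
]

readable = [p for p in corpus if p.open_access]
from_open_only = bridge_hypotheses(readable)
from_full_corpus = bridge_hypotheses(corpus)

print(from_open_only)                                  # set()
print(("disease_z", "drug_x") in from_full_corpus)     # True
```

The linking logic is identical in both runs. Only the readable corpus changes, and with it the set of hypotheses that can be formed at all.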

4

The People Who Widen the Boundary

The question then moves from the model to the data. Who makes the data that does not yet exist? Designing new experiments, running them, and leaving behind results is still human work. Who opens the data that stays closed? Getting past paywalls, refining proprietary data into a shareable form, and recording unpublished negative results are human work too.

And making the data is not enough on its own. There are already reports of AI interpreting the same data differently from humans. So the question of who verifies the data's quality matters as much as who makes it: together they determine how much we can trust automated discovery. The people who make the data and the people who verify it effectively draw the upper bound of AI discovery.

An AI co-scientist is clearly a fast and capable colleague. But the world that colleague can see reaches only as far as the data we have made readable. Beyond that line lies territory that people must first make and verify before handing it over.

Editor's Note. Pebblous makes data that AI can read and write — AI-Ready Data — and verifies its quality. The boundary this piece points to, the data that does not yet exist or stays closed, is exactly where that work is headed.

References

Academic Literature

1. Gottweis, J., Weng, W.H., Daryin, A. et al. (2026). "Accelerating scientific discovery with AI co-scientist." Nature. doi.org/10.1038/s41586-026-10644-y
2. Ghareeb, A.E., Chang, B., Mitchener, L. et al. (2026). "A multi-agent system for automating scientific discovery." Nature. doi.org/10.1038/s41586-026-10652-y
3. Nature Editorial (2026). "Why AI cannot do good science without humans." Nature 653, 650. doi.org/10.1038/d41586-026-01551-3
4. Zheng et al. (2026). "Science behind a paywall: restricted access limits the promise of artificial intelligence." Learned Publishing. doi.org/10.1002/leap.2059

Industry Analysis

5. IntuitionLabs (2026). "Full-text access: the main barrier for AI research tools." intuitionlabs.ai