Executive Summary

There is a medical AI that roughly 40% of licensed US physicians query every month. In January 2026, OpenEvidence raised $250 million in a Series D, lifting its valuation from $1 billion to $12 billion in eleven months. The conventional wisdom says the smarter model wins. This case proves the opposite, and this report follows why: the moat lives not in the model but in the sources it has licensed.

The LLM technology under OpenEvidence is not fundamentally different from ChatGPT or Gemini. The difference is what it answers from. The company has licensed more than 300 peer-reviewed medical journals — NEJM's full archive since 1990, JAMA, Cochrane — and forces a source citation on every answer. General-purpose LLMs fail in medicine not for lack of reasoning but for lack of verifiable provenance. One caveat throughout: most of the adoption and revenue figures here are company-stated or third-party estimates, so it is safer to read the structure than the numbers.

For Pebblous readers, this is hard evidence that "borrow the model, fence in the data" holds at a $12-billion scale in a high-stakes, high-regulation domain. What cannot be copied is not the model but the licensing agreements and the curated, provenance-tracked data behind them.

  • ~40%: US physicians using it (~18M queries / month; Dec 2025, company-stated)
  • 12×: Valuation jump ($1B → $12B in 11 months; Series D, Jan 2026)
  • 300+: Licensed peer-reviewed journals (NEJM, JAMA, Cochrane; 35M+ papers indexed)
  • ~90%: Estimated gross margin (free product + pharma ads; Sacra estimate)

1. Eighteen Million Questions

What does a doctor do when they get stuck mid-visit? They used to ask a colleague, page through a textbook, open a search box. Increasingly, in the US, they ask OpenEvidence. By the company's account, as of December 2025 about 40% of licensed US physicians use the tool, and roughly 18 million clinical queries pass through it every month. The January 2026 Series D of $250 million pushed the valuation to $12 billion — twelve times the $1 billion mark of February 2025, eleven months earlier.

A 12× jump in eleven months is not a verdict on model quality. It is a verdict on the depth of adoption. What investors are buying is not a benchmark score but the hundreds of thousands of physicians who have started to lean on this tool in the exam room every day. What makes the phenomenon unusual is how large that denominator is.

The depth of adoption — a vast denominator

There are about 1.03 million active physicians in the United States (AAMC, 2025). OpenEvidence says 760,000 of them have registered, roughly 87.7% of doctors in direct patient care, a narrower base than the AAMC total. The usage curve is just as steep. Monthly clinical queries grew from about 350,000 in July 2024 to 18 million by December 2025, a roughly 51-fold rise in seventeen months, and daily volume peaked at one million on March 10, 2026. The company claims that "more US physicians use it than all other medical AI tools combined." Most of these figures carry the caveat of being company-cited, but the direction and slope of the growth are consistent across sources.

The table below traces the growth in monthly clinical queries. The absolute numbers are company-stated, but the shape of a curve that climbs by double-digit multiples in a single year tells the story of an adoption explosion.

| Date | Monthly clinical queries (est.) | Context |
| --- | --- | --- |
| Jul 2024 | ~350,000 | Just after Series A |
| Dec 2025 | ~18 million | ~40% of doctors using it (company-stated) |
| Mar 10, 2026 | 1 million in one day | Single-day peak |
| Apr 2026 | (65% adoption claimed) | Updated company claim, no independent verification |

Monthly clinical query trend. Absolute figures are company-stated (BusinessWire, CEO interviews) with no independent third-party verification. Sources: BusinessWire (Jan 2026), AI2Work (Apr 2026).
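The slope of that curve can be checked with a few lines of arithmetic. The sketch below takes the two company-stated endpoints from the table and computes the fold increase, the percentage rise, and the constant month-over-month rate that would connect them. The function names and the smooth-growth assumption are illustrative only; actual growth was almost certainly lumpier.

```python
def fold_increase(start, end):
    """How many times larger the end value is than the start value."""
    return end / start


def implied_monthly_growth(start, end, months):
    """Constant month-over-month rate linking start to end.

    Assumes smooth compounding, which is a simplification.
    """
    return fold_increase(start, end) ** (1 / months) - 1


# Company-stated endpoints from the table above.
START_QUERIES = 350_000      # Jul 2024
END_QUERIES = 18_000_000     # Dec 2025
MONTHS_ELAPSED = 17          # Jul 2024 to Dec 2025

fold = fold_increase(START_QUERIES, END_QUERIES)
rise_pct = (fold - 1) * 100
monthly = implied_monthly_growth(START_QUERIES, END_QUERIES, MONTHS_ELAPSED)

print(f"Fold increase: {fold:.1f}x")
print(f"Percentage rise: {rise_pct:,.0f}%")
print(f"Implied monthly growth: {monthly:.1%}")
```

On these endpoints the rise is about 51-fold, which works out to roughly 26% compounded every month for seventeen months.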

Investors marked the value up 12× in eleven months not because the model got smarter, but because doctors now reach for this tool by habit at the point of clinical decision — and because that habit is hard to copy. The natural next question follows: what built the habit?

2. Not the Model, the Sources

The LLM technology beneath OpenEvidence is not fundamentally different from a competitor's. The company is understood to run on top of frontier models from the likes of OpenAI and Anthropic. So what separates it from typing the same clinical question into ChatGPT? The answer is not the model but what that model answers from.

OpenEvidence has licensed more than 300 peer-reviewed medical journals: NEJM's entire archive since 1990, JAMA's eleven-journal specialty network, plus Cochrane reviews and NCCN guidelines. The company has indexed over 35 million papers and built a retrieval-augmented generation (RAG) pipeline so that every answer is generated only from this licensed literature. The decisive last step is this: every answer is forced to carry a source citation. The physician receives an answer and, in the same moment, can confirm which paper and which sentence it came from.
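The shape of such a pipeline can be sketched in a few dozen lines. This is a toy illustration, not OpenEvidence's implementation: the corpus entries are invented placeholders, retrieval is plain word overlap rather than vector search, and the LLM call is replaced by echoing the top passage. What it does show is the two constraints described above: unlicensed text never reaches the answer step, and no answer is returned without a citation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source: str     # hypothetical citation string
    licensed: bool  # True if the text comes from a licensed journal
    text: str


# Invented placeholder corpus; none of these are real papers.
CORPUS = [
    Passage("Journal A 2024;1:10", True,
            "drug x combined with drug y raises bleeding risk"),
    Passage("Journal B 2023;5:77", True,
            "drug z dosing in renal impairment"),
    Passage("Anonymous forum post", False,
            "drug x and drug y are fine together"),
]


def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().split())


def retrieve(query, corpus, top_k=2):
    """Rank licensed passages by word overlap with the query."""
    query_words = tokenize(query)
    scored = []
    for passage in corpus:
        if not passage.licensed:
            continue  # unlicensed text is excluded before ranking
        overlap = len(query_words & tokenize(passage.text))
        if overlap > 0:
            scored.append((overlap, passage))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:top_k]]


def answer(query, corpus):
    """Return an answer with citations, or decline if nothing matches."""
    passages = retrieve(query, corpus)
    if not passages:
        return {"answer": None, "citations": []}
    # A real system would hand the passages to an LLM here; this
    # sketch simply echoes the best-matching passage.
    return {
        "answer": passages[0].text,
        "citations": [passage.source for passage in passages],
    }


result = answer("drug x drug y interaction", CORPUS)
print(result["answer"])
print(result["citations"])
```

The design choice worth noticing is that the licensing filter sits in retrieval, not in the model: the same frontier model would behave differently only because of what it is allowed to read.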

The citation on every answer is the product

A general-purpose LLM has opaque training-data provenance. You cannot trace which text produced an answer, and citation is not guaranteed. In medicine, that difference is decisive. A physician will not act on "this drug interaction works like so" on its own. They need the peer-reviewed basis behind that sentence, verified well enough to use in a clinical decision that carries liability. What OpenEvidence sold is not an answer but a verifiable answer.

This is where it becomes clear that, even with the same LLM, what you answer from is everything. Anyone can borrow the model. No one can borrow NEJM, JAMA, and Cochrane licensed and curated into a clinically usable form. The moat is in the provenance, not the reasoning.

The product is simple to state: "answer only from peer-reviewed literature, and cite a source on every answer." That one line separates a general chatbot from a clinical tool. And what realizes that line is not a bigger model but the licensing agreements and the curation behind them.

[Diagram: Licensed literature (NEJM since 1990; JAMA Network ×11; Cochrane, NCCN; 270+ more journals; 35M+ papers indexed) feeds a RAG index (vector search + semantic mapping; access to licensed literature only). A physician's clinical query ("Drug interaction for X+Y?") goes to a frontier LLM (OpenAI / Anthropic), which returns an answer plus source citation ("NEJM 2024;391:1234, Fig.3") that is verifiable and provenance-tracked.]
▲ Pebblous original diagram (OpenEvidence RAG pipeline): every answer is generated from licensed literature and carries a traceable citation

3. Why General LLMs Can't Win

Where general-purpose LLMs break down in medicine is not reasoning but trust — and the trust gap shows up as a measurable error rate. Hallucination in ungrounded medical LLMs runs from 15% to 40% depending on context (IEEE JBHI, 2025) and climbs to 43–67% on complex cases (MedRxiv). Open-source models can exceed 80%. The rate at which they fabricate references is more dramatic still: GPT-3.5 invented 39.6% of its citations and early Bard 91.4%, by some measurements.

The bars below visualize the hallucination range with and without grounding. The same model, grounded in peer-reviewed literature, drops its error rate sharply.

| Condition | Hallucination rate |
| --- | --- |
| Ungrounded LLM (general cases) | 15–40% |
| Ungrounded LLM (complex cases) | 43–67% |
| Open-source models | 80%+ |
| After RAG + peer-reviewed grounding | 40%+ reduction |

Hallucination range in medical LLMs with and without grounding. In the cited studies, grounding in peer-reviewed literature via RAG cut hallucination by 40%+ (MEGA-RAG) and improved accuracy by up to 89% in a urology evaluation (Context-Aware RAG). Sources: IEEE JBHI (2025), MedRxiv, Frontiers in Public Health, NIH/PubMed.
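To see what a "40%+ reduction" would mean against the ungrounded ranges, the sketch below applies it as a relative reduction. That reading is an assumption: the summary above does not say whether the figure is relative or in percentage points, and a result measured in one study need not transfer to the other ranges. The function name is illustrative.

```python
def apply_relative_reduction(rate, reduction):
    """Return the rate that remains after a relative reduction."""
    return rate * (1 - reduction)


# Ungrounded hallucination ranges quoted in the text.
UNGROUNDED_RANGES = {
    "general cases": (0.15, 0.40),
    "complex cases": (0.43, 0.67),
}

# Assumed to be relative; "40%+" is treated here as exactly 40%.
REDUCTION = 0.40

for label, (low, high) in UNGROUNDED_RANGES.items():
    new_low = apply_relative_reduction(low, REDUCTION)
    new_high = apply_relative_reduction(high, REDUCTION)
    print(f"{label}: {low:.0%}-{high:.0%} -> {new_low:.0%}-{new_high:.0%}")
```

Under that assumption the general-case range of 15–40% falls to 9–24%, and the complex-case range of 43–67% falls to roughly 26–40%. Grounding narrows the gap substantially but does not close it.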

Hallucination is a litigation risk

In medicine, an error rate is not merely a quality metric. It is a question of liability. AI-related medical malpractice claims rose 14% year over year in 2024. When wrong clinical information reaches a patient, a lawsuit can follow. In that environment, source citation is not a feature but part of the accountability structure. A physician needs to be able to say not "the AI told me so" but "this decision was grounded in this peer-reviewed paper." OpenEvidence filled exactly that gap by confining its answers to peer-reviewed literature and attaching citations.

A triple moat forms here. The first is a legal licensing barrier. A latecomer wanting legal access to the same literature has to strike the same agreements over again. The second is clinical trust. Once cited answers enter a physician's daily workflow, switching costs appear. The third is liability cover. Verifiable evidence becomes a line of defense. A general-purpose LLM provides none of the three by construction.

[Diagram: three-layer moat. ① Legal licensing barrier (a latecomer must re-negotiate the same agreements); ② Clinical trust and switching cost (habit embedded in daily workflow); ③ Liability cover (verifiable citation = legal defense). A general-purpose LLM provides none of the three by construction.]
▲ Pebblous original diagram (OpenEvidence's three-layer structural moat): barriers a general-purpose LLM cannot provide by construction

4. The Economics of Fencing Data In

For doctors, OpenEvidence is free. And yet the company makes money. Revenue comes from pharmaceutical and medical-device advertising. On the loading screen, while a physician waits for an answer, an ad matched to that clinical context appears: a relevant new drug shown to an oncologist asking about chemotherapy, for instance. That contextual fit pushes the CPM (cost per thousand impressions) past $70 and as high as $1,000+. Measured against the $15 upper end of ordinary social media CPMs ($5–15), that is roughly 5× to 70×.
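The multiple depends on which end of the social media range is used as the baseline. A short sketch, using only the CPM figures quoted above (the function name is illustrative):

```python
def cpm_multiple(premium_cpm, baseline_cpm):
    """Ratio of a premium CPM to a baseline CPM."""
    return premium_cpm / baseline_cpm


CLINICAL_CPM = (70, 1000)  # USD, range quoted in the text
SOCIAL_CPM = (5, 15)       # USD, range quoted in the text

for premium in CLINICAL_CPM:
    for baseline in SOCIAL_CPM:
        multiple = cpm_multiple(premium, baseline)
        print(f"${premium} vs ${baseline}: {multiple:.1f}x")
```

Against the $15 baseline the range is about 4.7× to 66.7×, which matches the rounded 5× to 70×. Against the $5 baseline the same clinical CPMs come out at 14× to 200×.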

The scale of the revenue depends on whom you ask. Sacra estimates that revenue grew from about $7.9 million in 2024 to roughly $150 million in 2025 (about +1,800%). The CEO says only "$100 million or more." Estimated gross margin is about 90%, and average revenue per user (ARPU) is put at roughly $124. The addressable market for this model, US pharmaceutical digital advertising, runs $20–25 billion a year. With Mount Sinai embedding OpenEvidence into its Epic electronic health record (EHR) starting in March 2026, there is talk of ARPU rising 5–10×.
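These estimates can be cross-checked against one another. The sketch below uses only the figures quoted above; the function names are illustrative, and the text does not say which user count underlies the ARPU estimate, so the implied user base is a back-of-envelope figure rather than a reported one.

```python
def growth_percent(before, after):
    """Percentage change from before to after."""
    return (after - before) / before * 100


def implied_users(revenue, arpu):
    """User count implied by total revenue and revenue per user."""
    return revenue / arpu


def market_share(revenue, market_size):
    """Revenue as a fraction of the addressable market."""
    return revenue / market_size


REVENUE_2024 = 7.9e6   # USD, Sacra estimate
REVENUE_2025 = 150e6   # USD, Sacra estimate
ARPU = 124             # USD, estimate quoted in the text
AD_MARKET = (20e9, 25e9)  # USD per year, range quoted in the text

print(f"Revenue growth: +{growth_percent(REVENUE_2024, REVENUE_2025):,.0f}%")
print(f"Implied users: {implied_users(REVENUE_2025, ARPU):,.0f}")
for size in AD_MARKET:
    share = market_share(REVENUE_2025, size)
    print(f"Share of ${size / 1e9:.0f}B market: {share:.2%}")
```

On these inputs, revenue grew by about 1,799%, the ARPU figure implies roughly 1.21 million users, and $150 million is between 0.6% and 0.75% of the addressable ad market.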

[Diagram: OpenEvidence flywheel (free for doctors → ~90% gross margin). Free for physicians → physician adoption (~40% of US doctors) → high-CPM pharma ads ($70–$1,000+ CPM) → ~90% gross margin → expand licensing, deepen curation → clinical trust rises, switching costs grow.]
▲ Pebblous original diagram (OpenEvidence business model flywheel): free access drives adoption, adoption drives high-CPM pharma revenue

Why NEJM and JAMA chose OpenEvidence

What deepens the moat is the incentive of the data holders. NEJM, JAMA, and Cochrane have no reason to enter the model race themselves. By licensing their archives instead, they occupy the heart of the AI value chain while securing content rent even in the AI era. A two-way lock-in results: OpenEvidence gains trust data it cannot copy, and the publishers gain a stable revenue stream. Over time, neither side has much incentive to break the relationship.

One thing is worth stating plainly, though. There is no official confirmation that these agreements are "exclusive." NEJM, JAMA, and Cochrane describe them as "official AI partnerships." Still, given the multi-year terms and the publishers' incentive structure, it would be hard for a latecomer to secure the same data on the same terms. It cannot be called exclusive, but it reads more accurately as a barrier tantamount to exclusive access.

Free product, pharma advertising, ~90% gross margin, and aligned data-holder incentives. The four interlock into a structure that deepens the moat over time. Models get cheaper and more commoditized every year; licensed sources do not. Whoever fences in the data ends up fencing in the value.

5. Beyond Healthcare

Start with the competitive landscape. The market splits not on model performance but on data access. The table below compares where the three players sit.

| Dimension | OpenEvidence | UpToDate | DoxGPT (Doximity) |
| --- | --- | --- | --- |
| Price | Free (ad-funded) | $579/yr subscription | Free (platform-embedded) |
| Data source | 300+ licensed peer-reviewed journals | 7,600+ specialist authors | Mixed in-house / external |
| User base | ~40% of US doctors (company-stated) | 2M+ global | Platform used by 85% of US doctors |
| Search trend | +13.7% | -1.6% | New / volatile |

Clinical decision support tools compared. Sources: Wolters Kluwer (UpToDate), Doximity filings, PMC traffic analysis (2025–2026). DoxGPT's own head-to-head (DoxGPT 61% vs OpenEvidence 26%) comes from a Doximity-side citation and should not be read as a neutral evaluation.

UpToDate, with its specialist-authored model, is highly trusted but paid, and its search trend is declining. Doximity's DoxGPT rides a platform used by 85% of US physicians and claims superiority in its own evaluation — but that claim cannot be cited without noting its source is Doximity itself (indeed, in 2025 the two companies sued each other over reverse-engineering and disinformation). In the end the axis collapses to one question: who has legal access to verifiable data?

It is also worth making clear that test scores don't reveal the moat. OpenEvidence announced a 100% score on the USMLE (the US medical licensing exam) in August 2025. But on MedXpertQA, which covers more complex clinical scenarios, accuracy was just 34%. That is the size of the gap between a multiple-choice standardized test and real clinical practice. A test score is a marketing signal, not the moat itself — the moat is the licensed sources.

Generalizing "curated, provenance-tracked data = product"

This pattern does not stay penned inside healthcare. The structure of "borrow the model, fence in the data" spreads to any domain where verifiable, proprietary sources are the core of value. Case law and statutes in law, regulatory filings and research in finance, papers and patents in science — all sit in the same place. Three common conditions decide where it applies.

  • Licensable proprietary data: the domain data must exist as an asset one party can hold and license.
  • High cost of being wrong: the price of a wrong answer must be high enough that source verification is mandatory, not optional.
  • No legal access for general LLMs: general-purpose models must not be able to reach that data freely.

[Diagram: "Borrow the model, fence in the data" applies to any domain that meets the three conditions (licensed proprietary data + RAG + mandatory source citation). Law: case law, statutes; legal database access. Finance: regulatory filings, research; Bloomberg, EDGAR. Healthcare (proven): NEJM, JAMA, Cochrane; $12B (OpenEvidence). Science / R&D: papers, patents; specialist journal access. Regulation: guidelines, codes; official document licensing. Conditions: ① licensable proprietary data; ② high cost of being wrong; ③ no free access for general LLMs.]
▲ Pebblous original diagram: domain map where "borrow the model, fence in the data" applies
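The three conditions amount to a simple conjunction: a domain qualifies only if all three hold. A minimal sketch of that screen follows. The class, field names, and the example assessments are illustrative assumptions, not findings from this report beyond the healthcare case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    licensable_proprietary_data: bool  # condition 1
    high_cost_of_error: bool           # condition 2
    closed_to_general_llms: bool       # condition 3


def fits_pattern(domain):
    """True only when all three conditions hold."""
    return (
        domain.licensable_proprietary_data
        and domain.high_cost_of_error
        and domain.closed_to_general_llms
    )


# Healthcare is the case the report treats as proven; the second
# entry is a hypothetical counterexample for contrast.
healthcare = Domain("Healthcare", True, True, True)
recipes = Domain("Recipe blogging (hypothetical)", False, False, False)

for domain in (healthcare, recipes):
    print(f"{domain.name}: {fits_pattern(domain)}")
```

Failing any single condition is enough to break the pattern: if the data is freely crawlable, for instance, a general-purpose model can already answer from it.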

The more the model is commoditized, the more verifiable proprietary data is worth. Frontier models grow more powerful and cheaper every year, but the very fact that anyone can use the same model pushes differentiation toward the data. OpenEvidence has simply proven that proposition first, at a $12-billion scale, in healthcare.

The Pebblous View

What OpenEvidence proves is the same proposition Pebblous has long argued. General-purpose LLMs fail in medicine not for lack of reasoning but because their training data has opaque, unverifiable provenance. The evidence is in the numbers: ungrounded models hallucinate at 15–67%, while grounding in peer-reviewed literature via RAG cuts that by 40%+ and raises accuracy by up to 89%. Data quality is output reliability — the clinical proof, running in the opposite direction from "garbage in, garbage out."

For companies and institutions that hold domain data, this case reduces to a single practical question: how do you turn the data you own into a defensible asset for the AI era? Just as NEJM licensed its archive, a holder of proprietary data can occupy the heart of the AI value chain through licensing and curation without ever entering the model race. But licensed data does not become a product on its own. The moat lives in the step of refining and structuring it into clinically usable quality and connecting its provenance.

Editor's Note

The problems Pebblous has worked on — diagnosing and refining data quality (DataClinic) and producing provenance-tracked AI-Ready Data — sit in the same place as the baseline demand this report describes. Building bigger models is the domain of Big Tech, but curation, provenance tracking, and quality assurance of data form a separate, defensible market. Reading Pebblous as the infrastructure layer that helps domain AI like OpenEvidence acquire and verify trustworthy data is not a leap that bends the report's conclusion toward the company — it is the same structure seen from another angle.

References

Academic

  • 1. IEEE Journal of Biomedical and Health Informatics. (2025). "Hallucination in Medical Large Language Models: A Review."
  • 2. MedRxiv. (2025, November). "Accuracy and Repeatability of OpenEvidence on Complex Subspecialty Scenarios."
  • 3. npj Digital Medicine. (2025). "A Framework to Assess Clinical Safety and Hallucination Rates of LLMs for Medical Tasks."
  • 4. Nature Communications. (2026). "Hyper-RAG: Reducing Hallucination in Domain-Specific Medical QA."
  • 5. Frontiers in Public Health. (2025). "MEGA-RAG: Multi-Evidence Grounded Augmentation for Medical LLMs."
  • 6. NIH / PubMed. (2025). "Context-Aware Retrieval-Augmented Generation in Urology."
  • 7. PMC. (2025–2026). "Public Interest in an AI-Enabled Clinical Decision Support System."

※ Company-related figures for adoption, query volume, and revenue are mostly company-stated (BusinessWire, CEO interviews) or third-party estimates (Sacra), with no independent third-party verification. The "exclusivity" of the licensing agreements has not been officially confirmed and is described here as "official AI partnerships."