Executive Summary

Data rights rest on a single, simple premise: if your data went into a model, you can find the trace it left behind. Membership inference attacks (MIA) hold that premise up. The idea is that a model grows unusually confident about text it has seen during training, so measuring that confidence tells you whether a given passage was used. But rewrite the original just before training — keeping the meaning, changing only the style and structure — and the whole premise falls apart.

The model remembers only the rewrite, so querying it with the original draws no special response. In one experiment, Llama-2 was trained solely on a lyrical-essay rewrite of Wikipedia passages, and detection accuracy (AUC) against the originals dropped to 0.54, indistinguishable from a coin flip. This article calls that phenomenon "data laundering" and walks through SDR (Synthesis Data Reversion), a reverse-tracing method that pulls the lost signal back up, lifting Loss AUC on lyrical-Wikipedia laundering from 62.7% to 75.5%.

The story does not end with one technique. When SDR restores the signal, an attacker simply moves the rewrite beyond that detection boundary, and a new method gives chase again. Data sovereignty turns out to be less about a one-time mark you stamp on your data and more about the ability to follow how that data was transformed and where it flowed.

Key Numbers

Sources: Combating Data Laundering in LLM Training (arXiv 2604.01904), DebugLM (arXiv 2603.17884)

Four numbers capture the tug-of-war between laundering and reverse-tracing: how far laundering collapses the detection signal, how much reverse-tracing brings back, whether the recovery holds when a frontier model does the rewriting, and where SDR's coverage ends and the signal is lost again.

0.54

Standard detection AUC after laundering

Train Llama-2 only on a lyrical rewrite, and detection of the original falls to near random-guess level (0.5).

75.5%

Detection AUC after SDR recovery

Reverse-tracing lifts Loss AUC on lyrical-Wikipedia laundering from 62.7% (standard methods) to 75.5%.

0.81

Recovery AUC against a frontier rewriter

Loss AUC rises from 0.65 to 0.81 on data laundered with DeepSeek-v3, so recovery holds even when a strong frontier model does the rewriting.

23

Style registers SDR sweeps

Lyrical, news, legal, interview, and more. Rewrites outside this set (e.g., pseudo-translation via low-resource languages) remain a blind spot.

1

Rewriting the Sentences Is Data Laundering

Data laundering means feeding copyrighted data into a model while changing only the surface — keeping the meaning intact — to dodge detection. Just as money laundering blurs the origin of cash, data laundering blurs the origin of text. The core tool is nothing elaborate. You simply ask an auxiliary LLM to "rewrite this passage in a different style."

Picture a single encyclopedia entry. You could render the same content as a lyrical essay, recast it in stiff legal prose, or unfold it as an interview dialogue. The sentences look nothing alike, yet the facts inside are identical. This very flexibility of natural language becomes the vulnerability, because the same information can be reshaped almost without limit into news reports, academic abstracts, e-commerce product blurbs, or social-media posts.

The paper at the center of this issue, Combating Data Laundering in LLM Training, organizes these rewrite forms into 23 language registers — distinct writing genres such as lyrical prose, news, legal documents, and interviews. Whoever wants to launder data just picks one of these and swaps the original out wholesale. What enters the model is the rewrite, not the original, and the original never appears directly anywhere.
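Laundering and SDR's first stage both rely on the same basic step: asking an auxiliary LLM to restyle a passage in a chosen register. The sketch below makes some assumptions. The four register names are a hypothetical subset, because the paper's full list of 23 is not reproduced here. The prompt wording is ours, not the paper's. The function only builds prompts; sending them to an LLM is left out.

```python
# Hypothetical subset: the source names lyrical, news, legal and interview
# among its 23 registers; the full list is not reproduced here.
REGISTERS = ("lyrical essay", "news report", "legal document", "interview dialogue")

# Illustrative prompt wording (an assumption, not the paper's prompt).
PROMPT_TEMPLATE = (
    "Rewrite the following passage in this style: {register}. "
    "Keep every fact and the overall meaning; change only wording, tone and structure.\n\n"
    "Passage:\n{text}"
)


def build_rewrite_prompts(text: str, registers: tuple[str, ...] = REGISTERS) -> dict[str, str]:
    """Build one style-rewrite prompt per register for an auxiliary LLM.

    Only the prompts are produced; calling an LLM with them is out of scope.
    """
    return {register: PROMPT_TEMPLATE.format(register=register, text=text) for register in registers}


prompts = build_rewrite_prompts("The river floods every spring.")
for register, prompt in prompts.items():
    print(register, "->", prompt.splitlines()[0])
```

The prompt fixes the meaning and frees only the surface. That is why a model trained on the output can learn the same facts while never seeing the original sentences.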

[Diagram: Data laundering flow. Original text (copyrighted) → auxiliary LLM style rewrite into 1 of 23 registers → rewrite used as training data → target LLM learns only the rewrite. Detection attempt: the rights holder, holding only the original, runs an MIA probe; the model shows no familiarity and AUC falls to 0.54 ≈ random guess (0.5).]
▲ How data laundering neutralizes membership inference detection | Pebblous original diagram (arXiv:2604.01904 concept reinterpretation)

The core: Data laundering steals the content but changes the form. Because the meaning is preserved, the model gets just as smart; because the surface is altered, any attempt to trace the original swings at empty air.

2

Why the Detection Signal Vanishes

The standard tool for checking whether your data went into a model is the membership inference attack (MIA). Its logic is intuitive. A model grows more familiar with the sentences it saw during training, and it shows higher confidence, that is, lower loss, on them. So when you feed a suspect original to the model and its confidence is unusually high, you conclude that "the model has seen this data."
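The decision rule above comes down to a threshold on average loss. This sketch assumes the passage's per-token log-probabilities have already been obtained from the target model; how to get them depends on the model API and is not shown. The threshold value is hypothetical. In practice it would be calibrated on text known not to be in training.

```python
def mean_nll(token_logprobs: list[float]) -> float:
    """Average negative log-likelihood (the model's loss) over a passage's tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def loss_mia_predict(token_logprobs: list[float], threshold: float) -> bool:
    """Loss-based MIA: call the passage 'seen in training' if its loss is below the threshold."""
    return mean_nll(token_logprobs) < threshold


# Toy per-token log-probabilities (made up for illustration).
familiar = [-0.1, -0.2, -0.3]      # model predicts each token confidently
unfamiliar = [-2.0, -3.0, -2.5]    # model is surprised by each token
print(loss_mia_predict(familiar, threshold=1.0))    # True
print(loss_mia_predict(unfamiliar, threshold=1.0))  # False
```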

Laundering severs exactly this link. What the model trained on is the rewrite, not the original. So even when you feed it the original, the model finds nothing particularly familiar about it — its response is no different from one to a sentence never used in training. The rights holder owns only the original, yet the model carries no direct trace of that original inside it.

The numbers are brutal. Train Llama-2-7B only on a lyrical rewrite of Wikipedia text, then try to detect it using the original Wikipedia, and the AUC sits between 0.54 and 0.60. An AUC of 0.5 is a coin flip — a random guess carrying no information. Standard MIA has effectively stopped working. And the whole thing happens quietly: the training data is never published and the rewriting is opaque, so the rights holder can hardly even know their data was laundered.
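To see why 0.5 means "no information," it helps to recall what AUC measures. The sketch below uses the pairwise (Mann-Whitney) definition. It runs in O(n·m) time and is meant for illustration only. For large evaluations, a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.

```python
def auc(member_scores: list[float], nonmember_scores: list[float]) -> float:
    """ROC AUC as a pairwise win rate: the probability that a randomly chosen
    member scores higher than a randomly chosen non-member (ties count half).

    Scores should be membership evidence where higher means 'more familiar',
    for example the negative loss of each passage.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


print(auc([3.0, 4.0], [1.0, 2.0]))            # 1.0: perfect separation
print(auc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.5: no information, a coin flip
```

After laundering, the originals of trained-on passages score no differently from genuinely unseen text. That is the second case, which is why the AUC drifts toward 0.5.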

[Chart: Detection accuracy comparison (Loss AUC, random baseline 0.5). Standard MIA after laundering: 0.54; SDR recovery on lyrical Wikipedia: 75.5%; SDR recovery on DeepSeek-v3 laundering: 0.81.]
▲ Detection accuracy (AUC) before and after laundering — SDR restores ~20 percentage points | Pebblous original diagram (arXiv:2604.01904 figures reinterpreted)

Why it matters: Detection stands on the assumption that "the model is more familiar with the original." Laundering shifts that familiarity over to the rewrite, so a question asked with the original gets no signal back at all.

3

SDR: Tracing the Lost Signal Back

The idea behind SDR (Synthesis Data Reversion) is a simple inversion. If the rights holder has only the original, then enlist an auxiliary LLM to synthesize, in reverse, "the rewrite the model most likely actually saw." Instead of asking with the original as-is, you rebuild it into a rewritten form the model would find familiar and feed that in — and the lost difference in confidence comes back to life.

3.1 A Two-Stage Pipeline

The first stage is target identification. You rewrite the original into each of the 23 language registers, feed the opening (prefix) of each rewrite into the model, and measure how confident the model is, i.e., how low its loss is, as it continues. You then keep the five registers that draw the strongest response, narrowing the pool to the style candidates most likely used in the laundering.

The second stage is fine-grained inference. Within those five registers, an auxiliary model produces rewrites, and the process of extracting common patterns from the target model's responses is repeated about ten times. Each round refines the rewrite to sit closer to the original training data, until detection performance stops improving.
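The control flow of the two stages can be sketched as follows. This is a minimal sketch under several assumptions, not the paper's implementation:

- `rewrite` and `loss` are hypothetical hooks, standing in for an auxiliary-LLM call and a target-model scoring call.
- `top_k=5` and `max_rounds=10` follow the article.
- `prefix_chars` (a crude character-level stand-in for "the opening") and `tol` are our own choices.
- Stage 2's "extract common patterns" step is replaced by a plain greedy search that keeps the lowest-loss rewrite each round.

```python
from typing import Callable

# Hypothetical hooks: neither is a real library API.
RewriteFn = Callable[[str, str], str]  # (text, register) -> restyled text from an aux LLM
LossFn = Callable[[str], float]        # text -> target model's average loss (lower = more familiar)


def sdr_sketch(
    original: str,
    registers: list[str],
    rewrite: RewriteFn,
    loss: LossFn,
    top_k: int = 5,           # registers kept after stage 1 (per the article)
    max_rounds: int = 10,     # refinement rounds in stage 2 (per the article)
    prefix_chars: int = 200,  # assumption: crude character-level "opening" for stage 1
    tol: float = 1e-3,        # assumption: minimum loss drop that counts as improvement
) -> tuple[list[str], str, float]:
    """Return (candidate registers, best reconstructed rewrite, its loss)."""
    # Stage 1: target identification. Restyle the original into every register,
    # score only the opening of each rewrite, keep the lowest-loss registers.
    ranked = sorted(registers, key=lambda r: loss(rewrite(original, r)[:prefix_chars]))
    candidates = ranked[:top_k]

    # Stage 2: fine-grained inference. Greedy stand-in for the paper's pattern
    # extraction: restyle the current best guess in each candidate register,
    # keep the lowest-loss version, stop once the loss no longer drops.
    best_text, best_loss = original, loss(original)
    for _ in range(max_rounds):
        scored = [(loss(p), p) for p in (rewrite(best_text, r) for r in candidates)]
        challenger_loss, challenger = min(scored, key=lambda pair: pair[0])
        if best_loss - challenger_loss < tol:
            break
        best_text, best_loss = challenger, challenger_loss
    return candidates, best_text, best_loss
```

The returned loss can serve as the membership score, where lower means more likely trained on. It can be compared against a threshold or fed into an AUC evaluation over many passages.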

[Diagram: SDR two-stage pipeline. Stage 1, target identification: rewrite the original into each of 23 registers, feed each rewrite prefix and measure model loss, select the top 5 registers. Stage 2, fine-grained inference: generate rewrites within the top 5 registers, extract common patterns from the model's responses, repeat about 10 times until detection converges. Detection AUC recovered from 62.7% to 75.5%.]
▲ How SDR (Synthesis Data Reversion) works across two stages | Pebblous original diagram (arXiv:2604.01904 Fig. 2 reinterpreted)

3.2 The Numbers That Came Back

The results are clear. On lyrical Wikipedia laundering, SDR lifted a Loss AUC that standard methods left stuck at 62.7% up to 75.5%, and on the Min-K metric it climbed from 64.2% to 73.6%. Under laundering that mixed several styles (Mixed Prompt), Recall AUC recovered from 0.559 to 0.757. Even when the data was laundered with a strong frontier model like DeepSeek-v3, Loss AUC rose from 0.65 to 0.81. Conversely, on clean data that was never laundered, the AUC stayed near 0.5, so false accusations of "this data was trained on" were rare.
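This sketch assumes the "Min-K" metric refers to the Min-K% Prob score from the pretraining-data detection literature. That score looks only at a passage's least likely tokens rather than averaging over all of them. The sketch works over pre-computed per-token log-probabilities. The default k = 0.2 is an assumption, not a value taken from this article.

```python
def min_k_percent_score(token_logprobs: list[float], k: float = 0.2) -> float:
    """Min-K% Prob: mean log-probability of the k fraction of least likely tokens.

    Higher (closer to 0) suggests the passage was seen in training, because even
    its hardest tokens were predicted confidently.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    n = max(1, int(len(token_logprobs) * k))  # always keep at least one token
    lowest = sorted(token_logprobs)[:n]       # the n most surprising tokens
    return sum(lowest) / n


# Toy log-probs for a 10-token passage (made up): with k = 0.2 the two
# lowest values, -4.0 and -3.0, are averaged.
logprobs = [-0.1, -0.2, -3.0, -0.3, -4.0, -0.1, -0.2, -0.5, -0.4, -2.0]
print(min_k_percent_score(logprobs))  # -3.5
```

Focusing on the hardest tokens makes the score less sensitive to easy, high-frequency words. Those words look "familiar" to almost any model whether or not the passage was trained on.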

If SDR is detection after the fact, a complementary approach intervenes at training time instead. DebugLM teaches a model the provenance tags of its data while training, so that later it can trace at runtime which data a given response came from. When problem data is found, it allows targeted correction without full retraining. The catch is that you have to design the training that way from the start, so it does not apply to models already deployed. Against models already out in the world, detection of the SDR variety — after the fact — is what you are left with.

In one line: by feeding the original back as a rewrite, SDR recovers roughly 9 to 20 percentage points of AUC, depending on the metric and setting. It is evidence that laundering did not end detection and, at the same time, a boundary line showing how far that recovery reaches.

4

The Arms Race and the State of Data Sovereignty

SDR's limit sits in the same place as its strength. Because it sweeps the 23 registers to narrow down the rewrite style, detection wobbles the moment laundering steps outside those 23 categories. A rewrite produced by pseudo-translation through a low-resource language, for instance, is something this taxonomy struggles to catch. The more decisive problem is the one the paper itself concedes: once an attacker learns how SDR detects, they can simply move to a rewrite strategy beyond that boundary.

So the real shape of this fight is not a single match but an arms race. Laundering erases the signal, SDR restores it; SDR draws a boundary, the attacker steps outside it; and a new method gives chase again. Detection and evasion keep pushing each other, endlessly shifting ground. It is not a structure in which either side wins for good.

[Diagram: The detection/evasion arms race cycle. Data laundering (style rewrite, then LLM training) → MIA probe on the original fails (AUC 0.54, signal gone) → SDR reverse-traces (AUC restored to 75.5%) → attacker adapts with a new evasion strategy, rewriting outside the 23 registers → launder again.]
▲ The endless cycle of detection and evasion pushing each other | Pebblous original diagram (paper §5 limitations discussion reinterpreted)

Regulation is adding pressure in the same direction. The EU AI Act began requiring disclosure of training-data sources and composition when its general-purpose AI (GPAI) provisions took effect in August 2025. In the United States, copyright suits targeting Anthropic, Meta, and OpenAI are in full swing. Even as courts lean toward "training is fair use," it is unlikely that disguising illegally collected data behind synthetic rewrites will be excused as well. Data of unclear origin is steadily becoming a risk asset.

The signal for anyone handling data is plain here: sovereignty is not secured by a single mark saying "I own this data." The real asset is whether you can follow, all the way through, which model your data flowed into and in what transformed shape. Clean proof of origin is only a starting point; the actual substance of data sovereignty is the ability to trace the trajectory of the transformations that follow.

To close: Data sovereignty is not a stamp you press once but the power to follow a transformation all the way to the end. SDR pushed that tracing one step forward — but the next step has yet another rewrite waiting.

R

References

Academic Papers