Executive Summary
On September 11, 2026, Korea's amended Personal Information Protection Act (PIPA) takes effect. The headline is tougher enforcement — the cap on administrative fines rises from a share of related revenue to 10% of total revenue. But the change that matters for Pebblous readers sits elsewhere. Liability for a data breach has left the security manager's desk and become an agenda item for the CEO and the board, and the law has begun, for the first time, to spell out a concrete procedure for using data lawfully in the age of AI.
The amendment's center of gravity is not bigger punishment. Through public infrastructure for safely analyzing pseudonymized data, a pseudonymization exemption for AI training, and guidance that splits generative AI into service-based, off-the-shelf, and self-developed tiers so that risk can be managed step by step, the regulation is paving a path that says not "don't use it" but "use it this way and it's lawful." In return, companies inherit a duty to prove the origin and processing history of their data.
This report follows how the amendment, on the eve of taking effect in September, lifts data governance from a CPO operational task to a CEO-level agenda, and how that shift changes the very definition of "data quality." To put the conclusion up front: AI-Ready data is no longer a matter of quality and structure alone. It now means data that also carries a legal basis for use and a record of traceability to prove it.
| Key figure | What it is | Why it matters |
|---|---|---|
| 3% → 10% | Penalty cap | Of total revenue. Exceeds GDPR's 4% and the EU AI Act's 7%; the highest globally |
| $431M | Record fine (KRW 624.6B) | Coupang breach: 37.55M people affected, a preview of tougher enforcement |
| 10 million | Punitive-fine threshold | Scale of harm, repeat within 3 years, or non-compliance as aggravating factors |
| 70% | Orgs with ≤1 privacy staff | The duty rose to the board, but the hands to carry it out are still empty |
Where the Liability Moved
When a data breach happened, the first person called in used to be the head of security or the Chief Privacy Officer (CPO). The weight of responsibility stayed on the operational line, and executives did little more than read out the apology statement. The amended PIPA taking effect in September changes that arrangement. It lifts ultimate accountability for safeguarding obligations to the CEO and senior management, so that a data incident is treated not as "an employee's mistake" but as "a failure of governance" (Boannews).
The signal of the shift is legible in how the penalties are designed. Once a fine is calculated against "total revenue" rather than the revenue directly tied to the violation, that number can no longer be absorbed into one department's budget line. An amount that lands directly on the company's bottom line naturally becomes a matter for the board. This is the moment data governance is reclassified from "a cost-center risk" to "a variable in enterprise value."
The point is not the size of the punishment but the fact that the coordinates of accountability have moved. Once how data is collected, stored, and used becomes a question the CEO has to answer, data governance is no longer a year-end checklist item but a standing management agenda.
This elevation also tracks the global current. It runs in the same direction as GDPR requiring the independence and direct reporting line of the Data Protection Officer (DPO), and the EU AI Act naming governance of high-risk AI systems as a management responsibility. Korea's amendment adds to this the strongest financial signal of all — "10% of total revenue" — making it impossible for executives to look away from a data problem.
Why 3% Jumped to 10%
There is a backstory to why the penalty cap more than tripled. Under the old regime, the fines actually levied, divided by the number of personal-data records breached, came to roughly $0.70 (KRW 1,019) per record. When a person's name, contact details, and payment history are priced as a $0.70 risk, paying the fine after the fact is a more rational choice than investing in protection beforehand. Yet over the same period, the volume of personal data leaked surged nearly thirteenfold. The price per record stayed pinned at $0.70 while the number of records at risk exploded. That is the quantitative basis for the diagnosis that the penalties had lost their deterrent force, and it is the answer to the "why 10%" question.
The preview of tougher enforcement has already arrived. The KRW 624.6 billion (~$431M) fine the Personal Information Protection Commission (PIPC) imposed on Coupang was the largest on record, in a case that affected 37.55 million people. The Commission attributed the cause not to some elaborate hack but to "a lack of basic safety-management systems." Under the strengthened law, a penalty of this magnitude becomes not an exception but a baseline.
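The contrast with the old regime can be made concrete using only the figures quoted above. The sketch below is plain arithmetic on those numbers. Two assumptions are worth flagging: the exchange rate is inferred from the article's own KRW/USD pairs (KRW 624.6B against roughly $431M) and is not an official rate, and one affected person is treated as one breached record, which is a simplification.

```python
# Figures quoted in the text above.
COUPANG_FINE_KRW = 624.6e9        # KRW 624.6 billion
COUPANG_PEOPLE = 37.55e6          # 37.55 million people affected
OLD_FINE_PER_RECORD_KRW = 1_019   # average per record under the old regime

# Assumption: rate implied by the article's own conversions, not an official rate.
KRW_PER_USD = 1_450

fine_per_person_krw = COUPANG_FINE_KRW / COUPANG_PEOPLE
fine_per_person_usd = fine_per_person_krw / KRW_PER_USD
multiple_of_old = fine_per_person_krw / OLD_FINE_PER_RECORD_KRW

print(f"Fine per affected person: KRW {fine_per_person_krw:,.0f} (~${fine_per_person_usd:.2f})")
print(f"Multiple of the old per-record average: {multiple_of_old:.1f}x")
```

On these inputs the Coupang fine works out to roughly KRW 16,600 per affected person, about sixteen times the old per-record average.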
That said, the 10% does not apply automatically. Punitive-level fines are designed to trigger when certain aggravating factors are present. The representative cases: harm affecting more than 10 million people, a violation repeated within three years, or failure to comply with a corrective order. Conversely, a company that can prove it invested in safeguards in advance can have its fine reduced by up to 40%. The penalty design itself is built to act not as "punishment" but as "an incentive that draws investment."
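The interaction between the cap and the reduction is simple arithmetic. The following is a minimal sketch, not the statutory calculation: it assumes that any one aggravating factor unlocks the 10% cap, that non-aggravated cases remain under the earlier 3% cap, and that the reduction is applied as a flat percentage of the capped amount. The detailed rules (PIPA Article 64-2 and its subordinate regulations) are not reproduced here, and the function names, parameters, and example revenue are hypothetical.

```python
from dataclasses import dataclass

STANDARD_CAP = 0.03   # earlier cap cited in the text (assumed to remain the default)
PUNITIVE_CAP = 0.10   # amended cap cited in the text
MAX_REDUCTION = 0.40  # "up to 40%" for proven prior investment in safeguards

@dataclass
class Violation:
    people_affected: int
    repeated_within_3_years: bool
    ignored_corrective_order: bool

def applicable_cap(v: Violation) -> float:
    """Assumption: any single aggravating factor unlocks the punitive cap."""
    aggravated = (
        v.people_affected > 10_000_000
        or v.repeated_within_3_years
        or v.ignored_corrective_order
    )
    return PUNITIVE_CAP if aggravated else STANDARD_CAP

def max_exposure(total_revenue: float, v: Violation, reduction: float = 0.0) -> float:
    """Upper bound on the fine after a safeguard-investment reduction."""
    if not 0.0 <= reduction <= MAX_REDUCTION:
        raise ValueError("reduction must be between 0 and 0.40")
    return total_revenue * applicable_cap(v) * (1.0 - reduction)

# Hypothetical company with KRW 1 trillion in total revenue.
revenue = 1_000_000_000_000
breach = Violation(people_affected=12_000_000,
                   repeated_within_3_years=False,
                   ignored_corrective_order=False)
worst_case = max_exposure(revenue, breach)
with_safeguards = max_exposure(revenue, breach, reduction=0.40)
print(f"Worst case: KRW {worst_case:,.0f}; with full reduction: KRW {with_safeguards:,.0f}")
```

Under these assumptions the gap between the two results is the amount that documented prior investment is worth at the ceiling, which is the incentive the paragraph above describes.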
Global comparison — Korea sits highest
By international standards, the amendment's cap ranks among the most aggressive. The table below compares the penalty caps of major regimes on a total-revenue basis. Korea's 10% surpasses both the EU AI Act's 7% and GDPR's 4%.
| Regulation | Penalty cap (% of total revenue) | Nature of trigger |
|---|---|---|
| Korea, amended PIPA | 10% | Aggravated by scale of harm, repetition, non-compliance |
| EU AI Act | 7% | Violation of prohibited AI practices |
| GDPR | 4% | Serious breach of processing principles |
Look only at the cap, and Korea is the most aggressive. Read the triggers alongside it, and the picture changes. All three regimes designed the path to the "worst case" narrowly, and all recognize prior investment and cooperation as grounds for reduction. The regulatory message is consistent: the scariest number can be avoided, and the cost of avoiding it is, precisely, investment in data governance.
Training AI Lawfully on Pseudonymized Data
The penalties took the headline, but for data practitioners the more consequential provision is the special exemption for processing pseudonymized data. Article 28-2 of PIPA permits the processing of pseudonymized data without the data subject's consent for the purposes of statistical compilation, scientific research, and archiving in the public interest. And the interpretation taking hold is that "AI model training" can secure its legal basis through the "scientific research" route among these. In effect, the law has explicitly opened a lawful data channel for AI training.
That channel, however, has a clear boundary. The pseudonymization exemption applies to the "research and analysis" stage; it does not carry over to a stage where the output is put into live service to re-identify a specific individual or treat them differently. Training may be lawful, but operation requires its own separate legal basis. A well-known case that ignored this boundary was the privacy controversy around a chatbot service, and that lesson is reflected in today's stage-by-stage approach.
Even with the same pseudonymized data, "analyzing it for research" and "operating it as a service" are legally distinct acts. If you do not separate and document the legal basis for the training stage and the service stage when designing an AI pipeline, a project that began lawfully can flip to unlawful at the moment of deployment.
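One way to keep the two stages from blurring together is to record a legal basis per stage and to refuse any stage that has none on file. This is a minimal sketch: the class and field names are hypothetical, and the list of bases is illustrative rather than a statement of everything PIPA permits.

```python
from dataclasses import dataclass, field
from enum import Enum

class Stage(Enum):
    TRAINING = "training"
    SERVICE = "service"

class LegalBasis(Enum):
    CONSENT = "consent"
    PSEUDONYMIZATION_EXEMPTION = "pseudonymization_exemption"  # Article 28-2 route
    CONTRACT = "contract"

@dataclass
class DatasetRecord:
    name: str
    # Each stage needs its own documented basis; nothing is inherited.
    basis_by_stage: dict[Stage, LegalBasis] = field(default_factory=dict)

def check_stage(ds: DatasetRecord, stage: Stage) -> LegalBasis:
    """Return the documented basis for a stage, or fail loudly."""
    basis = ds.basis_by_stage.get(stage)
    if basis is None:
        raise PermissionError(f"{ds.name}: no documented legal basis for {stage.value}")
    # Per the text, the exemption covers research and analysis, not live service.
    if stage is Stage.SERVICE and basis is LegalBasis.PSEUDONYMIZATION_EXEMPTION:
        raise PermissionError(f"{ds.name}: exemption does not extend to service operation")
    return basis

logs = DatasetRecord("support_chat_logs",
                     {Stage.TRAINING: LegalBasis.PSEUDONYMIZATION_EXEMPTION})
training_basis = check_stage(logs, Stage.TRAINING)  # documented, so this passes
try:
    check_stage(logs, Stage.SERVICE)
    service_allowed = True
except PermissionError:
    service_allowed = False  # deployment is blocked until a basis is recorded
```

The design choice is that a missing entry is an error rather than a default, so a project cannot drift from training into deployment without someone writing down why the second step is lawful.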
Here the data-quality problem follows immediately. If pseudonymization is insufficient and re-identification is possible, the exemption's protection disappears. The data itself must be able to prove which fields were pseudonymized or anonymized and how, and how high the re-identification risk is when combined with other sources. "Data you can use lawfully" and "data whose quality is verified" are increasingly becoming the same phrase.
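Re-identification risk can be given a rough number. The sketch below uses k-anonymity, which is one common measure and not necessarily the one a regulator would apply: k is the size of the smallest group of records that share the same combination of quasi-identifiers. A group of size 1 means someone is unique on those fields and could be linked to outside data. The field names and sample rows are invented for illustration.

```python
from collections import Counter

def _group_sizes(rows: list[dict], quasi_identifiers: list[str]) -> Counter:
    """Count how many records share each quasi-identifier combination."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)

def k_anonymity(rows: list[dict], quasi_identifiers: list[str]) -> int:
    """Size of the smallest group sharing the same quasi-identifier values."""
    if not rows:
        raise ValueError("rows must not be empty")
    return min(_group_sizes(rows, quasi_identifiers).values())

def unique_fraction(rows: list[dict], quasi_identifiers: list[str]) -> float:
    """Share of records that are unique on the quasi-identifiers."""
    if not rows:
        raise ValueError("rows must not be empty")
    groups = _group_sizes(rows, quasi_identifiers)
    singles = sum(size for size in groups.values() if size == 1)
    return singles / len(rows)

rows = [
    {"age_band": "30-39", "region": "Seoul", "plan": "basic"},
    {"age_band": "30-39", "region": "Seoul", "plan": "premium"},
    {"age_band": "30-39", "region": "Seoul", "plan": "basic"},
    {"age_band": "40-49", "region": "Busan", "plan": "basic"},
]
qi = ["age_band", "region"]
k = k_anonymity(rows, qi)
risk = unique_fraction(rows, qi)
print(f"k = {k}, unique fraction = {risk:.2f}")
```

In this sample the single Busan record makes k equal to 1, so the dataset as a whole would not count as safely pseudonymized on these two fields even though three of the four rows blend together.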
Which LLM You Use Decides the Risk
Ahead of the law's effective date, the PIPC used its "Guide to Personal Information Processing in Generative AI" to lay out how the weight of responsibility shifts depending on how a company adopts an LLM. Even within the same "generative AI adoption," pulling in an external API as-is and training your own model from scratch carry entirely different risk profiles. The guide divides this into three adoption types.
| Adoption type | Description | Core risk |
|---|---|---|
| Service-based (API calls) | Using a third party's generative-AI API as-is | Risk that personal data in input prompts is sent out and used for training; managing outsourcing and cross-border transfers |
| Off-the-shelf model | Taking an open or commercial model and fine-tuning it, or combining it with RAG, on your own data | Legal basis for the fine-tuning data; risk of personal data resurfacing in model outputs |
| Self-developed | Building a model in-house from pre-training onward | Duty to prove the collection lawfulness, pseudonymization, and provenance of the entire training dataset |
The guide's structure goes beyond a simple taxonomy. It crosses each type with the lifecycle stages of data collection, training, service provision, and disposal, and assigns different obligations within the same stage depending on whether you are a "model developer" or a "model user." As a result, the question a company has to answer sharpens from "do we use generative AI?" to "in which type, at which stage, and in which role do we use it?"
In practice, this becomes a decision checklist. If you use the service-based type, you have to check input filtering and outsourcing contracts so personal data does not flow into prompts; if you go self-developed, you have to be able to reconstruct the origin and pseudonymization history of every piece of data used in training. The moment you choose your adoption method, the size of your compliance burden is already determined.
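The checklist can be kept as data rather than prose. This minimal sketch maps each adoption type to the checks named in the table and paragraph above. The check names paraphrase this article, not the PIPC guide itself, and all identifiers are hypothetical.

```python
from enum import Enum

class AdoptionType(Enum):
    SERVICE_BASED = "service-based"
    OFF_THE_SHELF = "off-the-shelf"
    SELF_DEVELOPED = "self-developed"

# Checks derived from the "core risk" column of the table above.
REQUIRED_CHECKS = {
    AdoptionType.SERVICE_BASED: [
        "prompt_input_filtering",
        "outsourcing_contract_reviewed",
        "cross_border_transfer_reviewed",
    ],
    AdoptionType.OFF_THE_SHELF: [
        "fine_tuning_data_legal_basis",
        "output_leakage_tested",
    ],
    AdoptionType.SELF_DEVELOPED: [
        "collection_lawfulness_documented",
        "pseudonymization_history_recorded",
        "provenance_reconstructable",
    ],
}

def open_items(adoption: AdoptionType, completed: set[str]) -> list[str]:
    """Checks still outstanding for the chosen adoption type."""
    return [c for c in REQUIRED_CHECKS[adoption] if c not in completed]

todo = open_items(AdoptionType.SERVICE_BASED, {"prompt_input_filtering"})
print(todo)
```

A fuller version would key the checks by lifecycle stage and role as well, since the guide assigns different obligations along those dimensions too.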
The Infrastructure the State Is Paving
The other axis of the amendment is not regulation but support. To safely combine and analyze pseudonymized data, you need a controlled environment that processes the data without exporting it. The PIPC is investing in public analysis infrastructure for exactly this — about $2.0M (KRW 2.9B) for a cloud-based environment for using pseudonymized data (the "Innovation Zone"), and about $1.4M (KRW 2.0B) for technical-analysis functions that support pseudonymized-data combination. The state has begun to share part of the "cost of using data safely."
The whole picture is bigger. The PIPC's 2026 budget is about $50M (KRW 72.9B), up 9.1% from the prior year, and of that, roughly $9.2M (KRW 13.3B) is allocated to AI-related R&D. That allocation covers programs for safe-AI-use technology, alignment with global standards, training of specialists, and trustworthy AI. One hand tightens regulation while the other helps lawful use, and the two move together.
The design philosophy of the amendment is closer to "we'll build you a safe road to use data, so travel that road" than to "don't use it." The public infrastructure is the paving on that road. Once a standard path for lawful data use exists, the gap between companies ready to put their data on that path and those who are not becomes, itself, a gap in competitiveness.
Yet the state of readiness still lags the pace of regulation. By one survey, roughly 70% of organizations have one or fewer dedicated privacy staff. The duty rose to the board, but the hands to actually carry it out are still empty. That gap is the most direct engine for the growth of the data-governance market over the coming years. Forrester projects that the AI-governance solutions market will grow more than 30% a year to reach $15.8 billion by 2030.
Traceability Is Design, Not Forensics
When regulation demands "data trained lawfully," a question follows naturally: how do you prove it after the fact? The intuitive answer is to analyze the finished model and trace back whether a specific piece of data was used. Membership Inference Attacks (MIA) are exactly that kind of technique, statistically estimating whether a given record was included in a model's training.
But the academic verdict is sober. Membership inference cannot establish, to a standard of legal proof, that a specific piece of data was used in training. Its accuracy swings widely with data distribution and model architecture, and it carries both false positives and false negatives. In other words, opening up a model to prove lawfulness after the fact does not amount to reliable forensics. This is not a mere technical limitation but a fact that determines the direction of governance design.
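The error trade-off is easy to see in a toy setting. The sketch below is not an attack on a real model. It simulates per-example losses for training members and non-members as two overlapping normal distributions (the means and spread are invented for illustration) and applies the simplest loss-threshold test, guessing "member" whenever the loss is low. Wherever the threshold is set, some non-members are flagged and some members are missed.

```python
import random

def simulate_losses(n: int, mean: float, std: float, rng: random.Random) -> list[float]:
    """Draw synthetic per-example losses; real losses would come from a model."""
    return [max(0.0, rng.gauss(mean, std)) for _ in range(n)]

def threshold_attack(losses: list[float], threshold: float) -> list[bool]:
    """Guess 'member' when the loss falls below the threshold."""
    return [loss < threshold for loss in losses]

rng = random.Random(0)
# Assumption: members tend to have lower loss, but the distributions overlap.
member_losses = simulate_losses(5_000, mean=0.8, std=0.5, rng=rng)
nonmember_losses = simulate_losses(5_000, mean=1.2, std=0.5, rng=rng)

def rates(threshold: float) -> tuple[float, float]:
    """True-positive and false-positive rates of the threshold test."""
    tpr = sum(threshold_attack(member_losses, threshold)) / len(member_losses)
    fpr = sum(threshold_attack(nonmember_losses, threshold)) / len(nonmember_losses)
    return tpr, fpr

for threshold in (0.6, 1.0, 1.4):
    tpr, fpr = rates(threshold)
    print(f"threshold={threshold:.1f}  true-positive rate={tpr:.2f}  false-positive rate={fpr:.2f}")
```

Raising the threshold catches more true members but also accuses more non-members, and lowering it does the reverse. A finding of that kind is a statistical estimate about a record, which is a long way from documentary proof that it was or was not used.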
If you cannot pry open a model after the fact to prove lawfulness, then lawfulness has to be designed at the entrance where data flows in. "Traceability by design" — attaching origin, consent, pseudonymization, and scope of use to the data as metadata — is the only reliable path.
The implication of this shift is large. View traceability as "the ability to investigate after an incident," and logs and audits become central. View it as "the ability to prove the data's standing every time you use it," and provenance becomes a structure that must be designed at the very top of the data pipeline. The burden of proof the amendment demands points to the latter. Lawfulness becomes not a property that gets inspected but a property the data must carry from the moment it is born.
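A minimal sketch of what such metadata could look like: a provenance record created at ingestion, with a processing history in which each entry carries the hash of the previous one so that later edits to the history are detectable. This is one possible design under stated assumptions, not a format prescribed by the law or the PIPC, and every field and function name is hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

def _digest(payload: dict) -> str:
    """Stable SHA-256 over a JSON-serialised dict."""
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

@dataclass
class ProvenanceRecord:
    dataset_id: str
    source: str                 # where the data came from
    consent_ref: str            # pointer to the consent or exemption document
    permitted_scope: list[str]  # uses the legal basis covers
    history: list[dict] = field(default_factory=list)

    def append_step(self, action: str, detail: str) -> None:
        """Add a processing step linked to the hash of the previous step."""
        prev = self.history[-1]["hash"] if self.history else "genesis"
        entry = {"action": action, "detail": detail, "prev": prev}
        entry["hash"] = _digest(entry)  # hash covers action, detail, prev
        self.history.append(entry)

    def verify(self) -> bool:
        """Recompute the chain; an altered or reordered step breaks it."""
        prev = "genesis"
        for entry in self.history:
            body = {k: entry[k] for k in ("action", "detail", "prev")}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True

record = ProvenanceRecord("crm_2025_q4", "in-house CRM export",
                          "consent-form-v3", ["training"])
record.append_step("pseudonymize", "name and phone replaced with keyed hashes")
record.append_step("filter", "dropped records without marketing consent")
intact = record.verify()

record.history[0]["detail"] = "no changes"  # simulate tampering with the log
tampered_ok = record.verify()
```

A chain like this only detects edits by someone who cannot rewrite every later hash, so a real system would also need to anchor the latest hash somewhere the data team does not control.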
'Lawfulness' Enters Data Quality
Everything so far converges on one sentence: the amendment wedges "lawfulness" into the definition of "data quality." For a long time, data quality has been discussed along five axes — accuracy, completeness, consistency, timeliness, and structural integrity. After September, for data used in AI in Korea, a sixth axis is added. Is there a legal basis to use this data, and can you prove it?
This sixth axis differs in nature from the first five. Accuracy and consistency are properties internal to the data, but lawfulness is a property of the data's relationship with the world: from whom it came, under what consent, and within what scope. That is why lawfulness is hard to measure after the fact and cannot be reconstructed if it was not recorded at the entrance, and why traceability becomes a precondition of data quality.
In practice, this forces a rewrite of what "AI-Ready" means. Even data cleaned and structured nicely for a model is not AI-Ready in post-September Korea if it cannot prove its origin, consent, and pseudonymization history. Data that is well-ordered but cannot prove its standing is not an asset but a latent liability. When regulation changes the definition of data quality, the priorities of companies that handle data change with it.
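The difference between a sixth score and a gate can be made explicit. In this sketch, which illustrates the argument above and is not an established scoring standard, the five traditional axes are averaged while lawfulness is treated as a precondition: without proof, the data is not AI-Ready regardless of its scores. The threshold and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class QualityReport:
    accuracy: float
    completeness: float
    consistency: float
    timeliness: float
    structural_integrity: float
    has_legal_basis: bool       # a documented basis exists for this use
    provenance_provable: bool   # origin, consent, pseudonymization are on record

def technical_score(r: QualityReport) -> float:
    """Mean of the five traditional axes, each expected in [0, 1]."""
    axes = [r.accuracy, r.completeness, r.consistency,
            r.timeliness, r.structural_integrity]
    return sum(axes) / len(axes)

def is_ai_ready(r: QualityReport, min_score: float = 0.8) -> bool:
    """Lawfulness gates the result; it is never averaged in."""
    if not (r.has_legal_basis and r.provenance_provable):
        return False
    return technical_score(r) >= min_score

clean_but_unproven = QualityReport(0.99, 0.98, 0.97, 0.95, 0.99,
                                   has_legal_basis=True, provenance_provable=False)
clean_and_proven = QualityReport(0.99, 0.98, 0.97, 0.95, 0.99,
                                 has_legal_basis=True, provenance_provable=True)
print(is_ai_ready(clean_but_unproven), is_ai_ready(clean_and_proven))
```

Averaging lawfulness in with the other axes would let high technical scores compensate for a missing legal basis, which is exactly the outcome the paragraph above rules out.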
What to check before September
The time remaining before the law takes effect overlaps with the autumn strategy-planning cycle. These are the questions a data leader should be checking now.
- For the data we use in AI training, can we produce a document showing the legal basis (consent, pseudonymization exemption, contract) for each piece?
- Are the legal bases for the training stage and the service-operation stage recorded separately?
- Have we defined the scope of responsibility by generative-AI adoption type (service-based, off-the-shelf, self-developed)?
- Are each dataset's origin, consent, and scope of use attached as metadata at the entrance, or do we rely on after-the-fact estimation?
- When fines on a total-revenue basis become reality, is that risk being reported as a board agenda item?
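The first, second, and fourth questions above lend themselves to an automated inventory check. The sketch below assumes a dataset inventory kept as a list of dictionaries. The field names are hypothetical and would need to match whatever catalogue an organization actually maintains.

```python
REQUIRED_FIELDS = ("origin", "training_basis", "service_basis", "consent_ref", "scope")

def audit(inventory: list[dict]) -> dict[str, float]:
    """Share of datasets with each metadata field filled in."""
    if not inventory:
        raise ValueError("inventory must not be empty")
    total = len(inventory)
    return {
        f: sum(1 for ds in inventory if ds.get(f)) / total
        for f in REQUIRED_FIELDS
    }

def gaps(inventory: list[dict]) -> dict[str, list[str]]:
    """For each dataset with problems, the fields that are still missing."""
    missing = {
        ds["name"]: [f for f in REQUIRED_FIELDS if not ds.get(f)]
        for ds in inventory
    }
    return {name: fields for name, fields in missing.items() if fields}

# Hypothetical inventory entries.
inventory = [
    {"name": "crm_export", "origin": "CRM", "training_basis": "pseudonymization exemption",
     "service_basis": "", "consent_ref": "doc-17", "scope": "churn model"},
    {"name": "web_logs", "origin": "", "training_basis": "",
     "service_basis": "", "consent_ref": "", "scope": ""},
]
coverage = audit(inventory)
open_gaps = gaps(inventory)
print(coverage)
print(open_gaps)
```

A report like this does not establish that any recorded basis is legally sound. It only shows where nothing has been recorded at all, which is the cheapest gap to find before September.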
If you can answer these five questions with confidence, the September change is an opportunity, not a threat. The moment a standard path for lawful data use appears, the company that paved its stretch of that road first is the one that can run on it fastest.
References
Policy & Press
- 1. Boannews. "Amended PIPA takes effect in September: data-incident liability and tougher fines." boannews.com
- 2. Lawtimes. "The amended PIPA and AI data use: the pseudonymization exemption and corporate response." lawtimes.co.kr
- 3. Byline Network. "Personal data in the generative-AI era: risk and governance by adoption type." byline.network
- 4. Personal Information Protection Commission (PIPC). "Guide to Personal Information Processing in Generative AI" (2025). Three-way taxonomy (service-based / off-the-shelf / self-developed) and lifecycle-stage obligations.
- 5. PIPA Article 28-2 (special provisions on processing pseudonymized data) and Article 64-2 (calculation of administrative fines).
Academic
- 6. Shokri, R., Stronati, M., Song, C., & Shmatikov, V. (2017). "Membership Inference Attacks Against Machine Learning Models." IEEE S&P 2017. arXiv:1610.05820
- 7. Carlini, N., Chien, S., Nasr, M., Song, S., Terzis, A., & Tramèr, F. (2022). "Membership Inference Attacks From First Principles." IEEE S&P 2022. arXiv:2112.03570
- 8. Longpre, S., Mahari, R., Chen, A., et al. (2023). "The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI." NeurIPS 2024 D&B Track. arXiv:2310.16787
- 9. Dwork, C., & Roth, A. (2014). "The Algorithmic Foundations of Differential Privacy." Foundations and Trends in Theoretical Computer Science, 9(3–4).
Statistics & Comparative Regulation
- 10. PIPC. Resolution on the Coupang personal-data breach fine (KRW 624.6B / ~$431M, 37.55M people affected).
- 11. European Union. Regulation (EU) 2024/1689 (AI Act): penalty cap of 7% of total revenue.
- 12. European Union. General Data Protection Regulation (GDPR), Art. 83: penalty cap of 4% of total revenue.
- 13. Forrester Research. "AI Governance Solutions Market Forecast, 2024–2030." Forrester Research (2024). $15.8B market size by 2030, CAGR 30%+.