ISO/IEC 5259 is an international standard series for systematically managing the quality of data used in AI/ML systems. It defines core Quality Measures — including completeness, accuracy, consistency, and timeliness — and provides a framework for assuring AI model performance and reliability starting from the data layer. ISO/IEC 5259 compliance is a key feature supported in Pebblous' DataClinic and Data Greenhouse platforms.

The ISO/IEC 5259 series builds on two earlier standards: ISO/IEC 25012, which defines a data quality model, and ISO/IEC 25024, which defines how to measure it. Both address traditional structured databases from a metadata perspective. As AI advanced, they proved insufficient for unstructured data (text, images, audio) and for the varied requirements of model training tasks. ISO/IEC 5259 was created to bridge this gap. It adds ML-specific quality characteristics such as diversity, representativeness, similarity, and balance, so that data is evaluated not just for accuracy but for how suitable it is for AI model training.
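To make an ML-specific characteristic such as balance more concrete, here is a minimal sketch that scores how evenly labels are spread across classes. It assumes normalized Shannon entropy as a stand-in for a balance score. The function name `class_balance` is hypothetical, and the formula is a common illustrative choice, not the measure defined in ISO/IEC 5259-2 or the method used by DataClinic.

```python
import math
from collections import Counter


def class_balance(labels):
    """Return the normalized Shannon entropy of a label list, in [0, 1].

    1.0 means every class has the same number of samples; values near 0
    mean a single class dominates. Illustrative assumption only: this is
    not the balance formula defined in ISO/IEC 5259-2.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        # With zero or one class there is nothing to balance.
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Dividing by log(k) rescales the entropy so k uniform classes score 1.0.
    return entropy / math.log(len(counts))


if __name__ == "__main__":
    balanced = ["cat"] * 50 + ["dog"] * 50
    skewed = ["cat"] * 95 + ["dog"] * 5
    print(f"balanced: {class_balance(balanced):.3f}")
    print(f"skewed:   {class_balance(skewed):.3f}")
```

A score like this only summarizes label counts. Characteristics such as diversity or representativeness also depend on the content of the samples, so they generally need feature-level analysis rather than counting.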

Pebblous is building deep expertise in data quality by connecting international standards with proprietary technology. DataLens, originally DNN-based, has evolved into a neuro-symbolic engine and is now expanding to cover multi-modal datasets and semantically intensive regulatory domains. Combined with Data Imaging, it automatically computes the Quality Measures required by the standard. Pebblous is now applying Agentic AI to automate the full quality assessment process, from diagnosis to certification, with the aim of eliminating manual intervention. Most recently, Pebblous published both the theory of applying ISO/IEC 5259-2 to image datasets and independent evaluation results for three real-world datasets: ImageNet, WikiArt, and SpectralWaste.

Series Guide

ISO/IEC 5259-2: Data Quality Measures (QM) Cheat Sheet

Quick reference to the Quality Measures defined in ISO/IEC 5259-2. Ideal starting point for defining data quality requirements and diagnosing issues in AI/ML projects.

LLM Text Data Quality Assessment Guide

How to evaluate LLM text data quality using ISO/IEC 5259 standards. Covers methodologies and practical cases for the new paradigm of AI-era data quality assessment.

DataClinic × ISO/IEC 5259-2 Mapping Analysis (Summary)

1:1 technical mapping between ISO/IEC 5259-2 QMs and Pebblous DataClinic. Introduces completeness, similarity, and representativeness measurement through neuro-symbolic DataLens and Data Imaging.

DataClinic × ISO/IEC 5259-2 Mapping Analysis (Detailed)

Deep-dive into the quantitative mapping. Systematic analysis of how each QM category maps to DataClinic's automated measurement capabilities.

Image Dataset Quality Has Two Layers — ISO/IEC 5259 Applied to Images

Image data quality splits into pixel-level and task-level layers. A complete matrix of the 23 ISO/IEC 5259-2 QMs across Type A, B, and C datasets, with DataClinic support indicators and practical guidance for computer vision applications.

When Diagnostic Data Meets ISO 5259 — Three DataClinic Cases for Image Quality

DataClinic diagnoses of ImageNet, WikiArt, and SpectralWaste mapped to ISO/IEC 5259-2 QM codes. Covers Bal-ML, Div-ML, Rep-ML across three datasets with practical methods for unsupported items.

Data Quality Standardization & Global Certification Roadmap

Global certification roadmap including KOLAS accreditation strategy and patent-based technology moat. Covers how Pebblous aims to become the first accredited AI data quality certification body in Korea.

ISO/IEC 25024 Data Quality Measurement Practicum

Hands-on guide to implementing ISO/IEC 25024 data quality metrics in SQL. Learn the structured-data quality standard that forms the foundation of ISO/IEC 5259 through working code.

Extracting an Ontology from an ISO Standard: ISO/IEC 5259-2 Case Study

How to extract concepts, relations, and constraints from ISO/IEC 5259-2 and formalize them as an ontology. A step-by-step walkthrough of transforming a standard into a machine-readable knowledge structure.

CURK: Ontology-Based PDF Navigator

A practicum on structuring ISO/IEC 5259-2 standard documents as an ontology and navigating them via the CURK explorer. A new approach to navigating dense standard documents as a concept graph.

Grading AI's Textbook with ISO 5259

An independent ISO/IEC 5259-2:2024 evaluation of ImageNet's 1.4 million images. See what score the world's most influential benchmark dataset earns when held to international standards.

What Can We See When Art Becomes Data

An independent ISO/IEC 5259-2:2024 evaluation of WikiArt's 81,444 images. Reveals the structural quality challenges that emerge when an art dataset is used for AI training.

When ISO 5259 Diagnoses a Recycling Dataset

An independent evaluation of the SpectralWaste recycling waste image dataset (2,794 images, 6 classes) against ISO/IEC 5259-2. Shows what it looks like when a real industrial dataset passes through the standard.