ISO/IEC 5259 is an international standard series for systematically managing the quality of data used in AI/ML systems. It defines core Quality Measures (QMs), including completeness, accuracy, consistency, and timeliness, and provides a framework for assuring AI model performance and reliability starting from the data layer. Support for ISO/IEC 5259 compliance is a key feature of Pebblous' DataClinic and Data Greenhouse platforms.
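To make the idea of a quality measure concrete, here is a small Python sketch that computes two completeness ratios over tabular records. It is an illustrative assumption, not the formula defined in ISO/IEC 5259-2. `value_completeness` and `record_completeness` are hypothetical helper names, and "missing" is taken to mean an absent key or a `None` value.

```python
from typing import Any, Mapping, Sequence

# Illustrative completeness ratios (hypothetical helpers, not the normative
# ISO/IEC 5259-2 definitions). A value counts as missing if its key is
# absent or maps to None.


def _is_present(record: Mapping[str, Any], attribute: str) -> bool:
    return record.get(attribute) is not None


def value_completeness(records: Sequence[Mapping[str, Any]],
                       attributes: Sequence[str]) -> float:
    """Share of (record, attribute) cells that hold a value."""
    total = len(records) * len(attributes)
    if total == 0:
        raise ValueError("need at least one record and one attribute")
    filled = sum(_is_present(r, a) for r in records for a in attributes)
    return filled / total


def record_completeness(records: Sequence[Mapping[str, Any]],
                        attributes: Sequence[str]) -> float:
    """Share of records in which every listed attribute holds a value."""
    if not records:
        raise ValueError("need at least one record")
    complete = sum(all(_is_present(r, a) for a in attributes) for r in records)
    return complete / len(records)


records = [
    {"id": 1, "label": "cat", "source": "web"},
    {"id": 2, "label": None, "source": "web"},
    {"id": 3, "label": "dog"},
]
attributes = ["id", "label", "source"]
print(value_completeness(records, attributes))   # 7 of 9 cells filled
print(record_completeness(records, attributes))  # 1 of 3 records fully complete
```

It helps to report both numbers because they can diverge. A dataset can have most of its cells filled while only a few records are fully complete.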
The ISO/IEC 5259 series builds upon two earlier standards, ISO/IEC 25012 and ISO/IEC 25024, which defined data quality models for traditional structured databases from a metadata perspective. As AI advanced, however, these standards proved insufficient for unstructured data (text, images, audio) and for the diverse requirements of model training. ISO/IEC 5259 was developed to close this gap. It adds ML-specific quality characteristics such as diversity, representativeness, similarity, and balance. Data is therefore evaluated not only for whether it is accurate, but also for how suitable it is for training AI models.
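Class labels are the easiest place to see the ML-specific characteristics at work. The Python sketch below uses two common proxies. The first is an imbalance ratio, the smallest class count divided by the largest. The second is Shannon entropy normalized by the logarithm of the number of classes. Both are chosen for illustration only. They are not the Bal-ML or Div-ML formulas defined in ISO/IEC 5259-2, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable

# Illustrative label-distribution proxies (hypothetical helpers, not the
# normative ISO/IEC 5259-2 Bal-ML / Div-ML definitions).


def imbalance_ratio(labels: Iterable[Hashable]) -> float:
    """Smallest class count / largest class count; 1.0 means perfectly balanced."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels given")
    return min(counts.values()) / max(counts.values())


def normalized_label_entropy(labels: Iterable[Hashable]) -> float:
    """Shannon entropy of the label distribution divided by log(k), in [0, 1]."""
    counts = Counter(labels)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no labels given")
    k = len(counts)
    if k == 1:
        return 0.0  # a single observed class carries no label diversity
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(k)


skewed = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
print(imbalance_ratio(skewed))           # 20 / 50 = 0.4
print(normalized_label_entropy(skewed))  # about 0.937
```

`Counter` only sees labels that actually occur, so a class with zero samples does not lower either score. A real assessment would pass the expected label set explicitly.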
Pebblous is building deep expertise in data quality by connecting international standards with proprietary technology. DataLens, originally DNN-based, has evolved into a neuro-symbolic engine and is now expanding to cover multi-modal datasets and semantically intensive regulatory domains. Combined with Data Imaging, it computes the Quality Measures required by the standard automatically. Pebblous is now applying Agentic AI with the goal of fully automating the quality assessment process, from diagnosis to certification, without manual intervention. Most recently, Pebblous published both a theoretical framework for applying ISO/IEC 5259-2 to image datasets and independent evaluation results for three real-world datasets: ImageNet, WikiArt, and SpectralWaste.
Quick reference to the Quality Measures defined in ISO/IEC 5259-2. Ideal starting point for defining data quality requirements and diagnosing issues in AI/ML projects.
How to evaluate LLM text data quality using the ISO/IEC 5259 series. Covers methodologies and practical cases for assessing data quality in the LLM era.
1:1 technical mapping between ISO/IEC 5259-2 QMs and Pebblous DataClinic. Introduces how completeness, similarity, and representativeness are measured with the neuro-symbolic DataLens engine and Data Imaging.
Deep dive into the quantitative mapping between ISO/IEC 5259-2 and DataClinic: a systematic analysis of how each QM category maps to DataClinic's automated measurement capabilities.
Image data quality splits into pixel-level and task-level layers. A complete ISO/IEC 5259-2 QM matrix across Type A, B, and C datasets with DataClinic support indicators.
DataClinic diagnoses of ImageNet, WikiArt, and SpectralWaste mapped to ISO/IEC 5259-2 QM codes. Covers Bal-ML, Div-ML, and Rep-ML across the three datasets, with practical methods for QMs that DataClinic does not support.
Global certification roadmap including KOLAS accreditation strategy and patent-based technology moat. Covers how Pebblous aims to become the first accredited AI data quality certification body in Korea.
Hands-on guide to implementing ISO/IEC 25024 data quality metrics in SQL. Learn, through working code, the structured-data quality standard that forms the foundation of ISO/IEC 5259.
How to extract concepts, relations, and constraints from ISO/IEC 5259-2 and formalize them as an ontology. A step-by-step walkthrough of transforming a standard into a machine-readable knowledge structure.
A practicum on structuring ISO/IEC 5259-2 standard documents as an ontology and exploring them with the CURK explorer. A new approach to reading dense standard documents as a concept graph.
Image data quality splits into pixel-level and task-level layers. A complete framework of 23 QMs from ISO/IEC 5259-2, organized by Type A, B, and C, with practical guidance for computer vision applications.
DataClinic diagnoses of ImageNet, WikiArt, and SpectralWaste mapped to ISO/IEC 5259-2 QM codes. A case-driven guide to bridging theory and real dataset evaluation.
An independent ISO/IEC 5259-2:2024 evaluation of ImageNet's 1.4 million images. See what score the world's most influential benchmark dataset earns when held to international standards.
An independent ISO/IEC 5259-2:2024 evaluation of WikiArt's 81,444 images. Reveals the structural quality challenges that emerge when an art dataset is used for AI training.
An independent evaluation of the SpectralWaste recycling waste image dataset (2,794 images, 6 classes) against ISO/IEC 5259-2. Shows how a real industrial dataset fares when measured against the standard.
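Similarity is one of the ML-specific characteristics, and it is also among the items the DataClinic mapping measures. One generic way to reason about it is to compare feature vectors pairwise. The Python sketch below assumes that each image has already been mapped to an embedding vector. How DataLens or Data Imaging derive features is not described here, so the vectors are placeholders. Pairs whose cosine similarity reaches a hypothetical threshold of 0.95 are flagged as near-duplicates.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def near_duplicate_pairs(embeddings: Sequence[Vector],
                         threshold: float = 0.95) -> List[Tuple[int, int]]:
    """Index pairs (i, j), i < j, whose cosine similarity is >= threshold.

    The threshold value is a hypothetical choice for illustration.
    """
    return [
        (i, j)
        for (i, u), (j, v) in combinations(enumerate(embeddings), 2)
        if cosine_similarity(u, v) >= threshold
    ]


# Placeholder 2-D "embeddings"; real image features are far higher-dimensional.
embeddings = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
print(near_duplicate_pairs(embeddings))  # [(0, 1)]
```

The all-pairs loop is O(n²) and only suits small samples. For a dataset the size of ImageNet, an approximate nearest-neighbour index would be the usual substitute.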
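The ISO/IEC 25024 guide expresses structured-data measures in SQL. The following Python sketch shows the general flavour using the standard-library `sqlite3` module. It computes a referential-consistency ratio: the share of rows in a hypothetical `samples` table whose `label` appears in a hypothetical `classes` vocabulary table. The schema, the names, and the measure itself are assumptions for illustration. They are not measures quoted from ISO/IEC 25024.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: every sample's label should come from the `classes` vocabulary.
SCHEMA = """
CREATE TABLE classes (name TEXT PRIMARY KEY);
CREATE TABLE samples (id INTEGER PRIMARY KEY, label TEXT);
"""

# Share of samples whose label matches a known class. NULL labels never match,
# so they count as inconsistent. Because classes.name is a primary key, the
# LEFT JOIN yields exactly one row per sample.
CONSISTENCY_SQL = """
SELECT CAST(SUM(CASE WHEN c.name IS NOT NULL THEN 1 ELSE 0 END) AS REAL) / COUNT(*)
FROM samples AS s
LEFT JOIN classes AS c ON s.label = c.name
"""


def label_consistency(conn: sqlite3.Connection) -> Optional[float]:
    """Return the consistency ratio, or None when `samples` is empty."""
    (ratio,) = conn.execute(CONSISTENCY_SQL).fetchone()
    return ratio


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO classes (name) VALUES (?)", [("cat",), ("dog",)])
conn.executemany(
    "INSERT INTO samples (label) VALUES (?)",
    [("cat",), ("dog",), ("dgo",), (None,)],  # one typo, one missing label
)
print(label_consistency(conn))  # 2 of 4 rows match -> 0.5
```

On an empty `samples` table, SQLite returns NULL for both the `SUM` and the division by zero. The function therefore yields `None` rather than raising an error.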
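The ontology walkthrough extracts concepts, relations, and constraints from ISO/IEC 5259-2. A minimal way to picture the result is a set of subject-predicate-object triples. The Python sketch below encodes a few relations stated on this page: the series builds on ISO/IEC 25012 and ISO/IEC 25024, and ISO/IEC 5259-2 defines the Bal-ML, Div-ML, and Rep-ML measures. It then checks one simple constraint. The predicate names are hypothetical, and the measure-to-characteristic links are inferred from the QM code prefixes. A production ontology would more likely use RDF/OWL tooling than plain tuples.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical predicate names; relations taken from this page's text.
TRIPLES: List[Triple] = [
    Triple("ISO/IEC 5259", "buildsUpon", "ISO/IEC 25012"),
    Triple("ISO/IEC 5259", "buildsUpon", "ISO/IEC 25024"),
    Triple("ISO/IEC 5259-2", "definesMeasure", "Bal-ML"),
    Triple("ISO/IEC 5259-2", "definesMeasure", "Div-ML"),
    Triple("ISO/IEC 5259-2", "definesMeasure", "Rep-ML"),
    # Assumed from the QM code prefixes, not quoted from the standard.
    Triple("Bal-ML", "measuresCharacteristic", "balance"),
    Triple("Div-ML", "measuresCharacteristic", "diversity"),
    Triple("Rep-ML", "measuresCharacteristic", "representativeness"),
]


def objects(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """All objects linked to `subject` by `predicate`, in insertion order."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


def measures_without_characteristic(triples: List[Triple]) -> List[str]:
    """Constraint check: every defined measure should measure some characteristic."""
    defined = {t.obj for t in triples if t.predicate == "definesMeasure"}
    linked = {t.subject for t in triples if t.predicate == "measuresCharacteristic"}
    return sorted(defined - linked)


print(objects(TRIPLES, "ISO/IEC 5259-2", "definesMeasure"))  # ['Bal-ML', 'Div-ML', 'Rep-ML']
print(measures_without_characteristic(TRIPLES))               # []
```

Keeping constraints as queries over the triples means that a newly added measure without a linked characteristic is reported immediately, instead of being discovered while reading the document.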