SpectralWaste ISO/IEC 5259-2 Quality Evaluation Report

Evaluation Methodology

This report re-interprets DataClinic's three-level diagnostic results (Level I / II / III) through the ISO/IEC 5259-2:2024 Quality Measure (QM) framework as an independent evaluation. The metrics, charts, and outlier samples measured by DataClinic are mapped to each ISO QM's formal definition, with Pass / Fail / Caution verdicts rendered independently.

DataClinic L1 Diagnostics → DataClinic L2/L3 Diagnostics → ISO 5259-2 QM Interpretation

Summary: We independently evaluated the SpectralWaste recycling waste image dataset (2,794 images, 6 classes; 1,709 images diagnosed) against ISO/IEC 5259-2:2024 Quality Measures (QMs). DataClinic's three-level diagnostic metrics and charts were mapped to ISO QM definitions. Of the 13 measured items in the summary table (with Com-ML-1/3/5 counted as one item), 3 passed, 6 failed, and 4 received caution flags; the Auditability, Relevance, and Timeliness measures were not assessed. The core issues are severe class imbalance (19.6:1 max ratio) and a lack of representativeness and diversity caused by a single capture environment. DataClinic's "Bulk-up" recommendation aligns with the ISO Bal-ML-1 and Eft-ML-1 Fail verdicts.

3 / 10

Measured QMs Passed

Failed Items

Caution Items

Unmeasured (Roadmap)

1 Dataset Overview

Basic Information

Dataset	SpectralWaste
Source	Kaggle
Diagnosed Images	1,709 (out of 2,794 total)
Classes	6
Image Size	276 x 256 px (RGB)
DataClinic Score	68 / 100 (Moderate)

Class Distribution (L1 Diagnosis)

Class	Samples	Ratio
video_tape	646	37.8%
basket	384	22.5%
film	248	14.5%
cardboard	199	11.6%
bag	199	11.6%
filament	33	1.9%

Max/Min class ratio: 19.6 : 1 (video_tape vs filament)
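The ratio column and the max/min figure can be reproduced directly from the sample counts in the table. The following is a minimal sketch, not DataClinic's own code; the variable names are illustrative.

```python
# Class counts copied from the L1 class distribution table.
counts = {
    "video_tape": 646,
    "basket": 384,
    "film": 248,
    "cardboard": 199,
    "bag": 199,
    "filament": 33,
}

total = sum(counts.values())  # number of diagnosed images
shares = {name: 100 * n / total for name, n in counts.items()}
max_min_ratio = max(counts.values()) / min(counts.values())

for name, n in counts.items():
    print(f"{name:<11}{n:>5}{shares[name]:>7.1f}%")
print(f"total {total}, max/min ratio {max_min_ratio:.1f} : 1")
```

The counts sum to the 1,709 diagnosed images, so the percentages are shares of the diagnosed subset rather than of the full 2,794-image dataset.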

Representative image collage from the SpectralWaste dataset -- six types of recycling waste on a conveyor belt

SpectralWaste is a recycling waste dataset collected by synchronizing RGB and hyperspectral imaging on a prototype conveyor belt. Each image includes a composite bar chart summarizing the spectral signature of each object. While the dataset was designed for training automated recycling classification models, class imbalance and a homogeneous capture environment may limit model performance.

2 ISO/IEC 5259-2 Evaluation Framework

This report applies the Quality Measures (QMs) defined in ISO/IEC 5259-2:2024 to the SpectralWaste image dataset. DataClinic's three-level diagnostic outputs are mapped to ISO QM definitions and interpreted independently, connecting what DataClinic measured with how ISO evaluates it.

DataClinic Level	What Is Measured	Mapped ISO 5259-2 QMs
Level I	Class counts, missing values, pixel statistics, mean images	Com-ML-1/3/5, Bal-ML-1, Eft-ML-1
Level II	General-purpose embedding (1280-dim) density, outliers, similarity	Sim-ML-1/2, Rep-ML-1/3, Div-ML-1, Con-ML-2, Acc-ML-7
Level III	Domain-specific lens (32-dim) density and cluster analysis	Rep-ML-1, Div-ML-1/2, Bal-ML-2

Intrinsic DQCs (3)

Accuracy, Completeness, Consistency
DataClinic Level I

AI/ML DQCs (9)

Balance, Diversity, Representativeness, Similarity, Relevance, Effectiveness, Auditability, Identifiability, Timeliness
DataClinic Level II/III

Verdict Criteria

Pass Meets criteria
Fail Below threshold
Caution Needs further review
-- N/A Not measured

3 Intrinsic Quality Assessment

QM ID	Item	ISO Definition	Verdict
Com-ML-1	Value Completeness	Proportion of data items without null values	Pass
Com-ML-3	Feature Completeness	Proportion of feature-related items without null values	Pass
Com-ML-5	Label Completeness	Proportion of samples with complete labels	Pass
Con-ML-2	Label Consistency	Proportion of similar samples with consistent labels	Caution
Acc-ML-7	Label Accuracy	Estimated mislabel rate via outlier detection	Caution

Com-ML-1/3/5 -- Completeness Pass Rationale

DataClinic Level I diagnosis confirmed zero missing values across the dataset. All 1,709 images have three intact RGB channels, and every image carries one of the six class labels. This satisfies the ISO 5259-2 completeness criteria (value, feature, and label).

Mean images per class (panels: bag, basket, cardboard, filament, film, video_tape) -- labels are correctly assigned and mean images render normally for all six classes

Con-ML-2 / Acc-ML-7 -- Caution Rationale

Con-ML-2 (Label Consistency): ISO 5259-2 requires that similar instances in embedding space share the same label. The Level II low-density distribution reveals multi-modal clusters with ambiguous class boundaries in certain regions. Potential label cross-contamination between similar samples cannot be ruled out and warrants further review.
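One common way to probe label consistency in embedding space is to check how often a sample's nearest neighbours carry the same label. The sketch below is an assumption about how such a check could look, not the formal Con-ML-2 definition or DataClinic's method; the function name, the choice of k = 5, the 0.5 cut-off, and the synthetic embeddings are all hypothetical.

```python
import numpy as np

def knn_label_agreement(embeddings, labels, k=5):
    """Per-sample fraction of the k nearest neighbours that share the sample's label."""
    x = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)
    # Squared Euclidean distances via the Gram matrix (avoids an n x n x d array).
    sq = (x ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    return (y[neighbours] == y[:, None]).mean(axis=1)

# Synthetic stand-in for real embeddings: two well-separated clusters.
rng = np.random.default_rng(0)
emb = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 8)),
    rng.normal(5.0, 0.1, size=(20, 8)),
])
lab = np.array(["film"] * 20 + ["bag"] * 20)
lab[0] = "bag"  # deliberately plant one inconsistent label

agreement = knn_label_agreement(emb, lab, k=5)
suspects = np.flatnonzero(agreement < 0.5)  # hypothetical review cut-off
print("samples to review:", suspects.tolist())
```

Samples with low agreement are candidates for manual label review, not confirmed errors; in regions where classes genuinely overlap, low agreement is expected.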

Acc-ML-7 (Label Accuracy): Twenty low-density outliers were identified at both Level II and Level III. Among these, the low-density samples in the filament and cardboard classes may stem from the peculiarities of composite spectral bar chart images, but the possibility of labeling errors should also be investigated.

L2 outlier samples (panels: filament, low density; cardboard, low density; video_tape, high density) -- low-density samples (outliers) concentrate in filament and cardboard; high-density (typical) samples in video_tape

4 Balance Assessment -- Bal-ML

QM ID	Item	ISO Definition	Measurement	Verdict
Bal-ML-1	Class Balance	Degree of balance in class-wise sample counts	Std. dev. 242.7, max ratio 19.6:1	Fail
Bal-ML-2	Feature Balance	Balance of feature distributions within the dataset	Color/size skew (confirmed at L3)	Fail

Bal-ML-1 -- Severe Class Imbalance

ISO 5259-2 Bal-ML-1 measures the degree of balance across class-wise sample counts. A max/min class ratio exceeding 10:1 is generally considered severe enough to bias a model toward the majority classes at the expense of minority classes. SpectralWaste's video_tape (646 images) to filament (33 images) ratio stands at 19.6:1, nearly double that rule-of-thumb limit. The filament class, with only 33 images, also falls well below the minimum commonly required for deep learning (typically 100+ images). Training under these conditions is highly likely to cause the model to misclassify filament as video_tape or another majority class.

L2 density box chart -- comparing density distribution spread across classes; video_tape shows the widest range

L3 domain-specific lens box chart -- inter-class density variance is even more pronounced than at L2

Bal-ML-2 -- Feature Imbalance

ISO 5259-2 Bal-ML-2 measures whether intrinsic features such as color, size, and shape are evenly distributed across the dataset. Level III analysis (domain-specific 32-dimensional lens) repeatedly revealed "identical waste colors and small-size features." This reflects how the single conveyor belt environment has homogenized lighting, background, and viewing angle characteristics. In real-world industrial settings, a variety of lighting conditions, backgrounds, and waste states exist, meaning this feature skew could create a significant domain gap.

L3 per-class density plots (panels: bag, filament, film, video_tape) -- different distribution shapes and positions across classes indicate feature imbalance

5 Similarity Assessment -- Sim-ML

QM ID	Item	ISO Definition	Measurement	Verdict
Sim-ML-1	Duplicate Instance Ratio	Proportion of duplicate or near-duplicate samples	L2 low density = low duplication	Pass
Sim-ML-2	Intra-class Similarity	Average similarity among samples within the same class	High-density concentration in video_tape	Caution

Sim-ML-1 -- Pass: Low Duplication

ISO 5259-2 Sim-ML-1 measures the proportion of samples that are excessively close in embedding space (effectively duplicates). High duplication leads to overfitting. The Level II diagnosis rated overall density as low, which paradoxically means there are few duplicate samples. SpectralWaste actually falls on the data-scarce side. While this is a Pass from the Sim-ML-1 perspective, it feeds directly into the data sufficiency issue (Eft-ML).

L2 density histogram -- overall low density distribution suggests data scarcity rather than duplication

L3 density histogram -- the same low-density pattern is confirmed under the domain-specific lens
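A duplicate ratio of the kind Sim-ML-1 describes can be approximated by counting samples whose nearest neighbour is almost identical in embedding space. This is a minimal sketch under stated assumptions: cosine similarity as the closeness measure, a hypothetical threshold of 0.98, non-zero embedding vectors, and synthetic data in place of DataClinic's embeddings.

```python
import numpy as np

def near_duplicate_ratio(embeddings, threshold=0.98):
    """Fraction of samples whose nearest neighbour has cosine similarity >= threshold."""
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit-normalise each row
    sim = x @ x.T
    np.fill_diagonal(sim, -1.0)  # ignore each sample's similarity to itself
    return float((sim.max(axis=1) >= threshold).mean())

# Synthetic example: 50 random vectors with one exact duplicate pair planted.
rng = np.random.default_rng(0)
emb = rng.normal(size=(50, 16))
emb[1] = emb[0]

ratio = near_duplicate_ratio(emb)
print(f"near-duplicate ratio: {ratio:.2f}")
```

A low ratio supports a Pass on duplication only; as the section notes, it says nothing about whether there is enough data.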

Sim-ML-2 -- Caution: High Intra-class Similarity in video_tape

Sim-ML-2 flags the risk that, when samples within a class are too similar, the model fails to learn broad decision boundaries for that class. At both Level II and Level III, all four of the highest-density samples belong to the video_tape class. They come from the same capture date and session (e.g., train__20230119_03_*), which indicates low intra-class diversity and high intra-class similarity.

L2 top-4 high-density samples (ins1: 0.1795, ins3: 0.1701, ins13: 0.1667, ins13: 0.1649) -- all video_tape, all from the same session (20230119_03)
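Intra-class similarity can be summarized as the mean pairwise similarity among the samples of each class. The sketch below assumes cosine similarity and uses synthetic embeddings; the class names are borrowed from the dataset only as labels for the toy data, and the numbers it produces say nothing about SpectralWaste itself.

```python
import numpy as np

def intra_class_similarity(embeddings, labels):
    """Mean pairwise cosine similarity within each class (classes with < 2 samples are skipped)."""
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = np.asarray(labels)
    result = {}
    for cls in np.unique(y):
        members = x[y == cls]
        n = len(members)
        if n < 2:
            continue
        sim = members @ members.T
        # Average over off-diagonal entries only, excluding self-similarity.
        result[str(cls)] = float((sim.sum() - np.trace(sim)) / (n * (n - 1)))
    return result

# Toy data: one class of near-identical samples, one class of unrelated samples.
rng = np.random.default_rng(0)
base = rng.normal(size=16)
tight = base + rng.normal(scale=0.01, size=(30, 16))
spread = rng.normal(size=(30, 16))

scores = intra_class_similarity(
    np.vstack([tight, spread]),
    ["video_tape"] * 30 + ["basket"] * 30,
)
print(scores)
```

A class whose score sits far above the others, as the tightly clustered toy class does here, is the pattern that motivates the Caution verdict.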

6 Representativeness Assessment -- Rep-ML

QM ID	Item	ISO Definition	Verdict
Rep-ML-1	Target Domain Coverage	Degree to which diverse real-world deployment conditions are covered	Fail
Rep-ML-3	Distribution Distance (KL-divergence)	Divergence between training data distribution and real-world distribution	Fail

Rep-ML-1 -- Insufficient Target Domain Coverage

ISO 5259-2 Rep-ML-1 evaluates whether the training data adequately covers the diverse conditions found in the actual deployment environment. SpectralWaste was collected on a single prototype conveyor belt. Real-world recycling facilities encounter varied lighting conditions (fluorescent, natural, nighttime), belt speeds, overlapping waste, contaminated materials, and diverse viewing angles. The Level III diagnosis confirming "urban setting, single cluster" directly reflects this domain bias. Under Rep-ML-1 criteria, real-world deployment coverage is severely inadequate.

L2 PCA overall distribution -- six classes overlap or scatter across embedding space

L3 PCA distribution -- under the domain-specific lens, samples collapse into a single cluster, confirming lack of environmental diversity

Rep-ML-3 -- Distribution Gap (KL-divergence)

Rep-ML-3 measures KL-divergence between the training data distribution and the real-world deployment distribution. While no real-world reference data is available to compute an exact KL-divergence score, the Level II density contour map shows a low-density, fragmented distribution, suggesting that the training data fails to represent the continuous distribution expected in production. Given the constraint of a single conveyor belt capture environment, the risk of distribution shift after deployment is high.

L2 overall density contour -- low density, fragmented cluster pattern

L3 overall density contour -- a single concentrated density region under the domain lens
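For reference, KL-divergence between two discrete distributions P and Q over the same bins is the sum of P(i) * log(P(i) / Q(i)). The sketch below shows the calculation only. Because no real-world reference distribution exists for this dataset, it compares the training class shares against a hypothetical uniform reference, which is an assumption made purely for illustration; taking the training data as P and the reference as Q is likewise an assumed convention.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) in nats for two histograms over the same bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()                       # normalise counts to probabilities
    q = np.clip(q / q.sum(), eps, None)   # floor Q so empty bins do not divide by zero
    mask = p > 0                          # bins with P = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Training class counts from the L1 table (video_tape, basket, film, cardboard, bag, filament).
train_counts = [646, 384, 248, 199, 199, 33]
# HYPOTHETICAL reference: an evenly mixed waste stream. Not a measured distribution.
reference_counts = [1, 1, 1, 1, 1, 1]

gap = kl_divergence(train_counts, reference_counts)
print(f"KL(train || hypothetical uniform reference) = {gap:.3f} nats")
```

With a measured deployment distribution, the same function could be applied to binned embedding densities instead of class counts; the result is zero only when the two distributions match.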

7 Diversity Assessment -- Div-ML

QM ID	Item	ISO Definition	Verdict
Div-ML-1	Intrinsic Dimensionality	Effective dimensionality of the data -- how many distinct features exist	Caution
Div-ML-2	Feature Diversity	Diversity in visual features such as color, shape, and size	Fail

Div-ML-1 -- Multi-modal Distribution, but Limited Cluster Count

ISO 5259-2 Div-ML-1 measures diversity through the intrinsic dimensionality of the data. At Level II, a multi-modal distribution is observed, giving an initial impression of diversity. However, at Level III (domain-specific 32-dimensional lens), the data converges into a single cluster. This means that while the general-purpose lens (1,280 dimensions) shows separated clusters, the actual diversity of recycling-domain-relevant features is low. A Caution verdict is warranted under Div-ML-1.

L2 density contour -- multi-modal distribution with multiple clusters under the general-purpose lens

L3 density contour -- converges into a single cluster under the domain lens, indicating low effective diversity
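One simple proxy for intrinsic dimensionality is the number of principal components needed to explain most of the variance in the embeddings. The sketch below assumes that estimator with a hypothetical 95% cut-off; it is not the formal Div-ML-1 procedure or DataClinic's method, and the data are synthetic.

```python
import numpy as np

def pca_effective_dim(embeddings, variance_kept=0.95):
    """Smallest number of principal components explaining `variance_kept` of the variance."""
    x = np.asarray(embeddings, dtype=float)
    x = x - x.mean(axis=0)                      # centre the data
    s = np.linalg.svd(x, compute_uv=False)      # singular values, largest first
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cumulative, variance_kept) + 1)

# Synthetic 32-dim embeddings that really vary along only 3 orthogonal directions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
basis = np.linalg.qr(rng.normal(size=(32, 3)))[0]   # 32 x 3, orthonormal columns
emb = latent @ basis.T + rng.normal(scale=0.01, size=(500, 32))

dim = pca_effective_dim(emb)
print("effective dimensionality:", dim)
```

An embedding space with many nominal dimensions but a small effective dimensionality matches the situation described above, where the 32-dimensional domain lens collapses the data into a single cluster.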

Div-ML-2 -- Insufficient Visual Feature Diversity

Div-ML-2 measures diversity across visual features including color, size, shape, background, and lighting. Level III analysis found that "identical waste colors and small-size features" dominate the dataset. The pixel histogram also confirms that RGB distributions are concentrated in a narrow color range. This results from a single conveyor belt, fixed capture distance, and uniform lighting environment. Real-world recycling classifiers must handle crumpled, contaminated, or mixed waste in a wide range of sizes, colors, and backgrounds, making this dataset severely lacking under Div-ML-2 criteria.

L1 pixel histogram -- RGB channel pixel distributions are concentrated in a narrow brightness and color range

8 Effectiveness & Identifiability Assessment

QM ID	Item	ISO Definition	Measurement	Verdict
Eft-ML-1	Effective Sample Ratio	Proportion of classes meeting the training threshold	Min. class: 33 images (filament)	Fail
Idn-ML-1	Identifiability (PII)	Presence of personally identifiable information	Waste images only -- no PII	Pass

Eft-ML-1 -- Insufficient Effective Samples

ISO 5259-2 Eft-ML-1 measures whether each class meets the minimum sample threshold for effective model training. The typical minimum for deep learning classification is 100+ images per class, with 300+ recommended in practice. SpectralWaste's filament class has only 33 images, far short of even the 100-image minimum. The film (248), bag (199), and cardboard (199) classes clear that minimum but fall below the recommended 300. In total, four of the six classes fail to meet the recommended threshold. This Eft-ML-1 Fail verdict aligns with DataClinic's "Data Bulk-up" recommendation.
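The per-class threshold check can be sketched as follows, using the class counts from Section 1 and the 100 and 300 image thresholds quoted above. The function name is illustrative, and the thresholds are the rules of thumb from this report rather than values taken from the standard.

```python
# Class counts copied from the L1 class distribution table.
counts = {
    "video_tape": 646,
    "basket": 384,
    "film": 248,
    "cardboard": 199,
    "bag": 199,
    "filament": 33,
}

def effective_class_ratio(counts, threshold):
    """Return (share of classes with at least `threshold` samples, names of classes below it)."""
    below = sorted(name for name, n in counts.items() if n < threshold)
    return (len(counts) - len(below)) / len(counts), below

ratio_min, below_min = effective_class_ratio(counts, 100)   # typical minimum
ratio_rec, below_rec = effective_class_ratio(counts, 300)   # recommended in practice

print(f"threshold 100: {ratio_min:.2f} of classes pass; below: {below_min}")
print(f"threshold 300: {ratio_rec:.2f} of classes pass; below: {below_rec}")
```

Only filament misses the 100-image minimum, while four classes (bag, cardboard, filament, film) miss the 300-image recommendation.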

Idn-ML-1 -- Pass: No PII Risk

ISO 5259-2 Idn-ML-1 requires that datasets contain no personally identifiable information (faces, license plates, names, etc.). SpectralWaste consists entirely of images showing recycling waste on a conveyor belt, with no people, personal data, or any identifiable elements present. The dataset is safe from a PII standpoint, and no personal data processing issues arise for commercial use (separate licensing restrictions notwithstanding).

9 Unmeasured Items (Auditability, Relevance, Timeliness)

QM ID	Item	ISO Definition	Status	Verdict
Aud-ML-1/2	Auditability	Data lineage tracking, quality audit logs	Planned for AADS extension	-- N/A
Rel-ML-1/2	Relevance	Contextual/purpose relevance, outlier detection	Planned for AADS extension	-- N/A
Tml-ML-1	Timeliness	Data freshness, appropriateness of collection date	On roadmap	-- N/A

Tml-ML-1 (Timeliness) note: SpectralWaste was collected between 2022 and 2023. As recycling waste types and packaging trends continue to evolve (e.g., new-material films, biodegradable bags), the dataset may not reflect current recycling conditions. Once timeliness measurement tools are in place, this item can also be assessed.

10 Summary & Recommendations

DQC Group	QM ID	Item	Verdict	Severity
Completeness	Com-ML-1/3/5	Value, Feature & Label Completeness	Pass	--
Consistency	Con-ML-2	Label Consistency	Caution	Medium
Accuracy	Acc-ML-7	Label Accuracy	Caution	Medium
Balance	Bal-ML-1	Class Balance	Fail	Critical
Balance	Bal-ML-2	Feature Balance	Fail	High
Similarity	Sim-ML-1	Duplicate Instance Ratio	Pass	--
Similarity	Sim-ML-2	Intra-class Similarity	Caution	Medium
Representativeness	Rep-ML-1	Domain Coverage	Fail	Critical
Representativeness	Rep-ML-3	KL-divergence	Fail	High
Diversity	Div-ML-1	Intrinsic Dimensionality	Caution	Medium
Diversity	Div-ML-2	Feature Diversity	Fail	High
Effectiveness	Eft-ML-1	Effective Sample Ratio	Fail	Critical
Identifiability	Idn-ML-1	PII Risk	Pass	--
Auditability, Relevance, Timeliness	Aud/Rel/Tml	--	-- N/A	--

Immediate Action

Bal-ML-1: Collect or synthesize additional filament data (target: at least 300 images)
Eft-ML-1: Bulk up all four under-threshold classes via data collection or augmentation

Mid-Term Improvement

Rep-ML-1: Expand capture environments with diverse lighting, backgrounds, and viewing angles
Div-ML-2: Include contaminated, crumpled, and mixed waste samples
Bal-ML-2: Diversify color and size feature distributions

Monitoring

Con-ML-2: Cross-verify labels between similar samples
Acc-ML-7: Manually review all 20 low-density outlier labels
Sim-ML-2: Diversify video_tape capture sessions

DataClinic Recommendation vs. ISO 5259 Verdict Alignment

DataClinic's "Data Bulk-up" recommendation aligns precisely with the ISO 5259-2 Bal-ML-1 (class imbalance) and Eft-ML-1 (insufficient effective samples) Fail verdicts. The fact that two independent frameworks -- using different methodologies -- arrive at the same conclusion validates DataClinic's diagnostic results in the language of ISO international standards. This confirms that DataClinic effectively implements the ISO 5259-2 Quality Measures in practice.

References

[1] ISO/IEC JTC 1/SC 42. (2024). ISO/IEC 5259-2:2024 -- Part 2: Data quality measures.
[2] DataClinic Report #223 -- SpectralWaste. dataclinic.ai/en/report/223
[3] SpectralWaste Dataset. Kaggle
[4] Pebblous. (2025). AI Data Quality Standards and Pebblous DataClinic: Quantitative Mapping to ISO/IEC 5259-2