Executive Summary

1.22 million thermal images were gathered to teach AI to spot fires, leaks, and overheating across industrial complexes. AI Hub's Thermal Camera Image Dataset (dataSetSn=235) photographs 10 object types (storage tank, transport pipe, transport valve, switchboard, AC outdoor unit, factory exterior/interior, person, car, ship) in both normal and anomaly states. Across 20 classes, that is 1,223,849 images. DataClinic Report #128 shows where this asset is solid and where it is still soft.

At L1, image channels split into 80% RGBa and 20% RGB, and class counts span an 11.14× spread. In L2's generic neural lens (1,280 dimensions), the median density gap between normal and anomaly sits around 0.45. Switch to L3's domain-optimized lens (120 dimensions) and that gap grows to roughly 1.3, almost 3× as large. Dimensionality optimization sharpens the classification boundary.

Yet some signals remain unreachable by dimensions alone. The most typical Pipe-Normal image (density 2.77) and Pipe-Leak (median ≈ 1.0) form the dataset's most dramatic contrast within a single domain, while the low-density outliers in Person-Normal (D1) and Car-Normal (D30) recur as the same images at rank 1–2 in both L2 and L3. Meanwhile, the high-density top of Person-Anomaly-Identify is occupied by files named CAR_AON_21.02.05_…_D20_: bboxes from a car-anomaly capture session that ended up labeled as person-anomaly. Three threads of the same diagnosis, lined up on one page.

20 classes (10 objects × normal/anomaly)
1.22M images diagnosed
11.14× class imbalance (max/min)
≈3× L2→L3 separation widening

⚠️ Note on missing scores: DataClinic's composite score and L1/L2/L3 grades are gated behind authentication and were not collected by this pipeline. The article avoids asserting absolute scores and grounds its analysis in qualitative findings about distributions, outliers, and labels. Neighbor-report scores appear only in the Comparative Frame section.

📊 DataClinic's Three-Level Diagnostic System

DataClinic looks at a dataset through three lenses of increasing depth. It starts with surface statistics, moves to a generic neural view, and ends with a view tuned to the domain. Each level surfaces finer quality issues than the last.

L1

Basic Quality Diagnostic

Checks dataset hygiene: image channels, resolution, missing values, class balance. This is where the dataset's mixed RGB/RGBa channels and 11× class imbalance first surface.

L2

DataLens — Generic Neural Net (1,280 dim)

Analyzes distribution, geometry, and density in Wolfram ImageIdentify Net V2's 1,280-dimensional feature space. Every image is first seen as a "general image."

L3

Domain-Optimized Lens (120 dim)

Reduces dimensions to fit the domain so its native patterns surface. For this dataset, 1,280 dimensions compress to 120 (≈10.7×), and normal-vs-anomaly separation sharpens about 3×.

Dataset Overview — 1.22M Images Built to Detect Industrial Disasters

Fire, leaks, and overheating are among the most common — and most belatedly noticed — accidents in industrial complexes. By the time the human eye catches them, it is usually too late. Thermal cameras can see the warning signs earlier, and an AI that reads those thermal frames automatically can sound the alarm earlier still. The Thermal Camera Image Dataset (AI Hub dataSetSn=235) is a national-scale AI training dataset built to train exactly that alarm system.

Ten object types common in industrial environments were photographed in both normal and anomaly states. Normal frames totaled 770,765 and anomaly frames 263,864 (1,034,629 in all). The cropped-to-bbox version is the subject of this diagnostic: 20 classes, 1,223,849 images. The project was led by NH Networks, with Dongwon Safety System, Tium Welfare Foundation, and VTW Inc. as participating partners.

Thermal Camera Image Dataset collage — a cross-section of 20 classes and 1.22M frames

▲ A cross-section of 20 classes across 1.22M frames — the signature palette of industrial thermal imagery: orange-yellow heat sources floating on purple-blue cooled backgrounds.

What this dataset is and the conditions under which it was collected come down to three metadata axes. First, 640×480 PNGs encode temperatures from -20°C to 1000°C (±2°C accuracy). Second, the same objects are shot at distances from 20cm to 90m and tagged with distance tokens D0.5–D200. Third, collection ran continuously for six months, from September 2020 through February 2021. These three axes (temperature range, shooting distance, collection window) determine where and how the distribution splits in the L2 and L3 analyses that follow.

🌡️

Thermal Specs

640×480 PNG
-20°C ~ 1000°C (±2°C)

📏

Shooting Distance

20cm ~ 9,000cm
D0.5 ~ D200 labels

📅

Collection Window

2020.09 ~ 2021.02
6 months continuous

Filename structure  PIP_NOM_21.03.15_CIC_A1016-A1110_D1_003078_bbox0.png

PIP = object code (PIP=Pipe, CAR=Car, HUM=Human, AIR=AC outdoor unit …)
NOM / AON = state (NOM=Normal, AON=Anomaly)
21.03.15 = capture date (YY.MM.DD)
CIC = site code
A1016-A1110 = session (start time – end time)
D1 = shooting distance (D0.5 = 0.5m, D200 = 200m)
003078 = sequence number
bbox0 = box index within the same frame
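The field layout above can be turned into a small parser. This is a minimal sketch: the regular expression is inferred only from the example filenames quoted in this article, and the function name and dictionary keys are hypothetical, so real files may include variants it does not cover.

```python
import re
from datetime import date

# Pattern inferred from the example filenames in this article (an assumption,
# not an official specification of the dataset's naming scheme).
FILENAME_RE = re.compile(
    r"^(?P<object>[A-Z]+)_(?P<state>NOM|AON)_"
    r"(?P<yy>\d{2})\.(?P<mm>\d{2})\.(?P<dd>\d{2})_"
    r"(?P<site>[A-Z]+)_(?P<session>[A-Z]\d{4}-[A-Z]\d{4})_"
    r"D(?P<distance>\d+(?:\.\d+)?)_(?P<seq>\d+)_bbox(?P<bbox>\d+)\.png$"
)


def parse_thermal_filename(name: str) -> dict:
    """Split a cropped-bbox filename into its metadata fields."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized filename: {name}")
    return {
        "object": m["object"],                # e.g. PIP, CAR, HUM
        "state": m["state"],                  # NOM or AON
        "date": date(2000 + int(m["yy"]), int(m["mm"]), int(m["dd"])),
        "site": m["site"],
        "session": m["session"],
        "distance_m": float(m["distance"]),   # D0.5 -> 0.5, D200 -> 200.0
        "sequence": int(m["seq"]),
        "bbox": int(m["bbox"]),
    }


info = parse_thermal_filename(
    "PIP_NOM_21.03.15_CIC_A1016-A1110_D1_003078_bbox0.png"
)
```

Once the distance token and object code are available as fields, they can be used as grouping keys in the density analyses that follow.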

The distance tokens (D1·D30·D200) in filenames turn out to be decisive in later analysis. The section "Two Things Dimensions Could Not Reach" explains why the same car shot at 1m and at 30m looks to the AI like two nearly separate classes.

Below are two annotated samples published on the official AI Hub page. They make visible the metadata each frame carries into the dataset: class code (Person, Car), inspection distance (Inspection_distance: 300cm), resolution (640×480), and even ambient temperature, humidity, and wind speed. The cropped version diagnosed here is just the bbox slice of these source frames, and the distance tokens and class labels discussed throughout this article all originate here.

AI Hub official annotation sample — Person-Normal (640×480, 300cm distance, 2.8°C ambient)
▲ Person-Normal — sample with annotation metadata overlay (640×480 · D3 · 2.8°C)
AI Hub official annotation sample — Car (640×480, 300cm distance, 2.8°C ambient)
▲ Car — same half-hour session, identical environment, different object class

Source: AI Hub · Thermal Camera Image Dataset (dataSetSn=235)

AI Hub also publishes the annotation file structure: a five-block JSON of licenses (NH Networks), info (created 2021-01-27), categories, images, and annotations. Every image carries not just its class category but ambient temperature (Environment_temperature: 2.8°C), inspection temperature range (Inspection_temperature_range: -3°C ~ 6°C), and bbox coordinates, all in one record. The RGB/RGBa channel mix and the Person-Anomaly-Identify label drift this article surfaces are patterns that emerge when a field of this schema is empty or misaligned.

AI Hub thermal dataset annotation JSON schema — licenses, info, categories, images, annotations
▲ Official AI Hub annotation JSON schema — even the metadata fields this diagnosis does not touch, laid out side by side. | Source: AI Hub

L1 — Hygiene Passes, but Two Signals Surface in Channels and Class Balance

L1 checks the dataset's basic health. Missing values are almost nonexistent (1 in 1,223,850), and no label-integrity issues are reported. But two items light up: channel consistency and class balance.

80% RGBa + 20% RGB — Channels split two ways

The same dataset contains RGB images (19.92%) and RGBa images (80.08%). One side carries an extra alpha channel, so the model's input channel count differs. Unless the training pipeline forces a unified channel format, the same object shot by the same camera arrives at the model as two different inputs.

Image channel ratio Total 1,223,849 images
RGBa · 80.08%
RGB · 19.92%

Without unifying to a single format in preprocessing, model input channels remain inconsistent.
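Before unifying channels it helps to know which files carry an alpha channel. The sketch below assumes the files are standard PNGs and reads the color type straight from the IHDR chunk, where the PNG format defines color type 2 as RGB and 6 as RGB with alpha. The function names are hypothetical, and the two headers are synthetic stand-ins for real files.

```python
import struct
from collections import Counter

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Color-type codes defined for the IHDR chunk of the PNG format.
COLOR_TYPES = {0: "grayscale", 2: "RGB", 3: "palette",
               4: "grayscale+alpha", 6: "RGBA"}


def png_color_type(header: bytes) -> str:
    """Name the color type given at least the first 26 bytes of a PNG."""
    if (len(header) < 26 or header[:8] != PNG_SIGNATURE
            or header[12:16] != b"IHDR"):
        raise ValueError("not a PNG header")
    # IHDR payload starts at byte 16: width, height, bit depth, color type.
    _width, _height, _bit_depth, color_type = struct.unpack(
        ">IIBB", header[16:26])
    return COLOR_TYPES.get(color_type, f"unknown({color_type})")


def channel_census(headers) -> Counter:
    """Count how many headers fall under each color type."""
    return Counter(png_color_type(h) for h in headers)


def fake_header(color_type: int) -> bytes:
    """Build a synthetic 640x480, 8-bit PNG header for demonstration."""
    return (PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR"
            + struct.pack(">IIBB", 640, 480, 8, color_type))


census = channel_census([fake_header(6), fake_header(6), fake_header(2)])
```

On real data, each header would come from something like `open(path, "rb").read(26)`. The resulting counts indicate how many files need conversion before training.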

11.14× class spread — Car-Normal vs. Ship-Overheat

Across 20 classes, image counts average 61,192 (σ = 50,079), a wide distribution to start with. The largest, Car-Normal, holds 225,954 frames; the smallest, Ship-Overheat (anomaly), only 20,290, an 11.14× gap.

Image count by class (top 6 + bottom 4)

Car-Normal: 225,954
Person-Normal: 106,720
Ship-Normal: 98,297
Storage Tank-Normal: 96,976
Person-Anomaly-Identify: 87,561
AC Outdoor-Normal: 79,300
… (10 classes)
Switchboard-Short Circuit: 21,334
Pipe-Leak: 20,651
Valve-Leak: 20,605
Ship-Overheat: 20,290

Normal classes dominate the top, while anomaly classes cluster at the bottom. This likely reflects natural occurrence rates, but the gap is wide enough to require class weighting during training.
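The 11.14× figure and one common weighting scheme can be reproduced from the counts above. This is a sketch under stated assumptions: only the ten classes listed in the chart are included (the other ten are not given in this article), and the inverse-frequency formula total / (classes × count) is one widely used heuristic, not a method DataClinic prescribes.

```python
TOTAL_IMAGES = 1_223_849   # dataset total reported in this article
NUM_CLASSES = 20

# Only the ten classes listed in the chart; the remaining ten are elided.
LISTED_COUNTS = {
    "Car-Normal": 225_954,
    "Person-Normal": 106_720,
    "Ship-Normal": 98_297,
    "Storage Tank-Normal": 96_976,
    "Person-Anomaly-Identify": 87_561,
    "AC Outdoor-Normal": 79_300,
    "Switchboard-Short Circuit": 21_334,
    "Pipe-Leak": 20_651,
    "Valve-Leak": 20_605,
    "Ship-Overheat": 20_290,
}


def imbalance_ratio(counts: dict) -> float:
    """Largest class size divided by smallest class size."""
    return max(counts.values()) / min(counts.values())


def balanced_weights(counts: dict, total: int, num_classes: int) -> dict:
    """Inverse-frequency weights: a class of average size gets weight 1."""
    return {name: total / (num_classes * n) for name, n in counts.items()}


ratio = imbalance_ratio(LISTED_COUNTS)
weights = balanced_weights(LISTED_COUNTS, TOTAL_IMAGES, NUM_CLASSES)
```

Under this scheme Ship-Overheat receives a weight above 1 and Car-Normal a weight below 1, and the ratio between the two weights equals the imbalance ratio.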

Cold Background, Narrow Heat Source — Industrial Thermal Color Splits in Two

Pixel-level RGB distribution shows that purple-blue (low R) pixels overwhelmingly dominate the dataset, while orange-yellow (high R) pixels rise as a narrow, sharp spike. The cooled background fills most of the frame area and the heat source burns small and bright. A single chart captures the signature color structure of industrial thermal imagery.

L1 pixel histogram — RGB channel distributions

▲ L1 pixel histogram. Purple-blue (low R) pixels dominate; orange-yellow (high R) appears as a narrow, tall peak — the visual identity of industrial thermal imagery.

The Blurrier the Mean, the More Restless the Class — Three Normal/Anomaly Pairs to Pre-read

Each class's per-pixel mean reveals its internal visual diversity. The blurrier the mean, the wider the mix of angles, distances, and temperatures inside the class; the crisper, the more repetitive the composition. Previewing these three pairs makes the L2/L3 density analysis below land faster.

Normal · Pipe: Pipe-Normal sample (D1 close-up) | Pipe-Normal mean image
Anomaly · Pipe-Leak: Pipe-Leak sample (D0.5 ultra-close) | Pipe-Leak mean image
Normal · AC Outdoor: AC Outdoor-Normal sample | AC Outdoor-Normal mean image
Anomaly · AC-Overheat: AC Outdoor-Overheat sample | AC Outdoor-Overheat mean image
Normal · Person: Person-Normal sample (D5) | Person-Normal mean image
Anomaly · Person-Identify: Person-Anomaly-Identify sample | Person-Anomaly-Identify mean image

▲ In each card, the left tile is a class sample and the right tile is the per-pixel mean of the entire class. Notice that Pipe-Normal and Pipe-Leak means converge into nearly the same purplish blur — L1 alone cannot separate normal from leak.
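A per-pixel mean image is simple to compute. The sketch below is a simplification that assumes single-channel images of identical size, stored as nested lists; the function name is hypothetical, and the actual diagnostic presumably works on full-color frames.

```python
def mean_image(images: list) -> list:
    """Per-pixel mean of equally sized grayscale images (lists of rows)."""
    if not images:
        raise ValueError("need at least one image")
    height, width = len(images[0]), len(images[0][0])
    totals = [[0.0] * width for _ in range(height)]
    for img in images:
        for y in range(height):
            for x in range(width):
                totals[y][x] += img[y][x]
    n = len(images)
    return [[value / n for value in row] for row in totals]


# Two toy 2x2 "frames": the hot pixel sits in a different corner in each.
frame_a = [[0, 100], [200, 50]]
frame_b = [[100, 100], [0, 150]]
mean = mean_image([frame_a, frame_b])
```

In the toy example the hot pixel (200) averages down to 100 because its position varies between frames. The same effect is what blurs a class mean when angle and distance vary.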

L2 — Where Normal First Splits From Anomaly

L2 reads distribution, geometry, and density in Wolfram ImageIdentify Net V2's 1,280-dimensional space. It is the view that treats every image as a general photograph first. Thermal imagery doesn't sit comfortably in that generic gaze, yet class hierarchy and the first split between normal and anomaly still emerge at this level.

Distribution Splits in Two — A Left Mound and a Tight Spike at 0.68

The L2 density histogram has a broad left peak and one narrow spike. The left peak (density 0.12–0.15) is where most of the data gathers, while a narrow, tall spike near 0.68 holds the "strong signal" images: industrial-equipment close-ups and their kin. That two-branched shape carries directly into the class hierarchy.

L2 density histogram — bimodal + 0.68 spike

▲ L2 density histogram — a left peak plus a narrow spike near 0.68. Through a generic neural lens, industrial thermal imagery falls far from a uniform distribution.

Normal and Anomaly Diverge by 0.45 — but Each Class Drifts Even Wider

Lining up all 20 classes as box charts, anomaly classes sit in the left (low-density) half and normal classes in the right (high-density) half. But the gap in median density between anomaly and normal averages only about 0.45, and every class's whiskers stretch across the full range, meaning intra-class diversity often exceeds the inter-class distance.

L2 box chart — density distribution across 20 classes

▲ L2 box chart. Anomaly classes lean left of the dataset mean (dashed line), normals lean right. Yet every box's whiskers cover the full range — the distributions split, but classes stay internally diffuse.

The Mean Image Sits at the Edge, Not the Center — PCA's Quiet Warning

Compressed to two axes via PCA, the 20 classes appear as one large blob. Class separation isn't visually recoverable in 2D. One detail does stand out: the white diamond marking the Mean Image Feature lands not at the blob's center but at its left edge. The mean of the 20 classes is not a representative point of the actual distribution.

L2 PCA — single blob with outlying mean image

▲ L2 PCA. Class separation is invisible, and the white diamond (mean image feature) sits on the edge of the distribution.

Same Pipe, Two Distributions — L2's First Visualization of the Normal/Anomaly Asymmetry

Placing class-level density distributions side by side with class samples maps the shape of the distribution directly onto the picture. Pipe-Normal piles narrow and tall on the high-density right; Pipe-Leak spreads flat on the low-density left. AC-Outdoor and Person-Anomaly show the same directional asymmetry.

Normal · Pipe (Pipe-Normal L2 density distribution and sample): Narrow, tall peak on the high-density right. A textbook pattern with clear heat-source vs. background contrast.
Anomaly · Pipe-Leak (Pipe-Leak L2 density distribution and sample): Flat distribution on the low-density left. Leak shapes are not uniform.
Anomaly · AC-Overheat (AC Outdoor-Overheat L2 density distribution and sample): Broad spread from mid to left. Overheated regions vary widely in position and size.
Anomaly · Person-Identify (Person-Anomaly-Identify L2 density distribution and sample): Widest distribution, with a long left tail. Distance, pose, and body-temperature variance exceed every other class.

L3 — Fold the Dimensions, and the Normal/Anomaly Gap Widens 3×

L3 applies a domain-optimized dimensionality to the same data. Wolfram ImageIdentify Net V2's 1,280 dimensions compress roughly 10.7× to 120, this dataset's L3 lens. With fewer dimensions, the feature space densifies and absolute density values grow across the board. The point isn't to directly compare L2's average (≈0.38) with L3's (≈1.75), but to watch how the distribution shape and inter-class gaps change.

The Narrow Spike Dissolves into a Plateau — A Distribution Reshaped by Dimensional Optimization

The L3 density histogram forms a flat plateau across 0.75–1.6 and rises into a single broad peak near 2.25. The narrow, sharp spike at 0.68 in L2 disappears; the distribution flows more naturally. Dimensionality optimization is using the feature space more evenly.

L3 density histogram — plateau + right peak

▲ L3 density histogram. A wide plateau across 0.75–1.6 and a broad peak near 2.25. L2's narrow spike is gone; the distribution flows smoothly.

What the Box Chart Shows — Separability Stretches from 0.45 to 1.3 (3× Wider)

The median gap between normal and anomaly classes, around 0.45 in L2, widens to roughly 1.3 in L3. Absolute density units differ between the two lenses, but for the same class set the gap is about 3× wider (1.3 / 0.45 ≈ 2.9). Anomaly classes such as Ship-Overheat and Valve-Leak line up on the low-density left, while normal classes such as Pipe-Normal and Car-Normal gather on the high-density right. Yet every box's whiskers still span 0.4–2.5: intra-class diversity remains unresolved at L3.

L3 box chart — density distribution across 20 classes

▲ L3 box chart. Anomaly classes (median 0.9–1.1) on the left, normal classes (median 1.9–2.35) on the right. The gap widens nearly 3× vs. L2.
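The gap and the widening factor can be written down explicitly. This sketch assumes the gap is the mean of normal-class medians minus the mean of anomaly-class medians, which is one plausible reading of the box chart rather than DataClinic's documented formula. The function name is hypothetical, and the ratio uses raw gaps without correcting for the scale change between lenses.

```python
from statistics import mean


def median_gap(class_medians: dict, anomaly_classes: set) -> float:
    """Mean normal-class median minus mean anomaly-class median."""
    normal = [m for c, m in class_medians.items() if c not in anomaly_classes]
    anomaly = [m for c, m in class_medians.items() if c in anomaly_classes]
    return mean(normal) - mean(anomaly)


# The Pipe pair, using the approximate L3 medians quoted in this article.
pipe_gap = median_gap({"Pipe-Normal": 2.3, "Pipe-Leak": 1.0}, {"Pipe-Leak"})

# Headline figures quoted in this article for the full 20-class set.
L2_GAP, L3_GAP = 0.45, 1.3
widening = L3_GAP / L2_GAP
```

The Pipe pair alone gives a gap of about 1.3, matching the dataset-wide L3 figure, and 1.3 / 0.45 comes to roughly 2.9.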

Same Class, Same Position, Wider Gap: Four Classes Revisited in L3

We revisit the same four classes in L3. Their positions (low or high density) hold, but absolute density values grow and inter-class distance becomes more pronounced. Pipe-Normal climbs nearer the peak on the high-density right (median ≈ 2.3), and Pipe-Leak sinks deeper on the low-density left (median ≈ 1.0).

Normal · Pipe, L3 high-density #1 (Pipe-Normal L3 density distribution and sample, density 2.77, L3 rank 1): Dataset-wide L3 high-density TOP1 (density 2.77). A textbook industrial-thermal exemplar.
Anomaly · Pipe-Leak (Pipe-Leak L3 density distribution and sample): Median ≈ 1.0, the most dramatic distance from the normal peak (2.77) of any class pair.
Anomaly · AC-Overheat (AC Outdoor-Overheat L3 density distribution and sample): Still on the low-density left, but the distribution narrows compared to L2. The class identity sharpens.
Anomaly · Person-Identify (Person-Anomaly-Identify L3 density distribution and sample): A new narrow high-density cluster appears at L3 (2.45–2.57). The section "There's a Car Inside Person-Anomaly" identifies what it is.

What PCA 2D Cannot Show — A Single Blob Hides the Dimensional Achievement

What happens in high dimensions doesn't show up in a 2D PCA plane. Place L2 and L3 PCAs side by side and both are still single large blobs; separation isn't visualized. The gains of dimensionality optimization must be read off the box chart and density histogram, not the PCA.

L2 PCA 2D projection
L2 PCA — 1,280 dim → 2D
L3 PCA 2D projection
L3 PCA — 120 dim → 2D

▲ L2 and L3 PCAs at the same scale. Both show a single blob; the mean image (diamond) sits at the edge. Dimensionality optimization's effect has to be read off the box chart, not the PCA.

Two Things Dimensions Could Not Reach

Moving from L2 to L3 widens the normal/anomaly gap 3×, but the top 1 and 2 low-density outliers stay exactly the same images. These are structural problems from the data collection stage that dimensionality cannot fix. They come in two flavors.

Person-Normal D1 — the statelessness of a 1m close-up

L2 and L3 both place the same image at rank 1 of the low-density tail: HUM_NOM_21.02.10_CIC_A1058-A1118_D1_014970_bbox1.png. D1 means the camera is within 1m of the subject, so the face and upper body fill the frame entirely. While other classes capture industrial equipment at medium range, this image is a close-up of a human body, different in scale and pattern. Even within the same "Person" class, it becomes a loner.

Person-Normal D1 close-up — L2/L3 shared low-density #1

▲ Person-Normal D1 close-up. Low-density #1 in both L2 and L3 — a "collection scale" difference that dimensions cannot resolve.

Car-Normal D30 — two normals inside one class

Car-Normal, the largest class at 225,954 frames, splits into two halves along a single distance token. At D3 (3m close range) it lands in the L3 high-density top tier at density ≈ 2.7, but at D25–D30 (25–30m range) it falls to 0.43–0.47. Same label, but the AI sees nearly two classes. From far away, the heat signal has all but vanished and only the car's silhouette remains; up close, the engine's heat source and the body's contour stay vivid. Two different normals built from the same label.

Car-Normal D30 — a car cooled in the 30m distance

▲ Car-Normal D30. A car that has cooled into purple-blue from 30m away — same label, but visually almost a different class from the D3 car.
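Grouping densities by distance token makes the two-normals split visible in numbers. A minimal sketch: the D25 and D30 values are the Car-Normal L3 densities from the low-density table in this article, the D3 value is the approximate figure quoted above, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_density_by_distance(records) -> dict:
    """Median density per distance token from (token, density) pairs."""
    groups = defaultdict(list)
    for token, density in records:
        groups[token].append(density)
    return {token: median(values) for token, values in groups.items()}


# Car-Normal L3 densities quoted in this article (D3 is approximate).
car_normal = [("D3", 2.7), ("D30", 0.426), ("D30", 0.456), ("D25", 0.471)]
by_distance = median_density_by_distance(car_normal)
spread = max(by_distance.values()) / min(by_distance.values())
```

A large ratio between the highest and lowest per-distance medians inside a single class is a hint that the label covers more than one visual population.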

Both cases look like individual outliers, but unfold the low-density top-5 table and they line up as one pattern. Four of the five low-density images in L2 and L3 are the same photographs, and the remaining one is another car shot at the same D30 range. The two extremes, close-up (D1) and long range (D25, D30), occupy the distribution edge through both lenses. A pattern that dimensions cannot resolve and only camera distance can.

L2/L3 shared low-density outliers — TOP-5

Rank | Class | Distance | L2 density | L3 density | L2/L3 match
1 | Person-Normal | D1 | 0.066 | 0.411 | ✅ same image
2 | Car-Normal | D30 | 0.070 | 0.426 | ✅ same image
3 | Person-Anomaly-Identify | D5 | 0.069 | 0.447 | ✅ same image
4 | Car-Normal | D30 |  | 0.456 | L3 new
5 | Car-Normal | D25 | 0.071 | 0.471 | ✅ same image

Four of five appear at the same low-density position in both L2 and L3. Camera distance (D1, D25, D30) and the close-up/cooled extremes produce the pattern.
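The "four of five" overlap is a set intersection of the k lowest-density images under each lens. In this sketch the image IDs are hypothetical placeholders. The densities for images a, b, c, and e come from the table, while the L2 value for d and both values for f are invented so that each lens has a full top five.

```python
def lowest_k(densities: dict, k: int) -> list:
    """IDs of the k images with the smallest density."""
    return sorted(densities, key=densities.get)[:k]


def shared_low_density(l2: dict, l3: dict, k: int = 5) -> set:
    """Images that are among the k lowest-density under both lenses."""
    return set(lowest_k(l2, k)) & set(lowest_k(l3, k))


# a, b, c, e: densities from the table. d (L2) and f: hypothetical fillers.
l2 = {"a": 0.066, "b": 0.070, "c": 0.069, "d": 0.080, "e": 0.071, "f": 0.072}
l3 = {"a": 0.411, "b": 0.426, "c": 0.447, "d": 0.456, "e": 0.471, "f": 0.600}

shared = shared_low_density(l2, l3, k=5)
```

A high overlap between the two lenses suggests the outliers come from how the images were collected rather than from the choice of feature space.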

There's a Car Inside Person-Anomaly — Tracks of Label Contamination

L3 similarity analysis displays, for each class, the highest-density pivot image and its ten nearest neighbors. Open the second pivot of Person-Anomaly-Identify and unfold its ten neighbors, and the filenames start uniformly not with the human code (HUM) but with the car code (CAR_AON_). All come from the CAR_AON_21.02.05_CIC_P1505-P1513_D20_* series, a single batch shot on the same day (2021-02-05), at the same session window (P1505-P1513), at D20 (20m). Their L3 densities cluster tightly in 2.45–2.57. A structurally manufactured cluster.

The CAR_AON cluster inside Person-Anomaly-Identify (L3 top-10 similarity)

// Pivot (density 2.569)
Person-Anomaly-Identify/CAR_AON_21.02.05_CIC_P1505-P1513_D20_015430_bbox4.png
// 10 neighbors (density 2.45–2.57)
Person-Anomaly-Identify/CAR_AON_21.02.05_CIC_P1505-P1513_D20_015437_bbox5.png
Person-Anomaly-Identify/CAR_AON_21.02.05_CIC_P1505-P1513_D20_015445_bbox3.png
Person-Anomaly-Identify/CAR_AON_21.02.05_CIC_P1505-P1513_D20_015452_bbox5.png
Person-Anomaly-Identify/CAR_AON_21.02.05_CIC_P1505-P1513_D20_015460_bbox4.png
Person-Anomaly-Identify/CAR_AON_21.02.05_CIC_P1505-P1513_D20_015463_bbox3.png
(10 total, all from the same batch)

A single batch shot on 2021-02-05 at P1505-P1513, D20, has slipped into the dataset wearing the "Person-Anomaly-Identify" class label.
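This kind of mismatch can be screened for without any model, by comparing the class folder with the object code at the start of the filename. A minimal sketch: the "Class/filename" layout follows the listing above, and the class-to-code mapping is an assumption built from the codes this article names (HUM, CAR), not an official table.

```python
# Assumed mapping from class folder to the filename object code it should carry.
EXPECTED_CODE = {
    "Person-Normal": "HUM",
    "Person-Anomaly-Identify": "HUM",
    "Car-Normal": "CAR",
}


def code_mismatches(paths) -> list:
    """Return paths whose filename object code disagrees with the class folder."""
    flagged = []
    for path in paths:
        class_name, _, filename = path.partition("/")
        code = filename.split("_", 1)[0]
        expected = EXPECTED_CODE.get(class_name)
        if expected is not None and code != expected:
            flagged.append(path)
    return flagged


paths = [
    "Person-Anomaly-Identify/CAR_AON_21.02.05_CIC_P1505-P1513_D20_015430_bbox4.png",
    "Person-Anomaly-Identify/CAR_AON_21.02.05_CIC_P1505-P1513_D20_015437_bbox5.png",
    "Person-Normal/HUM_NOM_21.02.10_CIC_A1058-A1118_D1_014970_bbox1.png",
]
flagged = code_mismatches(paths)
```

A flagged file is not necessarily mislabeled, since a person can legitimately appear in a car-anomaly frame. The check only produces a review queue for the batch-level audit.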

Pull one of these files and look at it: most of the frame is cool, dark space with a small, faint heat signal off to one side. It looks like a person-shaped area extracted from a multi-object frame during a car-anomaly shoot. The labeling isn't necessarily wrong by procedure — but the visual signal does not match what a "Person-Anomaly-Identify" class should contain.

A CAR_AON file mis-bucketed into the Person-Anomaly-Identify class

▲ A CAR_AON file that ended up in the Person-Anomaly-Identify class. Most of the frame is cold and dark, with a small heat signal off to one side. A poor fit for training an intruder/abnormal-behavior detection model.

Real-world impact — Use this cluster as training data, and an intruder/abnormal-behavior detection model becomes more likely to misclassify car-anomaly scenes as "person anomaly." In industrial complexes where cameras predominantly capture vehicle-related scenes, false positives could spike. The CAR_AON_21.02.05_CIC_P1505-P1513 series needs a batch-level class label review.

Real-World Impact — When AI Reads a 'Leak' as Normal

Data quality diagnostics don't end at distribution charts. They become meaningful when they translate into the specific errors an industrial safety AI would make on site. We pull up the most dramatic normal/anomaly pair from the Pipe class, the meeting point between the L3 high-density TOP1 (density 2.77) and Pipe-Leak's median (≈ 1.0).

⚠️ Scenario: Same pipe, different distribution — 1m normal vs. 0.5m leak

On the left, the Pipe image the AI ranks as "most typical normal" (D1, density 2.77, L3 #1). On the right, the representative Pipe-Leak sample (D0.5), same domain but sitting at the edge of the distribution (median ≈ 1.0).

Left · Pipe-Normal D1, L3 high-density #1
🎯 The most confidently normal
Pipe-Normal · D1 · 21.03.15 · L3 density 2.77 · TOP1
Same domain, different distribution
Right · Pipe-Leak D0.5, edge of the distribution
❓ A leak the AI sees as atypical
Pipe-Leak · D0.5 · 21.02.20 · L3 median ≈ 1.0
❌ The threshold trap — When a leak image sits on the edge of the normal distribution, a simple density threshold may fail to flag the "atypical anomaly that looks close to normal."
Where the tail of normal overlaps the head of anomaly. The gray zone where industrial safety AI errs most often.
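The threshold trap is easy to reproduce with a toy rule. In this sketch the threshold is the midpoint of the two approximate Pipe medians quoted in this article, and the 1.9 "atypical leak" density is a hypothetical value chosen for illustration. The 0.43 value is the low end of the Car-Normal D30 range reported above.

```python
def flag_anomaly(density: float, threshold: float) -> bool:
    """Naive rule: anything less dense than the threshold is an anomaly."""
    return density < threshold


# Midpoint between the Pipe-Normal (~2.3) and Pipe-Leak (~1.0) L3 medians.
THRESHOLD = (2.3 + 1.0) / 2

typical_leak = flag_anomaly(1.0, THRESHOLD)       # caught
atypical_leak = flag_anomaly(1.9, THRESHOLD)      # hypothetical leak, missed
long_range_normal = flag_anomaly(0.43, THRESHOLD) # normal D30 car, false alarm
```

The same rule misses a leak that looks close to normal and raises an alarm on a normal long-range car, which is why the gray zone needs more than a single density cutoff.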

🔄 When 1m and 30m turn the same car into two normals

The secondary scenario runs in the same direction. Car-Normal at D3 reads as the most typical normal at density ≈ 2.7, but at D30 it falls to 0.43–0.47, two peaks inside one class. Without standardizing camera distance on site, the same car gets learned as two different cars. The risk of missing a vehicle anomaly signal at night or at range begins here.

✅ Direction of the fix — standardize at the collection stage

Dimensionality optimization widening the normal/anomaly gap 3× is a clear win. But what dimensions cannot sharpen, the camera has to fix.

  • ① Standardize camera distance tokens (D1, D30) to 1m increments and split the extremes into separate subclasses
  • ② Unify RGB and RGBa channels into a single format
  • ③ Audit the bbox labeling procedure for multi-object frames, especially person regions extracted during car-anomaly shoots
✅ What dimensions can sharpen vs. what the camera has to standardize
L3 domain optimization strengthens the quality signal. Distance, channel, and bbox integrity are problems only collection-stage standardization can solve.

Comparative Frame — More Standardizable than Waste, Yet Standardization Left Unfinished

Within the same AI Hub industrial domain, neighbor reports diagnose datasets of comparable scale. Putting two similar-sized datasets side by side sharpens this one's position.

This article · Report #128: Thermal Camera Images
20 classes (10 × normal/anomaly) · 1,223,849 images · AI Hub real
Industrial safety / disaster detection · Score not collected

Report #131: National Industrial Waste Images
72 classes · ~1,000,000 images · AI Hub real
Industrial waste classification · Score 51 · Poor

Report #225: PBLS Military 3-class
3 classes · 1,947 images · Pebblous synthetic
Military object classification · Score 79 · Fair

The closest comparison is the same-domain, real-world, million-scale #131 Industrial Waste Images. Waste shapes are highly free-form and resist standardization. Thermal imagery, where distance, angle, and temperature gradient mostly determine form, is far more standardizable. Even so, this dataset achieves only partial standardization on distance tokens (D1–D200) and channel formats (RGB·RGBa), and L3 dimensionality optimization cannot close those gaps on its own. Thermal data is more standardizable than waste. But this dataset doesn't use that potential fully.

Conclusion — What Dimensions Can Sharpen vs. What the Camera Must Standardize

Moving from L2 to L3, normal and anomaly separate more cleanly. The average median gap widens from ~0.45 to ~1.3, roughly 3×. The classifier gets a sharper decision boundary. Dimensionality optimization is clearly effective at strengthening data quality signals.

But the same step also surfaces another fact. Through both L2 and L3 lenses, the top-2 low-density images are the same photographs. Person-Normal D1 close-ups and Car-Normal D30 long-range shots are atypical regardless of which dimensionality you look at. The two-normals problem that a distance token creates within one class is the camera's problem, not a dimensional one.

And L3 similarity analysis adds one more finding. Unfold the high-density cluster of Person-Anomaly-Identify and the top is occupied by files extracted from a single day, single session, D20 batch, all under a car code. This is a different problem from camera standardization: a labeling-procedure question about how to assign bboxes to classes inside multi-object frames.

The seven findings from this diagnostic line up in one table. Each row pairs a finding with its assessment, so the signals that dimensionality optimization can sharpen sit next to those that only camera or labeling-procedure standardization can resolve.

Item | Finding | One-line assessment
L1 channels | RGBa 80% + RGB 20% | Unify to a single format before training
L1 class balance | 11.14× spread | Apply class weighting during training
L2 → L3 separation | Median gap 0.45 → 1.3 (≈ 3×) | A clear win for dimensionality optimization
L2/L3 shared low-density | 4 of 5 are the same image | A collection-scale problem dimensions cannot fix
Car-Normal distribution | D3 (2.7) vs. D30 (0.43–0.47) | Two normals in one class; distance standardization needed
Pipe normal/anomaly contrast | 2.77 vs. 1.0 | Most dramatic signal; threshold-design gray zone
Person-Anomaly label contamination | CAR_AON_21.02.05 batch dominates | Bbox labeling procedure needs review (HIGH)

1.22 million industrial thermal images is a substantial asset for Korean industrial-safety AI. If dimensionality optimization has clarified half of that asset, the other half remains to be unlocked through collection-stage standardization of camera distance, channel format, and bbox labeling. The full DataClinic diagnostic report for this dataset lives on DataClinic, and the source data is available from AI Hub at dataSetSn=235.
