2026.04 · Pebblous Data Communication Team

Reading time: ~14 min

Executive Summary

This article is based on the findings of DataClinic Report #227. PBLS_Drone_classification is a synthetic drone image dataset produced in-house by Pebblous. It uses the same underlying imagery as the previously diagnosed PBLS_Drone (#226), with classification labels for 12 distinct drone types added on top. The dataset is listed at 28,800 images, of which 28,788 were used in the diagnosis: exactly 2,399 per class (12 × 2,399 = 28,788), a textbook-perfect balance. Yet DataClinic's overall score came in at 76 (Fair), lower than #226's 82. Why does flawless class balance still result in a lower score? That is the central question this diagnosis addresses.

  • DataClinic Overall Score: 76
  • Drone Type Classes: 12
  • Total Images: 28,800
  • Images per Class (Perfect Balance): 2,399

#226 vs #227 — Same Data, Different Diagnosis

  • #226 PBLS_Drone: score 82 · Single class "drone" · 28,801 images · DataBulkup recommended
  • #227 PBLS_Drone_classification: score 76 · 12 classes DR01–DR12 · 28,800 images · DataDiet recommended

Adding class labels → 53-dimensional domain-specific lens applied → multi-cluster structure persists → score drops

DataClinic Grade Summary

L1 Integrity: Good
L1 Missing: Good
L1 Class Balance: Good
L1 Statistics: Good
L2 DataLens: No Issues
L2 Geometry: Good
L2 Distribution: Poor
L3 DataLens: No Issues
L3 Geometry: Fair
L3 Distribution: Poor

Detection vs Classification — What's the Difference?

Drone AI breaks down into two fundamentally different problems.

🔍

Detection

"Is that a drone?" — binary discrimination

PBLS_Drone (#226) is designed for this. A single class groups all drone forms together, teaching the model the essence of "drone-ness."

🎯

Classification

"Which drone is that?" — type identification

The goal of PBLS_Drone_classification (#227). Distinguishing DR01–DR12 maps directly to Counter-UAS threat-level assessment.

In military operations, drone classification is more than a technical challenge. A small reconnaissance drone (DR01) and a loitering munition (DR05/DR09) represent entirely different threat levels. Only when AI can accurately classify the type can the appropriate response — jamming, laser intercept, or kinetic defense — be decided instantly.

Different data requirements: A detection model only needs to tell "drone vs. non-drone," so a single-class dataset suffices. A classification model must learn the discriminating features between classes, making both intra-class diversity and inter-class separability critical. This is the core reason the same image pool scores 82 for detection but only 76 for classification.

Dataset Overview — PBLS_Drone_classification

PBLS_Drone_classification is a synthetic drone image classification dataset produced in-house by Pebblous. Its 28,800 Full HD (1920×1080) RGB images, generated through the same CG rendering pipeline as PBLS_Drone (#226), have been systematically labelled with 12 drone type categories.


PBLS_Drone_classification — 12-class synthetic drone image collage (DataClinic L1)

Filename Structure — How Classes and Frames are Encoded

DR05_1247.png

  • DR01 ~ DR12: drone type class (12 distinct UAV models)
  • 0001 ~ 2399: flight sequence frame number (angle, distance, and background progression)

12 types × 2,399 frames = 28,788 images (used in diagnosis). Each drone is captured continuously along a simulated flight path across varying angles and backgrounds.
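
As a minimal sketch of how this naming scheme could be decoded, the snippet below parses a filename into its class label and frame number and confirms the 12 × 2,399 count. The regular expression and the `parse_filename` helper are illustrative assumptions inferred from the single example `DR05_1247.png`; they are not part of the DataClinic tooling.

```python
import re
from collections import Counter

# Assumed pattern, inferred from the example "DR05_1247.png":
# class DR01..DR12, an underscore, then a 4-digit frame number.
FILENAME_RE = re.compile(r"^DR(0[1-9]|1[0-2])_(\d{4})\.png$")

def parse_filename(name):
    """Return (class_label, frame_number), or None if the name does not fit."""
    m = FILENAME_RE.match(name)
    if m is None:
        return None
    frame = int(m.group(2))
    if not 1 <= frame <= 2399:
        return None
    return f"DR{m.group(1)}", frame

# Synthetic file list standing in for a real directory listing.
names = [f"DR{c:02d}_{f:04d}.png" for c in range(1, 13) for f in range(1, 2400)]
counts = Counter(parse_filename(n)[0] for n in names)

print(parse_filename("DR05_1247.png"))    # ('DR05', 1247)
print(len(counts), sum(counts.values()))  # 12 28788
```

Names outside the assumed ranges (for example `DR13_0001.png` or `DR01_2400.png`) return `None`, which makes stray files easy to spot before training.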

📊 Dataset Specifications

  • 🖼️ 28,800 images (28,788 used in diagnosis)
  • 📦 52 GB (52,644 MB)
  • 📐 1920×1080 px — Full HD
  • 🎨 RGB channels — fully consistent
  • 🏷️ 12 classes — DR01–DR12
  • ⚖️ Perfect balance — 2,399 images per class, σ=0

🎯 Intended Applications

  • 🛡️ Drone type classification AI model training
  • 🎯 Threat-level identification system development
  • 🔬 Fine-grained class discrimination benchmarking
  • 🌐 Counter-UAS decision-making AI
  • 📡 Multi-sensor fusion classification research

⚠️ Not for Commercial Use

This dataset was developed for defense-specific applications. Use is restricted to non-commercial purposes including research, education, and defense AI development.

Level 1 — Basic Quality: All Checks Pass

Level 1 examines fundamental image integrity, missing values, class balance, and statistics. PBLS_Drone_classification earned a Good rating across all four checks.

✅ Image Integrity

All images are fixed at 1920×1080 px: minimum and maximum dimensions are equal, so size variance is zero. RGB channels are 100% consistent. No resizing or channel conversion is needed to make the images uniform before training.

✅ Class Balance

12 classes × 2,399 images. Standard deviation σ = 0.0. Zero model bias from class imbalance. No class is over- or under-represented.
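
The σ = 0 figure can be reproduced with a few lines. This is a sketch assuming the balance statistic is the population standard deviation of per-class image counts; the report does not spell out the exact formula, and `class_balance` is a hypothetical helper name.

```python
from collections import Counter
from statistics import pstdev

def class_balance(labels):
    """Return per-class counts and the population std dev of those counts."""
    counts = Counter(labels)
    return counts, pstdev(list(counts.values()))

# 12 classes with 2,399 labels each, as in the diagnosed dataset.
balanced = [f"DR{c:02d}" for c in range(1, 13) for _ in range(2399)]
counts, sigma = class_balance(balanced)
print(len(counts), sigma)

# Dropping 500 images from one class makes the imbalance visible at once.
skewed = balanced[:-500]
_, sigma_skewed = class_balance(skewed)
print(sigma_skewed > 0)
```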

L1 Summary — Synthetic Data's Precision Control

If you tried to build a classification dataset from real drone footage, achieving balanced class counts would be extremely difficult on its own. Certain drone types are rarely captured on camera; certain environments and angles produce skewed distributions. PBLS_Drone_classification eliminates this problem entirely through its CG generation pipeline — you simply render each of the 12 drone types exactly 2,399 times.

Class Mean Images — Background Repetition Made Visible

A mean image is the pixel-wise average of all images in a class. When a background repeats frequently, it appears sharp in the mean; the drone itself becomes blurred because its position and pose change from frame to frame. The right-hand image in each pair shows the most canonical (high-density) frame from the same class.
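
To make the effect concrete, here is a toy sketch of a pixel-wise mean over three tiny grayscale "frames". The nested-list image format and the pixel values are invented for illustration; a real pipeline would average full 1920×1080 RGB arrays.

```python
def mean_image(images):
    """Pixel-wise average of equally sized grayscale images (lists of rows)."""
    n = len(images)
    height, width = len(images[0]), len(images[0][0])
    return [
        [sum(img[y][x] for img in images) / n for x in range(width)]
        for y in range(height)
    ]

# Three toy 2x3 frames: the background value (200) repeats in every frame,
# while a dark "drone" pixel (0) moves one column per frame.
frames = [
    [[0, 200, 200], [200, 200, 200]],
    [[200, 0, 200], [200, 200, 200]],
    [[200, 200, 0], [200, 200, 200]],
]
mean = mean_image(frames)
print(mean[1])  # bottom row: pure background, unchanged at 200.0
print(mean[0])  # top row: the moving drone is smeared across all columns
```

The bottom row keeps its exact background value, while the top row settles at an in-between value: the moving object has been averaged away, which is the blur visible in the class mean images.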

DR01: Mean (left) · Canonical (right)
DR02: Mean (left) · Canonical (right)
DR03: Mean (left) · Canonical (right)
DR04: Mean (left) · Canonical (right)
DR05: Mean (left) · High-density canonical (right)
DR06: Mean (left) · Canonical (right)

▲ Per-class mean images for DR01–DR06 (DataClinic L1) alongside high-density canonical samples. The sharper the background in the mean image, the more that background was repeated across the class.

Level 2 — DataLens Analysis: Environments Cluster, Distribution Fragments

Level 2 uses the Wolfram ImageIdentify Net V2 (1,280 dimensions) as its lens to analyze the latent-space structure of the data. This is a general-purpose lens — not drone-specific — and captures broad visual characteristics across the dataset.


▲ L2 PCA full distribution — feature space across all data (dense cluster top-left, dispersed tail right / Wolfram 1,280-dim lens)

Three Environment Clusters

Through the general-purpose lens, the data groups by shooting environment rather than drone type.

1. Urban Environment

High-rise buildings, city backgrounds. The dominant cluster containing the largest share of images.

2. Natural Environment

Forests, fields, open sky. Visually distinct from urban backgrounds.

3. Mixed Environment

Coastal, suburban, and composite backgrounds. Positioned between the two main clusters.


▲ L2 density terrain map — positions and densities of the three environment clusters

L2 Distribution: Poor — What Multimodality Means

The Poor rating on L2 distribution is caused by a multimodal distribution. The data does not form a single continuous distribution; instead, it breaks into multiple discrete clusters. Three environment clusters sit disconnected from one another — meaning the dataset is fragmented. For a model to generalize robustly across varying environments, it needs in-between data that bridges those clusters.

Level 3 — Even the Domain-Specific Lens Can't Resolve the Split

Level 3 applies a 53-dimensional domain-specific lens. It selects only the dimensions from the 3-million-image Wolfram model that retain discriminative power for drone classification, giving a more drone-focused view of the data than the general-purpose L2 lens.


▲ L3 PCA full distribution — 53-dim drone-specific lens. Point cloud spreads widely with uneven density.

⚠️ 3-Cluster Structure Persists — Domain-Specific Lens Fails to Heal the Split

This dataset split into three clusters under the 1,280-dim general-purpose lens (L2). Under the 53-dim drone-specific lens (L3), it still maintains three separate clusters. The density contour plot below makes this unambiguously clear:

  • Primary cluster (center-left, dominant): The largest, highest-density group — the vast majority of data concentrates here
  • Secondary cluster (upper-right): Medium-sized, spatially disconnected from the primary cluster
  • Minor cluster (lower-right): Small, low-density outlier group

If the L2 split were caused purely by a general-purpose lens picking up environmental background differences, switching to a domain-specific lens should resolve it. But the fact that the split persists under a drone-classification lens indicates that visually dissimilar subgroups exist within the drone-relevant features themselves. This is the substance behind the L3 "Poor distribution" verdict.


▲ L3 density contour — dominant primary cluster (deep red, center-left) + two secondary clusters (upper-right, lower-right). All three are clearly separated by the contour lines.


▲ L3 density terrain map — multiple high-density peaks scattered on the right, low-density dispersed area on the left. Asymmetric structure overall.

The video frame redundancy problem: Each drone's 2,399 images are extracted from consecutive frames of a flight simulation. When the interval between frames is small, adjacent frames are nearly identical — the drone has barely moved. The domain-specific lens captures how these redundant frames pack densely together in feature space. This is the concrete reality behind the diagnosis of low intra-class diversity.

Density Histograms — Comparing L2 vs L3 Lenses

These histograms show the distribution of per-image density values, that is, how "typical" each image is. A tall, narrow peak means images are highly similar to one another (high redundancy); a broad spread indicates greater diversity. The indigo region marks the outlier zone; the teal region marks the long-tail (video redundancy) zone.


▲ L2 density histogram — general 1,280-dim lens. Narrow, low density range (0.1–0.7), reflecting multimodal structure.


▲ L3 density histogram — drone-specific 53-dim lens. Bell-shaped peak at 1.5–1.8 with a right tail (2.5–4.0) — reflecting high-density repeated frames.

What the Two Lenses Capture Differently

L2 (General 1,280-dim)

Density range 0.1–0.7: narrow and low. Images scatter across multiple clusters driven by environmental and background differences. The multimodal structure is explained by background variation.

L3 (Drone-specific 53-dim)

Density range 0.7–4.0: much broader and higher. Bell-shaped peak at 1.5–1.8 with a right tail (above 2.5). Even through a drone-focused lens, high-density repetition zones persist — sharply exposing the structural problem of redundant frames.

Per-Class Density Distribution — Which Drones Are Most Canonical?


▲ L3 per-class density box chart — further right means that class's images are more similar to one another (more canonical). DR04 and DR08 dominate at high density; DR06 and DR02 score lowest.

What the Density Rankings Tell Us

DR04 and DR08 have median densities of 2.2 and 2.0 respectively, the highest of the 12 classes. Through the drone-specific lens, the images within each of these classes look nearly identical to one another, meaning each class has a disproportionately high share of redundant frames. By contrast, DR06 and DR02 sit at a median density of around 1.7, the lowest, indicating relatively higher intra-class visual diversity. A well-constructed classification dataset would have the diversity level of DR06/DR02 distributed evenly across all classes.
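
A ranking like this can be sketched as a per-class median followed by a sort. The density values below are made up so that the medians match the figures quoted above (2.2, 2.0, and about 1.7); they are not the report's underlying data, and `rank_by_median_density` is a hypothetical helper.

```python
from statistics import median

def rank_by_median_density(density_by_class):
    """Return (class, median density) pairs, most redundant class first."""
    medians = {label: median(vals) for label, vals in density_by_class.items()}
    return sorted(medians.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical per-image density values for four of the twelve classes.
toy = {
    "DR04": [2.0, 2.2, 2.4],
    "DR08": [1.8, 2.0, 2.3],
    "DR06": [1.5, 1.7, 1.9],
    "DR02": [1.6, 1.7, 1.8],
}
ranking = rank_by_median_density(toy)
for label, med in ranking:
    print(label, med)
```

Classes at the top of the ranking would be the first candidates for frame removal; classes at the bottom set the diversity level the others could aim for.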

Outlier Analysis — Canonical vs Anomalous

DataClinic uses density-based outlier detection to surface the dataset's "most canonical images" (high density) and "most anomalous images" (low density). Examining these samples makes the data quality issues immediately intuitive.
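
The report does not state how DataClinic computes density. As one common way to get a comparable score, the sketch below uses the inverse of the mean distance to the k nearest neighbours in embedding space. The `knn_density` function, the choice of k, and the toy 2-D points are all assumptions for illustration.

```python
import math

def knn_density(points, k=2):
    """Density score per point: inverse mean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_d = sum(dists[:k]) / k
        scores.append(1.0 / (mean_d + 1e-12))  # epsilon guards exact duplicates
    return scores

# Toy 2-D embeddings: a tight pack of near-duplicates plus one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_density(points, k=2)

most_canonical = max(range(len(points)), key=scores.__getitem__)
most_anomalous = min(range(len(points)), key=scores.__getitem__)
print(points[most_canonical], points[most_anomalous])
```

Points inside the tight pack score high (canonical), and the isolated point scores lowest (anomalous). The brute-force loop is quadratic in the number of points, so a dataset of this size would need a nearest-neighbour index instead.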

🟢 High Density — Canonical (Most Repeated) Samples

The most "average" images in the dataset — the ones with the most near-identical neighbours.

  • DR05: density 0.677
  • DR05: density 0.675
  • DR08: density 0.669
  • DR11: density 0.667

🔴 Low Density — Outlier (Most Rare) Samples

The most visually unique images in the dataset — cases that look markedly different from everything else.

  • DR02: density 0.103
  • DR02: density 0.106
  • DR12: density 0.109
  • DR06: density 0.111

High vs Low Density — What This Tells Us

The most canonical samples come from DR05, DR08, and DR11, which suggests those classes are packed with near-identical images. The low-density outliers from DR02, DR06, and DR12, on the other hand, show distinctive backgrounds and angles, and these images carry more meaningful diversity. A well-designed classification dataset should have far more images like the low-density samples, spread across all classes.

The 76-Point Paradox — Why Perfect Balance Still Scores Low

Every L1 check came back Good, and class balance is flawless. So why 76 (Fair)? The culprit is the Poor distribution ratings at both L2 and L3.

  • Class Balance: σ=0, perfectly balanced. No bias from class imbalance.
  • Distributional Diversity: multi-cluster structure persists (L2 & L3). Extensive frame redundancy.
  • ⚠️ Effective Training Value: remove duplicates from the 28,800 images and the actual diversity is far smaller.

"Balance is necessary but not sufficient; diversity is what completes it."

Even with perfect class balance, if the images within each class are too similar to one another, the model cannot learn effectively. DR05 may have 2,399 images, but if 2,000 of them are nearly identical frames from the same flight sequence, the effective training data is closer to 400 images. Showing a model the same drone shifted slightly 2,000 times teaches it nothing new.
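
The arithmetic behind that estimate can be written out. It rests on a simplifying assumption of ours, not a DataClinic measurement: the 2,000 near-identical frames in the hypothetical above together carry about as much information as a single image.

```latex
N_{\text{eff}} \approx \underbrace{(2399 - 2000)}_{\text{distinct frames}}
              + \underbrace{1}_{\text{one representative of the duplicates}}
              = 400,
\qquad
\frac{N_{\text{eff}}}{N} = \frac{400}{2399} \approx 0.167
```

Under that assumption, only about one sixth of the class contributes new information to training.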

The Real Risk for Classification Models

A classification model trained on data with heavy frame redundancy can show high validation accuracy on the dataset but poor performance in real deployment, a classic overfitting pattern. When near-identical frames end up in both the training and validation splits, validation accuracy mostly measures how well the model remembers frames it has effectively already seen. In a Counter-UAS system, this is critical: the model may misclassify a DR05 drone seen from an unfamiliar angle or under different lighting conditions, labelling it as a different type entirely.
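
One way this risk can show up in practice is a random train/validation split that scatters adjacent, near-identical frames across both sides. The sketch below assigns whole blocks of consecutive frames to one side instead. It is an illustrative mitigation, not something the report prescribes; `blockwise_split`, the block size of 100, and the one-in-five validation ratio are all assumptions.

```python
def blockwise_split(frame_numbers, block_size=100, val_every=5):
    """Assign whole blocks of consecutive frames to train or validation.

    Every `val_every`-th block goes to validation, so near-identical
    neighbouring frames mostly stay on the same side of the split.
    """
    train, val = [], []
    for f in frame_numbers:
        block = (f - 1) // block_size
        target = val if block % val_every == val_every - 1 else train
        target.append(f)
    return train, val

frames = list(range(1, 2400))  # frame numbers 0001..2399 of one class
train, val = blockwise_split(frames)
print(len(train), len(val))  # 1999 400
```

Only frames at block boundaries still have a direct neighbour on the other side; leaving a small gap of unused frames around each boundary would remove even those.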

Recommendation — Data Diet First

DataClinic recommends a single intervention for PBLS_Drone_classification: Data Diet. This is the exact opposite of #226's DataBulkup prescription — despite using the same underlying dataset, the classification framing calls for trimming rather than augmenting.

🥗 Data Diet — Removing Redundant Frames

1

Stride sampling within flight sequences

Sample every Nth frame of the flight sequence. At a stride of 10, the 2,399 images in each class compress to ~240 while the progression of angles, distances, and backgrounds is still covered.

2

Embedding-based deduplication

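Use the L3 domain-specific lens embeddings to find image pairs above a cosine similarity threshold and drop one image from each pair. This removes redundancy with little loss of information, because every dropped image has a near-identical counterpart that is kept.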

+

Background diversification (follow-on step)

After the Diet, generate additional images in night, adverse weather, mountainous, and maritime environments to counteract urban bias — simultaneously addressing the L2 multimodal distribution issue.
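
The two Data Diet steps above can be sketched in a few lines. The function names, the greedy keep-first strategy, and the 0.98 similarity threshold are assumptions for illustration; the report does not specify a threshold, and the toy 2-D vectors stand in for the 53-dimensional L3 embeddings.

```python
import math

def stride_sample(frames, stride=10):
    """Keep every `stride`-th frame of a sequence, starting with the first."""
    return frames[::stride]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def dedup_by_cosine(embeddings, threshold=0.98):
    """Greedy dedup: keep an item only if its similarity to every item
    already kept is below `threshold`. Returns the kept indices."""
    kept = []
    for i, e in enumerate(embeddings):
        if all(cosine_similarity(e, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept

# Step 1: stride sampling over one class's 2,399 frames.
frames = list(range(1, 2400))
sampled = stride_sample(frames, stride=10)
print(len(sampled))  # 240

# Step 2: embedding-based dedup on toy vectors (two near-duplicate pairs).
toy = [(1.0, 0.0), (0.999, 0.01), (0.0, 1.0), (0.01, 0.999)]
print(dedup_by_cosine(toy))  # [0, 2]
```

The greedy pass compares each candidate against everything already kept, which is quadratic in the worst case; at the scale of this dataset an approximate nearest-neighbour index would be the more practical choice.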

⚖️ #226 DataBulkup vs #227 DataDiet

Identical source data, opposite prescriptions — a striking result. For a detection model (#226), more diversity is needed (Bulkup). For a classification model (#227), removing redundancy comes first (Diet). The right data quality prescription depends entirely on which AI task the data is for.

📈 Expected Outcomes

Removing redundant frames raises the share of distinct images within each class → L3 distribution improves → score expected to rise. More importantly, training efficiency improves (comparable performance with less data), and real-world generalization is expected to improve.

Conclusion

PBLS_Drone_classification is a compelling demonstration of what synthetic data generation pipelines do well. Exactly 2,399 images per class across 12 drone types is a level of balance that is extremely difficult to achieve with real-world footage. 1920×1080 FHD, consistent RGB channels, zero missing values: all L1 checks pass with flying colours.

Yet DataClinic's L2 and L3 lenses uncovered the structural problem hiding beneath that surface. Under the general-purpose lens and the domain-specific lens alike, the data fractures into three clusters and refuses to coalesce. The underlying causes are insufficient intra-class diversity and repeated consecutive video frames: the paradox of data rich in count but poor in information.

Same data. Different lens. Different prescription.
DataBulkup for detection; DataDiet for classification. The value of data quality diagnosis starts with asking "what is this data for?"

View the full DataClinic Report #227 →