2026.03 · Pebblous Data Communication Team

Reading time: ~15 min

Executive Summary

This article is based on the analysis results of DataClinic Report #226. PBLS_Drone is a defense-specialized synthetic image dataset built in-house by Pebblous to optimize drone object recognition AI models. It comprises 28,801 single-class drone images (52GB) and earned a DataClinic overall score of 87 (Good), a significant improvement over the previously diagnosed PBLS_Military (68). This time DataClinic recommends Data Bulk-up rather than Data Diet.

DataClinic Overall Score: 87
Total Images: 28,801
Dataset Size: 52GB
Resolution (FHD): 1920×1080

PBLS_Drone vs PBLS_Military — Two Defense Datasets Compared

PBLS_Drone: 87 · Single class · 28,801 images · FHD · DataBulkup Recommended
PBLS_Military: 68 · 10 classes · 3,171 images · HD · DataDiet Recommended

Key difference: PBLS_Drone has no class balance issues as a single-class dataset, and its high image diversity earned it a higher score.

DataClinic Grade Summary

L1 Integrity Good
L1 Missing Good
L1 Class Balance N/A
L1 Statistics Good
L2 DataLens N/A
L2 Geometry Fair
L2 Distribution Good
L3 DataLens N/A
L3 Geometry Fair
L3 Distribution Good

Why Drone AI? — Threats in the Sky and Defense

The 2022 Ukraine war proved to the world that drones have become a game changer on the modern battlefield. From low-cost commercial drones to precision-strike loitering munitions, drones have infiltrated virtually every operational domain including reconnaissance, strike, supply, and electronic warfare.

Counter-UAS technologies are evolving rapidly in response. Among various countermeasures such as radar, jamming, lasers, and interceptor drones, AI-based drone detection & classification serves as the critical "eyes" of every defense system. For AI to recognize drones quickly and accurately, rich and diverse training data is essential.

🎯

Counter-UAS Core Tech

AI that detects, classifies, and tracks enemy drones — the brain of defense systems

📷

Limits of Real Photography

Capturing diverse drones at various altitudes, distances, and lighting is prohibitively expensive

🏭

Synthetic Data Solution

CG can generate a virtually unlimited range of drone scenarios, angles, environments, and distances, enabling AI training without real footage

Drone Simulation Blueprint in the Filename

Each PBLS_Drone filename systematically encodes which drone model was captured at which frame.

Example: DR08_0912.png
DR01~DR12: drone model number (12 different drone types)
0001~2400+: frame/sequence number (flight path and angle order)

12 drone types × ~2,400 frames = ~28,800 systematic synthetic drone images. Each drone is continuously captured along simulation flight paths at various angles and backgrounds.
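
The naming scheme above can be decoded with a few lines of Python. This is a minimal sketch, assuming every file follows the pattern of the single example shown (two-digit model, underscore, four-digit frame, `.png`); the names `parse_filename`, `DroneFrame`, and `frames_per_model` are illustrative and not part of any Pebblous tooling.

```python
import re
from collections import Counter
from typing import NamedTuple

# Pattern inferred from the one example "DR08_0912.png" (an assumption).
FILENAME_RE = re.compile(r"^DR(\d{2})_(\d{4})\.png$")


class DroneFrame(NamedTuple):
    model: int  # drone model number, 1..12 according to the article
    frame: int  # frame/sequence number along the simulated flight path


def parse_filename(name: str) -> DroneFrame:
    """Split a PBLS_Drone-style filename into model and frame numbers."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name!r}")
    model, frame = int(match.group(1)), int(match.group(2))
    if not 1 <= model <= 12:
        raise ValueError(f"model number out of range DR01~DR12: {name!r}")
    return DroneFrame(model, frame)


def frames_per_model(names):
    """Count how many frames each drone model contributes."""
    return Counter(parse_filename(name).model for name in names)


print(parse_filename("DR08_0912.png"))  # DroneFrame(model=8, frame=912)
```

Running `frames_per_model` over a directory listing would be one way to check the "12 types × ~2,400 frames" estimate against the actual files.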

Dataset Overview — PBLS_Drone

PBLS_Drone is a defense-specialized synthetic drone image dataset built in-house by the Korean AI company Pebblous. It also targets environmental and transportation applications and is optimized for image classification and image processing tasks. All images are 1920×1080 (Full HD) RGB, a format chosen with real-world deployment of drone object recognition models in mind.

PBLS_Drone — Synthetic drone image collage (DataClinic L1 analysis)

▲ PBLS_Drone high-density representative sample — DR08 drone model (density 0.677, highest in dataset at L2)

📊 Dataset Specifications

  • 🖼️ 28,801 images (28,800 used for diagnosis)
  • 📦 52GB (52,646MB)
  • 📐 1920×1080px — Full HD, fixed size
  • 🎨 RGB channels — consistent throughout
  • 🏷️ Single class — "drone" (12 models within class)
  • 📅 2026.03.12 diagnosis completed

🎯 Application Areas

  • 🛡️ Counter-UAS AI model training
  • 🔍 Drone detection & classification algorithm development
  • 📡 Radar/EO-IR sensor fusion AI research
  • 🌐 Environmental & transportation drone monitoring
  • 🧪 Benchmark dataset for model evaluation

⚠️ Not Available for Commercial Use

The PBLS_Drone dataset was developed for defense-specific purposes and commercial use is not permitted. It is available only for non-commercial purposes such as research, education, and defense AI development.

Level 1 — Basic Quality Diagnosis

Overall Mean Image — The "Typical Drone" Through AI's Eyes

This is the pixel-level average of 28,800 drone images. The blurriness is normal — it shows the overlapping common contours of 12 different drone types. If the basic drone form is identifiable in the mean image, that is evidence of visual consistency in the dataset.

▲ PBLS_Drone overall mean image — pixel average of 28,800 images (DataClinic L1)
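
A pixel-level average like this can be computed in a single streaming pass, so the full 52GB never has to sit in memory. The sketch below is dependency-free and assumes each image is already decoded into a nested list of pixel values (`[row][column]`, one channel); `mean_image` is a hypothetical helper, and a real pipeline would more likely use an array library and handle the three RGB channels.

```python
def mean_image(images):
    """Pixel-wise mean of equally sized single-channel images.

    `images` may be any iterable (including a generator), so images
    can be loaded one at a time.
    """
    total = None
    count = 0
    for img in images:
        if total is None:
            # Allocate the accumulator from the first image's shape.
            total = [[0.0] * len(row) for row in img]
        elif len(img) != len(total) or any(
            len(row) != len(acc) for row, acc in zip(img, total)
        ):
            raise ValueError("all images must share one size")
        for r, row in enumerate(img):
            for c, value in enumerate(row):
                total[r][c] += value
        count += 1
    if total is None:
        raise ValueError("no images given")
    return [[value / count for value in row] for row in total]


# Two tiny 1x2 "images": the first pixel differs, the second agrees.
print(mean_image([[[0, 255]], [[255, 255]]]))  # [[127.5, 255.0]]
```

Pixels that agree across images stay sharp in the mean, while pixels that vary blur toward gray, which is why a recognizable drone outline in the mean image hints at visual consistency.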

✅ L1 Strengths

  • 📐 Perfect resolution consistency: All 1920×1080px fixed — no padding or resizing needed
  • 🎨 RGB channel consistency: No grayscale or RGBA contamination
  • Zero missing values: No corrupted or empty images
  • 📊 L1 Statistics: Good — rich dataset with diverse structure and texture
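
Checks of this kind (fixed resolution, RGB only, no corrupted files) can be approximated by reading just the PNG header of each file. The following is a minimal sketch, not DataClinic's implementation: it relies on the standard PNG layout (8-byte signature, then the IHDR chunk holding width, height, bit depth, and color type), and the helper names are illustrative.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
COLOR_TYPE_RGB = 2  # PNG color type 2 = truecolor RGB without alpha


def read_png_header(data: bytes):
    """Return (width, height, color_type) from the first 26 bytes of a PNG."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a valid PNG header")
    # IHDR payload starts at byte 16: width, height (big-endian uint32),
    # then bit depth and color type (one byte each).
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return width, height, color_type


def is_fhd_rgb(data: bytes) -> bool:
    """True if the header describes a 1920x1080 RGB image."""
    try:
        width, height, color_type = read_png_header(data)
    except ValueError:
        return False  # truncated or corrupted file
    return (width, height) == (1920, 1080) and color_type == COLOR_TYPE_RGB
```

In use, one would pass `open(path, "rb").read(26)` for each file and count the failures. A header check does not prove the pixel data decodes cleanly, so it complements rather than replaces a full decode.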

📌 L1 Notable Points

  • 🏷️ Single class: Class balance metric N/A
  • 🔄 Multiple drone types: Rich visual diversity across 12 models
  • 🌐 Mixed natural/urban backgrounds: Clear environment clusters form at L2

💡 Difference from PBLS_Military — L1 Statistics Grade Reversal: PBLS_Military scored Bad on L1 Statistics, whereas PBLS_Drone scores Good. The Military data was dominated by specific environment (en3) and background (bg5) combinations, while the Drone data combines 12 drone models with diverse backgrounds, altitudes, and lighting, giving much richer visual diversity. This is the core reason behind the 87 score.

Level 2 — DataLens Analysis (Wolfram ImageIdentify Net V2)

Level 2 uses Wolfram's ImageIdentify Net V2, trained on 3 million images, as a lens. We examine how single-class PBLS_Drone data distributes in the 1,280-dimensional feature space, and why even a general-purpose AI discovers 3 clusters.

▲ Level 2 PCA distribution — single-class drone data in 1,280-dimensional feature space (Wolfram ImageIdentify Net V2)

▲ Level 2 density topography — 3 clusters separated by natural vs urban environments

3 Drone Groups Found by General AI — "The Background Is Different"

1. Natural Environment Cluster

Drone images captured against natural backgrounds such as forests, mountains, and plains. Green and brown tones. Estimated ~40% of the dataset.

2. Urban Environment Cluster

Drone images with buildings, roads, and urban backgrounds. Linear structures and gray tones. Estimated ~35% of the dataset.

3. Mixed/Transition Cluster

Images at natural-urban boundaries, in mixed environments, or against sky backgrounds (low-altitude as well as high-altitude shots).

💡 L2 Key Finding — "Background" Determines Clusters, Not Drone Shape: Because the dataset has a single class, Wolfram's general-purpose network clusters the images by background environment (natural vs urban) rather than by differences among the drones themselves. This matters for real drone recognition development: a model may learn to spot "anything that does not belong to the background" instead of recognizing drones as such. The L3 specialized lens allows a more meaningful analysis.

L2 Key Metrics

Observed Dimensions: 1,280
Mean Density: 0.3
Outlier Ratio: 6.8%
Cluster Count: 3

Level 3 — Drone-Specialized DataLens (788 Dimensions)

Level 3 applies a specialized lens optimized to 788 dimensions based on a 265-layer, 40MB model. Through dimension optimization that preserves class discriminability, it captures drone morphological characteristics more precisely than the general lens. Mean density rose from L2's 0.3 to 0.41, and outliers are stable at 6%.

▲ Level 3 PCA distribution — drone data in 788-dimensional specialized lens

▲ Level 3 density topography — complex clusters and multimodal distribution

Complex Structure Revealed at L3 — "Shape Starts to Matter"

With the specialized lens applied, drone morphological characteristics begin to influence the distribution beyond simple background classification. DataClinic confirmed that Cluster 1 is a complex type mixing buildings and natural landscapes, containing multiple peaks (sub-clusters). This means the 12 drone types, captured at various flight attitudes and environments, have formed fine-grained sub-groups.

L2 (General Lens)

  • 1,280 dims → Mean density 0.3
  • Background-based clusters
  • "Natural vs Urban" split with a mixed transition cluster
  • Outliers 6.8%

L3 (Specialized Lens)

  • 788 dims → Mean density 0.41 (+37%)
  • Shape + environment hybrid clusters
  • Complex multimodal distribution (diversity confirmed)
  • Outliers 6% (slightly decreased)

💡 L3 Insight — What the Density Increase Tells Us: The rise in mean density from 0.3 to 0.41 at L3 suggests that the specialized lens captures the data's core structure better than the general lens. This points to shape-specialized features being more important than general features for drone training data. When developing actual Counter-UAS AI models, a drone-specialized backbone network is therefore likely to be more advantageous than a general classification network.
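
The "+37%" figure and the outlier ratios can be checked with simple arithmetic. The snippet below is only a worked calculation; the image counts it derives are approximations that do not appear in the report, since the published ratios are themselves rounded.

```python
def relative_change(old: float, new: float) -> float:
    """Fractional change from old to new."""
    return (new - old) / old


DIAGNOSED_IMAGES = 28_800  # images used for diagnosis, per the specifications

density_gain = relative_change(0.30, 0.41)
l2_outliers = round(DIAGNOSED_IMAGES * 0.068)  # approximate count at L2
l3_outliers = round(DIAGNOSED_IMAGES * 0.06)   # approximate count at L3

print(f"{density_gain:.1%}")  # 36.7%, which rounds to the +37% quoted above
print(l2_outliers, l3_outliers)  # 1958 1728
```

Note that the two densities are measured in different feature spaces (1,280 versus 788 dimensions), so the percentage is best read as a descriptive comparison rather than a like-for-like gain.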

Outlier Analysis — Scenes AI Is Most and Least Confident About

Outlier analysis in a single-class dataset carries special significance. The visual differences between "typical drones" (high density) and "atypical drones" (low density) serve as a map of AI recognition performance strengths and weaknesses.

🟢 High Density — "Typical Drone" Scenes AI Is Most Confident About (L3)

The DR08 model dominates the high-density top ranks. DR05, DR04, and DR03 also rank high, and these drone models and flight attitudes define the dataset's "standard."

  • DR08 (density 0.858) 🔥
  • DR05 (density 0.857)
  • DR04 (density 0.850)
  • DR03 (density 0.849)
  • DR08 (density 0.849)
  • DR11 (density 0.848)

💡 Insight — Why DR08 Is the "Most Drone-Like": DR08 (VTOL type) records the highest density at both L2 and L3, likely because its morphology (a combination of rotary and fixed wings) is closest to the "universal drone characteristics" among the 12 types. Conversely, DR12 (large strategic drone) frequently appears among the low-density outliers due to its unusual size and flight attitudes. The more unusual the shape, the harder it is for AI.

🔴 Low Density — Outlier Drone Scenes That Confuse AI (L3)

DR02 and DR09 dominate the low-density top ranks. Unusual flight angles, extreme lighting, and special background combinations are the causes of these outliers.

  • DR02 (density 0.151) 🔴
  • DR09 (density 0.158)
  • DR02 (density 0.159)
  • DR06 (density 0.165)
  • DR12 (density 0.170)
  • DR11 (density 0.167)

🔄 The Two Most Different Scenes — Extremes of the Dataset

The pivot (reference point) is DR08_0910 (highest density 0.858). The scenes farthest from this image are DR12 series images. This shows what the most "un-drone-like" drone scenes look like within a single class.

  • DR08, Reference Image (Pivot): density 0.858 (highest)
  • DR12, Farthest Image: density 0.245 (extreme)

⬆️ These two scenes are the farthest apart in L3 feature space. This is the range where drone recognition AI may struggle the most.
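
Finding the nearest and farthest scenes from a pivot amounts to sorting samples by distance in feature space. The sketch below assumes Euclidean distance and uses made-up 2-D vectors in place of the real 788-dimensional L3 features; the report does not state which metric DataClinic uses, and `rank_by_distance` and the sample names other than the pivot are hypothetical.

```python
import math


def rank_by_distance(pivot: str, features: dict) -> list:
    """Return all other sample names, nearest to farthest from the pivot."""
    origin = features[pivot]
    others = (name for name in features if name != pivot)
    return sorted(others, key=lambda name: math.dist(origin, features[name]))


# Toy feature vectors (illustrative only, not real L3 embeddings).
features = {
    "DR08_0910": (0.0, 0.0),    # pivot named in the article
    "sample_near": (0.1, 0.0),  # hypothetical close neighbor
    "sample_far": (3.0, 4.0),   # hypothetical distant scene
}

ranked = rank_by_distance("DR08_0910", features)
print(ranked[0], ranked[-1])  # sample_near sample_far
```

The first entries of the ranking correspond to the "most similar scenes" and the last entry to the "farthest image" in this section.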

🔗 Most Similar Scenes — Nearest Cluster Around DR08

The images closest to DR08_0910 all come from the "0910 range" frames of other models: DR04, DR05, DR03, and DR01. This suggests that different drone models adopt similar poses at the same simulation time step (frame 0910).

  • DR04_0910 (density 0.850)
  • DR03_0910 (density 0.838)
  • DR05_0910 (density 0.843)
  • DR01_0910 (density 0.818)

💡 Insight — "Dominance of Frame 0910 Range": The most similar scenes are all concentrated around frame 0910. This suggests the simulation was designed so that drones appear most "typical" at a specific flight phase (angle and distance). In contrast, low-density outliers concentrate in particular frame ranges such as 0048~0296, 0890~0896, 1495~1497, 2250, and 2381. These ranges presumably correspond to simulation segments with unusual flight altitudes, angles, and backgrounds.
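
Frame ranges like these can be derived by merging the frame numbers of low-density samples into contiguous runs. This is a minimal sketch under the assumption that outliers are available as a plain list of frame numbers; `group_frames` and its `max_gap` tolerance are illustrative choices, not the report's method, and the input below is an invented subset.

```python
def group_frames(frames, max_gap=1):
    """Merge frame numbers into (start, end) ranges.

    A frame joins the current range if it lies within `max_gap`
    frames of that range's end; otherwise it starts a new range.
    """
    ranges = []
    for frame in sorted(set(frames)):
        if ranges and frame - ranges[-1][1] <= max_gap:
            ranges[-1][1] = frame
        else:
            ranges.append([frame, frame])
    return [tuple(r) for r in ranges]


# Invented outlier frames, loosely shaped like the ranges quoted above.
outlier_frames = [890, 891, 896, 1495, 1496, 1497, 2250]
print(group_frames(outlier_frames, max_gap=5))
# [(890, 896), (1495, 1497), (2250, 2250)]
```

A larger `max_gap` produces fewer, wider ranges, which is useful when outliers are sparse within a longer unusual flight segment.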

Recommendations — From 87 to Even Higher, Why DataBulkup

Why DataClinic recommends Data Bulk-up for PBLS_Drone: The current data has no significant duplication or bias and doesn't need a diet. Rather, samples at cluster boundaries and in low-density regions are insufficient, and adding synthetic data to fill these gaps will contribute to performance improvement.

💪

Data Bulk-up

DataClinic's primary recommendation. While 28,801 images appear sufficient for training, samples in low-density zones (special flight attitudes, extreme lighting, cross-environment) are relatively scarce.

Adding samples especially at cluster boundary zones (natural-urban transitions, altitude changes) can significantly improve the AI model's edge case recognition performance.

🌐

Expanded Environment Diversity

Current data centers on natural and urban environments. Real-world drone recognition AI must also work in extreme conditions such as nighttime, fog, backlighting, rain/snow, desert, and maritime.

Domain Randomization techniques are recommended to generate augmented images with randomly varied lighting direction, fog density, and background textures.
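
As an illustration of the idea, domain randomization can be reduced to sampling scene parameters at random before each render. The sketch below only draws the parameters; the field names, value ranges, and background labels are assumptions chosen for the example, and the renderer that would consume them is out of scope.

```python
import random
from dataclasses import dataclass

# Hypothetical background texture labels (not taken from the dataset).
BACKGROUNDS = ("forest", "urban", "sky", "desert", "maritime")


@dataclass(frozen=True)
class SceneParams:
    light_azimuth_deg: float    # lighting direction around the scene
    light_elevation_deg: float  # sun height; low or negative values mimic dusk
    fog_density: float          # 0.0 = clear, 1.0 = densest fog
    background: str


def sample_scene(rng: random.Random) -> SceneParams:
    """Draw one random scene configuration."""
    return SceneParams(
        light_azimuth_deg=rng.uniform(0.0, 360.0),
        light_elevation_deg=rng.uniform(-10.0, 90.0),
        fog_density=rng.uniform(0.0, 1.0),
        background=rng.choice(BACKGROUNDS),
    )


rng = random.Random(42)  # fixed seed so a batch can be regenerated
batch = [sample_scene(rng) for _ in range(4)]
for params in batch:
    print(params)
```

Passing a seeded `random.Random` instance keeps each generated batch reproducible, which helps when a particular rendered scene later turns out to be a low-density outlier and needs to be traced back to its parameters.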

Contrasting Recommendations for Two Defense Datasets

PBLS_Military (68 / 3,171 images): 🥗 DataDiet
  • Remove duplicate images
  • Fix environment bias

PBLS_Drone (87 / 28,801 images): 💪 DataBulkup
  • Fill low-density zones
  • Add extreme environments

Quality data needs more data; biased data needs a diet.

⚠️ Low-Density Frame Ranges — Simulation Design Review Recommended

Low-density outliers concentrate in frame ranges 0048~0296 and 0890~0896. These ranges presumably contain scenes where drones are in special flight attitudes (sharp turns, high-speed climbs, close-range capture, etc.). If these outliers represent meaningful cases in real battlefield scenarios, they should be reinforced with additional samples rather than removed. If they are mere rendering artifacts, quality review and replacement are recommended.

Conclusion — The Possibilities Opened by an 87-Score Drone Dataset

PBLS_Drone is one of the largest publicly available synthetic datasets in the defense AI drone recognition field. With 28,801 images, 52GB, and Full HD resolution, the numbers alone are impressive, and a DataClinic score of 87 indicates that the data is not just large but also of high quality.

Unlike PBLS_Military which scored 68 and needed a "data diet," PBLS_Drone already has good data diversity and balance. It is at the stage where richer data can push AI model performance even higher.

Now that drones have become a core asset on the modern battlefield, AI that recognizes drones is the first line of defense. PBLS_Drone provides the foundation for training that AI. With data bulk-up to fill low-density zones and extreme environment data expansion, we will be one step closer to developing drone recognition models ready for real-world deployment.

PBLS_Drone Key Summary Card

DataClinic Overall: 87
Synthetic Drone Images: 28,801
Dataset Size: 52GB
Resolution: FHD (1920×1080px)
Drone Model Types: 12
Environment Clusters: 3
Outlier Ratio (L3): 6%

Original DataClinic Report: dataclinic.ai/en/report/226 · Not for commercial use