AI Identifies Threats in the Sky — Quality Insights on Defense Drone Synthetic Data

Executive Summary

This article is based on the analysis results of DataClinic Report #226. PBLS_Drone is a defense-specialized synthetic image dataset built in-house by Pebblous to optimize drone object recognition AI models. Comprising 28,801 single-class drone images (52GB), it achieved a DataClinic overall score of 87 (Good), a significant improvement over the previously diagnosed PBLS_Military (68). This time DataClinic recommends Data Bulk-up rather than Data Diet.

87

DataClinic Overall Score

28,801

Total Images

52GB

Dataset Size

1920×1080

Resolution (FHD)

PBLS_Drone vs PBLS_Military — Two Defense Datasets Compared

87

PBLS_Drone

Single class · 28,801 images · FHD

DataBulkup Recommended

68

PBLS_Military

10 classes · 3,171 images · HD

DataDiet Recommended

Key difference: PBLS_Drone has no class balance issues as a single-class dataset, and its high image diversity earned it a higher score.

DataClinic Grade Summary

L1 Integrity Good

L1 Missing Good

L1 Class Balance N/A

L1 Statistics Good

L2 DataLens N/A

L2 Geometry Fair

L2 Distribution Good

L3 DataLens N/A

L3 Geometry Fair

L3 Distribution Good

Why Drone AI? — Threats in the Sky and Defense

The 2022 Ukraine war proved to the world that drones have become a game changer on the modern battlefield. From low-cost commercial drones to precision-strike loitering munitions, drones have infiltrated virtually every operational domain including reconnaissance, strike, supply, and electronic warfare.

Counter-UAS technologies are evolving rapidly in response. Among various countermeasures such as radar, jamming, lasers, and interceptor drones, AI-based drone detection & classification serves as the critical "eyes" of every defense system. For AI to recognize drones quickly and accurately, rich and diverse training data is essential.

🎯

Counter-UAS Core Tech

AI that detects, classifies, and tracks enemy drones — the brain of defense systems

📷

Limits of Real Photography

Capturing diverse drones at various altitudes, distances, and lighting is prohibitively expensive

🏭

Synthetic Data Solution

CG generates infinite drone scenarios, angles, environments, and distances — AI training without real footage

Drone Simulation Blueprint in the Filename

Each PBLS_Drone filename systematically encodes which drone model was captured at which frame.

DR08_0912.png

DR01~DR12

Drone model number
12 different drone types

0001~2400+

Frame/sequence number
Flight path and angle order

12 drone types × ~2,400 frames = ~28,800 systematic synthetic drone images. Each drone is continuously captured along simulation flight paths at various angles and backgrounds.
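The naming scheme above can be decoded mechanically. The following is a minimal sketch, assuming every file follows the `DR08_0912.png` pattern exactly (two-digit model, four-digit frame); the `DroneFrame` type and `parse_filename` helper are illustrative names, not part of the dataset tooling.

```python
import re
from typing import NamedTuple

# Pattern inferred from the single example "DR08_0912.png" (an assumption).
FILENAME_RE = re.compile(r"^DR(\d{2})_(\d{4})\.png$")


class DroneFrame(NamedTuple):
    model: int  # 1..12, i.e. DR01..DR12
    frame: int  # frame/sequence number along the simulated flight path


def parse_filename(name: str) -> DroneFrame:
    """Split a PBLS_Drone-style filename into model and frame numbers."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name!r}")
    model, frame = int(match.group(1)), int(match.group(2))
    if not 1 <= model <= 12:
        raise ValueError(f"model number out of range: {model}")
    return DroneFrame(model, frame)


print(parse_filename("DR08_0912.png"))
print(12 * 2400)  # nominal image count implied by 12 models x ~2,400 frames
```

Keeping the model number as metadata makes it possible to group later quality findings (such as low-density outliers) by drone model or frame range even though the training label is the single class "drone".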

Dataset Overview — PBLS_Drone

PBLS_Drone is a defense-specialized synthetic drone image dataset built in-house by Korean AI company Pebblous. The dataset also targets environmental and transportation applications and is suited to image classification and image processing tasks. All images are 1920×1080 (Full HD) RGB, a format chosen with real-world deployment of drone object recognition models in mind.

PBLS_Drone — Synthetic drone image collage (DataClinic L1 analysis)

▲ PBLS_Drone high-density representative sample — DR08 drone model (density 0.677, highest in dataset at L2)

📊 Dataset Specifications

🖼️ 28,801 images (28,800 used for diagnosis)
📦 52GB (52,646MB)
📐 1920×1080px — Full HD, fixed size
🎨 RGB channels — consistent throughout
🏷️ Single class — "drone" (12 models within class)
📅 2026.03.12 diagnosis completed

🎯 Application Areas

🛡️ Counter-UAS AI model training
🔍 Drone detection & classification algorithm development
📡 Radar/EO-IR sensor fusion AI research
🌐 Environmental & transportation drone monitoring
🧪 Benchmark dataset for model evaluation

⚠️ Not Available for Commercial Use

The PBLS_Drone dataset was developed for defense-specific purposes and commercial use is not permitted. It is available only for non-commercial purposes such as research, education, and defense AI development.

12 Drone Gallery — What AI Needs to Recognize

While PBLS_Drone is a single class ("drone"), it internally contains 12 different drone models from DR01 to DR12. Each model differs in form (quadcopter, fixed-wing, hybrid), size, and purpose (reconnaissance, strike, loitering munition, swarm). This diversity, together with the varied backgrounds, is why the data does not collapse into one group despite being a single class: 3 clusters form at Level 2, driven mainly by background environment.

Single-class dataset strategy: For binary classification ("drone or not") or object detection (localizing drones) model training, single-class data is efficient. By grouping diverse drone forms into one class, it guides AI to learn the "essential characteristics of drones."

🔷 Multi-Rotor Reconnaissance & Surveillance Drones

Quadcopter and hexacopter reconnaissance drones, specialized in low-altitude surveillance and battlefield intelligence gathering. Their similarity to commercial drones creates a high risk of AI misidentification.

DR01 — Small Recon Drone

Recon

Small multi-rotor reconnaissance drone. Utilized for covert reconnaissance thanks to its low RCS (Radar Cross Section) and quiet flight characteristics. Records medium density at DataClinic L3 with stable distribution.

DR02 — Medium Surveillance Drone

Surveillance

Medium multi-rotor surveillance drone. Equipped with optical and thermal cameras, specialized for persistent surveillance missions. DataClinic shows many low-density outliers, indicating unusual flight attitudes or special environment scenes.

DR03 — Tactical Recon Drone

Tactical

Tactical reconnaissance drone designed for rapid deployment and recovery, operated at platoon level. Forms the core of high-density clusters at L3, representing typical drone images.

🔴 Attack & Loitering Munition Drones

Loitering munitions are next-generation weapons that loiter above targets before striking. AI recognition of these weapon systems, like the Iranian Shahed-136 that caused massive damage in the Ukraine war, is a top priority for Counter-UAS systems.

DR04 — Attack Multi-Rotor

Attack

Attack drone capable of explosive delivery or direct strike. Hexacopter-based reinforced frame with high payload capacity. Ranks in the high-density top tier at L3 with stable characteristics.

DR05 — Loitering Munition

Loitering

A loitering munition that circles above targets waiting for strike opportunities. Its VTOL form combines a small fixed wing with multi-rotors, and it records one of the highest densities in DataClinic (0.857 at L3, just behind DR08's 0.858).

DR06 — Medium Attack Drone

Attack

Military drone with medium-range strike capability. Designed to balance range and payload, deployed for striking high-value targets behind enemy lines.

🔵 Fixed-Wing & VTOL Drones

DR07 — Small Fixed-Wing UAV

Fixed-Wing

Small fixed-wing UAV specialized for long-range reconnaissance. Glider-type wings provide high soaring efficiency and quiet operation. Its form differs greatly from multi-rotors, requiring special handling in AI recognition algorithms.

DR08 — Medium VTOL Drone

VTOL

High-performance drone combining vertical takeoff/landing (VTOL) and fixed-wing flight. Records the highest density across the entire dataset at both DataClinic L2 and L3 — recognized by AI as the most "typical drone."

DR09 — Tactical Fixed-Wing UAV

Fixed-Wing

Medium fixed-wing UAV capable of tactical reconnaissance and strike. Turbofan-powered for extended loiter time, equipped with autonomous flight path planning.

🟣 Swarm & Special-Purpose Drones

DR10 — Swarm Drone Unit

Swarm

Standardized small drone for swarm attack operations. Dozens to hundreds operate simultaneously to saturate air defense systems. Shows numerous low-density outliers in AI recognition.

DR11 — EW/Jamming Drone

EW

Electronic warfare (EW) and communications jamming specialized drone. Large antenna arrays and electronic equipment give it distinctive visual characteristics. Included in the high-density group at L3.

DR12 — Large Strategic Drone

Strategic

Large strategic-class UAV. A high-performance platform specialized for long-range strike and wide-area reconnaissance. Most frequently appearing model in DataClinic's low-density outlier list — diverse flight attitudes are classified as unusual samples.

🌿 Same Drones, Different Environment — Natural Background Comparison

The same 12 drone models shown above in an urban setting (buildings) are captured here in a natural environment (mountains and meadows). Compare whether the morphological characteristics of each drone remain identifiable when the background changes; this is a key factor in whether AI relies on background context or on the drone itself for recognition.

DR01 · DR02 · DR03 · DR04 · DR05 · DR06 · DR07 · DR08 · DR09 · DR10 · DR11 · DR12

▲ Same 12 drone models — natural environment (mountains/meadows). Compare with the urban backgrounds above.

Level 1 — Basic Quality Diagnosis

Overall Mean Image — The "Typical Drone" Through AI's Eyes

This is the pixel-level average of 28,800 drone images. The blurriness is normal — it shows the overlapping common contours of 12 different drone types. If the basic drone form is identifiable in the mean image, that is evidence of visual consistency in the dataset.

▲ PBLS_Drone overall mean image — pixel average of 28,800 images (DataClinic L1)
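A pixel-level mean over 28,800 Full HD images does not require holding the whole 52GB in memory. The sketch below shows one way to compute it incrementally, assuming each image has already been decoded into a flat list of pixel values of equal length; `running_mean_image` is an illustrative helper, not DataClinic's implementation.

```python
from typing import Iterable, List, Sequence


def running_mean_image(images: Iterable[Sequence[float]]) -> List[float]:
    """Per-pixel mean over a stream of flat, equally sized pixel sequences."""
    mean: List[float] = []
    count = 0
    for pixels in images:
        if count == 0:
            mean = [0.0] * len(pixels)
        elif len(pixels) != len(mean):
            raise ValueError("all images must have the same size")
        count += 1
        # Incremental update: mean_n = mean_(n-1) + (x_n - mean_(n-1)) / n
        mean = [m + (p - m) / count for m, p in zip(mean, pixels)]
    if count == 0:
        raise ValueError("no images given")
    return mean


# Two toy "images" of three pixel values each.
print(running_mean_image([[0, 0, 255], [255, 0, 255]]))
```

Because the dataset has a fixed 1920×1080 RGB format, the size check should never fire on real data; it is there to catch decoding mistakes.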

✅ L1 Strengths

📐 Perfect resolution consistency: All 1920×1080px fixed — no padding or resizing needed
🎨 RGB channel consistency: No grayscale or RGBA contamination
❌ Zero missing values: No corrupted or empty images
📊 L1 Statistics: Good — rich dataset with diverse structure and texture
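The resolution and channel checks listed above can be reproduced without decoding any pixels, because a PNG file stores its width, height, and color type in the IHDR chunk at the start of the file. The sketch below assumes the files are standard PNGs, as the `DR08_0912.png` example suggests; the function names are illustrative.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
EXPECTED_SIZE = (1920, 1080)
COLOR_TYPE_RGB = 2  # PNG truecolor without alpha (0 = grayscale, 6 = RGBA)


def read_ihdr(header: bytes):
    """Return (width, height, bit_depth, color_type) from the first 26 bytes of a PNG."""
    if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a valid PNG header")
    return struct.unpack(">IIBB", header[16:26])


def passes_format_check(header: bytes) -> bool:
    """True if the header describes a 1920x1080 RGB image; False for anything else."""
    try:
        width, height, _bit_depth, color_type = read_ihdr(header)
    except ValueError:
        return False  # corrupted or empty file
    return (width, height) == EXPECTED_SIZE and color_type == COLOR_TYPE_RGB


# Synthetic header for a 1920x1080, 8-bit RGB image (no real file needed).
sample = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(
    ">IIBBBBB", 1920, 1080, 8, COLOR_TYPE_RGB, 0, 0, 0
)
print(passes_format_check(sample))
```

On real files, the header would come from `open(path, "rb").read(26)`. A full integrity check would also need to decode the image data, which this sketch deliberately leaves out.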

📌 L1 Notable Points

🏷️ Single class: Class balance metric N/A
🔄 Multiple drone types: Rich visual diversity across 12 models
🌐 Mixed natural/urban backgrounds: Clear environment clusters form at L2

💡 Difference from PBLS_Military — L1 Statistics Grade Reversal: PBLS_Military scored L1 Statistics Bad, but PBLS_Drone scores Good. Military data was dominated by specific environment (en3) and background (bg5) combinations, while Drone data has 12 drone models × diverse background, altitude, and lighting combinations for much richer visual diversity. This is the core reason behind the 87 score.

Level 2 — DataLens Analysis (Wolfram ImageIdentify Net V2)

Level 2 uses Wolfram's ImageIdentify Net V2, trained on 3 million images, as a lens. We examine how single-class PBLS_Drone data distributes in the 1,280-dimensional feature space, and why even a general-purpose AI discovers 3 clusters.

▲ Level 2 PCA distribution — single-class drone data in 1,280-dimensional feature space (Wolfram ImageIdentify Net V2)

▲ Level 2 density topography — 3 clusters separated by natural vs urban environments

3 Drone Groups Found by General AI — "The Background Is Different"

1

Natural Environment Cluster

Drone images captured against natural backgrounds — forests, mountains, plains. Green and brown tones. Estimated ~40% of the dataset.

2

Urban Environment Cluster

Drone images with buildings, roads, and urban backgrounds. Linear structures and gray tones. Estimated ~35% of the dataset.

3

Mixed/Transition Cluster

Images at natural-urban boundaries and in mixed environments, along with low-altitude shots and high-altitude shots against sky backgrounds.

💡 L2 Key Finding — "Background" Determines Clusters, Not Drone Shape: Since this is a single class, Wolfram's general AI forms clusters based on background environment (natural vs urban) rather than differences in the drones themselves. This provides important implications for actual drone recognition AI development — there is a risk that the model learns to find "objects not in the background" rather than recognizing the drones themselves. The L3 specialized lens enables more meaningful analysis.
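One way to test the claim that background, not drone shape, drives the clusters is to compare cluster purity against each kind of label. The sketch below assumes per-image cluster assignments and background labels are available as metadata, which the article does not state; the toy values are invented for illustration only.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def purity(cluster_ids: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Fraction of samples whose label matches the majority label of their cluster."""
    if not cluster_ids or len(cluster_ids) != len(labels):
        raise ValueError("inputs must be non-empty and of equal length")
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, labels):
        by_cluster[cluster][label] += 1
    majority_total = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy assignments: two clusters, six images.
clusters = [0, 0, 0, 1, 1, 1]
backgrounds = ["natural", "natural", "natural", "urban", "urban", "urban"]
models = ["DR01", "DR02", "DR03", "DR01", "DR02", "DR03"]

print(purity(clusters, backgrounds))  # high: clusters follow background
print(purity(clusters, models))       # low: clusters ignore drone model
```

If purity against background is much higher than purity against drone model on the real data, that would support the background-driven reading and the associated risk of a model keying on context.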

L2 Key Metrics

1,280

Observed Dimensions

0.3

Mean Density

6.8%

Outlier Ratio

3

Cluster Count

Level 3 — Drone-Specialized DataLens (788 Dimensions)

Level 3 applies a specialized lens optimized to 788 dimensions based on a 265-layer, 40MB model. Through dimension optimization that preserves class discriminability, it captures drone morphological characteristics more precisely than the general lens. Mean density rose from L2's 0.3 to 0.41, and outliers are stable at 6%.

▲ Level 3 PCA distribution — drone data in 788-dimensional specialized lens

▲ Level 3 density topography — complex clusters and multimodal distribution

Complex Structure Revealed at L3 — "Shape Starts to Matter"

With the specialized lens applied, drone morphological characteristics begin to influence the distribution beyond simple background classification. DataClinic confirmed that Cluster 1 is a complex type mixing buildings and natural landscapes, containing multiple peaks (sub-clusters). This means the 12 drone types, captured at various flight attitudes and environments, have formed fine-grained sub-groups.

L2 (General Lens)

· 1,280 dims → Mean density 0.3
· Background-based clusters
· "Natural vs Urban" binary structure
· Outliers 6.8%

L3 (Specialized Lens)

· 788 dims → Mean density 0.41 (+37%)
· Shape + environment hybrid clusters
· Complex multimodal distribution (diversity confirmed)
· Outliers 6% (slightly decreased)

💡 L3 Insight — What the Density Increase Tells Us: The rise in mean density from 0.3 to 0.41 at L3 means the specialized lens captures the data's core structure better. This is evidence that shape-specialized features are more important than general features for drone training data. When developing actual Counter-UAS AI models, choosing a drone-specialized backbone network is likely more advantageous than a general classification network.
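The +37% figure follows directly from the two reported mean densities. The short check below takes the numbers at face value; the article does not say how DataClinic normalizes density across the 1,280- and 788-dimensional spaces, so direct comparability is an assumption here.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


l2_mean_density = 0.30  # reported for the general lens (L2)
l3_mean_density = 0.41  # reported for the specialized lens (L3)
density_gain = relative_change(l2_mean_density, l3_mean_density)

# Outlier ratios are already percentages, so compare them in percentage points.
outlier_shift_points = 6.0 - 6.8

print(f"mean density change: {density_gain:+.0%}")
print(f"outlier ratio shift: {outlier_shift_points:+.1f} points")
```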

Outlier Analysis — Scenes AI Is Most and Least Confident About

Outlier analysis in a single-class dataset carries special significance. The visual differences between "typical drones" (high density) and "atypical drones" (low density) serve as a map of AI recognition performance strengths and weaknesses.
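The article does not define how DataClinic computes density. As a hypothetical stand-in, the sketch below scores each sample by the inverse of its mean distance to its k nearest neighbours in feature space and flags the lowest-scoring fraction as outliers. Both the scoring formula and the toy 2-D points are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def knn_density(points: Sequence[Sequence[float]], k: int = 2) -> List[float]:
    """Score in (0, 1]: 1 / (1 + mean distance to the k nearest neighbours)."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        scores.append(1.0 / (1.0 + sum(nearest) / len(nearest)))
    return scores


def flag_outliers(scores: Sequence[float], ratio: float) -> List[int]:
    """Indices of the lowest-density samples, roughly `ratio` of the dataset."""
    n = max(1, round(len(scores) * ratio))
    return sorted(range(len(scores)), key=scores.__getitem__)[:n]


# Four tightly grouped samples and one isolated sample.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
toy_scores = knn_density(toy_points, k=2)
print(flag_outliers(toy_scores, ratio=0.2))
```

This brute-force version compares every pair of points, so it only suits small samples; at 28,800 images a spatial index or approximate nearest-neighbour search would be the usual choice.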

🟢 High Density — "Typical Drone" Scenes AI Is Most Confident About (L3)

The DR08 model dominates the high-density top ranks. DR05, DR04, and DR03 also rank high, and these drone models and flight attitudes define the dataset's "standard."

DR08 (density 0.858) 🔥

DR05 (density 0.857)

DR04 (density 0.850)

DR03 (density 0.849)

DR08 (density 0.849)

DR11 (density 0.848)

💡 Insight — Why DR08 Is the "Most Drone-Like": DR08 (VTOL type) recording the highest density at both L2 and L3 is because this drone's morphological characteristics (rotary wings + fixed-wing combination) are closest to the "universal drone characteristics" among the 12 types. Conversely, DR12 (large strategic drone) frequently appears in low-density outliers due to its unusual size and flight attitudes. The more unique the shape, the harder it is for AI.

🔴 Low Density — Outlier Drone Scenes That Confuse AI (L3)

DR02 and DR09 dominate the low-density top ranks. Unusual flight angles, extreme lighting, and special background combinations are the causes of these outliers.

DR02 (density 0.151) 🔴

DR09 (density 0.158)

DR02 (density 0.159)

DR06 (density 0.165)

DR11 (density 0.167)

DR12 (density 0.170)

🔄 The Two Most Different Scenes — Extremes of the Dataset

The pivot (reference point) is DR08_0910 (highest density 0.858). The scenes farthest from this image are DR12 series images. This shows what the most "un-drone-like" drone scenes look like within a single class.

DR08 — Reference Image (Pivot)

Density 0.858 (Highest)

DR12 — Farthest Image

Density 0.245 (Extreme)

⬆️ These two scenes are the farthest apart in L3 feature space. This is the range where drone recognition AI may struggle the most.
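Finding the scenes nearest to and farthest from a pivot amounts to sorting by distance in feature space. The sketch below assumes Euclidean distance and uses invented 2-D vectors in place of the 788-dimensional L3 features; `DR12_far` is a placeholder name, since the article does not give the frame number of the farthest image.

```python
import math
from typing import Dict, List, Sequence


def rank_by_distance(features: Dict[str, Sequence[float]], pivot: str) -> List[str]:
    """Names of all other samples, ordered from nearest to farthest from the pivot."""
    origin = features[pivot]
    others = [name for name in features if name != pivot]
    return sorted(others, key=lambda name: math.dist(features[name], origin))


# Hypothetical feature vectors, for illustration only.
toy_features = {
    "DR08_0910": (0.0, 0.0),
    "DR04_0910": (0.1, 0.0),
    "DR03_0910": (0.0, 0.3),
    "DR12_far": (5.0, 4.0),
}
ranking = rank_by_distance(toy_features, pivot="DR08_0910")
print("nearest:", ranking[0])
print("farthest:", ranking[-1])
```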

🔗 Most Similar Scenes — Nearest Cluster Around DR08

The images closest to DR08_0910 all come from the "0910 range" frames of DR04, DR03, DR05, and DR01. This means different drone models adopt similar poses at the same simulation time step (frame 0910).

DR04_0910 (density 0.850)

DR03_0910 (density 0.838)

DR05_0910 (density 0.843)

DR01_0910 (density 0.818)

💡 Insight — "Dominance of Frame 0910 Range": The most similar scenes are all concentrated around frame 0910. This suggests the simulation was designed so drones appear most "typical" at a specific flight phase (angle and distance). In contrast, low-density outliers concentrate in special frame ranges such as 0048~0296, 0890~0896, 1495~1497, 2250, and 2381. These ranges are estimated to be simulation segments featuring unusual flight altitudes, angles, and backgrounds.
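Frame ranges like those above can be derived by collapsing the frame numbers of low-density images into contiguous runs. The sketch below shows the idea, assuming the frame numbers have already been parsed from the filenames; the individual frame numbers in the example are invented and merely fall inside ranges the article mentions.

```python
from typing import Iterable, List, Tuple


def group_frames(frames: Iterable[int], max_gap: int = 1) -> List[Tuple[int, int]]:
    """Collapse frame numbers into (start, end) ranges; gaps up to max_gap are bridged."""
    ranges: List[List[int]] = []
    for frame in sorted(set(frames)):
        if ranges and frame - ranges[-1][1] <= max_gap:
            ranges[-1][1] = frame  # extend the current range
        else:
            ranges.append([frame, frame])  # start a new range
    return [(start, end) for start, end in ranges]


# Hypothetical outlier frames, for illustration only.
outlier_frames = [890, 891, 892, 896, 1495, 1496, 1497, 2250]
print(group_frames(outlier_frames, max_gap=4))
```

A larger `max_gap` merges nearby runs into wider segments, which is useful when only some frames in a flight phase fall below the density threshold.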

Recommendations — From 87 to Even Higher, Why DataBulkup

Why DataClinic recommends Data Bulk-up for PBLS_Drone: The current data has no significant duplication or bias and doesn't need a diet. Rather, samples at cluster boundaries and in low-density regions are insufficient, and adding synthetic data to fill these gaps will contribute to performance improvement.

💪

Data Bulk-up

DataClinic's primary recommendation. While 28,801 images appear sufficient for training, samples in low-density zones (special flight attitudes, extreme lighting, cross-environment) are relatively scarce.

Adding samples especially at cluster boundary zones (natural-urban transitions, altitude changes) can significantly improve the AI model's edge case recognition performance.

🌐

Expanded Environment Diversity

Current data centers on natural and urban environments. Real-world drone recognition AI must also work in extreme conditions such as nighttime, fog, backlighting, rain/snow, desert, and maritime.

Domain Randomization techniques are recommended to generate augmented images with randomly varied lighting direction, fog density, and background textures.
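Domain randomization can be sketched as drawing each scene's rendering parameters from broad random ranges. The example below is a minimal illustration: the parameter names, value ranges, and background list are assumptions, and a real pipeline would pass these values to its own renderer.

```python
import random
from dataclasses import dataclass

# Illustrative background set, loosely based on the environments named above.
BACKGROUND_TEXTURES = ("natural", "urban", "desert", "maritime")


@dataclass(frozen=True)
class SceneParams:
    light_azimuth_deg: float    # direction of the light source around the scene
    light_elevation_deg: float  # low values approximate backlit or dusk scenes
    fog_density: float          # 0.0 = clear, 1.0 = heaviest fog
    background_texture: str


def sample_scene(rng: random.Random) -> SceneParams:
    """Draw one randomized scene configuration."""
    return SceneParams(
        light_azimuth_deg=rng.uniform(0.0, 360.0),
        light_elevation_deg=rng.uniform(-10.0, 90.0),
        fog_density=rng.uniform(0.0, 1.0),
        background_texture=rng.choice(BACKGROUND_TEXTURES),
    )


rng = random.Random(226)  # fixed seed so a render batch can be reproduced
batch = [sample_scene(rng) for _ in range(5)]
for params in batch:
    print(params)
```

Seeding the generator keeps each batch reproducible, so a scene that later turns out to be a low-density outlier can be regenerated and inspected.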

Contrasting Recommendations for Two Defense Datasets

PBLS_Military

68 / 3,171 images

🥗

DataDiet

Remove duplicate images
Fix environment bias

PBLS_Drone

87 / 28,801 images

💪

DataBulkup

Fill low-density zones
Add extreme environments

Quality data needs more data; biased data needs a diet.

⚠️ Low-Density Frame Ranges — Simulation Design Review Recommended

Low-density outliers concentrate in specific frame ranges, most notably 0048~0296 and 0890~0896. These ranges are estimated to be scenes where drones are in special flight attitudes (sharp turns, high-speed climbs, close-range capture, etc.). If these outliers represent meaningful cases in real battlefield scenarios, they should be reinforced with additional samples rather than removed. If they are mere rendering artifacts, quality review and replacement are recommended.

Conclusion — The Possibilities Opened by an 87-Score Drone Dataset

PBLS_Drone is one of the largest publicly available synthetic datasets in the defense AI drone recognition field. With 28,801 images, 52GB, and Full HD resolution, the numbers alone are impressive, and a DataClinic score of 87 indicates this data is not just large but also of high quality.

Unlike PBLS_Military which scored 68 and needed a "data diet," PBLS_Drone already has good data diversity and balance. It is at the stage where richer data can push AI model performance even higher.

Now that drones have become a core asset on the modern battlefield, AI that recognizes drones is the first line of defense. PBLS_Drone provides the foundation for training that AI. With data bulk-up to fill low-density zones and extreme environment data expansion, we will be one step closer to developing drone recognition models ready for real-world deployment.

PBLS_Drone Key Summary Card

87

DataClinic Overall

28,801

Synthetic Drone Images

52GB

Dataset Size

FHD

1920×1080px

12

Drone Model Types

3

Environment Clusters

6%

Outlier Ratio (L3)

Original DataClinic Report: dataclinic.ai/en/report/226 · Not for commercial use