AI Learns Without Live Fire — DataClinic Diagnosis of 10 Korean Military Synthetic Datasets

Why Synthetic Data? — Building the Battlefield with Data

Imagine photographing an actual K-2 Black Panther tank in deserts, snowfields, and urban ruins under different lighting and weather conditions. Deploying over a dozen tanks, each worth millions of dollars, to stage hundreds of scenes is practically impossible. That is why defense AI researchers turn to synthetic data.

Synthetic data here means images rendered with computer graphics, so training scenes can be combined almost without limit and without any actual photography. PBLS_Military is a dataset that puts this concept into practice.

🎮

Infinite Environment Combos

Combat scenarios impossible to film in real life, built with CG

⚖️

Perfect Class Balance

Exactly 216 images for all 10 classes — bias-free training data

🔒

No Security Concerns

No need to photograph real military facilities, zero classified info exposure

The Battlefield Blueprint Hidden in Synthetic Data Filenames

PBLS_Military filenames are not simple numbers. They encode the exact battlefield conditions under which each image was rendered.

sn3_en4_bg9_mt01.png

sn: Scenario (1~4), camera angle & situation
en: Environment (1~6), lighting & weather
bg: Background (1~9), terrain & backdrop type
mt: Model Type (01~10), weapon system type

📐 Theoretical maximum combinations: 4 × 6 × 9 × 10 = 2,160 scenes. Each class accounts for 4 × 6 × 9 = 216 of these, one per scenario, environment, and background combination.
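The naming scheme and the combination count can be checked with a short sketch. The regular expression below is an assumption inferred from the single example filename shown above, and `parse_filename` is a hypothetical helper, not part of any published PBLS_Military tooling.

```python
import re
from itertools import product

# Assumed pattern, inferred from the one example "sn3_en4_bg9_mt01.png".
PATTERN = re.compile(r"^sn(\d+)_en(\d+)_bg(\d+)_mt(\d+)\.png$")


def parse_filename(name: str) -> dict:
    """Split a PBLS_Military-style filename into its four condition codes."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name!r}")
    sn, en, bg, mt = (int(group) for group in match.groups())
    return {"scenario": sn, "environment": en, "background": bg, "model_type": mt}


# Enumerate the full grid described in the text:
# 4 scenarios x 6 environments x 9 backgrounds x 10 model types.
grid = list(product(range(1, 5), range(1, 7), range(1, 10), range(1, 11)))
per_class = len(grid) // 10

print(parse_filename("sn3_en4_bg9_mt01.png"))
print(len(grid), per_class)
```

Parsing the codes once makes it easy to group or filter images by condition later, for example to count how many samples share a given environment and background.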

Dataset Introduction — PBLS_Military

PBLS_Military is a synthetic military image dataset built by Pebblous, a Korean defense industry company. It includes 10 classes of military equipment — Korea's flagship defense exports and enemy vehicles essential for training. Set against winter and autumn natural environments, it comprises 3,171 HD widescreen images (up to 1,344×768px).

PBLS_Military — Collage of 10 Military Equipment Representative Images (DataClinic L1 Analysis)

▲ PBLS_Military Representative Image — K-9 Thunder Self-Propelled Howitzer (High-density typical sample, density 0.248)

⚠️ Not Available for Commercial Use

The PBLS_Military dataset contains military equipment images and is not licensed for commercial use. It may only be used for non-commercial purposes such as research, education, and defense AI development.

10 Weapon Systems Gallery — The Battlefield Players

The true value of this dataset lies in the weapon systems it contains. From cutting-edge Korean defense exports sold worldwide to virtual enemy vehicles essential for AI target recognition training, it captures the key players of the modern battlefield.

🇰🇷 Pride of Korean Defense Exports

K-2 Black Panther

Korean MBT

Korea's 3rd-generation main battle tank, operational since 2014. Equipped with a 1,500hp powerpack and Active Protection System (APS), reaching speeds up to 55km/h. In 2022, a 980-unit export contract (~$4.6B) was signed with Poland, making it an icon of K-Defense exports.

🌍 Exports: Poland · Norway (under review)

K-9 Thunder (SPH)

World Export #1

A 155mm/52-caliber self-propelled howitzer holding #1 global market share (over 50%) in SPH exports. Maximum range of 54km (with base-bleed rounds). Appears as the most typical (highest density) military target in this dataset.

🌍 Exports: Poland, India, Norway, Finland, Estonia, Australia, Egypt, Turkey, etc.

K200 KIFV

Korean Infantry Fighting Vehicle

Korea's infantry fighting vehicle operational since 1985. With a 400hp engine, it transports up to 9 troops and has proven its reliability through over 37 years of active deployment. It has evolved into various derivatives.

🌍 Exports: Malaysia, Ecuador, etc.

K806 Wheeled Armored Vehicle

Latest 8×8 APC

The latest 8×8 wheeled armored vehicle deployed by the Korean military since 2022. It emphasizes a balance between mobility and protection, equipped with NBC (Nuclear, Biological, Chemical) defense capabilities. Surprisingly, it frequently appears as an outlier in AI data — due to its unique shape.

🇰🇷 Currently in service with the Korean Armed Forces

🎯 Enemy Vehicles for IFF Training

An effective defense AI must accurately identify enemy equipment just as well as friendly equipment. The BMP-3 and T-80U are major Russian ground combat vehicles, essential data for IFF (Identification Friend or Foe) AI training.

BMP-3 Infantry Fighting Vehicle

Russian IFV

Russia's 3rd-generation infantry fighting vehicle. It boasts formidable firepower with a 100mm main gun + 30mm autocannon + 7.62mm coaxial machine gun. Capable of amphibious operations, it was deployed extensively in the 2022 Ukraine conflict. It is the key "enemy" combat vehicle in IFF AI datasets.

T-80U Main Battle Tank

Russian MBT

A gas-turbine-powered tank developed by the Soviet Union in the 1980s. Its 1,000hp engine reaches top speeds of 70km/h. As the K-2 Black Panther's most important hypothetical adversary, training AI to accurately distinguish between the K-2 and T-80U is a core challenge for IFF AI systems.

🚁 Air Power & Support Vehicles

Cobra Armored Vehicle (cobra2)

Armored Vehicle

A Cobra series wheeled armored vehicle. A light-armored platform deployed for reconnaissance and patrol missions, combining mobility with survivability.

Military Jeep

Tactical Vehicle

A core tactical vehicle for frontline command, reconnaissance, and liaison missions. Light and highly maneuverable, it operates across all terrains. In AI data, it forms a "light support vehicle" cluster together with trucks.

Military Truck

Supply Vehicle

The lifeline of supply and troop transport. The ability to identify trucks on the battlefield is essential for disrupting enemy supply lines and protecting friendly logistics routes. AI identifies trucks by their distinctive cargo bed shape.

Level 1 — Basic Quality Diagnosis

Mean Images per Class — Each Weapon's "Archetype" Through AI's Eyes

A mean image is the pixel-level average of all images in a given class. It is normal for them to appear blurry — they show the common contours of overlapping images. The sharper the mean image, the more visually similar the images in that class are.
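A minimal sketch of the idea, using NumPy and small toy images instead of the real dataset. The `edge_strength` score is an illustrative proxy for sharpness chosen for this example; it is not a metric the DataClinic report is known to use.

```python
import numpy as np


def mean_image(images: np.ndarray) -> np.ndarray:
    """Pixel-wise average of a stack shaped (N, H, W, C), returned as uint8."""
    if images.ndim != 4:
        raise ValueError("expected an array shaped (N, H, W, C)")
    return images.astype(np.float64).mean(axis=0).round().astype(np.uint8)


def edge_strength(img: np.ndarray) -> float:
    """Mean absolute difference between neighboring pixels (crude sharpness proxy)."""
    gray = img.astype(np.float64).mean(axis=-1)
    vertical = np.abs(np.diff(gray, axis=0)).mean()
    horizontal = np.abs(np.diff(gray, axis=1)).mean()
    return float(vertical + horizontal)


def toy_class(offsets, size=16):
    """Toy 'class': one white rectangle per frame, shifted right by each offset."""
    frames = np.zeros((len(offsets), size, size, 3), dtype=np.uint8)
    for index, dx in enumerate(offsets):
        frames[index, 4:10, 2 + dx:8 + dx, :] = 255
    return frames


aligned = toy_class([0, 0, 0, 0])   # visually identical frames
shifted = toy_class([0, 2, 4, 6])   # same object, varying position

print("aligned:", edge_strength(mean_image(aligned)))
print("shifted:", edge_strength(mean_image(shifted)))
```

The aligned class keeps crisp edges in its mean image, while the shifted class smears them out, which is the same effect that makes a sharp mean image a sign of low visual variety within a class.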

[Figure: per-class cards for K-9 Howitzer, K-2 Black Panther, T-80U Tank, K200 APC, BMP-3 IFV, M113 APC, Cobra2 APC, K806 APC, Military Jeep, and Military Truck. In each card, the left image is a representative sample (actual image) and the right image is the mean image (pixel average).]

✅ Strengths

📐 Perfect Class Balance: Std. deviation 0.0 — exactly 216 images for all 10 classes
🎨 RGB Channel Consistency: All images in RGB format, no grayscale or RGBA contamination
❌ Zero Missing Values: No corrupted files, no empty images
🖼️ HD Resolution: 1,338~1,344 × 768px widescreen rendering
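The class-balance figure can be reproduced from a list of labels. This is a sketch under assumptions: the label strings `mt01` to `mt10` are placeholders borrowed from the filename convention, and the use of the population standard deviation is a guess at how the reported 0.0 was computed.

```python
from collections import Counter
from statistics import pstdev


def class_count_std(labels) -> float:
    """Population standard deviation of the number of images per class."""
    counts = Counter(labels)
    return pstdev(list(counts.values()))


# Placeholder labels: 10 classes with 216 images each, as stated in the text.
classes = [f"mt{index:02d}" for index in range(1, 11)]
balanced = [name for name in classes for _ in range(216)]

# A deliberately skewed variant for comparison (hypothetical).
skewed = balanced + ["mt01"] * 100

print(class_count_std(balanced))
print(class_count_std(skewed))
```

A value of zero only says that every class has the same number of images; it says nothing about how varied the images inside each class are, which is why the cautions below still apply.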

⚠️ Cautions

📊 L1 Statistics: Bad — Lack of visual diversity
🔄 Similar Compositions: Most images share similar framing
📁 Synthetic Data Limitations: Texture and lighting realism gap vs. real-world images

💡 DataClinic Insight: Synthetic data excels over real-world data in terms of class balance and zero missing values. However, visual diversity (L1 Statistics: Bad) is a chronic weakness of synthetic data — because all images share a similar rendering style. When using this for actual AI training, we recommend supplementing with Domain Randomization techniques or mixing with real-world data.

Level 2 — DataLens Analysis (Wolfram ImageIdentify Net V2)

Level 2 uses Wolfram's ImageIdentify Net V2, trained on 3 million images, as a lens. Although this neural network was not specifically trained on military equipment, it analyzes the data through general visual patterns (shape, texture, color). Let's examine how PBLS_Military data is distributed in a 1,280-dimensional feature space.
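To make the projection step concrete, here is a minimal PCA sketch in NumPy. Random vectors stand in for the 1,280-dimensional features, since extracting real features requires the Wolfram network; `pca_project` is a hypothetical helper and not DataClinic's implementation.

```python
import numpy as np


def pca_project(features: np.ndarray, n_components: int = 2):
    """Project rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    coords = centered @ vt[:n_components].T
    return coords, explained[:n_components]


# Stand-in data: 300 fake images with 1,280 features each.
rng = np.random.default_rng(0)
features = rng.normal(size=(300, 1280))

coords, ratio = pca_project(features)
print(coords.shape)
print("explained variance ratio:", ratio)
```

With real embeddings, `coords` would be the 2-D points drawn in the PCA plot, and the explained variance ratio indicates how much of the original spread those two axes retain.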

▲ Level 2 PCA Distribution — Feature space distribution of 10 classes (Wolfram ImageIdentify Net V2)

▲ Level 2 Density Landscape — Cluster distribution of all data (single cluster)

💡 L2 Key Finding — To a general-purpose AI, everything looks "similar": Wolfram's general-purpose neural network perceives all military equipment as a single cluster. Whether it's a K-2 tank or a Cobra armored vehicle, the general AI sees them all as "military equipment on a yellow-green background." The low density and multimodal distribution arise because 10 distinct equipment types are forcibly grouped together under a general-purpose lens. This is precisely why a military-specialized domain lens (Level 3) is needed.

Density Plots per Class — Distribution Patterns of Each Equipment

[Figure: L2 per-class cards for K-2 Black Panther, K200 APC, K806 APC, BMP-3 IFV, Cobra2 APC, and Military Jeep. In each card, the left image is the density plot (L2) and the right image is a representative sample (actual image).]

🔬 K806 Through Pebbloscope — Clusters Created by Seasons

While DataClinic's density charts show what is different, Pebbloscope shows why. Below is a snapshot visualizing the L2 embedding of the K806 wheeled APC. Even though it's the same vehicle, distinct clusters form based on background season.

Pebbloscope — K806 wheeled APC L2 embedding. Clusters separate by background season (summer/winter/fall)

The general-purpose AI (L2) is more sensitive to background color than vehicle shape. Summer images with green meadows, fall images with bare trees, and winter images with snow each form separate clusters. This is a limitation of the general lens — when the background changes, the same equipment is recognized as a "different object."

Level 3 — Military-Specialized DataLens (79 Dimensions)

Level 3 applies domain-specific optimization. The feature space compressed to 79 dimensions is tuned to maximize discriminability between military equipment. Unlike the general-purpose lens, 3 clusters emerge — naturally grouped by the shape, size, and functional characteristics of the equipment.

▲ Level 3 PCA Distribution — Class separation in domain-optimized 79 dimensions

▲ Level 3 Density Landscape — 3 clusters identified (L2 single cluster → L3 tri-split)

3 Groups Discovered by Military AI

1. Heavy Armored Combat Vehicles: Tracked, heavily armored ground combat platforms such as the K-2 Black Panther, T-80U, BMP-3, and K200. Low, wide hulls are the common visual feature.

2. SPH & Heavy Artillery: The K-9 Thunder, K806, and others with long gun barrels or distinctive hull shapes. Turret proportions and shape serve as classification criteria.

3. Light Support & Air Power: Jeep, Truck, Cobra Armored Vehicle, and other relatively light vehicles with vertical profiles. Some cluster boundary confusion occurs at L3.

L3 Density Plots per Class

[Figure: L3 per-class cards for K-2 Black Panther, K-9 Howitzer, K200 APC, K806 APC, T-80U Tank, and Military Truck. In each card, the left image is the density plot (L3) and the right image is a representative sample (actual image).]

🔬 Pebbloscope L3 — Military Lens Reduces Background Bias

In L2, K806 formed distinct clusters based on seasonal backgrounds. What happens under the military-specialized lens (L3)? Compare with the L3 snapshot below.

Pebbloscope — 10-class ground weapon L3 embedding. Seasonal/background clustering is weaker compared to L2

L2 → L3 change: The clusters that were separated by background season under the general lens (L2) become relatively weaker under the military-specialized lens (L3). The military lens focuses more on equipment morphology (tracks, turret presence, hull ratio) rather than background. This is the value of a domain-specialized lens — it captures essential differences that general-purpose AI misses.

Outlier Sample Analysis — The Most Striking Scenes for AI

Let's examine the most "typical" and most "unusual" images in the dataset. This analysis reveals which scenes AI models learn as "archetypes" and which scenes may cause confusion.
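The report does not state how its density values are computed. As an illustration only, the sketch below scores each embedding by the inverse of its mean distance to its k nearest neighbors, a common stand-in for local density; both the formula and the toy data are assumptions.

```python
import numpy as np


def knn_density(embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Inverse mean distance to the k nearest neighbors (higher = more typical)."""
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)          # ignore distance to self
    nearest = np.sort(dists, axis=1)[:, :k]
    return 1.0 / (nearest.mean(axis=1) + 1e-12)


# Toy embeddings: a tight cluster of 50 points plus one isolated point.
rng = np.random.default_rng(1)
core = rng.normal(0.0, 0.1, size=(50, 8))
stray = rng.normal(3.0, 0.1, size=(1, 8))
points = np.vstack([core, stray])

density = knn_density(points)
order = np.argsort(density)
print("lowest-density index:", int(order[0]))
print("highest-density index:", int(order[-1]))
```

Sorting by such a score gives the two lists that follow: the top of the ranking plays the role of the "archetype" scenes and the bottom plays the role of the outliers.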

🎯 High Density — The "Core" Scenes AI Is Most Confident About (L3)

The K-9 Thunder and K-2 Black Panther occupy the core of the high-density cluster. They are the "face" of the dataset.

K-9 (Density 1.285) 🔥

K-9 (Density 1.280)

K-9 (Density 1.227)

K-2 (Density 1.153)

K-2 (Density 1.147)

K-9 (Density 1.147)

💡 Insight — The dominance of bg5 (Background 5) and en3 (Environment 3): All high-density samples share en3 (environment condition 3) and bg5 (background 5). This is evidence that a specific lighting-background combination dominates the dataset's "standard." This is also linked to the duplicate image problem.

⚠️ Low Density — The Most Confusing Outlier Scenes for AI (L3)

T-80U, BMP-3, and K806 frequently appear as low-density outliers. These scenes carry a high risk of AI misidentification.

T-80U (Density 0.283) 🔴

BMP-3 (Density 0.306)

K806 (Density 0.309)

BMP-3 (Density 0.310)

K-9 (Density 0.312)

K-2 (Density 0.312)

🔄 The Two Most Different Scenes — Extremes of the Dataset

The Military Jeep and K200 KIFV appear as the most visually different pair at L3.

Military Jeep

Light · Vertical profile

K200 KIFV

Heavy armor · Horizontal profile

Recommendations — From 68 to a Higher Score

🥗

Data Diet

This is the core improvement recommended by DataClinic. The current data contains numerous near-duplicate images. In particular, images with the en3_bg5 combination are tightly packed in the density space.

Removing duplicate images and replacing them with more diverse environment combinations can significantly improve the AI model's generalization performance.
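One way to approach such a data diet is greedy filtering on embedding similarity. The sketch below is an assumption about how this could be done, not DataClinic's procedure; the cosine threshold of 0.98 is an arbitrary illustrative value.

```python
import numpy as np


def greedy_dedup(embeddings: np.ndarray, threshold: float = 0.98) -> list:
    """Keep an item only if its cosine similarity to every kept item is below threshold."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    kept = []
    for index, vector in enumerate(unit):
        if not kept or float(np.max(unit[kept] @ vector)) < threshold:
            kept.append(index)
    return kept


# Toy embeddings: items 1 and 3 are near-duplicates of items 0 and 2.
toy = np.array([
    [1.0, 0.0],
    [0.999, 0.01],
    [0.0, 1.0],
    [0.01, 1.0],
])

print(greedy_dedup(toy))
```

The result depends on input order, because the first image of each near-duplicate group is the one that survives. Sorting the input so that under-represented conditions come first is one way to bias the filter toward keeping them.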

🌍

Expand Environmental Diversity

The current data is dominated by "winter and autumn natural environments." For AI models to operate across diverse battlefields, desert, urban ruin, jungle, and nighttime environment data is also needed.

Domain Randomization: A technique that randomly varies background textures, lighting directions, and weather effects to enhance the AI model's real-world adaptability.
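In a rendering pipeline, domain randomization usually amounts to sampling scene parameters before each render. The sketch below shows only that sampling step; every parameter name, value list, and range is hypothetical and does not describe how PBLS_Military was generated.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneConfig:
    background: str
    weather: str
    sun_azimuth_deg: float
    sun_elevation_deg: float
    fog_density: float


# Hypothetical option lists, chosen to match the environments named in the text.
BACKGROUNDS = ["desert", "urban_ruin", "jungle", "snowfield", "autumn_forest"]
WEATHER = ["clear", "rain", "snow", "fog"]


def sample_scene(rng: random.Random) -> SceneConfig:
    """Draw one random scene configuration to hand to a renderer."""
    return SceneConfig(
        background=rng.choice(BACKGROUNDS),
        weather=rng.choice(WEATHER),
        sun_azimuth_deg=rng.uniform(0.0, 360.0),
        sun_elevation_deg=rng.uniform(-10.0, 90.0),  # below 0 approximates night
        fog_density=rng.uniform(0.0, 1.0),
    )


rng = random.Random(42)
scenes = [sample_scene(rng) for _ in range(1000)]
print(scenes[0])
```

Sampling continuous ranges instead of a fixed grid of en and bg codes is what spreads images across the feature space and keeps any single lighting and background combination from dominating.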

🔥 Overrepresentation of "Fire" Scenes in Some Clusters — L3 Analysis

The Level 3 analysis reveals that fire scenes (explosions and flames) appear somewhat frequently in certain clusters. If these scenes are concentrated in specific clusters, the AI risks incorrectly learning "fire = that equipment type." Adjusting the proportion of fire scenes or distributing them evenly across classes is recommended.

Field Scenario — What If AI Gets It Wrong?

A data quality diagnosis does not end with numbers. The real question is: what kind of errors can low-density outliers produce on the actual battlefield across 10 classes of military equipment?

⚠️ Scenario: T-80U Enemy Tank in Low-Visibility Conditions

T-80U Enemy Tank — Low density: heavy rain environment

🎯 Actual: T-80U Enemy Tank

sn2 · en4(low visibility) · bg5

Density 0.283 — Low-density outlier

→ AI Judgment

K-9 Self-Propelled Howitzer — High density: standard environment

❌ AI Misjudgment: K-9 Friendly Howitzer?

sn1 · en3(standard) · bg5

Density 1.285 — High-density core

❌ IFF (Identification Friend or Foe) Failure — Misidentifying an enemy tank (T-80U) as a friendly howitzer (K-9) could allow hostile armor to pass through friendly lines unchallenged.
Cause: In heavy rain and low-visibility (en4) conditions, T-80U falls into the low-density region — the AI has not learned this condition sufficiently, so it gravitates toward the high-density pattern of K-9, which shares a similar frontal silhouette.

📊 Why Does This Error Occur?

T-80U and K-9 both share a tracked chassis + turret structure. In low-visibility frontal views, only turret shape and hull ratio differentiate them, but T-80U's en4 (low-visibility) images are extremely scarce in the training data, leaving AI without sufficient evidence for judgment. Meanwhile, K-9's en3·bg5 combination sits at the high-density core, the pattern AI is most confident about. When a classifier sees an input from a sparsely covered condition, its prediction tends to drift toward the patterns it has seen most often, which is a common failure mode for models trained on unevenly covered data.

✅ Solution: Reinforce Low-Density Conditions

By generating additional T-80U images with en4 (low-visibility) and diverse background combinations, the low-density region fills in and IFF reliability improves. Simultaneously, applying a data diet to the high-density images concentrated in the en3·bg5 combination reduces duplicates, resulting in a more uniform overall density distribution — making the leap from 68 to the target of 80+ achievable.

✅ Data Rebalancing → Improved IFF Reliability
Reinforce low-density conditions (en4, diverse bg) + remove high-density duplicates = directly strengthen AI judgment in real-world environments
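Finding which conditions to reinforce can start with a simple coverage count per model type and environment. The sketch below assumes the filename convention described earlier; the file list is synthetic, and the model type and environment numbers are placeholders, not a statement about which code corresponds to the T-80U.

```python
import re
from collections import Counter

# Assumed pattern, based on the example filename "sn3_en4_bg9_mt01.png".
PATTERN = re.compile(r"sn(\d+)_en(\d+)_bg(\d+)_mt(\d+)\.png$")


def coverage(filenames) -> Counter:
    """Count images per (model_type, environment) pair."""
    counts = Counter()
    for name in filenames:
        match = PATTERN.search(name)
        if match:
            _, en, _, mt = (int(group) for group in match.groups())
            counts[(mt, en)] += 1
    return counts


def sparse_cells(counts, model_types, environments, min_count):
    """List (model_type, environment) pairs with fewer than min_count images."""
    return sorted(
        (mt, en)
        for mt in model_types
        for en in environments
        if counts[(mt, en)] < min_count
    )


# Synthetic file list: model type 2 has only one image in environment 4.
files = []
for mt in (1, 2):
    for en in range(1, 7):
        for bg in range(1, 10):
            for sn in range(1, 5):
                if mt == 2 and en == 4 and (bg, sn) != (1, 1):
                    continue
                files.append(f"sn{sn}_en{en}_bg{bg}_mt{mt:02d}.png")

counts = coverage(files)
print(sparse_cells(counts, [1, 2], range(1, 7), min_count=10))
```

The cells returned by `sparse_cells` are the candidates for additional rendering, while the most crowded cells are the candidates for the data diet.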

Conclusion — Possibilities and Limits of the Synthetic Battlefield

PBLS_Military is a highly meaningful starting point for defense AI research. Perfect class balance, HD resolution, and systematic environment combinations are advantages that only synthetic data can provide. The very fact that globally acclaimed Korean defense exports like the K-2 Black Panther and K-9 Thunder appear as AI training data reflects the elevated status of Korea's defense industry.

A DataClinic score of 68 is "a good start." By cleaning up duplicate images (Data Diet) and expanding environmental diversity, reaching the 80s is achievable. Going further, if this synthetic data is combined with real-world photographs (Hybrid Dataset), it will bring us one step closer to developing combat-ready defense AI models.

AI can learn the battlefield without live fire. Improving the quality of that learning is the next challenge for defense synthetic data.

PBLS_Military Key Summary Card

DataClinic Overall: 68
Weapon System Classes: 10
Synthetic Images: 3,171
Resolution: HD, 1,344×768px

Original DataClinic Report: dataclinic.ai/en/report/224 · Not for commercial use
