2026.03 · Pebblous Data Communication Team

Reading time: ~8 min

Executive Summary

This analysis is a comprehensive census of scale and class imbalance statistics across 134 image classification datasets diagnosed by Pebblous DataClinic. Per-class image counts were collected directly via the public API.

The 134 datasets contain a total of 12.05 million images. The median dataset size is 11,505 images, so half of the datasets hold roughly 10K images or fewer, yet the top 4 datasets each exceed 1 million images, which makes the distribution extremely skewed.

Class imbalance is even more dramatic. While 25% (33 datasets) are perfectly balanced, 15 datasets have a max-to-min class ratio exceeding 100x. The most extreme case reaches 73,384x, where a class with just 3 images coexists with one containing 220,000 images in the same dataset.

Overview

Datasets Analyzed: 134
Total Images: 12.05M
Median (images): 11,505
Avg. Classes: 50.8

Of the 158 public reports on DataClinic, 134 are image classification tasks with per-class image count data. The remaining 24 are unlabeled datasets for unsupervised learning or formats where class distinctions do not apply.

The 134 datasets have a combined total of 6,805 classes containing 12,054,130 images. The average dataset size is 89,956 images, but this is heavily skewed by 4 datasets exceeding 1 million images each. The fact that the median (11,505) is only one-eighth of the mean (89,956) starkly illustrates the extreme asymmetry of the distribution. Full reports are available at dataclinic.ai.
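The gap between mean and median can be checked directly from the totals quoted in this section. The following is a minimal arithmetic sketch in Python; the variable names are illustrative and only figures stated above are used.

```python
# Figures quoted in this section.
total_images = 12_054_130
num_datasets = 134
median_size = 11_505

# Mean dataset size and how many times larger it is than the median.
mean_size = total_images / num_datasets
mean_to_median = mean_size / median_size

print(f"mean = {mean_size:,.0f} images")          # mean = 89,956 images
print(f"mean / median = {mean_to_median:.1f}x")   # mean / median = 7.8x
```

A mean nearly eight times the median is the signature of a long right tail: a few very large datasets pull the mean up while leaving the median untouched.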

Dataset Size Distribution

The distribution of 134 datasets by total image count, divided into 5 ranges:

<1K (Small): 9 (6.7%)
1K–10K: 55 (41.0%)
10K–100K: 54 (40.3%)
100K–1M: 12 (9.0%)
>1M (Large): 4 (3.0%)

81% of all datasets fall within the 1K–100K range. Notably, 64 datasets (48%) have fewer than 10K images, which matches the scale most commonly encountered in real-world AI projects.

In contrast, only 4 datasets exceed 1 million images — Places365 (1.8M), OpenImages (1.74M), ImageNet (1.28M), Thermal Camera (1.22M) — yet these 4 account for 50.4% of all images. The total data volume is extremely concentrated in a handful of large datasets.
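The concentration described here is a "top-k share": the fraction of all images held by the k largest datasets. The sketch below shows one way to compute it. The function name `top_k_share` and the toy sizes are hypothetical and are not DataClinic figures.

```python
def top_k_share(sizes, k):
    """Fraction of all images held by the k largest datasets."""
    if k < 0:
        raise ValueError("k must be non-negative")
    total = sum(sizes)
    if total == 0:
        raise ValueError("total image count must be positive")
    largest = sorted(sizes, reverse=True)[:k]
    return sum(largest) / total

# Hypothetical dataset sizes, for illustration only.
toy_sizes = [500, 300, 100, 50, 30, 20]
print(f"{top_k_share(toy_sizes, 2):.1%}")  # 80.0%
```

Applied to the 134 per-dataset totals with k = 4, the same calculation would produce the share reported above.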

The smallest dataset is Marble Surface Anomaly Detection (55 images), while the largest is Places365 (1,803,460 images). The gap between the largest and smallest is 32,790x.
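The five size ranges can be reproduced with a small binning helper. This is a sketch under one stated assumption: each range includes its lower bound and excludes its upper bound, which the article does not specify. The names `SIZE_BINS`, `size_bin`, and `size_histogram` are illustrative.

```python
from collections import Counter

# (exclusive upper bound, label) for each size range.
# Assumption: a dataset with exactly 10,000 images counts as 10K-100K.
SIZE_BINS = [
    (1_000, "<1K"),
    (10_000, "1K-10K"),
    (100_000, "10K-100K"),
    (1_000_000, "100K-1M"),
]

def size_bin(total_images):
    """Return the size-range label for a dataset's total image count."""
    for upper, label in SIZE_BINS:
        if total_images < upper:
            return label
    return ">1M"

def size_histogram(sizes):
    """Count how many datasets fall into each size range."""
    return Counter(size_bin(n) for n in sizes)

# Smallest and largest datasets named in this section.
print(size_bin(55), size_bin(1_803_460))   # <1K >1M
print(f"{1_803_460 / 55:,.0f}x")           # 32,790x
```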

🔍 Dataset Diversity Through Class Mean Images

Pixel-wise average of all images in each class. Sharper images indicate higher visual consistency within the class.

[Figure: class mean images. Birds 525: PEACOCK, EMU. Birds 450: CROWNED CRANE. WikiArt: Impressionism, Cubism. Korean Food: Gimbap (김밥).]
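A class mean image is the pixel-wise average of every image in a class. The sketch below illustrates the idea in plain Python. It assumes equally sized grayscale images stored as lists of rows, and it is not DataClinic's implementation. In practice an array library would do the same job, for example `numpy.stack(images).mean(axis=0)`.

```python
def class_mean_image(images):
    """Pixel-wise mean of equally sized grayscale images (lists of rows)."""
    if not images:
        raise ValueError("need at least one image")
    n = len(images)
    # zip(*images) groups the same row from every image;
    # zip(*rows) then groups the same pixel position across those rows.
    return [[sum(pixels) / n for pixels in zip(*rows)] for rows in zip(*images)]

# Two tiny 2x2 "images" with hypothetical pixel values.
a = [[0, 100], [200, 50]]
b = [[100, 100], [0, 150]]
mean_img = class_mean_image([a, b])
print(mean_img)  # [[50.0, 100.0], [100.0, 100.0]]
```

When the images in a class share pose, framing, and background, the averages keep visible structure. When they vary widely, the averages wash out toward a uniform blur.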

Class Count Distribution

Min Classes: 2
Median: 10
Max (ImageNet): 1,000

The median number of classes is 10, with an average of 50.8. The spectrum ranges widely from binary classification (2 classes) to ImageNet's 1,000 classes.

The top 5 datasets by number of classes:

  1. ImageNet — 1,000 classes (1,281,167 images)
  2. OpenImages — 599 classes (1,743,042 images)
  3. Birds 525 — 525 classes (89,885 images)
  4. Birds 450 — 450 classes (67,792 images)
  5. MPII (Human Pose) — 398 classes (40,522 images)

More classes tend to mean fewer images per class, which is directly linked to imbalance issues. ImageNet (avg. 1,281 images per class) is relatively well-designed in this regard, while Birds 525 (avg. 171 images per class) has limited data relative to its number of classes.
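The per-class averages follow from dividing each dataset's image count by its class count. Here is a short sketch using the five datasets listed above; the helper name `avg_images_per_class` is illustrative.

```python
def avg_images_per_class(num_classes, num_images):
    """Average number of images per class for one dataset."""
    if num_classes <= 0:
        raise ValueError("num_classes must be positive")
    return num_images / num_classes

# (class count, image count) as listed in the top-5 ranking above.
top5 = {
    "ImageNet": (1_000, 1_281_167),
    "OpenImages": (599, 1_743_042),
    "Birds 525": (525, 89_885),
    "Birds 450": (450, 67_792),
    "MPII (Human Pose)": (398, 40_522),
}

for name, (num_classes, num_images) in top5.items():
    avg = avg_images_per_class(num_classes, num_images)
    print(f"{name}: {avg:,.0f} images per class")
```

An average only describes the typical class. OpenImages has the highest average in this list, yet its smallest class holds just 3 images, so the average should be read alongside the max-to-min ratio.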

Class Imbalance Status

This section analyzes the class imbalance ratio, defined as the image count of the largest class divided by the image count of the smallest class. It covers the 134 multi-class datasets; single-class datasets are excluded.

Perfect Balance (1.0x): 33 (24.6%)
Mild Imbalance (1–2x): 30 (22.4%)
Moderate Imbalance (2–10x): 38 (28.4%)
Severe Imbalance (10–100x): 18 (13.4%)
Extreme Imbalance (>100x): 15 (11.2%)

Of the 134 datasets, 33 (24.6%) are classed as perfectly balanced, with a max-to-min ratio of 1.0x. These were intentionally balanced by researchers. Examples include EPL Logo Detection (exactly 1,000 images per class) and Sign Language Digits (204–208 images per class, a ratio that rounds to 1.0x).

On the other hand, 15 datasets (11.2%) have imbalance ratios exceeding 100x. The most extreme case, OpenImages, has a class with just 3 images coexisting with one containing 220,154 images. Models trained under such extreme imbalance tend to effectively ignore minority classes unless the imbalance is explicitly addressed.

The median imbalance ratio is 2.1x, meaning a typical public dataset has about twice as many images in its largest class compared to its smallest. While this may seem acceptable at first glance, the extreme gap between the mean (748x) and median (2.1x) reveals the severe skew in the distribution.
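The imbalance ratio and its five categories can be sketched as follows. Two details are assumptions rather than stated rules: a ratio counts as "Perfect Balance" when it rounds to 1.0 at one decimal place, and each upper boundary (2x, 10x, 100x) belongs to the lower category. The function names are illustrative.

```python
def imbalance_ratio(class_counts):
    """Largest class size divided by smallest class size."""
    counts = list(class_counts)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    if min(counts) <= 0:
        raise ValueError("every class needs at least one image")
    return max(counts) / min(counts)

def imbalance_category(ratio):
    """Map a ratio to a category (boundary handling is an assumption)."""
    if round(ratio, 1) == 1.0:
        return "Perfect Balance"
    if ratio <= 2:
        return "Mild Imbalance"
    if ratio <= 10:
        return "Moderate Imbalance"
    if ratio <= 100:
        return "Severe Imbalance"
    return "Extreme Imbalance"

# Smallest and largest class sizes quoted in this section.
open_images = imbalance_ratio([3, 220_154])
sign_digits = imbalance_ratio([204, 208])
print(f"{open_images:,.0f}x", imbalance_category(open_images))
print(f"{sign_digits:.2f}x", imbalance_category(sign_digits))
```

Because a handful of ratios run into the tens of thousands, the mean of these ratios is dominated by the extremes, which is why the median is the more representative summary.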

📖 In-Depth Imbalanced Dataset Stories

WikiArt 81,471 Images Diagnosis — 27 styles, imbalance 133x, quality score 53

Korean Food 150 Diagnosis — 150 classes, imbalance 121x, quality score 71

Birds 525 Diagnosis — 525 species, 89,880 images, quality score 77

Birds 450 Diagnosis — 450 species, 75,100 images, quality score 65

Key Rankings

📦 Size Top 10 (Total Images)

  1. Places365: 1,803,460
  2. OpenImages: 1,743,042
  3. ImageNet: 1,281,167
  5. Kfashion: 967,806
  7. CelebA: 202,599
  8. SVHN: 99,289
  9. Birds 525: 89,885
  10. EuroSAT: 81,500

⚠️ Imbalance Top 10 (max/min ratio)

🗂 Class Count Top 10

✅ Most Balanced Top 5

Perfectly balanced (1.0x) datasets account for 24.6% (33) of the total.

3 Key Insights

1. Mid-sized datasets dominate, but actual data volume is concentrated in a few giants

While 81% of datasets fall in the 1K–100K range, just 4 datasets with over 1 million images account for half of all images. Counting datasets and counting images therefore give very different pictures of the collection. The well-known datasets that the research community uses as benchmarks tend to be the larger ones.

2. Imbalance is unavoidable, but many datasets achieve balance through intentional design

25% of datasets achieve perfect balance. Their common trait is controlling the number of images per class from the collection stage to match research objectives. In contrast, "naturally" collected datasets (OpenImages, CelebA, UTKFace, etc.) directly reflect real-world imbalance, showing ratios exceeding 100x.

3. More classes mean fewer images per class, increasing quality risk

The top 10 datasets by class count average 3,164 images per class, but excluding ImageNet, this drops to 875. Birds 525 has 525 classes with 89,885 images — only 171 per class. The more fine-grained the classification task, the harder it is to collect data and maintain balance, and the higher the risk of model performance degradation. This is why DataClinic's multi-level diagnosis (L1/L2/L3) is especially critical for such datasets.

How does your dataset measure up?

DataClinic provides free diagnosis of image classification datasets from L1 (basic quality) to L3 (domain-specific analysis).
