2026.03 · Pebblous Data Communication Team
Reading time: ~12 min
Executive Summary
This post presents key insights from Pebblous DataClinic's Quality Diagnosis Report #116 for the BIRDS 525 SPECIES dataset, along with a comparative analysis against the predecessor Birds 450.
BIRDS 525 is a large-scale bird classification dataset comprising 525 species and 89,880 images. The DataClinic comprehensive diagnosis yielded a quality score of 77 (Fair), 12 points higher than Birds 450 (quality score 65). Several factors contribute: ① preserved original resolutions (L1 improvement), ② broader feature space coverage from 75 additional species (L2/L3 improvement), and ③ a roughly doubled L3 analysis dimensionality (41→81 dimensions).
In the L1 (Basic Quality) analysis, Birds 450 had all images uniformly preprocessed to 224px, whereas Birds 525 preserves original resolutions from 45px to 4,763px. The general-purpose AI lens (L2, Feature Space Analysis) revealed two clusters, and the L2 density comparison shows that Birds 525 is distributed across a broader feature space than Birds 450. The domain-specific lens (L3, Domain-Specific Analysis) maintains a stable bell-shaped distribution in the 81-dimensional space.
The most "typical" bird was the PEACOCK, while the most atypical were EMU and BIRD OF PARADISE. Note that commercial use is not permitted, so caution is needed for purposes beyond research and education.
Birds 450 vs Birds 525 — What Changed
| Metric | Birds 450 (#11) | Birds 525 (#116) | Change |
|---|---|---|---|
| Overall Score | Quality score 65 (Fair) | Quality score 77 (Fair) | +12 pts ↑ |
| Classes | 450 species | 525 species | +75 species ↑ |
| Total Images | 75,100 | 89,880 | +14,780 ↑ |
| Avg per Class | 150.6 | 161.2 | +10.6 ↑ |
| Class Balance Std Dev | 15.7 | 20.6 | +4.9 (less balanced) ↓ |
| Min Resolution | 170×196px | 45×109px | Smaller |
| Max Resolution | 224×224px | 4,763×3,421px | Much more diverse ↑ |
| L1 Statistics Grade | Poor | Good | Major improvement ↑ |
| L2 Analysis Lens | BLIP (Multimodal) | Wolfram ImageIdentify Net V2 | Lens changed |
| L3 Dimensions | 41 dimensions | 81 dimensions | ≈2x dimensions ↑ |
| Commercial Use | Allowed ✓ | Not allowed ✗ | ⚠️ Caution |
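
The "Change" column can be re-derived from the two value columns. The sketch below is only an arithmetic check on numbers copied from the table; the dictionary keys and the `deltas` helper are illustrative names, not part of DataClinic. It shows that the class-balance standard deviation rises by 4.9 (a wider spread of images per class) and that 81 dimensions is about 1.98 times 41.

```python
# Values copied from the comparison table; key names are illustrative.
birds_450 = {"score": 65, "classes": 450, "images": 75_100,
             "avg_per_class": 150.6, "class_std": 15.7, "l3_dims": 41}
birds_525 = {"score": 77, "classes": 525, "images": 89_880,
             "avg_per_class": 161.2, "class_std": 20.6, "l3_dims": 81}


def deltas(old, new):
    """Return new - old for every metric, rounded to one decimal place."""
    return {key: round(new[key] - old[key], 1) for key in old}


change = deltas(birds_450, birds_525)
dim_ratio = round(birds_525["l3_dims"] / birds_450["l3_dims"], 2)

for metric, value in change.items():
    print(f"{metric}: {value:+}")
print(f"L3 dimension ratio: {dim_ratio}x")
```

A positive delta for `class_std` means the per-class image counts are more spread out in Birds 525, so the downward arrow in that row marks a slight loss of balance rather than a numeric decrease.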
🔑 The Key to Score Improvement: Multiple Factors
The 12-point gain comes from several improvements working together, not from a single factor:
① Preserving Original Data (L1 contribution): Birds 450 preprocessed all images to 224×224px, which erased the natural pixel-level statistical diversity produced by different shooting conditions such as camera type, distance, lighting, and background. Birds 525 preserves original resolutions, from 45px thumbnails to 4,763px professional photographs, and its L1 statistics grade rose from "Poor" to "Good."
② Expanded Class Diversity (L2/L3 contribution): The 75 additional species widen the coverage of the feature space. The newly included species (SNOWY SHEATHBILL, OILBIRD, JACOBIN PIGEON, etc.) are rare species and regional endemics with visual characteristics absent from the original 450. They fill previously empty regions of the feature space, improving the overall distribution balance.
③ More Precise L3 Analysis (41→81 dimensions): Birds 525's domain-specific lens uses 81 dimensions, roughly twice the 41 used for Birds 450. The additional dimensions can capture finer visual differences between species, which helps class separability in the embedding space.
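
Factor ① can be illustrated with a toy model of uniform preprocessing. This is a minimal sketch, not DataClinic's L1 metric: it looks only at image resolution (not pixel statistics), and apart from the 45×109 and 4,763×3,421 extremes reported in the table, the sizes are hypothetical. `resize_all` and `resolution_summary` are illustrative helper names.

```python
from statistics import pstdev

# Hypothetical (width, height) pairs. Only the 45x109 and 4763x3421
# extremes come from the report; the others are made up for illustration.
original_sizes = [(45, 109), (224, 224), (640, 480), (1920, 1280), (4763, 3421)]


def resize_all(sizes, target=(224, 224)):
    """Model uniform preprocessing: every image ends up at the target size."""
    return [target for _ in sizes]


def resolution_summary(sizes):
    """Summarise how varied the image resolutions are."""
    widths = [w for w, _ in sizes]
    return {
        "distinct": len(set(sizes)),
        "min_width": min(widths),
        "max_width": max(widths),
        "width_std": pstdev(widths),
    }


before = resolution_summary(original_sizes)
after = resolution_summary(resize_all(original_sizes))
print("original:", before)
print("resized: ", after)
```

After uniform resizing, the number of distinct resolutions collapses to one and the width spread to zero, which is the loss of diversity that factor ① describes at the resolution level.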
⚠️ Important: Commercial Use Not Permitted
While Birds 450 permits commercial use, Birds 525 does not; it may be used only for research and educational purposes. For production services or products, use Birds 450 or another dataset with an appropriate license.
Dataset Overview — The World of 525 Bird Species
BIRDS 525 SPECIES IMAGE CLASSIFICATION is a large-scale multi-class classification dataset published on Kaggle, consisting of 89,880 images across 525 bird species worldwide. From the familiar Jacobin Pigeon to the spectacular Bird of Paradise, the flightless Emu, and the nocturnal Oilbird, it captures a cross-section of avian diversity on Earth.
The 75 species newly added in Birds 525, all absent from Birds 450, are rarer species, regional endemics, or species that are visually difficult to distinguish. They include SNOWY SHEATHBILL, OILBIRD, and JACOBIN PIGEON, among others.
Another key difference from Birds 450 is the diversity of image sources. While Birds 450 used uniformly preprocessed images, Birds 525 includes wildlife photographs, field guide images, and professional bird photography at their original resolutions. This brings the data closer to real-world ("in-the-wild") deployment conditions.
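
Per-class averages and a class-balance standard deviation, like those in the comparison table, can be computed from a mapping of class name to image count. The sketch below uses four hypothetical counts rather than the real 525 classes, and it assumes the population standard deviation (`statistics.pstdev`); the report does not state whether DataClinic uses the population or the sample form.

```python
from statistics import mean, pstdev

# Hypothetical per-class image counts; the real dataset has 525 classes.
counts = {"PEACOCK": 170, "EMU": 150, "OILBIRD": 160, "SNOWY SHEATHBILL": 164}


def class_balance(counts):
    """Summarise class balance from a {class_name: image_count} mapping."""
    values = list(counts.values())
    return {
        "classes": len(values),
        "total": sum(values),
        "avg_per_class": round(mean(values), 1),
        # Assumption: population standard deviation of the counts.
        "std_dev": round(pstdev(values), 1),
    }


summary = class_balance(counts)
print(summary)
```

A smaller `std_dev` means the classes hold more nearly equal numbers of images; a larger one signals that some species are over- or under-represented.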
Birds 525 Dataset — Representative image collage of 525 bird species (DataClinic L1 analysis)
▲ Class mean images: the pixel-wise average of the roughly 161 images in each class. Sharper mean images indicate higher visual consistency within the class.
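
A class mean image of this kind is a per-pixel average over all images in a class. The sketch below is a dependency-free illustration on three hypothetical 2×2 grayscale images stored as nested lists; `mean_image` is an illustrative helper, not DataClinic code. Because Birds 525 images come in many resolutions, they would first need to be resized to a common shape, and how DataClinic does that is not stated in the report.

```python
def mean_image(images):
    """Pixel-wise average of equally sized grayscale images (lists of rows)."""
    if not images:
        raise ValueError("need at least one image")
    height, width = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all images must share one size")
    n = len(images)
    return [
        [sum(img[y][x] for img in images) / n for x in range(width)]
        for y in range(height)
    ]


# Three hypothetical 2x2 grayscale images with values in 0-255.
imgs = [
    [[0, 255], [100, 50]],
    [[0, 255], [200, 50]],
    [[0, 255], [0, 50]],
]
avg = mean_image(imgs)
print(avg)
```

Pixels that agree across images keep their value in the average, while pixels that vary are pulled toward an intermediate value. This is why a visually consistent class yields a sharper mean image.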