Executive Summary

Developed by Tsinghua University and presented at AAAI 2026, Kronos converts financial candlestick (K-line) data into domain-specific discrete tokens via Binary Spherical Quantization (BSQ), achieving a verified performance gap of RankIC* +93% and synthetic generation fidelity** +22% over general-purpose time-series foundation models (TSFMs). A massive domain-specific corpus of 12B+ K-line observations from over 45 exchanges, combined with a GPT-style hierarchical prediction architecture, structurally accounts for this gap. Released under the MIT open-source license, Kronos has accumulated 16.4K GitHub stars (as of 2026-04-13), placing it at the top of the TSFM ecosystem.

The deeper significance of this achievement goes beyond model performance. It empirically proves that machines can learn the native "language" of a domain like financial time series — and that the quality of that learning depends entirely on the quality of the training data. While TimesFM (500M parameters), a general-purpose TSFM, recorded R² = -2.80% on financial data, Kronos outperformed all 25 baseline models on the same tasks. This is the clearest evidence that "domain-appropriate representation" — not "bigger and more general models" — is what decides the outcome.

Pebblous intersects this paradigm shift at three points. First, the fact that Kronos's BSQ codebook (2^20 vocabulary) directly reflects training data distribution means that DataClinic's time-series outlier detection, missing value imputation, and distribution diagnostics determine FM performance at the upstream level. Second, Kronos's synthetic K-line generation (+22% fidelity) serves as prior validation for the synthetic sensor time-series generation that DataGreenhouse aims to deliver in the manufacturing domain. Third, the positioning that "in the era of domain-specific FMs, data quality is a constituent element of the model itself" becomes the central argument for the AI-Ready Data strategy. Gartner has already warned: "Through 2026, organizations will abandon 60% of AI projects unsupported by AI-ready data."

* RankIC (Rank Information Coefficient) — The correlation between model-predicted return rankings and actual return rankings across assets. +93% means "93% higher ranking prediction accuracy relative to 25 comparison models," not an absolute 93% return.

** Synthetic Fidelity — A metric measuring how accurately synthetic K-line data generated by the model reproduces the statistical distributions (volatility, trends, correlation structure) of real market data. +22% means distribution reproduction accuracy is 22 percentage points higher than general-purpose TSFMs.

1. K-line as Language — Kronos's Technical Innovation

Why do time-series forecasting models fail to properly understand financial data? The answer lies in representation. Conventional general-purpose TSFMs treat time series as sequences of continuous numerical values. A temperature rising from 23.4 to 23.8 degrees and a stock price moving from $53 to $54.20 are encoded in exactly the same way. But financial data possesses an inherent structure that generic numerical representations cannot capture. The open, high, low, close, volume, and turnover (OHLCVA) of a candlestick (K-line) are not merely six numbers — they are a compressed representation of collective decision-making by market participants.

1.1 Binary Spherical Quantization — Turning Finance into Discrete Tokens

Kronos solves this problem in a fundamentally different way. Just as large language models (LLMs) convert text into "word tokens," Kronos converts K-lines into "financial tokens." The core mechanism behind this conversion is Binary Spherical Quantization (BSQ). A 3-layer symmetric Transformer autoencoder (dim=256, FF=512, heads=4) compresses the 6-dimensional continuous OHLCVA values into a latent space, then projects the latent vector onto learnable hyperplanes to generate a k=20-bit binary code. Through this process, each K-line is represented as a single integer index in a vocabulary space (codebook) of 2^20, approximately 1 million entries.
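
The mechanics of this quantization step can be illustrated in a few lines. The sketch below is plain NumPy, not the actual Kronos tokenizer (whose projection is learned end to end); it shows how a 20-dimensional latent vector becomes one of 2^20 integer token ids via spherical normalization and per-dimension sign bits:

```python
import numpy as np

def bsq_quantize(z: np.ndarray) -> int:
    """Sketch of Binary Spherical Quantization: project the latent vector
    onto the unit hypersphere, take the sign of each dimension as one bit,
    and pack the k bits into a single integer token id."""
    u = z / np.linalg.norm(z)            # spherical projection
    bits = (u > 0).astype(int)           # one bit per latent dimension
    token = 0
    for b in bits:
        token = (token << 1) | int(b)    # pack bits -> id in [0, 2**k)
    return token

rng = np.random.default_rng(42)
z = rng.normal(size=20)                  # k = 20 -> vocabulary of 2**20 entries
token = bsq_quantize(z)
print(0 <= token < 2**20)                # True
```

In the real model the latent vector comes from the Transformer encoder and the decoder reconstructs OHLCVA from the code; the sign-and-pack step above is the part that makes the vocabulary discrete.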

This is more than simple integer conversion. Kronos structures this vocabulary hierarchically. The 20 bits are split into coarse (upper 10 bits, 2^10=1,024 entries) and fine (lower 10 bits, 2^10=1,024 entries). Coarse sub-tokens capture the primary structure of price movements (upward/downward direction, volume levels), while fine sub-tokens capture residual precision within that structure (exact positions of highs and lows, turnover changes). During prediction, coarse sub-tokens are generated first, then fine sub-tokens are conditionally predicted using cross-attention to incorporate context — a sequential approach.
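
Because the hierarchy is a bit split, recovering coarse and fine sub-tokens from a token id is plain integer arithmetic. A minimal sketch, assuming (as described above) that the coarse sub-token occupies the upper 10 bits:

```python
def split_token(token: int, fine_bits: int = 10) -> tuple:
    """Split a 20-bit BSQ token into coarse (upper 10 bits) and
    fine (lower 10 bits) sub-tokens."""
    coarse = token >> fine_bits               # primary structure
    fine = token & ((1 << fine_bits) - 1)     # residual precision
    return coarse, fine

def merge_token(coarse: int, fine: int, fine_bits: int = 10) -> int:
    """Inverse: recombine the two sub-tokens into the full token id."""
    return (coarse << fine_bits) | fine

coarse, fine = split_token(0xABCDE)           # any id in [0, 2**20)
print(coarse, fine)
```

Each sub-token therefore lives in its own 1,024-entry space, which is what lets the predictor emit coarse structure first and condition the fine residual on it.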

What the ablation numbers reveal: In the ablation study (Table 2 of the original paper), the discrete token approach (Kronos-small) achieved a RankIC of 0.0254, while the continuous-value direct regression approach (Direct-AR) managed only 0.0149 — an advantage of over 70%. Sequential sub-token prediction (0.0254) also significantly outperformed parallel prediction (0.0226). This is the most direct empirical answer to the question "why discrete tokens?"

1.2 Predictor Architecture — Generating the Next K-line Like GPT

Once the tokenizer converts K-lines into discrete tokens, the predictor takes this token sequence as input and generates the next token. Kronos's predictor is a GPT-style decoder-only Transformer. It employs RoPE (Rotary Position Embedding), RMSNorm normalization, and causal masking, autoregressively predicting the next candlestick token just as a language model predicts the next word.
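
Two of the building blocks named above are simple enough to state directly. Minimal NumPy versions of RMSNorm and the causal attention mask (illustrative sketches, not Kronos's implementation):

```python
import numpy as np

def rms_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm: rescale by the root-mean-square of the features.
    Unlike LayerNorm, no mean is subtracted."""
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def causal_mask(seq_len: int) -> np.ndarray:
    """Lower-triangular mask: position t may attend only to positions <= t,
    which is what makes next-token prediction autoregressive."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

print(causal_mask(4))
```

In a full decoder block these sit alongside RoPE, which rotates query/key vectors by a position-dependent angle instead of adding a position vector.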

Kronos is available in four model variants: Mini (4.1M parameters, 2,048-token context), Small (24.7M, 512), Base (102.3M, 512), and Large (499.2M, 512). The Mini, Small, and Base variants can be freely downloaded from the HuggingFace NeoQuasar organization under the MIT license; the Large model is accessible only through institutional licensing inquiries. Notably, the variant with the longest context window (2,048) is also the smallest in parameter count, Mini, a design choice tailored for ultra-short-term trading and real-time prediction.


The handling of temporal information is also distinctive. Kronos uses five types of time embeddings: minute, hour, day of week, day of month, and month. These five are processed independently and then summed, explicitly capturing the seasonality and cyclicality of financial markets — market open/close effects, month-end effects, quarterly rebalancing patterns, and more.
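
A sketch of this scheme, with randomly initialized tables standing in for the learned embeddings (the function names and table layout are illustrative assumptions, not Kronos's code):

```python
import numpy as np

def make_time_tables(dim: int = 256, seed: int = 0) -> dict:
    """Random stand-ins for the five learned calendar embedding tables."""
    rng = np.random.default_rng(seed)
    sizes = {"minute": 60, "hour": 24, "weekday": 7, "day": 31, "month": 12}
    return {name: rng.normal(size=(n, dim)) for name, n in sizes.items()}

def time_embedding(tables: dict, minute: int, hour: int,
                   weekday: int, day: int, month: int) -> np.ndarray:
    """Look up the five calendar features independently, then sum them.
    `day` and `month` are 1-based; the others are 0-based."""
    return (tables["minute"][minute] + tables["hour"][hour]
            + tables["weekday"][weekday] + tables["day"][day - 1]
            + tables["month"][month - 1])

tables = make_time_tables()
e = time_embedding(tables, minute=30, hour=9, weekday=0, day=15, month=3)
print(e.shape)  # (256,)
```

Because the five lookups are summed rather than concatenated, each table can specialize in one cycle (intraday, weekly, monthly) while the combined vector stays at the model dimension.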

1.3 Training Data Scale — 45 Exchanges, 12B+ K-lines

Kronos's training data comprises over 12 billion K-line observations collected from more than 45 exchanges across over 30 countries. It spans 7 time frequencies including minute, hourly, daily, weekly, and monthly bars, and covers multiple asset classes such as equities, futures, forex, and cryptocurrencies. Data quality filtering is rigorous — illiquid assets and price-stagnant intervals are removed, and extreme values outside the Z-score [-5, 5] range are clipped. This scale and diversity are the foundation of Kronos's zero-shot generalization capability.
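
The Z-score clipping step is easy to make concrete. A minimal sketch (the synthetic data and function name are illustrative; the paper describes the filtering only at this level of detail):

```python
import numpy as np

def zscore_clip(x: np.ndarray, lo: float = -5.0, hi: float = 5.0) -> np.ndarray:
    """Clip observations outside the [lo, hi] z-score band back to the band
    edge, then map back to the original scale. Values inside the band pass
    through unchanged."""
    mu, sigma = x.mean(), x.std()
    z = (x - mu) / sigma
    return mu + np.clip(z, lo, hi) * sigma

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(size=1000), [100.0]])  # one extreme bad tick
clean = zscore_clip(x)
print(clean.max() < x.max())  # True: the spike is pulled back to the band edge
```

One caveat worth noting: a single extreme value inflates the standard deviation it is measured against, so robust variants (median/MAD-based z-scores) are often preferred in practice.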

The key architectural differences between Kronos and general-purpose TSFMs are summarized below. The two approaches start from fundamentally different premises.

| Attribute | TimesFM (General) | Kronos (Finance-Specific) |
| --- | --- | --- |
| Tokenization | Continuous-value patch-based | Discrete K-line tokens (BSQ) |
| Context length | Up to 16,000 | 512 (Small/Base/Large), 2,048 (Mini) |
| Training data | General: 100B time points (mixed) | Finance-specific: 12B+ K-lines (45 exchanges) |
| Vocabulary size | Continuous (no vocabulary) | 2^20 ≈ 1M (coarse/fine hierarchy) |
| Financial zero-shot | R² = -2.80% (effectively useless) | Outperforms all 25 baselines |
2. Why Finance First — The Structural Roots of the Performance Gap

The underperformance of general-purpose TSFMs on financial data is no accident. The paper "Re(Visiting) Time Series Foundation Models in Finance" (arXiv:2511.18578) demonstrated the phenomenon systematically. TimesFM (500M parameters) achieved a financial zero-shot forecast R² of -2.80% and a return of just -1.47%. Chronos (large) fared little better at R² = -1.37%, an effectively meaningless result. These models are not bad; they simply were never designed for financial data.

2.1 Four Structural Reasons General-Purpose TSFMs Fail in Finance

The academic literature consistently identifies four structural limitations of general-purpose TSFMs in the financial domain.

  • Low Signal-to-Noise Ratio (SNR): Unlike weather or energy consumption data, financial time series are noise-dominated. General-purpose TSFMs are optimized for high-SNR environments, creating the paradox of learning noise as signal when applied to financial data.
  • Severe Non-stationarity: Financial markets undergo frequent regime shifts. The statistical properties of time series during quantitative easing differ completely from those during tightening cycles. Fixed patch representations are vulnerable to such regime changes.
  • OHLCVA Multivariate Structure: The six variables of a candlestick are not independent. High must always exceed Low, and Close falls between them. The independent time-series assumption of general-purpose models ignores these structural constraints.
  • Higher-Order Dependencies: Key signals in technical analysis — Bollinger band breakouts, death/golden crosses, head-and-shoulders patterns — are higher-order patterns formed across multiple K-lines. Simple continuous-value patches cannot capture these patterns.
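
The OHLCVA constraints in the third point above reduce to a validity predicate that every real candlestick satisfies. A hypothetical helper, shown only to make the structural coupling between the variables explicit:

```python
def is_valid_ohlc(o: float, h: float, l: float, c: float) -> bool:
    """Structural constraints of a candlestick: High bounds Open and Close
    from above, Low bounds them from below, and High >= Low."""
    return h >= max(o, c) and l <= min(o, c) and h >= l

print(is_valid_ohlc(100.0, 102.5, 99.0, 101.0))   # True: well-formed bar
print(is_valid_ohlc(100.0, 99.0, 102.5, 101.0))   # False: High below Low
```

A model that treats the six channels as independent series can happily predict the second, impossible bar; a tokenizer that encodes the whole bar as one token never sees such a combination in its vocabulary.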

2.2 Kronos's Answer — Designing a Domain Language

Kronos addresses all four problems at the tokenization stage. The BSQ codebook's 2^20 vocabulary naturally reflects the statistical structure of financial data through training on 12B+ K-lines. Recurrent patterns in noisy environments are encoded as high-frequency codebook entries, while rare anomalous patterns become low-frequency entries. By compressing the 6-dimensional OHLCVA into a single discrete token, structural constraints between variables are also learned automatically.

Cross-asset training across 45 exchanges is equally critical. By simultaneously learning patterns from the Seoul equity market, the New York futures market, and the Singapore forex market, a general-purpose financial representation emerges that is not overfitted to any single market's idiosyncrasies. This is the source of Kronos's zero-shot generalization capability.

2.3 Benchmark Results — Quantifying the Gap

The original Kronos paper compares against 25 baseline models across 4 categories (zero-shot TSFMs, full-shot trained models, econometric volatility models, and generative models). Key metric results are summarized below.

| Metric | Kronos Advantage | Compared Against |
| --- | --- | --- |
| RankIC (directional accuracy) | +93% | Best general-purpose TSFM |
| RankIC (directional accuracy) | +87% | Best non-pretrained model |
| MAE (volatility prediction) | -9% | Kronos-small 0.0384 vs. Direct-AR 0.0565 |
| Synthetic K-line fidelity | +22% | DiffusionTS, TimeVAE |
| Discrete vs. continuous (ablation) | +70% | RankIC 0.0254 vs. 0.0149 (Direct-AR) |

2.4 Synthetic K-line Generation — Solving Data Scarcity with Data

Another core capability of Kronos is synthetic K-line generation. In financial practice, backtesting is constrained by the limitations of historical data — daily data requires decades to accumulate, and data for specific market conditions (bull, bear, or sideways markets) is naturally unevenly distributed. Kronos can generate new K-line sequences by running the learned codebook in reverse.

The fidelity of this synthetic data is validated through Discriminative Score and TSTR (Train-on-Synthetic, Test-on-Real) protocols. Kronos achieved a +22% fidelity improvement over DiffusionTS and TimeVAE. It can also generate scenarios incorporating 103 classical candlestick patterns. This is not merely a technical achievement — it is practical infrastructure that enables model training and backtesting in emerging markets and niche asset classes where data is scarce.
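
Generation of this kind follows the standard autoregressive sampling loop. A sketch with a toy stand-in for the trained predictor (`logits_fn`, the 16-token vocabulary, and the toy model are all illustrative assumptions; real decoding back to K-lines would go through the BSQ decoder):

```python
import numpy as np

def sample_tokens(logits_fn, context, n_new, temperature=1.0, seed=0):
    """Autoregressive sampling: repeatedly query next-token logits,
    sample one token, append it, and repeat."""
    rng = np.random.default_rng(seed)
    seq = list(context)
    for _ in range(n_new):
        logits = np.asarray(logits_fn(seq)) / temperature
        p = np.exp(logits - logits.max())       # stable softmax
        p /= p.sum()
        seq.append(int(rng.choice(len(p), p=p)))
    return seq[len(context):]

def toy_logits(seq):
    """Toy predictor over 16 tokens that favors repeating the last token."""
    logits = np.zeros(16)
    logits[seq[-1]] = 2.0
    return logits

print(sample_tokens(toy_logits, context=[3], n_new=5))
```

Raising the temperature spreads probability mass toward rarer tokens, which is one knob for steering generation toward less common market scenarios.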

3. Lessons from Domain-Specific Models — Transfer to Manufacturing

What Kronos proved in finance is not simply "a good finance-specific model." It demonstrated a paradigm: when a domain's native language is learned as discrete tokens, the precision of domain-specific representation can overwhelm the scale and generality advantages of general-purpose models. The natural question follows: what happens when this paradigm is applied to domains beyond finance — particularly manufacturing and industrial time series?

3.1 The Finance-Manufacturing Analogy — Surprisingly Parallel Structures

The structural parallels between the two domains run deeper than one might expect.

| Finance Domain | Manufacturing Domain |
| --- | --- |
| K-line (OHLCVA) | Sensor time series (temperature, pressure, vibration, current) |
| 45+ exchanges | Diversity of plants and equipment types |
| Price volatility | Equipment degradation / anomaly signals |
| Candlestick patterns (chart analysis) | Equipment health patterns (predictive maintenance) |
| Backtesting data scarcity | Real failure data sparsity |
| Synthetic K-line generation | Synthetic sensor time-series generation |

3.2 Why a Manufacturing Foundation Model Does Not Yet Exist

Despite these clear structural parallels, foundation models for the manufacturing domain remain in their infancy. Siemens announced its IFM (Industrial Foundation Model) project in collaboration with Microsoft Azure at Hannover Messe 2025, and the ProcessFM framework proposed the first dedicated FM architecture — but nothing yet matches Kronos's level of maturity. The reasons are structural.

  • Complex Physicochemical Mechanisms: The thermodynamics, reaction kinetics, and fluid dynamics of manufacturing processes exhibit far higher nonlinearity than financial price mechanisms. Simple time-series pattern learning cannot capture physical causal relationships.
  • Data Silos: Financial markets have centralized data collection infrastructure — the exchange. Manufacturing sensor data, by contrast, is managed independently by each plant. Aggregating data from 45 exchanges and aggregating data from 45 factories are entirely different problems.
  • Heterogeneous Data Integration: Manufacturing data extends beyond sensor time series to include images (vision inspection), CAD drawings, PLC control code, and work orders — fundamentally heterogeneous forms. Unifying these under a single tokenizer remains a technically unsolved problem.
  • Label Scarcity: Real failure data is inherently sparse. Training a model on equipment failures that occur once a year is practically impossible. Without synthetic data, this problem remains unsolvable.

3.3 Why Data Quality Is a Prerequisite — The Codebook Contamination Mechanism

Kronos's BSQ codebook directly reflects the training data distribution. This is not merely a technical fact — it carries fundamental implications for data quality. If the training data contains noise, outliers, or missing values, that contaminated distribution is encoded directly into the codebook. A codebook that has learned noise as legitimate signal will use corrupted representations in every downstream inference.

The Representation Collapse problem documented in the VQ (Vector Quantization) literature (arXiv:2411.16550) demonstrates this empirically. Noisy training data causes representation collapse where only a fraction of codebook entries are utilized and the rest become "dead codes." Even with a vocabulary size of 2^20, only a few thousand entries may actually be used — fundamentally limiting the model's representational capacity. Outliers, missing intervals, and calibration errors in manufacturing sensor data will trigger the same mechanism in industrial foundation models.
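
Codebook health of this kind is straightforward to measure. A sketch of two common collapse diagnostics, usage fraction and usage perplexity (a generic illustration, not tied to Kronos's tooling):

```python
import numpy as np

def codebook_stats(token_ids, vocab_size):
    """Return (fraction of codebook entries ever used, perplexity of the
    empirical usage distribution). Perplexity equal to vocab_size would
    mean perfectly uniform usage; values near 1 indicate collapse."""
    counts = np.bincount(np.asarray(token_ids), minlength=vocab_size)
    used_frac = np.count_nonzero(counts) / vocab_size
    p = counts[counts > 0] / counts.sum()
    perplexity = float(np.exp(-np.sum(p * np.log(p))))
    return used_frac, perplexity

# Collapsed codebook: a 2**20-entry vocabulary in which only 3 codes fire.
used, ppl = codebook_stats([0, 0, 1, 1, 2, 2], vocab_size=2**20)
print(used, ppl)
```

Tracking these two numbers over training is a cheap early-warning signal: a vocabulary of a million entries with a usage perplexity in the thousands is effectively a vocabulary of thousands.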

3.4 The Current State of Manufacturing AI and Market Opportunity

Despite these challenges, the market direction is clear. The Korean government is driving a transition from 30,144 smart factories (as of end-2022) to 500 AI factories (2026 target), with a related budget of KRW 402 billion — a 70% year-over-year increase. The predictive maintenance market is projected at $10B-$18B in 2026, growing at a CAGR of 20-30%. The digital twin market is expected to grow from $36.19B in 2025 to $180.28B by 2030 (CAGR 37.87%).

NIST has scheduled an "AI for Manufacturing Workshop" for May 2026. Siemens IFM and ProcessFM are still at early stages, but when this direction materializes, the biggest bottleneck will not be technology — it will be data. Without large-scale, high-quality, standardized manufacturing sensor data, no architecture, however sophisticated, can replicate what Kronos achieved.

4. The Pebblous Connection — Data Readiness Partner

What Kronos demonstrated in finance aligns precisely with Pebblous's core value proposition: "Domain-specific foundation model performance begins with codebook quality, and codebook quality is determined by training data quality." DataClinic sits at the most upstream point of this causal chain, while DataGreenhouse extends it by addressing data scarcity.

4.1 DataClinic — Preventing Codebook Contamination Upstream

The quality filtering that Kronos applies during training — removing illiquid assets, clipping Z-scores to [-5, 5] — is functionally complementary to what DataClinic automates. DataClinic's time-series outlier detection performs a more sophisticated version of Kronos's Z-score clipping, its missing value imputation is the automated counterpart of illiquid interval removal, and its distribution diagnostics pre-validate domain coverage of training data to prevent unintended distributional bias in the codebook.
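
As a concrete (and deliberately simple) stand-in for the imputation step, linear interpolation over missing intervals looks like this; DataClinic's actual methods are not described here and are presumably more sophisticated:

```python
import numpy as np

def interpolate_gaps(x: np.ndarray) -> np.ndarray:
    """Fill missing (NaN) samples by linear interpolation between the
    nearest observed neighbors on either side of each gap."""
    x = x.astype(float).copy()
    idx = np.arange(x.size)
    missing = np.isnan(x)
    x[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    return x

series = np.array([1.0, np.nan, np.nan, 4.0, 5.0])
print(interpolate_gaps(series))   # -> [1. 2. 3. 4. 5.]
```

Even this trivial imputer illustrates the upstream stakes: whatever values fill the gap are what the tokenizer quantizes, so imputation choices propagate directly into the codebook.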

This is more than preprocessing automation. To invoke Gartner's warning once more: "Through 2026, organizations will abandon 60% of AI projects unsupported by AI-ready data." This is precisely the risk facing organizations attempting to build domain-specific FMs. An FM trained on substandard data produces a contaminated codebook that fails in production. DataClinic eliminates the root cause of this 60% failure rate at the upstream level.

The AIMA (Alternative Investment Management Association) 2025 report found that 95% of hedge funds already use generative AI, with 58% planning to expand AI usage for alpha generation. It also confirmed that "data quality and accessibility remain the primary barrier to AI adoption." 48% of firms face model accuracy issues due to volatile and incomplete time-series data. This is the basis for positioning DataClinic as a fine-tuning data curation partner in financial AI PoCs built on Kronos.

4.2 DataGreenhouse — From Synthetic K-lines to Synthetic Sensor Time Series

Kronos's synthetic K-line generation (+22% fidelity) serves as prior validation for the synthetic sensor time-series generation that DataGreenhouse aims to deliver in the manufacturing domain. The two problems are structurally identical. In finance, historical data for specific market conditions is scarce; in manufacturing, real failure data is sparse. Just as Kronos reproduces bull, bear, and sideways market scenarios as synthetic K-lines, DataGreenhouse can generate normal, anomalous, and failure-state sensor patterns as synthetic time series.

The market has already validated the strategic value of synthetic data. Gretel was acquired by NVIDIA (March 2025), and Hazy was acquired by SAS (November 2024). The synthetic data market is projected at approximately $636M in 2026, growing at a CAGR of 30.8-46.3%. Synthetic manufacturing sensor data becomes particularly powerful when combined with digital twins — a pipeline where diverse failure scenarios are generated in simulated environments and that synthetic data is then used to train IFMs is becoming a reality.

4.3 AI-Ready Data Positioning — Infrastructure That Connects the Entire Pipeline

The paradigm shift that Kronos proved can be expressed in a single sentence: "In the era of general-purpose models, data quality was a nice-to-have. In the era of domain-specific models, data quality is a constituent element of the model itself." This shift provides the strongest argument for Pebblous's AI-Ready Data strategy.

The pipeline that Pebblous positions itself within is as follows.

  1. Data Generation/Collection: Factory sensors, exchange feeds, medical devices — domain-specific sources
  2. DataClinic Diagnosis & Curation: Outlier detection, missing value imputation, distribution diagnostics to prevent codebook contamination
  3. DataGreenhouse Synthetic Augmentation: Rare scenario synthesis, class imbalance remediation
  4. Domain-Specific FM Training: Training or fine-tuning IFMs/financial FMs on curated and augmented data
  5. Prediction/Decision-Making: Digital twin integration, alpha generation, predictive maintenance

Pebblous owns stages 2 and 3 of this pipeline. The Kronos case demonstrates that these two stages are the critical determinants of overall pipeline performance. From a Physical AI platform perspective, if domain-specific FMs serve as the engine at the "prediction" stage of the sensor-to-digital-twin-to-predictive-decision chain, then quality management of that engine's fuel (training data) becomes an essential layer of any Physical AI platform.

5. Competitive Landscape and Outlook — TSFM Ecosystem Divergence

The TSFM ecosystem underwent rapid divergence during 2024-2025. On one front, tech giants like Google (TimesFM), Amazon (Chronos), and Salesforce (Moirai) are competitively advancing general-purpose TSFMs. On another, domain-specific models like Kronos and FinCast are opening new fronts by demonstrating performance that surpasses general-purpose models in targeted domains.

5.1 General-Purpose vs. Domain-Specific TSFM Landscape

A comparison of major TSFMs reveals the following landscape.

| Model | Developer | Type | Scale | Key Feature |
| --- | --- | --- | --- | --- |
| TimesFM 2.5 | Google Research | General | 200M | GIFT-Eval #1, 16K context |
| Chronos-2 | Amazon | General | 120M | 600M+ HuggingFace downloads |
| Moirai 2.0 | Salesforce | General | 11.4M | 30x smaller, 2x faster |
| Kronos | Tsinghua / NeoQuasar | Domain-specific (finance) | 4.1M-499.2M | RankIC +93%, 16.4K GitHub stars |
| FinCast | Academic (CIKM 2025) | Domain-specific (finance) | 1B | MoE; MSE -20% / MAE -10% vs. TimesFM |

GitHub stars offer a revealing proxy for the TSFM ecosystem's power distribution. As of April 2026, Kronos (16.4K) slightly exceeds TimesFM (~15.6K) and significantly leads Chronos (~4.8K) and Moirai (~1.5K). This signals how strongly the developer community has gravitated toward domain-specific models. However, the decision to withhold Kronos Large (499.2M) behind institutional licensing reflects the tension between open-source ecosystem building and commercialization.

5.2 Open-Source Ecosystem and Enterprise Adoption Path

Kronos's MIT license permits unrestricted fine-tuning and commercial deployment, meaning the PoC entry barrier is effectively zero. Official integration with Microsoft Qlib enables immediate access to a China A-share backtest pipeline, and the BTC/USDT real-time prediction demo (GitHub Pages) provides a hands-on channel to verify Kronos's practical utility.

Enterprise adoption follows a gradual pattern. The natural path is: PoC with open-source Mini/Small, then fine-tune Base on domain data, then deploy Base or Large (institutional licensing) in production. Survey data (2025) showing that 71% of data scientists are adopting foundation-model-based forecasting suggests this adoption path is already in motion.

5.3 Outlook for 2026-2028 — The Sequence of Domain Transfer

Academic surveys (arXiv:2504.04011, "Foundation Models for Time Series: A Survey") predict the rollout sequence of domain-specific TSFMs as follows: finance first (due to data availability and measurability of outcomes), followed by healthcare (scaling of EMR and vital-sign time series), then manufacturing and energy.

The implications for the Korean market are concrete. On the financial side, with the AI Basic Act taking effect in January 2026 and financial AI guidelines now established, developments such as KRX AI startup acquisitions and NH Investment Securities' 40-50% improvement in AI research speed are expected to accelerate. Korea's financial AI market is projected to grow from $3.166B in 2025 to $5.667B by 2030, at a CAGR of 12.35%. On the manufacturing side, the predictive maintenance market accompanying the smart-factory-to-AI-factory transition will be a key growth driver, with Korean AI-based manufacturing solutions projected to grow at a CAGR of 16.6% (2025-2030).

Key Takeaway: The era of general-purpose TSFMs is not ending — rather, an era of coexistence between general-purpose and domain-specific models is beginning. General-purpose models will serve early-stage exploration and rapid PoC with limited data; domain-specific models will power production environments where performance is decisive. Throughout this divergence, the importance of data infrastructure for building and maintaining domain-specific models will grow exponentially.

References

  • Shi et al. (2025). "Kronos: A Foundation Model for Time Series in Financial Domain." arXiv:2508.02739. AAAI 2026.
  • (2025). "Re(Visiting) Time Series Foundation Models in Finance." arXiv:2511.18578.
  • Das et al. (2024). "TimesFM 2.5." Google Research. GIFT-Eval.
  • Ansari et al. (2024). "Chronos: Learning the Language of Time Series." arXiv:2403.07815. Amazon Science.
  • (2025). "Moirai 2.0: Advancing Foundation Models for Time Series Forecasting." arXiv:2511.11698.
  • (2025). "FinCast: A Foundation Model for Financial Forecasting." arXiv:2508.19609. CIKM 2025.
  • (2025). "Foundation Models for Time Series: A Survey." arXiv:2504.04011.
  • (2024). "Representation Collapse in Vector Quantization." arXiv:2411.16550.
  • AIMA (2025). "AI and Data Science in Alternative Investments Survey." AUM $788B, N=150.
  • Gartner (2025-02-26). "Gartner Predicts Through 2026, 60% of AI Projects Will Be Abandoned."
  • Kronos GitHub: https://github.com/shiyu-coder/Kronos (16.4K stars, MIT License)
  • HuggingFace NeoQuasar: https://huggingface.co/NeoQuasar/Kronos-base

What Kronos proved transcends the achievements of financial AI. The principle that "if a domain's language can be learned as discrete tokens, that domain's prediction problems can be solved in the manner of language models" is ready to extend beyond finance into manufacturing, healthcare, and energy. The speed of that extension will depend not on architectural innovation but on the maturity of data infrastructure. The domains with AI-Ready Data will be the ones to produce the next Kronos.

We hope this report serves as a practical decision-making resource for data scientists and ML engineers in the finance, manufacturing, and industrial sectors who are evaluating domain-specific foundation models. For questions about data readiness preparation through DataClinic or DataGreenhouse, please reach out to the Pebblous team at any time.

Pebblous Research Team
April 13, 2026