AgiBot World 2026: Open Tactile-Contact Robot Dataset

Executive Summary

On June 3, 2026, AgiBot released "Rich Interaction," the second theme of the AgiBot World 2026 dataset. Where earlier robot datasets mostly gathered clean, successful demonstrations, this one goes the other way. It deliberately collects the moments when a robot drops an object, bumps into things, slips, or spills liquid — and it records the texture of that contact not only in vision but through tactile sensors as well. This article looks at what that choice means for Physical AI data strategy.

Theme 2 is 100% real-world data: the dual-arm humanoid G2 records RGB(D) video, gripper tactile signals, LiDAR, IMU, and full-body joint states through a single synchronized pipeline. The dataset isn't released all at once; it is split into five themes, each aligned with a different research direction. The first theme targeted imitation learning, and this second one targets contact-rich interaction. That structure signals data designed around research questions, not a one-time dump.

The next bottleneck in robot data isn't quantity but what you choose to record. Training a world model requires the physics of contact — friction, slip, the fine modulation of force — and none of that survives in clean success footage. Which modalities you keep, and at what fidelity, is becoming the new coordinate of data quality.

Key Figures

The four numbers below compress the design direction and scale of AgiBot World 2026. The first two say what the data is filled with; the last two point to the gap the data sets out to close.

Source: The Robot Report · AgiBot Official

· 5 synchronized modalities: RGB(D), tactile, LiDAR, IMU, joints
· 100% real-world data: Theme 2, not synthetic footage
· 5 stages, phased release: designed per research direction
· <2M open manipulation episodes: against 3.9M robots in operation

▲ AgiBot G2 dual-arm humanoid robot, the data-collection platform for AgiBot World 2026, fitted with Zhixing 90D grippers and OmniHand for dexterous manipulation | Source: AgiBot

1. What Success-Only Data Can't Teach

For a long time, robot manipulation datasets focused on collecting clean, successful demonstrations. They kept only the trajectories where the robot picked up a cup precisely and set it down precisely, treating any mid-motion slip or collision as noise to be deleted. Intuitively this is a persuasive choice: show the model only perfect examples and, the logic goes, it will learn to follow perfection.

The trouble is that a model trained this way copies the surface appearance of behavior without understanding the physics beneath it. A world model has to predict how much resistance comes when a hand touches an object, when it starts to slip, and how much more force is needed to stop that slip. Information like friction, deformation, and fine force modulation doesn't live in the pixels of a success video.

The limits of touch-free data are obvious from a single grip. From camera footage alone it is hard to tell whether the gripper is holding an object firmly or barely hanging on, about to drop it. A person has little trouble lifting a cup with their eyes closed, on fingertip sensation alone — but numb those fingertips and the same motion turns precarious. That is why policies trained without force and tactile signals are weak in contact-rich manipulation, precisely in the moments just before failure.
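The difference between a firm grip and one that is barely hanging on can be sketched with the textbook Coulomb friction model: a contact holds while the tangential load stays below the friction coefficient times the normal (squeeze) force. This is a simplifying assumption for illustration, not a description of how AgiBot processes its tactile data; the class name, field names, and the friction coefficient below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GripReading:
    """One fingertip force sample (hypothetical fields, in newtons)."""
    normal_force: float      # squeeze force pressing the pad against the object
    tangential_force: float  # load pulling the object along the pad, e.g. its weight


def slip_margin(reading: GripReading, mu: float) -> float:
    """Tangential load left before slip under Coulomb friction.

    Positive means the grip holds; zero or negative means the contact
    is at or past the slip limit.
    """
    return mu * reading.normal_force - abs(reading.tangential_force)


def extra_squeeze_needed(reading: GripReading, mu: float) -> float:
    """Additional normal force needed to stop slip (0.0 if already holding)."""
    required_normal = abs(reading.tangential_force) / mu
    return max(0.0, required_normal - reading.normal_force)


MU = 0.5  # assumed friction coefficient, for illustration only

# Three grips on the same object with the same 2 N load.
# A camera would see nearly identical frames for all of them.
firm = GripReading(normal_force=10.0, tangential_force=2.0)
barely = GripReading(normal_force=4.5, tangential_force=2.0)
slipping = GripReading(normal_force=3.0, tangential_force=2.0)

for name, grip in (("firm", firm), ("barely", barely), ("slipping", slipping)):
    print(name, slip_margin(grip, MU), extra_squeeze_needed(grip, MU))
```

The three readings differ only in squeeze force, which is exactly the quantity a success video does not expose. Real contacts also involve deformation and material-dependent friction, so this model is only a first approximation.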

This gap is exactly where AgiBot World 2026 Theme 2 begins. Real physical intelligence has to learn how to react within variability — drops, collisions, falls, unstable contact, the instant liquid splashes. So instead of curating only successes, this data elevates the texture of contact itself into something worth recording.

▲ Vision-only data (left) shows the outcome of behavior but empties out the physics of contact. Data that synchronizes touch and force (right) fills that void

2. What AgiBot World 2026 Records

The collection platform is the dual-arm humanoid AgiBot G2. For dexterous manipulation it carries the Zhixing 90D gripper and OmniHand, and the signals from these are recorded together in a single synchronized pipeline. RGB(D) cameras, the gripper's tactile signals, LiDAR point clouds, IMU, and full-body joint states are all aligned on the same timeline. Because tactile sensors refresh far faster than cameras, this alignment is itself a technical challenge — and solving it inside one unified pipeline is the core of this dataset.
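To make the alignment problem concrete, here is a minimal sketch of one common approach: matching each camera frame to the nearest tactile sample by timestamp. It is not AgiBot's pipeline, whose internals the release notes do not describe. The sampling rates (30 Hz camera, 1 kHz tactile), the function name `nearest_indices`, and the sine-wave stand-in signal are all assumptions chosen for illustration, and both streams are assumed to share one clock.

```python
import numpy as np


def nearest_indices(reference_t: np.ndarray, stream_t: np.ndarray) -> np.ndarray:
    """For each reference timestamp, return the index of the closest stream timestamp.

    Both arrays must be sorted ascending, share one clock, and stream_t
    must hold at least two samples.
    """
    # Index of the first stream timestamp that is >= each reference timestamp.
    right = np.searchsorted(stream_t, reference_t, side="left")
    right = np.clip(right, 1, len(stream_t) - 1)
    left = right - 1
    # Pick whichever neighbour is closer in time.
    pick_left = (reference_t - stream_t[left]) <= (stream_t[right] - reference_t)
    return np.where(pick_left, left, right)


# Assumed rates, for illustration only.
camera_t = np.arange(0, 30) / 30.0        # one second of 30 Hz frame timestamps
tactile_t = np.arange(0, 1000) / 1000.0   # one second of 1 kHz tactile timestamps
tactile = np.sin(2 * np.pi * tactile_t)   # stand-in tactile signal

idx = nearest_indices(camera_t, tactile_t)
aligned_tactile = tactile[idx]             # one tactile sample per camera frame
skew = np.abs(tactile_t[idx] - camera_t)   # residual time offset per frame
print(aligned_tactile.shape, float(skew.max()))
```

Picking one tactile sample per frame throws away most of the high-rate signal. A pipeline that wants to preserve contact transients would more likely keep the full tactile stream and store, for each frame, the window of tactile samples around it.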

Set against the large open robot datasets that came before, the difference is clear. The public datasets that scaled up mostly did so by pooling recordings from many different robots to grow the episode count, and rarely synchronized touch and force onto the same timeline as the video. Some datasets specialized in contact-rich manipulation did include force, torque, and tactile data — but it is uncommon to see these modalities bound into a single pipeline from the start on a single humanoid platform operating at industrial scale. That is where AgiBot World 2026 reads differently from earlier datasets.

▲ G2 robot hand sorting strawberries and chocolates in a free-form exploratory teleoperation session. Tactile and force signals are recorded on the same timeline as the RGB cameras | Source: The Robot Report / AgiBot

2.1 Variability Gathered Through Exploratory Teleoperation

Theme 2's data wasn't shot by repeating a fixed demonstration. It was gathered through exploratory teleoperation, a method that deliberately steers operators to interact freely with objects of varied materials, geometries, and mechanical properties. The goal is to leave behind not just successes but incomplete contacts and exceptional outcomes. What gets recorded, as a result, is contact dynamics, material deformation, object response, and multimodal feedback that fuses vision, touch, and force.

2.2 A Release Designed in Five Stages

The dataset is not released all at once. It is split into five themes, each corresponding to a different research direction in embodied intelligence. The first theme is hundreds of hours of imitation-learning data, holding task descriptions, action sequences, atomic skill labels, and error-recovery trajectories. This second theme targets contact-rich interaction, and the third through fifth are slated for sequential release.

· Theme 1 — Imitation learning: hundreds of hours of real-world data, including task descriptions, action sequences, atomic skills, and error-recovery trajectories
· Theme 2 — Rich Interaction (2026-06-03): 100% real-world; drops, collisions, falls, unstable contact, and liquid splashes gathered through exploratory teleoperation
· Themes 3–5 — Unreleased: to be released sequentially, each targeting a different research question

A phased release is more than a distribution schedule. It means the data was sliced and designed by research question rather than thrown out in one block. The data needed for imitation learning and the data needed to learn contact physics differ in grain, even when gathered with the same robot. Separating that difference into themes is itself the message.

2.3 A Digital Twin Placed Beside the Real Measurements

Alongside the real-world data, AgiBot also released simulation data generated in a 1:1 digital-twin environment. This part is open-sourced as the GenieSim project to support sim-to-real research. In effect, two tracks sit side by side within one dataset: filling out quantity with synthetic data and calibrating fidelity with real measurements. The whole release is on Hugging Face under a CC BY-NC-SA 4.0 license. The data was collected on a platform that was a Best Paper finalist at IROS 2025.

▲ AgiBot World's hierarchical annotation pipeline applied to a kitchen scene with 2D bounding-box labels. Structured labeling of real-world data enables alignment with the digital-twin environment (GenieSim) | Source: The Robot Report / AgiBot

3. The Bottleneck Isn't Volume, It's What You Record

The bottleneck in Physical AI data is usually discussed in terms of quantity. More than 3.9 million industrial robots operate worldwide, yet the largest open manipulation datasets, even combined, fall short of two million episodes. The gap between hardware scale and data scale is plainly large. But the question AgiBot World 2026 raises comes before the quantity gap: for the same hour of recording, what you capture and at what fidelity decides the data's value.

Touch and force are modalities that vision cannot substitute for. The resistance, friction, and slip a world model must predict happen at the fingertips, not only within the camera's field of view. So which modality you omit sets the ceiling of the data. Vision-only data, no matter how much you scale it, will never fill in the physics of contact.

The five-stage split reveals another coordinate. Good robot data is not a resource you gather once and finish, but an asset designed and updated to match research directions. The choice to place synthetic and real measurements side by side in one dataset belongs to the same logic. The dataset's structure openly admits the present limit: you can grow quantity with synthetic data, but the fidelity of contact physics is, for now, still underwritten by real measurement.

Editor's Note. This is why Pebblous views data quality not as "how much did you collect" but as "what did you preserve, and at what fidelity." The next competition in robot data is shifting away from a fight to grow the episode count toward a fight over which modalities you design for which research questions. AgiBot World 2026 shows that this coordinate is already being drawn at the frontier of robotics research.

Pebblous Data Communication Team
June 27, 2026