RoboCurate: Harnessing Diversity with Action-Verified Neural Trajectory for Robot Learning

Seungku Kim*1, Suhyeok Jang*1, Byungjun Yoon1, Dongyoung Kim1,2, John Won1, Jinwoo Shin1,2
*Equal contribution
1KAIST, 2RLWRLD
RoboCurate overview figure

Abstract

We present RoboCurate, a novel neural trajectory generation framework that increases data diversity via controllable video generation and filters out low-quality samples by evaluating motion similarity between the generated video and its simulator replay. Specifically, RoboCurate replays the predicted actions in a simulator and scores action quality by the consistency of motion between the simulator rollout and the generated video. In addition, we unlock observation diversity beyond the available dataset via image-to-image editing and apply action-preserving video-to-video transfer to further augment appearance.

Method

1. Generation Stage

We expand observation diversity with two components: (1) image-to-image (I2I) editing on the initial frame for scene-level variation, and (2) video-to-video (V2V) transfer for appearance diversity while preserving the original motion.

Generation stage overview
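As a rough sketch of the generation-stage data flow, the snippet below chains the two components: every model call is a placeholder stand-in for the actual I2I editor, controllable video generator, and V2V transfer model, and the function names, prompts, and array shapes are our assumptions, not the paper's.

```python
import numpy as np

# Hypothetical stand-ins for the generative models; in practice these would be
# an image-editing model, a controllable video generator, and a V2V model.
def i2i_edit(frame, prompt):
    # (1) Scene-level edit of the initial frame (placeholder: add noise).
    rng = np.random.default_rng(0)
    return np.clip(frame + rng.normal(0, 0.05, frame.shape), 0.0, 1.0)

def generate_video(first_frame, num_frames=16):
    # Video generation conditioned on the edited first frame
    # (placeholder: repeat the frame).
    return np.stack([first_frame] * num_frames)

def v2v_transfer(video, style_prompt):
    # (2) Appearance change that keeps the motion (placeholder: brightness shift).
    return np.clip(video * 0.9 + 0.05, 0.0, 1.0)

# Data flow of the generation stage:
frame = np.zeros((64, 64, 3))                       # initial observation
edited = i2i_edit(frame, "add a mug to the table")  # scene-level variation
video = generate_video(edited)                      # neural trajectory
video = v2v_transfer(video, "wooden tabletop")      # appearance variation
print(video.shape)  # (16, 64, 64, 3)
```

The point of the sketch is the ordering: the I2I edit happens once, on the first frame, before generation, while the V2V transfer is applied to the whole clip so the motion (and hence the actions) is untouched.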

2. Filtering Stage

We filter out suboptimal synthetic trajectories whose predicted actions are inaccurate. To do so, we replay the predicted actions in a simulator and assess action quality by measuring the motion consistency between the simulator rollout and the generated video. Motion similarity is computed by an attentive probe trained on top of a frozen video encoder, using positive and negative samples automatically generated from real data.

Filtering stage overview

Accurate action: Simulator rollout ≈ Synthetic video

Inaccurate action: Simulator rollout ≠ Synthetic video
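A minimal NumPy sketch of this consistency check, assuming per-frame features from a frozen encoder are already given: the attention-pooling probe, feature dimensions, and the thresholding rule are our illustrative choices, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class AttentiveProbe:
    """A learnable query attends over frozen per-frame features and pools
    them into one clip embedding (hypothetical minimal variant)."""
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.query = rng.normal(size=dim) / np.sqrt(dim)

    def pool(self, feats):                  # feats: (T, dim) frozen features
        attn = softmax(feats @ self.query)  # (T,) attention weights
        return attn @ feats                 # (dim,) pooled clip embedding

def motion_similarity(probe, video_feats, rollout_feats):
    a, b = probe.pool(video_feats), probe.pool(rollout_feats)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Keep a synthetic trajectory only if replaying its predicted actions in the
# simulator reproduces the generated video's motion.
rng = np.random.default_rng(1)
probe = AttentiveProbe(dim=32)
video = rng.normal(size=(16, 32))                        # generated-video features
good_rollout = video + 0.01 * rng.normal(size=(16, 32))  # consistent motion
bad_rollout = rng.normal(size=(16, 32))                  # inconsistent motion
sim_good = motion_similarity(probe, video, good_rollout)
sim_bad = motion_similarity(probe, video, bad_rollout)
print(f"consistent: {sim_good:.3f}, inconsistent: {sim_bad:.3f}")
```

A trained probe would replace the random query here; the filtering decision is then a simple threshold on the similarity score.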

Examples of positive and negative pairs for attentive probe training

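The pair construction above can be imitated with a toy sketch. It is purely illustrative: the pairing heuristics (self-pair as positive; cross-episode and temporally scrambled clips as negatives) and the feature shapes are our assumptions, not the paper's exact recipe.

```python
import numpy as np

def make_probe_pairs(episodes, rng):
    """Build (clip_a, clip_b, label) training pairs from real trajectories.

    Positive (label 1): a clip paired with itself, mimicking a rollout that
    replays the clip's own logged actions. Negatives (label 0): a clip paired
    with a different episode, or with a temporally shuffled copy of itself,
    mimicking a rollout from inaccurate actions.
    """
    pairs = []
    for i, ep in enumerate(episodes):
        pairs.append((ep, ep, 1))                   # positive: matched motion
        other = episodes[(i + 1) % len(episodes)]
        pairs.append((ep, other, 0))                # negative: different episode
        shuffled = ep[rng.permutation(len(ep))]
        pairs.append((ep, shuffled, 0))             # negative: scrambled motion
    return pairs

rng = np.random.default_rng(0)
episodes = [rng.normal(size=(16, 32)) for _ in range(4)]  # per-frame features
pairs = make_probe_pairs(episodes, rng)
print(len(pairs))  # 12
```

Because all pairs come from real data, no manual labeling is needed: the labels follow directly from how each pair was constructed.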

Results

ALLEX Humanoid Robot

In-distribution task — Pick and Place Can

Real: Fail
Real + DreamGen: Success
Real + RoboCurate (Ours): Success

Out-of-distribution task (Novel Object) — Pick and Place Cup

Real: Fail
Real + DreamGen: Partial
Real + RoboCurate (Ours): Success

Out-of-distribution task (Novel Behavior) — Pour Can

Real: Fail
Real + DreamGen: Fail
Real + RoboCurate (Ours): Success

GR-1 Tabletop

We report the average success rate (%) over 50 trials across 24 tasks (18 rearrangement tasks and 6 articulated-object tasks).

GR-1 Tabletop results table

DexMimicGen

We report the average success rate (%) over 50 trials across 6 tasks (3 GR-1 Humanoid and 3 Bimanual Panda Arms with Dexterous Hands), trained with 100 demonstrations per task.

DexMimicGen results table

We observe that our visual augmentation pipeline (i.e., I2I editing and V2V transfer) substantially improves downstream task performance. Moreover, our action-level filtering is effective for curating neural trajectories and further enhances VLA performance.