Add New Dataset
This guide explains how to add a new dataset to SPIDER for retargeting human motion data to robots.
Overview
Adding a new dataset involves:
- Preparing raw data in the standardized format
- Creating a dataset processor script
- Extracting hand kinematics and object poses
- Converting meshes to MuJoCo-compatible format
- Testing the integration
Dataset Requirements
Your dataset should include:
- Hand motion data: Either MANO parameters or joint angles
- Object information: 3D meshes and 6D poses (position + orientation)
- Temporal alignment: Synchronized hand and object trajectories
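You can check the temporal-alignment requirement before processing by confirming that every per-frame array has the same number of frames T. This is a minimal sketch. It assumes a raw sample has already been loaded into a dict that uses the key names from the raw data format in Step 1. `check_temporal_alignment` and `PER_FRAME_KEYS` are hypothetical names. `mano_shape` is left out because it may be a single [10] vector.

```python
import numpy as np

# Hypothetical list of keys assumed to hold per-frame arrays with a leading
# time dimension T (names follow this guide's raw data format).
PER_FRAME_KEYS = (
    "mano_pose",
    "qpos_finger_left",
    "qpos_finger_right",
    "qpos_wrist_left",
    "qpos_wrist_right",
    "object_pose_left",
    "object_pose_right",
)


def check_temporal_alignment(sample: dict) -> int:
    """Return the shared frame count T, or raise if per-frame arrays disagree."""
    lengths = {
        key: np.asarray(sample[key]).shape[0]
        for key in PER_FRAME_KEYS
        if key in sample
    }
    if not lengths:
        raise ValueError("sample contains no per-frame arrays")
    unique_lengths = set(lengths.values())
    if len(unique_lengths) != 1:
        raise ValueError(f"hand and object trajectories are not aligned: {lengths}")
    return unique_lengths.pop()


# Example: a wrist trajectory and an object trajectory with 120 frames each.
sample = {
    "qpos_wrist_left": np.zeros((120, 7)),
    "object_pose_left": np.zeros((120, 7)),
}
print(check_temporal_alignment(sample))  # 120
```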
Supported formats:
- MANO parameters (shape, pose)
- Direct joint angles
- Wrist poses (position + rotation)
- Object 6D poses
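Under the usual MANO convention without PCA compression, the [T, 48] pose used in the raw data format consists of 3 global-orientation values followed by 15 finger joints × 3 axis-angle values. The [T, 7] wrist and object poses are a 3D position followed by a quaternion. The sketch below splits both layouts. The helper names are hypothetical. It keeps the quaternion component order (w-first or w-last) unchanged because this guide does not specify it, so check your source dataset.

```python
import numpy as np


def split_mano_pose(mano_pose):
    """Split a [T, 48] MANO pose into global orientation [T, 3] and
    finger joint rotations [T, 15, 3], all in axis-angle form."""
    pose = np.asarray(mano_pose, dtype=np.float64)
    if pose.ndim != 2 or pose.shape[1] != 48:
        raise ValueError(f"expected shape [T, 48], got {pose.shape}")
    global_orient = pose[:, :3]
    finger_rotations = pose[:, 3:].reshape(-1, 15, 3)
    return global_orient, finger_rotations


def split_pose7(pose7):
    """Split a [T, 7] pose (xyz + quat) into positions [T, 3] and unit
    quaternions [T, 4]; the quaternion component order is kept as given."""
    pose = np.asarray(pose7, dtype=np.float64)
    if pose.ndim != 2 or pose.shape[1] != 7:
        raise ValueError(f"expected shape [T, 7], got {pose.shape}")
    position = pose[:, :3]
    quat = pose[:, 3:]
    norm = np.linalg.norm(quat, axis=1, keepdims=True)
    if np.any(norm < 1e-8):
        raise ValueError("found a zero-length quaternion")
    return position, quat / norm


# Example: two frames of dummy MANO data.
orient, fingers = split_mano_pose(np.zeros((2, 48)))
print(orient.shape, fingers.shape)  # (2, 3) (2, 15, 3)
```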
Data File Structure
SPIDER uses a standardized directory structure:
```
example_datasets/
├── raw/                              # Raw data from original dataset
│   └── my_dataset/
│       ├── task_name_01.pkl          # Raw motion capture data
│       ├── task_name_02.pkl
│       └── meshes/                   # Object meshes
│           ├── cup.obj
│           └── spoon.obj
│
└── processed/                        # Processed data for SPIDER
    └── my_dataset/
        ├── dataset_summary.json      # Dataset metadata
        ├── assets/                   # Shared assets
        │   ├── objects/              # Object meshes
        │   │   └── cup/
        │   │       ├── convex/       # Convex decomposition
        │   │       │   ├── 0.obj
        │   │       │   ├── 1.obj
        │   │       │   └── ...
        │   │       └── visual.obj
        │   └── robots/               # Robot models
        │       └── allegro/
        │           ├── left.xml
        │           └── right.xml
        └── mano/                     # Processed MANO data
            └── bimanual/
                └── task_name/
                    └── 0/
                        ├── trajectory_keypoint.npz
                        └── info.json
```
Step 1: Prepare Raw Data
Place your raw data in the appropriate directory:
```bash
mkdir -p example_datasets/raw/my_dataset
# Copy your raw data files here
```
Raw Data Format
Your raw data file (.pkl or .npz) should contain the following keys:
```python
{
    # Hand data (one of the following):
    'mano_pose': [...],          # [T, 48] MANO pose parameters
    'mano_shape': [...],         # [10] or [T, 10] MANO shape parameters
    # OR
    'qpos_finger_left': [...],   # [T, n_joints] Left finger joints
    'qpos_finger_right': [...],  # [T, n_joints] Right finger joints
    'qpos_wrist_left': [...],    # [T, 7] Left wrist pose (xyz + quat)
    'qpos_wrist_right': [...],   # [T, 7] Right wrist pose (xyz + quat)

    # Object data:
    'object_pose_left': [...],   # [T, 7] Pose (xyz + quat) of the object handled by the left hand
    'object_pose_right': [...],  # [T, 7] Pose (xyz + quat) of the object handled by the right hand
    'object_name_left': 'cup',   # Object identifier
    'object_name_right': 'spoon',

    # Metadata:
    'fps': 30.0,                 # Frame rate
    'task_name': 'pick_cup',     # Task identifier
}
```
Step 2: Create Dataset Processor
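Create a processor script at spider/process_datasets/my_dataset.py. The script should load each raw file, extract the hand kinematics and object poses, and write the results into the processed layout shown above (trajectory_keypoint.npz and info.json under mano/<embodiment>/<task>/<id>/).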
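Below is a minimal sketch of such a processor. It is not SPIDER's implementation. It assumes the direct joint-angle variant of the raw format, assumes each raw file is stored as raw/my_dataset/<task>.pkl, and stacks the left and right hands on a new axis. `stack_hands`, `process_sample`, the `qpos_object` key, the info.json fields, and the `--dataset-dir` flag are all hypothetical. The other flags mirror the command in Step 4.

```python
# Hypothetical sketch of spider/process_datasets/my_dataset.py.
# Assumptions (NOT taken from SPIDER's own processors):
#   * raw/my_dataset/<task>.pkl holds a dict with the direct joint-angle keys
#     from "Raw Data Format" (the MANO route is not handled here);
#   * left/right arrays are stacked on a new hand axis (0 = left, 1 = right);
#   * "qpos_object" and the info.json fields are illustrative names.
import argparse
import json
import pickle
from pathlib import Path

import numpy as np


def stack_hands(raw: dict, prefix: str) -> np.ndarray:
    """Stack raw[prefix + '_left'] and raw[prefix + '_right'] into [T, 2, D]."""
    left = np.asarray(raw[f"{prefix}_left"], dtype=np.float32)
    right = np.asarray(raw[f"{prefix}_right"], dtype=np.float32)
    return np.stack([left, right], axis=1)


def process_sample(raw_file: Path, out_dir: Path) -> Path:
    """Convert one raw recording into trajectory_keypoint.npz and info.json."""
    with open(raw_file, "rb") as f:
        raw = pickle.load(f)

    out_dir.mkdir(parents=True, exist_ok=True)
    np.savez(
        out_dir / "trajectory_keypoint.npz",
        qpos_finger=stack_hands(raw, "qpos_finger"),
        qpos_wrist=stack_hands(raw, "qpos_wrist"),
        qpos_object=stack_hands(raw, "object_pose"),
    )
    info = {
        "task_name": raw["task_name"],
        "fps": float(raw["fps"]),
        "object_names": [raw["object_name_left"], raw["object_name_right"]],
    }
    (out_dir / "info.json").write_text(json.dumps(info, indent=2))
    return out_dir


def main(argv=None) -> Path:
    """Parse Step 4-style flags and process one sample."""
    parser = argparse.ArgumentParser(description="Process my_dataset (sketch).")
    parser.add_argument("--task", required=True)
    parser.add_argument("--embodiment-type", default="bimanual")
    parser.add_argument("--data-id", type=int, default=0)
    # Hypothetical extra flag so the root directory can be changed.
    parser.add_argument("--dataset-dir", default="example_datasets")
    args = parser.parse_args(argv)

    root = Path(args.dataset_dir)
    # Assumed raw naming; the tree above uses numbered files, so adapt this.
    raw_file = root / "raw" / "my_dataset" / f"{args.task}.pkl"
    out_dir = (
        root / "processed" / "my_dataset" / "mano"
        / args.embodiment_type / args.task / str(args.data_id)
    )
    return process_sample(raw_file, out_dir)
```

A real script would end with the usual `if __name__ == "__main__": main()` guard so that `uv run` can invoke it directly. Keeping `main(argv)` parameterized also lets you call it from tests with an explicit argument list.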
Step 3: Handle Object Meshes
Convert Meshes to OBJ Format
Ensure meshes are in MuJoCo-compatible OBJ format:
```python
import trimesh


def convert_mesh_to_obj(input_mesh: str, output_mesh: str):
    """
    Convert mesh to MuJoCo-compatible OBJ format.
    Supports various input formats (STL, PLY, GLB, etc.)
    """
    # Load mesh
    mesh = trimesh.load(input_mesh)
    # Ensure single mesh (merge if needed)
    if isinstance(mesh, trimesh.Scene):
        mesh = mesh.dump(concatenate=True)
    # Export as OBJ
    mesh.export(output_mesh)
    print(f"Converted {input_mesh} to {output_mesh}")
```
Place Meshes in Assets
Place each mesh in the assets folder. For example, for the cup:
```bash
mkdir -p example_datasets/processed/my_dataset/assets/objects/cup
cp example_datasets/raw/my_dataset/meshes/cup.obj example_datasets/processed/my_dataset/assets/objects/cup/visual.obj
```
Step 4: Test Your Dataset Processor
Run Processor
```bash
# Process a single sample
uv run spider/process_datasets/my_dataset.py \
    --task=pick_cup \
    --embodiment-type=bimanual \
    --data-id=0
```
Verify Output
Check that the output files exist and have the expected format:
```python
import numpy as np

# Load processed data
data = np.load('example_datasets/processed/my_dataset/mano/bimanual/pick_cup/0/trajectory_keypoint.npz')
print("Keys:", list(data.keys()))
print("qpos_finger shape:", data['qpos_finger'].shape)
print("qpos_wrist shape:", data['qpos_wrist'].shape)
```
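A stricter check than printing shapes can catch common processing bugs: a missing info.json, different frame counts for the finger and wrist trajectories, or NaN values. The helper below is one possible sketch. `verify_processed_sample` is a hypothetical name, and only the file names and the qpos_finger/qpos_wrist keys come from this guide. It assumes both arrays share a leading time dimension and does not assume anything else about their shapes.

```python
import json
from pathlib import Path

import numpy as np


def verify_processed_sample(sample_dir) -> int:
    """Sanity-check one processed sample directory and return its frame count T."""
    sample_dir = Path(sample_dir)
    npz_path = sample_dir / "trajectory_keypoint.npz"
    info_path = sample_dir / "info.json"
    for path in (npz_path, info_path):
        if not path.is_file():
            raise FileNotFoundError(f"missing {path}")

    json.loads(info_path.read_text())  # raises if info.json is not valid JSON

    with np.load(npz_path) as data:
        for key in ("qpos_finger", "qpos_wrist"):
            if key not in data.files:
                raise KeyError(f"{npz_path.name} has no {key!r} array")
        # Assumption: both trajectories are indexed by frame along axis 0.
        n_frames = data["qpos_finger"].shape[0]
        if data["qpos_wrist"].shape[0] != n_frames:
            raise ValueError("qpos_finger and qpos_wrist have different frame counts")
        for key in data.files:
            arr = data[key]
            # Only floating-point arrays can hold NaN or inf.
            if np.issubdtype(arr.dtype, np.floating) and not np.isfinite(arr).all():
                raise ValueError(f"{key} contains NaN or inf values")
    return n_frames


# Usage with the sample processed in Step 4:
# verify_processed_sample("example_datasets/processed/my_dataset/mano/bimanual/pick_cup/0")
```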