Data Structure
SPIDER uses a structured file organization to handle multiple datasets, robots, and tasks efficiently.
Directory Layout
example_datasets/
├── raw/ # Raw human motion data
│ └── {dataset_name}/
│ ├── {data_id}.pkl
│ └── meshes/
│
└── processed/ # Processed robot trajectories
└── {dataset_name}/
├── dataset_summary.json # High-level metadata
├── assets/ # Shared assets (avoid duplication)
│ ├── objects/
│ └── robots/
├── mano/ # Extracted human hand data
│ └── {embodiment_type}/
│ └── {task}/
│ └── {data_id}/
│ ├── trajectory_keypoint.npz
│ └── info.json
└── {robot_type}/ # Robot-specific data
└── {embodiment_type}/
└── {task}/
└── {data_id}/
├── scene.xml
├── trajectory_kinematic.npz
├── trajectory_mjwp.npz
├── visualization_mjwp.mp4
└── metrics.jsonPath Resolution
SPIDER automatically resolves paths based on task parameters:
from spider.io import get_processed_data_dir
# Resolve processed directory
processed_dir = get_processed_data_dir(
dataset_dir="/path/to/datasets",
dataset_name="oakink",
robot_type="allegro",
embodiment_type="bimanual",
task="pick_spoon_bowl",
data_id=0
)
# Returns: /path/to/datasets/processed/oakink/allegro/bimanual/pick_spoon_bowl/0/File Formats
Raw Data (raw/{dataset_name}/{data_id}.pkl)
Raw human motion data from dataset processing:
{
'qpos_wrist_right': np.ndarray, # [T, 7] position + quaternion
'qpos_finger_right': np.ndarray, # [T, n_joints] joint angles
'qpos_wrist_left': np.ndarray, # [T, 7]
'qpos_finger_left': np.ndarray, # [T, n_joints]
'qpos_obj_right': np.ndarray, # [T, 7] object pose
'qpos_obj_left': np.ndarray, # [T, 7]
'contact': np.ndarray, # [T, ncon] contact indicators
'contact_pos': np.ndarray # [T, ncon, 3] contact positions
}Trajectory Files (.npz)
trajectory_keypoint.npz
Human hand keypoints (from dataset processor):
{
'qpos_wrist_right': [T, 7],
'qpos_finger_right': [T, 15],
'qpos_obj_right': [T, 7],
# ... left hand equivalents
}trajectory_kinematic.npz
Kinematic retargeting result (from IK):
{
'qpos': [T, nq], # Full robot configuration
'qvel': [T, nv], # Velocities
'ctrl': [T, nu], # Control targets
'contact': [T, ncon], # Contact indicators
'contact_pos': [T, ncon, 3] # Contact positions
}trajectory_mjwp.npz
Physics-optimized trajectory (from MJWP):
{
'qpos': [N_steps, ctrl_steps, nq], # Per-step configs
'qvel': [N_steps, ctrl_steps, nv], # Velocities
'ctrl': [N_steps, ctrl_steps, nu], # Control commands
'time': [N_steps, ctrl_steps], # Timestamps
'opt_steps': [N_steps], # Iterations per step
'improvement': [N_steps], # Convergence metric
'rewards': [N_steps, num_samples], # Per-sample rewards
}Scene Files
scene.xml
MuJoCo scene definition:
<mujoco model="scene">
<!-- Solver settings -->
<option timestep="0.01" iterations="20" ...>
<!-- Robot model -->
<include file="../../assets/robots/allegro/bimanual.xml"/>
<!-- Object geometries -->
<body name="object_right" pos="0 0 0">
<freejoint/>
<geom type="mesh" mesh="obj_visual"/>
<geom type="mesh" mesh="obj_collision_0" .../>
...
</body>
<!-- Contact pairs -->
<contact>
<pair geom1="finger_tip" geom2="obj_collision_0"/>
...
</contact>
</mujoco>scene_eq.xml
Scene with equality constraints (for num_dyn > 1):
Includes additional <equality> elements for virtual contact constraints used in annealing.
Metadata Files
task_info.json
Task-level metadata:
{
"dataset_name": "oakink",
"task": "pick_spoon_bowl",
"embodiment_type": "bimanual",
"ref_dt": 0.02,
"object_names": ["spoon", "bowl"],
"num_frames": 250
}info.json
Trial-level metadata:
{
"robot_type": "allegro",
"data_id": 0,
"timestamp": "2025-01-15T10:30:00",
"object_asset_paths": {
"spoon": "../../../../assets/objects/spoon/visual.obj",
"bowl": "../../../../assets/objects/bowl/visual.obj"
}
}metrics.json
Performance metrics:
{
"success_rate": 0.95,
"avg_position_error": 0.012,
"avg_rotation_error": 0.08,
"avg_optimization_time": 0.234,
"total_simulation_steps": 300
}Asset Organization
Object Assets
Shared object meshes to avoid duplication:
assets/objects/{object_name}/
├── visual.obj # High-res visual mesh
└── convex/ # Convex decomposition
├── 0.obj
├── 1.obj
├── 2.obj
└── ...Robot Assets
Robot MJCF files:
assets/robots/{robot_type}/
├── left.xml # Left hand only
├── right.xml # Right hand only
└── bimanual.xml # Both handsLoading Data
In Python
import numpy as np
from spider.io import load_data
from spider.config import Config, process_config
# Create and process config
config = Config(
dataset_name="oakink",
robot_type="allegro",
embodiment_type="bimanual",
task="pick_spoon_bowl",
data_id=0
)
config = process_config(config)
# Load trajectory
qpos, qvel, ctrl, contact, contact_pos = load_data(
config,
config.data_path
)
print(f"Trajectory length: {qpos.shape[0]}")
print(f"DOF: {qpos.shape[1]}")Manual Loading
import numpy as np
# Load trajectory
data = np.load('trajectory_mjwp.npz')
# Access arrays
qpos = data['qpos'] # [T, nq]
qvel = data['qvel'] # [T, nv]
ctrl = data['ctrl'] # [T, nu]
# Reshape if needed (MJWP saves per-step)
if qpos.ndim == 3:
qpos = qpos.reshape(-1, qpos.shape[-1])Data Flow Through Pipeline
graph TD
A[Raw Data<br/>raw/dataset/data.pkl] --> B[Dataset Processor]
B --> C[Keypoints<br/>mano/.../trajectory_keypoint.npz]
C --> D[Preprocessing Pipeline]
D --> E[Object Decomposition<br/>assets/objects/*/convex/*.obj]
D --> F[Scene XML<br/>robot/.../scene.xml]
D --> G[Kinematic Trajectory<br/>robot/.../trajectory_kinematic.npz]
G --> H[Physics Optimization]
H --> I[Optimized Trajectory<br/>robot/.../trajectory_mjwp.npz]
H --> J[Video<br/>robot/.../visualization_mjwp.mp4]
style C fill:#e1f5ff
style G fill:#fff4e1
style I fill:#ffe1e1Configuration Paths
Paths in Config are automatically resolved:
@dataclass
class Config:
dataset_dir: str = "example_datasets" # Base directory
dataset_name: str = "oakink"
robot_type: str = "allegro"
embodiment_type: str = "bimanual"
task: str = "pick_spoon_bowl"
data_id: int = 0
# Auto-set by process_config():
model_path: str = "" # Path to scene.xml
data_path: str = "" # Path to trajectory_kinematic.npz
output_dir: str = "" # Where to save resultsAfter process_config():
config.model_path = "{dataset_dir}/processed/{dataset_name}/{embodiment_type}/{task}/{data_id}/scene.xml"
config.data_path = "{...}/{robot_type}/{embodiment_type}/{task}/{data_id}/trajectory_kinematic.npz"
config.output_dir = "{...}/{robot_type}/{embodiment_type}/{task}/{data_id}/"Best Practices
Organization
- Keep raw data separate: Never modify
raw/directory - Share assets: Use
assets/for common objects/robots - One trial per directory: Each
{data_id}is self-contained - Relative paths in JSON: Use relative paths for portability
Naming Conventions
- Tasks: Use descriptive names (e.g.,
pick_cup,open_jar) - Objects: Match names across datasets
- Robots: Use official model names (e.g.,
allegro,inspire)
Cleanup
Remove intermediate files after successful optimization:
# Keep only final results
rm trajectory_kinematic.npz
rm scene.xml # If not needed for replayTroubleshooting
Path Not Found
Check that all path components match:
# Verify structure
ls example_datasets/processed/oakink/allegro/bimanual/pick_spoon_bowl/0/
# Check config values
python -c "from spider.config import Config, process_config; \
c = process_config(Config()); print(c.output_dir)"Missing Assets
Ensure assets are in shared location:
# Check object assets
ls example_datasets/processed/oakink/assets/objects/
# Check robot assets
ls example_datasets/processed/oakink/assets/robots/Corrupted NPZ Files
Validate NPZ integrity:
import numpy as np
try:
data = np.load('trajectory.npz')
print("Keys:", list(data.keys()))
print("Shapes:", {k: v.shape for k, v in data.items()})
except Exception as e:
print(f"Error loading: {e}")