Data Structure

SPIDER uses a structured file organization to handle multiple datasets, robots, and tasks efficiently.

Directory Layout

example_datasets/
├── raw/                          # Raw human motion data
│   └── {dataset_name}/
│       ├── {data_id}.pkl
│       └── meshes/
│
└── processed/                    # Processed robot trajectories
    └── {dataset_name}/
        ├── dataset_summary.json  # High-level metadata
        ├── assets/               # Shared assets (avoid duplication)
        │   ├── objects/
        │   └── robots/
        ├── mano/                 # Extracted human hand data
        │   └── {embodiment_type}/
        │       └── {task}/
        │           └── {data_id}/
        │               ├── trajectory_keypoint.npz
        │               └── info.json
        └── {robot_type}/         # Robot-specific data
            └── {embodiment_type}/
                └── {task}/
                    └── {data_id}/
                        ├── scene.xml
                        ├── trajectory_kinematic.npz
                        ├── trajectory_mjwp.npz
                        ├── visualization_mjwp.mp4
                        └── metrics.json

Path Resolution

SPIDER automatically resolves paths based on task parameters:

python

from spider.io import get_processed_data_dir

# Resolve processed directory
processed_dir = get_processed_data_dir(
    dataset_dir="/path/to/datasets",
    dataset_name="oakink",
    robot_type="allegro",
    embodiment_type="bimanual",
    task="pick_spoon_bowl",
    data_id=0
)
# Returns: /path/to/datasets/processed/oakink/allegro/bimanual/pick_spoon_bowl/0/

File Formats

Raw Data (`raw/{dataset_name}/{data_id}.pkl`)

Raw human motion data from dataset processing:

python

{
    'qpos_wrist_right': np.ndarray,    # [T, 7] position + quaternion
    'qpos_finger_right': np.ndarray,   # [T, n_joints] joint angles
    'qpos_wrist_left': np.ndarray,     # [T, 7]
    'qpos_finger_left': np.ndarray,    # [T, n_joints]
    'qpos_obj_right': np.ndarray,      # [T, 7] object pose
    'qpos_obj_left': np.ndarray,       # [T, 7]
    'contact': np.ndarray,             # [T, ncon] contact indicators
    'contact_pos': np.ndarray          # [T, ncon, 3] contact positions
}

Trajectory Files (`.npz`)

`trajectory_keypoint.npz`

Human hand keypoints (from dataset processor):

python

{
    'qpos_wrist_right': [T, 7],
    'qpos_finger_right': [T, 15],
    'qpos_obj_right': [T, 7],
    # ... left hand equivalents
}

`trajectory_kinematic.npz`

Kinematic retargeting result (from IK):

python

{
    'qpos': [T, nq],      # Full robot configuration
    'qvel': [T, nv],      # Velocities
    'ctrl': [T, nu],      # Control targets
    'contact': [T, ncon], # Contact indicators
    'contact_pos': [T, ncon, 3]  # Contact positions
}

`trajectory_mjwp.npz`

Physics-optimized trajectory (from MJWP):

python

{
    'qpos': [N_steps, ctrl_steps, nq],  # Per-step configs
    'qvel': [N_steps, ctrl_steps, nv],  # Velocities
    'ctrl': [N_steps, ctrl_steps, nu],  # Control commands
    'time': [N_steps, ctrl_steps],      # Timestamps
    'opt_steps': [N_steps],             # Iterations per step
    'improvement': [N_steps],           # Convergence metric
    'rewards': [N_steps, num_samples],  # Per-sample rewards
}

Scene Files

`scene.xml`

MuJoCo scene definition:

xml

<mujoco model="scene">
    <!-- Solver settings -->
    <option timestep="0.01" iterations="20" ...>

    <!-- Robot model -->
    <include file="../../assets/robots/allegro/bimanual.xml"/>

    <!-- Object geometries -->
    <body name="object_right" pos="0 0 0">
        <freejoint/>
        <geom type="mesh" mesh="obj_visual"/>
        <geom type="mesh" mesh="obj_collision_0" .../>
        ...
    </body>

    <!-- Contact pairs -->
    <contact>
        <pair geom1="finger_tip" geom2="obj_collision_0"/>
        ...
    </contact>
</mujoco>

`scene_eq.xml`

Scene with equality constraints (for num_dyn > 1):

Includes additional <equality> elements for virtual contact constraints used in annealing.

Metadata Files

`task_info.json`

Task-level metadata:

json

{
  "dataset_name": "oakink",
  "task": "pick_spoon_bowl",
  "embodiment_type": "bimanual",
  "ref_dt": 0.02,
  "object_names": ["spoon", "bowl"],
  "num_frames": 250
}

`info.json`

Trial-level metadata:

json

{
  "robot_type": "allegro",
  "data_id": 0,
  "timestamp": "2025-01-15T10:30:00",
  "object_asset_paths": {
    "spoon": "../../../../assets/objects/spoon/visual.obj",
    "bowl": "../../../../assets/objects/bowl/visual.obj"
  }
}

`metrics.json`

Performance metrics:

json

{
  "success_rate": 0.95,
  "avg_position_error": 0.012,
  "avg_rotation_error": 0.08,
  "avg_optimization_time": 0.234,
  "total_simulation_steps": 300
}

Asset Organization

Object Assets

Shared object meshes to avoid duplication:

assets/objects/{object_name}/
├── visual.obj           # High-res visual mesh
└── convex/             # Convex decomposition
    ├── 0.obj
    ├── 1.obj
    ├── 2.obj
    └── ...

Robot Assets

Robot MJCF files:

assets/robots/{robot_type}/
├── left.xml          # Left hand only
├── right.xml         # Right hand only
└── bimanual.xml      # Both hands

Loading Data

In Python

python

import numpy as np
from spider.io import load_data
from spider.config import Config, process_config

# Create and process config
config = Config(
    dataset_name="oakink",
    robot_type="allegro",
    embodiment_type="bimanual",
    task="pick_spoon_bowl",
    data_id=0
)
config = process_config(config)

# Load trajectory
qpos, qvel, ctrl, contact, contact_pos = load_data(
    config,
    config.data_path
)

print(f"Trajectory length: {qpos.shape[0]}")
print(f"DOF: {qpos.shape[1]}")

Manual Loading

python

import numpy as np

# Load trajectory
data = np.load('trajectory_mjwp.npz')

# Access arrays
qpos = data['qpos']  # [T, nq]
qvel = data['qvel']  # [T, nv]
ctrl = data['ctrl']  # [T, nu]

# Reshape if needed (MJWP saves per-step)
if qpos.ndim == 3:
    qpos = qpos.reshape(-1, qpos.shape[-1])

Data Flow Through Pipeline

mermaid

graph TD
    A[Raw Data<br/>raw/dataset/data.pkl] --> B[Dataset Processor]
    B --> C[Keypoints<br/>mano/.../trajectory_keypoint.npz]
    C --> D[Preprocessing Pipeline]
    D --> E[Object Decomposition<br/>assets/objects/*/convex/*.obj]
    D --> F[Scene XML<br/>robot/.../scene.xml]
    D --> G[Kinematic Trajectory<br/>robot/.../trajectory_kinematic.npz]
    G --> H[Physics Optimization]
    H --> I[Optimized Trajectory<br/>robot/.../trajectory_mjwp.npz]
    H --> J[Video<br/>robot/.../visualization_mjwp.mp4]

    style C fill:#e1f5ff
    style G fill:#fff4e1
    style I fill:#ffe1e1

Configuration Paths

Paths in Config are automatically resolved:

python

@dataclass
class Config:
    dataset_dir: str = "example_datasets"  # Base directory
    dataset_name: str = "oakink"
    robot_type: str = "allegro"
    embodiment_type: str = "bimanual"
    task: str = "pick_spoon_bowl"
    data_id: int = 0

    # Auto-set by process_config():
    model_path: str = ""   # Path to scene.xml
    data_path: str = ""    # Path to trajectory_kinematic.npz
    output_dir: str = ""   # Where to save results

After process_config():

python

config.model_path = "{dataset_dir}/processed/{dataset_name}/{embodiment_type}/{task}/{data_id}/scene.xml"
config.data_path = "{...}/{robot_type}/{embodiment_type}/{task}/{data_id}/trajectory_kinematic.npz"
config.output_dir = "{...}/{robot_type}/{embodiment_type}/{task}/{data_id}/"

Best Practices

Organization

Keep raw data separate: Never modify raw/ directory
Share assets: Use assets/ for common objects/robots
One trial per directory: Each {data_id} is self-contained
Relative paths in JSON: Use relative paths for portability

Naming Conventions

Tasks: Use descriptive names (e.g., pick_cup, open_jar)
Objects: Match names across datasets
Robots: Use official model names (e.g., allegro, inspire)

Cleanup

Remove intermediate files after successful optimization:

bash

# Keep only final results
rm trajectory_kinematic.npz
rm scene.xml  # If not needed for replay

Troubleshooting

Path Not Found

Check that all path components match:

bash

# Verify structure
ls example_datasets/processed/oakink/allegro/bimanual/pick_spoon_bowl/0/

# Check config values
python -c "from spider.config import Config, process_config; \
  c = process_config(Config()); print(c.output_dir)"

Missing Assets

Ensure assets are in shared location:

bash

# Check object assets
ls example_datasets/processed/oakink/assets/objects/

# Check robot assets
ls example_datasets/processed/oakink/assets/robots/

Corrupted NPZ Files

Validate NPZ integrity:

python

import numpy as np

try:
    data = np.load('trajectory.npz')
    print("Keys:", list(data.keys()))
    print("Shapes:", {k: v.shape for k, v in data.items()})
except Exception as e:
    print(f"Error loading: {e}")

Data Structure ​

Directory Layout ​

Path Resolution ​

File Formats ​

Raw Data (raw/{dataset_name}/{data_id}.pkl) ​

Trajectory Files (.npz) ​

trajectory_keypoint.npz ​

trajectory_kinematic.npz ​

trajectory_mjwp.npz ​

Scene Files ​

scene.xml ​

scene_eq.xml ​

Metadata Files ​

task_info.json ​

info.json ​

metrics.json ​

Asset Organization ​

Object Assets ​

Robot Assets ​

Loading Data ​

In Python ​

Manual Loading ​

Data Flow Through Pipeline ​

Configuration Paths ​

Best Practices ​

Organization ​

Naming Conventions ​

Cleanup ​

Troubleshooting ​

Path Not Found ​

Missing Assets ​

Corrupted NPZ Files ​

Data Structure

Directory Layout

Path Resolution

File Formats

Raw Data (`raw/{dataset_name}/{data_id}.pkl`)

Trajectory Files (`.npz`)

`trajectory_keypoint.npz`

`trajectory_kinematic.npz`

`trajectory_mjwp.npz`

Scene Files

`scene.xml`

`scene_eq.xml`

Metadata Files

`task_info.json`

`info.json`

`metrics.json`

Asset Organization

Object Assets

Robot Assets

Loading Data

In Python

Manual Loading

Data Flow Through Pipeline

Configuration Paths

Best Practices

Organization

Naming Conventions

Cleanup

Troubleshooting

Path Not Found

Missing Assets

Corrupted NPZ Files