Policy Evaluation Usage

Overview

This guide demonstrates how to evaluate a trained policy on LW-BenchHub tasks. The evaluation process involves two steps:

  1. Start the environment server (env_server.py) - Creates a remote environment service that waits for configuration
  2. Run policy evaluation (eval_policy.py) - Connects to the server, sends environment configuration via attach(), and evaluates the policy

LW-BenchHub's Advanced Distributed Architecture

Decoupled Policy-Environment Design

LW-BenchHub uses a distributed architecture that separates the policy inference process from the simulation environment. This design brings several practical advantages:

🚀 Zero-Dependency Isolation

The policy and environment run in completely independent processes with isolated Python environments. This architecture eliminates the notorious "dependency hell" problem:

  • Policy side can use any deep learning framework with specific versions without conflicts
  • Environment side runs Isaac Lab with its required dependencies independently
  • Rapid iteration: Update policy models without restarting the heavy simulation environment

⚡ High-Performance Zero-Copy Communication

The framework implements an optimized inter-process communication (IPC) protocol with shared memory for data transfer:

  • Seamless remote environment access - clients interact with remote environments as if they were local, with transparent API calls
  • Zero-copy data sharing via shared memory regions - large observation data (multi-camera RGB-D streams) are transferred without serialization
  • Sub-millisecond latency for observation-action loops, enabling real-time policy evaluation with negligible overhead
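The zero-copy idea can be illustrated with Python's standard library. This is a hedged sketch, not LW-BenchHub's actual implementation: a "server" process publishes a large observation buffer through a named shared-memory region, and a "client" attaches to the same region by name and reads it without any serialization.

```python
# Hedged sketch (not LW-BenchHub's actual code): sharing a large observation
# buffer between processes via shared memory, with no pickle/JSON round-trip.
from multiprocessing import shared_memory

# "Server" side: allocate a shared region and write an observation into it.
obs = bytes(range(256)) * 4096          # stand-in for a camera frame (~1 MB)
shm = shared_memory.SharedMemory(create=True, size=len(obs))
shm.buf[:len(obs)] = obs

# "Client" side: attach to the same region by name. client.buf is a
# zero-copy memoryview onto the shared pages.
client = shared_memory.SharedMemory(name=shm.name)
roundtrip = bytes(client.buf[:len(obs)])

# Cleanup: close both handles, then unlink the region.
client.close()
shm.close()
shm.unlink()
```

In LW-BenchHub's IPC mode, the same principle lets multi-camera RGB-D observations cross the process boundary at memory-bandwidth speed rather than serialization speed.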

🔄 Flexible Deployment Modes

The distributed design supports multiple deployment paradigms:

┌───────────────────────────────────────────────────────────┐
│                 Local IPC Mode (Default)                  │
│ ┌──────────────┐    Shared Memory    ┌──────────────────┐ │
│ │   Policy     │ ←─────────────────→ │   Environment    │ │
│ │  (PyTorch)   │    Zero-Copy IPC    │   (Isaac Lab)    │ │
│ └──────────────┘                     └──────────────────┘ │
│            Same Machine - Microsecond Latency             │
└───────────────────────────────────────────────────────────┘

┌───────────────────────────────────────────────────────────┐
│                    Remote RESTful Mode                    │
│ ┌──────────────┐    Network/HTTP     ┌──────────────────┐ │
│ │   Policy     │ ←─────────────────→ │   Environment    │ │
│ │  (CPU/Edge)  │     Compression     │  (GPU Cluster)   │ │
│ └──────────────┘                     └──────────────────┘ │
│             Cross-Machine - Network Resilient             │
└───────────────────────────────────────────────────────────┘

This architecture makes LW-BenchHub well suited for:

  • Research labs with heterogeneous computing environments
  • Production robotics systems requiring high reliability
  • Large-scale policy benchmarking and ablation studies
  • Real-robot deployment where policy runs on robot hardware while simulation serves as digital twin

Prerequisites

  • LW-BenchHub installed
  • Trained policy checkpoint
  • Policy configuration file (includes environment configuration)

Step 1: Start Environment Server

The environment server creates a remote service that waits for configuration from the policy client. Unlike the previous architecture, no task configuration is needed at server startup.

Basic Usage

cd lw_benchhub
python lw_benchhub/scripts/env_server.py

Server Parameters

# Basic usage with default IPC protocol
python lw_benchhub/scripts/env_server.py

# Custom IPC host and port
python lw_benchhub/scripts/env_server.py --ipc_host 127.0.0.1 --ipc_port 50000

# Use RESTful protocol for remote access
python lw_benchhub/scripts/env_server.py --remote_protocol restful --restful_host 0.0.0.0 --restful_port 8000

# Enable cameras (required for visual observations)
python lw_benchhub/scripts/env_server.py --enable_camera

# Run without GUI viewer
python lw_benchhub/scripts/env_server.py --headless

Available Server Arguments

Argument            Default      Description
--remote_protocol   ipc          Communication protocol (ipc or restful)
--ipc_host          127.0.0.1    IPC host address
--ipc_port          50000        IPC port number
--ipc_authkey       lightwheel   IPC authentication key
--restful_host      0.0.0.0      RESTful server host
--restful_port      8000         RESTful server port
--headless          true         Run without GUI viewer (default is headless)
--enable_camera     false        Enable camera rendering for visual observations
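As a hedged illustration of how these documented flags could be parsed, here is an argparse sketch whose defaults mirror the table above; the real env_server.py may define its arguments differently.

```python
# Hedged sketch: parsing env_server.py's documented flags with argparse.
# Defaults mirror the argument table; the actual script may differ.
import argparse

p = argparse.ArgumentParser(description="LW-BenchHub environment server (sketch)")
p.add_argument("--remote_protocol", choices=["ipc", "restful"], default="ipc")
p.add_argument("--ipc_host", default="127.0.0.1")
p.add_argument("--ipc_port", type=int, default=50000)
p.add_argument("--ipc_authkey", default="lightwheel")
p.add_argument("--restful_host", default="0.0.0.0")
p.add_argument("--restful_port", type=int, default=8000)
p.add_argument("--headless", action="store_true", default=True)
p.add_argument("--enable_camera", action="store_true")

# Example: custom port with camera rendering enabled
args = p.parse_args(["--ipc_port", "50001", "--enable_camera"])
```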

Server Startup

When you run the server, you should see output like:

Waiting for connection on ('127.0.0.1', 50000)...
Press Ctrl+C to stop the server

Keep this terminal running - the server must stay active for the evaluation script to connect.

Step 2: Run Policy Evaluation

Once the environment server is running, open a new terminal and run the policy evaluation script.

Important: The policy evaluation script runs in a separate Python environment from the simulation server. You only need to install lw_benchhub and your policy's dependencies (e.g., PyTorch, transformers) in this environment - no need to install Isaac Lab, Isaac Sim, or any simulation dependencies. This lightweight setup enables rapid policy development and testing without the overhead of full simulation stack installation.

Basic Usage

cd lw_benchhub
python lw_benchhub/scripts/policy/eval_policy.py --config policy/GR00T/deploy_policy_lerobot.yml

Policy Configuration File

The policy configuration file now includes both policy settings and environment configuration. Create a policy deployment configuration (e.g., policy/GR00T/deploy_policy_lerobot.yml):

# Policy Configuration
policy_name: GR00TPolicy            # Policy class name (must match your policy class)

# Model Configuration
checkpoint: /path/to/checkpoint     # Path to trained policy checkpoint
instruction: "Grab the block and lift it up."  # Task instruction/prompt

# Policy-specific parameters
embodiment_tag: 'new_embodiment'
action_horizon: 16
denoising_steps: 4
num_feedback_actions: 16
data_config: 'so100_dualcam'

# Observation Configuration
observation_config:
  custom_mapping:
    # Map environment observation keys to policy input keys
    video.front: global_camera                  # Front camera view
    video.wrist: hand_camera                    # Wrist camera view
    state.single_arm: {joint_pos: [0,1,2,3,4]}  # Arm joint positions
    state.gripper: {joint_pos: [5]}             # Gripper position

# Evaluation Settings
record_camera: ["global_camera"]    # Cameras to record in evaluation video
time_out_limit: 50                  # Maximum steps per episode
height: 480                         # Camera image height
width: 480                          # Camera image width

# Environment Configuration (sent to server via attach())
env_cfg:
  task: LiftObj                      # Task name
  robot: LeRobot-AbsJointGripper-RL  # Robot type
  layout: robocasakitchen-9-8        # Scene layout
  scene_backend: robocasa            # Scene backend
  task_backend: robocasa             # Task backend
  device: cuda:0                     # Device for simulation
  num_envs: 1                        # Number of parallel environments
  usd_simplify: false                # USD simplification
  enable_cameras: true               # Enable camera observations
  video: false                       # Record video in environment
  disable_fabric: false              # Fabric settings
  robot_scale: 1.0                   # Robot scale factor
  seed: 42                           # Random seed
  for_rl: false                      # RL mode (false for policy evaluation)
  variant: Visual                    # Observation variant (Visual/State)
  concatenate_terms: false           # Concatenate observation terms
  distributed: false                 # Multi-GPU training mode

Environment Configuration Options

The env_cfg section specifies all environment parameters that are sent to the server via attach():

Parameter       Type     Description
task            string   Task name (e.g., LiftObj, CloseDishwasher)
robot           string   Robot type (e.g., LeRobot-AbsJointGripper-RL)
layout          string   Scene layout (e.g., robocasakitchen, robocasakitchen-9-8)
scene_backend   string   Scene backend (robocasa)
task_backend    string   Task backend (robocasa)
device          string   CUDA device for simulation
num_envs        int      Number of parallel environments
usd_simplify    bool     USD simplification for faster loading
video           bool     Record video in environment
seed            int      Random seed for reproducibility
variant         string   Observation variant (Visual or State)
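Before sending env_cfg to the server, it can help to validate it client-side. The dataclass below is a hedged sketch that mirrors the parameters in the table above; it is illustrative, not LW-BenchHub's actual schema, and the defaults are taken from the example configuration.

```python
# Hedged sketch: client-side validation of an env_cfg block before attach().
# Field names mirror the parameter table; this is not LW-BenchHub's real schema.
from dataclasses import dataclass

@dataclass
class EnvCfg:
    task: str                            # required: e.g. LiftObj
    robot: str                           # required: e.g. LeRobot-AbsJointGripper-RL
    layout: str                          # required: e.g. robocasakitchen-9-8
    scene_backend: str = "robocasa"
    task_backend: str = "robocasa"
    device: str = "cuda:0"
    num_envs: int = 1
    usd_simplify: bool = False
    video: bool = False
    seed: int = 42
    variant: str = "Visual"              # Visual or State

cfg = EnvCfg(task="LiftObj",
             robot="LeRobot-AbsJointGripper-RL",
             layout="robocasakitchen-9-8")
```

Constructing the dataclass fails fast on a missing required field (task, robot, layout), which is easier to debug locally than a rejected attach() on the server side.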

Observation Mapping

The observation_config.custom_mapping in your policy config maps environment observations to your policy's expected input format:

Environment Observation Keys (from LW-BenchHub):

  • global_camera: Front/global camera view
  • hand_camera: Wrist/hand camera view
  • joint_pos: Robot joint positions
  • joint_vel: Robot joint velocities
  • joint_target_pos: Target joint positions (previous action)

Policy Input Keys (your model's expected format):

Different policies expect different input formats. Map accordingly:

# Example for GR00T Policy
observation_config:
  custom_mapping:
    video.front: global_camera
    video.wrist: hand_camera
    state.single_arm: {joint_pos: [0,1,2,3,4]}
    state.gripper: {joint_pos: [5]}

# Example for PI Policy
observation_config:
  custom_mapping:
    images/front: global_camera
    images/wrist: hand_camera
    state: joint_pos
    action: joint_target_pos

# Example for a LeRobot-format Policy
observation_config:
  custom_mapping:
    observation.image.front: global_camera
    observation.image.wrist: hand_camera
    observation.state.joint_pos: joint_pos
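The mapping semantics above can be sketched in a few lines of Python. The remap() helper here is hypothetical (not LW-BenchHub's implementation): a plain string value renames an environment key, while a {env_key: [indices]} value slices selected dimensions out of a state vector.

```python
# Hedged sketch of applying a custom_mapping to a raw observation dict.
# remap() is a hypothetical helper, not LW-BenchHub's actual code.
def remap(obs, mapping):
    out = {}
    for policy_key, source in mapping.items():
        if isinstance(source, dict):
            # {env_key: [indices]} -> slice selected dims from a state vector
            (env_key, idx), = source.items()
            out[policy_key] = [obs[env_key][i] for i in idx]
        else:
            # plain string -> direct rename of an environment key
            out[policy_key] = obs[source]
    return out

obs = {
    "global_camera": "front_frame",             # stand-ins for image arrays
    "hand_camera": "wrist_frame",
    "joint_pos": [0.1, 0.2, 0.3, 0.4, 0.5, 0.9],
}
mapping = {
    "video.front": "global_camera",
    "video.wrist": "hand_camera",
    "state.single_arm": {"joint_pos": [0, 1, 2, 3, 4]},
    "state.gripper": {"joint_pos": [5]},
}
policy_obs = remap(obs, mapping)
```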

Step 3: View Results

Evaluation videos will be saved to:

lw_benchhub/eval_result/episode0.mp4
lw_benchhub/eval_result/episode1.mp4
...

Success rate will be printed:

Testing policy: 100%|██████████| 10/10 [02:30<00:00, 15.0s/it]
Success rate: 0.8

Quick Start Example: Evaluating GR00T on LiftCube

We provide a pre-trained GR00T checkpoint to help you get started quickly with the LiftCube task.

Step 1: Download Pre-trained Checkpoint

Download the GR00T checkpoint from Hugging Face:

# Download checkpoint
git lfs install
git clone https://huggingface.co/LightwheelAI/gr00t15_LiftCube

Or download directly from: https://huggingface.co/LightwheelAI/gr00t15_LiftCube/tree/main/checkpoint-9000

Step 2: Set Up GR00T Environment

Install the GR00T framework following the official guide:

# Clone GR00T repository
git clone https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T

# Follow installation instructions from the repository
# This creates a separate Python environment for GR00T

Step 3: Install LW-BenchHub in GR00T Environment

# Activate your GR00T Python environment
conda activate gr00t # or your GR00T environment name

# Install lw_benchhub (lightweight installation - no simulation dependencies needed)
cd /path/to/lw_benchhub
pip install -e .

Note: You only need LW-BenchHub's client library in the GR00T environment, not Isaac Lab or Isaac Sim.

Step 4: Start Environment Server

In a separate terminal, activate your lw_benchhub environment (with Isaac Lab) and start the server:

# Activate lw_benchhub environment (with Isaac Lab installed)
conda activate lw_benchhub

# Navigate to lw_benchhub directory
cd /path/to/lw_benchhub

# Start environment server (no task config needed!)
python lw_benchhub/scripts/env_server.py

The server will start and display:

Waiting for connection on ('127.0.0.1', 50000)...
Press Ctrl+C to stop the server

Step 5: Run GR00T Policy Evaluation

In a new terminal, activate your GR00T environment and run the evaluation:

# Activate GR00T environment
conda activate gr00t

# Navigate to lw_benchhub directory
cd /path/to/lw_benchhub

# Run policy evaluation (env config is in the policy config file)
python lw_benchhub/scripts/policy/eval_policy.py \
--config policy/GR00T/deploy_policy_lerobot.yml

Expected Output

Connecting to environment server at 127.0.0.1:50000...
Connected successfully!
Attaching environment with config: LiftObj on robocasakitchen-9-8
[INFO-50000]: Attached environment to <ManagerBasedEnv>
Loading GR00T model...
Successfully loaded GR00T policy!

Evaluating policy: 100%|██████████| 10/10 [01:45<00:00, 10.5s/it]
Success rate: 0.9

Evaluation videos saved to: eval_result/

Environment Summary

Terminal 1 (lw_benchhub env with Isaac Lab):
└─ env_server.py → Waits for attach(), then runs simulation on GPU

Terminal 2 (GR00T env, lightweight):
└─ eval_policy.py → Sends env_cfg via attach(), runs policy inference

Communication: Zero-copy IPC via shared memory
Lifecycle: attach() → step/reset → close_connection()

This example demonstrates the power of LW-BenchHub's decoupled architecture - you can run a complex policy like GR00T without installing the full simulation stack in the policy environment!

Summary

To evaluate a policy on LW-BenchHub tasks:

1. Prepare configuration:

  • Policy config with env_cfg section (task, robot, cameras, etc.)
  • Model checkpoint and policy-specific parameters

2. Start environment server:

python lw_benchhub/scripts/env_server.py

3. Run policy evaluation:

python lw_benchhub/scripts/policy/eval_policy.py \
--config policy/YourPolicy/deploy_policy.yml

4. Check results:

  • Videos in eval_result/video/
  • Results JSON in eval_result/eval_results.json
  • Success rate in terminal output

Key Points:

  • Environment config in policy file: All environment parameters are now specified in the policy configuration file under env_cfg
  • Attach/Detach lifecycle: Server waits for attach() with configuration, can be reconfigured without restart
  • Observation mapping: Ensure observation_config.custom_mapping correctly maps simulation output keys to your model's expected input keys
  • Action mapping: If your model's joint order differs from simulation, use joint_mapping to reorder
  • Dimension matching: Verify action dimension matches robot DoF, observation resolution matches training setup
  • Format consistency: Camera image format (HWC/CHW), state vector order must align with training data
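The joint-reordering point can be made concrete with a small sketch. The index-map convention below is an assumption for illustration (check your policy config for the actual joint_mapping format): entry i gives the position in the policy's output that drives simulator joint i.

```python
# Hedged sketch: reordering a policy's action vector into the simulator's
# joint order. The joint_mapping convention here is illustrative only.
def reorder(action, joint_mapping):
    # joint_mapping[i] = index in the policy output that drives sim joint i
    return [action[j] for j in joint_mapping]

policy_action = [10, 20, 30]                     # policy order: shoulder, elbow, gripper
sim_action = reorder(policy_action, [2, 0, 1])   # sim order: gripper, shoulder, elbow
```

A mismatch here fails silently (the robot just moves wrong), so it is worth asserting len(action) == robot DoF and spot-checking one known pose before a full evaluation run.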