Policy Evaluation Usage
Overview
This guide demonstrates how to evaluate a trained policy on LW-BenchHub tasks. The evaluation process involves two steps:
- Start the environment server (`env_server.py`) - creates a remote environment service that waits for configuration
- Run policy evaluation (`eval_policy.py`) - connects to the server, sends the environment configuration via `attach()`, and evaluates the policy
LW-BenchHub's Advanced Distributed Architecture
Decoupled Policy-Environment Design
LW-BenchHub adopts a distributed architecture that separates the policy inference engine from the simulation environment. This design brings several practical advantages:
🚀 Zero-Dependency Isolation
The policy and environment run in completely independent processes with isolated Python environments. This architecture eliminates the notorious "dependency hell" problem:
- Policy side can use any deep learning framework with specific versions without conflicts
- Environment side runs Isaac Lab with its required dependencies independently
- Rapid iteration: Update policy models without restarting the heavy simulation environment
⚡ High-Performance Zero-Copy Communication
The framework implements an optimized inter-process communication (IPC) protocol with shared memory for data transfer:
- Seamless remote environment access - clients interact with remote environments as if they were local, with transparent API calls
- Zero-copy data sharing via shared memory regions - large observation data (multi-camera RGB-D streams) are transferred without serialization
- Sub-millisecond latency for observation-action loops, enabling real-time policy evaluation with negligible overhead
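The zero-copy idea can be illustrated with Python's standard library: a writer process places an observation buffer in a named shared-memory region, and a reader maps the same region without serializing or copying the data. This is a minimal stdlib sketch of the technique, not LW-BenchHub's actual IPC implementation:

```python
from multiprocessing import shared_memory

# Writer side: allocate a named shared-memory block sized for a
# flattened 480x480 RGB frame, and write a few pixel bytes into it.
shm = shared_memory.SharedMemory(create=True, size=480 * 480 * 3)
shm.buf[:4] = bytes([10, 20, 30, 40])

# Reader side: attach to the same block by name. The buffer is
# memory-mapped, not copied - both sides see the same physical memory.
reader = shared_memory.SharedMemory(name=shm.name)
first_pixels = bytes(reader.buf[:4])
print(first_pixels)  # b'\n\x14\x1e('

reader.close()
shm.close()
shm.unlink()
```

In a real observation-action loop, only the block name and shape metadata cross the connection; the pixel data itself never leaves shared memory.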
🔄 Flexible Deployment Modes
The distributed design supports multiple deployment paradigms:
┌─────────────────────────────────────────────────────────────┐
│ Local IPC Mode (Default) │
│ ┌──────────────┐ Shared Memory ┌──────────────────┐ │
│ │ Policy │ ←─────────────────→ │ Environment │ │
│ │ (PyTorch) │ Zero-Copy IPC │ (Isaac Lab) │ │
│ └──────────────┘ └──────────────────┘ │
│ Same Machine - Microsecond Latency │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Remote RESTful Mode │
│ ┌──────────────┐ Network/HTTP ┌──────────────────┐ │
│ │ Policy │ ←─────────────────→ │ Environment │ │
│ │ (CPU/Edge) │ Compression │ (GPU Cluster) │ │
│ └──────────────┘ └──────────────────┘ │
│ Cross-Machine - Network Resilient │
└─────────────────────────────────────────────────────────────┘
This architecture makes LW-BenchHub well suited for:
- Research labs with heterogeneous computing environments
- Production robotics systems requiring high reliability
- Large-scale policy benchmarking and ablation studies
- Real-robot deployment where policy runs on robot hardware while simulation serves as digital twin
Prerequisites
- LW-BenchHub installed
- Trained policy checkpoint
- Policy configuration file (includes environment configuration)
Step 1: Start Environment Server
The environment server creates a remote service that waits for configuration from the policy client. Unlike the previous architecture, no task configuration is needed at server startup.
Basic Usage
cd lw_benchhub
python lw_benchhub/scripts/env_server.py
Server Parameters
# Basic usage with default IPC protocol
python lw_benchhub/scripts/env_server.py
# Custom IPC host and port
python lw_benchhub/scripts/env_server.py --ipc_host 127.0.0.1 --ipc_port 50000
# Use RESTful protocol for remote access
python lw_benchhub/scripts/env_server.py --remote_protocol restful --restful_host 0.0.0.0 --restful_port 8000
# Enable cameras (required for visual observations)
python lw_benchhub/scripts/env_server.py --enable_camera
# Run without GUI viewer
python lw_benchhub/scripts/env_server.py --headless
Available Server Arguments
| Argument | Default | Description |
|---|---|---|
| `--remote_protocol` | `ipc` | Communication protocol (`ipc` or `restful`) |
| `--ipc_host` | `127.0.0.1` | IPC host address |
| `--ipc_port` | `50000` | IPC port number |
| `--ipc_authkey` | `lightwheel` | IPC authentication key |
| `--restful_host` | `0.0.0.0` | RESTful server host |
| `--restful_port` | `8000` | RESTful server port |
| `--headless` | `true` | Run without GUI viewer (default is headless) |
| `--enable_camera` | `false` | Enable camera rendering for visual observations |
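As a rough mental model, the table above maps onto a standard `argparse` definition. The flag names and defaults below are taken from the table; the real `env_server.py` may define them differently:

```python
import argparse

# Illustrative parser mirroring the documented server arguments.
parser = argparse.ArgumentParser(description="LW-BenchHub environment server")
parser.add_argument("--remote_protocol", choices=["ipc", "restful"], default="ipc")
parser.add_argument("--ipc_host", default="127.0.0.1")
parser.add_argument("--ipc_port", type=int, default=50000)
parser.add_argument("--ipc_authkey", default="lightwheel")
parser.add_argument("--restful_host", default="0.0.0.0")
parser.add_argument("--restful_port", type=int, default=8000)
parser.add_argument("--headless", action="store_true", default=True)
parser.add_argument("--enable_camera", action="store_true", default=False)

# Equivalent to: env_server.py --ipc_port 50001 --enable_camera
args = parser.parse_args(["--ipc_port", "50001", "--enable_camera"])
print(args.ipc_port, args.enable_camera)  # 50001 True
```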
Server Startup
When you run the server, you should see output like:
Waiting for connection on ('127.0.0.1', 50000)...
Press Ctrl+C to stop the server
Keep this terminal running - the server must stay active for the evaluation script to connect.
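The "Waiting for connection" message corresponds to a listener blocking on `accept()`. The self-contained sketch below reproduces that handshake in one process with Python's `multiprocessing.connection` module - it is an illustration of the wait-then-connect pattern, not the real server code (the authkey value matches the documented `--ipc_authkey` default; the payload shape is made up):

```python
import threading
from multiprocessing.connection import Listener, Client

AUTHKEY = b"lightwheel"  # documented --ipc_authkey default

# Server side: bind an ephemeral local port and wait for one client.
listener = Listener(("127.0.0.1", 0), authkey=AUTHKEY)
print(f"Waiting for connection on {listener.address}...")

def serve():
    with listener.accept() as conn:
        request = conn.recv()  # hypothetical attach-style payload
        conn.send({"status": "attached", "task": request["task"]})

server = threading.Thread(target=serve)
server.start()

# Client side: connect with the same authkey and exchange one message.
with Client(listener.address, authkey=AUTHKEY) as conn:
    conn.send({"task": "LiftObj"})
    reply = conn.recv()

server.join()
listener.close()
print(reply)  # {'status': 'attached', 'task': 'LiftObj'}
```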
Step 2: Run Policy Evaluation
Once the environment server is running, open a new terminal and run the policy evaluation script.
Important: The policy evaluation script runs in a separate Python environment from the simulation server. You only need to install lw_benchhub and your policy's dependencies (e.g., PyTorch, transformers) in this environment - no need to install Isaac Lab, Isaac Sim, or any simulation dependencies. This lightweight setup enables rapid policy development and testing without the overhead of full simulation stack installation.
Basic Usage
cd lw_benchhub
python lw_benchhub/scripts/policy/eval_policy.py --config policy/GR00T/deploy_policy_lerobot.yml
Policy Configuration File
The policy configuration file now includes both policy settings and environment configuration. Create a policy deployment configuration (e.g., policy/GR00T/deploy_policy_lerobot.yml):
# Policy Configuration
policy_name: GR00TPolicy # Policy class name (must match your policy class)
# Model Configuration
checkpoint: /path/to/checkpoint # Path to trained policy checkpoint
instruction: "Grab the block and lift it up." # Task instruction/prompt
# Policy-specific parameters
embodiment_tag: 'new_embodiment'
action_horizon: 16
denoising_steps: 4
num_feedback_actions: 16
data_config: 'so100_dualcam'
# Observation Configuration
observation_config:
custom_mapping:
# Map environment observation keys to policy input keys
video.front: global_camera # Front camera view
video.wrist: hand_camera # Wrist camera view
state.single_arm: {joint_pos: [0,1,2,3,4]} # Arm joint positions
state.gripper: {joint_pos: [5]} # Gripper position
# Evaluation Settings
record_camera: ["global_camera"] # Cameras to record in evaluation video
time_out_limit: 50 # Maximum steps per episode
height: 480 # Camera image height
width: 480 # Camera image width
# Environment Configuration (sent to server via attach())
env_cfg:
task: LiftObj # Task name
robot: LeRobot-AbsJointGripper-RL # Robot type
layout: robocasakitchen-9-8 # Scene layout
scene_backend: robocasa # Scene backend
task_backend: robocasa # Task backend
device: cuda:0 # Device for simulation
num_envs: 1 # Number of parallel environments
usd_simplify: false # USD simplification
enable_cameras: true # Enable camera observations
video: false # Record video in environment
disable_fabric: false # Fabric settings
robot_scale: 1.0 # Robot scale factor
seed: 42 # Random seed
for_rl: false # RL mode (false for policy evaluation)
variant: Visual # Observation variant (Visual/State)
concatenate_terms: false # Concatenate observation terms
distributed: false # Multi-GPU training mode
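Once the YAML file is parsed (e.g., with `yaml.safe_load`), the configuration splits naturally into policy-side settings and the `env_cfg` payload that `eval_policy.py` forwards to the server. A minimal sketch, assuming the file has already been loaded into a dict (trimmed to a few keys for brevity):

```python
# A trimmed version of the config above, as it would appear
# after YAML parsing.
config = {
    "policy_name": "GR00TPolicy",
    "checkpoint": "/path/to/checkpoint",
    "instruction": "Grab the block and lift it up.",
    "time_out_limit": 50,
    "env_cfg": {
        "task": "LiftObj",
        "robot": "LeRobot-AbsJointGripper-RL",
        "num_envs": 1,
        "enable_cameras": True,
    },
}

# The env_cfg section is sent to the server via attach();
# everything else stays on the policy side.
env_cfg = config.pop("env_cfg")
print(sorted(env_cfg))  # ['enable_cameras', 'num_envs', 'robot', 'task']
```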
Environment Configuration Options
The env_cfg section specifies all environment parameters that are sent to the server via attach():
| Parameter | Type | Description |
|---|---|---|
| `task` | string | Task name (e.g., `LiftObj`, `CloseDishwasher`) |
| `robot` | string | Robot type (e.g., `LeRobot-AbsJointGripper-RL`) |
| `layout` | string | Scene layout (e.g., `robocasakitchen`, `robocasakitchen-9-8`) |
| `scene_backend` | string | Scene backend (`robocasa`) |
| `task_backend` | string | Task backend (`robocasa`) |
| `device` | string | CUDA device for simulation |
| `num_envs` | int | Number of parallel environments |
| `usd_simplify` | bool | USD simplification for faster loading |
| `video` | bool | Record video in environment |
| `seed` | int | Random seed for reproducibility |
| `variant` | string | Observation variant (`Visual` or `State`) |
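It can help to sanity-check an `env_cfg` dict against the types in this table before sending it. LW-BenchHub may perform its own validation on the server side; the helper below is purely illustrative:

```python
# Expected types for a subset of env_cfg parameters, per the table.
EXPECTED_TYPES = {
    "task": str, "robot": str, "layout": str,
    "device": str, "variant": str,
    "num_envs": int, "seed": int,
    "usd_simplify": bool, "video": bool,
}

def validate_env_cfg(cfg):
    """Return a list of problems found in an env_cfg dict (empty if OK)."""
    errors = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in cfg:
            errors.append(f"missing key: {key}")
        elif not isinstance(cfg[key], expected):
            errors.append(f"{key}: expected {expected.__name__}")
    return errors

cfg = {"task": "LiftObj", "robot": "LeRobot-AbsJointGripper-RL",
       "layout": "robocasakitchen-9-8", "device": "cuda:0",
       "variant": "Visual", "num_envs": 1, "seed": 42,
       "usd_simplify": False, "video": False}
print(validate_env_cfg(cfg))  # []
```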
Observation Mapping
The observation_config.custom_mapping in your policy config maps environment observations to your policy's expected input format:
Environment Observation Keys (from LW-BenchHub):
- `global_camera`: Front/global camera view
- `hand_camera`: Wrist/hand camera view
- `joint_pos`: Robot joint positions
- `joint_vel`: Robot joint velocities
- `joint_target_pos`: Target joint positions (previous action)
Policy Input Keys (your model's expected format):
Different policies expect different input formats. Map accordingly:
# Example for GR00T Policy
observation_config:
custom_mapping:
video.front: global_camera
video.wrist: hand_camera
state.single_arm: {joint_pos: [0,1,2,3,4]}
state.gripper: {joint_pos: [5]}
# Example for PI Policy
observation_config:
custom_mapping:
images/front: global_camera
images/wrist: hand_camera
state: joint_pos
action: joint_target_pos
# Example for a LeRobot-style policy (note the different key naming)
observation_config:
custom_mapping:
observation.image.front: global_camera
observation.image.wrist: hand_camera
observation.state.joint_pos: joint_pos
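The mapping semantics above can be sketched in plain Python: a string value copies an environment observation wholesale, while a `{source_key: indices}` value selects specific elements (here, joint indices). This is an illustrative re-implementation of the idea, not LW-BenchHub's internal code:

```python
def apply_custom_mapping(obs, custom_mapping):
    """Remap raw environment observations into policy input keys.

    A string value copies the observation as-is; a dict value of the
    form {source_key: [indices]} selects those indices from a flat list.
    """
    policy_obs = {}
    for policy_key, source in custom_mapping.items():
        if isinstance(source, str):
            policy_obs[policy_key] = obs[source]
        else:  # single-entry dict: {source_key: [indices]}
            (src_key, indices), = source.items()
            policy_obs[policy_key] = [obs[src_key][i] for i in indices]
    return policy_obs

# Raw environment observations (camera frames abbreviated as strings).
obs = {
    "global_camera": "<front frame>",
    "hand_camera": "<wrist frame>",
    "joint_pos": [0.1, 0.2, 0.3, 0.4, 0.5, 0.9],
}
mapping = {
    "video.front": "global_camera",
    "video.wrist": "hand_camera",
    "state.single_arm": {"joint_pos": [0, 1, 2, 3, 4]},
    "state.gripper": {"joint_pos": [5]},
}
print(apply_custom_mapping(obs, mapping)["state.gripper"])  # [0.9]
```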
Step 3: View Results
Evaluation videos will be saved to:
lw_benchhub/eval_result/episode0.mp4
lw_benchhub/eval_result/episode1.mp4
...
Success rate will be printed:
Testing policy: 100%|██████████| 10/10 [02:30<00:00, 15.0s/it]
Success rate: 0.8
Quick Start Example: Evaluating GR00T on LiftCube
We provide a pre-trained GR00T checkpoint to help you get started quickly with the LiftCube task.
Step 1: Download Pre-trained Checkpoint
Download the GR00T checkpoint from Hugging Face:
# Download checkpoint
git lfs install
git clone https://huggingface.co/LightwheelAI/gr00t15_LiftCube
Or download directly from: https://huggingface.co/LightwheelAI/gr00t15_LiftCube/tree/main/checkpoint-9000
Step 2: Set Up GR00T Environment
Install the GR00T framework following the official guide:
# Clone GR00T repository
git clone https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
# Follow installation instructions from the repository
# This creates a separate Python environment for GR00T
Step 3: Install LW-BenchHub in GR00T Environment
# Activate your GR00T Python environment
conda activate gr00t  # or your GR00T environment name
# Install lw_benchhub (lightweight installation - no simulation dependencies needed)
cd /path/to/lw_benchhub
pip install -e .
Note: You only need LW-BenchHub's client library in the GR00T environment, not Isaac Lab or Isaac Sim.
Step 4: Start Environment Server
In a separate terminal, activate your lw_benchhub environment (with Isaac Lab) and start the server:
# Activate lw_benchhub environment (with Isaac Lab installed)
conda activate lw_benchhub
# Navigate to lw_benchhub directory
cd /path/to/lw_benchhub
# Start environment server (no task config needed!)
python lw_benchhub/scripts/env_server.py
The server will start and display:
Waiting for connection on ('127.0.0.1', 50000)...
Press Ctrl+C to stop the server
Step 5: Run GR00T Policy Evaluation
In a new terminal, activate your GR00T environment and run the evaluation:
# Activate GR00T environment
conda activate gr00t
# Navigate to lw_benchhub directory
cd /path/to/lw_benchhub
# Run policy evaluation (env config is in the policy config file)
python lw_benchhub/scripts/policy/eval_policy.py \
--config policy/GR00T/deploy_policy_lerobot.yml
Expected Output
Connecting to environment server at 127.0.0.1:50000...
Connected successfully!
Attaching environment with config: LiftObj on robocasakitchen-9-8
[INFO-50000]: Attached environment to <ManagerBasedEnv>
Loading GR00T model...
Successfully loaded GR00T policy!
Evaluating policy: 100%|██████████| 10/10 [01:45<00:00, 10.5s/it]
Success rate: 0.9
Evaluation videos saved to: eval_result/
Environment Summary
Terminal 1 (lw_benchhub env with Isaac Lab):
└─ env_server.py → Waits for attach(), then runs simulation on GPU
Terminal 2 (GR00T env, lightweight):
└─ eval_policy.py → Sends env_cfg via attach(), runs policy inference
Communication: Zero-copy IPC via shared memory
Lifecycle: attach() → step/reset → close_connection()
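The lifecycle can be sketched with a stand-in environment class. The real LW-BenchHub client object and its exact method signatures are not shown in this guide, so everything below is an illustrative mock:

```python
class FakeRemoteEnv:
    """Stand-in for a remote environment client, illustrating the
    attach -> reset/step -> close_connection lifecycle. Method names
    mirror the lifecycle above; the real client API may differ."""

    def attach(self, env_cfg):
        # On the real server this builds the simulation from env_cfg.
        self.task = env_cfg["task"]
        self.t = 0

    def reset(self):
        self.t = 0
        return {"joint_pos": [0.0] * 6}

    def step(self, action):
        self.t += 1
        obs = {"joint_pos": action}
        done = self.t >= 3  # pretend the episode ends after 3 steps
        return obs, done

    def close_connection(self):
        self.task = None

env = FakeRemoteEnv()
env.attach({"task": "LiftObj"})   # send env_cfg, server builds the env
obs = env.reset()
done = False
while not done:
    action = [0.1] * 6            # policy inference would go here
    obs, done = env.step(action)
env.close_connection()
print(env.t)  # 3
```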
This example demonstrates the power of LW-BenchHub's decoupled architecture - you can run a complex policy like GR00T without installing the full simulation stack in the policy environment!
Summary
To evaluate a policy on LW-BenchHub tasks:
1. Prepare configuration:
- Policy config with `env_cfg` section (task, robot, cameras, etc.)
- Model checkpoint and policy-specific parameters
2. Start environment server:
python lw_benchhub/scripts/env_server.py
3. Run policy evaluation:
python lw_benchhub/scripts/policy/eval_policy.py \
--config policy/YourPolicy/deploy_policy.yml
4. Check results:
- Videos in `eval_result/video/`
- Results JSON in `eval_result/eval_results.json`
- Success rate in terminal output
Key Points:
- Environment config in policy file: All environment parameters are now specified in the policy configuration file under `env_cfg`
- Attach/Detach lifecycle: Server waits for `attach()` with a configuration and can be reconfigured without a restart
- Observation mapping: Ensure `observation_config.custom_mapping` correctly maps simulation output keys to your model's expected input keys
- Action mapping: If your model's joint order differs from the simulation, use `joint_mapping` to reorder
- Dimension matching: Verify the action dimension matches the robot DoF and the observation resolution matches the training setup
- Format consistency: Camera image format (HWC/CHW) and state vector order must align with the training data
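The last two points can be made concrete with a small sketch. It assumes `joint_mapping` is a list of source indices in the model's joint order (an assumption - the exact format is defined by your policy config), and uses nested lists in place of tensors:

```python
def hwc_to_chw(image):
    """Convert an H x W x C nested-list image to C x H x W."""
    h, w, c = len(image), len(image[0]), len(image[0][0])
    return [[[image[y][x][ch] for x in range(w)] for y in range(h)]
            for ch in range(c)]

def reorder_joints(joint_pos, joint_mapping):
    """Reorder simulation joint values into the model's joint order."""
    return [joint_pos[i] for i in joint_mapping]

image = [[[1, 2, 3], [4, 5, 6]]]   # 1 x 2 x 3 image in HWC layout
chw = hwc_to_chw(image)
print(chw)  # [[[1, 4]], [[2, 5]], [[3, 6]]]  (3 x 1 x 2, CHW)
print(reorder_joints([0.1, 0.2, 0.3], [2, 0, 1]))  # [0.3, 0.1, 0.2]
```

In practice this is done with tensor transposes and index selects, but the layout and ordering pitfalls are the same.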