Policy Evaluation Usage
Overview
This guide demonstrates how to evaluate a trained policy on LW-BenchHub tasks. The evaluation process involves two steps:
- Start the environment server (`env_server.py`) - creates a remote environment service that waits for configuration
- Run policy evaluation (`eval_policy.py`) - connects to the server, sends the environment configuration via `attach()`, and evaluates the policy
LW-BenchHub's Advanced Distributed Architecture
Decoupled Policy-Environment Design
LW-BenchHub adopts a distributed architecture that separates the policy inference engine from the simulation environment. This design brings several practical advantages:
🚀 Zero-Dependency Isolation
The policy and environment run in completely independent processes with isolated Python environments. This architecture eliminates the notorious "dependency hell" problem:
- Policy side can use any deep learning framework with specific versions without conflicts
- Environment side runs Isaac Lab with its required dependencies independently
- Rapid iteration: Update policy models without restarting the heavy simulation environment
⚡ High-Performance Zero-Copy Communication
The framework implements an optimized inter-process communication (IPC) protocol with shared memory for data transfer:
- Seamless remote environment access - clients interact with remote environments as if they were local, with transparent API calls
- Zero-copy data sharing via shared memory regions - large observation data (multi-camera RGB-D streams) are transferred without serialization
- Sub-millisecond latency for observation-action loops, enabling real-time policy evaluation with negligible overhead
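The zero-copy idea can be illustrated with Python's standard library: a writer process places an observation buffer in a named shared-memory region, and a reader maps the same region without serializing or copying the data. This is a minimal stdlib sketch of the technique, not LW-BenchHub's actual IPC implementation:

```python
from multiprocessing import shared_memory

# Writer side: allocate a named shared-memory block sized for a
# flattened 480x480 RGB frame, and write a few pixel bytes into it.
shm = shared_memory.SharedMemory(create=True, size=480 * 480 * 3)
shm.buf[:4] = bytes([10, 20, 30, 40])

# Reader side: attach to the same block by name. The buffer is
# memory-mapped, not copied - both sides see the same physical memory.
reader = shared_memory.SharedMemory(name=shm.name)
first_pixels = bytes(reader.buf[:4])
print(first_pixels)  # b'\n\x14\x1e('

reader.close()
shm.close()
shm.unlink()
```

In a real observation-action loop, only the block name and shape metadata cross the connection; the pixel data itself never leaves shared memory.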
🔄 Flexible Deployment Modes
The distributed design supports multiple deployment paradigms:
┌─────────────────────────────────────────────────────────────┐
│ Local IPC Mode (Default) │
│ ┌──────────────┐ Shared Memory ┌──────────────────┐ │
│ │ Policy │ ←─────────────────→ │ Environment │ │
│ │ (PyTorch) │ Zero-Copy IPC │ (Isaac Lab) │ │
│ └──────────────┘ └──────────────────┘ │
│ Same Machine - Microsecond Latency │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Remote RESTful Mode │
│ ┌──────────────┐ Network/HTTP ┌──────────────────┐ │
│ │ Policy │ ←─────────────────→ │ Environment │ │
│ │ (CPU/Edge) │ Compression │ (GPU Cluster) │ │
│ └──────────────┘ └──────────────────┘ │
│ Cross-Machine - Network Resilient │
└─────────────────────────────────────────────────────────────┘
This architecture makes LW-BenchHub well suited for:
- Research labs with heterogeneous computing environments
- Production robotics systems requiring high reliability
- Large-scale policy benchmarking and ablation studies
- Real-robot deployment where policy runs on robot hardware while simulation serves as digital twin
Prerequisites
- LW-BenchHub installed
- Trained policy checkpoint
- Policy configuration file (includes environment configuration)
Step 1: Start Environment Server
The environment server creates a remote service that waits for configuration from the policy client. Unlike the previous architecture, no task configuration is needed at server startup.
Basic Usage
cd lw_benchhub
python lw_benchhub/scripts/env_server.py
Server Parameters
# Basic usage with default IPC protocol
python lw_benchhub/scripts/env_server.py
# Custom IPC host and port
python lw_benchhub/scripts/env_server.py --ipc_host 127.0.0.1 --ipc_port 50000
# Use RESTful protocol for remote access
python lw_benchhub/scripts/env_server.py --remote_protocol restful --restful_host 0.0.0.0 --restful_port 8000
# Enable cameras (required for visual observations)
python lw_benchhub/scripts/env_server.py --enable_camera
# Run without GUI viewer
python lw_benchhub/scripts/env_server.py --headless
Available Server Arguments
| Argument | Default | Description |
|---|---|---|
| `--remote_protocol` | `ipc` | Communication protocol (`ipc` or `restful`) |
| `--ipc_host` | `127.0.0.1` | IPC host address |
| `--ipc_port` | `50000` | IPC port number |
| `--ipc_authkey` | `lightwheel` | IPC authentication key |
| `--restful_host` | `0.0.0.0` | RESTful server host |
| `--restful_port` | `8000` | RESTful server port |
| `--headless` | `true` | Run without GUI viewer (default is headless) |
| `--enable_camera` | `false` | Enable camera rendering for visual observations |
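As a rough mental model, the table above maps onto a standard `argparse` definition. The flag names and defaults below are taken from the table; the real `env_server.py` may define them differently:

```python
import argparse

# Illustrative parser mirroring the documented server arguments.
parser = argparse.ArgumentParser(description="LW-BenchHub environment server")
parser.add_argument("--remote_protocol", choices=["ipc", "restful"], default="ipc")
parser.add_argument("--ipc_host", default="127.0.0.1")
parser.add_argument("--ipc_port", type=int, default=50000)
parser.add_argument("--ipc_authkey", default="lightwheel")
parser.add_argument("--restful_host", default="0.0.0.0")
parser.add_argument("--restful_port", type=int, default=8000)
parser.add_argument("--headless", action="store_true", default=True)
parser.add_argument("--enable_camera", action="store_true", default=False)

# Equivalent to: env_server.py --ipc_port 50001 --enable_camera
args = parser.parse_args(["--ipc_port", "50001", "--enable_camera"])
print(args.ipc_port, args.enable_camera)  # 50001 True
```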
Server Startup
When you run the server, you should see output like:
Waiting for connection on ('127.0.0.1', 50000)...
Press Ctrl+C to stop the server
Keep this terminal running - the server must stay active for the evaluation script to connect.
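The "Waiting for connection" message corresponds to a listener blocking on `accept()`. The self-contained sketch below reproduces that handshake in one process with Python's `multiprocessing.connection` module - it is an illustration of the wait-then-connect pattern, not the real server code (the authkey value matches the documented `--ipc_authkey` default; the payload shape is made up):

```python
import threading
from multiprocessing.connection import Listener, Client

AUTHKEY = b"lightwheel"  # documented --ipc_authkey default

# Server side: bind an ephemeral local port and wait for one client.
listener = Listener(("127.0.0.1", 0), authkey=AUTHKEY)
print(f"Waiting for connection on {listener.address}...")

def serve():
    with listener.accept() as conn:
        request = conn.recv()  # hypothetical attach-style payload
        conn.send({"status": "attached", "task": request["task"]})

server = threading.Thread(target=serve)
server.start()

# Client side: connect with the same authkey and exchange one message.
with Client(listener.address, authkey=AUTHKEY) as conn:
    conn.send({"task": "LiftObj"})
    reply = conn.recv()

server.join()
listener.close()
print(reply)  # {'status': 'attached', 'task': 'LiftObj'}
```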
Step 2: Run Policy Evaluation
Once the environment server is running, open a new terminal and run the policy evaluation script.
Important: The policy evaluation script runs in a separate Python environment from the simulation server. You only need to install lw_benchhub and your policy's dependencies (e.g., PyTorch, transformers) in this environment - no need to install Isaac Lab, Isaac Sim, or any simulation dependencies. This lightweight setup enables rapid policy development and testing without the overhead of full simulation stack installation.
Basic Usage
cd lw_benchhub
python lw_benchhub/scripts/policy/eval_policy.py --config policy/GR00T/deploy_policy_lerobot.yml
Policy Configuration File
The policy configuration file now includes both policy settings and environment configuration. Create a policy deployment configuration (e.g., policy/GR00T/deploy_policy_lerobot.yml):
# Policy Configuration
policy_name: GR00TPolicy # Policy class name (must match your policy class)
# Model Configuration
checkpoint: /path/to/checkpoint # Path to trained policy checkpoint
instruction: "Grab the block and lift it up." # Task instruction/prompt
# Policy-specific parameters
embodiment_tag: 'new_embodiment'
action_horizon: 16
denoising_steps: 4
num_feedback_actions: 16
data_config: 'so100_dualcam'
# Observation Configuration
observation_config:
custom_mapping:
# Map environment observation keys to policy input keys
video.front: global_camera # Front camera view
video.wrist: hand_camera # Wrist camera view
state.single_arm: {joint_pos: [0,1,2,3,4]} # Arm joint positions
state.gripper: {joint_pos: [5]} # Gripper position
# Evaluation Settings
record_camera: ["global_camera"] # Cameras to record in evaluation video
time_out_limit: 50 # Maximum steps per episode
height: 480 # Camera image height
width: 480 # Camera image width
# Environment Configuration (sent to server via attach())
env_cfg:
task: LiftObj # Task name
robot: LeRobot-AbsJointGripper-RL # Robot type
layout: robocasakitchen-9-8 # Scene layout
scene_backend: robocasa # Scene backend
task_backend: robocasa # Task backend
device: cuda:0 # Device for simulation
num_envs: 1 # Number of parallel environments
usd_simplify: false # USD simplification
enable_cameras: true # Enable camera observations
video: false # Record video in environment
disable_fabric: false # Fabric settings
robot_scale: 1.0 # Robot scale factor
seed: 42 # Random seed
for_rl: false # RL mode (false for policy evaluation)
variant: Visual # Observation variant (Visual/State)
concatenate_terms: false # Concatenate observation terms
distributed: false # Multi-GPU training mode
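Once the YAML file is parsed (e.g., with `yaml.safe_load`), the configuration splits naturally into policy-side settings and the `env_cfg` payload that `eval_policy.py` forwards to the server. A minimal sketch, assuming the file has already been loaded into a dict (trimmed to a few keys for brevity):

```python
# A trimmed version of the config above, as it would appear
# after YAML parsing.
config = {
    "policy_name": "GR00TPolicy",
    "checkpoint": "/path/to/checkpoint",
    "instruction": "Grab the block and lift it up.",
    "time_out_limit": 50,
    "env_cfg": {
        "task": "LiftObj",
        "robot": "LeRobot-AbsJointGripper-RL",
        "num_envs": 1,
        "enable_cameras": True,
    },
}

# The env_cfg section is sent to the server via attach();
# everything else stays on the policy side.
env_cfg = config.pop("env_cfg")
print(sorted(env_cfg))  # ['enable_cameras', 'num_envs', 'robot', 'task']
```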
Environment Configuration Options
The env_cfg section specifies all environment parameters that are sent to the server via attach():
| Parameter | Type | Description |
|---|---|---|
| `task` | string | Task name (e.g., `LiftObj`, `CloseDishwasher`) |
| `robot` | string | Robot type (e.g., `LeRobot-AbsJointGripper-RL`) |
| `layout` | string | Scene layout (e.g., `robocasakitchen`, `robocasakitchen-9-8`) |
| `scene_backend` | string | Scene backend (`robocasa`) |
| `task_backend` | string | Task backend (`robocasa`) |
| `device` | string | CUDA device for simulation |
| `num_envs` | int | Number of parallel environments |
| `usd_simplify` | bool | USD simplification for faster loading |
| `video` | bool | Record video in environment |
| `seed` | int | Random seed for reproducibility |
| `variant` | string | Observation variant (`Visual` or `State`) |
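It can help to sanity-check an `env_cfg` dict against the types in this table before sending it. LW-BenchHub may perform its own validation on the server side; the helper below is purely illustrative:

```python
# Expected types for a subset of env_cfg parameters, per the table.
EXPECTED_TYPES = {
    "task": str, "robot": str, "layout": str,
    "device": str, "variant": str,
    "num_envs": int, "seed": int,
    "usd_simplify": bool, "video": bool,
}

def validate_env_cfg(cfg):
    """Return a list of problems found in an env_cfg dict (empty if OK)."""
    errors = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in cfg:
            errors.append(f"missing key: {key}")
        elif not isinstance(cfg[key], expected):
            errors.append(f"{key}: expected {expected.__name__}")
    return errors

cfg = {"task": "LiftObj", "robot": "LeRobot-AbsJointGripper-RL",
       "layout": "robocasakitchen-9-8", "device": "cuda:0",
       "variant": "Visual", "num_envs": 1, "seed": 42,
       "usd_simplify": False, "video": False}
print(validate_env_cfg(cfg))  # []
```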
Observation Mapping
The observation_config.custom_mapping in your policy config maps environment observations to your policy's expected input format:
Environment Observation Keys (from LW-BenchHub):
- `global_camera`: Front/global camera view
- `hand_camera`: Wrist/hand camera view
- `joint_pos`: Robot joint positions
- `joint_vel`: Robot joint velocities
- `joint_target_pos`: Target joint positions (previous action)
Policy Input Keys (your model's expected format):
Different policies expect different input formats. Map accordingly:
# Example for GR00T Policy
observation_config:
custom_mapping:
video.front: global_camera
video.wrist: hand_camera
state.single_arm: {joint_pos: [0,1,2,3,4]}
state.gripper: {joint_pos: [5]}
# Example for PI Policy
observation_config:
custom_mapping:
images/front: global_camera
images/wrist: hand_camera
state: joint_pos
action: joint_target_pos
# Example for a LeRobot-style policy (note the different key naming)
observation_config:
custom_mapping:
observation.image.front: global_camera
observation.image.wrist: hand_camera
observation.state.joint_pos: joint_pos
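The mapping semantics above can be sketched in plain Python: a string value copies an environment observation wholesale, while a `{source_key: indices}` value selects specific elements (here, joint indices). This is an illustrative re-implementation of the idea, not LW-BenchHub's internal code:

```python
def apply_custom_mapping(obs, custom_mapping):
    """Remap raw environment observations into policy input keys.

    A string value copies the observation as-is; a dict value of the
    form {source_key: [indices]} selects those indices from a flat list.
    """
    policy_obs = {}
    for policy_key, source in custom_mapping.items():
        if isinstance(source, str):
            policy_obs[policy_key] = obs[source]
        else:  # single-entry dict: {source_key: [indices]}
            (src_key, indices), = source.items()
            policy_obs[policy_key] = [obs[src_key][i] for i in indices]
    return policy_obs

# Raw environment observations (camera frames abbreviated as strings).
obs = {
    "global_camera": "<front frame>",
    "hand_camera": "<wrist frame>",
    "joint_pos": [0.1, 0.2, 0.3, 0.4, 0.5, 0.9],
}
mapping = {
    "video.front": "global_camera",
    "video.wrist": "hand_camera",
    "state.single_arm": {"joint_pos": [0, 1, 2, 3, 4]},
    "state.gripper": {"joint_pos": [5]},
}
print(apply_custom_mapping(obs, mapping)["state.gripper"])  # [0.9]
```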
Step 3: View Results
Evaluation videos will be saved to:
lw_benchhub/eval_result/episode0.mp4
lw_benchhub/eval_result/episode1.mp4
...
Success rate will be printed:
Testing policy: 100%|██████████| 10/10 [02:30<00:00, 15.0s/it]
Success rate: 0.8
Quick Start Example: Evaluating GR00T on LiftCube
We provide a pre-trained GR00T checkpoint to help you get started quickly with the LiftCube task.
Step 1: Download Pre-trained Checkpoint
Download the GR00T checkpoint from Hugging Face:
# Download checkpoint
git lfs install
git clone https://huggingface.co/LightwheelAI/gr00t15_LiftCube
Or download directly from: https://huggingface.co/LightwheelAI/gr00t15_LiftCube/tree/main/checkpoint-9000
Step 2: Set Up GR00T Environment
Install the GR00T framework following the official guide:
# Clone GR00T repository
git clone https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
# Follow installation instructions from the repository
# This creates a separate Python environment for GR00T
Step 3: Install LW-BenchHub in GR00T Environment
# Activate your GR00T Python environment
conda activate gr00t  # or your GR00T environment name
# Install lw_benchhub (lightweight installation - no simulation dependencies needed)
cd /path/to/lw_benchhub
pip install -e .
Note: You only need LW-BenchHub's client library in the GR00T environment, not Isaac Lab or Isaac Sim.
Step 4: Start Environment Server
In a separate terminal, activate your lw_benchhub environment (with Isaac Lab) and start the server:
# Activate lw_benchhub environment (with Isaac Lab installed)
conda activate lw_benchhub
# Navigate to lw_benchhub directory
cd /path/to/lw_benchhub
# Start environment server (no task config needed!)
python lw_benchhub/scripts/env_server.py
The server will start and display:
Waiting for connection on ('127.0.0.1', 50000)...
Press Ctrl+C to stop the server
Step 5: Run GR00T Policy Evaluation
In a new terminal, activate your GR00T environment and run the evaluation:
# Activate GR00T environment
conda activate gr00t
# Navigate to lw_benchhub directory
cd /path/to/lw_benchhub
# Run policy evaluation (env config is in the policy config file)
python lw_benchhub/scripts/policy/eval_policy.py \
--config policy/GR00T/deploy_policy_lerobot.yml
Expected Output
Connecting to environment server at 127.0.0.1:50000...
Connected successfully!
Attaching environment with config: LiftObj on robocasakitchen-9-8
[INFO-50000]: Attached environment to <ManagerBasedEnv>
Loading GR00T model...
Successfully loaded GR00T policy!
Evaluating policy: 100%|██████████| 10/10 [01:45<00:00, 10.5s/it]
Success rate: 0.9
Evaluation videos saved to: eval_result/
Environment Summary
Terminal 1 (lw_benchhub env with Isaac Lab):
└─ env_server.py → Waits for attach(), then runs simulation on GPU
Terminal 2 (GR00T env, lightweight):
└─ eval_policy.py → Sends env_cfg via attach(), runs policy inference
Communication: Zero-copy IPC via shared memory
Lifecycle: attach() → step/reset → close_connection()
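The lifecycle can be sketched with a stand-in environment class. The real LW-BenchHub client object and its exact method signatures are not shown in this guide, so everything below is an illustrative mock:

```python
class FakeRemoteEnv:
    """Stand-in for a remote environment client, illustrating the
    attach -> reset/step -> close_connection lifecycle. Method names
    mirror the lifecycle above; the real client API may differ."""

    def attach(self, env_cfg):
        # On the real server this builds the simulation from env_cfg.
        self.task = env_cfg["task"]
        self.t = 0

    def reset(self):
        self.t = 0
        return {"joint_pos": [0.0] * 6}

    def step(self, action):
        self.t += 1
        obs = {"joint_pos": action}
        done = self.t >= 3  # pretend the episode ends after 3 steps
        return obs, done

    def close_connection(self):
        self.task = None

env = FakeRemoteEnv()
env.attach({"task": "LiftObj"})   # send env_cfg, server builds the env
obs = env.reset()
done = False
while not done:
    action = [0.1] * 6            # policy inference would go here
    obs, done = env.step(action)
env.close_connection()
print(env.t)  # 3
```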
This example demonstrates the power of LW-BenchHub's decoupled architecture - you can run a complex policy like GR00T without installing the full simulation stack in the policy environment!
Summary
To evaluate a policy on LW-BenchHub tasks:
1. Prepare configuration:
- Policy config with `env_cfg` section (task, robot, cameras, etc.)
- Model checkpoint and policy-specific parameters
2. Start environment server:
python lw_benchhub/scripts/env_server.py
3. Run policy evaluation:
python lw_benchhub/scripts/policy/eval_policy.py \
--config policy/YourPolicy/deploy_policy.yml
4. Check results:
- Videos in `eval_result/video/`
- Results JSON in `eval_result/eval_results.json`
- Success rate in terminal output
Key Points:
- Environment config in policy file: All environment parameters are now specified in the policy configuration file under `env_cfg`
- Attach/Detach lifecycle: Server waits for `attach()` with a configuration and can be reconfigured without a restart
- Observation mapping: Ensure `observation_config.custom_mapping` correctly maps simulation output keys to your model's expected input keys
- Action mapping: If your model's joint order differs from the simulation, use `joint_mapping` to reorder
- Dimension matching: Verify the action dimension matches the robot DoF and the observation resolution matches the training setup
- Format consistency: Camera image format (HWC/CHW) and state vector order must align with the training data
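The last two points can be made concrete with a small sketch. It assumes `joint_mapping` is a list of source indices in the model's joint order (an assumption - the exact format is defined by your policy config), and uses nested lists in place of tensors:

```python
def hwc_to_chw(image):
    """Convert an H x W x C nested-list image to C x H x W."""
    h, w, c = len(image), len(image[0]), len(image[0][0])
    return [[[image[y][x][ch] for x in range(w)] for y in range(h)]
            for ch in range(c)]

def reorder_joints(joint_pos, joint_mapping):
    """Reorder simulation joint values into the model's joint order."""
    return [joint_pos[i] for i in joint_mapping]

image = [[[1, 2, 3], [4, 5, 6]]]   # 1 x 2 x 3 image in HWC layout
chw = hwc_to_chw(image)
print(chw)  # [[[1, 4]], [[2, 5]], [[3, 6]]]  (3 x 1 x 2, CHW)
print(reorder_joints([0.1, 0.2, 0.3], [2, 0, 1]))  # [0.3, 0.1, 0.2]
```

In practice this is done with tensor transposes and index selects, but the layout and ordering pitfalls are the same.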