RL Configuration Guide (with LiftObj as an example)
This page explains how reinforcement learning (RL) environments are defined and configured in LW-BenchHub, using the LiftObj task as a concrete example. It covers how robot and task RL configs are composed, where the configurations live, and how to train and evaluate with SKRL.
High-level composition
- An RL environment is composed from:
  - A robot RL variant (actions, frames, sensors) under lw_benchhub/core/robots/<robot>/
  - A task RL environment config (observations, rewards, events, curriculum) under lw_benchhub_rl/<task>/
  - A scene/layout (kitchen variants or USD)
- Both robot and task RL configs are registered in lw_benchhub_rl/__init__.py using a helper, which also links the SKRL agent config (YAML).
Where things are (LiftObj)
- Task RL env config (Python): lw_benchhub_rl/lift_obj/lift_obj.py
- Robot RL variant (Python): lw_benchhub/core/robots/unitree/g1.py (class UnitreeG1HandEnvRLCfg)
- RL environment registration: lw_benchhub_rl/__init__.py
- Global RL presets (YAML): configs/rl/skrl/rl_base.yml
- Task preset (YAML): configs/rl/skrl/g1_liftobj.yaml
- SKRL agent config (YAML): lw_benchhub_rl/lift_obj/agents/skrl_ppo_cfg.yaml
- Training/Eval scripts: lw_benchhub/scripts/rl/train.py, lw_benchhub/scripts/rl/play.py, shell wrappers train.sh, eval.sh
Robot RL variant (G1-RL)
The robot RL variant adapts the robot to RL control (e.g., action heads as joint targets, defining end-effector frames). For Unitree G1-Right-Hand:
File: lw_benchhub/core/robots/unitree/g1.py
Key points in UnitreeG1HandEnvRLCfg:
- Sets robot_name = "G1-RL" for RL usage
- Replaces teleop actions with an RL action head (e.g., JointPositionActionCfg for the right arm joints)
- Defines ee_frame (FrameTransformer) with targets for the TCP and fingertips, used in observations/rewards
- Sets reward-related joint name patterns for the gripper and arm
Effect: The robot provides the correct action/observation interfaces for the RL task and rewards.
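For orientation, the sketch below shows what such a variant can look like using Isaac Lab's action and frame-transformer config classes. The class name, joint-name pattern, prim paths, and scale values are placeholders for illustration, not the actual definitions in lw_benchhub/core/robots/unitree/g1.py.
from isaaclab.envs.mdp import JointPositionActionCfg
from isaaclab.sensors import FrameTransformerCfg
from isaaclab.utils import configclass


@configclass
class ExampleArmRLVariantCfg:
    # Placeholder sketch of a robot RL variant: an RL action head plus an end-effector frame.
    robot_name: str = "G1-RL"

    # RL action head: joint-position targets for the arm (joint pattern is a placeholder)
    arm_action = JointPositionActionCfg(
        asset_name="robot",
        joint_names=["right_arm_.*"],
        scale=0.5,
        use_default_offset=True,
    )

    # End-effector frame consumed by observation/reward terms (prim paths are placeholders)
    ee_frame = FrameTransformerCfg(
        prim_path="{ENV_REGEX_NS}/Robot/base_link",
        target_frames=[
            FrameTransformerCfg.FrameCfg(
                prim_path="{ENV_REGEX_NS}/Robot/right_wrist_yaw_link",
                name="tcp",
            ),
        ],
    )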
Task RL env config (LiftObj)
File: lw_benchhub_rl/lift_obj/lift_obj.py
The LiftObj RL config composes the base task with RL-specific managers.
Commands
@configclass
class CommandsCfg:
object_pose = mdp.UniformPoseCommandCfg(
asset_name="robot",
        body_name=MISSING, # set in the composed RL config (see below)
resampling_time_range=(5.0, 5.0),
debug_vis=False,
ranges=mdp.UniformPoseCommandCfg.Ranges(
pos_x=(-0.05, 0.05), pos_y=(-0.20, -0.05), pos_z=(0.3, 0.35),
roll=(0.0, 0.0), pitch=(0.0, 0.0), yaw=(0.0, 0.0)
),
)
The task samples a target object pose command, object_pose. In the composed RL config (G1LiftObjStateRL, shown below), the command's body_name is bound to a robot link, e.g. right_wrist_yaw_link.
Policy Observations
@configclass
class G1LiftObjStateObsCfg(RlBasePolicyObservationCfg):
joint_pos = ObsTerm(func=mdp.joint_pos_rel)
joint_vel = ObsTerm(func=mdp.joint_vel_rel)
target_object_position = ObsTerm(
func=mdp.generated_commands, params={"command_name": "object_pose"}
)
actions = ObsTerm(func=mdp.last_action)
image_hand = ObsTerm(
func=mdp.image_features,
params={"sensor_cfg": SceneEntityCfg("hand_camera"), "data_type": "rgb", "model_name": "resnet18", "model_device": "cuda:0"},
)
rgb_d435 = ObsTerm(
func=mdp.image_features,
params={"sensor_cfg": SceneEntityCfg("d435_camera"), "data_type": "rgb", "model_name": "resnet18", "model_device": "cuda:0"},
)
def __post_init__(self):
self.enable_corruption = True
self.concatenate_terms = True
Notes:
- Proprioceptive terms use relative joint pos/vel
- generated_commands injects the sampled object_pose command
- Two image feature terms are provided (hand camera and D435)
- Observations are concatenated for the policy
Rewards
@configclass
class RewardsCfg:
reaching_object = RewTerm(func=mdp.object_ee_distance, params={"std": 0.1}, weight=1.0)
lifting_object = RewTerm(func=mdp.object_is_lifted, params={"minimal_height": 0.98}, weight=15.0)
lifting_object_grasped = RewTerm(
func=mdp.object_is_lifted_grasped,
params={"grasp_threshold": 0.08, "velocity_threshold": 0.15},
weight=8.0,
)
gripper_close_action_reward = RewTerm(
func=mdp.gripper_close_action_reward,
weight=1.0,
params={"asset_cfg": SceneEntityCfg("robot", joint_names=["right_hand_.*"])},
)
object_goal_tracking = RewTerm(
func=mdp.object_goal_distance,
params={"std": 0.3, "minimal_height": 0.97, "command_name": "object_pose"},
weight=16.0,
)
action_rate = RewTerm(func=mdp.action_rate_l2, weight=-1e-4)
joint_vel = RewTerm(func=mdp.joint_vel_l2, weight=-1e-4, params={"asset_cfg": SceneEntityCfg("robot")})
Notes:
- Dense shaping includes proximity (object_ee_distance) and goal tracking (object_goal_distance); a generic distance-kernel sketch follows this list
- Success-related terms check lifting height and near-grasp conditions
- Regularizers penalize fast actions and joint velocities
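For intuition, proximity terms such as object_ee_distance are typically a bounded kernel of the end-effector-to-object distance. The function below is a generic sketch of that pattern, not necessarily the exact lw_benchhub implementation.
import torch


def distance_kernel_reward(ee_pos: torch.Tensor, obj_pos: torch.Tensor, std: float = 0.1) -> torch.Tensor:
    """Bounded proximity reward in [0, 1]: close to 1 when the end-effector reaches
    the object and decaying smoothly with distance; std sets the decay scale
    (the same role as params={"std": 0.1} above)."""
    dist = torch.norm(obj_pos - ee_pos, dim=-1)  # per-env Euclidean distance
    return 1.0 - torch.tanh(dist / std)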
Events and Curriculum
@configclass
class EventCfg:
reset_all = EventTerm(func=mdp.reset_scene_to_default, mode="reset")
reset_object_position = EventTerm(
func=mdp.reset_root_state_uniform, mode="reset",
params={"pose_range": {"x": (-0.05, 0.05), "y": (-0.05, 0.05), "z": (0.0, 0.0), "yaw": (0.0, 90.0)},
"velocity_range": {}, "asset_cfg": SceneEntityCfg("object", body_names="apple_11")},
)
@configclass
class CurriculumCfg:
action_rate = CurrTerm(func=mdp.modify_reward_weight, params={"term_name": "action_rate", "weight": -5e-3, "num_steps": 36000})
joint_vel = CurrTerm(func=mdp.modify_reward_weight, params={"term_name": "joint_vel", "weight": -5e-3, "num_steps": 36000})
Compose the RL config
@rl_on(task=LiftObj)
@rl_on(embodiment="G1-RL")
class G1LiftObjStateRL(LwRL):
def __init__(self):
super().__init__()
# define or reuse cfg here
self.rewards_cfg = RewardsCfg()
self.events_cfg = EventCfg()
self.curriculum_cfg = CurriculumCfg()
self.commands_cfg = CommandsCfg()
self.policy_observation_cfg = G1LiftObjStateObsCfg()
self.resample_objects_placement_on_reset = False
self.resample_robot_placement_on_reset = False
def setup_env_config(self, orchestrator):
super().setup_env_config(orchestrator)
# G1 right eef name
orchestrator.task.commands_cfg.object_pose.body_name = "right_wrist_yaw_link"
Terminations
BaseRLEnvCfg includes a default time_out termination. The training scripts disable this time-limit termination at runtime and handle episode rollouts explicitly.
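For reference, clearing a manager term is the usual way to disable it in Isaac Lab. The helper below is a minimal sketch assuming the standard ManagerBasedRLEnvCfg layout, not lw_benchhub's exact training script.
from isaaclab.envs import ManagerBasedRLEnvCfg


def disable_time_out(env_cfg: ManagerBasedRLEnvCfg) -> None:
    """Clear the default time-out termination; manager terms set to None are skipped,
    so episode length is then controlled by the training script's rollout handling."""
    if hasattr(env_cfg.terminations, "time_out"):
        env_cfg.terminations.time_out = None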
Registration and presets
RL environment registration
File: lw_benchhub_rl/__init__.py
gym.register(
id="Robocasa-Rl-RobotTaskRL",
entry_point="isaaclab.envs:ManagerBasedRLEnv",
kwargs={
"env_cfg_entry_point": f"{__name__}.lift_obj.lift_obj:G1LiftObjStateRL",
"skrl_cfg_entry_point": f"{lift_obj_agents.__name__}:skrl_ppo_cfg.yaml"
"rsl_rl_cfg_entry_point": f"{lift_obj_agents.__name__}:rsl_rl_ppo_cfg:PPORunnerCfg"
},
disable_env_checker=True,
)
This registers the composed environment and links the SKRL agent YAML (plus an RSL-RL runner config entry point).
Global preset: rl_base.yml
File: configs/rl/skrl/rl_base.yml
Important fields:
- disable_fabric: if false, the env uses Fabric (faster sim I/O)
- num_envs, layout, usd_simplify, robot_scale, first_person_view
- rl: the RL config name binding for the robot-task combination
- Logging/video controls: video, video_length, video_interval
- Training controls: seed, distributed, max_iterations
- ML framework and algorithm selection: ml_framework, algorithm
- Checkpoint: checkpoint, use_pretrained_checkpoint, real_time
- enable_cameras: enable cameras in the env (required by Visual variants)
- variant: RL variant name (e.g., State or Visual), used when registering RL configs
Task preset: g1_liftobj.yaml
File: configs/rl/skrl/g1_liftobj.yaml
Overrides and bindings:
- _base_: rl_base
- task: LiftObj
- robot: G1-RL
- layout: robocasakitchen-1-8
- rl: G1LiftObjStateRL
- num_envs: 200
- usd_simplify: true
- variant: State (defaults to the state-based LiftObj)
- enable_cameras: false (set true for the Visual variant)
SKRL agent configuration (LiftObj)
File: lw_benchhub_rl/lift_obj/agents/skrl_ppo_cfg.yaml
Key sections:
- models: policy/value network definitions; GaussianMixin is used for continuous actions.
- memory: rollout buffer backend (RandomMemory) with memory_size = -1 to follow the agent's rollouts.
- agent (PPO): core hyperparameters:
  - rollouts, learning_epochs, mini_batches
  - discount_factor (gamma), lambda (GAE lambda)
  - learning_rate and scheduler (KLAdaptiveLR with kl_threshold)
  - ratio_clip, value_clip, entropy_loss_scale, value_loss_scale
  - rewards_shaper_scale: multiplies rewards before updates
  - experiment: logging/checkpoint frequency and directory
- trainer: training driver and timesteps for total steps
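To make the mapping concrete, the sketch below shows how these YAML fields correspond to skrl's Python PPO configuration. All numeric values are placeholders, not the values shipped in skrl_ppo_cfg.yaml.
import copy

from skrl.agents.torch.ppo import PPO_DEFAULT_CONFIG
from skrl.resources.schedulers.torch import KLAdaptiveLR

cfg = copy.deepcopy(PPO_DEFAULT_CONFIG)
cfg["rollouts"] = 24                      # env steps collected per update (placeholder)
cfg["learning_epochs"] = 8
cfg["mini_batches"] = 4
cfg["discount_factor"] = 0.99             # gamma
cfg["lambda"] = 0.95                      # GAE lambda
cfg["learning_rate"] = 3e-4
cfg["learning_rate_scheduler"] = KLAdaptiveLR
cfg["learning_rate_scheduler_kwargs"] = {"kl_threshold": 0.01}
cfg["ratio_clip"] = 0.2
cfg["value_clip"] = 0.2
cfg["entropy_loss_scale"] = 0.01
cfg["value_loss_scale"] = 1.0
cfg["rewards_shaper"] = lambda rewards, *args: rewards * 0.1   # roughly what rewards_shaper_scale corresponds to
cfg["experiment"]["write_interval"] = 100          # logging frequency (placeholder)
cfg["experiment"]["checkpoint_interval"] = 1000    # checkpoint frequency (placeholder)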
Practical tips:
- Effective batch = num_envs * rollouts (per update); increase it as GPU memory allows (see the arithmetic sketch after this list).
- Use a KL-based LR scheduler to stabilize large-batch PPO.
- Increase entropy_loss_scale for more exploration; tune it down later.
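As a concrete instance of the batch arithmetic (num_envs comes from g1_liftobj.yaml; the other values are placeholders, not the shipped hyperparameters):
num_envs = 200                                   # from configs/rl/skrl/g1_liftobj.yaml
rollouts = 24                                    # placeholder; see skrl_ppo_cfg.yaml
mini_batches = 4                                 # placeholder

samples_per_update = num_envs * rollouts              # 4800 transitions per PPO update
minibatch_size = samples_per_update // mini_batches   # 1200 samples per gradient step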
Training and evaluation
Default run via scripts:
bash train.sh # uses configs/rl/skrl/g1_liftobj_state.yaml by default
bash eval.sh # evaluate the latest checkpoint
Or directly via Python:
python ./lw_benchhub/scripts/rl/train.py --task_config g1_liftobj_state --headless
python ./lw_benchhub/scripts/rl/play.py --task_config g1_liftobj_state
Troubleshooting
- OOM / low FPS: reduce num_envs, disable cameras, set usd_simplify: true.
- Instability: lower learning_rate, increase learning_epochs, reduce mini_batches.
- Poor exploration: increase entropy_loss_scale or relax reward shaping initially.
- Reward plateaus: verify object_pose command ranges, minimal_height thresholds, and camera feature devices.