CleanRL Tutorial

This tutorial shows how to use CleanRL to implement a training algorithm from scratch and use it to train agents on the Pistonball environment.

  • Implementing PPO: Train an agent using a simple PPO implementation

  • Advanced PPO: CleanRL’s official PPO example, with CLI, TensorBoard and WandB integration
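As a rough orientation for both tutorials, the core of PPO is its clipped surrogate objective. The sketch below is plain Python and is not taken from CleanRL's code: the function names `clipped_surrogate` and `ppo_policy_loss` and the default `clip_coef=0.2` are assumptions made for illustration only.

```python
def clipped_surrogate(ratio, advantage, clip_coef=0.2):
    """Per-sample PPO clipped surrogate term (to be maximized).

    ratio      -- new_policy_prob / old_policy_prob for the taken action
    advantage  -- advantage estimate for that action
    clip_coef  -- clipping range epsilon (0.2 is an assumed default)
    """
    # Keep the probability ratio inside [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - clip_coef, min(ratio, 1.0 + clip_coef))
    # Take the more pessimistic of the unclipped and clipped objectives.
    return min(ratio * advantage, clipped_ratio * advantage)


def ppo_policy_loss(ratios, advantages, clip_coef=0.2):
    """Mean negative clipped surrogate over a batch (a loss to minimize)."""
    terms = [
        clipped_surrogate(r, a, clip_coef)
        for r, a in zip(ratios, advantages)
    ]
    return -sum(terms) / len(terms)


if __name__ == "__main__":
    # A ratio above 1 + eps with a positive advantage is capped at 1 + eps.
    print(clipped_surrogate(1.5, 1.0))
    print(ppo_policy_loss([1.5, 0.5], [1.0, -1.0]))
```

A real implementation would compute the same quantity on batched tensors and combine it with value and entropy terms; this version only illustrates the clipping logic.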

CleanRL Overview

CleanRL is a lightweight reinforcement learning library that provides high-quality single-file implementations with research-friendly features.

See the documentation for more information.

Examples using PettingZoo:

WandB Integration

A key feature of CleanRL is its tight integration with Weights & Biases (WandB) for experiment tracking, hyperparameter tuning, and benchmarking. The Open RL Benchmark lets users view public leaderboards for many tasks, including videos of agents’ performance across training timesteps.

CleanRL integration with Weights & Biases
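The tracking setup described above can be sketched as an opt-in command-line flag. This is a minimal sketch, not CleanRL's actual script: the helper names (`parse_args`, `make_run_name`, `init_tracking`), the flag names, and the default experiment and project names are assumptions for illustration. It assumes the `wandb` package is installed when `--track` is passed, and uses only `wandb.init` with its `project`, `name`, `config`, and `sync_tensorboard` arguments.

```python
import argparse
import time


def parse_args(argv=None):
    """Parse a small, hypothetical set of experiment flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--exp-name", default="ppo_pistonball")  # assumed name
    parser.add_argument("--seed", type=int, default=1)
    parser.add_argument("--track", action="store_true",
                        help="log this run to Weights & Biases")
    parser.add_argument("--wandb-project-name",
                        default="cleanrl-pettingzoo")  # assumed project name
    return parser.parse_args(argv)


def make_run_name(args, timestamp=None):
    """Build a run name from the experiment name, seed, and a timestamp."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{args.exp_name}__{args.seed}__{timestamp}"


def init_tracking(args, run_name):
    """Start a WandB run only when --track was given; otherwise do nothing."""
    if not args.track:
        return None
    import wandb  # imported lazily so untracked runs do not need the package

    return wandb.init(
        project=args.wandb_project_name,
        name=run_name,
        config=vars(args),      # record the flags as run hyperparameters
        sync_tensorboard=True,  # mirror TensorBoard scalars to WandB
    )


if __name__ == "__main__":
    args = parse_args()
    run_name = make_run_name(args)
    init_tracking(args, run_name)
    print(f"run name: {run_name}, tracking enabled: {args.track}")
```

Keeping the `wandb` import inside `init_tracking` means the script still runs without the package when tracking is off, which is convenient for quick local experiments.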