This tutorial shows how to use CleanRL to implement a training algorithm from scratch and train it on the Pistonball environment.
Implementing PPO: Train an agent using a simple PPO implementation
Advanced PPO: CleanRL’s official PPO example, with CLI, TensorBoard and WandB integration
CleanRL is a lightweight, highly modularized reinforcement learning library that provides high-quality single-file implementations with research-friendly features.
See the documentation for more information.
Examples using PettingZoo:
A key feature is CleanRL’s tight integration with Weights & Biases (WandB) for experiment tracking, hyperparameter tuning, and benchmarking. The Open RL Benchmark lets users view public leaderboards for many tasks, including videos of agents’ performance across training timesteps.
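As a rough illustration of what opt-in experiment tracking can look like in a single-file training script, the sketch below wires a command-line switch to `wandb.init`. The flag names (`--track`, `--wandb-project-name`, `--wandb-entity`), the default values, and the helper functions are assumptions made for this example, not a reproduction of any particular CleanRL script. The `wandb` package is imported only when tracking is requested, so the sketch runs without it installed.

```python
import argparse
import time


def parse_args(argv=None):
    # Hypothetical flags for this sketch; a real script may name them differently.
    parser = argparse.ArgumentParser()
    parser.add_argument("--exp-name", type=str, default="ppo_pistonball")
    parser.add_argument("--seed", type=int, default=1)
    parser.add_argument("--track", action="store_true",
                        help="if set, log this run to Weights & Biases")
    parser.add_argument("--wandb-project-name", type=str, default="pistonball-demo")
    parser.add_argument("--wandb-entity", type=str, default=None)
    return parser.parse_args(argv)


def wandb_init_kwargs(args, run_name):
    # Keyword arguments passed to wandb.init when tracking is enabled.
    return dict(
        project=args.wandb_project_name,
        entity=args.wandb_entity,
        sync_tensorboard=True,  # mirror TensorBoard scalars into the WandB run
        config=vars(args),      # record hyperparameters alongside the run
        name=run_name,
        save_code=True,         # attach the training script to the run
    )


def maybe_init_tracking(args, run_name):
    # Returns None when tracking is off, otherwise the WandB run object.
    if not args.track:
        return None
    import wandb  # lazy import: only required when --track is passed
    return wandb.init(**wandb_init_kwargs(args, run_name))


# Pass ["--track"] instead of [] to enable tracking (requires a WandB login).
args = parse_args([])
run_name = f"{args.exp_name}__{args.seed}__{int(time.time())}"
run = maybe_init_tracking(args, run_name)
```

Keeping tracking behind a flag means the same script can be run locally without a WandB account, while `sync_tensorboard=True` lets existing TensorBoard logging feed the tracked run without separate logging calls.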