CleanRL Tutorial

This tutorial shows how to use CleanRL to implement a training algorithm from scratch and train it on the Pistonball environment.

Implementing PPO: Train an agent using a simple PPO implementation
Advanced PPO: CleanRL’s official PPO example, with CLI, TensorBoard and WandB integration
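As a rough illustration of what the PPO tutorials build toward, the clipped surrogate policy loss at the core of PPO can be sketched in plain Python. This is a minimal sketch, not CleanRL's code: the function name `ppo_clip_loss`, its arguments, and the default `clip_coef=0.2` are assumptions made for illustration, and a real implementation would operate on batched tensors.

```python
import math


def ppo_clip_loss(new_logprobs, old_logprobs, advantages, clip_coef=0.2):
    """Mean PPO clipped surrogate loss over a batch (hypothetical helper).

    The value is negated so that minimizing it maximizes the PPO objective.
    """
    losses = []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_coef, 1 + clip_coef].
        clipped_ratio = max(1.0 - clip_coef, min(ratio, 1.0 + clip_coef))
        # PPO takes the pessimistic (smaller) of the two surrogate terms.
        losses.append(-min(ratio * adv, clipped_ratio * adv))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # The policy doubled the probability of an action with positive advantage,
    # so the clipped term limits how much credit the update receives.
    print(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]))
```

When the ratio moves outside the clip range in the direction that would increase the objective, the clipped term is selected, which removes the incentive to push the policy further from the one that collected the data.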

CleanRL Overview

CleanRL is a lightweight reinforcement learning library that provides high-quality single-file implementations with research-friendly features. Each algorithm lives in its own self-contained script rather than being split across shared modules.

See the documentation for more information.

Examples using PettingZoo:

PPO PettingZoo Atari example

WandB Integration

A key feature of CleanRL is its tight integration with Weights & Biases (WandB) for experiment tracking, hyperparameter tuning, and benchmarking. The Open RL Benchmark lets users view public leaderboards for many tasks, including videos of agents’ performance across training timesteps.

CleanRL integration with Weights & Biases
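The sketch below shows one way a training script might opt in to WandB tracking. It assumes the `wandb` package is installed when tracking is enabled; the helper names `build_wandb_kwargs` and `init_tracking`, and the project and run names, are hypothetical and chosen only for illustration. The keyword arguments themselves (`project`, `entity`, `name`, `config`, `sync_tensorboard`, `monitor_gym`, `save_code`) are parameters of `wandb.init`.

```python
def build_wandb_kwargs(project, run_name, config, entity=None):
    """Collect keyword arguments for wandb.init (hypothetical helper)."""
    return {
        "project": project,
        "entity": entity,
        "name": run_name,
        "config": dict(config),    # hyperparameters recorded with the run
        "sync_tensorboard": True,  # mirror TensorBoard scalars to WandB
        "monitor_gym": True,       # upload recorded environment videos
        "save_code": True,         # store the training script with the run
    }


def init_tracking(track, **kwargs):
    """Start a WandB run only when tracking is requested (hypothetical helper)."""
    if not track:
        return None
    import wandb  # imported lazily so untracked runs do not require wandb

    return wandb.init(**build_wandb_kwargs(**kwargs))


if __name__ == "__main__":
    # With track=False nothing is uploaded and wandb is never imported.
    run = init_tracking(
        False,
        project="pistonball-ppo",  # assumed project name
        run_name="ppo-seed-1",     # assumed run name
        config={"learning_rate": 2.5e-4, "seed": 1},
    )
    print(run)
```

Keeping tracking behind a flag means the same script can run offline, while enabling it sends the TensorBoard metrics and videos to a WandB project.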