# nanoPPO [![PyPI](https://img.shields.io/pypi/v/nanoPPO.svg)](https://pypi.org/project/nanoPPO/) [![Changelog](https://img.shields.io/github/v/release/jamesliu/nanoPPO?include_prereleases&label=changelog)](https://github.com/jamesliu/nanoPPO/releases) [![Tests](https://github.com/jamesliu/nanoPPO/workflows/Test/badge.svg)](https://github.com/jamesliu/nanoPPO/actions?query=workflow%3ATest) [![Documentation Status](https://readthedocs.org/projects/nanoPPO/badge/?version=stable)](http://nanoPPO.readthedocs.org/en/stable/?badge=stable) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/jamesliu/nanoPPO/blob/main/LICENSE) nanoPPO is a Python package that provides a simple and efficient implementation of the Proximal Policy Optimization (PPO) algorithm for reinforcement learning. It is designed to support both continuous and discrete action spaces, making it suitable for a wide range of applications. ## Installation You can install nanoPPO directly from PyPI using pip: ```bash pip install nanoPPO ``` Alternatively, you can clone the repository and install from source: ```bash git clone https://github.com/jamesliu/nanoPPO.git cd nanoPPO pip install . ``` ## Usage Here are examples of how to use nanoPPO to train an agent. ### On the MountaionCarContinuous-v0 environment: ```python from nanoppo.train_ppo_agent import train_agent import pickle ppo, model_file, metrics_file = train_agent( env_name="MountainCarContinuous-v0", max_episodes=50, policy_lr=0.0005, value_lr=0.0005, vl_coef=0.5, checkpoint_dir="checkpoints", checkpoint_interval=10, log_interval=10, wandb_log=False, ) ppo.load(model_file) print("Loaded best weights from", model_file) metrics = pickle.load(open(metrics_file, "rb")) print("Loaded metrics from", metrics_file) best_reward = metrics["best_reward"] episode = metrics["episode"] print("best_reward", best_reward, "episode", episode) ``` #### Use Custom LR Scheduler and Custom Policy * Set Cosine Annealing Learning Rate Scheduler * Set CausalAttention Policy instead of Linear Policy ```python from nanoppo.train_ppo_agent import train_agent from nanoppo.cosine_lr_scheduler import CosineLRScheduler from nanoppo.policy.actor_critic_causal_attention import ActorCriticCausalAttention lr_scheduler=CosineLRScheduler( learning_rate=config['cosine_lr'], warmup_iters=config['cosine_warmup_iters'], lr_decay_iters=config['cosine_decay_steps'], min_lr=config['cosine_min_lr']) policy_class = ActorCriticCausalAttention ppo, model_file, metrics_file = train_agent( env_name=env_name, env_config = env_config, max_episodes=config['max_episode'], stop_reward=config['stop_reward'], policy_class = policy_class, lr_scheduler=lr_scheduler, policy_lr=config['policy_lr'], value_lr=config['value_lr'], vl_coef=config['vl_coef'], betas = config['betas'], n_latent_var=config['n_latent_var'], gamma=config['gamma'], K_epochs=config['K_epochs'], eps_clip=config['eps_clip'], el_coef=config['el_coef'], checkpoint_dir=checkpoint_dir, checkpoint_interval=10, log_interval=10, wandb_log=wandb_log, debug=True) ``` ### On the CartPole-v1 environment: ```python from nanoppo.discrete_action_ppo import PPO import gym env = gym.make('CartPole-v1') ppo = PPO(env.observation_space.shape[0], env.action_space.n) # Training code here... ``` ## Examples See the [examples](https://github.com/jamesliu/nanoPPO/tree/main/examples) directory for more comprehensive usage examples. examples/train_mountaincar.sh ``` python nanoppo/train_ppo_agent.py --env_name=MountainCarContinuous-v0 --policy_lr=0.0005 --value_lr=0.0005 --max_episodes=50 --vl_coef=0.5 --wandb_log ``` ![mountaincar](https://github.com/jamesliu/nanoPPO/blob/main/assets/MountainCarContinuous-v0.png) examples/train_pointmass1d.sh ``` python nanoppo/train_ppo_agent.py --env_name=PointMass1D-v0 --policy_lr=0.0005 --value_lr=0.0005 --max_episodes=50 --vl_coef=0.5 --wandb_log ``` examples/train_pointmass2d.sh ``` python nanoppo/train_ppo_agent.py --env_name=PointMass2D-v0 --policy_lr=0.0005 --value_lr=0.0005 --max_episodes=50 --vl_coef=0.5 --wandb_log ``` ## Documentation Full documentation is available [here](https://nanoppo.readthedocs.io/en/latest/). ## Contributing We welcome contributions to nanoPPO! If you're interested in contributing, please see our [contribution guidelines](https://github.com/jamesliu/nanoPPO/blob/main/CONTRIBUTING.md). ## License nanoPPO is licensed under the Apache License 2.0. See the [LICENSE](https://github.com/jamesliu/nanoPPO/blob/main/LICENSE) file for more details. ## Support For support, questions, or feature requests, please open an issue on our [GitHub repository](https://github.com/jamesliu/nanoPPO/issues) or contact the maintainers. ## Changelog See the [releases](https://github.com/jamesliu/nanoPPO/releases) page for a detailed changelog of each version.