Metadata-Version: 2.1
Name: rl-toolkit
Version: 3.1.1
Summary: The RL-Toolkit: a toolkit for developing and comparing reinforcement learning agents in various games (OpenAI Gym or PyBullet).
Home-page: https://github.com/markub3327/rl-toolkit
Author: Martin Kubovčík
Author-email: markub3327@gmail.com
License: MIT
Project-URL: Bug Tracker, https://github.com/markub3327/rl-toolkit/issues
Description: # RL toolkit
        
        [![Release](https://img.shields.io/github/release/markub3327/rl-toolkit)](https://github.com/markub3327/rl-toolkit/releases)
        ![Tag](https://img.shields.io/github/v/tag/markub3327/rl-toolkit)
        [![Issues](https://img.shields.io/github/issues/markub3327/rl-toolkit)](https://github.com/markub3327/rl-toolkit/issues)
        ![Commits](https://img.shields.io/github/commit-activity/w/markub3327/rl-toolkit)
        ![Languages](https://img.shields.io/github/languages/count/markub3327/rl-toolkit)
        ![Size](https://img.shields.io/github/repo-size/markub3327/rl-toolkit)
        
        ## Papers
          * [**Soft Actor-Critic**](https://arxiv.org/pdf/1812.05905.pdf)
          * [**Generalized State-Dependent Exploration**](https://arxiv.org/pdf/2005.05719.pdf)
          * [**Reverb: A framework for experience replay**](https://arxiv.org/pdf/2102.04736.pdf)
        
        ## Setting up container
        ```bash
        # Preview
        docker pull markub3327/rl-toolkit:latest
        
        # Stable
        docker pull markub3327/rl-toolkit:2.0.2
        ```
        
        ## Run
        ```bash
        # Training container (learner)
        docker run -it --rm markub3327/rl-toolkit python3 training.py [-h] -env ENV_NAME -s PATH_TO_MODEL_FOLDER [--wandb]
        
        # Simulation container (agent)
        docker run -it --rm markub3327/rl-toolkit python3 testing.py [-h] -env ENV_NAME -f PATH_TO_MODEL_FOLDER [--wandb]
        ```
        
        ## Tested environments
        
          | Environment              | Observation space | Observation bounds | Action space | Action bounds |
          | ------------------------ | :---------------: | :----------------: | :----------: | :-----------: |
          | BipedalWalkerHardcore-v3 | (24, ) | [-inf , inf] | (4, ) | [-1.0 , 1.0] |
          | Walker2DBulletEnv-v0     | (22, ) | [-inf , inf] | (6, ) | [-1.0 , 1.0] |
          | AntBulletEnv-v0          | (28, ) | [-inf , inf] | (8, ) | [-1.0 , 1.0] |
          | HalfCheetahBulletEnv-v0  | (26, ) | [-inf , inf] | (6, ) | [-1.0 , 1.0] |
          | HopperBulletEnv-v0       | (15, ) | [-inf , inf] | (3, ) | [-1.0 , 1.0] |
          | HumanoidBulletEnv-v0     | (44, ) | [-inf , inf] | (17, ) | [-1.0 , 1.0] |
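
        The shapes and bounds above can be checked directly on the environment objects. A minimal sketch, assuming `gym` and `pybullet_envs` are installed (importing `pybullet_envs` registers the Bullet environment IDs):

        ```python
        import gym
        import pybullet_envs  # noqa: F401 -- side effect: registers the *BulletEnv-v0 IDs

        env = gym.make("Walker2DBulletEnv-v0")
        print(env.observation_space.shape)                   # (22,)
        print(env.action_space.shape)                        # (6,)
        print(env.action_space.low, env.action_space.high)   # [-1. ... -1.] [1. ... 1.]
        env.close()
        ```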
        
        
        ## Results
        
        <p align="center"><b>Summary</b></p>
        <p align="center">
          <a href="https://wandb.ai/markub/rl-toolkit?workspace=user-markub" target="_blank"><img src="img/results.png" alt="results"></a>
        </p>
        
        <p align="center"><b>Return from game</b></p>
        
          | Environment              | gSDE | gSDE<br>+ Huber loss |
          | ------------------------ | :---: | :-----------------: |
          | BipedalWalkerHardcore-v3[<sup>(2)</sup>](https://sb3-contrib.readthedocs.io/en/stable/modules/tqc.html#results) | 13 ± 18 | - |
          | Walker2DBulletEnv-v0[<sup>(1)</sup>](https://paperswithcode.com/paper/generalized-state-dependent-exploration-for)     | 2270 ± 28 | **2732 ± 96** |
          | AntBulletEnv-v0[<sup>(1)</sup>](https://paperswithcode.com/paper/generalized-state-dependent-exploration-for)          | 3106 ± 61 | **3460 ± 119** |
          | HalfCheetahBulletEnv-v0[<sup>(1)</sup>](https://paperswithcode.com/paper/generalized-state-dependent-exploration-for)  | 2945 ± 95 | **3003 ± 226** |
          | HopperBulletEnv-v0[<sup>(1)</sup>](https://paperswithcode.com/paper/generalized-state-dependent-exploration-for)       | 2515 ± 50 | **2555 ± 405** |
          | HumanoidBulletEnv-v0 | - | - |
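
        The *gSDE + Huber loss* column replaces the usual mean-squared-error critic loss with the Huber loss, which is less sensitive to outlier TD targets. A minimal sketch of that substitution in TensorFlow (the tensors and `delta` are illustrative, not the toolkit's exact code):

        ```python
        import tensorflow as tf

        huber = tf.keras.losses.Huber(delta=1.0)   # quadratic near zero, linear in the tails

        # Hypothetical critic predictions and TD targets; note the outlier target.
        q_pred = tf.constant([2.0, -1.5, 0.3])
        q_target = tf.constant([2.5, -1.0, 5.0])

        mse_loss = tf.reduce_mean(tf.square(q_target - q_pred))
        huber_loss = huber(q_target, q_pred)
        print(float(mse_loss), float(huber_loss))  # the Huber loss down-weights the outlier
        ```
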
        ----------------------------------
        
        **Frameworks:** TensorFlow, Reverb, OpenAI Gym, PyBullet, WandB, OpenCV
        <br>
        **Languages:** Python, Shell
        <br>
        **Author**: Martin Kubovčík
        
        
        # v3.0.7 (June 1, 2021)
        ## Features 🔊
        - Reverb
        - updated `kernel_initializer` of the last layers
        - removed clipping of the mean
        - setup.py (package is available on PyPI)
        - split the research process into **agent**, **learner** and **tester** roles (see the sketch after this list)
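
        Reverb provides the networked experience replay that such agent/learner roles can share. A minimal, self-contained sketch of the pattern (the table name, sizes and dummy transition are illustrative, not the toolkit's configuration):

        ```python
        import numpy as np
        import reverb

        # Replay server with one table: uniform sampling, FIFO eviction.
        server = reverb.Server(tables=[
            reverb.Table(
                name="experience",
                sampler=reverb.selectors.Uniform(),
                remover=reverb.selectors.Fifo(),
                max_size=100_000,
                rate_limiter=reverb.rate_limiters.MinSize(1),
            )
        ])
        client = reverb.Client(f"localhost:{server.port}")

        # Agent side: insert a dummy (state, action, reward) transition.
        client.insert(
            [np.zeros(24, np.float32), np.zeros(4, np.float32), np.float32(0.0)],
            priorities={"experience": 1.0},
        )

        # Learner side: sample one item back from the table.
        for sample in client.sample("experience", num_samples=1):
            print(sample)

        server.stop()
        ```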
        
        <br>
        
        # v2.0.2 (May 23, 2021)
        ## Bug fixes 🛠️
        - update Dockerfile
        - update README.md
        - formatted code with Black & Flake8
        
        <br>
        
        # v2.0.1 (April 27, 2021)
        ## Bug fixes 🛠️
        - fix Critic model
        
        <br>
        
        # v2.0.0 (April 22, 2021)
        ## Features 🔊
        - Add Huber loss
        - Render to a video file in test mode
        - Normalize observations by the min-max method (see the sketch after this list)
        - Remove the TD3 algorithm
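
        Min-max normalization rescales each observation dimension from its known bounds into a fixed target range before it reaches the networks. A minimal sketch of the idea (the helper name, bounds and target range are illustrative, not the toolkit's exact code):

        ```python
        import numpy as np

        def min_max_normalize(obs, low, high, target_low=-1.0, target_high=1.0):
            """Rescale `obs` from [low, high] to [target_low, target_high] per dimension."""
            unit = (obs - low) / (high - low)                    # -> [0, 1]
            return target_low + unit * (target_high - target_low)

        obs = np.array([0.5, 3.0, -2.0])
        low = np.array([-1.0, 0.0, -5.0])
        high = np.array([1.0, 5.0, 5.0])
        print(min_max_normalize(obs, low, high))                 # values in [-1, 1]
        ```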
Keywords: reinforcement-learning,ml,openai-gym,pybullet,reverb,docker,rl-agents,rl,sac,rl-algorithms,soft-actor-critic,gsde,rl-toolkit,games,tensorflow,wandb
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Environment :: Console
Classifier: Environment :: GPU :: NVIDIA CUDA
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Education
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Python: >=3.7
Description-Content-Type: text/markdown
