# -*- coding: utf-8 -*-
from setuptools import setup

packages = \
['pytorch_optimizer',
 'pytorch_optimizer.base',
 'pytorch_optimizer.experimental',
 'pytorch_optimizer.lr_scheduler',
 'pytorch_optimizer.optimizer']

package_data = \
{'': ['*']}

install_requires = \
['torch>=1.10,<2.0']

extras_require = \
{':python_version >= "3.7" and python_version < "3.8"': ['numpy==1.21.1'],
 ':python_version >= "3.8"': ['numpy']}

setup_kwargs = {
    'name': 'pytorch-optimizer',
    'version': '2.1.1',
    'description': 'Bunch of optimizer implementations in PyTorch with clean-code, strict types. Also, including useful optimization ideas.',
    'long_description': '=================\npytorch-optimizer\n=================\n\n+--------------+------------------------------------------+\n| Build        | |workflow| |Documentation Status|        |\n+--------------+------------------------------------------+\n| Quality      | |codecov| |black|                        |\n+--------------+------------------------------------------+\n| Package      | |PyPI version| |PyPI pyversions|         |\n+--------------+------------------------------------------+\n| Status       | |PyPi download| |PyPi month download|    |\n+--------------+------------------------------------------+\n\n| **pytorch-optimizer** is bunch of optimizer collections in PyTorch. Also, including useful optimization ideas.\n| Most of the implementations are based on the original paper, but I added some tweaks.\n| Highly inspired by `pytorch-optimizer <https://github.com/jettify/pytorch-optimizer>`__.\n\nDocumentation\n-------------\n\nhttps://pytorch-optimizers.readthedocs.io/en/latest/\n\nUsage\n-----\n\nInstall\n~~~~~~~\n\n::\n\n    $ pip3 install -U pytorch-optimizer\n\nor\n\n::\n\n    $ pip3 install -U --no-deps pytorch-optimizer\n\nSimple Usage\n~~~~~~~~~~~~\n\n::\n\n    from pytorch_optimizer import AdamP\n\n    model = YourModel()\n    optimizer = AdamP(model.parameters())\n\n    # or you can use optimizer loader, simply passing a name of the optimizer.\n\n    from pytorch_optimizer import load_optimizer\n\n    model = YourModel()\n    opt = load_optimizer(optimizer=\'adamp\')\n    optimizer = opt(model.parameters())\n\nAlso, you can load the optimizer via `torch.hub`\n\n::\n\n    import torch\n\n    model = YourModel()\n    opt = torch.hub.load(\'kozistr/pytorch_optimizer\', \'adamp\')\n    optimizer = opt(model.parameters())\n\n\nAnd you can check the supported optimizers & lr schedulers.\n\n::\n\n    from pytorch_optimizer import get_supported_optimizers, get_supported_lr_schedulers\n\n    supported_optimizers = get_supported_optimizers()\n    supported_lr_schedulers = get_supported_lr_schedulers()\n\n\nSupported Optimizers\n--------------------\n\n+--------------+----------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+\n| Optimizer    | Description                                                                            | Official Code                                                                     | Paper                                                                                         |\n+==============+========================================================================================+===================================================================================+===============================================================================================+\n| AdaBelief    | *Adapting Step-sizes by the Belief in Observed Gradients*                              | `github <https://github.com/juntang-zhuang/Adabelief-Optimizer>`__                | `https://arxiv.org/abs/2010.07468 <https://arxiv.org/abs/2010.07468>`__                       |\n+--------------+----------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+\n| AdaBound     | *Adaptive Gradient Methods with Dynamic Bound of Learning Rate*                        | `github <https://github.com/Luolc/AdaBound/blob/master/adabound/adabound.py>`__   | `https://openreview.net/forum?id=Bkg3g2R9FX <https://openreview.net/forum?id=Bkg3g2R9FX>`__   |\n+--------------+----------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+\n| AdaHessian   | *An Adaptive Second Order Optimizer for Machine Learning*                              | `github <https://github.com/amirgholami/adahessian>`__                            | `https://arxiv.org/abs/2006.00719 <https://arxiv.org/abs/2006.00719>`__                       |\n+--------------+----------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+\n| AdamD        | *Improved bias-correction in Adam*                                                     |                                                                                   | `https://arxiv.org/abs/2110.10828 <https://arxiv.org/abs/2110.10828>`__                       |\n+--------------+----------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+\n| AdamP        | *Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights*         | `github <https://github.com/clovaai/AdamP>`__                                     | `https://arxiv.org/abs/2006.08217 <https://arxiv.org/abs/2006.08217>`__                       |\n+--------------+----------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+\n| diffGrad     | *An Optimization Method for Convolutional Neural Networks*                             | `github <https://github.com/shivram1987/diffGrad>`__                              | `https://arxiv.org/abs/1909.11015v3 <https://arxiv.org/abs/1909.11015v3>`__                   |\n+--------------+----------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+\n| MADGRAD      | *A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic*               | `github <https://github.com/facebookresearch/madgrad>`__                          | `https://arxiv.org/abs/2101.11075 <https://arxiv.org/abs/2101.11075>`__                       |\n+--------------+----------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+\n| RAdam        | *On the Variance of the Adaptive Learning Rate and Beyond*                             | `github <https://github.com/LiyuanLucasLiu/RAdam>`__                              | `https://arxiv.org/abs/1908.03265 <https://arxiv.org/abs/1908.03265>`__                       |\n+--------------+----------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+\n| Ranger       | *a synergistic optimizer combining RAdam and LookAhead, and now GC in one optimizer*   | `github <https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer>`__          | `https://bit.ly/3zyspC3 <https://bit.ly/3zyspC3>`__                                           |\n+--------------+----------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+\n| Ranger21     | *a synergistic deep learning optimizer*                                                | `github <https://github.com/lessw2020/Ranger21>`__                                | `https://arxiv.org/abs/2106.13731 <https://arxiv.org/abs/2106.13731>`__                       |\n+--------------+----------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+\n| Lamb         | *Large Batch Optimization for Deep Learning*                                           | `github <https://github.com/cybertronai/pytorch-lamb>`__                          | `https://arxiv.org/abs/1904.00962 <https://arxiv.org/abs/1904.00962>`__                       |\n+--------------+----------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+\n| Shampoo      | *Preconditioned Stochastic Tensor Optimization*                                        | `github <https://github.com/moskomule/shampoo.pytorch>`__                         | `https://arxiv.org/abs/1802.09568 <https://arxiv.org/abs/1802.09568>`__                       |\n+--------------+----------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+\n| Nero         | *Learning by Turning: Neural Architecture Aware Optimisation*                          | `github <https://github.com/jxbz/nero>`__                                         | `https://arxiv.org/abs/2102.07227 <https://arxiv.org/abs/2102.07227>`__                       |\n+--------------+----------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+\n| Adan         | *Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models*               | `github <https://github.com/sail-sg/Adan>`__                                      | `https://arxiv.org/abs/2208.06677 <https://arxiv.org/abs/2208.06677>`__                       |\n+--------------+----------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+\n| Adai         | *Disentangling the Effects of Adaptive Learning Rate and Momentum*                     | `github <https://github.com/zeke-xie/adaptive-inertia-adai>`__                    | `https://arxiv.org/abs/2006.15815 <https://arxiv.org/abs/2006.15815>`__                       |\n+--------------+----------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+\n\nUseful Resources\n----------------\n\nSeveral optimization ideas to regularize & stabilize the training. Most\nof the ideas are applied in ``Ranger21`` optimizer.\n\nAlso, most of the captures are taken from ``Ranger21`` paper.\n\n+------------------------------------------+---------------------------------------------+--------------------------------------------+\n| `Adaptive Gradient Clipping`_            | `Gradient Centralization`_                  | `Softplus Transformation`_                 |\n+------------------------------------------+---------------------------------------------+--------------------------------------------+\n| `Gradient Normalization`_                | `Norm Loss`_                                | `Positive-Negative Momentum`_              |\n+------------------------------------------+---------------------------------------------+--------------------------------------------+\n| `Linear learning rate warmup`_           | `Stable weight decay`_                      | `Explore-exploit learning rate schedule`_  |\n+------------------------------------------+---------------------------------------------+--------------------------------------------+\n| `Lookahead`_                             | `Chebyshev learning rate schedule`_         | `(Adaptive) Sharpness-Aware Minimization`_ |\n+------------------------------------------+---------------------------------------------+--------------------------------------------+\n| `On the Convergence of Adam and Beyond`_ | `Gradient Surgery for Multi-Task Learning`_ |                                            |\n+------------------------------------------+---------------------------------------------+--------------------------------------------+\n\nAdaptive Gradient Clipping\n--------------------------\n\n| This idea originally proposed in ``NFNet (Normalized-Free Network)`` paper.\n| ``AGC (Adaptive Gradient Clipping)`` clips gradients based on the ``unit-wise ratio of gradient norms to parameter norms``.\n\n-  code : `github <https://github.com/deepmind/deepmind-research/tree/master/nfnets>`__\n-  paper : `arXiv <https://arxiv.org/abs/2102.06171>`__\n\nGradient Centralization\n-----------------------\n\n+-----------------------------------------------------------------------------------------------------------------+\n| .. image:: https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/gradient_centralization.png  |\n+-----------------------------------------------------------------------------------------------------------------+\n\n``Gradient Centralization (GC)`` operates directly on gradients by centralizing the gradient to have zero mean.\n\n-  code : `github <https://github.com/Yonghongwei/Gradient-Centralization>`__\n-  paper : `arXiv <https://arxiv.org/abs/2004.01461>`__\n\nSoftplus Transformation\n-----------------------\n\nBy running the final variance denom through the softplus function, it lifts extremely tiny values to keep them viable.\n\n-  paper : `arXiv <https://arxiv.org/abs/1908.00700>`__\n\nGradient Normalization\n----------------------\n\nNorm Loss\n---------\n\n+---------------------------------------------------------------------------------------------------+\n| .. image:: https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/norm_loss.png  |\n+---------------------------------------------------------------------------------------------------+\n\n-  paper : `arXiv <https://arxiv.org/abs/2103.06583>`__\n\nPositive-Negative Momentum\n--------------------------\n\n+--------------------------------------------------------------------------------------------------------------------+\n| .. image:: https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/positive_negative_momentum.png  |\n+--------------------------------------------------------------------------------------------------------------------+\n\n-  code : `github <https://github.com/zeke-xie/Positive-Negative-Momentum>`__\n-  paper : `arXiv <https://arxiv.org/abs/2103.17182>`__\n\nLinear learning rate warmup\n---------------------------\n\n+----------------------------------------------------------------------------------------------------------+\n| .. image:: https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/linear_lr_warmup.png  |\n+----------------------------------------------------------------------------------------------------------+\n\n-  paper : `arXiv <https://arxiv.org/abs/1910.04209>`__\n\nStable weight decay\n-------------------\n\n+-------------------------------------------------------------------------------------------------------------+\n| .. image:: https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/stable_weight_decay.png  |\n+-------------------------------------------------------------------------------------------------------------+\n\n-  code : `github <https://github.com/zeke-xie/stable-weight-decay-regularization>`__\n-  paper : `arXiv <https://arxiv.org/abs/2011.11152>`__\n\nExplore-exploit learning rate schedule\n--------------------------------------\n\n+---------------------------------------------------------------------------------------------------------------------+\n| .. image:: https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/explore_exploit_lr_schedule.png  |\n+---------------------------------------------------------------------------------------------------------------------+\n\n-  code : `github <https://github.com/nikhil-iyer-97/wide-minima-density-hypothesis>`__\n-  paper : `arXiv <https://arxiv.org/abs/2003.03977>`__\n\nLookahead\n---------\n\n| ``k`` steps forward, 1 step back. ``Lookahead`` consisting of keeping an exponential moving average of the weights that is\n| updated and substituted to the current weights every ``k_{lookahead}`` steps (5 by default).\n\n-  code : `github <https://github.com/alphadl/lookahead.pytorch>`__\n-  paper : `arXiv <https://arxiv.org/abs/1907.08610v2>`__\n\nChebyshev learning rate schedule\n--------------------------------\n\nAcceleration via Fractal Learning Rate Schedules\n\n-  paper : `arXiv <https://arxiv.org/abs/2103.01338v1>`__\n\n(Adaptive) Sharpness-Aware Minimization\n---------------------------------------\n\n| Sharpness-Aware Minimization (SAM) simultaneously minimizes loss value and loss sharpness.\n| In particular, it seeks parameters that lie in neighborhoods having uniformly low loss.\n\n-  SAM paper : `paper <https://arxiv.org/abs/2010.01412>`__\n-  ASAM paper : `paper <https://arxiv.org/abs/2102.11600>`__\n-  A/SAM code : `github <https://github.com/davda54/sam>`__\n\nOn the Convergence of Adam and Beyond\n-------------------------------------\n\n- paper : `paper <https://openreview.net/forum?id=ryQu7f-RZ>`__\n\nGradient Surgery for Multi-Task Learning\n----------------------------------------\n\n- paper : `paper <https://arxiv.org/abs/2001.06782>`__\n\nCitations\n---------\n\n`AdamP <https://github.com/clovaai/AdamP#how-to-cite>`__\n\n`Adaptive Gradient Clipping <https://ui.adsabs.harvard.edu/abs/2021arXiv210206171B/exportcitation>`__\n\n`Chebyshev LR Schedules <https://ui.adsabs.harvard.edu/abs/2021arXiv210301338A/exportcitation>`__\n\n`Gradient Centralization <https://github.com/Yonghongwei/Gradient-Centralization#citation>`__\n\n`Lookahead <https://ui.adsabs.harvard.edu/abs/2019arXiv190708610Z/exportcitation>`__\n\n`RAdam <https://github.com/LiyuanLucasLiu/RAdam#citation>`__\n\n`Norm Loss <https://ui.adsabs.harvard.edu/abs/2021arXiv210306583G/exportcitation>`__\n\n`Positive-Negative Momentum <https://github.com/zeke-xie/Positive-Negative-Momentum#citing>`__\n\n`Explore-Exploit Learning Rate Schedule <https://ui.adsabs.harvard.edu/abs/2020arXiv200303977I/exportcitation>`__\n\n`On the adequacy of untuned warmup for adaptive optimization <https://ui.adsabs.harvard.edu/abs/2019arXiv191004209M/exportcitation>`__\n\n`Stable weight decay regularization <https://github.com/zeke-xie/stable-weight-decay-regularization#citing>`__\n\n`Softplus transformation <https://ui.adsabs.harvard.edu/abs/2019arXiv190800700T/exportcitation>`__\n\n`MADGRAD <https://github.com/facebookresearch/madgrad#tech-report>`__\n\n`AdaHessian <https://github.com/amirgholami/adahessian#citation>`__\n\n`AdaBound <https://github.com/Luolc/AdaBound#citing>`__\n\n`Adabelief <https://ui.adsabs.harvard.edu/abs/2020arXiv201007468Z/exportcitation>`__\n\n`Sharpness-aware minimization <https://ui.adsabs.harvard.edu/abs/2020arXiv201001412F/exportcitation>`__\n\n`Adaptive Sharpness-aware minimization <https://ui.adsabs.harvard.edu/abs/2021arXiv210211600K/exportcitation>`__\n\n`diffGrad <https://ui.adsabs.harvard.edu/abs/2019arXiv190911015D/exportcitation>`__\n\n`On the Convergence of Adam and Beyond <https://ui.adsabs.harvard.edu/abs/2019arXiv190409237R/exportcitation>`__\n\n`Gradient surgery for multi-task learning <https://ui.adsabs.harvard.edu/abs/2020arXiv200106782Y/exportcitation>`__\n\n`AdamD <https://ui.adsabs.harvard.edu/abs/2021arXiv211010828S/exportcitation>`__\n\n`Shampoo <https://ui.adsabs.harvard.edu/abs/2018arXiv180209568G/exportcitation>`__\n\n`Nero <https://ui.adsabs.harvard.edu/abs/2021arXiv210207227L/exportcitation>`__\n\n`Adan <https://ui.adsabs.harvard.edu/abs/2022arXiv220806677X/exportcitation>`__\n\n`Adai <https://github.com/zeke-xie/adaptive-inertia-adai#citing>`__\n\nCitation\n--------\n\nPlease cite original authors of optimization algorithms. If you use this software, please cite it as below.\nOr you can get from "cite this repository" button.\n\n::\n\n    @software{Kim_pytorch_optimizer_Bunch_of_2022,\n        author = {Kim, Hyeongchan},\n        month = {1},\n        title = {{pytorch_optimizer: Bunch of optimizer implementations in PyTorch with clean-code, strict types}},\n        version = {1.0.0},\n        year = {2022}\n    }\n\nAuthor\n------\n\nHyeongchan Kim / `@kozistr <http://kozistr.tech/about>`__\n\n.. |workflow| image:: https://github.com/kozistr/pytorch_optimizer/actions/workflows/ci.yml/badge.svg?branch=main\n.. |Documentation Status| image:: https://readthedocs.org/projects/pytorch-optimizers/badge/?version=latest\n   :target: https://pytorch-optimizers.readthedocs.io/en/latest/?badge=latest\n.. |PyPI version| image:: https://badge.fury.io/py/pytorch-optimizer.svg\n   :target: https://badge.fury.io/py/pytorch-optimizer\n.. |PyPi download| image:: https://pepy.tech/badge/pytorch-optimizer\n   :target: https://pepy.tech/project/pytorch-optimizer\n.. |PyPi month download| image:: https://pepy.tech/badge/pytorch-optimizer/month\n   :target: https://pepy.tech/project/pytorch-optimizer\n.. |PyPI pyversions| image:: https://img.shields.io/pypi/pyversions/pytorch-optimizer.svg\n   :target: https://pypi.python.org/pypi/pytorch-optimizer/\n.. |black| image:: https://img.shields.io/badge/code%20style-black-000000.svg\n.. |codecov| image:: https://codecov.io/gh/kozistr/pytorch_optimizer/branch/main/graph/badge.svg?token=L4K00EA0VD\n   :target: https://codecov.io/gh/kozistr/pytorch_optimizer\n',
    'author': 'kozistr',
    'author_email': 'kozistr@gmail.com',
    'maintainer': 'kozistr',
    'maintainer_email': 'kozistr@gmail.com',
    'url': 'https://github.com/kozistr/pytorch_optimizer',
    'packages': packages,
    'package_data': package_data,
    'install_requires': install_requires,
    'extras_require': extras_require,
    'python_requires': '>=3.7.2,<4.0.0',
}


setup(**setup_kwargs)
