# -*- coding: utf-8 -*-
from setuptools import setup

packages = \
['bitnet']

package_data = \
{'': ['*']}

install_requires = \
['einops', 'torch', 'zetascale']

setup_kwargs = {
    'name': 'bitnet',
    'version': '0.1.2',
    'description': 'bitnet - Pytorch',
    'long_description': '[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)\n\n# BitNet\n![bitnet](/bitnet.png)\nPyTorch Implementation of the linear methods and model from the paper "BitNet: Scaling 1-bit Transformers for Large Language Models"\n\n[Paper link:](https://arxiv.org/pdf/2310.11453.pdf)\n\nBitLinear = tensor -> layernorm -> Binarize -> abs max quantization -> dequant\n\n"The implementation of the BitNet architecture is quite simple, requiring only the replacement of linear projections (i.e., nn.Linear in PyTorch) in the Transformer. " -- BitNet is really easy to implement just swap out the linears with the BitLinear modules! \n\n## **NEWS**\n- BitNet Transformer has been trained using the `train.py` file that trains on enwiki8 a small 1gb dataset of wikipedia: [HERE IS THE LINK](https://drive.google.com/file/d/1gBuZRFBqMV3cVD902LXA_hmZl4e0dLyY/view)\n- **New Iteration** 🔥 There is an all-new iteration from the paper "[The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits](https://arxiv.org/abs/2402.17764)", we\'re implementing it now. Join the Agora discord and contribute! [Join Here](https://discord.gg/hFzevCjG8c)\n\n## Appreciation\n- Dimitry, Nullonix for analysis and code review and revision\n- Vyom, for providing 4080 to train!\n\n## Installation\n`pip install bitnet`\n\n## Usage:\n\n### `BitLinear`\n- Example of the BitLinear layer which is the main innovation of the paper!\n```python\nimport torch\n\nfrom bitnet import BitLinear\n\n# Input\nx = torch.randn(10, 512)\n\n# BitLinear layer\nlayer = BitLinear(512, 400)\n\n# Output\ny = layer(x)\n\nprint(y)\n```\n----\n\n### `BitNetTransformer`\n- Fully implemented Transformer as described in the diagram with MHA, and BitFeedforwards\n- Can be utilized not just for text but for images and maybe even video or audio processing\n- Complete with residuals and skip connections for gradient flow\n\n```python\nimport torch\n\nfrom bitnet import BitNetTransformer\n\nbitnet = BitNetTransformer(\n    num_tokens=20000,\n    dim=512,\n    depth=6,\n    dim_head=64,\n    heads=8,\n    ff_mult=4,\n)\n\ntokens = torch.randint(0, 20000, (1, 512))\nlogits = bitnet(tokens)\nprint(logits.shape)\n```\n\n\n### `BitAttention`\nThis Attention has been modified to use BitLinear instead of the default linear projection. It\'s also using Multi-Grouped Query Attention instead of regular multi-head attention for faster decoding and longer context handling.\n\n```python\nimport torch\nfrom bitnet import BitMGQA\n\n# Create a random tensor of shape (1, 10, 512)\nx = torch.randn(1, 10, 512)\n\n# Create an instance of the BitMGQA model with input size 512, 8 attention heads, and 4 layers\ngqa = BitMGQA(512, 8, 4)\n\n# Pass the input tensor through the BitMGQA model and get the output and attention weights\nout, attn = gqa(x, x, x, need_weights=True)\n\n# Print the shapes of the output tensor and attention tensor\nprint(out.shape, attn.shape)\n```\n\n### `BitFeedForward`\n- Feedforward as shown in the diagram with BitLinear and a GELU:\n- Linear -> GELU -> Linear\n- You can add dropouts, or layernorms, or other layers for a better ffn\n\n```python\nimport torch\n\nfrom bitnet.bitffn import BitFeedForward\n\n# Random input\nx = torch.randn(10, 512)\n\n# FFN\nff = BitFeedForward(512)\n\n# Apply FFN\ny = ff(x)\n\nprint(y.shape)\n# torch.Size([10, 512])\n```\n\n## Inference\n```python\nfrom bitnet import BitNetInference\n\nbitnet = BitNetInference()\nbitnet.load_model("../model_checkpoint.pth")  # Download model\noutput_str = bitnet.generate("The dog jumped over the ", 512)\nprint(output_str)\n```\n\n## Huggingface Usage\n```python\nimport torch\nfrom transformers import AutoModelForSequenceClassification, AutoTokenizer\n\nfrom bitnet import replace_linears_in_hf\n\n# Load a model from Hugging Face\'s Transformers\nmodel_name = "bert-base-uncased"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForSequenceClassification.from_pretrained(model_name)\n\n# Replace Linear layers with BitLinear\nreplace_linears_in_hf(model)\n\n# Example text to classify\ntext = "Replace this with your text"\ninputs = tokenizer(\n    text, return_tensors="pt", padding=True, truncation=True, max_length=512\n)\n\n# Perform inference\nmodel.eval()  # Set the model to evaluation mode\nwith torch.no_grad():\n    outputs = model(**inputs)\n    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)\n    print(predictions)\n\n# Process predictions\npredicted_class_id = predictions.argmax().item()\nprint(f"Predicted class ID: {predicted_class_id}")\n\n# Optionally, map the predicted class ID to a label, if you know the classification labels\n# labels = ["Label 1", "Label 2", ...]  # Define your labels corresponding to the model\'s classes\n# print(f"Predicted label: {labels[predicted_class_id]}")\n```\n\n# License\nMIT\n\n# Citation\n```bibtex\n@misc{2310.11453,\nAuthor = {Hongyu Wang and Shuming Ma and Li Dong and Shaohan Huang and Huaijie Wang and Lingxiao Ma and Fan Yang and Ruiping Wang and Yi Wu and Furu Wei},\nTitle = {BitNet: Scaling 1-bit Transformers for Large Language Models},\nYear = {2023},\nEprint = {arXiv:2310.11453},\n}\n\n```\n\n\n# Todo\n- [x] Double check BitLinear implementation and make sure it works exactly as in paper \n- [x] Implement training script for `BitNetTransformer`\n- [x] Train on Enwiki8, copy and past code and data from Lucidrains repos\n- [x] Benchmark performance\n- [x] Look into Straight Through Estimator for non-differentiable backprop\n- [x] Implement BitFeedForward\n- [x] Clean up codebase \n- [x] Add unit tests for each module\n- [ ] Implement the new BitNet1.5b from the [paper](https://arxiv.org/abs/2402.17764)\n- [ ] Implement the BitNet15b in Cuda\n',
    'author': 'Kye Gomez',
    'author_email': 'kye@apac.ai',
    'maintainer': 'None',
    'maintainer_email': 'None',
    'url': 'https://github.com/kyegomez/bitnet',
    'packages': packages,
    'package_data': package_data,
    'install_requires': install_requires,
    'python_requires': '>=3.6,<4.0',
}


setup(**setup_kwargs)
