Metadata-Version: 2.4
Name: FAI-RL
Version: 0.1.5
Summary: Foundation of AI - Reinforcement Learning Library
Author-email: Roblox <ylim@roblox.com>, Roblox <mnandwana@roblox.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/Roblox/FAI-RL
Project-URL: Documentation, https://github.com/Roblox/FAI-RL#readme
Project-URL: Repository, https://github.com/Roblox/FAI-RL
Project-URL: Issues, https://github.com/Roblox/FAI-RL/issues
Keywords: reinforcement learning,language models,transformers,rlhf,dpo,ppo,sft
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: torch==2.7.1
Requires-Dist: torchvision==0.22.1
Requires-Dist: torchaudio==2.7.1
Requires-Dist: datasets==4.0.0
Requires-Dist: transformers==4.56.1
Requires-Dist: trl==0.23.0
Requires-Dist: wandb==0.21.0
Requires-Dist: bitsandbytes==0.46.1
Requires-Dist: peft==0.17.0
Requires-Dist: deepspeed==0.17.4
Requires-Dist: ipykernel==6.30.1
Requires-Dist: ipywidgets==8.1.7
Requires-Dist: fsspec==2025.3.0
Requires-Dist: huggingface_hub==0.34.4
Requires-Dist: mpi4py==4.1.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=4.0.0; extra == "dev"
Requires-Dist: mypy>=0.950; extra == "dev"

# FAI-RL: Foundation of AI - Reinforcement Learning Library

A modular, production-ready library for **training, inference, and evaluation** of language models with reinforcement learning methods. It currently supports:
- SFT (Supervised Fine-Tuning)
- DPO (Direct Preference Optimization)
- PPO (Proximal Policy Optimization)
- GRPO (Group Relative Policy Optimization)
- GSPO (Group Sequence Policy Optimization)

## 🚀 Quick Start

Get started with installation, training, inference, and evaluation in just a few commands:

### 📦 Installation

```bash
pip install --extra-index-url https://download.pytorch.org/whl/cu118 FAI-RL
```
### Training

Train a model using SFT, DPO, PPO, GRPO, or GSPO:

```bash
# Single GPU training
fai-rl-train --config configs/training/sft/llama3_3B_lora_recipe.yaml --num-gpus 1

# Multi-GPU training in background (8 GPUs)
fai-rl-train --config configs/training/sft/llama3_3B_lora_recipe.yaml --num-gpus 8 --nohup

# Runtime parameter overrides
fai-rl-train --config configs/training/sft/llama3_3B_lora_recipe.yaml --num-gpus 8 --nohup \
model.base_model_name=Qwen/Qwen3-4B-Instruct-2507 \
training.num_train_epochs=3
```
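The override syntax above hints at the recipe layout: dotted paths map onto nested YAML keys. A minimal sketch of such a recipe follows; only `model.base_model_name` and `training.num_train_epochs` are taken from the commands above, and the remaining keys are illustrative rather than the library's exact schema:

```yaml
# Illustrative recipe sketch — check the shipped configs/training/ files for the real schema
model:
  base_model_name: Qwen/Qwen3-4B-Instruct-2507   # overridable: model.base_model_name=...
training:
  num_train_epochs: 3                            # overridable: training.num_train_epochs=3
  learning_rate: 2.0e-5                          # illustrative key
```

Any scalar in the file can then be overridden at launch time with the same `section.key=value` form shown above.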

### Inference

Generate responses from your trained models:

```bash
# Run inference on trained model
fai-rl-inference --config configs/inference/llama3_3B_inference.yaml

# Run inference with debug mode
fai-rl-inference --config configs/inference/llama3_3B_inference.yaml --debug
```

### Evaluation

Evaluate model performance on benchmarks:

```bash
# Evaluate on MMLU benchmark
fai-rl-eval --config configs/evaluation/mmlu/llama3_3B_recipe.yaml

# Evaluate with debug output
fai-rl-eval --config configs/evaluation/mmlu/llama3_3B_recipe.yaml --debug
```

-----

## Flexible Configuration System
* YAML-based configuration for all training parameters
* Pre-configured recipes for popular models
* DeepSpeed ZeRO-3 integration for distributed training
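For reference, a minimal ZeRO-3 config of the kind kept under `configs/deepspeed/` might look like the following. These are standard DeepSpeed keys (with `"auto"` values resolved by the Hugging Face Trainer integration); the files shipped with the library may set additional options:

```json
{
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true
  },
  "bf16": { "enabled": true },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```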


## 📁 Project Structure

```
FAI-RL/
├── core/                      # Core framework components
├── trainers/                  # Training method implementations
├── inference/                 # Inference components
├── evaluations/               # Evaluation system
├── configs/                   # Configuration files
│   ├── training/              # Training configurations
│   ├── inference/             # Inference configurations
│   ├── evaluation/            # Evaluation configurations
│   └── deepspeed/             # DeepSpeed ZeRO configurations
├── utils/                     # Utility modules
├── scripts/                   # Scripts
├── logs/                      # Training logs (auto-generated)
└── outputs/                   # Inference output (auto-generated)
```

-----

## 🔗 Quick Links

* **[Training Guide](./trainers/README.md)** - Comprehensive guide to configuring and running model training with detailed parameter explanations
* **[Inference Guide](./inference/README.md)** - Running model inference and text generation
* **[Evaluation Guide](./evaluations/README.md)** - Evaluating model performance on standard benchmarks

## Memory Optimization

FAI-RL supports various techniques to train large models efficiently:

* **Full Fine-tuning:** Train all model parameters (requires most memory)
* **LoRA:** Parameter-efficient training (~10% memory of full fine-tuning)
* **QLoRA:** 4-bit quantized LoRA (train 7B+ models on single consumer GPU)
* **DeepSpeed ZeRO-3:** Distributed training for models that don't fit on single GPU
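To make the LoRA savings concrete, here is a stdlib-only back-of-envelope sketch: a rank-`r` adapter on a `d x k` weight matrix trains `r * (d + k)` parameters instead of `d * k`. (The ~10% figure above covers total training memory, including optimizer states and activations, so it is larger than the raw parameter fraction computed here; the layer shape and rank below are illustrative.)

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for a rank-r LoRA adapter on a d x k weight
    (two low-rank factors: d x r and r x k)."""
    return r * (d + k)

def full_params(d: int, k: int) -> int:
    """Parameters trained by full fine-tuning of the same weight."""
    return d * k

d = k = 4096   # a typical attention projection in a 7B-class model
r = 16         # a common LoRA rank
frac = lora_trainable_params(d, k, r) / full_params(d, k)
print(f"LoRA trains {frac:.2%} of this layer's weights")  # well under 1%
```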

## 🧪 Tested Environment

This framework has been validated on:

* **Instance:** AWS EC2 p4d.24xlarge
* **GPUs:** 8 x NVIDIA A100-SXM4-80GB (80GB VRAM each)
* **CPU:** 96 vCPUs
* **Memory:** 1152 GiB
* **Storage:** 8TB NVMe SSD
* **Network:** 400 Gbps

## 🛠 For Maintainers

To release a new version of FAI-RL:

1. Update the version in `pyproject.toml`:
```toml
[project]
name = "FAI-RL"
version = "__NEW_VERSION__"
```

2. Build and upload the package:
```bash
# Upgrade pip and build tools
pip install --upgrade pip
pip install build twine

# Clean previous builds
rm -rf dist/ build/ *.egg-info

# Build the package
python -m build

# Upload to PyPI
python -m twine upload dist/*
```
