# Changelog

All notable changes to Parallel-LLM will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.6.0] - 2025-01-XX

### Fixed
- **Critical CUDA Error Fix**: Fixed the CUDA device-side assert raised by `torch.topk` in `_top_k_filtering`
  - Added bounds checking against `vocab_size` so that `top_k` can no longer exceed the vocabulary size
  - Added safety checks for edge cases (`top_k <= 0`)
- **Robust Sampling**: Enhanced token sampling with comprehensive safety checks
  - Added NaN and inf value handling in logits using `torch.nan_to_num`
  - Added probability normalization to prevent sampling errors
  - Improved top-p filtering to always keep at least the top token
- **Configuration Validation**: Added `__post_init__` validation to `GenerationConfig`
  - Validates the ranges of `temperature`, `top_k`, `top_p`, and `repetition_penalty`
  - Ensures all parameters are within safe operating bounds
  - Prevents configuration errors that could cause CUDA failures
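
To illustrate the kind of `__post_init__` validation described above, here is a minimal sketch. It is not the library's `GenerationConfig`: the class name `SketchGenerationConfig`, the default values, and the exact bounds are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class SketchGenerationConfig:
    """Hypothetical stand-in for GenerationConfig; fields and bounds are assumed."""

    temperature: float = 1.0
    top_k: int = 50            # assumption: 0 means "top-k disabled"
    top_p: float = 1.0
    repetition_penalty: float = 1.0

    def __post_init__(self):
        # `not (x > 0)` also rejects NaN, because NaN fails every comparison.
        if not (self.temperature > 0):
            raise ValueError(f"temperature must be > 0, got {self.temperature}")
        if self.top_k < 0:
            raise ValueError(f"top_k must be >= 0, got {self.top_k}")
        if not (0.0 < self.top_p <= 1.0):
            raise ValueError(f"top_p must be in (0, 1], got {self.top_p}")
        if not (self.repetition_penalty > 0):
            raise ValueError(
                f"repetition_penalty must be > 0, got {self.repetition_penalty}"
            )


# A valid configuration is constructed normally.
config = SketchGenerationConfig(temperature=0.7, top_k=40, top_p=0.9)

# An out-of-range value fails at construction time, before any GPU work starts.
try:
    SketchGenerationConfig(top_p=1.5)
except ValueError as error:
    print(f"rejected: {error}")
```

Failing at construction time turns a bad parameter into a readable Python exception rather than an asynchronous CUDA assert.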

### Improved
- Enhanced error resilience in inference pipeline
- Better handling of extreme parameter values
- More robust multimodal inference support
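
The sampling safeguards listed for this release (sanitizing NaN/inf logits, bounding `top_k` by the vocabulary size, and keeping at least the top token under top-p) can be sketched in plain Python. This is an illustration on lists of floats, not the library's tensor code; every function name and the `clip` value are assumptions.

```python
import math
import random


def sanitize_logits(logits, clip=1e4):
    """Map NaN to 0 and limit every value (including +/-inf) to [-clip, clip]."""
    return [0.0 if math.isnan(x) else max(-clip, min(clip, x)) for x in logits]


def top_k_filter(logits, top_k):
    """Keep the top_k largest logits and set the rest to -inf.

    top_k <= 0 disables filtering; top_k is capped at the vocabulary size.
    Ties at the threshold are all kept, so slightly more than k may survive.
    """
    if top_k <= 0:
        return list(logits)
    k = min(top_k, len(logits))
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else -math.inf for x in logits]


def softmax(logits):
    """Numerically stable softmax; -inf entries get probability 0."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest high-probability prefix reaching top_p, then renormalize.

    The most likely token is always kept, even when top_p is 0.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        if kept and cumulative >= top_p:
            break
        kept.append(i)
        cumulative += probs[i]
    total = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total
    return filtered


def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Sample one token index with all of the safeguards above applied."""
    cleaned = sanitize_logits(logits)
    scaled = [x / max(temperature, 1e-6) for x in cleaned]
    probs = top_p_filter(softmax(top_k_filter(scaled, top_k)), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

For example, `sample_token([1.0, float("nan"), float("inf"), -2.0], top_k=100, top_p=0.0)` does not fail even though `top_k` is larger than the four-entry vocabulary and the logits contain NaN and inf.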

## [0.5.6] - 2025-01-XX

### Fixed
- **Critical CUDA Error Fix**: Resolved `device-side assert triggered` errors in training and inference
  - Fixed out-of-bounds token indexing (token IDs could reach or exceed `vocab_size`)
  - Changed `padding_idx` from `vocab_size` (invalid) to `0` (valid)
  - Removed the extra `+1` from the embedding size so that it matches `vocab_size`
  - Added `torch.clamp` to all sampled tokens to keep them in the range `[0, vocab_size-1]`
- **Loss Computation**: Fixed loss function to handle padding tokens correctly
  - Added target clamping to valid range
  - Added safety checks for empty masked positions
  - Added null checks for confidence-weighted loss
- **Generator Safety**: Fixed sampling and bounds checking in `ParallelGenerator`
  - Updated mask token initialization
  - Fixed repetition penalty to skip invalid tokens
  - Fixed CUDA graphs setup with valid token ranges
- **Trainer Safety**: Fixed training pipeline token handling
  - Updated to use `mask_token_id` instead of the invalid `padding_idx`
  - Added `input_ids` clamping before training
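
The indexing problem behind these fixes can be shown without a GPU. An embedding table with `vocab_size` rows accepts indices `0` through `vocab_size - 1`, so an ID equal to `vocab_size` is already out of bounds. The sketch below uses plain Python lists in place of tensors; the helper names are hypothetical.

```python
def is_valid_index(index, num_embeddings):
    """True if `index` can address a table with `num_embeddings` rows."""
    return 0 <= index < num_embeddings


def clamp_token_ids(token_ids, vocab_size):
    """Force every ID into [0, vocab_size - 1], mirroring an element-wise clamp."""
    return [min(max(token, 0), vocab_size - 1) for token in token_ids]


vocab_size = 10

# The old padding index (vocab_size) points one row past the end of the table.
print(is_valid_index(vocab_size, vocab_size))  # False
# The new padding index (0) is a valid row.
print(is_valid_index(0, vocab_size))           # True

# Clamping keeps stray IDs inside the table.
print(clamp_token_ids([-3, 5, 10, 12], vocab_size))  # [0, 5, 9, 9]
```

Clamping prevents the crash, but it also maps every stray ID onto a valid token without any warning, so it works best as a last line of defense behind correct ID generation.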

### Improved
- All token IDs are now guaranteed to be in the valid range `[0, vocab_size-1]`
- Proper handling of mask tokens during training and inference
- Safe loss computation with comprehensive bounds checking
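
As a rough illustration of loss computation that tolerates empty masks and out-of-range targets, the following sketch averages cross-entropy over masked positions only. It works on nested Python lists, not tensors, and the function name and the choice to return `0.0` for an empty mask are assumptions made for the example.

```python
import math


def masked_cross_entropy(logits, targets, mask):
    """Mean cross-entropy over positions where mask is True.

    logits:  one list of per-token logits for each position
    targets: one target token ID for each position
    mask:    one bool for each position; False positions are ignored
    """
    total, count = 0.0, 0
    for row, target, keep in zip(logits, targets, mask):
        if not keep:
            continue
        # Clamp the target into the valid range before indexing.
        target = min(max(target, 0), len(row) - 1)
        # Stable log-sum-exp: subtract the row maximum before exponentiating.
        peak = max(row)
        log_normalizer = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_normalizer - row[target]
        count += 1
    # With no masked positions, a plain mean would divide by zero.
    if count == 0:
        return 0.0
    return total / count


print(masked_cross_entropy([[0.0, 0.0]], [1], [False]))  # 0.0
```

In tensor code, the mean over an empty selection evaluates to NaN, and that NaN then propagates into the gradients, which is why the empty case is handled explicitly here.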

## [0.5.5] - 2025-01-XX

### Added
- Initial stable release with core functionality
- Diffusion-based parallel token generation
- Multimodal support (text + image)
- Distributed training with FSDP and DeepSpeed
- Flash Attention support
- KV cache and CUDA graphs optimization

### Known Issues
- CUDA device-side assert errors on some configurations (fixed in 0.5.6)

---

## Release Notes

### Version 0.6.0 Highlights
This release focuses on **production stability** with critical fixes for CUDA errors during inference:

✅ **Fixed top-k filtering CUDA errors** - No more device-side asserts  
✅ **Enhanced sampling robustness** - Handles NaN/inf values gracefully  
✅ **Configuration validation** - Prevents invalid parameter combinations  
✅ **Improved inference reliability** - Exercised on multimodal workloads  

**Upgrade recommended for all users** experiencing CUDA errors during inference.

### Version 0.5.6 Highlights
Major stability release fixing critical CUDA errors:

✅ **No more device-side assert triggered errors**  
✅ **Safe token handling** throughout the pipeline  
✅ **Proper bounds checking** for all tensor operations  
✅ **Fixed embedding size mismatch** issues  

With these fixes, all three example scripts run without device-side asserts:
- `inference_unimodal.py`
- `inference_multimodal.py`
- `train_multimodal.py`

---

## Installation

```bash
# Latest stable version
pip install --upgrade parallel-llm

# Specific version
pip install parallel-llm==0.6.0
```

## Migration Guide

### From 0.5.5 to 0.5.6+
No breaking changes. Simply upgrade:
```bash
pip install --upgrade parallel-llm
```

### From 0.5.6 to 0.6.0
No breaking changes. The fixes are backwards compatible:
```bash
pip install --upgrade parallel-llm
```

If you were working around CUDA errors with custom code, you can now remove those workarounds.
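
If a workaround has to stay in place while some environments still run an older release, one option is to gate it on the installed version. The sketch below uses only the standard library; `needs_cuda_workaround` and `parse_version` are hypothetical helpers, and the `0.6.0` threshold is taken from this changelog.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '0.5.6' into (0, 5, 6); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_cuda_workaround(installed=None):
    """True if the installed parallel-llm predates the 0.6.0 fixes."""
    if installed is None:
        try:
            installed = version("parallel-llm")
        except PackageNotFoundError:
            # Package not installed: there is nothing to work around.
            return False
    return parse_version(installed) < (0, 6, 0)


if needs_cuda_workaround():
    print("parallel-llm < 0.6.0 detected: keeping the custom workaround")
```

Passing a version string explicitly, as in `needs_cuda_workaround("0.5.6")`, makes the check easy to exercise in tests.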

---

For more information, see:
- [Documentation](https://parallel-llm.readthedocs.io/)
- [GitHub Repository](https://github.com/furqan-y-khan/parallel-llm)
- [PyPI Package](https://pypi.org/project/parallel-llm/)

