Metadata-Version: 2.3
Name: secureml
Version: 0.2.4
Summary: A Python library for privacy-preserving machine learning
License: MIT
Keywords: machine learning,privacy,security,compliance,gdpr
Author: Enzo Paloschi Biondo
Author-email: enzobiondo11@outlook.com
Requires-Python: >=3.11,<3.13
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Provides-Extra: pdf
Provides-Extra: vault
Requires-Dist: click (>=8.1.8,<9.0.0)
Requires-Dist: faker (>=37.0.1,<38.0.0)
Requires-Dist: flwr[simulation] (>=1.17.0,<2.0.0)
Requires-Dist: hvac (>=1.1.1,<2.0.0) ; extra == "vault"
Requires-Dist: jinja2 (>=3.1.4,<4.0.0)
Requires-Dist: matplotlib (>=3.7.1,<4.0.0)
Requires-Dist: numpy (>=1.26.4,<2.0.0)
Requires-Dist: opacus (>=1.5.3,<2.0.0)
Requires-Dist: pandas (>=2.2.3,<3.0.0)
Requires-Dist: pyarrow (>=19.0.1,<20.0.0)
Requires-Dist: pydantic (>=2.11.3,<3.0.0)
Requires-Dist: pyyaml (==6.0.2)
Requires-Dist: scikit-learn (>=1.6.1,<2.0.0)
Requires-Dist: sdv
Requires-Dist: spacy (>=3.8.3,<4.0.0)
Requires-Dist: sphinx-rtd-theme (>=1.3.0,<2.0.0)
Requires-Dist: torch (>=2.6.0,<3.0.0)
Requires-Dist: weasyprint (>=64.1,<65.0) ; extra == "pdf"
Project-URL: Documentation, https://secureml.readthedocs.io
Project-URL: Repository, https://github.com/scimorph/secureml
Description-Content-Type: text/markdown

<p align="center">
  <img src="https://github.com/scimorph/secureml/blob/master/secureml_logo_2-.png" alt="SecureML Logo" width="500">
</p>

<p align="center">
  <a href="https://github.com/scimorph/secureml/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/scimorph/secureml/ci.yml?branch=master&label=CI/CD&logo=github" alt="CI/CD Status"></a>
  <a href="https://github.com/scimorph/secureml/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/scimorph/secureml/ci.yml?branch=master&label=tests&logo=pytest" alt="Tests Status"></a>
  <a href="https://pypi.org/project/secureml/"><img src="https://img.shields.io/pypi/v/secureml.svg" alt="PyPI Version"></a>
  <a href="https://github.com/scimorph/secureml/blob/master/LICENSE"><img src="https://img.shields.io/github/license/scimorph/secureml" alt="License"></a>
  <img src="https://img.shields.io/pypi/pyversions/secureml.svg" alt="Python Versions">
</p>

<h3 align="center">
  <a href="https://secureml.readthedocs.io/en/latest/index.html">Documentation</a>
</h3>

SecureML is an open-source Python library that integrates with popular machine learning frameworks such as TensorFlow and PyTorch. It provides developers with easy-to-use utilities that help AI agents handle sensitive data in compliance with data protection regulations such as GDPR, CCPA, and HIPAA.

## Key Features

- **Data Anonymization Utilities**:
  - K-anonymity implementation with adaptive generalization
  - Pseudonymization with format-preserving encryption
  - Configurable data masking with statistical property preservation
  - Hierarchical data generalization with taxonomy support
  - Automatic sensitive data detection
- **Privacy-Preserving Training Methods**: 
  - Differential privacy integration with PyTorch (via Opacus) and TensorFlow (via TF Privacy)
  - Federated learning with Flower, allowing training on distributed data without centralization
  - Support for secure aggregation and privacy-preserving federated learning
- **Compliance Checkers**: Tools to analyze datasets and model configurations for potential privacy risks
- **Synthetic Data Generation**: Utilities to create synthetic datasets that mimic real data
- **Regulation-Specific Presets**: 
  - Pre-configured YAML settings aligned with major regulations (GDPR, CCPA, HIPAA)
  - Detailed compliance requirements for each regulation
  - Customizable identifiers for personal data and sensitive information
  - Integration with compliance checking functionality
- **Audit Trails and Reporting**: Automatic logging of privacy measures and model decisions

## Installation

Install with pip (requires Python 3.11 or 3.12):

```bash
pip install secureml
```

### Optional Dependencies

```bash
# For generating PDF reports for compliance and audit trails
pip install secureml[pdf]

# For secure key management with HashiCorp Vault
pip install secureml[vault]

# For all optional components
pip install secureml[pdf,vault]
```
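
To confirm which optional components are usable in the current environment, a small check along these lines may help. It is a sketch, not part of SecureML: the helper name `available_extras` is hypothetical, and the module names `weasyprint` and `hvac` are taken from the package's declared dependencies for the `pdf` and `vault` extras.

```python
from importlib.util import find_spec

# Extra name -> top-level module pulled in by that extra
# (per the package's dependency metadata).
OPTIONAL_EXTRAS = {"pdf": "weasyprint", "vault": "hvac"}


def available_extras(extras=OPTIONAL_EXTRAS):
    """Report, per extra, whether its module can be found without importing it."""
    return {name: find_spec(module) is not None for name, module in extras.items()}


print(available_extras())
```

An extra reported as `False` can be added later with the matching `pip install secureml[...]` command above.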

## Quick Start

### Data Anonymization

Anonymize a dataset to help it comply with privacy regulations:

```python
import pandas as pd
from secureml import anonymize

# Load your dataset
data = pd.DataFrame({
    "name": ["John Doe", "Jane Smith", "Bob Johnson"],
    "age": [32, 45, 28],
    "email": ["john.doe@example.com", "jane.smith@example.com", "bob.j@example.com"],
    "ssn": ["123-45-6789", "987-65-4321", "456-78-9012"],
    "zip_code": ["10001", "94107", "60601"],
    "income": [75000, 82000, 65000]
})
    
# Anonymize using k-anonymity
anonymized_data = anonymize(
    data,
    method="k-anonymity",
    k=2,
    sensitive_columns=["name", "email", "ssn"]
)

print(anonymized_data)
```
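
To make the `k=2` setting concrete: a table is k-anonymous when every combination of quasi-identifier values is shared by at least `k` rows. The following is a minimal, standard-library sketch of that check. It is not SecureML's implementation; the helper names, the generalized values, and the choice of `age` and `zip_code` as quasi-identifiers are assumptions made for illustration.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[c] for c in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination appears in at least k records."""
    return all(n >= k for n in group_sizes(records, quasi_identifiers).values())


# Hypothetical rows after generalization (values invented for illustration)
rows = [
    {"age": "30-39", "zip_code": "100**"},
    {"age": "30-39", "zip_code": "100**"},
    {"age": "40-49", "zip_code": "941**"},
]

# The third row is the only one in its group, so the table is not 2-anonymous.
print(is_k_anonymous(rows, ["age", "zip_code"], k=2))  # False
```

With a pandas DataFrame, the same check can be run on `df.to_dict("records")`.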

### Compliance Checking with Regulation Presets

SecureML includes built-in presets for major regulations (GDPR, CCPA, HIPAA) that define the compliance requirements specific to each regulation:

```python
import pandas as pd
from secureml import check_compliance
    
# Load your dataset
data = pd.read_csv("your_dataset.csv")
    
# Model configuration
model_config = {
    "model_type": "neural_network",
    "input_features": ["age", "income", "zip_code"],
    "output": "purchase_likelihood",
    "training_method": "standard_backprop"
}
    
# Check compliance with GDPR
report = check_compliance(
    data=data,
    model_config=model_config,
    regulation="GDPR"
)
    
# View compliance issues
if report.has_issues():
    print("Compliance issues found:")
    for issue in report.issues:
        print(f"- {issue['component']}: {issue['issue']} ({issue['severity']})")
        print(f"  Recommendation: {issue['recommendation']}")
```
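
For longer reports, it can be useful to summarize the issues before reading them one by one. The sketch below assumes only the issue structure used in the loop above (dictionaries with `component`, `issue`, `severity`, and `recommendation` keys). The helper names, the example issues, and the severity labels `"high"` and `"medium"` are hypothetical.

```python
from collections import Counter


def count_by_severity(issues):
    """Count issues per severity label."""
    return Counter(issue["severity"] for issue in issues)


def has_blocking_issues(issues, blocking=("high",)):
    """True if any issue carries a severity treated as blocking."""
    return any(issue["severity"] in blocking for issue in issues)


# Invented example data in the same shape as report.issues
example_issues = [
    {"component": "data", "issue": "Unprotected identifiers",
     "severity": "high", "recommendation": "Anonymize before training"},
    {"component": "model", "issue": "No privacy-preserving training",
     "severity": "medium", "recommendation": "Consider differential privacy"},
]

print(dict(count_by_severity(example_issues)))
print(has_blocking_issues(example_issues))
```

In a real run, `report.issues` would take the place of `example_issues`.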

### Privacy-Preserving Machine Learning

Train a model with differential privacy guarantees:

```python
import torch.nn as nn
import pandas as pd
from secureml import differentially_private_train
    
# Create a simple PyTorch model
model = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
    nn.Softmax(dim=1)
)
    
# Load your dataset
data = pd.read_csv("your_dataset.csv")
    
# Train with differential privacy
private_model = differentially_private_train(
    model=model,
    data=data,
    epsilon=1.0,  # Privacy budget
    delta=1e-5,   # Privacy delta parameter
    epochs=10,
    batch_size=64
)
```
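
For intuition about what differentially private training does at each step, the sketch below shows the core of DP-SGD, the algorithm that Opacus implements: clip every per-sample gradient to a maximum L2 norm, sum the clipped gradients, add Gaussian noise scaled to that norm, and average. It is a plain-Python illustration, not SecureML's or Opacus's code, and the function names and the `max_norm` and `noise_multiplier` values are assumptions.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale one per-sample gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_sample_grads, max_norm, noise_multiplier, rng=random):
    """Clip each gradient, sum them, add Gaussian noise, and average over the batch."""
    clipped = [clip_gradient(g, max_norm) for g in per_sample_grads]
    sigma = noise_multiplier * max_norm  # noise std is tied to the clipping bound
    dim = len(clipped[0])
    noisy_sum = [
        sum(g[i] for g in clipped) + rng.gauss(0.0, sigma)
        for i in range(dim)
    ]
    return [s / len(clipped) for s in noisy_sum]


# Two hypothetical per-sample gradients; the first exceeds the clipping bound.
grads = [[3.0, 4.0], [0.3, 0.4]]
print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1,
                            rng=random.Random(0)))
```

A smaller `epsilon` corresponds to more noise per step. In practice, a privacy accountant translates the noise multiplier, sampling rate, and number of steps into the resulting `(epsilon, delta)` guarantee.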

### Synthetic Data Generation

Generate synthetic data that maintains the statistical properties of the original data:

```python
import pandas as pd
from secureml import generate_synthetic_data
    
# Load your dataset
data = pd.read_csv("your_dataset.csv")
    
# Generate synthetic data
synthetic_data = generate_synthetic_data(
    template=data,
    num_samples=1000,
    method="statistical",  # Options: simple, statistical, sdv-copula, gan
    sensitive_columns=["name", "email", "ssn"]
)
    
print(synthetic_data.head())
```
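
One simple way to sanity-check that synthetic data tracks the original is to compare summary statistics column by column. The sketch below uses only the standard library. The helper names and the synthetic values are invented for illustration, and it is not a substitute for a full fidelity or privacy evaluation.

```python
from statistics import mean, stdev


def summarize(values):
    """Mean and sample standard deviation of a numeric column."""
    return {"mean": mean(values), "std": stdev(values)}


def relative_gap(real, synthetic):
    """Relative difference in mean and std between a real and a synthetic column.

    Assumes the real column has a non-zero mean and standard deviation.
    """
    r, s = summarize(real), summarize(synthetic)
    return {key: abs(s[key] - r[key]) / abs(r[key]) for key in r}


real_income = [75000, 82000, 65000]       # from the anonymization example above
synthetic_income = [74000, 83000, 66000]  # hypothetical synthetic values

print(relative_gap(real_income, synthetic_income))
```

With DataFrames, the columns can be passed as `data["income"].tolist()` and `synthetic_data["income"].tolist()`.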

## Documentation

For detailed documentation, examples, and API reference, visit [our documentation](https://secureml.readthedocs.io).

## Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request.
Our current focus is expanding the supported regulations beyond GDPR, CCPA, and HIPAA, and help with that is especially welcome.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
