# SLAMA: LightAutoML on Spark

SLAMA is a version of the [LightAutoML library](https://github.com/sb-ai-lab/LightAutoML) modified to run in distributed mode on the Apache Spark framework.

It requires:
1. Python 3.9
2. PySpark 3.2+ (installed as a dependency)
3. [Synapse ML library](https://microsoft.github.io/SynapseML/)
   (It will be downloaded by Spark automatically)
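You can check the Python and PySpark requirements above with a small script run inside the project environment. This is a minimal sketch and is not part of SLAMA. `parse_version` and `check_requirements` are hypothetical helpers. The script assumes that "Python 3.9" means exactly 3.9. It skips Synapse ML because Spark downloads that library itself.

```python
"""Hypothetical prerequisite check for SLAMA (not part of the library)."""
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional


def parse_version(text: str) -> tuple:
    """Turn '3.2.1' into (3, 2, 1); stops at the first non-numeric piece (e.g. 'rc1', 'dev0')."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(python_version: tuple, pyspark_version: Optional[str]) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    # Assumption: the README's "Python 3.9" is treated as an exact major.minor match.
    if tuple(python_version[:2]) != (3, 9):
        problems.append(f"Python 3.9 expected, found {python_version[0]}.{python_version[1]}")
    if pyspark_version is None:
        problems.append("PySpark is not installed")
    elif parse_version(pyspark_version)[:2] < (3, 2):
        problems.append(f"PySpark 3.2+ expected, found {pyspark_version}")
    return problems


if __name__ == "__main__":
    try:
        installed_pyspark = version("pyspark")
    except PackageNotFoundError:
        installed_pyspark = None
    found = check_requirements(sys.version_info, installed_pyspark)
    print("\n".join(found) if found else "All prerequisites look fine")
```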

Currently, only the tabular preset is supported. See the demo of the Spark-based tabular AutoML
preset in [examples/spark/tabular-preset-automl.py](https://github.com/sb-ai-lab/SLAMA/tree/main/examples/spark/tabular-preset-automl.py).
For more information, see the docs in the root of the project, which include a dedicated SLAMA section.

<a name="apache"></a>
# License
This project is licensed under the Apache License, Version 2.0. See [LICENSE](https://github.com/sb-ai-lab/SLAMA/tree/main/LICENSE) file for more details.


# Installation

1. First, install [git](https://git-scm.com/downloads) and [poetry](https://python-poetry.org/docs/#installation).

2. Clone the repository and install all dependencies:

```bash
# Get the SLAMA source code
git clone https://github.com/sb-ai-lab/SLAMA.git

cd SLAMA/

# Optional: create the virtual environment inside the project directory
# (see the poetry docs for other options)
poetry config virtualenvs.in-project true

# Install SLAMA and its dependencies
poetry install
```

3. Install the SLAMA jars:

* Let Spark download the jar when the Spark session starts:

```python
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("SLAMA") \
    .config("spark.jars.repositories", "https://oss.sonatype.org/content/repositories/releases") \
    .config("spark.jars.packages", "io.github.sb-ai-lab:spark-lightautoml_2.12:0.1") \
    .getOrCreate()
...
```

* Or download the latest [jar](https://repository.sonatype.org/service/local/artifact/maven/redirect?r=central-proxy&g=io.github.sb-ai-lab&a=spark-lightautoml_2.12&v=LATEST) and add it locally:

```python
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("SLAMA") \
    .config("spark.jars", "JAR_DIR/spark-lightautoml_2.12-0.1.jar") \
    .getOrCreate()
...
```
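Spark reads the two jar options from different settings. `spark.jars.packages` expects Maven coordinates in the form `group:artifact:version`, and `spark.jars` expects paths to jar files. The sketch below builds the right entries for either case. It is a minimal, hypothetical helper and not part of SLAMA. `slama_jar_conf` and `apply_conf` are illustrative names. The default version `"0.1"` comes from the snippets above and may be outdated.

```python
"""Hypothetical helper: Spark config entries for the two SLAMA jar options."""
from typing import Dict, Optional

# Values copied from the snippets above; the version may not be the latest release.
SLAMA_GROUP = "io.github.sb-ai-lab"
SLAMA_ARTIFACT = "spark-lightautoml_2.12"
SONATYPE_RELEASES = "https://oss.sonatype.org/content/repositories/releases"


def slama_jar_conf(version: str = "0.1", jar_dir: Optional[str] = None) -> Dict[str, str]:
    """Return SparkSession config entries for the SLAMA jar.

    jar_dir is None  -> let Spark resolve the jar from Maven (spark.jars.packages,
                        format group:artifact:version).
    jar_dir is a dir -> point Spark at a jar already on disk (spark.jars, a path).
    """
    if jar_dir is None:
        return {
            "spark.jars.repositories": SONATYPE_RELEASES,
            "spark.jars.packages": f"{SLAMA_GROUP}:{SLAMA_ARTIFACT}:{version}",
        }
    # Maven jars are named <artifact>-<version>.jar
    jar_name = f"{SLAMA_ARTIFACT}-{version}.jar"
    return {"spark.jars": f"{jar_dir.rstrip('/')}/{jar_name}"}


def apply_conf(builder, conf: Dict[str, str]):
    """Apply each entry via builder.config(key, value) and return the builder."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

For example, `apply_conf(SparkSession.builder.appName("SLAMA"), slama_jar_conf(jar_dir="JAR_DIR")).getOrCreate()` creates a session that uses a locally downloaded jar. Omit `jar_dir` to let Spark fetch the jar from the repository instead.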


# Configuring the cluster

Instructions for setting up different types of clusters to run SLAMA are available in the [documentation](https://github.com/sb-ai-lab/SLAMA/tree/main/docs).
