Metadata-Version: 2.1
Name: udata-analysis-service
Version: 0.0.1.dev38
Summary: udata analysis service
Author-email: Sixte de Maupeou <sixte.de-maupeou@data.gouv.fr>
Description-Content-Type: text/markdown
Requires-Dist: boto3==1.21.21
Requires-Dist: celery==5.2.3
Requires-Dist: click==8.0.4
Requires-Dist: csv-detective==0.4.5
Requires-Dist: flake8==4.0.1
Requires-Dist: flit==3.6.0
Requires-Dist: kafka-python==2.0.2
Requires-Dist: pytest==7.1.1
Requires-Dist: python-dotenv==0.19.2
Requires-Dist: redis==4.1.4
Requires-Dist: requests==2.27.1
Requires-Dist: udata_event_service==0.0.8
Requires-Dist: pytest-mock==3.7.0
Requires-Dist: pytest-asyncio==0.18.3
Project-URL: Home, https://github.com/sixtedemaupeou/udata-analysis-service

# udata-analysis-service

This service analyses files from the udata datalake to enrich their metadata, starting with CSV files.
It uses csv-detective to detect the type and format of each CSV column by checking both the column header and its contents.
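The following deliberately simplified sketch shows what detecting a column's type from both its header and its contents involves. It uses only the Python standard library and is not csv-detective's API or algorithm. The patterns, the `detect_column_types` function, the 0.9 threshold and the small bonus for a matching header are all hypothetical. The sketch reads only the first `max_rows` rows, which mirrors the `ROWS_TO_ANALYSE_PER_FILE` setting.

```python
import csv
import io
import re
from itertools import islice

# Hypothetical, simplified content patterns (checked in this order).
VALUE_PATTERNS = {
    "date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),  # ISO dates such as 2022-03-01
    "int": re.compile(r"^-?\d+$"),
    "float": re.compile(r"^-?\d+[.,]\d+$"),  # accepts "2.5" and "2,5"
}
# Hypothetical header hints: a matching column name nudges the score.
HEADER_HINTS = {"date": re.compile(r"date", re.IGNORECASE)}


def detect_column_types(csv_text: str, max_rows: int = 500, threshold: float = 0.9) -> dict:
    """Guess a type for each column from its header and its first max_rows values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(islice(reader, max_rows))  # only analyse the first max_rows rows
    types = {}
    for header in reader.fieldnames or []:
        # Ignore empty cells (and missing cells in short rows, which DictReader sets to None).
        values = [row[header].strip() for row in rows if row[header] and row[header].strip()]
        best_type, best_score = "string", 0.0
        for type_name, pattern in VALUE_PATTERNS.items():
            if not values:
                break
            # Fraction of sampled values that match this type's pattern.
            score = sum(bool(pattern.match(v)) for v in values) / len(values)
            hint = HEADER_HINTS.get(type_name)
            if hint and hint.search(header):
                score += 0.1  # the header agrees with this type
            if score >= threshold and score > best_score:
                best_type, best_score = type_name, score
        types[header] = best_type
    return types
```

For the input `"id,price,created_date,label\n1,2.5,2022-03-01,a\n2,3.0,2022-03-02,b\n"`, the sketch returns `{'id': 'int', 'price': 'float', 'created_date': 'date', 'label': 'string'}`. Limiting the sample to the first rows keeps the analysis cheap on large files. The trade-off is that a value which breaks the pattern further down the file goes unnoticed.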

## Installation

Install **udata-analysis-service**:

```shell
pip install udata-analysis-service
```

Rename `.env.sample` to `.env` and fill in the values for your environment, for example:

```shell
REDIS_URL=redis://localhost:6381/0
REDIS_HOST=localhost
REDIS_PORT=6381
KAFKA_HOST=localhost
KAFKA_PORT=9092
KAFKA_API_VERSION=2.5.0
MINIO_URL=https://object.local.dev/
MINIO_USER=sample_user
MINIO_PWD=sample_pwd
ROWS_TO_ANALYSE_PER_FILE=500
CSV_DETECTIVE_REPORT_BUCKET=benchmark-de
CSV_DETECTIVE_REPORT_FOLDER=report
TABLESCHEMA_BUCKET=benchmark-de
TABLESCHEMA_FOLDER=schemas
UDATA_INSTANCE_NAME=udata
```
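All values in `.env` are read as strings, so some of them have to be converted before use. The sketch below shows one plausible way to turn the Kafka and sampling settings into typed values. It assumes they are passed to kafka-python's `KafkaConsumer`, which accepts `bootstrap_servers` as a `"host:port"` string and `api_version` as a tuple of integers. The helper names `kafka_settings` and `rows_to_analyse` are hypothetical, as is the 500 fallback (taken from the sample file). This is not the service's actual code.

```python
from typing import Mapping


def kafka_settings(env: Mapping[str, str]) -> dict:
    """Build KafkaConsumer keyword arguments from the KAFKA_* settings (assumed usage)."""
    return {
        # kafka-python takes "host:port" strings for bootstrap_servers
        "bootstrap_servers": f"{env['KAFKA_HOST']}:{env['KAFKA_PORT']}",
        # ...and api_version as a tuple of ints, so "2.5.0" becomes (2, 5, 0)
        "api_version": tuple(int(part) for part in env["KAFKA_API_VERSION"].split(".")),
    }


def rows_to_analyse(env: Mapping[str, str], default: int = 500) -> int:
    """Read ROWS_TO_ANALYSE_PER_FILE as an int; the fallback of 500 is an assumption."""
    return int(env.get("ROWS_TO_ANALYSE_PER_FILE", default))
```

python-dotenv, already a dependency, provides `load_dotenv()` to copy `.env` into `os.environ`. After that, something like `KafkaConsumer(topic, **kafka_settings(os.environ))` would connect with these settings. Here `topic` is a placeholder, because this README does not document the topic name.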

## Usage

Start the Kafka consumer:

```shell
udata-analysis-service consume
```

Start the Celery worker:

```shell
udata-analysis-service work
```
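The consumer and the worker run as two separate long-running processes, and both must be up for files to be analysed. The in-process analogy below illustrates this split. It is not how the service is implemented: there, the hand-off presumably goes through Celery and the Redis broker configured in `REDIS_URL`. The functions `consume`, `work` and `run_pipeline` and the message strings are hypothetical. A `queue.Queue` stands in for the broker, and a sentinel `None` tells the worker to stop.

```python
import queue
import threading


def consume(messages, tasks):
    """Stands in for `udata-analysis-service consume`: turn each event into a queued task."""
    for message in messages:
        tasks.put(message)
    tasks.put(None)  # sentinel: no more work


def work(tasks, results):
    """Stands in for `udata-analysis-service work`: process tasks until the sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:
            break
        results.append(f"analysed {item}")


def run_pipeline(messages):
    tasks = queue.Queue()
    results = []
    worker = threading.Thread(target=work, args=(tasks, results))
    worker.start()
    consume(messages, tasks)
    worker.join()  # wait until the worker has drained the queue
    return results
```

Without the worker, `consume` would only fill the queue and nothing would ever be analysed. This is why both commands need to be started.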

### Logging & Debugging
The log level can be adjusted with the `LOGLEVEL` environment variable.
For example, to set the log level to `DEBUG` when consuming Kafka messages, run `LOGLEVEL="DEBUG" udata-analysis-service consume`.
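Mapping an environment variable such as `LOGLEVEL` onto Python's standard `logging` module usually looks like the minimal sketch below. The helper name `configure_logging`, the case-insensitive handling and the `INFO` default are assumptions, not documented behaviour of the service. Unknown level names raise an error instead of being silently ignored.

```python
import logging
import os


def configure_logging(env=None) -> int:
    """Hypothetical helper: set the root logger's level from LOGLEVEL (assumed default INFO)."""
    env = os.environ if env is None else env
    level_name = env.get("LOGLEVEL", "INFO").upper()
    # getLevelName maps a known name such as "DEBUG" to its number and returns a string otherwise.
    level = logging.getLevelName(level_name)
    if not isinstance(level, int):
        raise ValueError(f"Unknown log level: {level_name!r}")
    logging.basicConfig()  # attach a default handler if none is configured yet
    logging.getLogger().setLevel(level)
    return level
```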

