Metadata-Version: 2.1
Name: cdm-csv-loader
Version: 0.1.5
Summary: A package for loading OHDSI CDM CSV files into a relational database.
Home-page: https://github.com/DavidIkechi/ohdsi_cdm_loader.git
Author: David Chibuike Ikechi Akwuru
Author-email: akwuru.david@ul.ie
License: MIT
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown

This repository provides scripts to load Common Data Model (CDM) data from OHDSI's standardized vocabularies (version 5.4 or 5.3) into CDM tables in a relational database. It is designed for the OHDSI community and those working with OHDSI's Common Data Model for large-scale observational research.

This project has been primarily tested with PostgreSQL. We hope to expand across other databases supported within the OHDSI community in upcoming versions.

## Requirements

### Python and Database Requirements

- Python 3.9
- PostgreSQL database, still checking for others.

### R Requirements

Some of the processes and dependencies in the OHDSI environment may require specific R packages to interact with the OHDSI CDM and tools. Ensure the following R packages are installed:

```r
# Install OHDSI-specific R packages
install.packages("devtools")
install.packages("DatabaseConnector")
install.packages("SqlRender")
devtools::install_github("OHDSI/CommonDataModel")  # For working with CDM-related functionality
install.packages("arrow")
```

#### Example (Python): It uses the default port

```python
from ohdsi_cdm_loader.db_connector import DatabaseHandler

database_connector = DatabaseHandler(
    db_type="postgresql",  # Database type (e.g., postgresql)
    host="localhost",      # Database host
    user="postgres",       # Database user
    password="your_password",  # Database password
    database="ohdsi_cdm",  # OHDSI CDM database
    driver_path="path_to_driver"  # path to driver for selected database
    schema="schema" # schema for holding the database table.-- make sure it is set.
    port=port # integer that defines the port, 5432 is used if not specified.
)

### connection to the database.
db_conn = database_connector.connect_to_db()

if db_conn:
    print("Connected to the database successfully!")
else:
    print("Failed to connect to the database.")
```

### 2. `CSVLoader`

This loads the OHDSI CDM vocabularies (version 5.3 or 5.4) from CSV files into the CDM tables in the database.

#### Key Features:
- Loads all CSV files for the standardized vocabularies from the specified directory into the corresponding create database. For clarity the database can be created using the execute_ddl function from the commondatamodel package.

```python
from ohdsi_cdm_loader.db_connector import DatabaseHandler

# Initialize the database connection
# Initialize the database connection
database_connector = DatabaseHandler(
    db_type="postgresql",  # Database type (e.g., postgresql)
    host="localhost",      # Database host
    user="postgres",       # Database user
    password="your_password",  # Database password
    database="ohdsi_cdm",  # OHDSI CDM database
    driver_path="path_to_driver"  # path to driver for selected database
    schema="schema" # schema for holding the database table.-- make sure it is set.
    port=port # integer that defines the port, 5432 is used if not specified.
)

# Connect to the CDM database
db_conn = database_connector.connect_to_db()
# generate the table in the database
database_connector.execute_ddl(cdm_version = "value")
```
- Uses the active database connection and CDM-compliant table structure.

#### Example (Python):
#### Note: please download the latest vocabulary from [OHDSI vocabulary list](https://athena.ohdsi.org/vocabulary/list)

```python
from ohdsi_cdm_loader.load_csv import CSVLoader
csv_loader = CSVLoader(
    db_connection=db_conn,  # Active database connection
    database_handler=database_connector
)

csv_loader.load_all_csvs("path_to_downloaded_csv_directory")
```

### 3. `main.py`
#### Usage:
#### Workflow in Main Script:
#### Environment Variables

optional (using environment variables) - recommended for flexibility and security.

```bash
- `DB_PORT`: The port number for the database.
- `DB_TYPE`: The type of the database (e.g., `postgresql`, `mysql`).
- `DB_SERVER`: The server address of the database.
- `DB_NAME`: The name of the database.
- `DB_PASSWORD`: The password for the database user.
- `DB_USER`: The username for the database.
- `DRIVER_PATH`: The path to the database driver.
- `DB_SCHEMA`: The schema name in the database.
- `CSV_PATH`: The path to the CSV file to be loaded.
- `CDM_VERSION`: The version of the Common Data Model (CDM).
```
```python
import os
from ohdsi_cdm_loader.db_connector import DatabaseHandler

database_connector = DatabaseHandler(
    db_type=os.getenv('DB_TYPE'),
    host=os.getenv('DB_SERVER'),
    user=os.getenv('DB_USER'),
    password=os.getenv('DB_PASSWORD'),
    database=os.getenv('DB_NAME'),
    driver_path=os.getenv('DRIVER_PATH'),
    schema=os.getenv('DB_SCHEMA'),
    port=os.getenv('DB_PORT')
)

db_conn = database_connector.connect_to_db()
csv_loader = CSVLoader(db_conn, database_connector)
csv_loader.load_all_csvs(os.get('CSV_PATH'))

# to create the cdm tables. (optional)
database_connector.execute_ddl(os.get('CDM_VERSION'))
```

## Credits

This project is designed to work with OHDSI's Common Data Model (CDM) and standardized vocabularies. The tools and processes used here are compatible with OHDSI standards, and the database loader has been tested specifically for PostgreSQL, though it should work with other databases supported by OHDSI.

<a href="https://ohdsi.org">
  <img src="https://res.cloudinary.com/dc29czhf9/image/upload/v1729287157/h243-ohdsi-logo-with-text_hhymri.png" alt="OHDSI" width="100"/>
</a>

**OHDSI** (Observational Health Data Sciences and Informatics) is a multi-stakeholder, interdisciplinary collaborative that aims to bring out the value of observational health data through large-scale analytics. Learn more about OHDSI and the CDM on the [official OHDSI website](https://ohdsi.org).

<a href="https://ehealth4cancer.org">
  <img src="https://res.cloudinary.com/dc29czhf9/image/upload/v1729287084/download_umxgmo.jpg" alt="eHealth Hub Limerick" width="100"/>
</a>

This project was also supported by **eHealth Hub Limerick**, contributing to the development and deployment of health data tools for innovative healthcare solutions. Learn more about eHealth Hub Limerick at [eHealth Hub Limerick's official website](https://ehealth4cancer.org).

## License

This project is licensed under the MIT License. See the `LICENSE` file for more details.

