Metadata-Version: 2.1
Name: smart-data-science
Version: 0.1.2
Summary: Personal side project to streamline the most common tasks of data science solutions in an efficient manner. This project is based on my experience as a lead data scientist in the industry and financial services sectors, where I have gained expertise in delivering effective data-driven insights and solutions
License: MIT
Author: Angel Martinez-Tenor
Author-email: angelmtenor@gmail.com
Requires-Python: >=3.10,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Provides-Extra: full
Provides-Extra: ml
Provides-Extra: plot
Provides-Extra: ui
Requires-Dist: chart-studio (>=1.1.0)
Requires-Dist: explainerdashboard (>=0.4.2.1); extra == "full" or extra == "ml"
Requires-Dist: fastapi (>=0.93.0)
Requires-Dist: flask-simplelogin (>=0.1.2); extra == "full"
Requires-Dist: google-cloud-aiplatform (>=1.25.0); extra == "full"
Requires-Dist: graphviz (>=0.20.1)
Requires-Dist: gunicorn (>=20.1.0)
Requires-Dist: ipython (>=8.10)
Requires-Dist: jupyter (>=1.0.0)
Requires-Dist: jupyter-contrib-nbextensions (>=0.7.0)
Requires-Dist: langchain (>=0.0.186)
Requires-Dist: lightgbm (>=3.3.4); extra == "full" or extra == "ml"
Requires-Dist: mapie (>=0.6.4); extra == "full" or extra == "ml"
Requires-Dist: matplotlib (>=3.6.3); extra == "plot"
Requires-Dist: openai (>=0.27.6); extra == "full"
Requires-Dist: pandarallel (>=1.6.3)
Requires-Dist: pandas (>=1.5.0)
Requires-Dist: pandas-profiling (>=3.4.0); (python_version >= "3.10" and python_version < "3.11") and (extra == "full")
Requires-Dist: pandera (>=0.13.4)
Requires-Dist: plotly (>=5.10.0); extra == "full" or extra == "plot"
Requires-Dist: psutil (>=5.9.4)
Requires-Dist: py-cpuinfo (>=9.0.0)
Requires-Dist: pyarrow (>=9.0.0)
Requires-Dist: python-dotenv (>=1.0.0)
Requires-Dist: scikit-learn (>=1.2.0); extra == "full" or extra == "ml"
Requires-Dist: scikit-optimize (>=0.9.0); extra == "ml"
Requires-Dist: seaborn (>=0.12.2); extra == "plot"
Requires-Dist: sentence-transformers (>=2.2.2); (python_version >= "3.10" and python_version < "4.0") and (extra == "full")
Requires-Dist: shap (>=0.41.0); extra == "full" or extra == "ml"
Requires-Dist: streamlit (>=1.15.0); (python_version >= "3.10" and python_version < "4.0") and (extra == "full" or extra == "ui")
Requires-Dist: tabulate (>=0.9.0)
Requires-Dist: tqdm (>=4.64.1)
Requires-Dist: ua-parser (>=0.16.1)
Requires-Dist: uvicorn (>=0.20.0)
Requires-Dist: xgboost (>=1.7.3); extra == "ml"
Description-Content-Type: text/markdown

# smart_data_science

Personal side project to streamline the most common data science tasks efficiently. It is based on my experience as a lead data scientist in the industrial and financial services sectors, where I have gained expertise in delivering effective data-driven insights and solutions.

[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)<br>

## Installation in Dev / Editable mode

Note: A Debian/Ubuntu machine, VM or container is highly recommended.


**Step 0: One-time machine setup, valid for all Data Science projects**

Create or use a machine with Conda, Git and Poetry, configured as closely as possible to `.devcontainer/Dockerfile`:

- This Dockerfile defines a non-root user, so the same configuration can be applied to a WSL Ubuntu machine and to any Debian/Ubuntu cloud machine (Vertex AI Workbench, Azure VM ...).
- On an Ubuntu/Debian machine that already has a non-root user (e.g. Ubuntu in WSL, Vertex AI VM ...), just install the tools from the "non-root user" (no sudo) section of the Dockerfile (`sudo apt-get install <software>` may be required).
- A pre-configured cloud VM usually has Git and Conda pre-installed, so those steps can be skipped.
- The development container defined in `.devcontainer/Dockerfile` can be used directly for a fast setup (Docker required). With Visual Studio Code, open the root folder of this repo, press `F1` and select the option **Dev Containers: Open Workspace in Container**. The container will open the same workspace after the Docker image is built.
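
Before moving on, it can help to confirm that the three tools named above are reachable on the `PATH`. The snippet below is a minimal sketch using only the Python standard library; the helper name `missing_tools` is hypothetical and not part of this repo.

```python
import shutil


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Conda, Git and Poetry are the tools required by Step 0.
    missing = missing_tools(["conda", "git", "poetry"])
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

An empty result means every tool was found; anything listed still needs to be installed following the Dockerfile.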


**Step 1**. Go to the root path of the repo and create a new conda environment for development (or activate an existing one):

```bash
$ conda create -n dev python=3.10 -y && conda activate dev
```
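
The package metadata declares `Requires-Python: >=3.10,<4.0`. As a quick sanity check, the activated environment can be tested against that range. This is a sketch only; the function name `python_supported` is illustrative and does not exist in the package.

```python
import sys


def python_supported(version_info=sys.version_info):
    """Check a version tuple against the declared range >=3.10,<4.0."""
    major, minor = version_info[0], version_info[1]
    return (3, 10) <= (major, minor) < (4, 0)


if __name__ == "__main__":
    # Report the interpreter of the currently active environment.
    print("Python", sys.version.split()[0], "supported:", python_supported())
```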

**Step 2**. Install all the dependencies and the package in editable mode:

```bash
$ make setup
```

## Installation for Production/Usage (not yet published on PyPI)
```bash
$ conda create -n smart python=3.10 -y && conda activate smart
$ pip install dist/smart_data_science-0.1.2-py3-none-any.whl
```

## Installation for Production/Usage (after the package is published on PyPI)

```bash
$ pip install smart_data_science
```
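
After either installation route, the result can be verified from Python. The following is a minimal sketch based on the standard-library `importlib.metadata` module; the helper name `installed_info` is hypothetical, and the distribution name `smart-data-science` is taken from the package metadata.

```python
from importlib import metadata


def installed_info(dist_name="smart-data-science"):
    """Return (version, extras) for an installed distribution, or None."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the active environment.
        return None
    # Provides-Extra entries list the optional dependency groups.
    extras = dist.metadata.get_all("Provides-Extra") or []
    return dist.version, sorted(extras)


if __name__ == "__main__":
    print(installed_info())
```

The metadata declares the extras `full`, `ml`, `plot` and `ui`. With pip's standard extras syntax, an optional group would be requested as, for example, `pip install "smart-data-science[ml]"`.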


## Usage

- Still under development. Please refer to the `notebooks` and `examples` folders for usage examples.

## Contributing

Check out the contributing guidelines.

## License

`smart_data_science` was created by Angel Martinez-Tenor. It is licensed under the terms of the MIT license.

## Credits

`smart_data_science` was created from a Data Science Template developed by Angel Martinez-Tenor. The template was built upon the [`py-pkgs-cookiecutter` template](https://github.com/py-pkgs/py-pkgs-cookiecutter).

