Metadata-Version: 2.1
Name: template_data_project
Version: 0.0.6
Summary: A data engineering project template that sets up a directory structure and includes the essential files and documentation data engineers can use as a starting point for their projects.
Author-email: ABOULOIFA Najib <najib.abouloifa@gmail.com>
Project-URL: Homepage, https://github.com/ABNajib/data_engineering_template
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE.md

## Project Overview

This repository is a structured template for organizing a data engineering project. The project is divided into folders that facilitate effective management of project assets, resources, and code. Below is an overview of the project's folder structure:

## Folder Structure

- **Project Root Folder**: The top-level folder for the entire data engineering project.

- **Documentation**: Contains all project-related documentation, including design documents, project plans, and README files.

- **Source Code**: Houses all the code related to data extraction, transformation, loading (ETL), data processing, and data infrastructure.

  - **ETL Scripts**: Subfolder for ETL scripts and code.

  - **Data Processing**: Subfolder for data processing scripts and code.

  - **Infrastructure as Code (IaC)**: If using cloud services like AWS, Azure, or Google Cloud, store Infrastructure as Code templates (e.g., Terraform or CloudFormation) here.

- **Data**: Contains all the data used by the project.

  - **Raw Data**: Stores raw, unprocessed data from various sources.

  - **Processed Data**: Holds cleaned, transformed, and structured data ready for analysis.

- **Configurations**: Stores configuration files for various components of the project, such as database connection strings, ETL job parameters, and API keys.

- **Testing**: Includes test scripts and data used for testing ETL processes and data quality.

- **Logs**: Stores log files generated by ETL processes, data pipelines, and system components.

- **Reports and Visualizations**: Contains reports, dashboards, and visualizations created from the processed data.

- **Libraries and Dependencies**: Houses libraries and dependencies required for running the project's code.

- **Infrastructure**: If you're managing the infrastructure, include subfolders for cloud service configurations, server configurations, and networking configurations.

- **Environments**: Subfolders for different environments like "Development," "Staging," and "Production" if you have separate environments for your project.

- **Utilities**: Include any utility scripts or tools used for project management, data validation, or monitoring.

- **Archives**: Store backups, historical data, or archived project versions.

- **README Files**: Each subfolder may contain a README file explaining its contents and usage.

- **Tests and Test Data**: If you have specific folders for automated testing or test data, include them here.

- **Docker**: If using Docker containers, you may have a folder for Dockerfiles and related configuration files.

- **CI/CD**: If you have a continuous integration/continuous deployment pipeline, include relevant configuration and scripts here.
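The layout above can be sketched programmatically with the standard library; a minimal version (the folder names below are illustrative shorthand, not necessarily the exact names the package creates):

```python
from pathlib import Path

# Illustrative subset of the template layout described above
FOLDERS = [
    "docs",
    "src/etl",
    "src/processing",
    "src/iac",
    "data/raw",
    "data/processed",
    "config",
    "tests",
    "logs",
    "reports",
    "utils",
    "archives",
]

def create_structure(root_dir: str) -> None:
    """Create the template folder tree under root_dir."""
    root = Path(root_dir)
    for folder in FOLDERS:
        # parents=True builds intermediate dirs; exist_ok makes reruns safe
        (root / folder).mkdir(parents=True, exist_ok=True)

create_structure("example_project")
```

Because `mkdir` is called with `exist_ok=True`, the sketch is idempotent and can be rerun without errors.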

## Getting Started

```python
from template_data_project import project_structure

project_structure.template(root_dir)
```
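After generating the template, the resulting tree can be inspected with the standard library alone; a small sketch (the `my_data_project` path is illustrative):

```python
import os

def print_tree(root: str) -> None:
    """Print the directory tree under root, indented by depth."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Depth = number of path separators below the root
        depth = dirpath[len(root):].count(os.sep)
        print("  " * depth + os.path.basename(dirpath) + "/")
        for name in filenames:
            print("  " * (depth + 1) + name)

print_tree("my_data_project")
```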



## License

This project is licensed under the [MIT License](LICENSE.md).

## Acknowledgments

- This project structure template is provided as a starting point for organizing data engineering projects effectively.

## Build package

See the official packaging tutorial: <https://packaging.python.org/en/latest/tutorials/packaging-projects/>
