Metadata-Version: 2.1
Name: pyQualitas
Version: 1.0.3
Summary: A project to ensure the data quality using python
Home-page: https://github.com/IamVenkatesh/pyQuality/wiki
Author: Venkatesh Venkataramani
Author-email: venkatesh.venkataramani@gmail.com
Classifier: Programming Language :: Python :: 3.8
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: Unix
Classifier: Development Status :: 4 - Beta
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Software Development :: Testing
Description-Content-Type: text/markdown
License-File: LICENSE

**PyQualitas**

This project aims to provide a Python library for ensuring data quality. It is inspired by deequ and 
dataflare, which also target data quality.

**Requirements:**

1. PySpark - Version 3.3.0
2. Pandas - Version 1.5.0
3. Jinja2 - Version 3.1.2

**Installation:**

This section will be updated once the project packaging milestone is reached.

**Use Cases:**

The main goal of this library is to help QA engineers ensure data quality. Given the volume of data and the frequency of releases in the industry, the Quality Assurance team carries an enormous responsibility to verify and sign off on the quality of the data generated by the application.

This is very hard to achieve with manual testing alone. Scheduling automated validation helps meet timelines and ensures high data quality with less effort.

The library includes various tests that come in handy during regression testing. Since the project is implemented in Python, the learning curve is shorter than for comparable libraries written in Scala.
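
To give a feel for the kind of automated validation described above, here is a minimal sketch of two common data quality checks (completeness and uniqueness). It deliberately uses plain Python so that it runs without a Spark session, and it does not use the pyQualitas API: the function names `check_completeness`, `check_uniqueness` and `run_checks`, as well as the sample columns, are hypothetical and chosen only for illustration. Please refer to the wiki for the checks the library actually provides.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def check_completeness(rows: List[Row], column: str) -> bool:
    """Pass only if no row has a missing (None) value in `column`."""
    return all(row.get(column) is not None for row in rows)


def check_uniqueness(rows: List[Row], column: str) -> bool:
    """Pass only if every value in `column` appears exactly once."""
    values = [row.get(column) for row in rows]
    return len(values) == len(set(values))


def run_checks(rows: List[Row]) -> Dict[str, bool]:
    """Run a small, hypothetical regression suite and collect the results."""
    return {
        "order_id is complete": check_completeness(rows, "order_id"),
        "order_id is unique": check_uniqueness(rows, "order_id"),
        "amount is complete": check_completeness(rows, "amount"),
    }


# Illustrative data: one duplicated order_id and one missing amount.
sample = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},
    {"order_id": 2, "amount": 7.5},
]

results = run_checks(sample)
for name, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}: {name}")
```

A suite like this could be scheduled after each release so that regressions in the data surface without manual inspection.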

The documentation is available at the following link:

https://github.com/IamVenkatesh/pyQualitas/wiki
