Metadata-Version: 2.1
Name: cuoco
Version: 0.1.5
Summary: Cuoco is a tool for automatic data preprocessing. The name is Italian for "chef".
Author: Francisco de Borja Garcia Lamas
Author-email: borjagl2014@gmail.com
License: OSI Approved :: MIT License
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE


# CUOCO

Cuoco is a tool for automatic processing of data. 

## Example
Import the library

`import cuoco`

`from cuoco import dataPipeline`

Call `dataPipeline.readJson` with the path to the dataset and the path to the JSON configuration file

`dataPipeline.readJson('/content/biostats.csv', '/content/jsonTESTFILE.json')`

## Documentation


How it works:
Cuoco reads a JSON file written by the user and automatically applies the data-processing functions it describes to the chosen dataset. The JSON file has the following keys:

- input_format: format of the input dataset. Can be csv, parquet, orc or txt
- output_format: format of the resulting dataset. Can be csv, parquet, orc or txt
- new_fileName: name of the new dataset the DataChef will write
- new_file_route: path where the new data file will be stored
- index: whether the final dataset should have a row index. Can be:
    - yes
    - no
- header: whether your dataset has a header. Can be yes or none
- separator: the separator used in your dataset. Only applies to the csv and txt formats.
- num_nans: method for handling numerical nans (including empty values). Can be:
    - drop: drop rows that contain nans
    - yes: leave rows that contain nans unchanged
    - mean: fill nans with the mean value of the column
    - median: fill nans with the median value of the column
    - mode: fill nans with the mode value of the column
- str_nans: method for handling string nans (including empty values). Can be:
    - yes: keep nans columns
    - no: drop nans columns
- caps: method for handling strings that mix uppercase and lowercase letters. Can be:
    - no: don't do anything
    - upper: convert all values in string columns to uppercase
    - lower: convert all values in string columns to lowercase
- normalize_method: method used to normalize numerical columns. Can be:
    - no: don't normalize
    - max_abs: normalizes by the maximum absolute value
    - min_max: normalizes with the min-max method
    - z_score: normalizes with the z-score method
- normalize:
    - write the names of the columns you want to normalize
    - Note: if your dataset does not have a header, you must give the columns you want to
      normalize as numbers; if it has a header, you must write the column names between ""
- balance_data: whether to balance your data (recommended for AI datasets). Can be:
    - yes
    - no
- Inside balance_params there are two items:
  - balance_method: method used for oversampling. Can be:
    - random: random oversampling
    - smote: oversampling with the SMOTE technique
  - y_col: column of the dataset to use as the target for balancing
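
A configuration file with the keys listed above could be built along these lines. This is only a sketch: the key names come from the list, but the value types (plain strings, a list for `normalize`), the column names `Age`, `Weight` and `Sex`, and the helper `write_config` are assumptions made for illustration, not something the library documents.

```python
import json

# Hypothetical configuration. Key names follow the list above; the value
# types and the column names are assumptions, not confirmed by the library.
config = {
    "input_format": "csv",
    "output_format": "csv",
    "new_fileName": "biostats_clean",
    "new_file_route": "/content/",
    "index": "no",
    "header": "yes",
    "separator": ",",
    "num_nans": "mean",
    "str_nans": "yes",
    "caps": "lower",
    "normalize_method": "min_max",
    "normalize": ["Age", "Weight"],
    "balance_data": "no",
    "balance_params": {"balance_method": "random", "y_col": "Sex"},
}


def write_config(path, cfg):
    """Write the configuration dictionary to a JSON file (illustrative helper)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(cfg, fh, indent=4)


# The same configuration as a JSON string, e.g. for inspection.
config_json = json.dumps(config, indent=4)
```

The path given to `write_config` would then be passed as the second argument of `dataPipeline.readJson`, as in the Example section.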


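The three `normalize_method` options correspond to standard scaling formulas. The sketch below shows the usual textbook definitions in plain Python. It is not Cuoco's own code: the function names are hypothetical, and using the population standard deviation for the z-score is an assumption (the library may use the sample standard deviation instead).

```python
from statistics import mean, pstdev


def max_abs(values):
    """Divide every value by the largest absolute value in the column."""
    peak = max(abs(v) for v in values)
    return [v / peak for v in values]


def min_max(values):
    """Rescale the column so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Note: a constant column makes the divisor zero in all three functions,
# so real code would need to handle that case separately.
column = [2.0, 4.0, 6.0, 8.0]
scaled_max_abs = max_abs(column)
scaled_min_max = min_max(column)
scaled_z = z_score(column)
```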