Metadata-Version: 2.1
Name: dedupe_FuzzyWuzzy
Version: 1.0.2
Summary: Deduplication using RapidFuzz library.
Home-page: https://github.com/Gandharv30/dedupe-FuzzyWuzzy
Author: Gandharv Pathak
Author-email: pathakgandharv@gmail.com
License: MIT
Description: # dedupe-FuzzyWuzzy
        Deduplication of a data set using the rapidfuzz library, which offers the same kind of fuzzy string matching as FuzzyWuzzy but is a lot faster.
        
        # Installation
            pip install dedupe-FuzzyWuzzy
        
        # Basic Usage
        Usage is simple: import the library and pass your DataFrame to `deduplication.deduplication`, along with a list of the columns to use for deduplication and a threshold that sets how strict the matching should be.
        A higher threshold matches more strictly and yields fewer matches, whereas a lower threshold yields more matches.
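        
        To make the effect of the threshold concrete, here is a minimal sketch. It is not this package's implementation: it uses Python's standard-library `difflib` as a stand-in for a rapidfuzz scorer so that it runs without extra dependencies, and the scores it produces will differ from those the package computes. The helper names `similarity` and `is_match` and the sample strings are hypothetical.
        
        ```python
        from difflib import SequenceMatcher
        
        
        def similarity(a: str, b: str) -> float:
            """Return a 0-100 similarity score (difflib stand-in, not rapidfuzz)."""
            return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()
        
        
        def is_match(a: str, b: str, threshold: float) -> bool:
            """Treat two values as duplicates when their score reaches the threshold."""
            return similarity(a, b) >= threshold
        
        
        # Hypothetical sample values, for illustration only.
        left, right = "Main Street Cafe", "Main Street Caffe"
        
        # The same pair is checked against increasingly strict thresholds.
        for threshold in (70, 90, 99):
            print(threshold, is_match(left, right, threshold))
        ```
        
        The pair counts as a match at the lower thresholds and stops matching once the threshold exceeds its score, which is the trade-off described above.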
        
        
        ### Deduplication
        
            import pandas as pd
            from dedupe_FuzzyWuzzy import deduplication
        
            # Load the DataFrame
            df = pd.read_csv('messy.csv')
        
            # Run deduplication on the chosen columns
            df_1 = deduplication.deduplication(df, ['Site name', 'Address'], threshold=90)
        
            # Write the output to CSV
            df_1.to_csv('dedupeOutput.csv')
        
        # Credits
        This would not have been possible without the rapidfuzz package by @maxbachmann, so kudos to you!
        
Platform: UNKNOWN
Description-Content-Type: text/markdown
