Metadata-Version: 2.1
Name: dupliCat
Version: 2.2.1
Summary: A simple package to hunt down file duplicates
Home-page: https://github.com/teddbug-S/dupliCat
Author: Divine Darkey (teddbug-S)
Author-email: teddbug47@gmail.com
Maintainer: Divine Darkey (teddbug-S)
Maintainer-email: teddbug47@gmail.com
License: MIT
Project-URL: Issues, https://github.com/teddbug-S/dupliCat/issues
Project-URL: Pull Requests, https://github.com/teddbug-S/dupliCat/pulls
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE

# DupliCat

A simple utility for finding duplicated files within a specified path.
It is intended to be a library but can also be used as a commandline tool,
it doesn't delete the duplicate files found but returns a list of junk files so that you can choose the ones to delete.

# Usage As A Library:
   - Import the [dupliCat](https://github.com/teddbug-S/dupliCat/blob/main/src/dupliCat/__init__.py) class and create an object by passing the following arguments,
       - `path`
           where the search will be made, defaults to current directory.
       - `recurse`
           boolean, set to true if you want it to recurse down to all files in the path including sub-dirs
           defaults to `False`
   
   ## Methods

   - the `generate_secure_hash` method takes a file as first argument and generates a secure-hash for it.
     Hashing algorithm is blake2b, key is the size of the file, it returns the file with secure_hash attribute
     set. File must be of type `dupliFile`.

   - `read_chunk` this method reads a default 400 bytes of data from file. It takes the file as first positional
      argument and size as second argument which defaults to 400. File must be of type `dupliFile`

   - `human_size` this is a static method that converts bytes into human-readable format.
   ```doctest
     >>> human_size(nbytes=123456)
     >>> 120.56 KB
   ```
   - `hash_chunk` static method, takes two positional arguments, `text: str` and `key: int`
      hashes text with key using blake2b.
  
   - call the `search_duplicate` method to begin the 🔍 search, search results will be stored in 
       the `duplicates` property of the class. This method is somewhat the main api of the class, it 
       does everything for you, calling other methods instead of this might remove the functionality of
       using files from `size_index` as input for generating a hash index.
       
       the `search_duplicate` method has the following optional arguments
       - `use_hash`
           find duplicates using hash table if set to True otherwise uses size_table, using the 
           hash_table is more accurate in conditions where different files have same sizes
           and quite fast since table is generated using the size table, parameter defaults to `True`.
           Using the size table is faster but does not guarantee accuracy.
       - `from_size`
          it defaults to True, this enables the `search_duplicate` method to generate a hash index
          using files from the size index. Defaults to `True`.
       
    Note: 
        Both parameters are set to True for more accurate search since search will be done using the
       `size_index` and also the `hash_index`.
        Junk files set by this method contains all duplicates with one file left over for each.
 
   - use the `analyse` method to analyse search result, this returns a named tuple of type `Analysis`.
   It contains 
       the total number of duplicate files accessed through `analysis.total_file_num`, their total size on the disk
       accessed through `analysis.total_size` and the most occurred file, accessed through `analysis.most_occurrence`.
  
  ## Properties

   - `size_index`
      You can also access the size index using the property. it is a dictionary containing files
      grouped by their sizes.
   - `hash_index`
      You can also access the hash index using this property. It is a dictionary containing files
      grouped by their secure hashes.
   -  `fetched_file`
      access all fetched files from the search path
   - `path`
           where the search will be made, defaults to current directory.
   - `recurse`
      boolean, set to true if you want it to recurse down to all files in the path including sub-dirs
      defaults to `False`
   
     
# Usage From Commandline
   Functionality removed for now.

# Contact
   - twitter: [teddbug](https://www.twitter.com/teddbug)
   - facebook: [Tedd Bug](https://www.facebook.com/tedd.bug.79/)

😄 Happy Coding!


