Metadata-Version: 2.1
Name: filehole
Version: 0.0.4
Summary: Python library to find missing files in a scheduled delivery.
Author-email: Virginie Grosboillot <vestalisvirginis@gmail.com>
Project-URL: Homepage, https://github.com/vestalisvirginis/filehole
Project-URL: Bug Tracker, https://github.com/vestalisvirginis/filehole/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE

# filehole
Python library to find missing files in a scheduled delivery.


Simple and quick solution to help finding missing files in a scheduled delivery, particularly when dealing with large amount of files or a long history of file delivery.


## Dependencies
- Python 3.8.9 or higher
- Numpy 1.23.0
- Holidays 0.14.2


## Install

The latest stable version can always be installed or updated via pip:
```python
$ pip install filehole
```


## Usage

```python
filehole(
    path_to_files: str,
    file_system: Globable,
    date_pattern: str,
    date_format: str,
    country: str,
    subdivision: str = None,
    start_date: str = f"{date.today().year}-01-01",
    end_date: str = date.today().strftime("%Y-%m-%d"),
    week_schedule: str = "1111100",
    frequency: str = "D",
    repetition: int = 1,
    position: int = 1,
)
```


### Parameters:

- `path_to_files` : Wild card enabled string to search for files  
- `file_system` : Modules that have a `glob` function such as `glob` in a local environment or `adls` in a cloud environment.
- `date_pattern` : Regular expression reflecting the pattern in which the date is written in files or directories.
- `date_format` : Standard date format of the date written in files or directories.
- `country` : Country name or abbreviation for the selection of the holidays calendar. For the exhaustive list of available holidays calendars, please refer to the documentation of the `holidays` python library (https://pypi.org/project/holidays/).
- `subdivision` : Province, state, ... for the selection of the holidays calendar. The available option can be found in the documentation of the `holidays` python library (https://pypi.org/project/holidays/).
-  `start_date` : Start of the search period. Format: `'%Y-%m-%d'`. Default is set to the first day of the current year.
-  `end_date` : End of the search period. Format: `'%Y-%m-%d'`. Default is set to the current date.
-  `week_schedule`: String of 7 digits of 0 and 1. 1 represents a working day and 0 a non-working day. Week starts on Monday. By default, the working week is set from Monday to Friday included -> `'1111100'`.
-  `frequency` : Takes `'D'` for daily delivery, `'W'` for weekly delivery and `'M'` for monthly delivery.
-  `repetition` : Default value: `1`. Used only for weekly and monthly file delivery. e.g.: `repetition=1` -> every week/month, `repetition=2` -> every two weeks/months...
-  `position` : Takes `1` for first business day of the month or `-1` for last business day of the month.


### Description:

Retrieve list of files from a given location.  
Extract dates from filenames.  
Create a calendar of holidays.  
Create a set of expected dates and compare them to the extracted dates.  
Return a set of missing dates.


### Example: 
- Daily file delivery for the month of July according to the french holiday calendar, assuming that files from the 12 and 13 of July are missing:
```python
> filehole(
        path_to_files="my_file_path/*.txt",  
        file_system=glob,
        date_pattern=r"[0-9]{8}",
        date_format="%Y%m%d",
        country="FR",
        start_date="2022-07-01",
        end_date="2022-07-31",
        week_schedule="1111100",
        frequency="D",
    )
> {datetime.date(2022, 7, 12), datetime.date(2022, 7, 13)}
```


## Limitations
- All files are expected to be at the same level.
- The files or the directory containing the files should contain a date in their name.
- Current version works only with daily, weekly and monthly file delivery.


## License
This project is under the MIT License.
