Metadata-Version: 2.1
Name: commonregex-improved
Version: 0.0.4
Summary: Python cli tool to redact sensitive data
Home-page: https://github.com/brootware/commonregex-improved
License: MIT
Author: brootware
Author-email: brootware@outlook.com
Requires-Python: >=3.7,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: General
Requires-Dist: regex (>=2022.4.24,<2023.0.0)
Project-URL: Repository, https://github.com/brootware/commonregex-improved
Description-Content-Type: text/markdown

<br><br>

<h1 align="center">CommonRegex Improved</h1>

<p align="center">
  <a href="/LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg"/></a>
  <!-- <img alt="PyPI - Downloads" src="https://pepy.tech/badge/commonregex-improved/month"> -->
   <img alt="PyPI - Downloads" src="https://pepy.tech/badge/commonregex-improved">
   <a href="https://twitter.com/brootware"><img src="https://img.shields.io/twitter/follow/brootware?style=social" alt="Twitter Follow"></a>
   <img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/commonregex-improved"> <img alt="PyPI" src="https://img.shields.io/pypi/v/commonregex-improved">
   <a href="https://sonarcloud.io/summary/new_code?id=brootware_commonregex-improved"><img src="https://sonarcloud.io/api/project_badges/measure?project=brootware_commonregex-improved&metric=alert_status" alt="reliability rating"></a>
   <img alt="GitHub Workflow Status" src="https://img.shields.io/github/workflow/status/brootware/commonregex-improved/CI?label=CI&branch=main">
</p>

<p align="center">
  An improved version of commonly used regular expressions in Python
</p>

<br><br>

> Inspired by and improved upon [CommonRegex](https://github.com/madisonmay/CommonRegex)

This is a collection of commonly used regular expressions. The library provides a simple API for extracting the strings in a text that match a given pattern.

## Installation

```pip install --upgrade commonregex-improved```

## Usage

```python
import commonregex_improved as CommonRegex

text = "John, please get that article on www.linkedin.com to me by 5:00PM on Jan 9th 2012. 4:00 would be ideal, actually or 5:30 P.M. If you have any questions, You can reach me at (519)-236-2723x341 or get in touch with my associate at harold.smith@gmail.com. You can find my ip address at 127.0.0.1 or at 64.248.67.225. I also have a secret protected with md5 8a2292371ee60f8212096c06fe3335fd"

date_list = CommonRegex.dates(text)
# ['Jan 9th 2012']
time_list = CommonRegex.times(text)
# ['5:00PM', '4:00 ', '5:30 P.M.']
url_list = CommonRegex.links(text)
# ['www.linkedin.com', 'harold.smith@gmail.com']
phone_list = CommonRegex.phones_with_exts(text)
# ['(519)-236-2723x341']
email_list = CommonRegex.emails(text)
# ['harold.smith@gmail.com']
md5_list = CommonRegex.md5_hashes(text)
# ['8a2292371ee60f8212096c06fe3335fd']
```

## ⚔️ Performance benchmark

[CommonRegex](https://github.com/madisonmay/CommonRegex) is awesome!

So why re-implement the popular original CommonRegex project? Because its API calls for each of the regular expressions are slow.

In the benchmark, a total of 2999 calls to the `Dates` function takes 12 seconds in the original version of CommonRegex, while the same number of calls takes only 2 seconds in the improved version.

![improved](./benchmark/benchmark.png)

More detailed results are available for the [original](./benchmark/original_cregex_result.pdf) and [improved](./benchmark/cregex_improved_result.pdf) versions.
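
To run a similar measurement yourself, a minimal timing sketch could look like the one below. It is not the script used for the benchmark above. `stand_in_dates` and its pattern are hypothetical placeholders that use only the standard library, so swap in the real function you want to time (for example `commonregex_improved.dates`).

```python
import re
import timeit

# Hypothetical stand-in pattern: deliberately simple and NOT the
# expression used by either CommonRegex library.
_DATE_PATTERN = re.compile(r"\b[A-Z][a-z]{2} \d{1,2}(?:st|nd|rd|th)? \d{4}\b")


def stand_in_dates(text):
    """Placeholder for the function under test; returns all date-like matches."""
    return _DATE_PATTERN.findall(text)


def time_calls(func, text, calls=2999):
    """Return the total wall-clock seconds for `calls` invocations of func(text)."""
    return timeit.timeit(lambda: func(text), number=calls)


sample = "Please send it by 5:00PM on Jan 9th 2012."
elapsed = time_calls(stand_in_dates, sample)
print(f"2999 calls took {elapsed:.4f} seconds")
```

Absolute timings depend on the machine and the input text, so compare both libraries on the same text and in the same environment.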

## Features / Supported Methods

* `dates(text: str)`
* `times(text: str)`
* `phones(text: str)`
* `phones_with_exts(text: str)`
* `links(text: str)`
* `emails(text: str)`
* `ipv4s(text: str)`
* `ipv6s(text: str)`
* `ips(text: str)`
* `not_known_ports(text: str)`
* `prices(text: str)`
* `hex_colors(text: str)`
* `credit_cards(text: str)`
* `visa_cards(text: str)`
* `master_cards(text: str)`
* `btc_address(text: str)`
* `street_addresses(text: str)`
* `zip_codes(text: str)`
* `po_boxes(text: str)`
* `ssn_numbers(text: str)`
* `md5_hashes(text: str)`
* `sha1_hashes(text: str)`
* `sha256_hashes(text: str)`
* `isbn13s(text: str)`
* `isbn10s(text: str)`
* `mac_addresses(text: str)`
* `iban_numbers(text: str)`
* `git_repos(text: str)`
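
Because every method above takes a single `text` argument, several of them can be applied to the same input by name. The sketch below is one way to do that. `run_extractors` is a hypothetical helper, not part of the library, and the `stand_in` object with its simplified patterns only exists so the example runs without the package installed. In real use you would pass the imported `commonregex_improved` module instead.

```python
import re
from types import SimpleNamespace


def run_extractors(module, names, text):
    """Call each named single-argument extractor on `text` and collect the results."""
    results = {}
    for name in names:
        extractor = getattr(module, name)
        results[name] = extractor(text)
    return results


# Hypothetical stand-in for the library, with simplified patterns.
# Replace it with: import commonregex_improved as stand_in
stand_in = SimpleNamespace(
    ipv4s=lambda text: re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", text),
    md5_hashes=lambda text: re.findall(r"\b[0-9a-f]{32}\b", text),
)

text = "My ip address is 127.0.0.1 and the md5 is 8a2292371ee60f8212096c06fe3335fd"
found = run_extractors(stand_in, ["ipv4s", "md5_hashes"], text)
print(found)
# {'ipv4s': ['127.0.0.1'], 'md5_hashes': ['8a2292371ee60f8212096c06fe3335fd']}
```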

