Metadata-Version: 2.1
Name: inverted_index_search
Version: 1.3.0
Summary: A module for creating ngrams and searching multiple phrases using inverted index searching in a document
Project-URL: Homepage, https://github.com/Affanmir/Inverted-Index-search
Project-URL: Bug Tracker, https://github.com/pypa/sampleproject/issues
Author-email: Affan Mir <affanmir95@gmail.com>
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.7
Description-Content-Type: text/markdown

# inverted-index-search
inverted-index-search is a Python library for searching a corpus for keywords or sub-words using inverted-index lookup.
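
To illustrate the idea, here is a minimal sketch of an inverted index over character n-grams. It is not the library's implementation: `build_index` and `lookup` are hypothetical names used only for this example, and the library's internals may work differently.

```python
from collections import defaultdict


def build_index(doc, n):
    """Map every character n-gram in doc to the (start, end) spans where it occurs."""
    index = defaultdict(list)
    for start in range(len(doc) - n + 1):
        index[doc[start:start + n]].append((start, start + n))
    return dict(index)


def lookup(index, phrase, n):
    """Return the document spans of each n-gram of phrase found in the index."""
    hits = {}
    for start in range(len(phrase) - n + 1):
        gram = phrase[start:start + n]
        if gram in index:
            hits[gram] = index[gram]
    return hits


# Hypothetical usage: index the document once, then look up any number of phrases.
index = build_index("something in in in", 2)
print(index["in"])  # [(6, 8), (10, 12), (13, 15), (16, 18)]
print(lookup(index, "bussing in ", 2))
```

The index is built once per document, so each phrase n-gram costs a single dictionary lookup rather than a scan of the text.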

## Installation

Use the package manager pip to install.

```bash
pip install inverted-index-search
```

## Usage

```python
from inverted_index_search import search_doc

phrases = ["something in the way for us", "bussing in "]
doc = 'something in in in'

# Match character-level 1-grams of each phrase against the document.
print(search_doc(doc, phrases, n_gram_level='char', doc_ngrams=[1], phrase_ngrams=[1], verbose=False))
# Output (wrapped for readability); each 'occured' entry is a (start, end) offset into doc:
# {'something in the way for us': {
#     's': {'count': 2, 'occured': [(0, 1)]}, 'o': {'count': 2, 'occured': [(1, 2)]},
#     'm': {'count': 1, 'occured': [(2, 3)]}, 'e': {'count': 2, 'occured': [(3, 4)]},
#     't': {'count': 2, 'occured': [(4, 5)]}, 'h': {'count': 2, 'occured': [(5, 6)]},
#     'i': {'count': 8, 'occured': [(6, 7), (10, 11), (13, 14), (16, 17)]},
#     'n': {'count': 8, 'occured': [(7, 8), (11, 12), (14, 15), (17, 18)]},
#     'g': {'count': 1, 'occured': [(8, 9)]}},
#  'bussing in ': {
#     's': {'count': 2, 'occured': [(0, 1)]},
#     'i': {'count': 8, 'occured': [(6, 7), (10, 11), (13, 14), (16, 17)]},
#     'n': {'count': 8, 'occured': [(7, 8), (11, 12), (14, 15), (17, 18)]},
#     'g': {'count': 1, 'occured': [(8, 9)]}}}


# The docstring describes the function and its parameters:
print(search_doc.__doc__)
# search_doc builds n-grams from the document you passed in, builds n-grams
# from the phrases you entered, and finds the matching substrings in the document.
#
# - doc_ngrams / phrase_ngrams: lists of n-gram sizes. For example,
#   doc_ngrams=[1, 7, 2] and phrase_ngrams=[3, 4] build 1-, 2-, and 7-grams
#   for the document and 3- and 4-grams for the phrases.
# - n_gram_level: 'char' for character n-grams or 'word' for word n-grams.
# - remove_gram: n-grams to exclude. For example, remove_gram=['apple']
#   ensures that no n-gram containing 'apple' is used for searching.
# - verbose: set to True to turn on logging.
#
# Defaults: the document n-gram sizes are 1, 2, 3, 4, 5; the phrase n-gram sizes
# range from 1 to the length of the largest phrase when split on spaces.
```
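
The difference between the two `n_gram_level` settings, and the effect of `remove_gram`, can be sketched as follows. `make_ngrams` is a hypothetical helper written for this illustration, not a function exported by the package. The whitespace tokenization and the substring-based filtering are assumptions about how the library behaves.

```python
def make_ngrams(text, sizes, level="char"):
    """Return all n-grams of the given sizes over characters or whitespace-split words."""
    if level not in ("char", "word"):
        raise ValueError("level must be 'char' or 'word'")
    units = list(text) if level == "char" else text.split()
    joiner = "" if level == "char" else " "
    grams = []
    for n in sizes:
        # Slide a window of n units across the text.
        for start in range(len(units) - n + 1):
            grams.append(joiner.join(units[start:start + n]))
    return grams


word_grams = make_ngrams("in the way", [1, 2], level="word")
print(word_grams)  # ['in', 'the', 'way', 'in the', 'the way']

char_grams = make_ngrams("way", [2], level="char")
print(char_grams)  # ['wa', 'ay']

# Assumed filtering rule: drop any n-gram that contains a remove_gram entry.
remove_gram = ["the"]
kept = [g for g in word_grams if not any(r in g for r in remove_gram)]
print(kept)  # ['in', 'way']
```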

## Contributing

Pull requests are welcome. For major changes, please open an issue first
to discuss what you would like to change.

Please make sure to update tests as appropriate.

## GitHub

[Affan](https://github.com/Affanmir)
