Metadata-Version: 2.1
Name: django-native-search
Version: 0.5.2
Summary: A simple search engine using native django database backend.
Home-page: https://github.com/kmierzeje/django-native-search
Author: Kamil Mierzejewski
License: UNKNOWN
Description: # django-native-search
        
        django-native-search implements a basic full-text search engine for Django models.
        
        The engine uses the Django ORM to manage its index, so searching needs no additional backend. Create an index model, run `makemigrations` and `migrate`, and you are ready to feed it with data and search.
        
        ## Installation
        
        Install the package from PyPI:
        ```
        pip install django-native-search
        ```
        
        The package will be installed with all its dependencies including `django-expression-index`.
        
        ## Setup
        Setting up the search in basic configuration is quite simple.
        ### 1. Register the app
        Add `django_native_search` to `INSTALLED_APPS` in your settings:
        
        ```python
        INSTALLED_APPS = [
            ...
            'django_native_search.apps.DjangoNativeSearch',
            ...
        ]
        ```
        ### 2. Define your Index Model
        In the `models.py` of a new or existing app, define an index model. This example creates a simple index for the `books.Book` model:
        ```python
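        from django.db import models
        from django_native_search.models import IndexEntry
        
        class BookIndexEntry(IndexEntry):
            # Relation to the model being indexed.
            object = models.OneToOneField('books.Book', on_delete=models.CASCADE)
            # Template rendered to produce the text that gets indexed.
            search_template = 'search_index/book.txt'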
        ```
        The `object` field defines the relation to the model being indexed. The engine renders `search_template` with the `object` variable in the template context to produce the text to index. By default the rendered text is tokenized with `str.split`. You can change this behavior by overriding the `tokenize` class method in your index model.
        ```python
        import re
        
        class BookIndexEntry(IndexEntry):
            ...
            @classmethod
            def tokenize(cls, text):
                # Non-capturing group, so findall returns whole words rather than group contents.
                return re.findall(r"[^\W_]+(?:['_]?[^\W_]+)*", text)
        ```
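        The difference between the default tokenizer and a regex tokenizer is easiest to see on a sample string. The following is a minimal standalone sketch that needs no Django; the module-level `tokenize` function and the sample text are illustrative assumptions, not part of the package.
        ```python
        import re

        # Runs of letters/digits, optionally joined by a single apostrophe or
        # underscore. The group is non-capturing so findall returns whole matches.
        TOKEN_RE = re.compile(r"[^\W_]+(?:['_]?[^\W_]+)*")


        def tokenize(text):
            """Return the words found in text, without surrounding punctuation."""
            return TOKEN_RE.findall(text)


        sample = "Monty Python's Flying Circus, 1969!"
        whitespace_tokens = sample.split()  # str.split keeps punctuation attached
        regex_tokens = tokenize(sample)     # punctuation dropped, "Python's" kept whole
        print(whitespace_tokens)
        print(regex_tokens)
        ```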
        #### Index for multiple models
        It is also possible to create an index for multiple models by using model inheritance. Create a single concrete model derived from `IndexEntry` as the root of the index, then derive one model from it for each indexed model.
        
        You can add common fields to the root model to be used for filtering the entries, but do not add an `object` field to it. Each of the derived classes should have an `object` field, which points to the model to be indexed, and a `search_template`.
        
        I would advise adding some extra fields to your root index model, so that you can filter entries of any kind or display the results without an additional query for the descendant models. You can fill these fields with data by overriding the `save` method in your index model.
        
        It should also be possible to use `GenericForeignKey` to define the `object` field, but I haven't tried it.
        
        #### Multiple indexes
        Each direct descendant of `IndexEntry` is a separate index, so you can have multiple independent indexes in your site.
        ### 3. Prepare the database
        Run the well known commands:
        ```
        manage.py makemigrations
        manage.py migrate
        ```
        The index was tested with SQLite and PostgreSQL.
        ## Usage
        Usually you use your index to do full-text search within your data. Just remember to fill it with data first.
        ### Feeding the index with data
        All you need to do is create an instance of your `IndexEntry` descendant model and save it.
        ```python
        from book_index.models import BookIndexEntry
        from books.models import Book
        
        for book in Book.objects.all():
            BookIndexEntry(object=book).save()
        ```
        
        There is a convenient shortcut for indexing querysets:
        ```python
        from book_index.models import BookIndexEntry
        BookIndexEntry.objects.rebuild()
        ```
        You can override the `get_index_queryset` method in your class to apply `select_related`, `filter` or anything else you need before the queryset is passed on for indexing.
        
        You can call the `rebuild` method on the manager of your root index model to rebuild all descendant index models.
        
        You will probably want to create your own management command to run the indexing, although with runtime index updates you may rarely need it.
        
        #### Runtime index updates
        The indexing should be fast enough to be executed at runtime on every save of the indexed model. Just connect a handler to the `post_save` signal:
        ```python
        from django.db.models.signals import post_save
        from django_native_search.models import IndexEntry
        from books.models import Book
        
        class BookIndexEntry(IndexEntry):
            ...
            @classmethod
            def update_index(cls, instance, **kwargs):
                cls.objects.refresh([instance])
                
        post_save.connect(BookIndexEntry.update_index, sender=Book)
        ```
        Now your index will always be up to date.
        
        ### Searching
        You can search the index simply by calling the manager's `search` method:
        ```python
        qs = BookIndexEntry.objects.search('Monty Python')
        ```
        This returns a `QuerySet` of `BookIndexEntry` objects that contain both "Monty" and "Python", matched case-sensitively. To make the search case-insensitive, provide the query in lowercase:
        ```python
        qs = BookIndexEntry.objects.search('circus')
        ```
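        The case rule described above can be pictured in plain Python. This is only a sketch of the documented behavior, not the engine's code, and `keyword_matches` is a hypothetical helper introduced for illustration.
        ```python
        def keyword_matches(keyword, word):
            """Compare a query keyword with an indexed word.

            An all-lowercase keyword ignores case; a keyword containing any
            uppercase character has to match exactly.
            """
            if keyword == keyword.lower():
                return keyword == word.lower()
            return keyword == word


        print(keyword_matches("circus", "Circus"))  # lowercase keyword: case is ignored
        print(keyword_matches("Python", "python"))  # mixed-case keyword: exact match required
        ```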
        You can filter the search results, just as any other `QuerySet`:
        ```python
        qs = BookIndexEntry.objects.search('circus').filter(object__release_date__year__gt=1970)
        ```
        By default, search returns matches only for whole words. If the query consists of a single keyword, the engine does a substring search instead, so the results may include documents with words that merely contain the keyword.
        
        For example, searching for "yth" may return documents containing "python", "pythonic", "myth" or "demythologization".
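        
        The two matching modes can be sketched on a plain list of words. This is a simplified illustration under the assumption that matching works word by word; `matching_words` and the word list are hypothetical and say nothing about how the engine queries the database.
        ```python
        def matching_words(query, indexed_words):
            """Return the indexed words that a query would match.

            A single keyword matches any word containing it; with several
            keywords only whole words match.
            """
            keywords = query.split()
            if len(keywords) == 1:
                return [word for word in indexed_words if keywords[0] in word]
            return [word for word in indexed_words if word in keywords]


        indexed_words = ["python", "pythonic", "myth", "demythologization", "monty"]
        print(matching_words("yth", indexed_words))           # substring mode
        print(matching_words("monty python", indexed_words))  # whole-word mode
        ```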
        
        Substring search works fine in SQLite. In PostgreSQL, substring search has a problem using the database index, so it might be too slow.
        ### Search form
        The `SearchFormMixin` class makes it easy to create your search view:
        ```python
        from django.views.generic.base import TemplateView
        from book_index.models import BookIndexEntry
        from django_native_search.forms import SearchFormMixin, searchform_factory
        
        class SearchView(SearchFormMixin, TemplateView):
            template_name = "books_index/search.html"
            form_class = searchform_factory(BookIndexEntry)
        ```
        The `searchform_factory` function creates a `MultipleChoiceField` in your form for every field of `BookIndexEntry` that has `db_index=True`. These fields can be used to filter the results. Each of them offers all values of the field that exist in the database as choices.
        
        ### Search template
        The template referred to by `template_name` is rendered with `form`, containing the form instance, and `results`, containing the queryset of search results if the form is valid.
        ```django
        {% block content %}
            <h2>Search</h2>
            <form method="get" action=".">
                <table>
                    {{ form }}
                    <tr>
                        <td>&nbsp;</td>
                        <td>
                            <input type="submit" value="Search" class="btn"/>
                        </td>
                    </tr>
                </table>
            </form>
            {% if form.is_valid %}
                <br/>
                <h3>Found {{ results.count }} results</h3>
                <ul>
                    {% for result in results %}
                        <li class="search-result">
                            <ul>
                                <li class="result-link">
                                    <a href="{{result.object.get_absolute_url}}">{{ result.object.title }}</a>
                                </li>
                                <li class="result-excerpt">
                                    {{result.excerpt}}
                                </li>
                            </ul>
                        </li>
                    {% endfor %}
                </ul>
            {% endif %}
        {% endblock %}
        ```
        The `excerpt` member of an index entry instance returns a fragment of the indexed document with occurrences of the search keywords highlighted with `<em>`.
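        
        To give an idea of what such highlighted output looks like, here is a small standalone sketch. It is not the package's excerpt implementation; `highlight` is a hypothetical helper that HTML-escapes the text and wraps whole-word keyword matches in `<em>`.
        ```python
        import re
        from html import escape


        def highlight(text, keywords):
            """Escape text and wrap whole-word keyword occurrences in <em> tags."""
            keywords = [keyword for keyword in keywords if keyword]
            if not keywords:
                return escape(text)
            pattern = r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b"
            # re.split with one capturing group puts the matches at odd indexes.
            parts = re.split(pattern, text)
            return "".join(
                "<em>" + escape(part) + "</em>" if index % 2 else escape(part)
                for index, part in enumerate(parts)
            )


        print(highlight("The Life of Brian by Monty Python", ["Monty", "Python"]))
        ```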
        ### Settings
        There are several settings to tweak the search engine.
        #### `SEARCH_MIN_SUBSTR_LENGTH`
        Default : `2`
        
        Minimum number of characters a keyword must have for a substring search to run.
        #### `SEARCH_MAX_SUBTSTR_COUNT_IN_QUERY`
        Default : `300`
        
        Maximum number of indexed words containing the substring to run substring search.
        #### `SEARCH_MAX_EXCERPT_FRAGMENTS`
        Default : `5`
        
        Maximum number of fragments containing keywords to be returned in excerpt.
        #### `SEARCH_EXCERPT_FRAGMENT_START_OFFSET`
        Default : `-2`
        
        Offset of excerpt fragment start.
        #### `SEARCH_EXCERPT_FRAGMENT_END_OFFSET`
        Default : `5`
        
        Offset of excerpt fragment end.
        #### `SEARCH_MAX_RANKING_KEYWORDS_COUNT`
        Default : `3`
        
        Maximum number of keywords to be used for ranking the results. If the query contains more keywords, only the first ones will be used to calculate the ranking of results. 
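        
        For reference, the settings listed above can be collected in one fragment. This sketch assumes they are read from your project's settings module like any other Django setting; every value shown is simply the documented default, so you only need the lines you want to change.
        ```python
        # settings.py (fragment): each value below is the documented default.
        SEARCH_MIN_SUBSTR_LENGTH = 2
        SEARCH_MAX_SUBTSTR_COUNT_IN_QUERY = 300  # name spelled exactly as documented
        SEARCH_MAX_EXCERPT_FRAGMENTS = 5
        SEARCH_EXCERPT_FRAGMENT_START_OFFSET = -2
        SEARCH_EXCERPT_FRAGMENT_END_OFFSET = 5
        SEARCH_MAX_RANKING_KEYWORDS_COUNT = 3
        ```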
        ### Search API
        To be described...
        
        Look into the code to check what you can do with it.
        ### Performance
        Despite the naive design, the index performs surprisingly well, even with quite large datasets. It can search through 100k documents containing 10M words in a fraction of a second.
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Web Environment
Classifier: Framework :: Django
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.7
Description-Content-Type: text/markdown
