Metadata-Version: 2.1
Name: dbnomics-solr
Version: 1.1.7
Summary: Index DBnomics data with Apache Solr for full-text and faceted search
Home-page: https://git.nomics.world/dbnomics/dbnomics-solr
License: AGPLv3+
Author: Christophe Benz
Author-email: christophe.benz@nomics.world
Requires-Python: >=3.7,<4.0
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Dist: daiquiri (>=3.0.1,<4.0.0)
Requires-Dist: dbnomics-data-model (>=0.13.28,<0.14.0)
Requires-Dist: dirhash (>=0.2.1,<0.3.0)
Requires-Dist: environs (>=9.5.0,<10.0.0)
Requires-Dist: humanfriendly (>=10.0,<11.0)
Requires-Dist: orjson (>=3.6.7,<4.0.0)
Requires-Dist: pysolr (>=3.9.0,<4.0.0)
Requires-Dist: python-slugify (>=6.1.1,<7.0.0)
Requires-Dist: requests (>=2.27.1,<3.0.0)
Requires-Dist: solrq (>=1.1.1,<2.0.0)
Requires-Dist: tenacity (>=8.0.1,<9.0.0)
Requires-Dist: typer (>=0.4.0,<0.5.0)
Project-URL: Repository, https://git.nomics.world/dbnomics/dbnomics-solr
Description-Content-Type: text/markdown

# DBnomics Solr

Index DBnomics data into Apache Solr for full-text and faceted search.

Requirements:

- a running instance of [Apache Solr](http://lucene.apache.org/solr/); at the time of writing, we use version 7.3.

See [dbnomics-docker](https://git.nomics.world/dbnomics/dbnomics-docker) to run a local DBnomics instance with Docker that includes a service for Apache Solr.

## Configuration

Environment variables:

- `DEBUG_PYSOLR`: display pysolr DEBUG logging messages (see <https://github.com/django-haystack/pysolr>)
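
As a minimal sketch, the variable can be exported in the shell before running any command. The value `1` is an assumption: the accepted values are not documented here, so check how the application parses the variable if this has no effect.

```bash
# Assumption: a truthy value such as "1" enables pysolr DEBUG logging.
export DEBUG_PYSOLR=1
# dbnomics-solr commands started from this shell inherit the variable.
```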

## Index a provider

Replace `wto` with the actual provider slug in the following command:

```bash
dbnomics-solr index-provider /path/to/wto-json-data
```

### Full mode vs incremental mode

When data is stored in a regular directory, the script always indexes all datasets and series of a provider. This is called _full mode_.

When data is stored in a Git repository, the script runs by default in _incremental mode_: it indexes only the datasets modified since the last indexing run.

It is possible to force the _full mode_ with the `--full` option.
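
The rules above can be summarized by the following sketch. `indexing_mode` is a hypothetical helper written for illustration only; it is not part of dbnomics-solr. It assumes that a directory counts as a Git repository when it contains a `.git` entry, which may differ from how the script actually detects repositories.

```bash
# Hypothetical helper: prints the mode implied by the rules above.
# Assumption: a directory is treated as a Git repository when it contains
# a `.git` entry; dbnomics-solr may detect repositories differently.
indexing_mode() {
    local dir="$1" force_full="${2:-}"
    if [ "$force_full" = "--full" ] || [ ! -e "$dir/.git" ]; then
        # Regular directory, or full mode forced explicitly.
        echo "full"
    else
        # Git repository and no override: only modified datasets are indexed.
        echo "incremental"
    fi
}
```

For example, `indexing_mode /path/to/wto-json-data --full` prints `full` whether or not the directory is a Git repository.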

### Bare repositories

With the `--bare-repo-fallback` option, if the storage directory is not found, the script retries with `.git` appended to the directory name.
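
The fallback can be pictured with the sketch below. `resolve_storage_dir` is a hypothetical function written for illustration; the actual lookup is implemented inside dbnomics-solr and may differ in its details, such as how trailing slashes or errors are handled.

```bash
# Hypothetical helper illustrating the fallback; not part of dbnomics-solr.
resolve_storage_dir() {
    local dir="${1%/}"   # strip a trailing slash, if any
    if [ -d "$dir" ]; then
        # The directory exists as given.
        printf '%s\n' "$dir"
    elif [ -d "$dir.git" ]; then
        # Fallback: the bare repository named "<dir>.git".
        printf '%s\n' "$dir.git"
    else
        echo "storage dir not found: $dir" >&2
        return 1
    fi
}
```

Under these assumptions, `resolve_storage_dir /path/to/wto-json-data` prints `/path/to/wto-json-data.git` when only the bare repository exists.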

## Remove all data from a provider

To remove all the documents related to a provider (`type:provider`, `type:dataset` and `type:series`):

```bash
dbnomics-solr --debug delete-provider --code <provider_code>
dbnomics-solr --debug delete-provider --slug <provider_slug>

# Examples:
dbnomics-solr --debug delete-provider --code WTO
dbnomics-solr --debug delete-provider --slug wto
```

