Metadata-Version: 2.1
Name: scrapers_for_journalists
Version: 0.1.1
Summary: Scrapers that help journalists at Kristeligt Dagblad
Author: MadsLang
Author-email: lang@k.dk
Requires-Python: >=3.10,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: bs4 (>=0.0.2,<0.0.3)
Requires-Dist: lxml (>=5.2.2,<6.0.0)
Requires-Dist: openpyxl (>=3.1.5,<4.0.0)
Requires-Dist: pandas (>=2.1.2,<2.2)
Requires-Dist: requests (>=2.24.0)
Requires-Dist: tqdm (>=4.66.4,<5.0.0)
Description-Content-Type: text/markdown

# scrapers-for-journalists
Scrapers that help journalists retrieve data or monitor sites for potential story leads.

## Using the scrapers

```
pip install scrapers_for_journalists==0.1.1
```

Then import a scraper, for example `from domstoldk.retrive import DomStolScrape`.

Every file in `utils/` can be imported in your scrapers, because `utils/` is added as a package in pyproject.toml. For example, you can import the BaseScraper, which provides generic utilities, with `from base import BaseScraper`.

## Description of current scrapers

### domstol.dk

This scraper retrieves information about current court cases (court lists, "retslister") in the Danish district courts ("byretter"). Higher courts such as Højesteret (the Supreme Court) are currently not included. Civil cases and forced auctions ("tvangsauktioner") are filtered out. The relevance of each case is estimated from keywords and from the Danish Police's offense codes ("gerningskoder"), which identify the type of crime.
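
The filtering and relevance estimate described above could be sketched roughly as follows. This is only an illustration, not the scraper's actual code: the `CourtCase` fields, the keyword and offense-code weights, and the function names are all hypothetical, and the real keyword list and gerningskoder are not shown in this README.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; the real keywords and
# gerningskoder used by the scraper are not documented here.
KEYWORD_WEIGHTS = {"bedrageri": 2, "vold": 1}
OFFENSE_CODE_WEIGHTS = {"EXAMPLE-CODE-1": 3}
EXCLUDED_CASE_TYPES = {"civil", "tvangsauktion"}


@dataclass
class CourtCase:
    case_type: str     # e.g. "straffesag", "civil", "tvangsauktion" (assumed labels)
    description: str   # free-text description from the court list
    offense_code: str  # gerningskode, empty string if unknown


def relevance(case: CourtCase) -> int:
    """Sum the weights of keywords found in the description and of the offense code."""
    text = case.description.lower()
    score = sum(weight for kw, weight in KEYWORD_WEIGHTS.items() if kw in text)
    return score + OFFENSE_CODE_WEIGHTS.get(case.offense_code, 0)


def rank_cases(cases: list[CourtCase]) -> list[tuple[int, CourtCase]]:
    """Drop excluded case types, then sort the rest by descending relevance."""
    kept = [c for c in cases if c.case_type not in EXCLUDED_CASE_TYPES]
    scored = [(relevance(c), c) for c in kept]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

A simple additive score like this keeps the ranking easy to explain to journalists: each matching keyword or offense code contributes a fixed, visible weight.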

To run the scraper manually, use:
```
poetry run python domstol-dk/retrieve.py --outfile test.xlsx
```
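
For anyone writing a new scraper with the same command-line shape, a minimal sketch of parsing an `--outfile` option with `argparse` is shown below. How `retrieve.py` actually handles its arguments is not documented here, so the function name `parse_args`, the help text, and the choice to make the flag required are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; `argv=None` makes argparse read sys.argv."""
    parser = argparse.ArgumentParser(
        description="Scrape court lists and write the result to an Excel file."
    )
    # Assumption: the output path is mandatory and has no default.
    parser.add_argument("--outfile", required=True, help="Path of the .xlsx file to write")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"Would write results to {args.outfile}")
```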

