Metadata-Version: 2.1
Name: purifier
Version: 0.2.8
Summary: A simple scraping library.
Author: Gleb Akhmerov
Author-email: nontrivial-analysis@proton.me
Requires-Python: >=3.9,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: beautifulsoup4 (>=4.11.1,<5.0.0)
Requires-Dist: cloudscraper (>=1.2.60,<2.0.0)
Requires-Dist: jq (>=1.2.2,<2.0.0)
Requires-Dist: jsonfinder (>=0.4.2,<0.5.0)
Requires-Dist: lxml (>=4.9.1,<5.0.0)
Requires-Dist: parsy (>=1.4.0,<2.0.0)
Requires-Dist: requests (>=2.28.1,<3.0.0)
Project-URL: Homepage, https://github.com/gleb-akhmerov/purifier
Description-Content-Type: text/markdown

# Purifier

A simple scraping library.

It lets you build short, readable scrapers by chaining small steps with `|`,
even when the input is quite messy.


## Example usage

Extract titles and URLs of articles from Hacker News:

```python
from purifier import request, html, xpath, maps, fields, one

scraper = (
    request()
    | html()
    | xpath('//a[@class="titlelink"]')
    | maps(
        fields(
            title=xpath("text()") | one(),
            url=xpath("@href") | one(),
        )
    )
)

result = scraper.scrape("https://news.ycombinator.com")
```
```python
result == [
    {
        "title": "Why Is the Web So Monotonous? Google",
        "url": "https://reasonablypolymorphic.com/blog/monotonous-web/index.html",
    },
    {
        "title": "Old jokes",
        "url": "https://dynomight.net/old-jokes/",
    },
    ...
]
```
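
To make the select-then-map shape of the pipeline above concrete, here is a rough standard-library sketch of the same two steps run on an inline snippet. It is not how Purifier is implemented: `SAMPLE_PAGE` and `extract_articles` are hypothetical names, the markup is a made-up, well-formed stand-in rather than real Hacker News HTML, and `xml.etree.ElementTree` only handles well-formed input and a small XPath subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a fetched page (not real Hacker News markup).
SAMPLE_PAGE = """
<html>
  <body>
    <a class="titlelink" href="https://example.com/first">First article</a>
    <a class="other" href="https://example.com/ignored">Not a title</a>
    <a class="titlelink" href="https://example.com/second">Second article</a>
  </body>
</html>
"""


def extract_articles(page: str) -> list[dict[str, str]]:
    """Select the title links, then map each one to a dict of named fields."""
    root = ET.fromstring(page)
    # Roughly the xpath('//a[@class="titlelink"]') step: pick the matching elements.
    links = root.findall(".//a[@class='titlelink']")
    # Roughly the maps(fields(...)) step: build one dict per matched element.
    return [
        {"title": link.text or "", "url": link.get("href", "")}
        for link in links
    ]


articles = extract_articles(SAMPLE_PAGE)
print(articles)
```

The sketch leaves out fetching, tolerant HTML parsing, and the check that each field matches exactly one node, which the `request()`, `html()`, and `one()` steps appear to cover in the Purifier version.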


## Tutorial

See [docs/Tutorial.md](https://github.com/gleb-akhmerov/purifier/blob/main/docs/Tutorial.md).

