Metadata-Version: 2.1
Name: scrab
Version: 0.0.3
Summary: Fast and easy to use scraper for the content-centered web pages, e.g. blog posts, news, etc.
Home-page: https://github.com/gindex/scrab
Author: Yevgen Pikus
Author-email: yevgen.pikus@gmail.com
License: MIT
Description: # scrab - Fuzzy content scraper
        
        [![Python package](https://github.com/gindex/scrab/workflows/Python%20package/badge.svg?branch=master)](https://github.com/gindex/scrab/actions)
        ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/scrab)
        [![GitHub Release](https://img.shields.io/github/v/release/gindex/scrab.svg)](https://github.com/gindex/scrab/releases) 
        [![GitHub Release](https://img.shields.io/pypi/v/scrab.svg)](https://pypi.org/project/scrab) 
        [![License: MIT](https://img.shields.io/badge/License-MIT-brightgreen.svg)](https://opensource.org/licenses/MIT)
        
        
        Fast and easy to use content scraper for topic-centred web pages, e.g. blog posts, news and wikis.    
        
        The tool uses heuristics to extract main content and ignores surrounding noise. No processing rules. No XPath. No configuration.
        
        ### Installing
        
        ```shell script
        pip install scrab
        ```
        
        ### Usage
        ```shell script
        scrab https://blog.post
        ``` 
        
        Store extracted content to a file:
        
        ```shell script
        scrab https://blog.post > content.txt
        ``` 
        
        ### ToDo List
        - [ ] Add support for lists
        - [ ] Add support for scripts 
        - [ ] Add support for markdown output format
        - [ ] Download and save referenced images
        - [ ] Extract and embed links
         
        ### Development
        ```shell script
        # Lint with flake8
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
        
        # Check with mypy
        mypy ./scrab
        mypy ./tests
        
        # Run tests
        pytest
        ``` 
        Publish to PyPI:
        ```shell script
        rm -rf dist/*
        python setup.py sdist bdist_wheel
        twine upload dist/*
        ```
         
        ### License
        This project is licensed under the [MIT License](README.md).
        
        
Keywords: scrab scraper crawler extractor converter web content html text
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.8
Classifier: Topic :: Utilities
Classifier: Topic :: Text Processing
Requires-Python: >=3.8
Description-Content-Type: text/markdown
