Metadata-Version: 2.1
Name: scrapesy
Version: 1.2.0b1
Summary: Simple Python web scraper/page fetcher with cache
Home-page: https://github.com/scoopgracie/scrapesy
Author: ScoopGracie
Author-email: scoopgracie@scoopgracie.com
License: MIT
Description: # scrapesy
        [![Build Status](https://api.travis-ci.com/scoopgracie/scrapesy.svg?branch=master)](https://travis-ci.com/scoopgracie/scrapesy)
        
        Easy and Pythonic way to get and parse a Web page
        
        ## Usage
        
        To get a `Page` object, use `scrapesy.get(url)`. The `Page` object has two
        properties, `page` and `request`. `page` is a `BeautifulSoup` object.
        `request` is a Requests `Response` object.
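        
        For example, a minimal sketch (this assumes `scrapesy` is installed and
        uses `https://example.com` purely as a placeholder URL):
        
        ```python
        import scrapesy
        
        result = scrapesy.get("https://example.com")
        
        # result.page is a BeautifulSoup object, so the usual soup methods work:
        print(result.page.title.string)
        
        # result.request is the underlying Requests Response:
        print(result.request.status_code)
        ```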
        
        However, if you just want Scrapesy for its cache (see below) and do not
        need or want Beautiful Soup to parse the pages, pass `parse=False` to
        `scrapesy.get()`. This will return a Requests `Response` object instead
        of a `Page` object. Disabling both parsing and caching is possible but
        rarely useful, because they are Scrapesy's main features; if you find
        yourself always disabling both, just use Requests directly.
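        
        A cache-only fetch might look like this (a sketch assuming `scrapesy` is
        installed and the placeholder URL is reachable):
        
        ```python
        import scrapesy
        
        # With parse=False no BeautifulSoup object is built; get() returns the
        # Requests Response directly.
        response = scrapesy.get("https://example.com", parse=False)
        print(response.status_code)
        print(response.text[:100])
        ```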
        
        ### Caching
        
        By default, Scrapesy maintains a cache, allowing near-instantaneous
        results for pages that have been requested before. The cache operates
        automatically and transparently to any code that does not specifically
        interact with it, so Scrapesy can be used without any understanding of
        the cache.
        
        However, the cache can be disabled by setting `scrapesy.caching = False`
        and re-enabled with `scrapesy.caching = True`. To bypass the cache for a
        single call, pass `use_cache=False` to `scrapesy.get()`.
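        
        For example (a sketch; it assumes `scrapesy` is installed and the
        placeholder URL is reachable):
        
        ```python
        import scrapesy
        
        # Bypass the cache for a single call:
        fresh = scrapesy.get("https://example.com", use_cache=False)
        
        # Disable caching globally...
        scrapesy.caching = False
        
        # ...and turn it back on later.
        scrapesy.caching = True
        ```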
        
        To empty the cache, call `scrapesy.empty_cache()`.
        
        To remove a single page from the cache, call `scrapesy.uncache(url)`.
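        
        For instance (again assuming `scrapesy` is installed; the URL is only a
        placeholder):
        
        ```python
        import scrapesy
        
        # Drop a single URL from the cache so the next get() refetches it:
        scrapesy.uncache("https://example.com")
        
        # Or clear the whole cache:
        scrapesy.empty_cache()
        ```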
        
        To enable selective caching, set `scrapesy.cache_check` to a function that
        takes `url` as an input and returns `True` if the page should be cached and
        `False` otherwise.
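        
        As a sketch, here is a hypothetical policy that skips caching for URLs
        with query strings (the `should_cache` name is our own; only the
        `scrapesy.cache_check` hook comes from the API above):
        
        ```python
        # Hypothetical policy: avoid caching query-string URLs, which often
        # return personalised or rapidly changing results.
        def should_cache(url):
            return "?" not in url
        
        # With scrapesy installed, register the policy like so:
        #   import scrapesy
        #   scrapesy.cache_check = should_cache
        
        print(should_cache("https://example.com/about"))          # True
        print(should_cache("https://example.com/search?q=news"))  # False
        ```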
        
        Run `demo.py` for a demonstration of the impact of the cache.
        
        ## Requirements
        
        * Beautiful Soup 4
        * Requests
        * Python 3 (it may work on 2.7, but is not tested)
        
        ## Note
        
        This project was originally called PyScrape. If you find that name used
        anywhere in this repo, please report it as an issue!
        
Keywords: scrape,web,scraper,scraping
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Description-Content-Type: text/markdown
