Metadata-Version: 1.1
Name: sitecrawl
Version: 1.0.5
Summary: Simple Python3 module to crawl a website and extract URLs
Home-page: https://github.com/gabfl/sitecrawl
Author: Gabriel Bordeaux
Author-email: pypi@gab.lc
License: MIT
Description: sitecrawl
        =========
        
        |Pypi| |Build Status| |codecov| |MIT licensed|
        
        Simple Python module to crawl a website and extract URLs.
        
        Installation
        ------------
        
        Using pip:
        
        .. code:: bash
        
           pip3 install sitecrawl
        
           sitecrawl --help
        
        Or build from source:
        
        .. code:: bash
        
           # Clone project
           git clone https://github.com/gabfl/sitecrawl && cd sitecrawl
        
           # Installation
           pip3 install .
        
        Usage
        -----
        
        CLI
        ~~~
        
        .. code:: bash
        
           sitecrawl --url https://www.yahoo.com/ --depth 2 --max 4 --verbose
        
        Example output:
        
        ::
        
           * Found 4 internal URLs
             https://www.yahoo.com
             https://www.yahoo.com/entertainment
             https://www.yahoo.com/lifestyle
             https://www.yahoo.com/plus
        
           * Found 5 external URLs
             https://mail.yahoo.com/
             https://news.yahoo.com/
             https://finance.yahoo.com/
             https://sports.yahoo.com/
             https://shopping.yahoo.com/
        
           * Skipped 0 URLs
        
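        The CLI output above sorts links into internal, external and skipped
        groups. As a rough illustration only (not sitecrawl's own code), one way
        to make that split is to resolve each link against the base URL and
        compare hosts. The helper ``classify_url`` is hypothetical. It assumes
        "internal" means the exact same host (consistent with the sample output,
        where ``news.yahoo.com`` is listed as external) and that non-HTTP links
        such as ``mailto:`` are skipped.

        .. code:: py

           from typing import Tuple
           from urllib.parse import urljoin, urlparse


           def classify_url(base_url: str, href: str) -> Tuple[str, str]:
               """Resolve href against base_url and label it.

               Hypothetical helper, not part of sitecrawl. Assumes:
               - "internal" means the exact same host as base_url;
               - links that are not http(s), e.g. mailto:, are "skipped".
               """
               absolute = urljoin(base_url, href)  # make relative links absolute
               parsed = urlparse(absolute)
               if parsed.scheme not in ('http', 'https'):
                   return absolute, 'skipped'
               same_host = parsed.netloc.lower() == urlparse(base_url).netloc.lower()
               return absolute, ('internal' if same_host else 'external')

        For example, ``classify_url('https://www.yahoo.com', '/plus')`` resolves
        the relative link to ``https://www.yahoo.com/plus`` and labels it
        ``internal``, while ``https://news.yahoo.com/`` is labeled ``external``
        because its host differs.
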
        As a module
        ~~~~~~~~~~~
        
        Basic example:
        
        .. code:: py
        
           from sitecrawl import crawl
        
           crawl.base_url = 'https://www.yahoo.com'
           crawl.deep_crawl(depth=2)
        
           print('Internal URLs:', crawl.get_internal_urls())
           print('External URLs:', crawl.get_external_urls())
           print('Skipped URLs:', crawl.get_skipped_urls())
        
        A more detailed example is available in
        `example.py <https://github.com/gabfl/sitecrawl/blob/main/example.py>`__.
        
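        The ``depth`` setting used in both examples bounds how far the crawl
        follows links. The sketch below shows one common way a depth-limited
        crawl can work, a breadth-first search over pages using only the standard
        library. It is not sitecrawl's implementation: ``crawl_sketch``,
        ``LinkParser`` and the injected ``fetch`` callable are hypothetical names,
        and the sketch assumes that ``depth=1`` fetches only the start page and
        that ``max_urls`` caps the number of internal URLs collected (loosely
        mirroring ``--max``).

        .. code:: py

           from collections import deque
           from html.parser import HTMLParser
           from typing import Callable, List, Tuple
           from urllib.parse import urljoin, urlparse


           class LinkParser(HTMLParser):
               """Collect the href of every <a> tag in a page."""

               def __init__(self) -> None:
                   super().__init__()
                   self.links: List[str] = []

               def handle_starttag(self, tag, attrs):
                   if tag == 'a':
                       for name, value in attrs:
                           if name == 'href' and value:
                               self.links.append(value)


           def crawl_sketch(base_url: str, fetch: Callable[[str], str],
                            depth: int = 2,
                            max_urls: int = 100) -> Tuple[List[str], List[str]]:
               """Breadth-first crawl bounded by depth and by internal URL count.

               Hypothetical sketch: depth=1 fetches only the start page, and
               fetch(url) must return the page's HTML as a string.
               """
               base_host = urlparse(base_url).netloc
               internal = [base_url]
               external = set()
               seen = {base_url}
               queue = deque([(base_url, 1)])  # (page URL, its depth level)
               while queue:
                   url, level = queue.popleft()
                   parser = LinkParser()
                   parser.feed(fetch(url))
                   parser.close()
                   for href in parser.links:
                       # Resolve relative links and drop #fragments
                       absolute = urljoin(url, href).split('#')[0]
                       parsed = urlparse(absolute)
                       if parsed.scheme not in ('http', 'https') or absolute in seen:
                           continue  # non-HTTP link or already recorded
                       seen.add(absolute)
                       if parsed.netloc != base_host:
                           external.add(absolute)  # recorded but never fetched
                       elif len(internal) < max_urls:
                           internal.append(absolute)
                           if level < depth:  # follow it only if depth allows
                               queue.append((absolute, level + 1))
               return internal, sorted(external)

        Passing ``fetch`` in as a parameter keeps the traversal logic separate
        from HTTP, so it can be exercised against an in-memory dict of pages; a
        real fetcher could wrap ``urllib.request.urlopen``. External links are
        recorded but not followed, which keeps the crawl on the starting site.
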
        .. |Pypi| image:: https://img.shields.io/pypi/v/sitecrawl.svg
           :target: https://pypi.org/project/sitecrawl
        .. |Build Status| image:: https://github.com/gabfl/sitecrawl/actions/workflows/ci.yml/badge.svg?branch=main
           :target: https://github.com/gabfl/sitecrawl/actions
        .. |codecov| image:: https://codecov.io/gh/gabfl/sitecrawl/branch/main/graph/badge.svg
           :target: https://codecov.io/gh/gabfl/sitecrawl
        .. |MIT licensed| image:: https://img.shields.io/badge/license-MIT-green.svg
           :target: https://raw.githubusercontent.com/gabfl/sitecrawl/main/LICENSE
        
Platform: UNKNOWN
Classifier: Topic :: Internet
Classifier: Topic :: Internet :: WWW/HTTP :: Site Management :: Link Checking
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX :: Linux
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python
Classifier: Development Status :: 4 - Beta
