Metadata-Version: 2.1
Name: scrapetools
Version: 1.0.2
Summary: A collection of tools to aid in web scraping.
Project-URL: Homepage, https://github.com/matt-manes/scrapetools
Project-URL: Documentation, https://github.com/matt-manes/scrapetools/tree/main/docs
Project-URL: Source code, https://github.com/matt-manes/scrapetools/tree/main/src/scrapetools
Author: Matt Manes
License-File: LICENSE.txt
Keywords: email,html,scrape,scraping,web,webscraping
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Requires-Dist: bs4~=0.0.1
Requires-Dist: phonenumbers~=8.12.57
Requires-Dist: pytest~=7.2.1
Description-Content-Type: text/markdown

# Scrapetools
A collection of tools to aid in web scraping.<br>
Install using:
<pre>pip install scrapetools</pre>
Scrapetools contains three functions (scrape_emails, scrape_phone_numbers, scrape_inputs)
and one class (LinkScraper).
<br>
## Basic usage
<pre>
import scrapetools
import requests  # third-party; not listed in scrapetools' requirements, so install it separately

url = 'https://somewebsite.com'
source = requests.get(url).text

emails = scrapetools.scrape_emails(source)

phone_numbers = scrapetools.scrape_phone_numbers(source)

scraper = scrapetools.LinkScraper(source, url)
scraper.scrape_page()
# links can be accessed and filtered via the get_links() method
same_site_links = scraper.get_links(same_site_only=True)
same_site_image_links = scraper.get_links(link_type='img', same_site_only=True)
external_image_links = scraper.get_links(link_type='img', excluded_links=same_site_image_links)

# scrape_inputs() returns a tuple with one group of BeautifulSoup Tag elements per kind of user input element
forms, inputs, buttons, selects, text_areas = scrapetools.scrape_inputs(source)
</pre>
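
To make the `same_site_only` and `excluded_links` filters above more concrete, here is a minimal standard-library sketch of the idea. It is not scrapetools' implementation: `is_same_site` and `filter_links` are hypothetical helpers, and it assumes "same site" simply means "same host", which may differ from how `LinkScraper` decides.

```python
from urllib.parse import urlparse


def is_same_site(link: str, page_url: str) -> bool:
    """Hypothetical helper: True if `link` and `page_url` share a host."""
    return urlparse(link).netloc.lower() == urlparse(page_url).netloc.lower()


def filter_links(links, page_url, same_site_only=False, excluded_links=None):
    """Illustrative filter only; not scrapetools' actual code."""
    excluded = set(excluded_links or [])
    return [
        link
        for link in links
        if link not in excluded
        and (not same_site_only or is_same_site(link, page_url))
    ]


page_url = "https://somewebsite.com"
image_links = [
    "https://somewebsite.com/logo.png",
    "https://cdn.example.org/banner.jpg",
]

# Keep only links on the same host as the page.
same_site_images = filter_links(image_links, page_url, same_site_only=True)
# Everything that is left after excluding the same-site links is external.
external_images = filter_links(image_links, page_url, excluded_links=same_site_images)
```

This mirrors the pattern in the usage example: collect the same-site links first, then pass them as exclusions to get the external ones.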
