Metadata-Version: 2.1
Name: wikiscraper
Version: 1.0.1
Summary: Easy scraper that extracts data from Wikipedia articles thanks to its URL slug
Home-page: https://github.com/Alexandre333/wikiscraper
Author: Alexandre Meyer
Author-email: contact@alexandremeyer.fr
License: UNKNOWN
Keywords: python,web scraping,wikipedia,slug
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: Unix
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Description-Content-Type: text/markdown

# wikiscraper

Easy scraper that extracts data from Wikipedia articles thanks to its URL slug : title, images, summary, sections paragraphs, sidebar info

Developed by Alexandre MEYER - CC BY-NC-SA 2.0

Installation

```python
$ pip install wikiscraper
```

## Initialization

Import
```python
import wikiscraper as ws
```

Main request
```python
# Set the language page in Wikipedia for the query
# (ISO 639-1 & by default "en" for English)
ws.lang("fr")
```

```python
# Search and get content by the URL slug of the article
# (Exemple : https://fr.wikipedia.org/wiki/Paris)
result = ws.searchBySlug("Paris")
```
## Examples

```python
# Get article's title
result.getTitle()
```
Sidebar
```python
# Get value of the sidebar information label
result.getSideInfo("GentilÃ©")
```

Summary
```python
# Get all paragraphs of summary
print(result.getSummary())
# Get the second paragraph of summary
print(result.getSummary()[1])
# Optional : Get the x paragraphs, starting from the beginning
print(result.getSummary(2))
```
Images
```python
# Get all illustration images
img = result.getImage()
# Get a specific image thanks to its position in the page
print(img[0]) # Main image
```

Sections
```python
# Get paragraphs from a specific section thanks to the parents' header title
# All optional args : .getSection(h2Title, h3Title, h4Title)
# Exemple : https://fr.wikipedia.org/wiki/Paris#Politique_et_administration
print(result.getSection('Politique et administration', 'Statut et organisation administrative', 'Historique')[0])
```

