Metadata-Version: 2.1
Name: xkcd-scrape
Version: 0.3.0
Summary: Scrape the XKCD comic archive
License: MIT
Author: calamity
Author-email: clmty@vk.com
Requires-Python: >=3.10,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: beautifulsoup4 (>=4.11.1,<5.0.0)
Requires-Dist: requests (>=2.28.2,<3.0.0)
Description-Content-Type: text/markdown

# XKCD Scrape

`xkcd-scrape` is a Python module that dumps the XKCD.com archive and gets comic info using BS4. Honestly, it's a very basic module with one purpose: easily getting information about comics.

## Examples
Basic usage:
```py
from xkcdscrape import xkcd

# Load the archive of comics into a variable
archive = xkcd.parseArchive()

# Get info about latest comic
info = xkcd.getComicInfo(archive)

# Get info about specific comic
# The comic can either be an int (200), a str ("200"|"/200/"), or a link ("https://xkcd.com/200")
info = xkcd.getComicInfo(archive, 2000)

# Get info about a random comic
# Passing the second parameter as True makes the module only fetch comics that are present in the archive
info = xkcd.getRandomComic(archive, False)

# Dump archive to file
# Use indent=None if you want to save space or make parsing easier
xkcd.dumpToFile(archive, "dump.json", indent=None)

# Get info using archive dump. You can do the same with getRandomComic()
info = xkcd.getComicInfo("dump.json", 2000)
```
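
The dump can also be inspected without the module. The sketch below assumes the dump is plain JSON holding the archive dict (the `.json` file name and the `indent` argument suggest this, but it is an assumption about `dumpToFile`, not something its code confirms here). It uses only the standard `json` module to show what `indent=None` saves; the sample entry is the one shown in the Archive section.

```python
import json
import tempfile
from pathlib import Path

# Sample archive with the single entry documented in this README
archive = {"/2000/": {"date": "2018-5-30", "name": "xkcd Phone 2000"}}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dump.json"

    # indent=None writes everything on one line (compact)
    path.write_text(json.dumps(archive, indent=None), encoding="utf-8")
    compact_size = path.stat().st_size

    # For comparison: size of the same data pretty-printed with indent=4
    pretty_size = len(json.dumps(archive, indent=4).encode("utf-8"))

    # Reading the dump back gives the same dict
    loaded = json.loads(path.read_text(encoding="utf-8"))

print(loaded["/2000/"]["name"])
print(compact_size < pretty_size)
```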

The `getComicInfo` function (also called inside of `getRandomComic`) returns a dict with the following keys:
```py
# xkcd.getComicInfo(archive, 2000)
{
    'num': '2000', 
    'link': 'https://xkcd.com/2000/', 
    'name': 'xkcd Phone 2000', 
    'date': '2018-5-30', 
    'image': 'https://imgs.xkcd.com/comics/xkcd_phone_2000.png', 
    'title': 'Our retina display features hundreds of pixels per inch in the central fovea region.'
}
```
The keys have the following meaning:
- `num` - comic number
- `link` - hyperlink to comic
- `name` - the name of the comic
- `date` - date of when the comic was posted, formatted as YYYY-M-D (month and day are not zero-padded, e.g. `2018-5-30`)
- `image` - hyperlink to image used in the comic
- `title` - title (hover) text of the comic
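
The `date` value in the sample output is not zero-padded (`2018-5-30`), so it does not sort correctly as a plain string. A minimal sketch of turning it into a `datetime.date`, assuming the value always has three numeric parts separated by `-`; `parse_comic_date` is a hypothetical helper and not part of `xkcd-scrape`:

```python
from datetime import date


def parse_comic_date(raw: str) -> date:
    """Hypothetical helper: turn a 'YYYY-M-D' string into a datetime.date."""
    year, month, day = (int(part) for part in raw.split("-"))
    return date(year, month, day)


# Only the relevant key from the sample output above
info = {"num": "2000", "date": "2018-5-30"}

posted = parse_comic_date(info["date"])
print(posted.isoformat())  # 2018-05-30
```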

## Archive
The [XKCD archive](https://xkcd.com/archive/) is where we get the list of comics,
as well as their names and posting dates. It is the only place where we can get
the posting date (unless we go into HTTP headers, but that's a mess), so it's
required. The module is still at a `0.Y.Z` version, so bugs are to be expected,
such as not being able to fetch the latest comic's info if it isn't in
your archive yet. That and many more things will be patched by the 1.0.0 release.

The archive is a dict containing one dict per comic, keyed by `/num/`. Example:
```py
{
    ...,
    "/2000/": {
        "date": "2018-5-30", 
        "name": "xkcd Phone 2000"
    },
    ...
}
```
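
Because the archive is a plain dict, it can be queried directly. The sketch below shows one way to map the accepted comic references (int, str, or link) onto a `/num/` key and to find the newest entry. `to_archive_key` and `latest_key` are hypothetical helpers written for illustration; they are not the module's own implementation, and the two placeholder entries are made up.

```python
import re


def to_archive_key(comic) -> str:
    """Hypothetical helper: normalise 200, "200", "/200/" or a link to "/200/"."""
    match = re.search(r"(\d+)/?$", str(comic).strip())
    if match is None:
        raise ValueError(f"no comic number found in {comic!r}")
    return f"/{match.group(1)}/"


def latest_key(archive: dict) -> str:
    """Hypothetical helper: return the key with the highest comic number."""
    return max(archive, key=lambda key: int(key.strip("/")))


# "/2000/" is the entry shown above; the other two are made-up placeholders
archive = {
    "/1999/": {"date": "2018-5-28", "name": "Placeholder comic A"},
    "/2000/": {"date": "2018-5-30", "name": "xkcd Phone 2000"},
    "/2001/": {"date": "2018-6-1", "name": "Placeholder comic B"},
}

entry = archive[to_archive_key("https://xkcd.com/2000")]
print(entry["name"])        # xkcd Phone 2000
print(latest_key(archive))  # /2001/
```

Comparing the keys as integers matters here: as strings, `"/999/"` would sort after `"/2001/"`.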

## Tests
After installing and activating the venv, tests can be run from the project's shell using `poetry run pytest`.

## TODO
- Rewrite docstrings, make them simpler
- Don't require the archive for fetching info (dateless?)
- Use Codeberg's CI/CD to push to PyPI
- Add RSS/Atom feed support to fetch latest comic (includes date inside)
- API setup script w/ Flask
