[![CI](https://github.com/rehanhaider/pyhtmlproofer/actions/workflows/ci.yml/badge.svg)](https://github.com/rehanhaider/pyhtmlproofer/actions)
[![PyPI Version](https://img.shields.io/pypi/v/pyhtmlproofer?color=blue)](https://pypi.org/project/pyhtmlproofer/)
![License](https://img.shields.io/github/license/rehanhaider/pyhtmlproofer?color=blue)

# pyHTMLProofer

Check websites and static HTML pages for link rot.


## Features

pyHTMLProofer can be used on:
1. Static HTML pages (typically generated by a static site generator, or SSG). You can specify either files or directories to be checked.
2. Webpages. You can specify a URL/link to be checked.


pyHTMLProofer currently does the following:

1. Checks for broken internal links in HTML files
2. Checks whether external links in HTML files or on a website are valid
3. Checks scripts and stylesheets referenced in HTML files
4. Checks images referenced in HTML files

You can read more details below in the [What's tested?](#whats-tested) section.

### Roadmap
The following features are under development:

1. Check for images and alt-text in HTML files
2. Check Favicons
3. Check optimal SEO meta tags
4. Caching results
5. Config file

## Installation
Install pyHTMLProofer with pip:
```
pip install pyhtmlproofer
```

## What's tested?

You can configure pyHTMLProofer to check:

- a file
- a directory or list of directories
- a URL / Link


### Links / Hyperlinks

For `a` and `link` elements, pyHTMLProofer checks:

- If the internal links are valid
- If the internal references (`#in-page-links`) are valid
- If the external links are valid

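As an illustration of the in-page reference check listed above, here is a minimal sketch built only on Python's standard `html.parser`. It is not pyHTMLProofer's implementation; the `AnchorCollector` class and `broken_fragments` function are hypothetical names introduced for this example.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect anchor targets (id / name attributes) and in-page links."""

    def __init__(self):
        super().__init__()
        self.targets = set()   # values of id="..." and <a name="...">
        self.fragments = []    # fragments used in <a href="#...">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.targets.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.targets.add(attrs["name"])
        href = attrs.get("href")
        if tag == "a" and href and href.startswith("#") and len(href) > 1:
            self.fragments.append(href[1:])


def broken_fragments(html):
    """Return in-page fragments that have no matching target, in document order."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return [f for f in collector.fragments if f not in collector.targets]


page = '<h2 id="usage">Usage</h2><a href="#usage">ok</a><a href="#missing">bad</a>'
print(broken_fragments(page))  # ['missing']
```

Targets are collected for the whole page before any comparison is made, so a link may point to an element that appears later in the document.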

### Images

For `img` elements, pyHTMLProofer checks:

- If the internal image references are valid
- If the external image references are valid


### Scripts

For `script` elements, pyHTMLProofer checks:
- If the internal script references are valid
- If the external script references are reachable

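Each of the checks above distinguishes internal references from external ones. One possible way to draw that line, using the standard `urllib.parse` module, is sketched below. The `classify` function and its three labels are assumptions made for this example and may not match how pyHTMLProofer decides.

```python
from urllib.parse import urlsplit


def classify(reference):
    """Label a reference as 'external', 'internal', or 'other'.

    Assumption for this sketch: anything that names a host is external,
    anything with a scheme but no host (mailto:, tel:, data:) is 'other',
    and everything else is a path or fragment inside the site.
    """
    parts = urlsplit(reference)
    if parts.netloc:
        return "external"
    if parts.scheme:
        return "other"
    return "internal"


for ref in ["https://cloudbytes.dev", "assets/app.js", "#in-page-links"]:
    print(ref, "->", classify(ref))
```

Protocol-relative references such as `//cdn.example.com/app.js` carry a host and are therefore treated as external in this sketch.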


## Usage
To check a file:
```python
import pyHtmlProofer
file = "path/to/file1.html"
pyHtmlProofer.file(file).check()
```

To check directories:
```python
import pyHtmlProofer
directory_paths = ["path/to/1/", "path/to/2/"]  # directories, not individual files
pyHtmlProofer.directories(directory_paths).check()
```

To validate URL(s):
```python
import pyHtmlProofer
links = ["https://example.com", "https://cloudbytes.dev"]
pyHtmlProofer.links(links).check()
```

## Available Config Options

```python
PROOFER_DEFAULTS = {
    "assume_extension": ".html",
    "directory_index_file": "index.html",
    "disable_external": False,
    "ignore_files": [],
    "ignore_urls": [],
    "enforce_https": True,
    "extensions": [".html"],
    "log_level": "ERROR",
    "report_to_file": True,
    "report_filename": "proofer_report",
}
```

You can override the default configuration options by passing a dictionary of options.

```python
import pyHtmlProofer

options = {"log_level": "ERROR", "disable_external": True}
directory_paths = ["path/to/1/", "path/to/2/"]  # directories, not individual files

pyHtmlProofer.directories(directory_paths, options=options).check()
```

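To make the override behaviour concrete, the sketch below shows one common way such a merge can work: user-supplied keys replace the defaults and every other key keeps its default value. This is an assumption about the semantics, not a copy of pyHTMLProofer's code, and `merge_options` is a hypothetical helper.

```python
PROOFER_DEFAULTS = {
    "assume_extension": ".html",
    "directory_index_file": "index.html",
    "disable_external": False,
    "ignore_files": [],
    "ignore_urls": [],
    "enforce_https": True,
    "extensions": [".html"],
    "log_level": "ERROR",
    "report_to_file": True,
    "report_filename": "proofer_report",
}


def merge_options(options=None):
    """Return a new dict in which user options take precedence over defaults."""
    # Building a new dict leaves PROOFER_DEFAULTS itself unchanged.
    return {**PROOFER_DEFAULTS, **(options or {})}


merged = merge_options({"disable_external": True})
print(merged["disable_external"])  # True (overridden)
print(merged["log_level"])         # ERROR (default kept)
```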

## Credits

pyHTMLProofer was inspired by the Ruby-based [HTMLProofer](https://github.com/gjtorikian/html-proofer) and the lack of Python-based alternatives. However, [pyHTMLProofer](https://github.com/rehanhaider/pyhtmlproofer) is not a Python rewrite; it focuses on solving problems that I encountered while maintaining the [CloudBytes/Dev>](https://cloudbytes.dev) website.
