Metadata-Version: 2.1
Name: pyHtmlProofer
Version: 0.5.1.alpha
Summary: pyHtmlProofer - A tool for validating internal & external links in HTML files / Websites
License: MIT
Author-email: Rehan Haider <email@rehanhaider.com>
Requires-Python: >=3.8
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Internet :: WWW/HTTP :: Site Management
Classifier: Topic :: Internet :: WWW/HTTP :: Site Management :: Link Checking
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Software Development :: Testing
Project-URL: Funding, https://github.com/sponsors/rehanhaider
Project-URL: Homepage, https://github.com/rehanhaider/pyhtmlproofer
Project-URL: Source Code, https://github.com/rehanhaider/pyhtmlproofer
Description-Content-Type: text/markdown

[![CI](https://github.com/rehanhaider/pyhtmlproofer/actions/workflows/ci.yml/badge.svg)](https://github.com/rehanhaider/pyhtmlproofer/actions)
[![PyPI Version](https://img.shields.io/pypi/v/pyhtmlproofer?color=blue)](https://pypi.org/project/pyhtmlproofer/)
![License](https://img.shields.io/github/license/rehanhaider/pyhtmlproofer?color=blue)

# pyHTMLProofer

Check websites and static HTML pages for link rot.


## Features

**pyHTMLProofer can be used on**:

1. **Static HTML pages** (typically generated by a static site generator). You can specify files or directories to be checked.
2. **Web pages**. You can specify a URL to be checked.


**pyHTMLProofer at the moment does the following**:

1. Checks for broken internal links in HTML files
2. Checks whether external links in HTML files or on websites are valid
3. Checks scripts and stylesheets referenced in HTML files
4. Checks images referenced in HTML files

You can read more details in the [What's tested?](#whats-tested) section below.

### Roadmap
**The following features are under development**:

1. Check for images and alt-text in HTML files
2. Check Favicons
3. Check optimal SEO meta tags
4. Caching results
5. Config file

## Installation
**Install pyHTMLProofer with pip**:
```
pip install pyhtmlproofer
```

## What's tested?

**You can configure pyHTMLProofer to check**:

- a file
- a directory or list of directories
- a URL / Link


### Links / Hyperlinks

`a`, `link` **elements: pyHTMLProofer checks**:

- If the internal links are valid
- If the internal references (`#in-page-links`) are valid
- If the external links are valid
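
The in-page reference check listed above can be illustrated with a minimal sketch. This is not pyHTMLProofer's own implementation; it assumes only the standard-library `html.parser`, and the names `AnchorCollector` and `broken_fragments` are hypothetical, introduced here for illustration.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect element ids and same-page `#fragment` hrefs from one HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()      # values of id="..." (and legacy <a name="...">)
        self.fragments = []   # fragments referenced by href="#..."

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.ids.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.ids.add(attrs["name"])
        href = attrs.get("href") or ""
        # Only hrefs of the form "#something" point inside the same page.
        if tag in ("a", "link") and href.startswith("#") and len(href) > 1:
            self.fragments.append(href[1:])


def broken_fragments(html):
    """Return fragments that are linked to but never defined in `html`."""
    collector = AnchorCollector()
    collector.feed(html)
    return [f for f in collector.fragments if f not in collector.ids]


sample = '<h2 id="usage">Usage</h2><a href="#usage">ok</a><a href="#missing">bad</a>'
print(broken_fragments(sample))  # ['missing']
```

A real checker would also need to handle fragments that point into other files (`page.html#section`), which this sketch deliberately leaves out.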


### Images

`img` **elements: pyHTMLProofer checks**:

- If the internal image references are valid
- If the external image references are valid


### Scripts

`script` **elements: pyHTMLProofer checks**:

- If the internal script references are valid
- If the external script references are reachable
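
The sections above treat internal and external references differently. As a rough sketch of how an `href` or `src` value might be sorted into those groups, the example below uses the standard-library `urllib.parse`. The function `classify_reference` and its four labels are assumptions made for illustration, not part of pyHTMLProofer's API.

```python
from urllib.parse import urlsplit


def classify_reference(ref):
    """Label an href/src value as 'external', 'other', 'fragment', or 'internal'."""
    parts = urlsplit(ref)
    # Absolute http(s) URLs and protocol-relative URLs (//host/path) leave the site.
    if parts.scheme in ("http", "https") or parts.netloc:
        return "external"
    # Any other scheme (mailto:, tel:, data:, ...) is not a file or page to fetch.
    if parts.scheme:
        return "other"
    # A bare "#name" points at an element on the same page.
    if not parts.path and parts.fragment:
        return "fragment"
    # Everything else is a path relative to the site or the current file.
    return "internal"


for ref in ["https://example.com/a.js", "//cdn.example.com/a.css",
            "#usage", "images/logo.png", "mailto:someone@example.com"]:
    print(ref, "->", classify_reference(ref))
```

An internal reference can then be validated against the file system, while an external one needs an HTTP request.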



## Usage
**a) To check a file**:
```python
import pyHtmlProofer
file = "path/to/file1.html"
pyHtmlProofer.file(file).check()
```

**b) To check directories**:
```python
import pyHtmlProofer
directory_paths = ["path/to/directory1", "path/to/directory2"]
pyHtmlProofer.directories(directory_paths).check()
```

**c) To validate URL(s)**:
```python
import pyHtmlProofer
links = ["https://example.com", "https://cloudbytes.dev"]
pyHtmlProofer.links(links).check()
```

## Available Config Options

```python
PROOFER_DEFAULTS = {
    "assume_extension": ".html",
    "directory_index_file": "index.html",
    "disable_external": False,
    "ignore_files": [],
    "ignore_urls": [],
    "enforce_https": True,
    "extensions": [".html"],
    "log_level": "ERROR",
    "report_to_file": True,
    "report_filename": "proofer_report",
}
```

You can override the default configuration options by passing a dictionary as the `options` argument.

```python
import pyHtmlProofer

options = {"log_level": "ERROR", "disable_external": True}
directory_paths = ["path/to/directory1", "path/to/directory2"]

pyHtmlProofer.directories(directory_paths, options=options).check()
```
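
One plausible way for user options to be layered over the defaults is a plain dictionary merge, sketched below. This is an assumption about the behaviour, not pyHTMLProofer's actual code: `resolve_options` is a hypothetical helper, and rejecting unknown keys is a design choice of this sketch only.

```python
PROOFER_DEFAULTS = {
    "assume_extension": ".html",
    "directory_index_file": "index.html",
    "disable_external": False,
    "ignore_files": [],
    "ignore_urls": [],
    "enforce_https": True,
    "extensions": [".html"],
    "log_level": "ERROR",
    "report_to_file": True,
    "report_filename": "proofer_report",
}


def resolve_options(overrides=None):
    """Return a new dict: the defaults, with any user overrides applied on top."""
    overrides = overrides or {}
    # Hypothetical safeguard: catch misspelled option names early.
    unknown = set(overrides) - set(PROOFER_DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown option(s): {sorted(unknown)}")
    # Later keys win, so overrides replace the matching defaults.
    return {**PROOFER_DEFAULTS, **overrides}


effective = resolve_options({"disable_external": True})
print(effective["disable_external"])  # True
print(effective["log_level"])         # ERROR
```

Because the merge builds a new dictionary, `PROOFER_DEFAULTS` itself is left unchanged between calls.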


## Credits

pyHTMLProofer was inspired by the Ruby-based [HTMLProofer](https://github.com/gjtorikian/html-proofer) and by the lack of Python-based alternatives. However, [pyHTMLProofer](https://github.com/rehanhaider/pyhtmlproofer) is not a Python rewrite of it; instead, it focuses on solving problems that I encountered while maintaining the [CloudBytes/Dev>](https://cloudbytes.dev) website.

