Metadata-Version: 2.1
Name: pdf-scout
Version: 0.0.2
Summary: automatically create bookmarks in a PDF file
Home-page: https://github.com/hueyy/pdf_scout
License: GPL-v3
Keywords: pdf,bookmark,outline
Author: Huey
Author-email: hello@huey.xyz
Requires-Python: >=3.9,<4.0
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: PyPDF2 (>=2.10.9,<3.0.0)
Requires-Dist: joblib (>=1.2.0,<2.0.0)
Requires-Dist: pdfplumber (>=0.7.4,<0.8.0)
Requires-Dist: rich (>=12.5.1,<13.0.0)
Requires-Dist: typer[all] (>=0.6.1,<0.7.0)
Project-URL: Documentation, https://github.com/hueyy/pdf_scout
Project-URL: Repository, https://github.com/hueyy/pdf_scout
Description-Content-Type: text/markdown

# pdf_scout

This CLI tool automatically generates PDF bookmarks (also known as an 'outline' or a 'table of contents') for computer-generated PDF documents.

```bash
cd pdf_scout
poetry install
poetry run python ./src/app.py
```

![screenshot](./assets/screenshot.png)

This project is a work in progress and will likely only generate accurate bookmarks for documents that conform to the following requirements:

* Single column of text (not multiple columns)
* Font size of header text >= font size of body text
* Header text is justified or left-aligned
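
To illustrate why the font-size requirement matters, here is a minimal sketch of a size-based heading heuristic. It is not taken from pdf_scout's source: the `Line` type, the `guess_headings` function, and the sample data are all hypothetical, and the sketch assumes that the most common font size in a document belongs to the body text.

```python
from collections import Counter
from typing import List, NamedTuple


class Line(NamedTuple):
    """One extracted line of text and its font size (hypothetical type)."""
    text: str
    font_size: float


def guess_headings(lines: List[Line]) -> List[Line]:
    """Return the lines whose font size is larger than the body font size."""
    if not lines:
        return []
    # Assumption: the most frequent font size belongs to the body text.
    body_size, _ = Counter(line.font_size for line in lines).most_common(1)[0]
    return [line for line in lines if line.font_size > body_size]


# Hypothetical sample data standing in for text extracted from a PDF.
sample = [
    Line("1. Introduction", 16.0),
    Line("This is body text.", 11.0),
    Line("More body text.", 11.0),
    Line("1.1 Background", 13.0),
    Line("Even more body text.", 11.0),
]

for heading in guess_headings(sample):
    print(heading.font_size, heading.text)
```

A heuristic like this cannot find headings set in the same size as the body text, so a real tool would need further cues (for example font weight or numbering) for those cases. It also shows why a document whose headings are smaller than its body text would be handled poorly.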

## Development

This project manages its dependencies using [poetry](https://python-poetry.org) and supports Python 3.9 and later 3.x releases (`>=3.9,<4.0`). After installing poetry and entering the project folder, run the following to install the dependencies:

```bash
poetry install
```

To open a shell inside the project's virtualenv, with the dependencies available, run:

```bash
poetry shell
```

To run the tool directly, without first entering the virtualenv shell, run:

```bash
poetry run python ./src/app.py
```

### Tests

The project has snapshot tests. Input PDFs are not provided at the moment, so you will have to populate the `/pdf` folder manually before running them:

```bash
# Run the snapshot tests
poetry run pytest
# Regenerate the stored snapshots after an intentional change in output
poetry run pytest --snapshot-update
```
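
For readers new to snapshot testing, the idea can be sketched with the standard library alone. This is an illustration of the general technique, not the project's test code: `check_snapshot`, the JSON storage format, and the sample bookmark fields are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def check_snapshot(name: str, value, snapshot_dir: Path, update: bool = False) -> bool:
    """Compare value with the stored snapshot; record it when updating or missing."""
    path = snapshot_dir / f"{name}.json"
    if update or not path.exists():
        # No snapshot yet, or the caller accepted the new output: store it.
        path.write_text(json.dumps(value, indent=2, sort_keys=True))
        return True
    return json.loads(path.read_text()) == value


with tempfile.TemporaryDirectory() as tmp:
    snapshots = Path(tmp)
    # Hypothetical output of a bookmark generator for one input PDF.
    bookmarks = [{"title": "1. Introduction", "page": 1}]
    # First run records the snapshot.
    print(check_snapshot("sample", bookmarks, snapshots))
    # Unchanged output is compared against the stored snapshot.
    print(check_snapshot("sample", bookmarks, snapshots))
    changed = bookmarks + [{"title": "2. Method", "page": 3}]
    # Changed output is compared against the old snapshot.
    print(check_snapshot("sample", changed, snapshots))
    # update=True overwrites the snapshot with the new output.
    print(check_snapshot("sample", changed, snapshots, update=True))
```

The `update` argument plays the role that `--snapshot-update` plays on the command line: it overwrites the stored result instead of comparing against it.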
