Metadata-Version: 2.1
Name: pdf-scout
Version: 0.0.5
Summary: automatically create bookmarks in a PDF file
Home-page: https://github.com/hueyy/pdf_scout
License: GPL-v3
Keywords: pdf,bookmark,outline
Author: Huey
Author-email: hello@huey.xyz
Requires-Python: >=3.9,<4.0
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: PyPDF2 (>=2.10.9,<3.0.0)
Requires-Dist: joblib (>=1.2.0,<2.0.0)
Requires-Dist: pdfplumber (>=0.7.4,<0.8.0)
Requires-Dist: rich (>=12.5.1,<13.0.0)
Requires-Dist: typer[all] (>=0.6.1,<0.7.0)
Project-URL: Documentation, https://github.com/hueyy/pdf_scout
Project-URL: Repository, https://github.com/hueyy/pdf_scout
Description-Content-Type: text/markdown

# pdf_scout


![PyPI](https://img.shields.io/pypi/v/pdf_scout)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pdf_scout)
![PyPI - License](https://img.shields.io/pypi/l/pdf_scout)

This CLI tool automatically generates PDF bookmarks (also known as an 'outline' or a 'table of contents') for computer-generated PDF documents.

You can install it globally via pip:

```
pip install --user pdf_scout
pdf_scout ./my_document.pdf

pip uninstall pdf_scout
```

![screenshot](./assets/screenshot.png)

This project is a work in progress and will likely only generate suitable bookmarks for documents that conform to the following requirements:

* Single column of text (not multiple columns)
* Font size of header text > font size of body text
* Header text is justified or left-aligned
* Paragraph spacing for headers > body text paragraph spacing
* Consistent left margins on every page

## Supported document types

`pdf_scout` has been tested on and expressly supports the following classes of documents:

- Singapore State Court and Supreme Court Judgments (unreported)
- Singapore Law Reports

It may support other types of documents as well. If a particular class of document isn't supported or does not work well, please open an issue and I will consider adding support for it.

## Development

This project manages its dependencies using [poetry](https://python-poetry.org) and is only supported for Python ^3.9. After installing poetry and entering the project folder, run the following to install the dependencies:

```bash
poetry install
```

To open a virtualenv in the project folder with the dependencies, run:

```bash
poetry shell
```

To run a script directly, run:

```bash
poetry run python ./pdf_scout/app.py <INPUT_FILE_PATH>
```

### Tests

There are snapshot tests. Input PDFs are not provided at the moment, so you will have to populate the `/pdf` folder manually using the relevant sources (you may want to consider using [Clerkent](https://clerkent.huey.xyz) to download the unreported versions of judgments):

```bash
poetry run pytest
poetry run pytest --snapshot-update
```

### Static type-checking

```bash
poetry run mypy pdf_scout/app.py
```
