Metadata-Version: 2.1
Name: sheatless
Version: 1.0.1
Summary: A python library for extracting parts from sheetmusic pdfs
Home-page: https://gitlab.com/taktlause/sheatless
Author: The Beatless
License: UNKNOWN
Project-URL: Bug Tracker, https://gitlab.com/taktlause/sheatless/-/issues
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU Affero General Public License v3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE

# sheatless - A python library for extracting parts from sheetmusic pdfs

Sheatless, a tool for The Beatless to become sheetless. Written and managed by the web-committee in the student orchestra The Beatless. Soon to be integrated in [taktlaus.no](https://taktlaus.no/).

# API

## processUploadedPdf

```python
def processUploadedPdf(pdfPath, imagesDirPath, instruments_file=None, instruments=None, use_lstm=False, tessdata_dir=None):
    ...
	return parts, instrumentsDefaultParts
```

which will be available with

```python
from sheatless import processUploadedPdf
```

Arguments description here:

| Argument         | Optional   | Description                                                                                                      |
| ---------------- | ---------- | ---------------------------------------------------------------------------------------------------------------- |
| pdfPath          |            | Full path to PDF file.                                                                                           |
| imagesDirPath    |            | Full path to output images.                                                                                      |
| instruments_file | (optional) | Full path to instruments file. Accepted formats: YAML (.yaml, .yml), JSON (.json).                               |
| instruments      | (optional) | Dictionary of instruments. Will override any provided instruments file.                                          |
|                  |            | If neither instruments_file nor instruments is provided a default instruments file will be used.                 |
| use_lstm         | (optional) | Use LSTM instead of legacy engine mode.                                                                          |
| tessdata_dir     | (optional) | Full path to tessdata directory. If not provided, whatever the environment variable `TESSDATA_DIR` will be used. |

Returns description here:

| Return                  | Description                                                                                                                       |
| ----------------------- | --------------------------------------------------------------------------------------------------------------------------------- |
| parts                   | A list of dictionaries `{ "name": "[name]", "fromPage": i, "toPage": j }` describing each part                                    |
| instrumentsDefaultParts | A dictionary `{ ..., "instrument_i": j, ... }`, where `j` is the index in the parts list for the default part for `instrument_i`. |

## predict_parts_in_pdf

```py
def predict_parts_in_pdf(
	pdf : BytesIO | bytes,
	instruments=None,
	instruments_file=None,
	instruments_file_format="yaml",
	use_lstm=False,
	tessdata_dir=None,
	):
    ...
    return parts, instrumentsDefaultParts
```

### Arguments:
- pdf                                - PDF file object
- instruments             (optional) - Dictionary of instruments. Will override any provided instruments file.
- instruments_file        (optional) - Full path to instruments file or instruments file object. Accepted extensions: .yaml, .yml, .json
- instruments_file_format (optional) - Format of instruments_file if it is a file object. Accepted formats: yaml, json
  - If neither instruments_file nor instruments is provided a default instruments file will be used.
- use_lstm                (optional) - Use LSTM instead of legacy engine mode.
- tessdata_dir            (optional) - Full path to tessdata directory. If not provided, whatever the environment variable TESSDATA_DIR will be used.

### Returns:
- parts                              - A list of dictionaries { "name": "[name]", "fromPage": i, "toPage": j } describing each part
- instrumentsDefaultParts            - A dictionary { ..., "instrument_i": j, ... }, where j is the index in the parts list for the default part for instrument_i.

## predict_parts_in_img

```py
def predict_parts_in_img(img : io.BytesIO | bytes | PIL.Image.Image, instruments, use_lstm=False, tessdata_dir=None) -> typing.Tuple[list, list]:
    ...
    return partNames, instrumentses
```

### Arguments:
- img                     - image object
- instruments             - dictionary of instruments
- use_lstm     (optional) - Use LSTM instead of legacy engine mode.
- tessdata_dir (optional) - Full path to tessdata directory. If not provided, whatever the environment variable TESSDATA_DIR will be used.

### Returns:
- partNames               - a list of part names
- instrumentses           - a list of lists of instruments for each part

# Example docker setup

Sheatless requires tesseract and poppler installed on the system to work. An example docker setup as well as integration of the library can be found in [sheatless-splitter](https://github.com/sigurdo/sheatless-splitter).


