# pie-chart-ocr
A tool to extract tabular data from pie charts, developed as a component of the CryptoSearchTools toolkit.

Note: The original repository has moved to https://git.ehtec.co/research/pie-chart-ocr.
The GitHub repository at https://github.com/ehtec/pie-chart-ocr is a mirror.

# Installation

### Install via PyPI

You can install any tagged version of `piechartocr` from PyPI:

```commandline
python3 -m pip install --upgrade piechartocr
```

Note: Tests and examples cannot be run from the PyPI installation. The required
files must be downloaded from GitLab.
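
To confirm which version pip installed, you can query the standard library's `importlib.metadata` (Python 3.8+). This is a minimal sketch: the `installed_version` helper is hypothetical and not part of `piechartocr`, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if the package is not installed
print(installed_version("piechartocr"))
```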

### Build from source

Install Boost, Tesseract, and the build tools:

```commandline
sudo apt install libboost-system-dev tesseract-ocr build-essential git
```

Clone this repository including submodules:

```commandline
git clone --recursive https://github.com/ehtec/pie-chart-ocr.git
cd pie-chart-ocr
```

Install Python requirements:

```commandline
python3 -m pip install -r requirements.txt
```

Compile libraries:

```commandline
python3 setup.py build_ext
```

Create temporary directories:

```commandline
mkdir temp
mkdir temp1
mkdir temp2
```

Unpack test charts:

```commandline
unzip data/charts_steph.zip -d data
unzip data/charts_steph_upsampled.zip -d data
unzip data/generated_pie_charts_legend.zip -d data
unzip data/generated_pie_charts_without_legend.zip -d data
```

# Usage

Run unit tests:

```commandline
python3 -m nose2 --start-dir tests/ --with-coverage
```

Run legacy tests / examples:

```commandline
python3 run_examples.py
```

Generate test data (mock pie charts):

```commandline
python3 run_generate_test_data.py
```

To extract data from any pie chart:

```python
from piechartocr import pie_chart_ocr

# Path to pie chart
path = "/path/to/my/chart.png"

# Extract data
data = pie_chart_ocr.main(path, interactive=False)

# Print the extracted list of (fraction, label) tuples, where fraction = percentage / 100
print(data["res"])
```
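
The result can be post-processed with the standard library, for example to get the tabular form as CSV. This is a minimal sketch that assumes `data["res"]` has the shape described in the comment above, a list of `(fraction, label)` tuples. The `res_to_csv` helper and the `sample_res` values are hypothetical and not part of `piechartocr`.

```python
import csv
import io


def res_to_csv(res):
    """Convert [(fraction, label), ...] into CSV text with percentages."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["label", "percentage"])
    for fraction, label in res:
        # Fractions are assumed to be percentage / 100, so scale them back up
        writer.writerow([label, round(fraction * 100, 2)])
    return buffer.getvalue()


# Hypothetical sample standing in for data["res"]
sample_res = [(0.5, "Bitcoin"), (0.3, "Ethereum"), (0.2, "Other")]
print(res_to_csv(sample_res))
```

In practice you would pass `data["res"]` instead of `sample_res` and write the returned text to a file.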

# Metrics

These metrics are generated automatically by the CI pipeline.

Metrics for mock pie charts with legend:

![chart](https://git.ehtec.co/research/pie-chart-ocr/-/jobs/artifacts/main/raw/artifacts/ocr_test_metrics_mock_legend.png?job=generatemetrics)

Metrics for mock pie charts without legend:

![chart](https://git.ehtec.co/research/pie-chart-ocr/-/jobs/artifacts/main/raw/artifacts/ocr_test_metrics_mock_without_legend.png?job=generatemetrics)

Metrics for real-world pie charts (many are of very poor quality, and some are unreadable even for humans):

![chart](https://git.ehtec.co/research/pie-chart-ocr/-/jobs/artifacts/main/raw/artifacts/ocr_test_metrics.png?job=generatemetrics)
