Metadata-Version: 2.1
Name: road_data_scraper
Version: 0.0.18
Summary: Scrapes and Cleans WebTRIS Traffic Flow API
Author: Dominic Bean
License: MIT
Platform: unix
Platform: linux
Platform: osx
Platform: cygwin
Platform: win32
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: testing
License-File: LICENSE

# Road Data Scraper

![Tests](https://github.com/dombean/road_data_scraper/actions/workflows/road_scraper.yml/badge.svg)

Scrapes and Cleans WebTRIS Traffic Flow API.

- Python Rewrite of [ONS Road Data Pipeline written in R](https://github.com/datasciencecampus/road-data-dump/tree/r-pipeline).
- Documentation of ONS Road Data Pipeline: https://datasciencecampus.github.io/road-data-pipeline-documentation/
- WebTRIS Traffic Flow API: https://webtris.highwaysengland.co.uk/api/swagger/ui/index

# Developer Usage

Download and Install __Python 3.9__; if using Anaconda or Miniconda, create a virtual environment with __Python 3.9__, e.g., `conda create --name py39 python=3.9`

1) Git clone the repository: `git clone https://github.com/dombean/road_data_scraper.git`
2) Change directory into the road_data_scraper folder: `cd road_data_scraper/`
3) Install package in editable mode: `pip install -e .`
4) Change directory into package folder: `cd src/road_data_scraper/`
5) Adjust the config.ini file as needed; see the section __Adjusting the Config File (config.ini)__.
6) Run the script: `python main.py` or `python3 main.py`

# Adjusting the Config File (config.ini)

There are 6 main configuration options in the config.ini file:
- __start_date__: provide a date in quotes, in the format __%Y-%m-%d__; e.g., "2021-01-01" -- which is 1st January 2021.
- __end_date__: provide a date in quotes, in the format __%Y-%m-%d__; e.g., "2021-01-31" -- which is 31st January 2021.
- __test_run__: can take on two values -- __True__ or __False__. By default, test_run is set to True, which runs the entire Pipeline on a subset of the available URLs to check that it works correctly. Set test_run=False when you want to download the entire data set.
- __generate_report__: can take on two values -- __True__ or __False__. By default, this is set to True, which generates an HTML report with tables and graphs showing the Active and Inactive IDs for each road sensor type -- MIDAS, TMU, and TAME.
- __output_path__: provide a path, as a string, where the outputs generated by the Road Data Scraper Pipeline are saved; for example, "/home/user/Documents/"
- __rm_dir__: can take on two values -- __True__ or __False__. Set rm_dir=True when you are running on a Google Cloud VM Instance and you don't want to store the data on the VM (assuming you set __gcp_storage=True__).
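
As a rough illustration of how these options could be read and checked before a run, here is a minimal sketch using Python's standard `configparser`. The section name `user_settings`, the helper `parse_settings`, and the example values are assumptions made for this sketch; they are not taken from the package, so check the shipped config.ini for the actual layout.

```python
import configparser
from datetime import datetime

# Hypothetical file contents. The section name "user_settings" is an
# assumption for illustration; the option names follow the list above.
EXAMPLE_CONFIG = """
[user_settings]
start_date = "2021-01-01"
end_date = "2021-01-31"
test_run = True
generate_report = True
output_path = "/home/user/Documents/"
rm_dir = False
"""


def parse_settings(text, section="user_settings"):
    """Parse the six main options into typed Python values (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    raw = parser[section]

    def unquote(value):
        # configparser keeps the surrounding double quotes, so strip them.
        return value.strip().strip('"')

    start = datetime.strptime(unquote(raw["start_date"]), "%Y-%m-%d").date()
    end = datetime.strptime(unquote(raw["end_date"]), "%Y-%m-%d").date()
    if end < start:
        raise ValueError("end_date must not be earlier than start_date")

    return {
        "start_date": start,
        "end_date": end,
        "test_run": raw.getboolean("test_run"),
        "generate_report": raw.getboolean("generate_report"),
        "output_path": unquote(raw["output_path"]),
        "rm_dir": raw.getboolean("rm_dir"),
    }


settings = parse_settings(EXAMPLE_CONFIG)
```

Parsing the dates up front means a malformed value such as "01-01-2021" fails with a `ValueError` before any requests are made.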

## Google Cloud (GCP) Storage Options

Options to save output data to a Google Cloud bucket.

- __gcp_storage__: can take on two values -- __True__ or __False__. Set gcp_storage=True to save the data generated by a run of the Pipeline to a Google Cloud bucket.
- __gcp_credentials__: provide the path to your GCP credentials JSON file -- as a string; for example, "/home/user/gcp_credentials.json".
- __gcp_bucket_name__: provide the name of the GCP bucket -- as a string; for example, "road_data_scraper_bucket".
- __gcp_blob_name__: provide the name of the folder in the GCP bucket where the Pipeline should save the data -- as a string; for example, "landing_zone".
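
The GCP options depend on each other, so it can help to check them together before starting a long run. The sketch below is one possible pre-flight check. The function `check_gcp_options` and its messages are hypothetical and not part of the package; it only encodes the relationships described above.

```python
def check_gcp_options(options):
    """Return a list of problems with the GCP options (empty if none).

    Hypothetical helper: `options` is a plain dict keyed by the option
    names used in config.ini.
    """
    problems = []

    if not options.get("gcp_storage", False):
        # The README describes rm_dir=True as assuming gcp_storage=True.
        if options.get("rm_dir", False):
            problems.append("rm_dir=True is intended for use with gcp_storage=True")
        return problems

    # With gcp_storage=True, the three remaining options must be non-empty.
    for key in ("gcp_credentials", "gcp_bucket_name", "gcp_blob_name"):
        if not str(options.get(key, "")).strip():
            problems.append(f"{key} must be set when gcp_storage=True")

    return problems


example = {
    "gcp_storage": True,
    "gcp_credentials": "/home/user/gcp_credentials.json",
    "gcp_bucket_name": "road_data_scraper_bucket",
    "gcp_blob_name": "",
}
problems = check_gcp_options(example)
```

Here `problems` contains a single entry reporting that `gcp_blob_name` is missing.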

# Google Cloud VM Instance Setup

1) Log in to __Google Cloud Platform__ and click on __Compute Engine__ in the left side-bar.
2) Then, in the left side-bar, click on __Marketplace__ and search for __Ubuntu 20.04 LTS (Focal)__, then, click __LAUNCH__.
3) Name the instance appropriately; click __COMPUTE-OPTIMISED__ (note: leave the defaults -- 4 vCPU, 16 GB memory); under __Firewall__, click __Allow HTTPS traffic__; and finally __CREATE__ the VM instance.
4) SSH into the VM instance.
5) Run the following commands: `sudo apt-get update && sudo apt-get dist-upgrade -y && sudo apt-get install python3-pip -y && sudo apt-get install wget -y`
6) Install the road_data_scraper package using the command: `pip install road_data_scraper`
7) Upload your GCP JSON credentials file to the VM.
8) Download the __config.ini__ file using the command: `wget https://raw.githubusercontent.com/dombean/road_data_scraper/main/src/road_data_scraper/config.ini`
9) Download the __runner.py__ file using the command: `wget https://raw.githubusercontent.com/dombean/road_data_scraper/main/runner.py`
10) Open __runner.py__ and put in the absolute path to the __config.ini__ file.
11) Change the config.ini parameters as needed; see the README section __Adjusting the Config File (config.ini)__.
12) Run the Road Data Scraper Pipeline using the command: `python3 runner.py`

# Google Cloud Run Setup

Note: Install Docker and the Google Cloud SDK first, then:
- Log in to Google Cloud on the command line: ```gcloud auth login```
- Configure the Google Cloud project on the command line: ```gcloud config set project <project-name>```
- Configure Docker to use your Google Cloud credentials: ```gcloud auth configure-docker```

1) Git clone the repository: `git clone https://github.com/dombean/road_data_scraper.git`
2) Change directory into the road_data_scraper folder: `cd road_data_scraper/`
3) Download your Google Cloud __JSON Credentials__ file into the repository.
4) Build the Docker Image: ```docker build -t road-data-scraper -f Dockerfile .```
5) Test the Docker Image: ```docker run -it --env PORT=80 -p 80:80 road-data-scraper```
6) Tag the Docker Image: ```docker tag road-data-scraper eu.gcr.io/<project-name>/road-data-scraper```
7) Push the Docker Image: ```docker push eu.gcr.io/<project-name>/road-data-scraper```
8) Deploy the Docker Image on Google Cloud Run: ```gcloud run deploy road-data-scraper --image eu.gcr.io/<project-name>/road-data-scraper --platform managed --region europe-west2 --timeout "3600" --cpu "4" --memory "16Gi" --max-instances "3"```
