Metadata-Version: 2.1
Name: notebook-error-reporter
Version: 0.1
Summary: An error notification system for remote Jupyter notebooks
Home-page: https://github.com/matteoferla/remote-notebook-error-collection
Author: Matteo Ferla
Author-email: matteo.ferla@gmail.com
License: MIT
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Chemistry
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: server
License-File: LICENSE

# remote-notebook-error-collection

> :construction: WIP: This is an MVP prototype, nominally a weekend project.
> To do: implement an alternative catching system for Colab,
> as documented in the [experimentation notes](experimentation.md).

This aims to collect the errors raised when other users run a shared notebook.
The three classes presented here are successive steps in its construction.

In a Jupyter notebook, this is what is run:
```python
!python -m pip install git+https://github.com/matteoferla/remote-notebook-error-collection.git
from notebook_error_reporter import ErrorServer

es = ErrorServer(url='https://errors.matteoferla.com', notebook='test')
es.enable()
```
Then if an error is raised it gets logged (see lengthy privacy discussion below)
and can be inspected:

```python
es.retrieve_errors()
```

## Aims
I have a few notebooks that I have shared on Twitter, and
I occasionally get an email telling me that a repo they use is broken
or that some case causes an error.
Similar in concept to Sentry.io, I would like to know when errors happen.
Most users will not email about errors, so one sees only the tip of the iceberg.
Users stay silent because:

1. it is something silly they did
2. they worry it may be something silly they did
3. they deem the code crap

Point 1 implies there is a problem with user experience: it could have been clearer.
The user is never wrong: they have simply been misled.

Points 2 and 3 are errors that need fixing.
Point 2 in particular means that better error handling is needed.
As for point 3: okay, the user is never wrong. However, instead of obfuscating the crappiness,
one can document the issue.

I do not want any private or confidential data from the user or from user-given fields
(someone's target protein might be confidential).
The code should therefore not raise errors whose messages contain someone's
password, credit card number or mutation.

I only want to receive

* the error type
* the error message
* some traceback details (line number, function name and filename minus path)
* the notebook name
* the cell's first line
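
As a rough illustration of how little that is, the fields above can be pulled out of an exception with the standard library alone. This is only a sketch and not the package's actual code: `summarise_exception` and its arguments are names made up for the example.

```python
import os
import traceback


def summarise_exception(exc: BaseException, notebook: str, cell_source: str) -> dict:
    """Collect only the fields listed above (hypothetical helper)."""
    frames = traceback.extract_tb(exc.__traceback__)
    return {
        "notebook": notebook,
        "error_name": type(exc).__name__,
        "error_message": str(exc),
        "traceback": [
            {
                "filename": os.path.basename(frame.filename),  # drop the directory path
                "fun_name": frame.name,
                "lineno": frame.lineno,
            }
            for frame in frames
        ],
        # only the first line of the cell, never the full source
        "first_line": cell_source.split("\n", 1)[0],
    }


try:
    int("not a number")
except ValueError as error:
    details = summarise_exception(
        error,
        notebook="test",
        cell_source="# parse input\nint('not a number')",
    )
```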

In a regular locally hosted notebook there is the issue that servers collect IP addresses,
which point to a user's location. This is not quite GDPR data, but still.
Not collecting IP addresses is a terrible idea, as fail2ban etc. rely on IP addresses to block wannabe hackers.

In a Colab notebook this is rather straightforward, as the IP of the request is
that of the server running the kernel, not of the browser (passing the latter over would require a JavaScript function).

Data that is not sent:

* inputted values
* (most importantly) the content of a mounted Google Drive

## Store

An alternative option is to simply store the error details in the `error_details` attribute.
```python
from notebook_error_reporter import ErrorStore
es = ErrorStore()
es.enable()
es.error_details
```


## Slack
The easiest way is to get a Slack message in a channel whenever an error occurs.
A Slack webhook is easy to set up (just remember that the subdomain for doing so is `api`, not `app`).

```python
import os
os.environ['SLACK_WEBHOOK'] = "https://hooks.slack.com/services/XXXXXXXX"

from notebook_error_reporter import ErrorSlack
es = ErrorSlack(os.environ['SLACK_WEBHOOK'])
es.enable()
```

A cell that runs successfully sends nothing, but one that fails sends a Slack message:

    {"error_name": "ValueError", 
     "error_message": "foo", 
     "traceback": [{"filename": "foo.py",
                    "fun_name": "run_code", 
                    "lineno": 666}, 
                    ...
                   ], 
     "first_line": "# cell that does foo",
     "execution_count": 111}
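
For the curious, posting such a dictionary to a Slack incoming webhook takes only the standard library. The sketch below is an assumption about how one might do it, not a copy of what `ErrorSlack` does internally; `build_slack_request` is a made-up name, and the request is only prepared, not sent.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, details: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to a Slack incoming webhook."""
    # Slack incoming webhooks expect a JSON body with a "text" field
    body = json.dumps({"text": json.dumps(details, indent=2)}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_slack_request(
    "https://hooks.slack.com/services/XXXXXXXX",  # placeholder URL
    {"error_name": "ValueError", "error_message": "foo"},
)
# urllib.request.urlopen(request) would actually send the message
```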

The `filename` is stripped of the `dist-packages` path,
because that path in Colab may contain a username that _could_ be personally identifiable data.
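
A minimal sketch of that kind of stripping, assuming POSIX-style paths as on Colab. The helper name `strip_package_root` and the fallback to the bare file name are my choices for the example, not necessarily what the package does.

```python
from pathlib import PurePosixPath


def strip_package_root(filename: str) -> str:
    """Drop everything up to and including a dist-packages/site-packages folder."""
    parts = PurePosixPath(filename).parts
    for marker in ("dist-packages", "site-packages"):
        if marker in parts:
            # position of the last occurrence of the marker
            index = len(parts) - 1 - parts[::-1].index(marker)
            return "/".join(parts[index + 1:])
    # not an installed package: keep only the bare file name
    return PurePosixPath(filename).name
```

With this, `/usr/local/lib/python3.7/dist-packages/foo/bar.py` becomes `foo/bar.py`, while a path in a home folder is reduced to its file name.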

If a Slack webhook is shared on GitHub, there are users who search GitHub for exposed webhooks
and spam them with adverts for their cybersecurity courses.
A single prankster could also make it really annoying.
Therefore, a server ideally needs to be set up to collect the errors...

## Server

> For myself I have set up https://errors.matteoferla.com

A FastAPI app to get the errors is also present.
This needs to be set up on a hosting server exposed to the internet.

This has the largest risk of vandalism.

So the server host would run `run_app.py`, which contains this code:
```python
import uvicorn
from fastapi import FastAPI
from notebook_error_reporter.serverside import create_db, create_app

create_db()
app: FastAPI = create_app(debug=False, max_transparency=True, colab_only=False)

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000, log_level="info")
```

A user, meanwhile, activates logging in the notebook like so:
```python
from notebook_error_reporter import ErrorServer

es = ErrorServer(url='http://127.0.0.1:8000', notebook='mine')
es.enable()
```

On error, a dictionary type-hinted as `EventMessageType` is sent:

```python
from notebook_error_reporter import EventMessageType

EventMessageType.__annotations__
```

    {'execution_count': int,
     'first_line': str,
     'error_name': str,
     'error_message': str,
     'traceback': typing.List[notebook_error_reporter.error_event._traceback.TracebackDetailsType]}

and `TracebackDetailsType.__annotations__` is:

    {'filename': str, 'fun_name': str, 'lineno': int}

The server does keep track of IP addresses to prevent vandalism,
but it is the IP address of the Colab notebook. No JavaScript call is present to get the browser IP.
(Annoyingly, I'd love to do some JS calls to get some useful data, but best not to obfuscate!)
Therefore the IP will be in the range 142.250.0.0 - 142.251.255.255.
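
That range happens to be a single /15 block, so checking whether a request comes from it is a one-liner with the standard `ipaddress` module. This is only a sketch: whether the `colab_only` option is implemented this way is an assumption, and `is_in_colab_range` is a made-up name.

```python
import ipaddress

# 142.250.0.0 - 142.251.255.255 is exactly this /15 block
COLAB_RANGE = ipaddress.ip_network("142.250.0.0/15")


def is_in_colab_range(ip: str) -> bool:
    """True if the address falls inside the range quoted above."""
    return ipaddress.ip_address(ip) in COLAB_RANGE
```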

To see the errors sent:

```python
es.retrieve_errors()
```

I am unsure whether to allow everyone to see the sessions and errors, hence the `max_transparency` argument.
For an internal server this makes sense, but for a public one, revealing the session ids may
result in vandals adding errors to random sessions.

## Colab

Colab runs on an ancient version of IPython (5.5, cf. 8.2).
As a result, things are done a bit differently.

`.enable` calls either `load_ipython_extension` or `monkeypatch_extension`, depending on the IPython version.
The former adds an event callback function (`shell.events.callbacks`), which is all proper and good.
The latter monkeypatches a decorating function around `shell.showtraceback`; the decorator knows about the
ErrorEvent/ErrorSlack/ErrorServer/ErrorStore instance
because it was created in a factory method of that instance. As it does not have a result object,
it knows neither the execution count nor the first line of the cell.
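
To make the difference concrete, here is a toy version of the two routes. It is a sketch under assumptions, not the package's code: `Reporter`, `make_callback` and `make_patch` are invented names, and a fake result and a fake shell stand in for IPython so that it runs anywhere. In recent IPython versions the callback would be registered with `shell.events.register('post_run_cell', callback)`.

```python
import sys
from functools import wraps
from types import SimpleNamespace


class Reporter:
    """Stand-in for the reporting classes: it just keeps what it is given."""

    def __init__(self):
        self.error_details = []

    def make_callback(self):
        # Newer IPython: a post_run_cell callback receives the result object
        def post_run_cell(result):
            if result.error_in_exec is not None:
                self.error_details.append({
                    "error_name": type(result.error_in_exec).__name__,
                    "execution_count": result.execution_count,
                    "first_line": result.info.raw_cell.split("\n", 1)[0],
                })
        return post_run_cell

    def make_patch(self, showtraceback):
        # Old IPython: wrap showtraceback; the closure gives access to `self`
        @wraps(showtraceback)
        def patched(*args, **kwargs):
            error_type, error, _ = sys.exc_info()
            if error_type is not None:
                # no result object here, so no execution count or first line
                self.error_details.append({"error_name": error_type.__name__})
            return showtraceback(*args, **kwargs)
        return patched


reporter = Reporter()

# callback route, simulated with a fake result object
fake_result = SimpleNamespace(
    error_in_exec=ValueError("Foo"),
    execution_count=3,
    info=SimpleNamespace(raw_cell="# cell that does foo\nraise ValueError('Foo')"),
)
reporter.make_callback()(fake_result)

# monkeypatch route, simulated with a fake shell
fake_shell = SimpleNamespace(showtraceback=lambda *args, **kwargs: None)
fake_shell.showtraceback = reporter.make_patch(fake_shell.showtraceback)
try:
    raise ValueError("Foo")
except ValueError:
    fake_shell.showtraceback()
```

The first entry in `reporter.error_details` carries the execution count and first line, while the second, from the patched `showtraceback`, has only the error name.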

```python
!python -m pip install git+https://github.com/matteoferla/remote-notebook-error-collection.git
from notebook_error_reporter import ErrorServer

es = ErrorServer(url='https://errors.matteoferla.com', notebook='test')
es.enable()
# raise an error:
raise ValueError('Foo')
```
The error just raised can be seen to have been sent successfully:
```python
es.retrieve_errors()
```




