Metadata-Version: 2.1
Name: netml
Version: 0.2.0
Summary: Network anomaly detection via machine learning
Home-page: https://github.com/chicago-cdac/netml
License: UNKNOWN
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: System :: Networking :: Monitoring
Requires-Python: >=3.8.11,<3.11
Description-Content-Type: text/markdown
Provides-Extra: cli
Provides-Extra: dev
License-File: LICENSE.txt

# netml

`netml` is a network anomaly detection tool & library written in Python.

The library contains two primary submodules:

* `pparser`: pcap parser\
Parse pcaps to produce flow features using [Scapy](https://scapy.net/).

* `ndm`: novelty detection modeling\
Detect novelties / anomalies, via different models, such as OCSVM.

The tool's command-line interface is documented by its built-in help flags such as `-h` and `--help`:

    netml --help


## Installation

The `netml` library is available on [PyPI](https://pypi.org/project/netml/):

    pip install netml

Or, from a repository clone:

    pip install .

### CLI

The CLI tool is available as a distribution "extra":

    pip install netml[cli]

Or:

    pip install .[cli]

#### Tab-completion

Shell tab-completion is provided by [`argcomplete`](https://github.com/kislyuk/argcomplete) (through `argcmdr`). Completion code appropriate to your shell may be generated by `register-python-argcomplete`, _e.g._:

    register-python-argcomplete --shell=bash netml

The results of the above should be evaluated, _e.g._:

    eval "$(register-python-argcomplete --shell=bash netml)"

Or, to ensure the above is evaluated for every session, _e.g._:

    register-python-argcomplete --shell=bash netml > ~/.bash_completion

For more information, refer to `argcmdr`: [Shell completion](https://github.com/dssg/argcmdr/tree/0.6.0#shell-completion).


## Use

### Classification of network traffic for outlier detection

Having [trained a model](#training-a-network-traffic-model) to your network traffic, the identification of anomalous traffic is as simple as providing a packet capture (PCAP) file to the `netml classify` command of the CLI:

    netml classify --model=model.dat < unclassified.pcap

Using the Python library, the same might be accomplished, _e.g._:

```python3
from netml.pparser.parser import PCAP
from netml.utils.tool import load_data

pcap = PCAP(
    'unclassified.pcap',
    flow_ptks_thres=2,
    random_state=42,
    verbose=10,
)

# extract flows from pcap
pcap.pcap2flows(q_interval=0.9)

# extract features from each flow given feat_type
pcap.flow2features('IAT', fft=False, header=False)

(model, train_history) = load_data('model.dat')

model.predict(pcap.features)
```

### Training a network traffic model

A model may be trained for outlier detection as simply as providing a PCAP file to the `netml learn` command:

    netml learn --pcap=traffic.pcap \
                --output=model.dat

(Note that for clarity and consistency with the `classify` command, the flags `--output` and `--model` are synonymous to the `learn` command.)

`netml learn` supports a great many additional options, documented by `netml learn --help`, `--help-algorithm` and `--help-param`, including:

* `--algorithm`: selection of model-training algorithms, such as One-Class Support Vector Machine (OCSVM), Kernel Density Estimation (KDE), Isolation Forest (IF) and Autoencoder (AE)
* `--param`: customization of model hyperparameters via YAML/JSON
* `--label`, `--pcap-normal` & `--pcap-abnormal`: optional labeling of traffic to enable post-training testing of the model

In the below examples, an OCSVM model is trained by demo traffic included in the library, and tested by labels in a CSV file, (both provided by the University of New Brunswick's [Intrusion Detection Systems dataset](https://www.unb.ca/cic/datasets/ids-2017.html)).

All of the below may be wrapped up into a single command via the CLI:

    netml learn --pcap=data/demo.pcap           \
                --label=data/demo.csv           \
                --output=out/OCSVM-results.dat

#### PCAP to features

To only extract features via the CLI:

    netml learn extract                         \
                --pcap=data/demo.pcap           \
                --label=data/demo.csv           \
                --feature=out/IAT-features.dat

Or in Python:

```python3
from netml.pparser.parser import PCAP
from netml.utils.tool import dump_data

pcap = PCAP(
    'data/demo.pcap',
    flow_ptks_thres=2,
    random_state=42,
    verbose=10,
)

# extract flows from pcap
pcap.pcap2flows(q_interval=0.9)

# label each flow (optional)
pcap.label_flows(label_file='data/demo.csv')

# extract features from each flow via IAT
pcap.flow2features('IAT', fft=False, header=False)

# dump data to disk
dump_data((pcap.features, pcap.labels), out_file='out/IAT-features.dat')

# stats
print(pcap.features.shape, pcap.pcap2flows.tot_time, pcap.flow2features.tot_time)
```

#### Features to model

To train from already-extracted features via the CLI:

    netml learn train                           \
                --feature=out/IAT-features.dat  \
                --output=out/OCSVM-results.dat

Or in Python:

```python3
from sklearn.model_selection import train_test_split

from netml.ndm.model import MODEL
from netml.ndm.ocsvm import OCSVM
from netml.utils.tool import dump_data, load_data

RANDOM_STATE = 42

# load data
(features, labels) = load_data('out/IAT-features.dat')

# split train and test sets
(
    features_train,
    features_test,
    labels_train,
    labels_test,
) = train_test_split(features, labels, test_size=0.33, random_state=RANDOM_STATE)

# create detection model
ocsvm = OCSVM(kernel='rbf', nu=0.5, random_state=RANDOM_STATE)
ocsvm.name = 'OCSVM'
ndm = MODEL(ocsvm, score_metric='auc', verbose=10, random_state=RANDOM_STATE)

# train the model from the train set
ndm.train(features_train)

# evaluate the trained model
ndm.test(features_test, labels_test)

# dump data to disk
dump_data((ocsvm, ndm.history), out_file='out/OCSVM-results.dat')

# stats
print(ndm.train.tot_time, ndm.test.tot_time, ndm.score)
```

For more examples, see the `examples/` directory in the source repository.


## Architecture

- `examples/`\
example code and datasets
- `src/netml/ndm/`\
detection models (such as OCSVM)
- `src/netml/pparser/`\
pcap processing (feature extraction) 
- `src/netml/utils/`\
common functions (such as `load_data` and `dump_data`)
- `tests/`\
test cases
- `LICENSE.txt`
- `manage.py`\
library development & management module
- `README.md`
- `setup.cfg`
- `setup.py`
- `tox.ini`


## To Do

Further work includes:

- Evaluate `pparser` performance on different pcaps
- Add test cases
- Add examples
- Add (generated) docs

We welcome any comments to make this tool more robust and easier to use!


## Development

Development dependencies may be installed via the `dev` extras (below assuming a source checkout):

    pip install --editable .[dev]

(Note: the installation flag `--editable` is also used above to instruct `pip` to place the source checkout directory itself onto the Python path, to ensure that any changes to the source are reflected in Python imports.)

Development tasks are then managed via [`argcmdr`](https://github.com/dssg/argcmdr) sub-commands of `manage …`, (as defined by the repository module `manage.py`), _e.g._:

    manage version patch -m "initial release of netml" \
           --build                                     \
           --release


## Thanks

`netml` is based on the initial work of the ["Outlier Detection" library `odet`](https://github.com/Learn-Live/odet) 🙌


## Citation

    @article{yang2020comparative,
             title={A Comparative Study of Network Traffic Representations for Novelty Detection},
             author={Kun Yang and Samory Kpotufe and Nick Feamster},
             year={2020},
             eprint={2006.16993},
             archivePrefix={arXiv},
             primaryClass={cs.NI}
    }


