Metadata-Version: 2.1
Name: pyMILES
Version: 0.0.6
Summary: Multiple instance learning via embedded instance selection
Home-page: https://github.com/johnvorsten/MILES
Author: John Vorsten
Author-email: johnvorsten@yahoo.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3
Description-Content-Type: text/markdown
License-File: LICENSE

[![Tests](https://github.com/johnvorsten/pyMILES/actions/workflows/python-app.yml/badge.svg)](https://github.com/johnvorsten/pyMILES/actions/workflows/python-app.yml) ![coverage](https://img.shields.io/static/v1?label=Coverage&message=74%25&color=green)

# Multiple instance learning via embedded instance selection
This Python package is an implementation of MILES: Multiple-Instance Learning via Embedded Instance Selection (Chen, Bi, and Wang, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, December 2006).

The paper describes a method for encoding bags into an instance-based feature space: each bag is mapped to a vector of similarities, computed by a most-likely-cause estimator, between the bag and every instance in the training set.

The most likely cause estimator is defined as ![Most Likely Estimator](most_likely_estimator.png "Most Likely Estimator")
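In the paper, the most likely cause estimator measures the similarity between a candidate concept x^k and a bag B_i as the maximum, over the bag's instances, of a Gaussian kernel. A minimal NumPy sketch (the function name and the value of `sigma` are illustrative, not part of this package's API):

```python
import numpy as np

def most_likely_cause(bag: np.ndarray, concept: np.ndarray, sigma: float = 3.0) -> float:
    """s(x^k, B_i) = max_j exp(-||x_ij - x^k||^2 / sigma^2).

    bag: (n_instances, n_features) array of one bag's instances
    concept: (n_features,) candidate target point x^k
    """
    sq_dists = np.sum((bag - concept) ** 2, axis=1)
    return float(np.max(np.exp(-sq_dists / sigma ** 2)))
```

Because only the closest instance in the bag determines the similarity, a single instance near the concept is enough to make the bag score highly, which is what makes the estimator suitable for multiple-instance problems.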

# An example encoding
Look at `embedding_test.py` for an example embedding of dummy data.
Dummy data is created from 5 normal distributions; each instance is drawn from one of the following two-dimensional distributions:

* N1([5, 5]^T, I), the normal distribution with mean [5, 5] and identity covariance (unit variance in each dimension)
* N2([5, -5]^T, I)
* N3([-5, 5]^T, I)
* N4([-5, -5]^T, I)
* N5([0, 0]^T, I)

Bags are created from a variable number of instances per bag; this example uses 8 instances per bag. A bag is labeled positive if it contains instances from at least two different distributions among N1, N2, and N3; otherwise the bag is negative. This image displays the raw two-dimensional data: ![#2-D Raw Data](raw_data.png "Raw 2-D dummy data")
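The dummy-data scheme above could be sketched as follows (illustrative only; see `embedding_test.py` for the package's actual generator, and note the seed and helper name are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
# Means of the five unit-variance normal distributions N1..N5
MEANS = np.array([[5, 5], [5, -5], [-5, 5], [-5, -5], [0, 0]], dtype=float)
N_INSTANCES = 8

def make_bag():
    """Draw one bag of 8 instances and its label."""
    # Each instance comes from one of the 5 distributions, chosen uniformly
    components = rng.integers(0, 5, size=N_INSTANCES)
    instances = MEANS[components] + rng.standard_normal((N_INSTANCES, 2))
    # Positive if instances come from >= 2 distinct distributions among N1, N2, N3
    positive_components = set(components[components < 3].tolist())
    label = 1 if len(positive_components) >= 2 else 0
    return instances, label

bag, label = make_bag()
```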

A single bag is of shape `(N_INSTANCES, FEATURE_SPACE)`, where `N_INSTANCES` is the number of instances in the bag and `FEATURE_SPACE` is the dimensionality of each instance.

All positive bags are of shape `(N_POSITIVE_BAGS, N_INSTANCES, FEATURE_SPACE)`, where `N_POSITIVE_BAGS` is the number of positive bags.  Negative bags are of shape `(N_NEGATIVE_BAGS, N_INSTANCES, FEATURE_SPACE)`.  The full set of training bags is of shape `(N_POSITIVE_BAGS + N_NEGATIVE_BAGS, N_INSTANCES, FEATURE_SPACE)`.

A single bag is embedded into a vector of shape `((N_POSITIVE_BAGS + N_NEGATIVE_BAGS) * N_INSTANCES)`, which is the total number of instances from all positive and negative bags.
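Concretely, the embedding concatenates the most-likely-cause similarity between the bag and every training instance. A hedged NumPy sketch, assuming a Gaussian similarity with scale `sigma` (the function name and parameter are illustrative, not this package's API):

```python
import numpy as np

def embed_bag(bag: np.ndarray, training_instances: np.ndarray, sigma: float = 3.0) -> np.ndarray:
    """Map one bag into the instance-indexed feature space.

    bag: (n_instances, n_features) instances of the bag to embed
    training_instances: (n_total_instances, n_features), the flattened
        instances of every positive and negative training bag
    returns: (n_total_instances,) embedded feature vector
    """
    # Pairwise squared distances: (n_total_instances, n_instances)
    diffs = training_instances[:, None, :] - bag[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # max_j exp(-d_j / sigma^2) equals exp(-min_j d_j / sigma^2)
    return np.exp(-np.min(sq_dists, axis=1) / sigma ** 2)
```

Each coordinate of the result answers: how close does this bag come to one particular training instance?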

In this example, the training instances are projected onto three feature vectors chosen near the means of the positive distributions N1, N2, and N3:
```python
# Feature vectors close to mean of `true` positive distributions
x1 = np.array([4.3, 5.2])
x2 = np.array([5.4, -3.9])
x3 = np.array([-6.0, 4.8])
```
The result is a (3, 40) matrix, which is visualized below. ![#Linearly Separable Bags](example_embedding.png "Example Embedding onto positive distributions")
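The projection above can be sketched with random stand-in data (the 40 bags below only match the matrix shape; the actual positive/negative split and bag contents come from the dummy-data generator in `embedding_test.py`):

```python
import numpy as np

rng = np.random.default_rng(1)
# 40 illustrative bags of 8 two-dimensional instances each
bags = rng.standard_normal((40, 8, 2))
# The three feature vectors near the means of N1, N2, N3
concepts = np.array([[4.3, 5.2], [5.4, -3.9], [-6.0, 4.8]])
sigma = 3.0

# s(x_k, B_i) = max_j exp(-||b_ij - x_k||^2 / sigma^2), vectorized over
# all concepts and bags at once
diffs = bags[None, :, :, :] - concepts[:, None, None, :]  # (3, 40, 8, 2)
sq_dists = np.sum(diffs ** 2, axis=3)                     # (3, 40, 8)
projection = np.exp(-np.min(sq_dists, axis=2) / sigma ** 2)  # (3, 40)
```

Row k of `projection` holds the similarity of every bag to concept x_k, so bags drawn from the positive distributions separate from the rest along these three axes.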

## Testing
* `python -m unittest tests.embedding_test`
* `python -m unittest tests.l1_svm_test`

## Code coverage and linting
* `pylint -r n src/tests/ src/pyMILES`
* From the src directory: `coverage run -m unittest tests.embedding_test`
* `autopep8 --recursive --in-place src/tests/ src/pyMILES/`

## Building
* Increment the build version in setup.cfg
* `python -m build .`
* `python -m twine upload dist/*`
