Metadata-Version: 2.1
Name: pyMILES
Version: 0.0.1
Summary: Multiple instance learning via embedded instance selection
Home-page: https://github.com/johnvorsten/MILES
Author: John Vorsten
Author-email: johnvorsten@yahoo.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3
Description-Content-Type: text/markdown
License-File: LICENSE

# Multiple instance learning via embedded instance selection
This Python package is an implementation of "MILES: Multiple-Instance Learning via Embedded Instance Selection" from IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, No. 12, December 2006.

The paper describes a method for embedding bags into a feature space defined by applying the most-likely-cause estimator between each bag and the instances in the training set.

The most-likely-cause estimator is defined as ![Most Likely Estimator](most_likely_estimator.png "Most Likely Estimator")
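In code, the estimator can be sketched as follows. This is a minimal NumPy sketch, not the package's own API; the function name `most_likely_cause` and the parameter `sigma` are illustrative:

```python
import numpy as np

def most_likely_cause(bag, x_k, sigma):
    """Similarity between a concept point x_k and a bag of instances.

    s(x_k, B) = max_j exp(-||x_j - x_k||^2 / sigma^2),
    i.e. the similarity of the bag instance most likely "caused" by x_k.
    """
    sq_dists = np.sum((bag - x_k) ** 2, axis=1)  # one squared distance per instance
    return np.exp(-sq_dists / sigma ** 2).max()
```

The `max` over instances is what makes this a bag-level (rather than instance-level) similarity: a bag scores high if *any* of its instances lies near the concept point.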

# An example encoding
Look at `embedding_test.py` for an example embedding of dummy data.
Dummy data is drawn from five two-dimensional normal distributions; each instance is generated by one of the following:

- N1([5, 5]^T, I) -> the normal distribution with mean [5, 5] and identity covariance (unit variance)
- N2([5, -5]^T, I)
- N3([-5, 5]^T, I)
- N4([-5, -5]^T, I)
- N5([0, 0]^T, I)
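Generating such dummy data might look like the following sketch. The helper name `make_bag`, the RNG seed, and the label computation are illustrative and not the package's API:

```python
import numpy as np

rng = np.random.default_rng(0)
# Means of the five two-dimensional distributions N1..N5, each with identity covariance
means = np.array([[5, 5], [5, -5], [-5, 5], [-5, -5], [0, 0]], dtype=float)

def make_bag(n_instances=8):
    """Draw each instance from one of the five N(mean, I) distributions."""
    which = rng.integers(0, 5, size=n_instances)  # source distribution per instance
    instances = means[which] + rng.standard_normal((n_instances, 2))
    return instances, which

instances, which = make_bag()
# A bag is positive if it contains instances from at least two distinct
# distributions among N1, N2, N3 (indices 0, 1, 2)
positive_sources = set(which[which < 3])
label = 1 if len(positive_sources) >= 2 else 0
```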

Bags may contain a variable number of instances; this example uses 8 instances per bag. A bag is labeled positive if it contains instances from at least two different distributions among N1, N2, and N3; otherwise the bag is negative. This image displays the raw 2-dimensional data: ![2-D Raw Data](raw_data.png "Raw 2-D dummy data")

A single bag is of shape `(N_INSTANCES, FEATURE_SPACE)`, where `N_INSTANCES` is the number of instances in the bag and `FEATURE_SPACE` is the dimensionality of each instance.

All positive bags are of shape `(N_POSITIVE_BAGS, N_INSTANCES, FEATURE_SPACE)`, where `N_POSITIVE_BAGS` is the number of positive bags. Negative bags are of shape `(N_NEGATIVE_BAGS, N_INSTANCES, FEATURE_SPACE)`. Stacking the positive and negative bags gives the full training set, of shape `(N_POSITIVE_BAGS + N_NEGATIVE_BAGS, N_INSTANCES, FEATURE_SPACE)`.
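For concreteness, these shapes can be illustrated with placeholder arrays (the bag counts below are arbitrary example values, not taken from `embedding_test.py`):

```python
import numpy as np

N_POSITIVE_BAGS, N_NEGATIVE_BAGS, N_INSTANCES, FEATURE_SPACE = 20, 20, 8, 2

positive_bags = np.zeros((N_POSITIVE_BAGS, N_INSTANCES, FEATURE_SPACE))
negative_bags = np.zeros((N_NEGATIVE_BAGS, N_INSTANCES, FEATURE_SPACE))

# Stack along the bag axis to form the full training set
all_bags = np.concatenate([positive_bags, negative_bags], axis=0)
# all_bags.shape == (N_POSITIVE_BAGS + N_NEGATIVE_BAGS, N_INSTANCES, FEATURE_SPACE)
```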

A single bag is embedded into a vector of shape `((N_POSITIVE_BAGS + N_NEGATIVE_BAGS) * N_INSTANCES)`, which is the total number of instances from all positive and negative bags.
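Under the most-likely-cause similarity, embedding a single bag against the pool of all training instances can be sketched as below. This is an illustrative NumPy sketch; the function name `embed_bag` and the `sigma` value are assumptions, and the names used in `embedding_test.py` may differ:

```python
import numpy as np

def embed_bag(bag, training_instances, sigma=3.0):
    """Embed one bag into a vector with one entry per training instance.

    bag:                (n_instances, p)
    training_instances: (n_total_instances, p), all instances from all bags
    Entry k is max over the bag of exp(-||x_j - x_k||^2 / sigma^2).
    """
    # Pairwise squared distances: (n_total_instances, n_instances)
    d2 = np.sum((training_instances[:, None, :] - bag[None, :, :]) ** 2, axis=2)
    # Max over the bag axis gives the most-likely-cause similarity per instance
    return np.exp(-d2 / sigma ** 2).max(axis=1)
```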

In this example, the training instances are projected onto three feature vectors chosen near the means of the positive distributions N1, N2, and N3:
```python
# Feature vectors close to mean of `true` positive distributions
x1 = np.array([4.3, 5.2])
x2 = np.array([5.4, -3.9])
x3 = np.array([-6.0, 4.8])
```
The result is a `(3, 40)` matrix, which is visualized below. ![Linearly Separable Bags](example_embedding.png "Example embedding onto positive distributions")


