Metadata-Version: 2.1
Name: islearn
Version: 0.2.0
Summary: ISLearn (Learning ISLa input invariants)
Home-page: https://github.com/rindPHI/islearn
Author: Dominic Steinhöfel
Author-email: Dominic Steinhöfel <dominic.steinhoefel@cispa.de>
Project-URL: Homepage, https://github.com/rindPHI/islearn/
Project-URL: Bug Tracker, https://github.com/rindPHI/islearn/issues
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Provides-Extra: test
Provides-Extra: dev

# ISLearn

ISLearn is a system for learning ISLa constraints from a set of patterns.
Patterns are instantiated according to sample inputs (that can also automatically
be generated based on a given program property), filtered, and combined to
a disjunctive normal form (conjunctions of disjunctions of literals, where
literals are possibly negated, instantiated patterns). ISLearn output the result
ranked by estimates for *specificity* and *recall* of the results: Top-ranked
invariants were found to be best suitable for distinguishing positive (with
respect to a program property) and negative inputs.

## Example

Consider a grammar of a simple assignment programming language (e.g., "x := 1 ; y := x"):

```python
import string

LANG_GRAMMAR = {
    "<start>":
        ["<stmt>"],
    "<stmt>":
        ["<assgn>", "<assgn> ; <stmt>"],
    "<assgn>":
        ["<var> := <rhs>"],
    "<rhs>":
        ["<var>", "<digit>"],
    "<var>": list(string.ascii_lowercase),
    "<digit>": list(string.digits)
}
```

Let `prop` be a function returning `True` if a given statement evaluates normally,
and `False` if the evaluation raises an exception. The latter is the case if a
variable appears as a right-hand-side, but has not been assigned a variable before.

ISLearn can learn this property based on the "Def-Use (reST Strict)" pattern from
the catalog (the standard catalog is in `src/islearn/patterns.toml`). You call
the learner as follows:

```python
from islearn.learner import InvariantLearner
from isla.language import ISLaUnparser, Formula
from typing import Dict, Tuple

result: Dict[Formula, Tuple[float, float]] = InvariantLearner(
    LANG_GRAMMAR,
    prop=prop,
    activated_patterns={
        "Def-Use (reST Strict)",  # Optional; leads to quicker results
    },
).learn_invariants()

print("\n".join(map(
    lambda p: f"{p[1]}: " + ISLaUnparser(p[0]).unparse(),
    {f: p for f, p in result.items() if p[0] > .0}.items())))
```

The expected result is

```
(1.0, 1.0): forall <assgn> assgn_1="{<var> lhs_1} := {<var> var}" in start:
  exists <assgn> assgn_2="{<var> lhs_2} := {<rhs> rhs_2}" in start:
    (before(assgn_2, assgn_1) and (= lhs_2 var))
```

That invariant has full specificity (first value in the tuple) and recall
(second value), since it holds *exactly* for all valid inputs.

## Resources / Important Files

* You find the pattern catalog in `src/islearn/patterns.toml`.
* The evaluation scripts are in the directory `evaluations/`.
* The most important files of our implementation are `src/islearn/learner.py`,
  containing the learning incl. candidate generation, filtering, etc., and
  and `src/islearn/language.py`, which defines the abstract ISLa langauge for
  the patterns catalog.

## Configuration Parameters

ISLearn (the class `InvariantLearner`) has a number of optional configuration parameters that can be passed to the
constructor; many of them can be used to reduce or expand the search space, and thus either lead to more / longer
invariants or to quicker results.

| Parameter                                   | Default    | Description                                                                                                  |
|:--------------------------------------------|------------|--------------------------------------------------------------------------------------------------------------|
| prop                                        | None       | The program property for distinguishing valid from invalid inputs.                                           |
| positive_examples                           | None       | A set of valid sample inputs. Optional if prop is present.                                                   |
| negative_examples                           | None       | A set of invalid sample inputs. Optional. Needed to estimate specificifity if prop is not present.           |
| min_recall                                  | .9         | What is the minimum target recall value?                                                                     |
| min_specificity                             | .6         | What is the minimum target specificity value?                                                                |
| max_disjunction_size                        | 1          | Size of disjunctions to be generated. As default, no disjunctions (size 1). Potentially increases recall.    |
| include_negations_in_disjunctions           | False      | Also include negations in literals within disjunctions. Use with case; can lead to "spurious" invariants.    |
| max_conjunction_size                        | 2          | Size of conjunctions to be generated. As default, conjunctions of size 2. Potentially increases specificity. |
| activated_patterns                          | None       | A set of pattern names from the catalog that should be selected. As a default, all patterns are selected.    |
| deactivated_patterns                        | None       | A set of pattern names from the catalog that should *not* be selected.                                       |
| pattern_file                                | None       | A file name to a pattern catalog (TOML file). Standard is `src/islearn/patterns.toml`.                       |
| patterns                                    | None       | A set of patterns (abstract formulas) to be considered if no catalog should be used.                         |
| target_number_positive_samples              | 10         | How many positive samples should be created using fuzzing?                                                   |
| target_number_negative_samples              | 10         | How many negative samples should be created using fuzzing?                                                   |
| target_number_positive_samples_for_learning | 10         | How many positive samples should be generated specifically for learning? <= `positive_examples`              |
| reduce_inputs_for_learning                  | True       | If `True`, learning samples are reduced, keeping k-path coverage and validity.                               |
| reduce_all_inputs                           | False      | If `True`, all inputs are reduced. Reduction of learning inputs can be specifically disabled.                |
| generate_new_learning_samples               | True       | Only if `True`, the fuzzers are used to generate further learning examples.                                  |
| do_generate_more_inputs                     | True       | Only if `True`, any further examples (positive, negative) are generated by the fuzzers.                      |
| filter_inputs_for_learning_by_kpaths        | True       | If `True`, learning inputs are filtered k-paths: Retain only those with new k-path coverage information.     |
| mexpr_expansion_limit                       | 1          | To what depth should the learner search for instantiations of match expressions?                             |
| max_nonterminals_in_mexpr                   | None       | How many nonterminals are allowed in match expressions? `None` means no restriction.                         |
| exclude_nonterminals                        | None       | "Irrelevant" grammar nonterminals. Reduces search space. Example: White space nonterminals.                  |
| perform_static_implication_check            | False      | Statically exclude weaker invariants by a translation to Z3. Rather slow, handle with care.                  |
| k                                           | 3          | The `k` from `k`-Paths. Used by the input generators and filters.                                            |

## Install, Build, Test

ISLearn depends on Python 3.10 and the Python header files. To compile all of ISLearns's dependencies, you need
gcc, g++ make, and cmake. To check out the current ISLearn version, git will be needed. Furthermore, 
python3.10-venv is required to run ISLearn in a virtual environment.

On *Alpine Linux*, all dependencies can be installed using

```shell
apk add python3.10 python3.10-dev python3.10-venv gcc g++ make cmake git 
```

### Install

To install ISLearn, a simple `pip install islearn` suffices. We recommend installing ISLearn inside a virtual
environment (virtualenv):

```shell
python3.10 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install islearn
```

### Build 

ISLearn is built locally as follows:

```shell
git clone https://github.com/rindPHI/islearn.git
cd islearn/

python3.10 -m venv venv
source venv/bin/activate

pip install --upgrade pip
pip install --upgrade build
python3 -m build
```

Then, you will find the built wheel (`*.whl`) in the `dist/` directory.

### Testing & Development

For running the ISLearn tests, you additionally need `clang`, `racket`, and `graphviz`. On Alpine Linux, you
can install those by

```shell
apk update
apk upgrade
apk add clang racket graphviz
```

Then, you can run the ISLearn tests as follows:

```shell
git clone https://github.com/rindPHI/islearn.git
cd islearn/

python3.10 -m venv venv
source venv/bin/activate
pip install --upgrade pip

# Run tests
pip install -e .[test]
python3 -m pytest -n 16 tests
```
