Testing a hypothesis – non-stationary or time-reversible
We test whether the stationary, time-reversible GTR model is sufficient for a data set, compared with the non-stationary general nucleotide (GN) model.
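from cogent3 import get_app
# load the alignment as DNA
loader = get_app("load_aligned", format="fasta", moltype="dna")
aln = loader("data/primate_brca1.fasta")
tree = "data/primate_brca1.tree"
# null (GTR) and alternative (GN) models, each estimating motif probabilities by maximum likelihood
null = get_app("model", "GTR", tree=tree, optimise_motif_probs=True)
alt = get_app("model", "GN", tree=tree, optimise_motif_probs=True)
# fits both models and performs a likelihood ratio test
hyp = get_app("hypothesis", null, alt)
result = hyp(aln)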
type(result)
cogent3.app.result.hypothesis_result
result is a hypothesis_result object. Its repr() displays the likelihood ratio (LR) test statistic, the degrees of freedom (df) and the associated p-value, followed by a summary of each fitted model.
result
| LR | df | pvalue |
|---|---|---|
| 9.3813 | 6 | 0.1532 |
| hypothesis | key | lnL | nfp | DLC | unique_Q |
|---|---|---|---|---|---|
| null | 'GTR' | -6992.5741 | 19 | True | True |
| alt | 'GN' | -6987.8834 | 25 | True | True |
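Here lnL is the maximised log-likelihood and nfp the number of free parameters; df is the difference in nfp between the two models (25 - 19 = 6). Since the p-value (0.1532) is greater than 0.05, we do not reject the null hypothesis: GN does not fit significantly better, so we retain the simpler GTR model. We use this object to demonstrate the properties of a hypothesis_result.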
hypothesis_result has attributes and keys
Accessing the test statistics
result.LR, result.df, result.pvalue
(9.381277660886553, 6, 0.1532433450948613)
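These three statistics are related in the standard way for nested models. The following is a minimal sketch in plain Python (independent of cogent3, using the rounded lnL and nfp values from the table above): LR is twice the difference in log-likelihoods, df is the difference in the number of free parameters, and the p-value is the upper tail of a chi-square distribution with df degrees of freedom. The helpers `chi2_sf` and `lrt` are hypothetical names written for this illustration, not part of cogent3.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularised lower incomplete gamma function P(a, z):
    # P(a, z) = z**a * exp(-z) / Gamma(a) * sum_n z**n / (a * (a + 1) * ... * (a + n))
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total) and n < 10_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower


def lrt(lnL_null: float, nfp_null: int, lnL_alt: float, nfp_alt: int):
    """Likelihood ratio test for nested models, using the asymptotic chi-square distribution."""
    LR = 2 * (lnL_alt - lnL_null)
    df = nfp_alt - nfp_null
    return LR, df, chi2_sf(LR, df)


# rounded values from the GTR (null) and GN (alt) rows of the summary table
LR, df, pvalue = lrt(-6992.5741, 19, -6987.8834, 25)
print(f"LR={LR:.4f} df={df} pvalue={pvalue:.4f}")
print("reject null" if pvalue < 0.05 else "do not reject null")
```

Because the inputs are rounded to four decimals, LR can differ from result.LR in the last printed digit; df and the p-value agree with result.df and result.pvalue to the displayed precision.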
The null hypothesis
The fitted null model is accessed via the null attribute.
result.null
| key | lnL | nfp | DLC | unique_Q |
|---|---|---|---|---|
| 'GTR' | -6992.5741 | 19 | True | True |
result.null.lf
GTR
log-likelihood = -6992.5741
number of free parameters = 19
| A/C | A/G | A/T | C/G | C/T |
|---|---|---|---|---|
| 1.2296 | 5.2479 | 0.9473 | 2.3389 | 5.9667 |
| edge | parent | length |
|---|---|---|
| Galago | root | 0.1727 |
| HowlerMon | root | 0.0448 |
| Rhesus | edge.3 | 0.0215 |
| Orangutan | edge.2 | 0.0077 |
| Gorilla | edge.1 | 0.0025 |
| Human | edge.0 | 0.0060 |
| Chimpanzee | edge.0 | 0.0028 |
| edge.0 | edge.1 | 0.0000 |
| edge.1 | edge.2 | 0.0034 |
| edge.2 | edge.3 | 0.0119 |
| edge.3 | root | 0.0076 |
| A | C | G | T |
|---|---|---|---|
| 0.3792 | 0.1719 | 0.2066 | 0.2423 |
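GTR is stationary and time-reversible by construction. As a sketch in plain Python, we can rebuild a rate matrix from the rounded estimates above and check detailed balance, pi_i Q_ij = pi_j Q_ji, which is what makes the process look the same run forwards or backwards in time. This assumes a common GTR parameterisation (the rate from i to j is the exchangeability r_ij times the frequency of j), that the G/T exchangeability omitted from the table is the reference fixed at 1, and a normalisation giving an expected substitution rate of 1; it is not necessarily how cogent3 builds the matrix internally.

```python
from itertools import product

BASES = "ACGT"
# exchangeabilities from the table; G/T is assumed to be the reference, fixed at 1
exch = {"A/C": 1.2296, "A/G": 5.2479, "A/T": 0.9473, "C/G": 2.3389, "C/T": 5.9667, "G/T": 1.0}
pi = dict(zip(BASES, [0.3792, 0.1719, 0.2066, 0.2423]))


def r(i: str, j: str) -> float:
    """Symmetric exchangeability between nucleotides i and j."""
    return exch["/".join(sorted(i + j))]


# off-diagonal rates Q[i][j] = r_ij * pi_j; each diagonal makes its row sum to zero
Q = {i: {j: r(i, j) * pi[j] for j in BASES if j != i} for i in BASES}
for i in BASES:
    Q[i][i] = -sum(Q[i].values())

# rescale so the expected number of substitutions per unit time is 1
mu = -sum(pi[i] * Q[i][i] for i in BASES)
for i, j in product(BASES, repeat=2):
    Q[i][j] /= mu

row_sums = [sum(Q[i].values()) for i in BASES]
# detailed balance: probability flux i -> j equals flux j -> i
balance_gap = max(abs(pi[i] * Q[i][j] - pi[j] * Q[j][i]) for i, j in product(BASES, repeat=2))
# stationarity: pi Q = 0, so the base composition does not change over time
stationary_gap = max(abs(sum(pi[i] * Q[i][j] for i in BASES)) for j in BASES)
print(f"max detailed-balance gap: {balance_gap:.2e}")
print(f"max |(pi Q)_j|: {stationary_gap:.2e}")
```

Both gaps are zero up to floating-point error for any positive exchangeabilities and frequencies, because the symmetry of r_ij forces them to be; the data cannot make a GTR fit non-reversible.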
The alternative hypothesis
The fitted alternative model is accessed via the alt attribute.
result.alt.lf
GN
log-likelihood = -6987.8834
number of free parameters = 25
| A>C | A>G | A>T | C>A | C>G | C>T | G>A | G>C | G>T | T>A | T>C |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.8700 | 3.6670 | 0.9111 | 1.5925 | 2.1264 | 6.0324 | 8.2178 | 1.2288 | 0.6294 | 1.2499 | 3.4136 |
| edge | parent | length |
|---|---|---|
| Galago | root | 0.1735 |
| HowlerMon | root | 0.0450 |
| Rhesus | edge.3 | 0.0215 |
| Orangutan | edge.2 | 0.0078 |
| Gorilla | edge.1 | 0.0025 |
| Human | edge.0 | 0.0061 |
| Chimpanzee | edge.0 | 0.0028 |
| edge.0 | edge.1 | 0.0000 |
| edge.1 | edge.2 | 0.0033 |
| edge.2 | edge.3 | 0.0121 |
| edge.3 | root | 0.0077 |
| A | C | G | T |
|---|---|---|---|
| 0.3756 | 0.1768 | 0.2078 | 0.2398 |
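GN drops the symmetry constraint, so each directional rate can differ. One way to see whether a set of rates could come from a reversible process is Kolmogorov's criterion: a continuous-time Markov chain with all rates positive is reversible exactly when, for every cycle of states, the product of rates around the cycle equals the product in the reverse direction; for four fully connected states it is enough to check the triangles. The sketch below, in plain Python, applies this to the rounded GN estimates, assuming the T>G rate omitted from the table is the reference fixed at 1 and that each rate-matrix entry is the listed rate, possibly multiplied by a per-target-state factor such as a motif probability and by an overall scaling constant. Such factors cancel in the ratios, so the raw estimates suffice.

```python
from itertools import combinations

# directional rates from the table; T>G is assumed to be the reference, fixed at 1
rates = {
    "A>C": 0.8700, "A>G": 3.6670, "A>T": 0.9111,
    "C>A": 1.5925, "C>G": 2.1264, "C>T": 6.0324,
    "G>A": 8.2178, "G>C": 1.2288, "G>T": 0.6294,
    "T>A": 1.2499, "T>C": 3.4136, "T>G": 1.0,
}


def q(i: str, j: str) -> float:
    """Rate parameter for a substitution from nucleotide i to nucleotide j."""
    return rates[f"{i}>{j}"]


def cycle_ratio(i: str, j: str, k: str) -> float:
    """Rate product around i->j->k->i divided by that around i->k->j->i (1.0 if reversible)."""
    forward = q(i, j) * q(j, k) * q(k, i)
    backward = q(i, k) * q(k, j) * q(j, i)
    return forward / backward


ratios = {"".join(c): cycle_ratio(*c) for c in combinations("ACGT", 3)}
for cycle, ratio in ratios.items():
    print(f"{cycle}: {ratio:.3f}")
```

Ratios away from 1 show that the GN point estimates do not describe a reversible process. Whether that departure is large enough to matter is what the likelihood ratio test assesses, and for this alignment it was not significant.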
Saving hypothesis results
We recommend saving these results in serialised form, since this gives maximum flexibility for downstream analyses.
The following would write the result to a SQLite data store (path/to/myresults.sqlitedb is a placeholder path).
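from cogent3 import get_app, open_data_store
# open (or create) a SQLite data store for writing
output = open_data_store("path/to/myresults.sqlitedb", mode="w")
# the write_db app serialises results into that data store
writer = get_app("write_db", data_store=output)
writer(result)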