Pygor Tutorial¶
Welcome to the pygor3 Tutorial.
Pygor3 is an open-source Python package that uses IGoR to analyze, infer, evaluate, and generate V(D)J sequences.
It provides simple calculations and visualizations of V(D)J recombination statistics.
IgorModel¶
An IGoR model encapsulates the probabilistic parameters of the Bayesian network that describes a V(D)J recombination process. IGoR ships with a set of default models.
As an example, let's load the recombination model for the human T-cell receptor \(\beta\) chain:
[1]:
import pygor3 as p3
mdl_hb = p3.get_default_IgorModel("human", "tcr_beta")
Reading Parms filename from: /home/olivares/.local/share/igor/models/human/tcr_beta/models/model_parms.txt
Reading Marginals filename from: /home/olivares/.local/share/igor/models/human/tcr_beta/models/model_marginals.txt
[2]:
mdl_hb['d_3_del']
[2]:
<xarray.DataArray (d_gene: 3, d_5_del: 21, d_3_del: 21)>
array([[[0.00000e+00, 0.00000e+00, 0.00000e+00, ..., 3.19224e-01,
2.89631e-01, 2.11165e-01],
[0.00000e+00, 0.00000e+00, 6.86291e-08, ..., 1.38170e-01,
3.02534e-01, 0.00000e+00],
[0.00000e+00, 0.00000e+00, 1.09220e-03, ..., 4.41026e-02,
0.00000e+00, 0.00000e+00],
...,
[0.00000e+00, 0.00000e+00, 0.00000e+00, ..., 0.00000e+00,
0.00000e+00, 0.00000e+00],
[0.00000e+00, 0.00000e+00, 0.00000e+00, ..., 0.00000e+00,
0.00000e+00, 0.00000e+00],
[0.00000e+00, 0.00000e+00, 0.00000e+00, ..., 0.00000e+00,
0.00000e+00, 0.00000e+00]],
[[0.00000e+00, 0.00000e+00, 0.00000e+00, ..., 2.11468e-03,
5.71094e-03, 7.95666e-02],
[0.00000e+00, 0.00000e+00, 0.00000e+00, ..., 1.50811e-02,
8.09057e-02, 7.76708e-01],
[0.00000e+00, 0.00000e+00, 3.94418e-06, ..., 2.35577e-03,
5.88810e-02, 7.62736e-02],
...
[0.00000e+00, 0.00000e+00, 1.25405e-01, ..., 0.00000e+00,
0.00000e+00, 0.00000e+00],
[0.00000e+00, 0.00000e+00, 0.00000e+00, ..., 0.00000e+00,
0.00000e+00, 0.00000e+00],
[0.00000e+00, 0.00000e+00, 0.00000e+00, ..., 0.00000e+00,
0.00000e+00, 0.00000e+00]],
[[0.00000e+00, 0.00000e+00, 0.00000e+00, ..., 1.51398e-04,
1.89857e-02, 6.63111e-01],
[0.00000e+00, 0.00000e+00, 0.00000e+00, ..., 1.29730e-02,
2.58015e-02, 9.39715e-01],
[0.00000e+00, 0.00000e+00, 0.00000e+00, ..., 1.20776e-02,
1.25424e-01, 1.52907e-01],
...,
[0.00000e+00, 0.00000e+00, 1.62236e-01, ..., 0.00000e+00,
0.00000e+00, 0.00000e+00],
[0.00000e+00, 0.00000e+00, 0.00000e+00, ..., 0.00000e+00,
0.00000e+00, 0.00000e+00],
[0.00000e+00, 0.00000e+00, 0.00000e+00, ..., 0.00000e+00,
0.00000e+00, 0.00000e+00]]])
Coordinates:
* d_gene (d_gene) int64 0 1 2
lbl__d_gene (d_gene) object ' TRBD1*01' ' TRBD2*01' ' TRBD2*02'
seq__d_gene (d_gene) object 'GGGACAGGGGGC' ... 'GGGACTAGCGGGAGGG'
* d_5_del (d_5_del) int64 0 1 2 3 4 5 6 7 8 ... 13 14 15 16 17 18 19 20
lbl__d_5_del (d_5_del) int64 -4 -3 -2 -1 0 1 2 3 ... 9 10 11 12 13 14 15 16
* d_3_del (d_3_del) int64 0 1 2 3 4 5 6 7 8 ... 13 14 15 16 17 18 19 20
lbl__d_3_del (d_3_del) int64 -4 -3 -2 -1 0 1 2 3 ... 9 10 11 12 13 14 15 16
Attributes:
nickname: d_3_del
event_type: Deletion
seq_type: D_gene
seq_side: Three_prime
priority: 5
parents: ['d_gene', 'd_5_del']
childs: []
Bayesian Network¶
To visualize the structure of the Bayesian network, use
[3]:
mdl_hb.plot_Bayes_network()
[3]:
<AxesSubplot:>
We can do the same for the human T-cell receptor \(\alpha\) chain:
[4]:
mdl_ha = p3.get_default_IgorModel("human", "tcr_alpha")
mdl_ha.plot_Bayes_network()
Reading Parms filename from: /home/olivares/.local/share/igor/models/human/tcr_alpha/models/model_parms.txt
Reading Marginals filename from: /home/olivares/.local/share/igor/models/human/tcr_alpha/models/model_marginals.txt
[4]:
<AxesSubplot:>
Notice that the \(\beta\) chain model is a VDJ model, whereas the \(\alpha\) chain model is only a VJ model.
Internally, an IgorModel is built from IGoR's model files:
- model_parms.txt, which contains the event names, nicknames, priorities, and the dependencies between events;
- model_marginals.txt, which contains the conditional probabilities of each event;
- the CDR3 anchor files.
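The network structure can be sketched in plain Python as a mapping from each event to its parents. This sketch is hypothetical and partial. It includes only the TRB edges printed in this tutorial's outputs: the parents of 'd_3_del' and 'd_gene', and the children of 'v_choice'. It is not the whole model. A topological sort then gives one order in which events could be sampled so that every event comes after its parents. This is not necessarily the order IGoR uses internally.

```python
from graphlib import TopologicalSorter

# Hypothetical, partial TRB network: event nickname -> set of parent nicknames.
# Only edges printed in this tutorial are included. 'd_5_del' has no entry of its
# own because its parents are not shown, so it appears here only as a parent.
parents = {
    "v_choice": set(),
    "j_choice": {"v_choice"},
    "v_3_del": {"v_choice"},
    "d_gene": {"v_choice", "j_choice"},
    "d_3_del": {"d_gene", "d_5_del"},
}

# static_order() yields every node after all of its predecessors (its parents).
sampling_order = list(TopologicalSorter(parents).static_order())
print(sampling_order)
```

Any order that static_order() returns is valid for sampling. Ties between unrelated events, such as 'v_choice' and 'd_5_del', can come out either way.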
IgorModel_Parms¶
[5]:
print(mdl_hb)
.xdata['v_3_del', 'j_choice', 'd_5_del', 'vd_dinucl', 'dj_dinucl', 'd_gene', 'v_choice', 'd_3_del', 'j_5_del', 'dj_ins', 'vd_ins']
[6]:
mdl_hb.parms, mdl_hb.marginals
[6]:
(<pygor3.IgorIO.IgorModel_Parms at 0x7fac64302950>,
<pygor3.IgorIO.IgorModel_Marginals at 0x7fac6459d710>)
To get a pandas DataFrame with the realizations of an event, index the parms object with the event nickname:
[7]:
mdl_hb.parms['d_gene']
[7]:
| id | value | name |
|---|---|---|
| 0 | GGGACAGGGGGC | TRBD1*01 |
| 1 | GGGACTAGCGGGGGGG | TRBD2*01 |
| 2 | GGGACTAGCGGGAGGG | TRBD2*02 |
[8]:
event = mdl_hb.parms.get_Event('d_gene')
event.to_dict()
[8]:
{'event_type': 'GeneChoice',
'seq_type': 'D_gene',
'seq_side': 'Undefined_side',
'priority': 6,
'realizations': [{'id': 0, 'value': 'GGGACAGGGGGC', 'name': ' TRBD1*01'},
{'id': 1, 'value': 'GGGACTAGCGGGGGGG', 'name': ' TRBD2*01'},
{'id': 2, 'value': 'GGGACTAGCGGGAGGG', 'name': ' TRBD2*02'}],
'name': 'GeneChoice_D_gene_Undefined_side_prio6_size3',
'nickname': 'd_gene'}
The parameters of every event have three columns: id, value, and name. For GeneChoice events such as 'd_gene' (above), value is the gene sequence and name is its description. For Deletion events such as 'd_3_del', value is the integer number of deleted nucleotides.
[9]:
mdl_hb.parms['d_3_del']
[9]:
| id | value | name |
|---|---|---|
| 0 | -4 | |
| 1 | -3 | |
| 2 | -2 | |
| 3 | -1 | |
| 4 | 0 | |
| 5 | 1 | |
| 6 | 2 | |
| 7 | 3 | |
| 8 | 4 | |
| 9 | 5 | |
| 10 | 6 | |
| 11 | 7 | |
| 12 | 8 | |
| 13 | 9 | |
| 14 | 10 | |
| 15 | 11 | |
| 16 | 12 | |
| 17 | 13 | |
| 18 | 14 | |
| 19 | 15 | |
| 20 | 16 | |
Notice that for Deletion events, negative values denote palindromic insertions (P-nucleotides).
Additionally, the CDR3 anchors are stored as DataFrames indexed by gene name: parms.df_V_anchors for the V anchors and parms.df_J_anchors for the J anchors. For example:
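To make the meaning of a deletion value concrete, here is a sketch with hypothetical helpers that are not part of pygor3. It assumes the usual P-nucleotide convention: a 3' deletion value of -k adds the reverse complement of the last k nucleotides, and a positive value trims that many nucleotides. The mapping from realization id to value (-4 to 16) is read off the 'd_3_del' table above.

```python
# Hypothetical helpers (not part of pygor3). Assumption: a 3' deletion value of -k
# appends the reverse complement of the last k nucleotides (palindromic P-insertion).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Realization ids 0..20 of 'd_3_del' in this model carry the values -4..16.
D_3_DEL_VALUES = list(range(-4, 17))


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def apply_3_prime_deletion(seq: str, value: int) -> str:
    """Trim `value` nucleotides from the 3' end, or add -value palindromic ones if negative."""
    if value >= 0:
        return seq[: len(seq) - value]
    k = -value
    return seq + reverse_complement(seq[-k:])


trbd1 = "GGGACAGGGGGC"  # TRBD1*01 sequence from the 'd_gene' table
print(apply_3_prime_deletion(trbd1, D_3_DEL_VALUES[6]))  # id 6 -> value 2: GGGACAGGGG
print(apply_3_prime_deletion(trbd1, D_3_DEL_VALUES[2]))  # id 2 -> value -2: GGGACAGGGGGCGC
```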
[10]:
mdl_hb.parms.df_J_anchors
[10]:
| gene | anchor_index |
|---|---|
| K02545\|TRBJ1-1*01\|Homo sapiens\|F\|J-REGION\|749..796\|48 nt\|3\| \| \| \| \|48+0=48\| \| \| | 17 |
| K02545\|TRBJ1-2*01\|Homo sapiens\|F\|J-REGION\|886..933\|48 nt\|3\| \| \| \| \|48+0=48\| \| \| | 17 |
| M14158\|TRBJ1-3*01\|Homo sapiens\|F\|J-REGION\|1499..1548\|50 nt\|2\| \| \| \| \|50+0=50\| \| \| | 19 |
| M14158\|TRBJ1-4*01\|Homo sapiens\|F\|J-REGION\|2095..2145\|51 nt\|3\| \| \| \| \|51+0=51\| \| \| | 20 |
| M14158\|TRBJ1-5*01\|Homo sapiens\|F\|J-REGION\|2368..2417\|50 nt\|2\| \| \| \| \|50+0=50\| \| \| | 19 |
| M14158\|TRBJ1-6*01\|Homo sapiens\|F\|J-REGION\|2859..2911\|53 nt\|2\| \| \| \| \|53+0=53\| \| \| | 22 |
| L36092\|TRBJ1-6*02\|Homo sapiens\|F\|J-REGION\|643043..643095\|53 nt\|2\| \| \| \| \|53+0=53\| \| \| | 22 |
| X02987\|TRBJ2-1*01\|Homo sapiens\|F\|J-REGION\|800..849\|50 nt\|2\| \| \| \| \|50+0=50\| \| \| | 19 |
| X02987\|TRBJ2-2*01\|Homo sapiens\|F\|J-REGION\|995..1045\|51 nt\|3\| \| \| \| \|51+0=51\| \| \| | 20 |
| X02987\|TRBJ2-3*01\|Homo sapiens\|F\|J-REGION\|1282..1330\|49 nt\|1\| \| \| \| \|49+0=49\| \| \| | 18 |
| X02987\|TRBJ2-4*01\|Homo sapiens\|F\|J-REGION\|1432..1481\|50 nt\|2\| \| \| \| \|50+0=50\| \| \| | 19 |
| X02987\|TRBJ2-5*01\|Homo sapiens\|F\|J-REGION\|1553..1600\|48 nt\|3\| \| \| \| \|48+0=48\| \| \| | 17 |
| X02987\|TRBJ2-6*01\|Homo sapiens\|F\|J-REGION\|1673..1725\|53 nt\|2\| \| \| \| \|53+0=53\| \| \| | 22 |
| M14159\|TRBJ2-7*01\|Homo sapiens\|F\|J-REGION\|2316..2362\|47 nt\|2\| \| \| \| \|47+0=47\| \| \| | 16 |
| X02987\|TRBJ2-7*02\|Homo sapiens\|ORF\|J-REGION\|1890..1936\|47 nt\|2\| \| \| \| \|47+0=47\| \| \| | 16 |
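A quick sanity check on what anchor_index means is sketched below. It assumes the anchor is the 0-based position of the first nucleotide of the conserved J-region codon (Phe or Trp) that closes the CDR3. The test sequence is the visible prefix of TRBJ1-1*01 (j_choice 0), as printed in the seq__j_choice coordinate of the P(j_choice) output in this tutorial. The codon table is the standard genetic code, written out by hand because it is not a pygor3 feature.

```python
# Standard genetic code; codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate the complete codons of seq; leftover bases at the end are ignored."""
    return "".join(CODON_TABLE[seq[p:p + 3]] for p in range(0, len(seq) - 2, 3))


# Visible prefix of the TRBJ1-1*01 sequence; its anchor_index in df_J_anchors is 17.
j_seq = "TGAACACTGAAGCTTTCTTTGGACAAGGCACCAGACTCA"
j_anchor = 17

# Under the assumption above, translation from the anchor should start with the FGxG motif.
print(translate(j_seq[j_anchor:j_anchor + 12]))  # FGQG
```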
IgorModel_Marginals¶
IgorModel_Marginals contains the conditional probabilities for each event.
Attention: IGoR's model_marginals.txt stores the conditional probabilities of the defined events, not their marginal probabilities.
The conditional probabilities can be accessed as a NumPy array with
[11]:
mdl_hb.marginals['j_choice']
[11]:
array([[1.28586e-01, 1.04003e-01, 4.10916e-02, ..., 0.00000e+00,
5.36364e-02, 3.73186e-02],
[1.66126e-01, 7.89615e-02, 1.13701e-02, ..., 2.46074e-02,
6.49049e-02, 7.46140e-02],
[1.66156e-01, 7.90103e-02, 1.13744e-02, ..., 2.45354e-02,
6.49334e-02, 7.46554e-02],
...,
[1.24050e-01, 4.76394e-02, 0.00000e+00, ..., 3.52196e-02,
5.51254e-02, 6.01472e-02],
[3.64941e-06, 2.04721e-09, 0.00000e+00, ..., 3.41728e-01,
6.84855e-02, 8.22442e-02],
[1.06077e-01, 1.13773e-01, 4.03129e-02, ..., 2.20679e-02,
7.16947e-02, 5.81105e-02]])
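The difference between a conditional and a marginal probability shows up in how the numbers are normalized. In this toy sketch (made-up numbers, not the IGoR model), each row of a conditional table P(j_choice | v_choice) sums to 1. The table as a whole sums to the number of V genes.

```python
# Toy conditional table P(j_choice | v_choice): one row per V gene, one column per J gene.
# Made-up numbers for illustration only; they are not taken from the IGoR model.
P_j_given_v = [
    [0.5, 0.3, 0.2],  # P(j | v = 0)
    [0.1, 0.1, 0.8],  # P(j | v = 1)
]

# Each row is a probability distribution over j, so every row sums to 1 ...
row_sums = [sum(row) for row in P_j_given_v]

# ... but the whole table sums to the number of V genes (2), not to 1,
# so the table itself cannot be read as the marginal P(j_choice).
table_total = sum(row_sums)
print(row_sums, table_total)
```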
DataArrays with parms and marginals¶
The mdl_hb object encapsulates IGoR's model: the Bayesian network and the conditional probabilities of each event. The information in parms and marginals can also be accessed directly from the xarray DataArrays of IgorModel, without going through IgorModel_Parms and IgorModel_Marginals.
Conditional probabilities¶
To get the conditional probabilities of an event as a DataArray, index the model with the event nickname:
IgorModel['event_nickname']
For instance, the Bayesian network shows that in this model the choice of V ('v_choice') and the number of insertions between the V and D segments ('vd_ins') have no parents: they are not conditioned on any other event. Hence their conditional probability is simply their marginal probability:
\(P(\text{v_choice})\) = mdl_hb['v_choice']
However, events such as 'd_gene' or 'j_5_del' are conditioned on other events, so in pygor3 the notation is, for example,
\(P(\text{d_gene} | \text{v_choice}, \text{j_choice})\) = mdl_hb['d_gene']
The dependencies of an event are stored in the parents entry of the DataArray attributes (e.g. mdl_hb['v_choice'].attrs['parents']), and its children in childs:
[12]:
mdl_hb['v_choice']
[12]:
<xarray.DataArray (v_choice: 89)>
array([4.88741e-03, 9.32369e-03, 9.32259e-03, 1.30320e-02, 3.43430e-04,
8.50694e-03, 7.71250e-03, 6.12276e-04, 5.06104e-03, 4.59289e-05,
4.48245e-03, 8.16181e-03, 7.00053e-04, 7.77164e-03, 1.16174e-02,
1.16158e-02, 1.13872e-02, 1.06555e-02, 2.36870e-03, 2.28368e-02,
1.56715e-04, 4.58447e-03, 0.00000e+00, 4.88782e-05, 0.00000e+00,
1.54500e-02, 2.74050e-02, 5.18979e-03, 4.80289e-03, 1.62765e-01,
6.61229e-02, 2.07174e-02, 7.36226e-04, 2.47846e-37, 1.07472e-02,
2.03820e-02, 9.53245e-03, 9.53006e-03, 2.35607e-03, 2.39208e-03,
2.39208e-03, 2.45219e-03, 1.66904e-02, 2.98807e-03, 2.98804e-03,
1.59156e-02, 1.25468e-02, 9.06711e-03, 9.06936e-03, 9.76534e-02,
9.57746e-03, 9.17011e-03, 1.14472e-02, 1.14458e-02, 9.77554e-03,
1.89924e-02, 3.97949e-04, 1.70242e-03, 8.91336e-03, 6.73926e-03,
6.69167e-03, 2.94257e-02, 1.38534e-02, 4.19227e-03, 4.19275e-03,
2.48630e-03, 4.83399e-04, 1.29795e-07, 6.49975e-05, 1.94736e-02,
1.94728e-02, 0.00000e+00, 3.91595e-03, 3.96230e-02, 0.00000e+00,
0.00000e+00, 6.73805e-05, 2.90073e-02, 9.28490e-03, 5.32118e-03,
8.18432e-03, 1.07035e-03, 3.64764e-04, 3.26790e-03, 3.26801e-03,
3.26801e-03, 3.26809e-03, 8.23600e-04, 1.56399e-02])
Coordinates:
* v_choice (v_choice) int64 0 1 2 3 4 5 6 7 ... 81 82 83 84 85 86 87 88
lbl__v_choice (v_choice) object 'U66059|TRBV1*01|Homo sapiens|P|V-REGION...
seq__v_choice (v_choice) object 'GATACTGGAATTACCCAGACACCAAAATACCTGGTCACA...
Attributes:
nickname: v_choice
event_type: GeneChoice
seq_type: V_gene
seq_side: Undefined_side
priority: 7
parents: []
childs: ['v_3_del', 'd_gene', 'j_choice']
You can plot directly from the xarray DataArray:
[13]:
mdl_hb['v_choice'].plot()
[13]:
[<matplotlib.lines.Line2D at 0x7fac3b5ef210>]
Or use a model method to get a plot with the corresponding labels:
[14]:
mdl_hb.plot_Event('v_choice')
[14]:
(<Figure size 1296x1080 with 1 Axes>,
<AxesSubplot:title={'center':'$P($v_choice$)$'}>)
[15]:
mdl_hb.plot_Event('d_gene')
[15]:
(<Figure size 720x1440 with 6 Axes>,
array([<AxesSubplot:title={'center':'$P($d_gene$ = $ TRBD1*01 $|$v_choice,j_choice$)$'}, xlabel='j_choice', ylabel='v_choice'>,
<AxesSubplot:title={'center':'$P($d_gene$ = $ TRBD2*01 $|$v_choice,j_choice$)$'}, xlabel='j_choice', ylabel='v_choice'>,
<AxesSubplot:title={'center':'$P($d_gene$ = $ TRBD2*02 $|$v_choice,j_choice$)$'}, xlabel='j_choice', ylabel='v_choice'>],
dtype=object))
Marginal Probabilities¶
IGoR provides the conditional probabilities of the events defined in the Bayesian network.
From them we can calculate marginal probabilities, e.g.
\(P(\text{j_choice}) = \sum_{\text{v_choice}}P(\text{j_choice}, \text{v_choice})\)
and, using the definition of conditional probability (the product rule),
\(P(\text{j_choice}, \text{v_choice}) = P(\text{j_choice} | \text{v_choice}) \times P(\text{v_choice})\)
we get,
\(P(\text{j_choice}) = \sum_{\text{v_choice}} P(\text{j_choice} | \text{v_choice}) \times P(\text{v_choice})\)
[16]:
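import xarray as xr
# mdl_hb['j_choice'] holds P(j_choice | v_choice) and mdl_hb['v_choice'] holds P(v_choice);
# xr.dot multiplies them and sums over their shared dimension, v_choice
P_marginal_j_choice = xr.dot(mdl_hb['j_choice'], mdl_hb['v_choice'])
P_marginal_j_choice, P_marginal_j_choice.plot()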
[16]:
(<xarray.DataArray (j_choice: 15)>
array([0.11962186, 0.10538077, 0.02016188, 0.05684014, 0.09945672,
0.03726745, 0.0322361 , 0.12542452, 0.05034454, 0.09853381,
0.01948227, 0.07500318, 0.0190534 , 0.0709002 , 0.07029376])
Coordinates:
* j_choice (j_choice) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
lbl__j_choice (j_choice) object 'K02545|TRBJ1-1*01|Homo sapiens|F|J-REGI...
seq__j_choice (j_choice) object 'TGAACACTGAAGCTTTCTTTGGACAAGGCACCAGACTCA...,
[<matplotlib.lines.Line2D at 0x7fac1dacf7d0>])
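To make the arithmetic behind xr.dot concrete, here is a small NumPy sketch on a made-up model with 3 V genes and 2 J genes (all numbers are invented for illustration). `np.einsum` contracts over the shared v axis, which is what xr.dot does with the named dimension v_choice; the explicit loop is the formula above written term by term.

```python
import numpy as np

# Toy model (made-up numbers): 3 V genes, 2 J genes.
P_v = np.array([0.5, 0.3, 0.2])            # P(v), sums to 1
P_j_given_v = np.array([[0.9, 0.1],        # P(j | v = 0)
                        [0.4, 0.6],        # P(j | v = 1)
                        [0.5, 0.5]])       # P(j | v = 2)

# P(j) = sum_v P(j | v) * P(v): contract over the shared v axis.
P_j = np.einsum("vj,v->j", P_j_given_v, P_v)

# The same marginal as an explicit sum over v.
P_j_loop = sum(P_j_given_v[v] * P_v[v] for v in range(len(P_v)))
assert np.allclose(P_j, P_j_loop)
print(P_j)  # approximately [0.67 0.33]
```

The matrix form `P_j_given_v.T @ P_v` gives the same result; einsum is used here because its subscripts name the summed axis explicitly, much like xarray's named dimensions.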
However, when a model is loaded with pygor3, the marginals are calculated automatically by variable elimination and stored in the Pmarginal attribute:
IgorModel.Pmarginal['event_nickname']
In this example,
\(P(\text{j_choice})\) = mdl_hb.Pmarginal['j_choice']
[17]:
mdl_hb.plot_Event_Marginal('d_gene')
mdl_hb.Pmarginal['d_gene']
[17]:
<xarray.DataArray (d_gene: 3)>
array([0.56140871, 0.2361217 , 0.20247255])
Coordinates:
* d_gene (d_gene) int64 0 1 2
lbl__d_gene (d_gene) object ' TRBD1*01' ' TRBD2*01' ' TRBD2*02'
seq__d_gene (d_gene) object 'GGGACAGGGGGC' ... 'GGGACTAGCGGGAGGG'
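The idea behind variable elimination is to sum out one event at a time into smaller intermediate factors instead of building the full joint table. The sketch below applies it to a toy network shaped like the one above (V → J, and (V, J) → D) with random, made-up tables. It is not pygor3's implementation, and pygor3's elimination order may differ; it only checks that eliminating J and then V gives the same P(d) as brute-force summation over the full joint.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_conditional(shape):
    """Random table normalized over its last axis (the child event)."""
    w = rng.random(shape)
    return w / w.sum(axis=-1, keepdims=True)

n_v, n_j, n_d = 5, 4, 3
P_v = random_conditional((n_v,))                    # P(v)
P_j_given_v = random_conditional((n_v, n_j))        # P(j | v)
P_d_given_vj = random_conditional((n_v, n_j, n_d))  # P(d | v, j)

# Brute force: build the full joint P(v, j, d), then sum out v and j.
joint = P_v[:, None, None] * P_j_given_v[:, :, None] * P_d_given_vj
P_d_brute = joint.sum(axis=(0, 1))

# Variable elimination: first sum out j into an intermediate factor f(v, d),
# then sum out v.
f_vd = np.einsum("vj,vjd->vd", P_j_given_v, P_d_given_vj)
P_d_ve = np.einsum("v,vd->d", P_v, f_vd)

assert np.allclose(P_d_ve, P_d_brute)
assert np.isclose(P_d_ve.sum(), 1.0)
```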
[18]:
mdl_hb.plot_Event_Marginal('d_3_del')
[18]:
<AxesSubplot:xlabel='d 3 del', ylabel='P'>
If you modify the conditional probabilities, recalculate the marginals with the generate_Pmarginals() method.
[19]:
mdl_hb.generate_Pmarginals()
Joint Probabilities¶
Pygor3 also provides a method to calculate the joint probability of several events:
IgorModel.get_P_joint(['event_nickname_1', 'event_nickname_2', ...])
- WARNING
Be careful with this function: memory consumption can grow quickly when more than two events are requested, because the returned array has one dimension per event.
For example, \(P(\text{v_choice}, \text{j_choice})\) = mdl_hb.get_P_joint(['v_choice', 'j_choice'])
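A dense joint table has one entry per combination of event values, so its size is the product of the event cardinalities. The back-of-the-envelope sketch below uses the cardinalities printed in this tutorial for the human TRB model and assumes dense float64 storage (8 bytes per entry); `joint_table_bytes` is a hypothetical helper, not part of pygor3.

```python
from math import prod

# Number of values each event takes in the human TRB model, read off the
# array shapes printed in this tutorial.
cardinality = {
    "v_choice": 89,
    "j_choice": 15,
    "d_gene": 3,
    "d_5_del": 21,
    "d_3_del": 21,
}

def joint_table_bytes(events, bytes_per_value=8):
    """Size of a dense joint table over `events`, assuming float64 entries."""
    return prod(cardinality[e] for e in events) * bytes_per_value

print(joint_table_bytes(["v_choice", "j_choice"]))  # 10680 (about 10 kB)
print(joint_table_bytes(["v_choice", "j_choice", "d_gene", "d_5_del", "d_3_del"]))  # 14129640 (about 14 MB)
```

Each extra event multiplies the size by its number of values, which is why requesting many events at once can exhaust memory.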
[20]:
P_V_J = mdl_hb.get_P_joint(['v_choice', 'j_choice'])
P_V_J
[20]:
<xarray.DataArray (v_choice: 89, j_choice: 15)>
array([[6.28453463e-04, 5.08306114e-04, 2.00831906e-04, ...,
0.00000000e+00, 2.62143710e-04, 1.82391677e-04],
[1.54890932e-03, 7.36213918e-04, 1.06011576e-04, ...,
2.29432410e-04, 6.05154843e-04, 6.95679479e-04],
[1.54900602e-03, 7.36581450e-04, 1.06039046e-04, ...,
2.28733827e-04, 6.05348578e-04, 6.95983268e-04],
...,
[4.05407259e-04, 1.55690174e-04, 0.00000000e+00, ...,
1.15101267e-04, 1.80155267e-04, 1.96566942e-04],
[3.00565952e-09, 1.68608515e-12, 0.00000000e+00, ...,
2.81447867e-04, 5.64048061e-05, 6.77364474e-05],
[1.65903669e-03, 1.77940202e-03, 6.30491475e-04, ...,
3.45140763e-04, 1.12130116e-03, 9.08844648e-04]])
Coordinates:
* v_choice (v_choice) int64 0 1 2 3 4 5 6 7 ... 81 82 83 84 85 86 87 88
seq__v_choice (v_choice) object 'GATACTGGAATTACCCAGACACCAAAATACCTGGTCACA...
* j_choice (j_choice) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
seq__j_choice (j_choice) object 'TGAACACTGAAGCTTTCTTTGGACAAGGCACCAGACTCA...
[21]:
P_V_J.plot()
[21]:
<matplotlib.collections.QuadMesh at 0x7fac1c1fc8d0>
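For two events the joint probability follows directly from the product rule, P(v, j) = P(j | v) × P(v). The NumPy sketch below builds it on made-up numbers (3 V genes, 2 J genes) by broadcasting P(v) down the rows, and checks that summing the joint over either event recovers the corresponding marginal.

```python
import numpy as np

# Toy tables (made-up numbers): 3 V genes, 2 J genes.
P_v = np.array([0.5, 0.3, 0.2])
P_j_given_v = np.array([[0.9, 0.1],
                        [0.4, 0.6],
                        [0.5, 0.5]])

# Product rule: P(v, j) = P(j | v) * P(v), broadcasting P(v) along the j axis.
P_vj = P_j_given_v * P_v[:, None]

# The joint table is consistent with both marginals.
assert np.isclose(P_vj.sum(), 1.0)
assert np.allclose(P_vj.sum(axis=1), P_v)           # sum over j -> P(v)
assert np.allclose(P_vj.sum(axis=0), [0.67, 0.33])  # sum over v -> P(j)
```

On the real model, an analogous (untested) check would compare P_V_J.sum('v_choice') with mdl_hb.Pmarginal['j_choice'].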
[22]:
mdl_hb['j_choice'].plot()
[22]:
<matplotlib.collections.QuadMesh at 0x7fac1c0e7310>
Editing a model¶
A model can be edited manually when necessary, for example to shorten long gene descriptions or to add new anchors to the existing genomic templates.
[23]:
mdl_hb.genomic_dataframe_dict
[23]:
{'V': name \
id
0 U66059|TRBV1*01|Homo sapiens|P|V-REGION|91723....
1 U66059|TRBV10-1*01|Homo sapiens|F|V-REGION|214...
2 AF009660|TRBV10-1*02|Homo sapiens|F|V-REGION|5...
3 AF009660|TRBV10-1*04|Homo sapiens|F|V-REGION|5...
4 U66059|TRBV10-2*01|Homo sapiens|F|V-REGION|239...
.. ...
84 M14261|TRBV7-9*04|Homo sapiens|(F)|V-REGION|58...
85 M27385|TRBV7-9*05|Homo sapiens|(F)|V-REGION|34...
86 X74844|TRBV7-9*06|Homo sapiens|(F)|V-REGION|10...
87 L14854|TRBV7-9*07|Homo sapiens|(F)|V-REGION|1....
88 U66059|TRBV9*01|Homo sapiens|F|V-REGION|206836...
value anchor_index
id
0 GATACTGGAATTACCCAGACACCAAAATACCTGGTCACAGCAATGG... NaN
1 GATGCTGAAATCACCCAGAGCCCAAGACACAAGATCACAGAGACAG... 270.0
2 GATGCTGAAATCACCCAGAGCCCAAGACACAAGATCACAGAGACAG... 270.0
3 GATGCTGAAATCACCCAGAGCCCAAGACACAAGATCACAGAGACAG... 270.0
4 GATGCTGGAATCACCCAGAGCCCAAGATACAAGATCACAGAGACAG... 270.0
.. ... ...
84 ATATCTGGAGTCTCCCACAACCCCAGACACAAGATCACAAAGAGGG... 273.0
85 GATACTGGAGTCTCCCAGAACCCCAGACACAAGATCACAAAGAGGG... 273.0
86 GATACTGGAGTCTCCCAGAACCCCAGACACAAGATCACAAAGAGGG... 273.0
87 CACAACCGCCTTTATTGGTACCGACAGACCCTGGGGCAGGGCCCAG... 189.0
88 GATTCTGGAGTCACACAAACCCCAAAGCACCTGATCACAGCAACTG... 270.0
[89 rows x 3 columns],
'D': name value
id
0 TRBD1*01 GGGACAGGGGGC
1 TRBD2*01 GGGACTAGCGGGGGGG
2 TRBD2*02 GGGACTAGCGGGAGGG,
'J': name \
id
0 K02545|TRBJ1-1*01|Homo sapiens|F|J-REGION|749....
1 K02545|TRBJ1-2*01|Homo sapiens|F|J-REGION|886....
2 M14158|TRBJ1-3*01|Homo sapiens|F|J-REGION|1499...
3 M14158|TRBJ1-4*01|Homo sapiens|F|J-REGION|2095...
4 M14158|TRBJ1-5*01|Homo sapiens|F|J-REGION|2368...
5 M14158|TRBJ1-6*01|Homo sapiens|F|J-REGION|2859...
6 L36092|TRBJ1-6*02|Homo sapiens|F|J-REGION|6430...
7 X02987|TRBJ2-1*01|Homo sapiens|F|J-REGION|800....
8 X02987|TRBJ2-2*01|Homo sapiens|F|J-REGION|995....
9 X02987|TRBJ2-3*01|Homo sapiens|F|J-REGION|1282...
10 X02987|TRBJ2-4*01|Homo sapiens|F|J-REGION|1432...
11 X02987|TRBJ2-5*01|Homo sapiens|F|J-REGION|1553...
12 X02987|TRBJ2-6*01|Homo sapiens|F|J-REGION|1673...
13 M14159|TRBJ2-7*01|Homo sapiens|F|J-REGION|2316...
14 X02987|TRBJ2-7*02|Homo sapiens|ORF|J-REGION|18...
value anchor_index
id
0 TGAACACTGAAGCTTTCTTTGGACAAGGCACCAGACTCACAGTTGTAG 17
1 CTAACTATGGCTACACCTTCGGTTCGGGGACCAGGTTAACCGTTGTAG 17
2 CTCTGGAAACACCATATATTTTGGAGAGGGAAGTTGGCTCACTGTT... 19
3 CAACTAATGAAAAACTGTTTTTTGGCAGTGGAACCCAGCTCTCTGT... 20
4 TAGCAATCAGCCCCAGCATTTTGGTGATGGGACTCGACTCTCCATC... 19
5 CTCCTATAATTCACCCCTCCACTTTGGGAATGGGACCAGGCTCACT... 22
6 CTCCTATAATTCACCCCTCCACTTTGGGAACGGGACCAGGCTCACT... 22
7 CTCCTACAATGAGCAGTTCTTCGGGCCAGGGACACGGCTCACCGTG... 19
8 CGAACACCGGGGAGCTGTTTTTTGGAGAAGGCTCTAGGCTGACCGT... 20
9 AGCACAGATACGCAGTATTTTGGCCCAGGCACCCGGCTGACAGTGCTCG 18
10 AGCCAAAAACATTCAGTACTTCGGCGCCGGGACCCGGCTCTCAGTG... 19
11 ACCAAGAGACCCAGTACTTCGGGCCAGGCACGCGGCTCCTGGTGCTCG 17
12 CTCTGGGGCCAACGTCCTGACTTTCGGGGCCGGCAGCAGGCTGACC... 22
13 CTCCTACGAGCAGTACTTCGGGCCGGGCACCAGGCTCACGGTCACAG 16
14 CTCCTACGAGCAGTACGTCGGGCCGGGCACCAGGCTCACGGTCACAG 16 }
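Before adding or editing an anchor, one way to sanity-check it is to translate the codons that start at anchor_index. For the J genes of this model, the anchors appear to point at the first nucleotide of the conserved Phe codon of the F-G-X-G motif. The standard-library sketch below assumes a 0-based anchor_index and uses three (sequence, anchor_index) pairs copied from the 'J' dataframe above; `translate_from_anchor` is a hypothetical helper, not a pygor3 function.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate_from_anchor(seq, anchor, n_codons=4):
    """Translate n_codons codons starting at the (assumed 0-based) anchor index."""
    codons = [seq[anchor + 3 * n: anchor + 3 * n + 3] for n in range(n_codons)]
    return "".join(CODON_TABLE.get(c, "?") for c in codons)  # '?' for incomplete codons

# (J sequence, anchor_index) pairs copied from the 'J' dataframe above.
j_genes = {
    "TRBJ1-1*01": ("TGAACACTGAAGCTTTCTTTGGACAAGGCACCAGACTCACAGTTGTAG", 17),
    "TRBJ1-2*01": ("CTAACTATGGCTACACCTTCGGTTCGGGGACCAGGTTAACCGTTGTAG", 17),
    "TRBJ2-7*01": ("CTCCTACGAGCAGTACTTCGGGCCGGGCACCAGGCTCACGGTCACAG", 16),
}

for name, (seq, anchor) in j_genes.items():
    print(name, translate_from_anchor(seq, anchor))
# TRBJ1-1*01 FGQG
# TRBJ1-2*01 FGSG
# TRBJ2-7*01 FGPG
```

If a new anchor does not land on the expected motif, it is probably off by one or more nucleotides.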
First, make a deep copy of genomic_dataframe_dict so that editing it does not modify the model's own dataframes:
[24]:
import copy
genomic_dict = copy.deepcopy(mdl_hb.genomic_dataframe_dict)
genomic_dict['V']['name']
[24]:
id
0 U66059|TRBV1*01|Homo sapiens|P|V-REGION|91723....
1 U66059|TRBV10-1*01|Homo sapiens|F|V-REGION|214...
2 AF009660|TRBV10-1*02|Homo sapiens|F|V-REGION|5...
3 AF009660|TRBV10-1*04|Homo sapiens|F|V-REGION|5...
4 U66059|TRBV10-2*01|Homo sapiens|F|V-REGION|239...
...
84 M14261|TRBV7-9*04|Homo sapiens|(F)|V-REGION|58...
85 M27385|TRBV7-9*05|Homo sapiens|(F)|V-REGION|34...
86 X74844|TRBV7-9*06|Homo sapiens|(F)|V-REGION|10...
87 L14854|TRBV7-9*07|Homo sapiens|(F)|V-REGION|1....
88 U66059|TRBV9*01|Homo sapiens|F|V-REGION|206836...
Name: name, Length: 89, dtype: object
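The deep copy matters because the next step assigns a new column into the DataFrames stored inside the dict. A shallow copy (copy.copy or dict.copy()) would create a new outer dict but share those DataFrames with the model, so they would be modified in place. The sketch below shows the difference using plain lists as stand-ins for the DataFrames, to keep it dependency-free; the dict contents are invented.

```python
import copy

# Stand-in for genomic_dataframe_dict: a dict of mutable objects
# (lists here instead of DataFrames).
original = {"V": ["long|TRBV1*01|header"], "J": ["long|TRBJ1-1*01|header"]}

shallow = copy.copy(original)   # new outer dict, same inner lists
deep = copy.deepcopy(original)  # new outer dict and new inner lists

deep["V"][0] = "TRBV1*01"       # original is left untouched
assert original["V"][0] == "long|TRBV1*01|header"

shallow["V"][0] = "TRBV1*01"    # mutates the list shared with original
assert original["V"][0] == "TRBV1*01"
```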
Then replace the long descriptive names with short gene labels using the v_genLabel function.
[25]:
genomic_dict['V']['name'] = p3.v_genLabel(genomic_dict['V']['name'])
genomic_dict['J']['name'] = p3.v_genLabel(genomic_dict['J']['name'])
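The implementation of v_genLabel is not shown in this tutorial. Judging from the short names that appear in mdl_hb.parms after the update (e.g. TRBV1*01), it seems to keep the gene/allele field of each IMGT-style, '|'-separated header. The hypothetical stand-in below does the same for single strings, using shortened prefixes of the headers shown above; it is an illustration, not pygor3's code.

```python
def short_gene_label(name):
    """Hypothetical helper: return the gene/allele field of an IMGT-style header.

    Names without '|' separators (like the D gene names) are only stripped.
    """
    fields = name.split("|")
    return fields[1].strip() if len(fields) > 1 else name.strip()

names = [
    "U66059|TRBV1*01|Homo sapiens|P|V-REGION",
    "K02545|TRBJ1-1*01|Homo sapiens|F|J-REGION",
    " TRBD1*01",
]
print([short_gene_label(n) for n in names])
# ['TRBV1*01', 'TRBJ1-1*01', 'TRBD1*01']
```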
Finally, update the model's genomic_dataframe_dict with the new dict using the set_genomic_dataframe_dict method.
[26]:
mdl_hb.set_genomic_dataframe_dict(genomic_dict)
[27]:
mdl_hb.parms['v_choice']
[27]:
| id | value | name |
|---|---|---|
| 0 | GATACTGGAATTACCCAGACACCAAAATACCTGGTCACAGCAATGG... | TRBV1*01 |
| 1 | GATGCTGAAATCACCCAGAGCCCAAGACACAAGATCACAGAGACAG... | TRBV10-1*01 |
| 2 | GATGCTGAAATCACCCAGAGCCCAAGACACAAGATCACAGAGACAG... | TRBV10-1*02 |
| 3 | GATGCTGAAATCACCCAGAGCCCAAGACACAAGATCACAGAGACAG... | TRBV10-1*04 |
| 4 | GATGCTGGAATCACCCAGAGCCCAAGATACAAGATCACAGAGACAG... | TRBV10-2*01 |
| ... | ... | ... |
| 84 | ATATCTGGAGTCTCCCACAACCCCAGACACAAGATCACAAAGAGGG... | TRBV7-9*04 |
| 85 | GATACTGGAGTCTCCCAGAACCCCAGACACAAGATCACAAAGAGGG... | TRBV7-9*05 |
| 86 | GATACTGGAGTCTCCCAGAACCCCAGACACAAGATCACAAAGAGGG... | TRBV7-9*06 |
| 87 | CACAACCGCCTTTATTGGTACCGACAGACCCTGGGGCAGGGCCCAG... | TRBV7-9*07 |
| 88 | GATTCTGGAGTCACACAAACCCCAAAGCACCTGATCACAGCAACTG... | TRBV9*01 |
89 rows × 2 columns
[28]:
mdl_hb['v_choice']
[28]:
<xarray.DataArray (v_choice: 89)>
array([4.88741e-03, 9.32369e-03, 9.32259e-03, 1.30320e-02, 3.43430e-04,
8.50694e-03, 7.71250e-03, 6.12276e-04, 5.06104e-03, 4.59289e-05,
4.48245e-03, 8.16181e-03, 7.00053e-04, 7.77164e-03, 1.16174e-02,
1.16158e-02, 1.13872e-02, 1.06555e-02, 2.36870e-03, 2.28368e-02,
1.56715e-04, 4.58447e-03, 0.00000e+00, 4.88782e-05, 0.00000e+00,
1.54500e-02, 2.74050e-02, 5.18979e-03, 4.80289e-03, 1.62765e-01,
6.61229e-02, 2.07174e-02, 7.36226e-04, 2.47846e-37, 1.07472e-02,
2.03820e-02, 9.53245e-03, 9.53006e-03, 2.35607e-03, 2.39208e-03,
2.39208e-03, 2.45219e-03, 1.66904e-02, 2.98807e-03, 2.98804e-03,
1.59156e-02, 1.25468e-02, 9.06711e-03, 9.06936e-03, 9.76534e-02,
9.57746e-03, 9.17011e-03, 1.14472e-02, 1.14458e-02, 9.77554e-03,
1.89924e-02, 3.97949e-04, 1.70242e-03, 8.91336e-03, 6.73926e-03,
6.69167e-03, 2.94257e-02, 1.38534e-02, 4.19227e-03, 4.19275e-03,
2.48630e-03, 4.83399e-04, 1.29795e-07, 6.49975e-05, 1.94736e-02,
1.94728e-02, 0.00000e+00, 3.91595e-03, 3.96230e-02, 0.00000e+00,
0.00000e+00, 6.73805e-05, 2.90073e-02, 9.28490e-03, 5.32118e-03,
8.18432e-03, 1.07035e-03, 3.64764e-04, 3.26790e-03, 3.26801e-03,
3.26801e-03, 3.26809e-03, 8.23600e-04, 1.56399e-02])
Coordinates:
* v_choice (v_choice) int64 0 1 2 3 4 5 6 7 ... 81 82 83 84 85 86 87 88
lbl__v_choice (v_choice) object 'TRBV1*01' 'TRBV10-1*01' ... 'TRBV9*01'
seq__v_choice (v_choice) object 'GATACTGGAATTACCCAGACACCAAAATACCTGGTCACA...
Attributes:
nickname: v_choice
event_type: GeneChoice
seq_type: V_gene
seq_side: Undefined_side
priority: 7
parents: []
childs: ['v_3_del', 'd_gene', 'j_choice']
'GGTGCTGGAGTCTCCCAGTCTCCCAGGTACAAAGTCACAAAGAGGGGACAGGATGTAACTCTCAGGTGTGATCCAATTTCGAGTCATGCAACCCTTTATTGGTATCAACAGGCCCTGGGGCAGGGCCCAGAGTTTCTGACTTACTTCAATTATGAAGCTCAACCAGACAAATCAGGGCTGCCCAGTGATCGGTTCTCTGCAGAGAGGCCTGAGGGATCCATCTCCACTCTGACGATTCAGCGCACAGAGCAGCGGGACTCAGCCATGTATCGCTGTGCTAGCAGCTTAGC', 'GGTGCTGGAGTCTCCCAGTCCCCTAGGTACAAAGTCGCAAAGAGAGGACAGGATGTAGCTCTCAGGTGTGATCCAATTTCGGGTCATGTATCCCTTTTTTGGTACCAACAGGCCCTGGGGCAGGGGCCAGAGTTTCTGACTTATTTCCAGAATGAAGCTCAACTAGACAAATCGGGGCTGCCCAGTGATCGCTTCTTTGCAGAAAGGCCTGAGGGATCCGTCTCCACTCTGAAGATCCAGCGCACACAGCAGGAGGACTCCGCCGTGTATCTCTGTGCCAGCAGCTTAGC', 'GGTGCTGGAGTCTCCCAGTCCCCTAGGTACAAAGTCGCAAAGAGAGGACAGGATGTAGCTCTCAGGTGTGATCCAATTTCGGGTCATGTATCCCTTTTTTGGTACCAACAGGCCCTGGGGCAGGGGCCAGAGTTTCTGACTTATTTCCAGAATGAAGCTCAACTAGACAAATCGGGGCTGCCCAGTGATCGCTTCTTTGCAGAAAGGCCTGAGGGATCCGTCTCCACTCTGAAGATCCAGCGCACACAGAAGGAGGACTCCGCCGTGTATCTCTGTGCCAGCAGCTTAGC', 'GATACTGGAGTCTCCCAGAACCCCAGACACAAGATCACAAAGAGGGGACAGAATGTAACTTTCAGGTGTGATCCAATTTCTGAACACAACCGCCTTTATTGGTACCGACAGACCCTGGGGCAGGGCCCAGAGTTTCTGACTTACTTCCAGAATGAAGCTCAACTAGAAAAATCAAGGCTGCTCAGTGATCGGTTCTCTGCAGAGAGGCCTAAGGGATCTTTCTCCACCTTGGAGATCCAGCGCACAGAGCAGGGGGACTCGGCCATGTATCTCTGTGCCAGCAGCTTAGC', 'ATATCTGGAGTCTCCCACAACCCCAGACACAAGATCACAAAGAGGGGACAGAATGTAACTTTCAGGTGTGATCCAATTTCTGAACACAACCGCCTTTATTGGTACCGACAGAACCCTGGGCAGGGCCCAGAGTTTCTGACTTACTTCCAGAATGAAGCTCAACTGGAAAAATCAGGGCTGCTCAGTGATCGGATCTCTGCAGAGAGGCCTAAGGGATCTTTCTCCACCTTGGAGATCCAGCGCACAGAGCAGGGGGACTCGGCCATGTATCTCTGTGCCAGCAGCTTAGC', 'GATACTGGAGTCTCCCAGAACCCCAGACACAAGATCACAAAGAGGGGACAGAATGTAACTTTCAGGTGTGATCCAATTTCTGAACACAACCGCCTTTATTGGTACCGACAGACCCTGGGGCAGGGCCCAGAGTTTCTGACTTACTTCCAGAATGAAGCTCAACTAGAAAAATCAAGGCTGCTCAGTGATCGGTTCTCTGCAGAGAGGCCTAAGGGATCTCTCTCCACCTTGGAGATCCAGCGCACAGAGCAGGGGGACTCGGCCATGTATCTCTGTGCCAGCAGCTTAGC', 
'GATACTGGAGTCTCCCAGAACCCCAGACACAAGATCACAAAGAGGGGACAGAATGTAACTTTCAGGTGTGATCCAATTTCTGAACACAACCGCCTTTATTGGTACCGACAGACCCTGGGGCAGGGCCCAGAGTTTCTGACTTACTTCCAGAATGAAGCTCAACTAGAAAAATCAAGGCTGCTCAGTGATCGGTTCTCTGCAGAGAGGCCTAAGGGATCTCTTTCCACCTTGGAGATCCAGCGCACAGAGCAGGGGGACTCGGCCATGTATCTCTGTGCCAGCAGCTTAGC', 'CACAACCGCCTTTATTGGTACCGACAGACCCTGGGGCAGGGCCCAGAGTTTCTGACTTACTTCCAGAATGAAGCTCAACTAGAAAAATCAAGGCTGCTCAGTGATCGGTTCTCTGCAGAGAGGCCTAAGGGATCTTTCTCCACCTTGGAGATCCAGCGCACAGAGGAGGGGGACTCGGCCATGTATCTCTGTGCCAGCAGCTTAGC', 'GATTCTGGAGTCACACAAACCCCAAAGCACCTGATCACAGCAACTGGACAGCGAGTGACGCTGAGATGCTCCCCTAGGTCTGGAGACCTCTCTGTGTACTGGTACCAACAGAGCCTGGACCAGGGCCTCCAGTTCCTCATTCAGTATTATAATGGAGAAGAGAGAGCAAAAGGAAACATTCTTGAACGATTCTCCGCACAACAGTTCCCTGACTTGCACTCTGAACTAAACCTGAGCTCTCTGGAGCTGGGGGACTCAGCTTTGTATTTCTGTGCCAGCAGCGTAG'], dtype=object)
[29]:
mdl_hb.ErrorRate_dict
[29]:
{'error_type': 'SingleErrorRate', 'error_values': '0.000396072'}
New models¶
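Pygor3 provides methods to create default VDJ and VJ models from dataframes of genomic templates (gene name, nucleotide sequence and CDR3 anchor index). As an example, we add a made-up V gene to a subset of the human TRB V genes.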
[30]:
new_V_gene_dict = {
'name':'my_pseudo_TRBV',
'value':'AAACCCTTTGGGACCCAGAGCCCAAGACACAAGATCACAGAGACAGGAAGGCAGGTGACCTTGGCGTGTCACCAGACTTGGAACCACAACAATATGTTCTGGTATCGACAAGACCTGGGACATGGGCTGAGGCTGATCCATTACTCATATGGTGTTCACGACACTAACAAAGGAGAAGTCTCAGATGGCTACAGTGTCTCTAGATCAAACACAGAGGACCTCCCCCTCACTCTGTAGTCTGCTGCCTCCTCCCAGACATCTGTATATTTCTGCGCCAGCAGTGAGTC',
'anchor_index': 270
}
[31]:
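import pandas as pd
df_V = genomic_dict['V'].loc[10:15]
# DataFrame.append was removed in pandas 2.0; pd.concat appends the new row instead
df_V = pd.concat([df_V, pd.DataFrame([new_V_gene_dict])], ignore_index=True)
df_V.index.name = 'id'
df_V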
[31]:
| id | name | value | anchor_index |
|---|---|---|---|
| 0 | TRBV11-2*03 | GAAGCTGGAGTTGCCCAGTCTCCCAGATATAAGATTATAGAGAAAA... | 273.0 |
| 1 | TRBV11-3*01 | GAAGCTGGAGTGGTTCAGTCTCCCAGATATAAGATTATAGAGAAAA... | 273.0 |
| 2 | TRBV12-1*01 | GCTGGTGTTATCCAGTCACCCAGGCACAAAGTGACAGAGATGGGAC... | 270.0 |
| 3 | TRBV12-2*01 | GCTGGCATTATCCAGTCACCCAAGCATGAGGTGACAGAAATGGGAC... | 270.0 |
| 4 | TRBV12-3*01 | GATGCTGGAGTTATCCAGTCACCCCGCCATGAGGTGACAGAGATGG... | 273.0 |
| 5 | TRBV12-4*01 | GATGCTGGAGTTATCCAGTCACCCCGGCACGAGGTGACAGAGATGG... | 273.0 |
| 6 | my_pseudo_TRBV | AAACCCTTTGGGACCCAGAGCCCAAGACACAAGATCACAGAGACAG... | 270.0 |
We can now use these genomic templates to create a new model:
[32]:
new_mdl_0 = p3.IgorModel.make_default_VDJ(df_V, genomic_dict['D'], genomic_dict['J'])
new_mdl_0
[32]:
<pygor3.IgorIO.IgorModel at 0x7fac17fdd490>
[33]:
new_mdl_0.plot_Event('v_choice')
[33]:
(<Figure size 1296x1080 with 1 Axes>,
<AxesSubplot:title={'center':'$P($v_choice$)$'}>)
By default, the new model is initialized with uniform probabilities and can serve as the starting point for inferring a new model from data.
Exporting a Model¶
Conditional and marginal probabilities can be exported as plots in PDF files:
[34]:
fln_output_prefix = "mdl_hb"
mdl_hb.export_plot_Pconditionals(fln_output_prefix+"_CP")
[35]:
mdl_hb.export_plot_Pmarginals(fln_output_prefix+"_MP")
A model can be exported in IGoR’s format with the write_model method:
[36]:
new_mdl_0.write_model('new_model_parms.txt', 'new_model_marginals.txt',
fln_V_gene_CDR3_anchors='new_V_anchors.csv',
fln_J_gene_CDR3_anchors='new_J_anchors.csv')
Writing model parms in file new_model_parms.txt
Writing model marginals in file new_model_marginals.txt
Writing gene anchor's in file new_V_anchors.csv
Writing gene anchor's in file new_J_anchors.csv
[37]:
!head new_model_parms.txt
@Event_list
#GeneChoice;V_gene;Undefined_side;7;v_choice
%TRBV11-2*03;GAAGCTGGAGTTGCCCAGTCTCCCAGATATAAGATTATAGAGAAAAGGCAGAGTGTGGCTTTTTGGTGCAATCCTATATCTGGCCATGCTACCCTTTACTGGTACCAGCAGATCCTGGGACAGGGCCCAAAGCTTCTGATTCAGTTTCAGAATAACGGTGTAGTGGATGATTCACAGTTGCCTAAGGATCGATTTTCTGCAGAGAGGCTCAAAGGAGTAGACTCCACTCTCAAGATCCAACCTGCAAAGCTTGAGGACTCGGCCGTGTATCTCTGTGCCAGCAGCTTAGA;0
%TRBV11-3*01;GAAGCTGGAGTGGTTCAGTCTCCCAGATATAAGATTATAGAGAAAAAACAGCCTGTGGCTTTTTGGTGCAATCCTATTTCTGGCCACAATACCCTTTACTGGTACCTGCAGAACTTGGGACAGGGCCCGGAGCTTCTGATTCGATATGAGAATGAGGAAGCAGTAGACGATTCACAGTTGCCTAAGGATCGATTTTCTGCAGAGAGGCTCAAAGGAGTAGACTCCACTCTCAAGATCCAGCCTGCAGAGCTTGGGGACTCGGCCGTGTATCTCTGTGCCAGCAGCTTAGA;1
%TRBV12-1*01;GCTGGTGTTATCCAGTCACCCAGGCACAAAGTGACAGAGATGGGACAATCAGTAACTCTGAGATGCGAACCAATTTCAGGCCACAATGATCTTCTCTGGTACAGACAGACCTTTGTGCAGGGACTGGAATTGCTGAATTACTTCTGCAGCTGGACCCTCGTAGATGACTCAGGAGTGTCCAAGGATTGATTCTCAGCACAGATGCCTGATGTATCATTCTCCACTCTGAGGATCCAGCCCATGGAACCCAGGGACTTGGGCCTATATTTCTGTGCCAGCAGCTTTGC;2
%TRBV12-2*01;GCTGGCATTATCCAGTCACCCAAGCATGAGGTGACAGAAATGGGACAAACAGTGACTCTGAGATGTGAGCCAATTTTTGGCCACAATTTCCTTTTCTGGTACAGAGATACCTTCGTGCAGGGACTGGAATTGCTGAGTTACTTCCGGAGCTGATCTATTATAGATAATGCAGGTATGCCCACAGAGCGATTCTCAGCTGAGAGGCCTGATGGATCATTCTCTACTCTGAAGATCCAGCCTGCAGAGCAGGGGGACTCGGCCGTGTATGTCTGTGCAAGTCGCTTAGC;3
%TRBV12-3*01;GATGCTGGAGTTATCCAGTCACCCCGCCATGAGGTGACAGAGATGGGACAAGAAGTGACTCTGAGATGTAAACCAATTTCAGGCCACAACTCCCTTTTCTGGTACAGACAGACCATGATGCGGGGACTGGAGTTGCTCATTTACTTTAACAACAACGTTCCGATAGATGATTCAGGGATGCCCGAGGATCGATTCTCAGCTAAGATGCCTAATGCATCATTCTCCACTCTGAAGATCCAGCCCTCAGAACCCAGGGACTCAGCTGTGTACTTCTGTGCCAGCAGTTTAGC;4
%TRBV12-4*01;GATGCTGGAGTTATCCAGTCACCCCGGCACGAGGTGACAGAGATGGGACAAGAAGTGACTCTGAGATGTAAACCAATTTCAGGACACGACTACCTTTTCTGGTACAGACAGACCATGATGCGGGGACTGGAGTTGCTCATTTACTTTAACAACAACGTTCCGATAGATGATTCAGGGATGCCCGAGGATCGATTCTCAGCTAAGATGCCTAATGCATCATTCTCCACTCTGAAGATCCAGCCCTCAGAACCCAGGGACTCAGCTGTGTACTTCTGTGCCAGCAGTTTAGC;5
%my_pseudo_TRBV;AAACCCTTTGGGACCCAGAGCCCAAGACACAAGATCACAGAGACAGGAAGGCAGGTGACCTTGGCGTGTCACCAGACTTGGAACCACAACAATATGTTCTGGTATCGACAAGACCTGGGACATGGGCTGAGGCTGATCCATTACTCATATGGTGTTCACGACACTAACAAAGGAGAAGTCTCAGATGGCTACAGTGTCTCTAGATCAAACACAGAGGACCTCCCCCTCACTCTGTAGTCTGCTGCCTCCTCCCAGACATCTGTATATTTCTGCGCCAGCAGTGAGTC;6
#GeneChoice;J_gene;Undefined_side;7;j_choice
[38]:
!head new_V_anchors.csv
gene;anchor_index
TRBV11-2*03;273
TRBV11-3*01;273
TRBV12-1*01;270
TRBV12-2*01;270
TRBV12-3*01;273
TRBV12-4*01;273
my_pseudo_TRBV;270
[39]:
mdl_0 = p3.IgorModel('new_model_parms.txt', 'new_model_marginals.txt',
fln_V_gene_CDR3_anchors='new_V_anchors.csv',
fln_J_gene_CDR3_anchors='new_J_anchors.csv')
mdl_0['v_choice']
Reading Parms filename from: new_model_parms.txt
Reading Marginals filename from: new_model_marginals.txt
[39]:
<xarray.DataArray (v_choice: 7)>
array([0.14285714, 0.14285714, 0.14285714, 0.14285714, 0.14285714,
0.14285714, 0.14285714])
Coordinates:
* v_choice (v_choice) int64 0 1 2 3 4 5 6
lbl__v_choice (v_choice) object 'TRBV11-2*03' ... 'my_pseudo_TRBV'
seq__v_choice (v_choice) object 'GAAGCTGGAGTTGCCCAGTCTCCCAGATATAAGATTATA...
Attributes:
nickname: v_choice
event_type: GeneChoice
seq_type: V_gene
seq_side: Undefined_side
priority: 7
parents: []
childs: ['j_choice', 'd_gene', 'v_3_del']
An IgorModel can also be exported as separate CSV files:
[40]:
mdl_0.export_csv('initial_')
[41]:
!head initial_P__insertions.csv
Insertions;P(vd_ins);P(dj_ins)
0;0.024390243902439025;0.024390243902439025
1;0.024390243902439025;0.024390243902439025
2;0.024390243902439025;0.024390243902439025
3;0.024390243902439025;0.024390243902439025
4;0.024390243902439025;0.024390243902439025
5;0.024390243902439025;0.024390243902439025
6;0.024390243902439025;0.024390243902439025
7;0.024390243902439025;0.024390243902439025
8;0.024390243902439025;0.024390243902439025
[42]:
!head initial_P__j_choice__G__v_choice.csv
;TRBJ1-1*01;TRBJ1-2*01;TRBJ1-3*01;TRBJ1-4*01;TRBJ1-5*01;TRBJ1-6*01;TRBJ1-6*02;TRBJ2-1*01;TRBJ2-2*01;TRBJ2-3*01;TRBJ2-4*01;TRBJ2-5*01;TRBJ2-6*01;TRBJ2-7*01;TRBJ2-7*02
TRBV11-2*03;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667
TRBV11-3*01;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667
TRBV12-1*01;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667
TRBV12-2*01;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667
TRBV12-3*01;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667
TRBV12-4*01;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667
my_pseudo_TRBV;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667;0.06666666666666667
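In the freshly initialized model every entry of P(J | V) equals 1/15 (there are 15 J genes), and the insertion probabilities equal 1/41 ≈ 0.02439, i.e. a uniform distribution over 41 insertion lengths. A quick sanity check on an exported conditional table is that each row sums to 1. The following is a minimal sketch, assuming the `;`-separated layout shown above; `load_conditional_csv` and `rows_are_normalized` are hypothetical helpers, not part of pygor3:

```python
import io

import numpy as np
import pandas as pd


def load_conditional_csv(path_or_buffer):
    """Read an exported conditional table such as initial_P__j_choice__G__v_choice.csv.

    Assumes the layout shown above: ';'-separated, parent labels (V genes)
    in the first column and child labels (J genes) in the header row.
    """
    return pd.read_csv(path_or_buffer, sep=';', index_col=0)


def rows_are_normalized(df, atol=1e-9):
    """True if every row of the conditional table P(child | parent) sums to 1."""
    return bool(np.allclose(df.sum(axis=1).to_numpy(), 1.0, atol=atol))


# Toy table in the same layout (2 parents x 3 children), for illustration only
toy_csv = ";J_a;J_b;J_c\nV_x;0.2;0.3;0.5\nV_y;0.1;0.1;0.8\n"
df_toy = load_conditional_csv(io.StringIO(toy_csv))
print(rows_are_normalized(df_toy))  # True
```

Applied to the file exported above, `rows_are_normalized(load_conditional_csv('initial_P__j_choice__G__v_choice.csv'))` should return `True`.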
Entropy¶
\(H = -\sum_{x} P(x) \log_2 P(x)\)
[43]:
import numpy as np
# H = -sum_x P(x) log2 P(x); log2(0) = -inf is mapped to 0 so that 0*log2(0) terms vanish
-np.dot(mdl_hb['v_choice'], np.nan_to_num(np.log2(mdl_hb['v_choice']), neginf=0, nan=0))
/home/olivares/anaconda3/envs/statbiophys-dev/lib/python3.7/site-packages/xarray/core/computation.py:739: RuntimeWarning: divide by zero encountered in log2
result_data = func(*input_data)
[43]:
5.252905287497762
[44]:
mdl_hb.get_entropy_event('v_choice')
[44]:
<xarray.DataArray ()> array(5.25291424)
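The RuntimeWarning printed by cell [43] comes from evaluating `np.log2(0)` before the zero entries are masked. A minimal sketch that drops zero-probability outcomes first, so no warning is raised; `entropy_bits` is a hypothetical helper, not a pygor3 function:

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy H = -sum_x P(x) log2 P(x), in bits.

    Zero-probability outcomes are dropped before taking the log, so the
    0 * log2(0) terms count as 0 and no divide-by-zero warning is raised.
    """
    p = np.asarray(p, dtype=float).ravel()
    nonzero = p[p > 0]
    return float(-np.sum(nonzero * np.log2(nonzero)))


# Uniform distribution over 7 V genes, like v_choice in the freshly built mdl_0
p_uniform = np.full(7, 1 / 7)
print(entropy_bits(p_uniform))  # ≈ 2.807, i.e. log2(7)
```

Calling `entropy_bits(mdl_hb['v_choice'].values)` computes the same sum as cell [43]. A uniform distribution over N outcomes has the maximum possible entropy, log2(N), so the 5.25 bits of the trained model can be compared with that upper bound.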
Mutual Information¶
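For reference, the mutual information (in bits) between two events \(X\) and \(Y\) is the quantity computed explicitly in cell [48]:

\[ I(X;Y) = \sum_{x}\sum_{y} P(x,y)\,\log_2 \frac{P(x,y)}{P(x)\,P(y)} \]

It is zero when the two events are independent, i.e. when \(P(x,y) = P(x)\,P(y)\) for all \(x, y\).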
[45]:
I_V_J = mdl_hb.get_mutual_information_events('v_choice', 'j_choice')
I_V_J
/home/olivares/anaconda3/envs/statbiophys-dev/lib/python3.7/site-packages/xarray/core/computation.py:739: RuntimeWarning: divide by zero encountered in log2
result_data = func(*input_data)
[45]:
<xarray.DataArray ()> array(0.0779008)
[46]:
da_mi = mdl_hb.get_mutual_information()
da_mi
/home/olivares/anaconda3/envs/statbiophys-dev/lib/python3.7/site-packages/xarray/core/computation.py:739: RuntimeWarning: divide by zero encountered in log2
result_data = func(*input_data)
[46]:
<xarray.DataArray 'mutual_information' (x: 9, y: 9)>
array([[0.00000000e+00, 2.74757454e-02, 7.79007955e-02, 3.34244341e-01,
2.42224927e-03, 3.84618632e-03, 3.11240332e-03, 0.00000000e+00,
0.00000000e+00],
[2.74757454e-02, 0.00000000e+00, 1.54509182e-01, 7.40373019e-04,
1.50139212e-01, 2.66774661e-01, 4.92915719e-03, 0.00000000e+00,
0.00000000e+00],
[7.79007955e-02, 1.54509182e-01, 0.00000000e+00, 7.48363481e-04,
9.41603438e-03, 8.72144895e-03, 3.15006638e-01, 0.00000000e+00,
0.00000000e+00],
[3.34244341e-01, 7.40373019e-04, 7.48363481e-04, 0.00000000e+00,
5.45436417e-05, 7.33990325e-05, 2.35845118e-05, 0.00000000e+00,
0.00000000e+00],
[2.42224927e-03, 1.50139212e-01, 9.41603438e-03, 5.45436417e-05,
0.00000000e+00, 4.49012987e-01, 2.87003080e-04, 0.00000000e+00,
0.00000000e+00],
[3.84618632e-03, 2.66774661e-01, 8.72144895e-03, 7.33990325e-05,
4.49012987e-01, 0.00000000e+00, 2.74201841e-04, 0.00000000e+00,
0.00000000e+00],
[3.11240332e-03, 4.92915719e-03, 3.15006638e-01, 2.35845118e-05,
2.87003080e-04, 2.74201841e-04, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00]])
Coordinates:
* x (x) <U8 'v_choice' 'd_gene' 'j_choice' ... 'vd_ins' 'dj_ins'
* y (y) <U8 'v_choice' 'd_gene' 'j_choice' ... 'vd_ins' 'dj_ins'
[47]:
mdl_hb.plot_mutual_information(da_mi)
[47]:
<AxesSubplot:>
[48]:
import numpy as np
import xarray as xr
event_nickname1 = 'v_choice'
event_nickname2 = 'j_choice'
mdl = p3.get_default_IgorModel("human", "tcr_beta")
# Joint distribution P(x, y) and the marginals P(x), P(y)
da_P_x_y = mdl.get_P_joint([event_nickname1, event_nickname2])
da_P_x = mdl.Pmarginal[event_nickname1]
da_P_y = mdl.Pmarginal[event_nickname2]
da_P_x_times_P_y = da_P_x * da_P_y
# log2[P(x,y) / (P(x) P(y))]; zero-probability cells give nan or -inf, mapped to 0
da_log_P_ratio = xr.zeros_like(da_P_x_y)
da_log_P_ratio.values = np.nan_to_num(
    np.log2(da_P_x_y / da_P_x_times_P_y), nan=0.0, neginf=0.0
)
# I(X;Y) = sum over x, y of P(x,y) * log2[P(x,y) / (P(x) P(y))]
xr.dot(da_P_x_y, da_log_P_ratio)
Reading Parms filename from: /home/olivares/.local/share/igor/models/human/tcr_beta/models/model_parms.txt
Reading Marginals filename from: /home/olivares/.local/share/igor/models/human/tcr_beta/models/model_marginals.txt
/home/olivares/anaconda3/envs/statbiophys-dev/lib/python3.7/site-packages/xarray/core/computation.py:739: RuntimeWarning: divide by zero encountered in log2
result_data = func(*input_data)
[48]:
<xarray.DataArray ()> array(0.0779008)
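The same calculation can be written with plain NumPy on a 2-D joint probability table. This is a minimal sketch, not pygor3's implementation. Unlike cell [48], it derives the marginals by summing the joint table instead of reading `mdl.Pmarginal`, and it skips zero cells instead of mapping `-inf`/`nan` to 0, which avoids the divide-by-zero warning. `mutual_information_bits` is a hypothetical helper:

```python
import numpy as np


def mutual_information_bits(p_xy):
    """I(X;Y) = sum_{x,y} P(x,y) log2[P(x,y) / (P(x) P(y))], in bits.

    p_xy is a 2-D joint probability table (rows: X, columns: Y);
    the marginals are obtained by summing the joint table.
    """
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)  # P(x), shape (n, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # P(y), shape (1, m)
    mask = p_xy > 0  # cells with P(x,y) = 0 contribute nothing
    ratio = p_xy[mask] / (p_x * p_y)[mask]
    return float(np.sum(p_xy[mask] * np.log2(ratio)))


# Independent events: I ≈ 0 (up to rounding)
print(mutual_information_bits(np.outer([0.3, 0.7], [0.2, 0.8])))
# Two identical fair coins: I = 1 bit
print(mutual_information_bits([[0.5, 0.0], [0.0, 0.5]]))
```

Assuming `get_P_joint` returns a 2-D array over the two requested events whose marginals agree with `Pmarginal`, `mutual_information_bits(mdl_hb.get_P_joint(['v_choice', 'j_choice']).values)` should be close to the 0.0779 bits obtained above.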
[49]:
da_mi = mdl_hb.get_mutual_information()
/home/olivares/anaconda3/envs/statbiophys-dev/lib/python3.7/site-packages/xarray/core/computation.py:739: RuntimeWarning: divide by zero encountered in log2
result_data = func(*input_data)
[50]:
mdl_hb.plot_mutual_information(da_mi)
[50]:
<AxesSubplot:>
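As the Bayesian network shows, the number of V deletions (v_3_del) depends on the chosen V gene (v_choice). Consistently, this pair has one of the largest mutual information values in the matrix above (about 0.33 bits).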
Select events by probability¶
Here we list the combinations of V and D genes that have zero probability under this model:
[53]:
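# Joint probability of the V and D gene choices
Pjoint_V_D = mdl_hb.get_P_joint(['v_choice', 'd_gene'])
# Keep only the zero-probability entries (all others become NaN)
da_tmp = Pjoint_V_D.where(Pjoint_V_D == 0)
# List them as a table, dropping the NaN rows
df = da_tmp.to_dataframe('P_joint_V_D').dropna()
df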
[53]:
| v_choice | d_gene | lbl__v_choice | seq__v_choice | lbl__d_gene | seq__d_gene | P_joint_V_D |
|---|---|---|---|---|---|---|
| 22 | 0 | TRBV15*03 | GATGCCATGGTCATCCAGAACCCAAGATACCGGGTTACCCAGTTTG... | TRBD1*01 | GGGACAGGGGGC | 0.0 |
| 22 | 1 | TRBV15*03 | GATGCCATGGTCATCCAGAACCCAAGATACCGGGTTACCCAGTTTG... | TRBD2*01 | GGGACTAGCGGGGGGG | 0.0 |
| 22 | 2 | TRBV15*03 | GATGCCATGGTCATCCAGAACCCAAGATACCGGGTTACCCAGTTTG... | TRBD2*02 | GGGACTAGCGGGAGGG | 0.0 |
| 24 | 0 | TRBV17*01 | GAGCCTGGAGTCAGCCAGACCCCCAGACACAAGGTCACCAACATGG... | TRBD1*01 | GGGACAGGGGGC | 0.0 |
| 24 | 1 | TRBV17*01 | GAGCCTGGAGTCAGCCAGACCCCCAGACACAAGGTCACCAACATGG... | TRBD2*01 | GGGACTAGCGGGGGGG | 0.0 |
| 24 | 2 | TRBV17*01 | GAGCCTGGAGTCAGCCAGACCCCCAGACACAAGGTCACCAACATGG... | TRBD2*02 | GGGACTAGCGGGAGGG | 0.0 |
| 33 | 1 | TRBV26*01 | GATGCTGTAGTTACACAATTCCCAAGACACAGAATCATTGGGACAG... | TRBD2*01 | GGGACTAGCGGGGGGG | 0.0 |
| 33 | 2 | TRBV26*01 | GATGCTGTAGTTACACAATTCCCAAGACACAGAATCATTGGGACAG... | TRBD2*02 | GGGACTAGCGGGAGGG | 0.0 |
| 67 | 0 | TRBV6-9*01 | AATGCTGGTGTCACTCAGACCCCAAAATTCCACATCCTGAAGACAG... | TRBD1*01 | GGGACAGGGGGC | 0.0 |
| 67 | 2 | TRBV6-9*01 | AATGCTGGTGTCACTCAGACCCCAAAATTCCACATCCTGAAGACAG... | TRBD2*02 | GGGACTAGCGGGAGGG | 0.0 |
| 71 | 0 | TRBV7-2*03 | GCTGGAGTCTCCCAGTCCCCCAGTAACAAGGTCACAGAGAAGGGAA... | TRBD1*01 | GGGACAGGGGGC | 0.0 |
| 71 | 1 | TRBV7-2*03 | GCTGGAGTCTCCCAGTCCCCCAGTAACAAGGTCACAGAGAAGGGAA... | TRBD2*01 | GGGACTAGCGGGGGGG | 0.0 |
| 71 | 2 | TRBV7-2*03 | GCTGGAGTCTCCCAGTCCCCCAGTAACAAGGTCACAGAGAAGGGAA... | TRBD2*02 | GGGACTAGCGGGAGGG | 0.0 |
| 74 | 0 | TRBV7-3*02 | GGTGCTGGAGTCTCCCAGACCCCCAGTAACAAGGTCACAGAGAAGG... | TRBD1*01 | GGGACAGGGGGC | 0.0 |
| 74 | 1 | TRBV7-3*02 | GGTGCTGGAGTCTCCCAGACCCCCAGTAACAAGGTCACAGAGAAGG... | TRBD2*01 | GGGACTAGCGGGGGGG | 0.0 |
| 74 | 2 | TRBV7-3*02 | GGTGCTGGAGTCTCCCAGACCCCCAGTAACAAGGTCACAGAGAAGG... | TRBD2*02 | GGGACTAGCGGGAGGG | 0.0 |
| 75 | 0 | TRBV7-3*03 | GGTGCTGGAGTCTCCCAGACCCCCAGTAACAAGGTCACAGAGAAGG... | TRBD1*01 | GGGACAGGGGGC | 0.0 |
| 75 | 1 | TRBV7-3*03 | GGTGCTGGAGTCTCCCAGACCCCCAGTAACAAGGTCACAGAGAAGG... | TRBD2*01 | GGGACTAGCGGGGGGG | 0.0 |
| 75 | 2 | TRBV7-3*03 | GGTGCTGGAGTCTCCCAGACCCCCAGTAACAAGGTCACAGAGAAGG... | TRBD2*02 | GGGACTAGCGGGAGGG | 0.0 |
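V gene 22 (TRBV15*03), listed in the first rows of the table, has zero marginal probability in this model, so all of its combinations with D genes are impossible:
[54]: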
mdl_hb['v_choice'].loc[22] #, mdl_hb['v_choice'].loc[67]
[54]:
<xarray.DataArray ()>
array(0.)
Coordinates:
v_choice int64 22
lbl__v_choice object 'TRBV15*03'
seq__v_choice object 'GATGCCATGGTCATCCAGAACCCAAGATACCGGGTTACCCAGTTTGGAAA...
Attributes:
nickname: v_choice
event_type: GeneChoice
seq_type: V_gene
seq_side: Undefined_side
priority: 7
parents: []
childs: ['v_3_del', 'd_gene', 'j_choice']
[55]:
mdl_hb.genomic_dataframe_dict['V'].loc[22]
[55]:
name TRBV15*03
value GATGCCATGGTCATCCAGAACCCAAGATACCGGGTTACCCAGTTTG...
anchor_index 270.0
Name: 22, dtype: object
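The filtering in cell [53] uses xarray's `where` and `to_dataframe`. The same list of impossible combinations can also be obtained with `np.argwhere` on the underlying array. This is a minimal sketch; `zero_probability_pairs` is a hypothetical helper, and the toy table uses made-up numbers:

```python
import numpy as np


def zero_probability_pairs(p_joint, row_labels, col_labels):
    """List (row_label, col_label) pairs whose joint probability is exactly 0.

    p_joint is a 2-D array with rows indexed like row_labels and columns
    like col_labels; np.argwhere returns the zero cells in row-major order.
    """
    p_joint = np.asarray(p_joint)
    return [(row_labels[i], col_labels[j]) for i, j in np.argwhere(p_joint == 0)]


# Toy 2 x 2 joint table (made-up numbers) with two impossible combinations
p_toy = [[0.0, 0.2], [0.8, 0.0]]
print(zero_probability_pairs(p_toy, ['V_a', 'V_b'], ['D_1', 'D_2']))
# [('V_a', 'D_1'), ('V_b', 'D_2')]
```

Assuming the dimensions of `Pjoint_V_D` are ordered (v_choice, d_gene) as in the table index above, the model's pairs would be `zero_probability_pairs(Pjoint_V_D.values, Pjoint_V_D['lbl__v_choice'].values, Pjoint_V_D['lbl__d_gene'].values)`.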
Use a default IGoR model to generate sequences¶
[56]:
df_seqs = p3.generate(10, mdl_hb)
df_seqs
Writing model parms in file ./igor_generating_3z_y24jv/dataIGoRBTnea5JaoG_mdldata/models/model_parms.txt
Writing model marginals in file ./igor_generating_3z_y24jv/dataIGoRBTnea5JaoG_mdldata/models/model_marginals.txt
Writing gene anchor's in file ./igor_generating_3z_y24jv/dataIGoRBTnea5JaoG_mdldata/ref_genome/V_gene_CDR3_anchors.csv
Writing gene anchor's in file ./igor_generating_3z_y24jv/dataIGoRBTnea5JaoG_mdldata/ref_genome/J_gene_CDR3_anchors.csv
Writing model parms in file ./igor_generating_3z_y24jv/dataIGoRBTnea5JaoG_mdldata//models/model_parms.txt
Writing model marginals in file ./igor_generating_3z_y24jv/dataIGoRBTnea5JaoG_mdldata//models/model_marginals.txt
Writing gene anchor's in file ./igor_generating_3z_y24jv/dataIGoRBTnea5JaoG_mdldata//ref_genome/V_gene_CDR3_anchors.csv
Writing gene anchor's in file ./igor_generating_3z_y24jv/dataIGoRBTnea5JaoG_mdldata//ref_genome/J_gene_CDR3_anchors.csv
/home/olivares/.local/bin/igor -set_wd ./igor_generating_3z_y24jv -batch dataIGoRBTnea5JaoG -set_custom_model ./igor_generating_3z_y24jv/dataIGoRBTnea5JaoG_mdldata//models/model_parms.txt ./igor_generating_3z_y24jv/dataIGoRBTnea5JaoG_mdldata//models/model_marginals.txt -set_CDR3_anchors --V ./igor_generating_3z_y24jv/dataIGoRBTnea5JaoG_mdldata//ref_genome/V_gene_CDR3_anchors.csv --J ./igor_generating_3z_y24jv/dataIGoRBTnea5JaoG_mdldata//ref_genome/J_gene_CDR3_anchors.csv -generate 10
[56]:
| seq_index | nt_sequence |
|---|---|
| 0 | AAGGCTGGAGTCACTCAAACTCCAAGATATCTGATCAAAACGAGAG... |
| 1 | GAAGCTGGAGTTACTCAGTTCCCCAGCCACAGCGTAATAGAGAAGG... |
| 2 | GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTG... |
| 3 | GGAGCTGGAGTCTCCCAGTCCCCCAGTAACAAGGTCACAGAGAAGG... |
| 4 | GAAGCTGGAGTTGCCCAGTCTCCCAGATATAAGATTATAGAGAAAA... |
| 5 | GGTGCTGGAGTCTCCCAGACCCCCAGTAACAAGGTCACAGAGAAGG... |
| 6 | GATGCTGATGTTACCCAGACCCCAAGGAATAGGATCACAAAGACAG... |
| 7 | AAGGCTGGAGTCACTCAAACTCCAAGATATCTGATCAAAACGAGAG... |
| 8 | GAAACGGGAGTTACGCAGACACCAAGACACCTGGTCATGGGAATGA... |
| 9 | CATGCCAAAGTCACACAGACTCCAGGACATTTGGTCAAAGGAAAAG... |
One sequence evaluation¶
Suppose we have the nucleotide sequence of a TCR \(\beta\) chain:
[57]:
str_seq_hb = "ATCGAGTGCCGTTCCCTGGACTTTCAGGCCACAACTATGTTTTGGTATCGTCAGTTCCCGAAACAGAGTCTCATGCTGATGGCAACTTCCAATGAGGGCTCCAAGGCCACATACGAGCAAGGCGTCGAGAAGGACAAGTTTCTCATCAACCATGCAAGCCTGACCTTGTCCACTCTGACAGTGACCAGTGCCCATCCTGAAGACAGCAGCTTCTACATCTGCAGTGCTGGGGAAGGGCAGCCTGGAAACACCATATATTTTGGAG"
A classical approach is to align the V, D and J gene segments to the sequence and take the best-scoring alignment of each segment as the single, uniquely determined recombination scenario.
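As a minimal sketch of this naive idea (not of what IGoR does), the snippet below uses only Python's standard `difflib` to find, for each of the three TRB D genes listed in `mdl_hb.parms['d_gene']`, the longest stretch it shares exactly with `str_seq_hb`, and then calls the D gene with the longest match. The helper `longest_exact_match` is hypothetical. Ties are broken by dictionary order, so with D genes this short the "best" call can be arbitrary; the evaluation below indeed returns scenarios with both TRBD1 and TRBD2.

```python
from difflib import SequenceMatcher

# Sequence used in this tutorial (str_seq_hb) and the D genes of mdl_hb.parms['d_gene'].
str_seq_hb = "ATCGAGTGCCGTTCCCTGGACTTTCAGGCCACAACTATGTTTTGGTATCGTCAGTTCCCGAAACAGAGTCTCATGCTGATGGCAACTTCCAATGAGGGCTCCAAGGCCACATACGAGCAAGGCGTCGAGAAGGACAAGTTTCTCATCAACCATGCAAGCCTGACCTTGTCCACTCTGACAGTGACCAGTGCCCATCCTGAAGACAGCAGCTTCTACATCTGCAGTGCTGGGGAAGGGCAGCCTGGAAACACCATATATTTTGGAG"
d_genes = {
    "TRBD1*01": "GGGACAGGGGGC",
    "TRBD2*01": "GGGACTAGCGGGGGGG",
    "TRBD2*02": "GGGACTAGCGGGAGGG",
}

def longest_exact_match(seq, template):
    """Return (seq_start, template_start, length) of the longest shared substring."""
    # autojunk=False disables difflib's "popular element" heuristic,
    # which can discard common bases when the second sequence is long.
    matcher = SequenceMatcher(None, seq, template, autojunk=False)
    m = matcher.find_longest_match(0, len(seq), 0, len(template))
    return m.a, m.b, m.size

hits = {name: longest_exact_match(str_seq_hb, tpl) for name, tpl in d_genes.items()}
best_d = max(hits, key=lambda name: hits[name][2])  # first gene with the longest match

for name, (start, tpl_start, size) in hits.items():
    print(name, str_seq_hb[start:start + size], "at position", start)
print("naive D call:", best_d)
```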
[58]:
## TODO: SHOW NAIVE ALIGNMENT
[59]:
## TODO: Simple explanation of the inference process
Evaluate VDJ model¶
[60]:
mdl = p3.get_default_IgorModel("human", "tcr_beta")
Reading Parms filename from: /home/olivares/.local/share/igor/models/human/tcr_beta/models/model_parms.txt
Reading Marginals filename from: /home/olivares/.local/share/igor/models/human/tcr_beta/models/model_marginals.txt
[61]:
help(p3.evaluate)
Help on function evaluate in module pygor3.IgorIO:
evaluate(input_sequences: Union[str, pandas.core.frame.DataFrame, numpy.ndarray, pathlib.Path], mdl: pygor3.IgorIO.IgorModel, N_scenarios=None, igor_wd=None, airr_format=True, batch_clean=True)
Evaluate input sequences with provided model
:param input_sequences:Union[str, pd.DataFrame, np.ndarray, Path]
:param mdl:IgorModel
:param batch_clean: Remove all temporary files True by default.
[62]:
df_scenarios = p3.evaluate(str_seq_hb, mdl) # , N_scenarios=20
df_scenarios
Writing model parms in file ./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_mdldata/models/model_parms.txt
Writing model marginals in file ./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_mdldata/models/model_marginals.txt
Writing gene anchor's in file ./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_mdldata/ref_genome/V_gene_CDR3_anchors.csv
Writing gene anchor's in file ./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_mdldata/ref_genome/J_gene_CDR3_anchors.csv
/home/olivares/.local/bin/igor -set_wd ./igor_evaluating_1b6e6i7z -batch dataIGoRyNbbuLilFX -read_seqs ./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFXinput_sequences.csv
/home/olivares/.local/bin/igor -set_wd ./igor_evaluating_1b6e6i7z -batch dataIGoRyNbbuLilFX -set_genomic --V ./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_mdldata/ref_genome/genomicVs.fasta --D ./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_mdldata/ref_genome/genomicDs.fasta --J ./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_mdldata/ref_genome/genomicJs.fasta -set_CDR3_anchors --V ./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_mdldata/ref_genome/V_gene_CDR3_anchors.csv --J ./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_mdldata/ref_genome/J_gene_CDR3_anchors.csv -align --all
/home/olivares/.local/bin/igor -set_wd ./igor_evaluating_1b6e6i7z -batch dataIGoRyNbbuLilFX -set_custom_model ./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_mdldata/models/model_parms.txt ./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_mdldata/models/model_marginals.txt -evaluate -output --scenarios 10 --Pgen
./igor_evaluating_1b6e6i7z/aligns/dataIGoRyNbbuLilFX_indexed_CDR3s.csv
('',)
Incorrect number of bindings supplied. The current statement uses 5, and there are 1 supplied.
Loading Gene templates ...
V ./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_mdldata/ref_genome/genomicVs.fasta
J ./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_mdldata/ref_genome/genomicJs.fasta
D ./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_mdldata/ref_genome/genomicDs.fasta
loading Anchors data ...
Loading Gene Anchors from ./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_mdldata/ref_genome/V_gene_CDR3_anchors.csv
Loading Gene Anchors from ./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_mdldata/ref_genome/J_gene_CDR3_anchors.csv
./igor_evaluating_1b6e6i7z/aligns/dataIGoRyNbbuLilFX_V_alignments.csv
['']
['']
['']
Alignments loaded in database in ./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX.db
Reading Parms filename from: ./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_mdldata/models/model_parms.txt
Reading Marginals filename from: ./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_mdldata/models/model_marginals.txt
Loading parms to database
('v_choice', 'v_3_del')
('v_choice', 'd_gene')
('v_choice', 'j_choice')
('d_gene', 'd_3_del')
('d_gene', 'd_5_del')
('j_choice', 'j_5_del')
('j_choice', 'd_gene')
('d_5_del', 'd_3_del')
Loading marginals to database
./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_output/best_scenarios_counts.csv
events_name_nickname_dict {'GeneChoice_V_gene_Undefined_side_prio7_size89': 'v_choice', 'GeneChoice_J_gene_Undefined_side_prio7_size15': 'j_choice', 'GeneChoice_D_gene_Undefined_side_prio6_size3': 'd_gene', 'Deletion_V_gene_Three_prime_prio5_size21': 'v_3_del', 'Deletion_D_gene_Five_prime_prio5_size21': 'd_5_del', 'Deletion_J_gene_Five_prime_prio5_size23': 'j_5_del', 'Deletion_D_gene_Three_prime_prio5_size21': 'd_3_del', 'Insertion_VD_genes_Undefined_side_prio4_size31': 'vd_ins', 'DinucMarkov_VD_genes_Undefined_side_prio3_size16': 'vd_dinucl', 'Insertion_DJ_gene_Undefined_side_prio2_size31': 'dj_ins', 'DinucMarkov_DJ_gene_Undefined_side_prio1_size16': 'dj_dinucl'}
./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_output/Pgen_counts.csv
./igor_evaluating_1b6e6i7z/dataIGoRyNbbuLilFX_output/Pgen_counts.csv
('',)
Incorrect number of bindings supplied. The current statement uses 2, and there are 1 supplied.
----- Marginals -----
d_3_del
d_5_del
d_gene
dj_dinucl
dj_ins
j_5_del
j_choice
v_3_del
v_choice
vd_dinucl
vd_ins
[62]:
|  | sequence_id | sequence | rev_comp | productive | v_call | d_call | j_call | sequence_alignment | germline_alignment | junction | junction_aa | v_cigar | d_cigar | j_cigar | v_score | v_identity | v_support | v_sequence_start | v_sequence_end | v_germline_start | v_germline_end | v_alignment_start | v_alignment_end | d_score | d_identity | d_support | d_sequence_start | d_sequence_end | d_germline_start | d_germline_end | d_alignment_start | d_alignment_end | j_score | j_identity | j_support | j_sequence_start | j_sequence_end | j_germline_start | j_germline_end | j_alignment_start | j_alignment_end | sequence_aa | vj_in_frame | stop_codon | complete_vdj | locus | sequence_alignment_aa | n1_length | np1 | np1_aa | np1_length | n2_length | np2 | np2_aa | np2_length | p3v_length | p5d_length | p3d_length | p5j_length | scenario_rank | scenario_proba_cond_seq | pgen | quality | quality_alignment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | ATCGAGTGCCGTTCCCTGGACTTTCAGGCCACAACTATGTTTTGGT... | F | NaN | M11955|TRBV20-1*01|Homo sapiens|F|V-REGION|427... | TRBD1*01 | M14158|TRBJ1-3*01|Homo sapiens|F|J-REGION|1499... | GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTG... | NaN | TGCAGTGCTGGGGAAGGGCAGCCTGGAAACACCATATATTTT | CSAGEGQPGNTIYF | 288M | 4M | 48M | 1440 | NaN | NaN | 2 | 288 | 61 | 286 | NaN | NaN | 20 | NaN | NaN | 292 | 294 | 7 | 10 | NaN | NaN | 240 | NaN | NaN | 7 | 53 | 4 | 51 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5 | GGGGA | NaN | 5 | 4 | CAGC | NaN | 4 | 0 | 0 | 0 | 0 | 1 | 0.146984 | 4.871310e-13 | NaN | NaN |
| 1 | 0 | ATCGAGTGCCGTTCCCTGGACTTTCAGGCCACAACTATGTTTTGGT... | F | NaN | M11955|TRBV20-1*01|Homo sapiens|F|V-REGION|427... | TRBD2*02 | M14158|TRBJ1-3*01|Homo sapiens|F|J-REGION|1499... | GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTG... | NaN | TGCAGTGCTGGGGAAGGGCAGCCTGGAAACACCATATATTTT | CSAGEGQPGNTIYF | 288M | 4M | 48M | 1440 | NaN | NaN | 2 | 288 | 61 | 286 | NaN | NaN | 20 | NaN | NaN | 292 | 294 | 14 | 17 | NaN | NaN | 240 | NaN | NaN | 7 | 53 | 4 | 51 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5 | GGGGA | NaN | 5 | 4 | CAGC | NaN | 4 | 0 | 0 | 0 | 0 | 2 | 0.112527 | 4.871310e-13 | NaN | NaN |
| 2 | 0 | ATCGAGTGCCGTTCCCTGGACTTTCAGGCCACAACTATGTTTTGGT... | F | NaN | M11955|TRBV20-1*01|Homo sapiens|F|V-REGION|427... | TRBD1*01 | M14158|TRBJ1-3*01|Homo sapiens|F|J-REGION|1499... | GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTG... | NaN | TGCAGTGCTGGGGAAGGGCAGCCTGGAAACACCATATATTTT | CSAGEGQPGNTIYF | 288M | 3M | 48M | 1440 | NaN | NaN | 2 | 288 | 61 | 286 | NaN | NaN | 15 | NaN | NaN | 296 | 297 | 6 | 8 | NaN | NaN | 240 | NaN | NaN | 3 | 49 | 4 | 51 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 9 | GGGGAAGGG | NaN | 9 | 1 | C | NaN | 1 | 0 | 0 | 0 | 0 | 3 | 0.062955 | 4.871310e-13 | NaN | NaN |
| 3 | 0 | ATCGAGTGCCGTTCCCTGGACTTTCAGGCCACAACTATGTTTTGGT... | F | NaN | M11955|TRBV20-1*01|Homo sapiens|F|V-REGION|427... | TRBD2*02 | M14158|TRBJ1-3*01|Homo sapiens|F|J-REGION|1499... | GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTG... | NaN | TGCAGTGCTGGGGAAGGGCAGCCTGGAAACACCATATATTTT | CSAGEGQPGNTIYF | 288M | 3M | 48M | 1440 | NaN | NaN | 2 | 288 | 61 | 286 | NaN | NaN | 15 | NaN | NaN | 293 | 294 | 15 | 17 | NaN | NaN | 240 | NaN | NaN | 6 | 52 | 4 | 51 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 6 | GGGGAA | NaN | 6 | 3 | AGC | NaN | 4 | 0 | 0 | 1 | 0 | 4 | 0.060409 | 4.871310e-13 | NaN | NaN |
| 4 | 0 | ATCGAGTGCCGTTCCCTGGACTTTCAGGCCACAACTATGTTTTGGT... | F | NaN | M11955|TRBV20-1*01|Homo sapiens|F|V-REGION|427... | TRBD1*01 | M14158|TRBJ1-3*01|Homo sapiens|F|J-REGION|1499... | GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTG... | NaN | TGCAGTGCTGGGGAAGGGCAGCCTGGAAACACCATATATTTT | CSAGEGQPGNTIYF | 288M | 4M | 48M | 1440 | NaN | NaN | 2 | 288 | 61 | 286 | NaN | NaN | 20 | NaN | NaN | 293 | 295 | 10 | 13 | NaN | NaN | 240 | NaN | NaN | 6 | 52 | 4 | 51 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 6 | GGGGAA | NaN | 6 | 3 | AGC | NaN | 3 | 0 | 0 | 0 | 0 | 5 | 0.034684 | 4.871310e-13 | NaN | NaN |
| 5 | 0 | ATCGAGTGCCGTTCCCTGGACTTTCAGGCCACAACTATGTTTTGGT... | F | NaN | M11955|TRBV20-1*01|Homo sapiens|F|V-REGION|427... | TRBD1*01 | M14158|TRBJ1-3*01|Homo sapiens|F|J-REGION|1499... | GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTG... | NaN | TGCAGTGCTGGGGAAGGGCAGCCTGGAAACACCATATATTTT | CSAGEGQPGNTIYF | 287M | 4M | 48M | 1435 | NaN | NaN | 2 | 287 | 61 | 285 | NaN | NaN | 20 | NaN | NaN | 292 | 294 | 7 | 10 | NaN | NaN | 240 | NaN | NaN | 7 | 53 | 4 | 51 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 6 | TGGGGA | NaN | 6 | 4 | CAGC | NaN | 4 | 0 | 0 | 0 | 0 | 6 | 0.028806 | 4.871310e-13 | NaN | NaN |
| 6 | 0 | ATCGAGTGCCGTTCCCTGGACTTTCAGGCCACAACTATGTTTTGGT... | F | NaN | M11955|TRBV20-1*01|Homo sapiens|F|V-REGION|427... | TRBD2*02 | M14158|TRBJ1-3*01|Homo sapiens|F|J-REGION|1499... | GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTG... | NaN | TGCAGTGCTGGGGAAGGGCAGCCTGGAAACACCATATATTTT | CSAGEGQPGNTIYF | 288M | 3M | 48M | 1440 | NaN | NaN | 2 | 288 | 61 | 286 | NaN | NaN | 15 | NaN | NaN | 293 | 294 | 11 | 13 | NaN | NaN | 240 | NaN | NaN | 6 | 52 | 4 | 51 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 6 | GGGGAA | NaN | 6 | 4 | CAGC | NaN | 4 | 0 | 0 | 0 | 0 | 7 | 0.023637 | 4.871310e-13 | NaN | NaN |
| 7 | 0 | ATCGAGTGCCGTTCCCTGGACTTTCAGGCCACAACTATGTTTTGGT... | F | NaN | M11955|TRBV20-1*01|Homo sapiens|F|V-REGION|427... | TRBD2*02 | M14158|TRBJ1-3*01|Homo sapiens|F|J-REGION|1499... | GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTG... | NaN | TGCAGTGCTGGGGAAGGGCAGCCTGGAAACACCATATATTTT | CSAGEGQPGNTIYF | 287M | 4M | 48M | 1435 | NaN | NaN | 2 | 287 | 61 | 285 | NaN | NaN | 20 | NaN | NaN | 292 | 294 | 14 | 17 | NaN | NaN | 240 | NaN | NaN | 7 | 53 | 4 | 51 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 6 | TGGGGA | NaN | 6 | 4 | CAGC | NaN | 4 | 0 | 0 | 0 | 0 | 8 | 0.022053 | 4.871310e-13 | NaN | NaN |
| 8 | 0 | ATCGAGTGCCGTTCCCTGGACTTTCAGGCCACAACTATGTTTTGGT... | F | NaN | M11955|TRBV20-1*01|Homo sapiens|F|V-REGION|427... | TRBD2*02 | M14158|TRBJ1-3*01|Homo sapiens|F|J-REGION|1499... | GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTG... | NaN | TGCAGTGCTGGGGAAGGGCAGCCTGGAAACACCATATATTTT | CSAGEGQPGNTIYF | 288M | 4M | 48M | 1440 | NaN | NaN | 2 | 288 | 61 | 286 | NaN | NaN | 20 | NaN | NaN | 288 | 290 | 11 | 14 | NaN | NaN | 240 | NaN | NaN | 11 | 57 | 4 | 51 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1 | G | NaN | 1 | 8 | AGGGCAGC | NaN | 8 | 0 | 0 | 0 | 0 | 9 | 0.017562 | 4.871310e-13 | NaN | NaN |
| 9 | 0 | ATCGAGTGCCGTTCCCTGGACTTTCAGGCCACAACTATGTTTTGGT... | F | NaN | M11955|TRBV20-1*01|Homo sapiens|F|V-REGION|427... | TRBD2*02 | M14158|TRBJ1-3*01|Homo sapiens|F|J-REGION|1499... | GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTG... | NaN | TGCAGTGCTGGGGAAGGGCAGCCTGGAAACACCATATATTTT | CSAGEGQPGNTIYF | 288M | 3M | 48M | 1440 | NaN | NaN | 2 | 288 | 61 | 286 | NaN | NaN | 15 | NaN | NaN | 292 | 293 | 14 | 16 | NaN | NaN | 240 | NaN | NaN | 7 | 53 | 4 | 51 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5 | GGGGA | NaN | 5 | 5 | GCAGC | NaN | 5 | 0 | 0 | 0 | 0 | 10 | 0.015878 | 4.871310e-13 | NaN | NaN |
[63]:
df_scenarios = p3.evaluate(str_seq_hb, mdl_hb, airr_format=False) # , N_scenarios=20
df_scenarios
Writing model parms in file ./igor_evaluating_mdsvldqo/dataIGoRQVzl4XnKII_mdldata/models/model_parms.txt
Writing model marginals in file ./igor_evaluating_mdsvldqo/dataIGoRQVzl4XnKII_mdldata/models/model_marginals.txt
Writing gene anchor's in file ./igor_evaluating_mdsvldqo/dataIGoRQVzl4XnKII_mdldata/ref_genome/V_gene_CDR3_anchors.csv
Writing gene anchor's in file ./igor_evaluating_mdsvldqo/dataIGoRQVzl4XnKII_mdldata/ref_genome/J_gene_CDR3_anchors.csv
/home/olivares/.local/bin/igor -set_wd ./igor_evaluating_mdsvldqo -batch dataIGoRQVzl4XnKII -read_seqs ./igor_evaluating_mdsvldqo/dataIGoRQVzl4XnKIIinput_sequences.csv
/home/olivares/.local/bin/igor -set_wd ./igor_evaluating_mdsvldqo -batch dataIGoRQVzl4XnKII -set_genomic --V ./igor_evaluating_mdsvldqo/dataIGoRQVzl4XnKII_mdldata/ref_genome/genomicVs.fasta --D ./igor_evaluating_mdsvldqo/dataIGoRQVzl4XnKII_mdldata/ref_genome/genomicDs.fasta --J ./igor_evaluating_mdsvldqo/dataIGoRQVzl4XnKII_mdldata/ref_genome/genomicJs.fasta -set_CDR3_anchors --V ./igor_evaluating_mdsvldqo/dataIGoRQVzl4XnKII_mdldata/ref_genome/V_gene_CDR3_anchors.csv --J ./igor_evaluating_mdsvldqo/dataIGoRQVzl4XnKII_mdldata/ref_genome/J_gene_CDR3_anchors.csv -align --all
/home/olivares/.local/bin/igor -set_wd ./igor_evaluating_mdsvldqo -batch dataIGoRQVzl4XnKII -set_custom_model ./igor_evaluating_mdsvldqo/dataIGoRQVzl4XnKII_mdldata/models/model_parms.txt ./igor_evaluating_mdsvldqo/dataIGoRQVzl4XnKII_mdldata/models/model_marginals.txt -evaluate -output --scenarios 10 --Pgen
igor_fln_generated_realizations_werr: ./igor_evaluating_mdsvldqo/dataIGoRQVzl4XnKII_output/best_scenarios_counts.csv
[63]:
| seq_index | scenario_rank | scenario_proba_cond_seq | v_choice | j_choice | d_gene | v_3_del | d_5_del | j_5_del | d_3_del | vd_ins | vd_dinucl | dj_ins | dj_dinucl | Mismatches | Pgen_estimate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.146984 | 29 | 2 | 0 | 9 | 9 | 6 | 7 | 5 | [2, 2, 2, 2, 0] | 4 | [1, 2, 0, 1] | [] | 4.871310e-13 |
| 0 | 2 | 0.112527 | 29 | 2 | 2 | 9 | 16 | 6 | 4 | 5 | [2, 2, 2, 2, 0] | 4 | [1, 2, 0, 1] | [] | 4.871310e-13 |
| 0 | 3 | 0.062955 | 29 | 2 | 0 | 9 | 8 | 6 | 9 | 9 | [2, 2, 2, 2, 0, 0, 2, 2, 2] | 1 | [1] | [] | 4.871310e-13 |
| 0 | 4 | 0.060409 | 29 | 2 | 2 | 9 | 17 | 6 | 3 | 6 | [2, 2, 2, 2, 0, 0] | 3 | [1, 2, 0] | [] | 4.871310e-13 |
| 0 | 5 | 0.034684 | 29 | 2 | 0 | 9 | 12 | 6 | 4 | 6 | [2, 2, 2, 2, 0, 0] | 3 | [1, 2, 0] | [] | 4.871310e-13 |
| 0 | 6 | 0.028806 | 29 | 2 | 0 | 10 | 9 | 6 | 7 | 6 | [3, 2, 2, 2, 2, 0] | 4 | [1, 2, 0, 1] | [] | 4.871310e-13 |
| 0 | 7 | 0.023637 | 29 | 2 | 2 | 9 | 13 | 6 | 8 | 6 | [2, 2, 2, 2, 0, 0] | 4 | [1, 2, 0, 1] | [] | 4.871310e-13 |
| 0 | 8 | 0.022053 | 29 | 2 | 2 | 10 | 16 | 6 | 4 | 6 | [3, 2, 2, 2, 2, 0] | 4 | [1, 2, 0, 1] | [] | 4.871310e-13 |
| 0 | 9 | 0.017562 | 29 | 2 | 2 | 9 | 13 | 6 | 7 | 1 | [2] | 8 | [1, 2, 0, 1, 2, 2, 2, 0] | [] | 4.871310e-13 |
| 0 | 10 | 0.015878 | 29 | 2 | 2 | 9 | 16 | 6 | 5 | 5 | [2, 2, 2, 2, 0] | 5 | [1, 2, 0, 1, 2] | [] | 4.871310e-13 |
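Comparing this table with the AIRR table above suggests how `vd_dinucl` and `dj_dinucl` encode the inserted nucleotides: for the rank-1 scenario, `vd_dinucl = [2, 2, 2, 2, 0]` matches `np1 = GGGGA`, while `dj_dinucl = [1, 2, 0, 1]` matches `np2 = CAGC` read backwards. The sketch below decodes the lists under that inferred convention (0=A, 1=C, 2=G, and presumably 3=T; DJ insertions listed starting from the J side). This is an inference from the two tables, not a documented pygor3 function, and `decode_insertions` is a hypothetical helper.

```python
# Index-to-nucleotide code inferred from the tables (A=0, C=1, G=2; 3 assumed to be T).
NT_CODE = "ACGT"

def decode_insertions(indices, from_j_side=False):
    """Turn a list of nucleotide indices into a string.

    DJ insertions appear to be listed from the J side, so they are reversed
    to read 5'->3' along the sequence (an assumption based on the output above).
    """
    nts = "".join(NT_CODE[i] for i in indices)
    return nts[::-1] if from_j_side else nts

# Rank-1 scenario of the airr_format=False table.
print(decode_insertions([2, 2, 2, 2, 0]))                 # GGGGA (np1 in the AIRR table)
print(decode_insertions([1, 2, 0, 1], from_j_side=True))  # CAGC (np2 in the AIRR table)
```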
Visualize scenarios¶
[64]:
for _, ps_scenario in df_scenarios.iterrows():
    mdl_hb.plot_scenario(ps_scenario)  # pass nt_lim=(200, 340) to zoom in on a region
[65]:
import pandas as pd
dict_scenario_fict = {
'scenario_rank' : 1,
'scenario_proba_cond_seq' : 0.9999,
'v_choice' : 61,
'j_choice' : 1,
'd_gene' : 0,
'v_3_del': 0,
'd_5_del': 1,
'j_5_del': 8,
'd_3_del': 6,
'vd_ins' : 1,
'vd_dinucl': [1],
'dj_ins': 2,
'dj_dinucl': [0, 3],
'Mismatches' : [],
'norm_scenario_proba_cond_seq': 0.000225
}
ps_scenario_fict = pd.Series(dict_scenario_fict)
mdl_hb.plot_scenario(ps_scenario_fict)
[65]:
(<Figure size 1440x720 with 1 Axes>, <AxesSubplot:>)
[66]:
df_scenario_aln_fict = mdl_hb.get_df_scenario_aln_from_scenario(ps_scenario_fict)
df_scenario_aln_fict
mdl_hb.get_gene_segment_dict('V', ps_scenario)
[66]:
OrderedDict([('gene_template',
'GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTGGAACCTCTGTGAAGATCGAGTGCCGTTCCCTGGACTTTCAGGCCACAACTATGTTTTGGTATCGTCAGTTCCCGAAACAGAGTCTCATGCTGATGGCAACTTCCAATGAGGGCTCCAAGGCCACATACGAGCAAGGCGTCGAGAAGGACAAGTTTCTCATCAACCATGCAAGCCTGACCTTGTCCACTCTGACAGTGACCAGTGCCCATCCTGAAGACAGCAGCTTCTACATCTGCAGTGCTAGAGA'),
('int_gene_5_del', 0),
('int_gene_3_del', 5),
('palindrome_5_end', ''),
('gene_ini', 0),
('gene_end', 288),
('gene_cut',
'GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTGGAACCTCTGTGAAGATCGAGTGCCGTTCCCTGGACTTTCAGGCCACAACTATGTTTTGGTATCGTCAGTTCCCGAAACAGAGTCTCATGCTGATGGCAACTTCCAATGAGGGCTCCAAGGCCACATACGAGCAAGGCGTCGAGAAGGACAAGTTTCTCATCAACCATGCAAGCCTGACCTTGTCCACTCTGACAGTGACCAGTGCCCATCCTGAAGACAGCAGCTTCTACATCTGCAGTGCT'),
('palindrome_3_end', ''),
('gene_segment',
'GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTGGAACCTCTGTGAAGATCGAGTGCCGTTCCCTGGACTTTCAGGCCACAACTATGTTTTGGTATCGTCAGTTCCCGAAACAGAGTCTCATGCTGATGGCAACTTCCAATGAGGGCTCCAAGGCCACATACGAGCAAGGCGTCGAGAAGGACAAGTTTCTCATCAACCATGCAAGCCTGACCTTGTCCACTCTGACAGTGACCAGTGCCCATCCTGAAGACAGCAGCTTCTACATCTGCAGTGCT'),
('gene_description',
"TRBV20-1*01 (v_choice: 29) (v_3_del: 9), 3'del : 5")])
[ ]:
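The dictionary above shows how a deletion turns `gene_template` into `gene_cut`: with `int_gene_3_del = 5`, the last five nucleotides (`AGAGA`) are removed. A minimal sketch of that operation follows, assuming that a negative deletion value stands for palindromic (P) nucleotides appended as the reverse complement of the gene end; the `palindrome_3_end` key suggests this, but it is not shown in this output. `cut_3_end` is a hypothetical helper, applied here to the last 20 nt of the TRBV20-1*01 template.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def cut_3_end(template, n_del):
    """Apply a 3' deletion of n_del nucleotides to a gene template.

    Assumption: a negative n_del adds |n_del| palindromic nucleotides,
    i.e. the reverse complement of the template's last |n_del| bases.
    """
    if n_del >= 0:
        # len(template) - n_del (not -n_del) so that n_del == 0 keeps the whole template
        return template[:len(template) - n_del]
    tail = template[n_del:]  # last |n_del| bases
    return template + tail.translate(COMPLEMENT)[::-1]

v_tail = "TACATCTGCAGTGCTAGAGA"   # last 20 nt of the TRBV20-1*01 'gene_template' above
print(cut_3_end(v_tail, 5))   # TACATCTGCAGTGCT, the end of 'gene_cut'
print(cut_3_end(v_tail, -2))  # TACATCTGCAGTGCTAGAGATC
```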
[67]:
mdl_hb.parms['d_gene']
[67]:
| id | value | name |
|---|---|---|
| 0 | GGGACAGGGGGC | TRBD1*01 |
| 1 | GGGACTAGCGGGGGGG | TRBD2*01 |
| 2 | GGGACTAGCGGGAGGG | TRBD2*02 |
[68]:
help(mdl_hb.plot_scenario)
Help on method plot_scenario in module pygor3.IgorIO:
plot_scenario(ps_scenario, nt_lim: Union[NoneType, tuple, list] = None, show_CDR3=True, ax=None) method of pygor3.IgorIO.IgorModel instance
Return matplotlib fig, ax
:param ps_scenario: Pandas Series scenario
:param nt_lim:Union[None,tuple,list] region limits to show the scenario alignment
default give boundaries around CDR3, if no anchors in model, show the whole scenario
:param show_CDR3: Show CDR3 lines default(=True)
Observables from scenarios¶
Predefined Functions for Scenarios¶
[69]:
fln_scenarios = "best_scenarios_counts.csv"
df_scenarios_many = mdl_hb.get_dataframe_scenarios(fln_scenarios)
igor_fln_generated_realizations_werr: best_scenarios_counts.csv
Pairwise Probabilities¶
[70]:
help(mdl_hb.get_P_from_scenarios_cols)
Help on method get_P_from_scenarios_cols in module pygor3.IgorIO:
get_P_from_scenarios_cols(df_scenarios, colname_list) method of pygor3.IgorIO.IgorModel instance
Return xarray with marginalize probabilities of listed columns in dataframe scenarios df_scenarios
:param df_scenarios: Scenarios with normalize probability. Loaded with self.get_dataframe_scenarios()
:param colname_list: List of variables preserve for marginalization
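Marginalizing scenario probabilities amounts to summing the scenario weights over every column not in `colname_list`. The pandas sketch below shows that idea on a tiny made-up scenarios table; it illustrates the principle and is not pygor3's implementation. `joint_from_scenarios` is a hypothetical name, and the final renormalization is an assumption that only matters if the weights do not already sum to one.

```python
import pandas as pd

# Hypothetical scenarios (not real IGoR output); weights play the role of
# norm_scenario_proba_cond_seq.
df_toy = pd.DataFrame({
    "v_choice": [29, 29, 77, 77],
    "j_choice": [2, 2, 4, 2],
    "norm_scenario_proba_cond_seq": [0.3, 0.2, 0.4, 0.1],
})

def joint_from_scenarios(df, cols, weight="norm_scenario_proba_cond_seq"):
    """Sum scenario weights over all other columns, leaving P(cols)."""
    p = df.groupby(cols)[weight].sum()
    return p / p.sum()  # renormalize in case the weights do not sum to 1

p_vj = joint_from_scenarios(df_toy, ["v_choice", "j_choice"])
print(p_vj.unstack(fill_value=0.0))  # rows: v_choice, columns: j_choice
```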
[71]:
Pjoint_V_J_scens = mdl_hb.get_P_from_scenarios_cols(df_scenarios_many, ['v_choice', 'j_choice'])
Pjoint_V_J_scens.plot(cmap='gnuplot2_r')
[71]:
<matplotlib.collections.QuadMesh at 0x7fac14e44f10>
[72]:
mdl_hb.get_P_joint(['v_choice', 'j_choice']).plot(cmap='gnuplot2_r')
[72]:
<matplotlib.collections.QuadMesh at 0x7fac14d4ef50>
Mutual Information from scenarios¶
[73]:
help(mdl_hb.get_mutual_information_events_from_df_scenarios)
Help on method get_mutual_information_events_from_df_scenarios in module pygor3.IgorIO:
get_mutual_information_events_from_df_scenarios(df_scenarios, event_nickname_x, event_nickname_y) method of pygor3.IgorIO.IgorModel instance
Return mutual information in log10 of the desired events
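For reference, the mutual information of two events follows from their joint distribution, such as the one returned by `get_P_from_scenarios_cols`: \(I(X;Y)=\sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}\). The help text says pygor3 reports it "in log10", so this numpy sketch takes the log base as a parameter with default 10; whether pygor3 uses exactly this plug-in estimator is an assumption.

```python
import numpy as np

def mutual_information(p_xy, base=10.0):
    """Mutual information of a 2-D joint probability table, in log base `base`."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_xy = p_xy / p_xy.sum()                 # make sure the table is normalized
    p_x = p_xy.sum(axis=1, keepdims=True)    # marginal of the row event
    p_y = p_xy.sum(axis=0, keepdims=True)    # marginal of the column event
    mask = p_xy > 0                          # terms with p(x,y) = 0 contribute 0
    ratio = p_xy[mask] / (p_x * p_y)[mask]
    return float(np.sum(p_xy[mask] * np.log(ratio)) / np.log(base))

print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # independent events: 0.0
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # fully dependent: log10(2)
```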
[74]:
mdl_hb.get_mutual_information_events_from_df_scenarios(df_scenarios_many, 'v_choice', 'j_choice')
mutual_information ( v_choice , j_choice ): 0.6392623620052954
[74]:
<xarray.DataArray 'norm_scenario_proba_cond_seq' ()> array(0.63926236)
[75]:
da_mi_scenarios = mdl_hb.get_mutual_information_from_df_scenarios(df_scenarios_many)
mutual_information ( v_choice , j_choice ): 0.6392623620052954
mutual_information ( v_choice , d_gene ): 0.08894391386660068
mutual_information ( v_choice , v_3_del ): 0.7021801087511944
mutual_information ( v_choice , d_5_del ): 0.35886454309758736
mutual_information ( v_choice , j_5_del ): 0.4667910640736827
mutual_information ( v_choice , d_3_del ): 0.3381390129055559
mutual_information ( v_choice , vd_ins ): 0.36950632876861056
mutual_information ( v_choice , dj_ins ): 0.34050716113668833
mutual_information ( j_choice , d_gene ): 0.16839619808892387
mutual_information ( j_choice , v_3_del ): 0.1373839342864651
mutual_information ( j_choice , d_5_del ): 0.11353204598699257
mutual_information ( j_choice , j_5_del ): 0.44550937210770414
mutual_information ( j_choice , d_3_del ): 0.11443530326748617
mutual_information ( j_choice , vd_ins ): 0.11203857283894578
mutual_information ( j_choice , dj_ins ): 0.09776360260846355
mutual_information ( d_gene , v_3_del ): 0.017241089783731553
mutual_information ( d_gene , d_5_del ): 0.36172414085319887
mutual_information ( d_gene , j_5_del ): 0.021864784540005937
mutual_information ( d_gene , d_3_del ): 0.22353112818847967
mutual_information ( d_gene , vd_ins ): 0.013194437391113211
mutual_information ( d_gene , dj_ins ): 0.01294449164566809
mutual_information ( v_3_del , d_5_del ): 0.06598437525781697
mutual_information ( v_3_del , j_5_del ): 0.09843108497391649
mutual_information ( v_3_del , d_3_del ): 0.06941838791192181
mutual_information ( v_3_del , vd_ins ): 0.10582382685367009
mutual_information ( v_3_del , dj_ins ): 0.07374102134403618
mutual_information ( d_5_del , j_5_del ): 0.08378406617127063
mutual_information ( d_5_del , d_3_del ): 0.4950198124493378
mutual_information ( d_5_del , vd_ins ): 0.10780812230061945
mutual_information ( d_5_del , dj_ins ): 0.07617536406698441
mutual_information ( j_5_del , d_3_del ): 0.08559953020573469
mutual_information ( j_5_del , vd_ins ): 0.07299823195715721
mutual_information ( j_5_del , dj_ins ): 0.10957597446165623
mutual_information ( d_3_del , vd_ins ): 0.06822031197593291
mutual_information ( d_3_del , dj_ins ): 0.08640307827393552
mutual_information ( vd_ins , dj_ins ): 0.06747319489954777
[76]:
da_mi_scenarios#.to_dataframe().dropna()
[76]:
<xarray.DataArray 'mutual_information' (x: 9, y: 9)>
array([[0. , 0.63926236, 0.08894391, 0.70218011, 0.35886454,
0.46679106, 0.33813901, 0.36950633, 0.34050716],
[0.63926236, 0. , 0.1683962 , 0.13738393, 0.11353205,
0.44550937, 0.1144353 , 0.11203857, 0.0977636 ],
[0.08894391, 0.1683962 , 0. , 0.01724109, 0.36172414,
0.02186478, 0.22353113, 0.01319444, 0.01294449],
[0.70218011, 0.13738393, 0.01724109, 0. , 0.06598438,
0.09843108, 0.06941839, 0.10582383, 0.07374102],
[0.35886454, 0.11353205, 0.36172414, 0.06598438, 0. ,
0.08378407, 0.49501981, 0.10780812, 0.07617536],
[0.46679106, 0.44550937, 0.02186478, 0.09843108, 0.08378407,
0. , 0.08559953, 0.07299823, 0.10957597],
[0.33813901, 0.1144353 , 0.22353113, 0.06941839, 0.49501981,
0.08559953, 0. , 0.06822031, 0.08640308],
[0.36950633, 0.11203857, 0.01319444, 0.10582383, 0.10780812,
0.07299823, 0.06822031, 0. , 0.06747319],
[0.34050716, 0.0977636 , 0.01294449, 0.07374102, 0.07617536,
0.10957597, 0.08640308, 0.06747319, 0. ]])
Coordinates:
* x (x) <U8 'v_choice' 'j_choice' 'd_gene' ... 'vd_ins' 'dj_ins'
* y (y) <U8 'v_choice' 'j_choice' 'd_gene' ... 'vd_ins' 'dj_ins'
[77]:
mdl_hb.plot_mutual_information(da_mi_scenarios)
[77]:
<AxesSubplot:>
[78]:
mdl_hb.plot_mutual_information(da_mi)
[78]:
<AxesSubplot:>
Mean and Variance of values in scenarios dataframe¶
The simplest way to get the realization values of an event across all scenarios is the method IgorModel.get_realization_value_from_df_scenarios(df_scenarios, 'event_nickname'), which returns one value per scenario row, indexed by seq_index.
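Each event column of a scenarios dataframe stores a realization id, and the value comes from the model's parameter table (as `mdl_hb.parms['d_gene']` shows, id 0 is `TRBD1*01` with sequence `GGGACAGGGGGC`). A minimal pandas sketch of that lookup on made-up scenarios; it illustrates the idea and is not pygor3's code.

```python
import pandas as pd

# D-gene realizations as listed by mdl_hb.parms['d_gene'] (id -> value, name).
d_gene_parms = pd.DataFrame(
    {"value": ["GGGACAGGGGGC", "GGGACTAGCGGGGGGG", "GGGACTAGCGGGAGGG"],
     "name": ["TRBD1*01", "TRBD2*01", "TRBD2*02"]},
    index=pd.Index([0, 1, 2], name="id"),
)

# Hypothetical scenarios: one row per scenario, the event column holds realization ids.
df_toy = pd.DataFrame({"d_gene": [0, 0, 2, 1]},
                      index=pd.Index([998, 998, 703, 703], name="seq_index"))

# Vectorized id -> value lookup; the seq_index of every scenario is preserved.
d_values = df_toy["d_gene"].map(d_gene_parms["value"])
d_names = df_toy["d_gene"].map(d_gene_parms["name"])
print(d_values)
```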
[79]:
help(mdl_hb.get_realization_value_from_df_scenarios)
Help on method get_realization_value_from_df_scenarios in module pygor3.IgorIO:
get_realization_value_from_df_scenarios(df_scenarios, event_nickname) method of pygor3.IgorIO.IgorModel instance
[80]:
mdl_hb.get_realization_value_from_df_scenarios(df_scenarios_many, 'd_gene')
[80]:
seq_index
998 GGGACAGGGGGC
998 GGGACAGGGGGC
998 GGGACAGGGGGC
998 GGGACAGGGGGC
998 GGGACAGGGGGC
...
703 GGGACTAGCGGGGGGG
703 GGGACTAGCGGGGGGG
703 GGGACTAGCGGGGGGG
703 GGGACAGGGGGC
703 GGGACAGGGGGC
Length: 10000, dtype: object
[81]:
mdl_hb.get_realization_value_from_df_scenarios(df_scenarios_many, 'j_5_del')
[81]:
seq_index
998 15
998 15
998 15
998 15
998 15
..
703 4
703 5
703 4
703 4
703 4
Length: 10000, dtype: int64
Once these values are stored as a column of the scenarios dataframe (df_scenarios_many), their probability-weighted mean and variance can be computed with IgorModel.w_mean_df_scenarios and IgorModel.w_variance_df_scenarios.
[82]:
ps_j_5_del = mdl_hb.get_realization_value_from_df_scenarios(df_scenarios_many, 'j_5_del')
df_scenarios_many['j_5_del_value'] = ps_j_5_del
# ps_j_5_del
[83]:
help(mdl_hb.w_mean_df_scenarios)
help(mdl_hb.w_variance_df_scenarios)
Help on method w_mean_df_scenarios in module pygor3.IgorIO:
w_mean_df_scenarios(column_name: str, df_scenarios: pandas.core.frame.DataFrame) method of pygor3.IgorIO.IgorModel instance
Return weighted mean with the normalized probabilities for each
scenario (norm_scenario_proba_cond_seq)
:param column_name: column name of df_scenario to calculate the average
:param df_scenarios: Scenarios with normalize probability. Loaded with self.get_dataframe_scenarios()
Help on method w_variance_df_scenarios in module pygor3.IgorIO:
w_variance_df_scenarios(colname_1: str, df_scenarios: pandas.core.frame.DataFrame) method of pygor3.IgorIO.IgorModel instance
Return weighted covariance with the normalized probabilities of the column names given for each
scenario (norm_scenario_proba_cond_seq)
:param colname_1: column name of df_scenario to calculate the weighted covariance
:param colname_2: column name of df_scenario to calculate the weighted covariance
:param df_scenarios: Scenarios with normalize probability. Loaded with self.get_dataframe_scenarios()
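Both help texts describe weighting each scenario by its normalized probability. Assuming the weights are normalized by their sum, the weighted mean and variance can be sketched with `numpy.average`; the input values below are hypothetical, not taken from `df_scenarios_many`.

```python
import numpy as np

def w_mean(values, weights):
    """Probability-weighted mean (weights are normalized by their sum)."""
    return float(np.average(values, weights=weights))

def w_variance(values, weights):
    """Probability-weighted variance around the weighted mean."""
    values = np.asarray(values, dtype=float)
    mu = np.average(values, weights=weights)
    return float(np.average((values - mu) ** 2, weights=weights))

# Hypothetical j_5_del values and normalized scenario probabilities.
x = [15, 15, 4]
w = [0.5, 0.25, 0.25]
print(w_mean(x, w), w_variance(x, w))  # 12.25 22.6875
```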
[84]:
mdl_hb.w_mean_df_scenarios('j_5_del_value', df_scenarios_many)
[84]:
4.157669017539739
[85]:
mdl_hb.w_variance_df_scenarios('j_5_del_value', df_scenarios_many)
[85]:
10.537416194979743
[86]:
help(mdl_hb.get_observable_from_df_scenarios)
Help on method get_observable_from_df_scenarios in module pygor3.IgorIO:
get_observable_from_df_scenarios(observable_function, df_scenarios: pandas.core.frame.DataFrame) method of pygor3.IgorIO.IgorModel instance
Return a pandas series with the calculated observable over the df_scenarios dataframe.
:param observable_function: This function should use the varibles with self.realization
:param df_scenarios: Scenarios dataframe loaded with self.get_dataframe_scenarios.
[87]:
def f_total_insertions(ps_scenario):
    """Total number of inserted nucleotides (VD + DJ) in one scenario."""
    vd_ins = mdl.realization(ps_scenario, 'vd_ins')
    dj_ins = mdl.realization(ps_scenario, 'dj_ins')
    return vd_ins.value + dj_ins.value

def f_total_deletions(ps_scenario):
    """Sum of the V 3', D 5', D 3' and J 5' deletion values in one scenario."""
    v_3_del = mdl.realization(ps_scenario, 'v_3_del')
    d_5_del = mdl.realization(ps_scenario, 'd_5_del')
    d_3_del = mdl.realization(ps_scenario, 'd_3_del')
    j_5_del = mdl.realization(ps_scenario, 'j_5_del')
    return v_3_del.value + d_5_del.value + d_3_del.value + j_5_del.value
[88]:
mean_insertions = mdl.w_average_function_df_scenarios(f_total_insertions, df_scenarios_many)
mean_deletions = mdl.w_average_function_df_scenarios(f_total_deletions, df_scenarios_many)
[89]:
mean_insertions, mean_deletions
[89]:
(10.202437426325016, 15.645906964009958)
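The pattern behind these numbers (evaluate an observable on each scenario, then average it with the scenario probabilities) can be sketched in plain pandas. The column names `vd_ins_value` and `dj_ins_value` and the data are hypothetical, and this is not pygor3's implementation of `w_average_function_df_scenarios`.

```python
import numpy as np
import pandas as pd

# Hypothetical scenarios whose realization values have already been looked up.
df_toy = pd.DataFrame({
    "vd_ins_value": [4, 5, 9],
    "dj_ins_value": [7, 7, 1],
    "norm_scenario_proba_cond_seq": [0.6, 0.3, 0.1],
})

def total_insertions(row):
    """Observable evaluated on a single scenario row."""
    return row["vd_ins_value"] + row["dj_ins_value"]

# One observable value per scenario, then its probability-weighted average.
obs = df_toy.apply(total_insertions, axis=1)
mean_obs = np.average(obs, weights=df_toy["norm_scenario_proba_cond_seq"])
print(obs.tolist(), mean_obs)
```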
[90]:
def f_give_thename(ps_scenario):
    """Return the V-gene realization (name and sequence) of one scenario."""
    v_choice = mdl_hb.realization(ps_scenario, 'v_choice')
    return v_choice
[91]:
mdl_hb.get_observable_from_df_scenarios(f_give_thename, df_scenarios=df_scenarios_many)
[91]:
seq_index
998 TRBV7-4*01;GGTGCTGGAGTCTCCCAGTCCCCAAGGTACAAAGT...
998 TRBV7-4*01;GGTGCTGGAGTCTCCCAGTCCCCAAGGTACAAAGT...
998 TRBV7-4*01;GGTGCTGGAGTCTCCCAGTCCCCAAGGTACAAAGT...
998 TRBV7-4*01;GGTGCTGGAGTCTCCCAGTCCCCAAGGTACAAAGT...
998 TRBV7-4*01;GGTGCTGGAGTCTCCCAGTCCCCAAGGTACAAAGT...
...
703 TRBV4-1*01;ACTGAAGTTACCCAGACACCAAAACACCTGGTCAT...
703 TRBV4-1*01;ACTGAAGTTACCCAGACACCAAAACACCTGGTCAT...
703 TRBV4-1*01;ACTGAAGTTACCCAGACACCAAAACACCTGGTCAT...
703 TRBV4-1*01;ACTGAAGTTACCCAGACACCAAAACACCTGGTCAT...
703 TRBV4-1*01;ACTGAAGTTACCCAGACACCAAAACACCTGGTCAT...
Length: 10000, dtype: object
[92]:
df_scenarios_many['v_call'] = mdl_hb.get_observable_from_df_scenarios(f_give_thename, df_scenarios=df_scenarios_many)
df_scenarios_many
[92]:
| seq_index | scenario_rank | scenario_proba_cond_seq | v_choice | j_choice | d_gene | v_3_del | d_5_del | j_5_del | d_3_del | vd_ins | vd_dinucl | dj_ins | dj_dinucl | Mismatches | norm_scenario_proba_cond_seq | j_5_del_value | v_call |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 998 | 1 | 0.783497 | 77 | 4 | 0 | 9 | 6 | 19 | 9 | 4 | [1, 1, 2, 3] | 7 | [2, 2, 2, 2, 0, 2, 0] | [] | 0.000801 | 15 | TRBV7-4*01;GGTGCTGGAGTCTCCCAGTCCCCAAGGTACAAAGT... |
| 998 | 2 | 0.076884 | 77 | 4 | 0 | 9 | 6 | 19 | 10 | 4 | [1, 1, 2, 3] | 8 | [2, 2, 2, 2, 0, 2, 0, 2] | [] | 0.000079 | 15 | TRBV7-4*01;GGTGCTGGAGTCTCCCAGTCCCCAAGGTACAAAGT... |
| 998 | 3 | 0.043634 | 77 | 4 | 0 | 9 | 7 | 19 | 9 | 5 | [1, 1, 2, 3, 2] | 7 | [2, 2, 2, 2, 0, 2, 0] | [] | 0.000045 | 15 | TRBV7-4*01;GGTGCTGGAGTCTCCCAGTCCCCAAGGTACAAAGT... |
| 998 | 4 | 0.023517 | 77 | 4 | 0 | 10 | 6 | 19 | 9 | 5 | [1, 1, 1, 2, 3] | 7 | [2, 2, 2, 2, 0, 2, 0] | [] | 0.000024 | 15 | TRBV7-4*01;GGTGCTGGAGTCTCCCAGTCCCCAAGGTACAAAGT... |
| 998 | 5 | 0.021809 | 77 | 4 | 0 | 9 | 8 | 19 | 9 | 6 | [1, 1, 2, 3, 2, 0] | 7 | [2, 2, 2, 2, 0, 2, 0] | [] | 0.000022 | 15 | TRBV7-4*01;GGTGCTGGAGTCTCCCAGTCCCCAAGGTACAAAGT... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 703 | 6 | 0.021822 | 45 | 12 | 1 | 6 | 11 | 8 | 7 | 6 | [2, 2, 1, 0, 3, 3] | 5 | [2, 0, 3, 1, 1] | [] | 0.000030 | 4 | TRBV4-1*01;ACTGAAGTTACCCAGACACCAAAACACCTGGTCAT... |
| 703 | 7 | 0.016150 | 45 | 12 | 1 | 5 | 11 | 9 | 7 | 5 | [2, 1, 0, 3, 3] | 6 | [2, 2, 0, 3, 1, 1] | [] | 0.000022 | 5 | TRBV4-1*01;ACTGAAGTTACCCAGACACCAAAACACCTGGTCAT... |
| 703 | 8 | 0.013252 | 45 | 12 | 1 | 5 | 11 | 8 | 8 | 5 | [2, 1, 0, 3, 3] | 6 | [2, 0, 3, 1, 1, 2] | [] | 0.000018 | 4 | TRBV4-1*01;ACTGAAGTTACCCAGACACCAAAACACCTGGTCAT... |
| 703 | 9 | 0.013172 | 45 | 12 | 0 | 6 | 11 | 8 | 4 | 8 | [2, 2, 1, 0, 3, 3, 2, 1] | 4 | [2, 0, 3, 1] | [] | 0.000018 | 4 | TRBV4-1*01;ACTGAAGTTACCCAGACACCAAAACACCTGGTCAT... |
| 703 | 10 | 0.010846 | 45 | 12 | 0 | 5 | 12 | 8 | 4 | 8 | [2, 1, 0, 3, 3, 2, 1, 2] | 4 | [2, 0, 3, 1] | [] | 0.000015 | 4 | TRBV4-1*01;ACTGAAGTTACCCAGACACCAAAACACCTGGTCAT... |
10000 rows × 17 columns
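With `v_call` attached, you can aggregate scenario-level information per gene. The sketch below estimates V-gene usage. Each sequence's scenario probabilities are normalized, then spread over the genes of its scenarios, and the result is averaged across sequences. The function name `weighted_gene_usage` and the toy rows are hypothetical. Plain gene-name strings stand in for the realization objects stored in `v_call`.

```python
from collections import defaultdict


def weighted_gene_usage(rows):
    """Average, over sequences, of the normalized scenario probability per gene."""
    by_seq = defaultdict(list)
    for seq_index, proba, gene in rows:
        by_seq[seq_index].append((proba, gene))

    usage = defaultdict(float)
    n_seqs = len(by_seq)
    for scenarios in by_seq.values():
        total = sum(p for p, _ in scenarios)
        for p, gene in scenarios:
            # Each sequence contributes a total mass of 1 / n_seqs.
            usage[gene] += p / total / n_seqs
    return dict(usage)


# Toy rows: (seq_index, scenario probability, V gene of that scenario).
rows = [
    (998, 0.375, "TRBV7-4*01"),
    (998, 0.125, "TRBV4-1*01"),
    (703, 0.25, "TRBV4-1*01"),
    (703, 0.25, "TRBV4-1*01"),
]
print(sorted(weighted_gene_usage(rows).items()))
# [('TRBV4-1*01', 0.625), ('TRBV7-4*01', 0.375)]
```

The usage values sum to 1 by construction, so they can be read as an estimated gene-usage distribution over the sample.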
[93]:
ps_scenario = df_scenarios_many.iloc[0]
realiz = mdl_hb.realization(ps_scenario, 'v_choice')
realiz.value, realiz.id, realiz.name
[93]:
('GGTGCTGGAGTCTCCCAGTCCCCAAGGTACAAAGTCGCAAAGAGGGGACGGGATGTAGCTCTCAGGTGTGATTCAATTTCGGGTCATGTAACCCTTTATTGGTACCGACAGACCCTGGGGCAGGGCTCAGAGGTTCTGACTTACTCCCAGAGTGATGCTCAACGAGACAAATCAGGGCGGCCCAGTGGTCGGTTCTCTGCAGAGAGGCCTGAGAGATCCGTCTCCACTCTGAAGATCCAGCGCACAGAGCAGGGGGACTCAGCTGTGTATCTCTGTGCCAGCAGCTTAGC',
77,
'TRBV7-4*01')
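A realization bundles three pieces of information. Its `id` is the index stored in the scenario table (here 77). Its `name` is the allele name, and its `value` is the germline sequence for a gene choice. The dataclass below is a hypothetical stand-in, not pygor3's own class. It mimics those attributes and the `name;value` string form visible in the `v_call` column, and `lookup` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Realization:
    """Hypothetical stand-in for a realization record (not pygor3's class)."""
    id: int        # index stored in the scenario table, e.g. v_choice = 77
    name: str      # allele name, e.g. 'TRBV7-4*01'
    value: object  # germline sequence for gene choices, a number for ins/del events

    def __str__(self):
        # Mimics the 'name;value' form shown in the v_call column.
        return f"{self.name};{self.value}"


def lookup(tables, event, scenario):
    """Return the realization whose id equals the index stored for `event`."""
    return tables[event][scenario[event]]


# Only the first 20 nucleotides of the TRBV7-4*01 sequence are kept here.
tables = {"v_choice": {77: Realization(77, "TRBV7-4*01", "GGTGCTGGAGTCTCCCAGTC")}}
realiz = lookup(tables, "v_choice", {"v_choice": 77})
print(realiz.value, realiz.id, realiz.name)
print(realiz)  # TRBV7-4*01;GGTGCTGGAGTCTCCCAGTC
```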