Metadata-Version: 2.1
Name: codx
Version: 0.1.1
Summary: A package used to retrieve exons for protein sequences from the RefSeqGene database
License: MIT
Author: Toan Phung
Author-email: toan.phungkhoiquoctoan@gmail.com
Requires-Python: >=3.9,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: biopython (>=1.81,<2.0)
Requires-Dist: click (>=8.1.3,<9.0.0)
Requires-Dist: pandas (>=1.5.3,<2.0.0)
Requires-Dist: uniprotparser (>=1.0.9,<2.0.0)
Description-Content-Type: text/markdown

# CODX
---

`codx` is a Python package that allows retrieval of exon data from the NCBI RefSeqGene database.

## Installation

```bash
pip install codx
```

## Usage
---
The package uses NCBI gene ids to retrieve exon data from the NCBI RefSeqGene database. If you only have UniProt accession ids, the corresponding gene ids can be looked up in the UniProt database with the `get_geneids_from_uniprot` function.


```python
# If you only have UniProt accession ids, first use get_geneids_from_uniprot to look up the corresponding gene ids in UniProt
from codx.components import get_geneids_from_uniprot

gene_ids = get_geneids_from_uniprot(["P35568", "P05019", "Q99490", "Q8NEJ0", "Q13322", "Q15323"])
# The result is a set of the gene ids that UniProt associates with the accession ids listed above
```



```python
# Import the create_db function to create a sqlite3 database with gene and exon data from NCBI
from codx.components import create_db


# 120892 is the gene id of the LRRK2 gene
db = create_db(["120892"], entrez_email="your@email.com") # You need to provide an email address to use the NCBI API

# From the database object, you can retrieve a gene object using its gene name
gene = db.get_gene("LRRK2")

# From the gene object, you can retrieve exon data through the blocks attribute
# Each exon object has a start and end location as well as the associated sequence
for exon in gene.blocks:
    print(exon.start, exon.end, exon.sequence)

# Using the gene object, it is also possible to create all possible ordered combinations of exons
# shuffle_blocks returns a generator that yields a SeqRecord object for each combination
# The number of combinations can be very large, so avoid this on genes with many exons unless there is no other option
for exon_combination in gene.shuffle_blocks():
    print(exon_combination)

# To create a six-frame translation of any sequence, call three_frame_translation twice: once without and once with the reverse_complement option enabled
# Each output is a dictionary with the frame as key and the translatable sequence as value
from codx.components import three_frame_translation
for exon_combination in gene.shuffle_blocks():
    three_frame = three_frame_translation(exon_combination.seq, only_start_at_atg=True)
    three_frame_complement = three_frame_translation(exon_combination.seq, only_start_at_atg=True, reverse_complement=True)

```
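
To make the two ideas above concrete, here is a minimal, self-contained sketch on toy data. It does not use `codx` and is not its implementation. The helper names `ordered_exon_combinations` and `translate_three_frames`, the toy exon sequences, and the frame keys `1` to `3` are all assumptions made for illustration. It assumes "ordered combinations" means every non-empty subset of exons joined in their original order, and it omits the `only_start_at_atg` filtering.

```python
from itertools import combinations

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def ordered_exon_combinations(exons):
    """Yield every non-empty subset of exons, joined in their original order.

    Hypothetical helper: an assumption about what "ordered combinations" means.
    """
    for size in range(1, len(exons) + 1):
        # combinations() preserves input order, so exons are never reordered.
        for subset in combinations(exons, size):
            yield "".join(subset)


def translate_three_frames(dna, reverse_complement=False):
    """Return {frame: protein} for the three frames of one strand.

    Hypothetical helper: stop codons are kept as '*' and no ATG filter is applied.
    """
    if reverse_complement:
        dna = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        # Only full codons are translated; trailing bases are ignored.
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        frames[offset + 1] = "".join(CODON_TABLE[codon] for codon in codons)
    return frames


# Toy exon sequences, invented for this example.
toy_exons = ["ATGGCC", "AAATTT", "GGGTAA"]
spliced = list(ordered_exon_combinations(toy_exons))

# Six-frame translation: three forward frames plus three reverse-complement frames.
forward = translate_three_frames(spliced[-1])
reverse = translate_three_frames(spliced[-1], reverse_complement=True)
```

Under this assumption, a gene with `n` exons yields `2**n - 1` combinations, which is why the loop above can become impractical for genes with many exons.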

