API
Sequence Composition Metrics
-
seqm.polydict(seq, nuc='ACGT') Compute the length of the longest homopolymer run for each specified nucleotide.
- Parameters
seq (str) – Nucleotide sequence
nuc (str) – Nucleotides to report (default 'ACGT')
Examples
>>> sequtils.polydict('AAAACCGT')
{'A': 4, 'C': 2, 'G': 1, 'T': 1}
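As a rough illustration of what "longest homopolymer per nucleotide" means, here is a minimal sketch using `itertools.groupby`. The name `polydict_sketch` is hypothetical and this is not seqm's implementation; it also assumes that requested nucleotides absent from the sequence are reported as 0.

```python
from itertools import groupby


def polydict_sketch(seq: str, nuc: str = "ACGT") -> dict:
    """Hypothetical sketch: longest homopolymer run for each base in `nuc`."""
    # Start every requested base at 0 so absent bases still appear (assumption).
    longest = {base: 0 for base in nuc}
    # groupby yields (base, run) pairs for each maximal stretch of identical characters.
    for base, run in groupby(seq):
        if base in longest:
            longest[base] = max(longest[base], len(list(run)))
    return longest


print(polydict_sketch("AAAACCGT"))  # {'A': 4, 'C': 2, 'G': 1, 'T': 1}
```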
-
seqm.polylength(seq) Calculate the length of the longest homopolymer stretch in the sequence.
- Parameters
seq (str) – Nucleotide sequence
Examples
>>> sequtils.polylength('AAAACCGT')
4
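A minimal sketch of the same quantity, assuming it is simply the longest run of any single repeated character. `polylength_sketch` is a hypothetical name, and returning 0 for an empty sequence is an assumption.

```python
from itertools import groupby


def polylength_sketch(seq: str) -> int:
    """Hypothetical sketch: length of the longest run of one repeated base."""
    # Each groupby run is a maximal stretch of identical characters;
    # an empty sequence has no runs, so default to 0 (assumption).
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


print(polylength_sketch("AAAACCGT"))  # 4
```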
-
seqm.entropy(seq) Calculate the Shannon entropy (in bits, i.e. log base 2) of the sequence's base composition.
- Parameters
seq (str) – Nucleotide sequence
Examples
>>> sequtils.entropy('AGGATAAG')
1.40
>>> sequtils.entropy('AAAACCGT')
1.75
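The 1.75 result for 'AAAACCGT' (base frequencies 1/2, 1/4, 1/8, 1/8) is what H = -Σ p·log2(p) gives, so the sketch below assumes base-2 logarithms over the observed symbol frequencies. `entropy_sketch` is hypothetical, not seqm's code.

```python
import math
from collections import Counter


def entropy_sketch(seq: str) -> float:
    """Hypothetical sketch: Shannon entropy (bits) of the base composition."""
    counts = Counter(seq)
    total = len(seq)
    # H = -sum(p * log2(p)) over the symbols that actually occur.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(entropy_sketch("AAAACCGT"))  # 1.75
```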
-
seqm.gc_percent(seq) Return the fraction (between 0 and 1, not a percentage) of G and C bases in the sequence.
- Parameters
seq (str) – Nucleotide sequence
Examples
>>> sequtils.gc_percent('AGGATAAG')
0.375
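A minimal sketch of the GC fraction as documented. `gc_percent_sketch` is a hypothetical name, and treating an empty sequence as 0.0 is an assumption.

```python
def gc_percent_sketch(seq: str) -> float:
    """Hypothetical sketch: fraction of G and C bases (0..1, not 0..100)."""
    if not seq:
        return 0.0  # assumption: define the empty sequence as 0
    gc = sum(1 for base in seq.upper() if base in "GC")
    return gc / len(seq)


print(gc_percent_sketch("AGGATAAG"))  # 0.375
```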
-
seqm.gc_skew(seq) Calculate the GC skew, (g - c)/(g + c), of the sequence, where g and c are the counts of G and C. For sequences with no G or C (e.g. poly-A stretches), the skew is reported as zero.
- Parameters
seq (str) – Nucleotide sequence
Examples
>>> sequtils.gc_skew('AGGATAAG')
1.0
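A minimal sketch that evaluates the documented formula literally, including the zero fallback when the sequence has no G or C. `gc_skew_sketch` is hypothetical and may differ from seqm's actual code.

```python
def gc_skew_sketch(seq: str) -> float:
    """Hypothetical sketch of the documented formula (g - c) / (g + c)."""
    s = seq.upper()
    g = s.count("G")
    c = s.count("C")
    if g + c == 0:
        return 0.0  # no G or C: report zero, as described above
    return (g - c) / (g + c)


print(gc_skew_sketch("AGGATAAG"))  # 1.0  (three G, no C)
print(gc_skew_sketch("AAAAAAAA"))  # 0.0
```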
-
seqm.gc_shift(seq) Calculate the GC shift, (a + t)/(g + c), of the sequence. For sequences with no G or C (e.g. poly-A stretches), the shift is reported as the number of bases in the sequence.
- Parameters
seq (str) – Nucleotide sequence
Examples
>>> sequtils.gc_shift('AGGATAAG')
1.67
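A minimal sketch of the documented GC shift, including the fallback to the sequence length when there is no G or C. `gc_shift_sketch` is a hypothetical name, not seqm's implementation.

```python
def gc_shift_sketch(seq: str) -> float:
    """Hypothetical sketch of the documented formula (a + t) / (g + c)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if gc == 0:
        # No G or C: fall back to the number of bases, as described above.
        return float(len(s))
    return at / gc


print(round(gc_shift_sketch("AGGATAAG"), 2))  # 1.67
```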
-
seqm.dna_weight(seq) Return the molecular weight (g/mol) of a triphosphate DNA sequence.
See https://www.thermofisher.com/us/en/home/references/ambion-tech-support/rna-tools-and-calculators/dna-and-rna-molecular-weights-and-conversions.html for details on conversions.
- Parameters
seq (str) – Nucleotide sequence
Examples
>>> sequtils.dna_weight('AGGATAAG')
3968.59
-
seqm.rna_weight(seq) Return the molecular weight (g/mol) of a triphosphate RNA sequence.
See https://www.thermofisher.com/us/en/home/references/ambion-tech-support/rna-tools-and-calculators/dna-and-rna-molecular-weights-and-conversions.html for details on conversions.
- Parameters
seq (str) – Nucleotide sequence
Examples
>>> sequtils.rna_weight('AGGATAAG')
4082.59
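The example weights for dna_weight and rna_weight are reproduced, to within rounding, by summing one nucleoside-triphosphate weight per base. The sketch below assumes exactly that, with dATP 491.2, dCTP 467.2, dGTP 507.2, dTTP 482.2 for DNA and ATP 507.2, CTP 483.2, GTP 523.2, UTP 484.2 for RNA (T read as U). These constants and the function names are assumptions, not seqm's internal table; check them against the Thermo Fisher reference linked above.

```python
# Assumed per-base weights (g/mol) of the free nucleoside triphosphates.
DNTP_WEIGHTS = {"A": 491.2, "C": 467.2, "G": 507.2, "T": 482.2}
NTP_WEIGHTS = {"A": 507.2, "C": 483.2, "G": 523.2, "U": 484.2}


def dna_weight_sketch(seq: str) -> float:
    """Hypothetical sketch: sum of dNTP weights, one per base."""
    return sum(DNTP_WEIGHTS[base] for base in seq.upper())


def rna_weight_sketch(seq: str) -> float:
    """Hypothetical sketch: sum of NTP weights, reading T as U."""
    return sum(NTP_WEIGHTS[base] for base in seq.upper().replace("T", "U"))


print(round(dna_weight_sketch("AGGATAAG"), 1))  # 3968.6
print(round(rna_weight_sketch("AGGATAAG"), 1))  # 4082.6
```

This is a plain sum of monomer weights and raises `KeyError` on ambiguous bases such as N; it makes no correction for phosphodiester bond formation.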
-
seqm.aa_weight(seq) Return the molecular weight (g/mol) of an amino acid sequence.
- Parameters
seq (str) – Amino acid sequence (one-letter codes)
Examples
>>> sequtils.aa_weight('AGGATAAG')
700.8
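The 700.8 example (four Ala, three Gly, one Thr) matches a simple sum of free amino acid weights rounded to one decimal, so the sketch below assumes that scheme. The weight table and `aa_weight_sketch` are assumptions for illustration, not seqm's data.

```python
# Assumed average masses (g/mol) of the free amino acids, by one-letter code.
AA_WEIGHTS = {
    "A": 89.1, "R": 174.2, "N": 132.1, "D": 133.1, "C": 121.2,
    "E": 147.1, "Q": 146.2, "G": 75.1, "H": 155.2, "I": 131.2,
    "L": 131.2, "K": 146.2, "M": 149.2, "F": 165.2, "P": 115.1,
    "S": 105.1, "T": 119.1, "W": 204.2, "Y": 181.2, "V": 117.1,
}


def aa_weight_sketch(seq: str) -> float:
    """Hypothetical sketch: sum of free amino acid weights, one per residue."""
    return sum(AA_WEIGHTS[residue] for residue in seq.upper())


print(round(aa_weight_sketch("AGGATAAG"), 1))  # 700.8
```

A chemically exact peptide mass would also subtract about 18.02 g/mol of water per peptide bond; this sketch does not, because the example value above appears not to.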
-
seqm.zipsize(seq) Calculate the size of the gzip-compressed sequence.
- Parameters
seq (str) – Sequence
Examples
>>> sequtils.zipsize('AGGATAAGAGATAGATTT')
39.31
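Compressed size is a quick proxy for sequence complexity: repetitive, low-complexity sequences compress well. The sketch below just reports the byte length from Python's `gzip.compress`. Since the example above returns a non-integer, seqm's exact measure may be scaled differently; `zipsize_sketch` is hypothetical.

```python
import gzip


def zipsize_sketch(seq: str) -> int:
    """Hypothetical sketch: byte length of the gzip-compressed sequence."""
    return len(gzip.compress(seq.encode("ascii")))


print(zipsize_sketch("AGGATAAGAGATAGATTT"))
print(zipsize_sketch("A" * 1000))  # far smaller than 1000: long runs compress well
```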
Domain Conversion
-
seqm.revcomplement(seq) Return the reverse complement of the sequence.
- Parameters
seq (str) – Nucleotide sequence
Examples
>>> sequtils.revcomplement('AACCTT')
'AAGGTT'
-
seqm.complement(seq) Return the complement of the sequence.
- Parameters
seq (str) – Nucleotide sequence
Examples
>>> sequtils.complement('AACCTT')
'TTGGAA'
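Both conversions can be sketched with a single translation table: complement maps each base to its Watson-Crick partner, and the reverse complement additionally reverses the result. The function names are hypothetical; characters outside the table (e.g. N) pass through unchanged here, which is an assumption about the desired behaviour.

```python
# Watson-Crick complement table for upper- and lower-case DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement_sketch(seq: str) -> str:
    """Hypothetical sketch: complement each base, keeping the order."""
    return seq.translate(_COMPLEMENT)


def revcomplement_sketch(seq: str) -> str:
    """Hypothetical sketch: complement each base, then reverse the result."""
    return complement_sketch(seq)[::-1]


print(complement_sketch("AACCTT"))     # TTGGAA
print(revcomplement_sketch("AACCTT"))  # AAGGTT
```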
-
seqm.aa(seq) Return the amino acid translation of the sequence. Trailing bases that do not form a complete codon are clipped.
- Parameters
seq (str) – Nucleotide sequence
Examples
>>> sequtils.aa('ATGTAG')
'M*'
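A minimal translation sketch, assuming the standard genetic code (NCBI table 1), reading frame 0, and `*` for stop codons as in the example. The compact 64-character table below enumerates codons in TCAG order; `aa_sketch` and `CODON_TABLE` are hypothetical names.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def aa_sketch(seq: str) -> str:
    """Hypothetical sketch: translate frame 0, clipping an incomplete final codon."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop trailing bases that don't fill a codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


print(aa_sketch("ATGTAG"))    # M*
print(aa_sketch("ATGTAGCA"))  # M*  (the trailing "CA" is clipped)
```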
-
seqm.likelihood(seq) Translate a sequence of quality scores into error likelihoods.
- Parameters
seq (str) – Sequence of quality scores.
-
seqm.qscore(seq) Translate a sequence of quality scores into Phred-scaled likelihoods.
- Parameters
seq (str) – Sequence of quality scores.
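The entries above do not say how the quality string is encoded. The sketch below assumes FASTQ-style ASCII characters with the common Phred+33 offset, decodes them to Phred scores Q, and converts each to an error probability using the Phred definition Q = -10·log10(P). Both function names and the `offset` parameter are hypothetical.

```python
def qscore_sketch(quals: str, offset: int = 33) -> list:
    """Hypothetical sketch: decode ASCII quality characters to Phred scores Q."""
    return [ord(ch) - offset for ch in quals]


def likelihood_sketch(quals: str, offset: int = 33) -> list:
    """Hypothetical sketch: convert each Phred score to an error probability."""
    # Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10).
    return [10 ** (-q / 10) for q in qscore_sketch(quals, offset)]


print(qscore_sketch("!+5I"))      # [0, 10, 20, 40]
print(likelihood_sketch("+5I"))   # roughly [0.1, 0.01, 0.0001]
```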
Sequence Similarity Metrics
-
seqm.hamming(seq1, seq2) Calculate the Hamming distance (number of mismatched positions) between two sequences.
- Parameters
seq1 (str) – Reference sequence
seq2 (str) – Sequence to compare
Examples
>>> hamming('AACCTT', 'AAGCTT')
1
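A minimal sketch of the standard Hamming distance. It assumes equal-length inputs and raises on a length mismatch; how seqm itself handles unequal lengths is not documented here. `hamming_sketch` is hypothetical.

```python
def hamming_sketch(seq1: str, seq2: str) -> int:
    """Hypothetical sketch: count mismatched positions in equal-length sequences."""
    if len(seq1) != len(seq2):
        # The standard definition only covers equal lengths (assumption:
        # seqm's own behaviour for unequal lengths is not stated).
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq1, seq2))


print(hamming_sketch("AACCTT", "AAGCTT"))  # 1
```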
-
seqm.edit(seq1, seq2) Calculate the Levenshtein (edit) distance between two sequences. Wrapper around editdistance.eval for fast computation.
- Parameters
seq1 (str) – Reference sequence
seq2 (str) – Sequence to compare
Examples
>>> edit('banana', 'bahama')
2
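seqm delegates this to `editdistance.eval`, a compiled implementation. To show what quantity is being computed, here is a pure-Python reference sketch of the classic dynamic-programming Levenshtein distance (unit-cost insertions, deletions and substitutions, O(len(seq1)·len(seq2)) time). `levenshtein_sketch` is a hypothetical name.

```python
def levenshtein_sketch(seq1: str, seq2: str) -> int:
    """Hypothetical pure-Python Levenshtein distance (unit-cost edits)."""
    # prev[j] is the distance between the processed prefix of seq1 and seq2[:j].
    prev = list(range(len(seq2) + 1))
    for i, a in enumerate(seq1, start=1):
        curr = [i]
        for j, b in enumerate(seq2, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a
                curr[j - 1] + 1,         # insert b
                prev[j - 1] + (a != b),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein_sketch("banana", "bahama"))  # 2
```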
Objects
-
class seqm.Sequence(sequence) Object for storing a sequence and operating on it (e.g. getting the amino acid sequence, reverse complement, GC content, etc.).
- Parameters
sequence (str) – Nucleotide sequence.
Examples
>>> seq = sequtils.Sequence('ACGTACGT')
>>> seq.gc_percent
0.5
>>> seq.revcomplement
ACGTACGT
>>> seq.dna_weight
3895.59
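The attributes listed below are thin wrappers around the module-level functions. A minimal sketch of that pattern follows: plain functions on strings, and a class that stores the sequence and exposes them as properties, plus a method that accepts either a string or another instance. `MiniSequence` and its helper functions are hypothetical stand-ins, not seqm's classes.

```python
def gc_percent(seq: str) -> float:
    """Fraction of G/C bases (hypothetical helper standing in for seqm's)."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def revcomplement(seq: str) -> str:
    """Reverse complement (hypothetical helper standing in for seqm's)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def hamming(seq1: str, seq2: str) -> int:
    """Mismatch count for equal-length sequences (hypothetical helper)."""
    return sum(a != b for a, b in zip(seq1, seq2))


class MiniSequence:
    """Hypothetical stand-in for seqm.Sequence: stores a string and exposes
    the module-level functions as properties or methods."""

    def __init__(self, sequence: str):
        self.sequence = sequence.upper()

    def __str__(self) -> str:
        return self.sequence

    @property
    def gc_percent(self) -> float:
        return gc_percent(self.sequence)  # resolves to the module-level function

    @property
    def revcomplement(self) -> str:
        return revcomplement(self.sequence)

    def hamming(self, other) -> int:
        # Accept a plain string or another MiniSequence.
        return hamming(self.sequence, str(other))


seq = MiniSequence("ACGTACGT")
print(seq.gc_percent)                         # 0.5
print(seq.revcomplement)                      # ACGTACGT
print(seq.hamming(MiniSequence("ACGAACGT")))  # 1
```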
-
aa Wrapper around sequtils.aa() for the sequtils.Sequence object.
-
aa_weight Wrapper around sequtils.aa_weight() for the sequtils.Sequence object.
-
complement Wrapper around sequtils.complement() for the sequtils.Sequence object.
-
dna_weight Wrapper around sequtils.dna_weight() for the sequtils.Sequence object.
-
edit(other) Wrapper around sequtils.edit() for the sequtils.Sequence object.
- Parameters
other (str, Sequence) – Sequence to compare.
-
entropy Wrapper around sequtils.entropy() for the sequtils.Sequence object.
-
gc_percent Wrapper around sequtils.gc_percent() for the sequtils.Sequence object.
-
gc_shift Wrapper around sequtils.gc_shift() for the sequtils.Sequence object.
-
gc_skew Wrapper around sequtils.gc_skew() for the sequtils.Sequence object.
-
hamming(other) Wrapper around sequtils.hamming() for the sequtils.Sequence object.
- Parameters
other (str, Sequence) – Sequence to compare.
-
polydict Wrapper around sequtils.polydict() for the sequtils.Sequence object.
-
polylength Wrapper around sequtils.polylength() for the sequtils.Sequence object.
-
revcomplement Wrapper around sequtils.revcomplement() for the sequtils.Sequence object.
-
rna_weight Wrapper around sequtils.rna_weight() for the sequtils.Sequence object.
-
wrap(bases=60) Wrapper around sequtils.wrap() for the sequtils.Sequence object.
- Parameters
bases (int) – Number of bases to include in line.
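sequtils.wrap() itself is not documented in this section. The sketch below assumes it splits the sequence into newline-separated lines of at most `bases` characters, as in FASTA files, with the same default of 60. `wrap_sketch` and its return type (a single string) are assumptions.

```python
def wrap_sketch(seq: str, bases: int = 60) -> str:
    """Hypothetical sketch: split a sequence into lines of at most `bases` characters."""
    if bases < 1:
        raise ValueError("bases must be positive")
    return "\n".join(seq[i:i + bases] for i in range(0, len(seq), bases))


print(wrap_sketch("ACGTACGTAC", bases=4))
# ACGT
# ACGT
# AC
```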
-
zipsize Wrapper around sequtils.zipsize() for the sequtils.Sequence object.