.. _dna-rna-seqs:

``Sequence``
============

The ``Sequence`` object contains classes that represent biological sequence data. These provide generic biological sequence manipulation functions, plus functions that are critical for the ``evolve`` module calculations.

.. warning:: Do not import sequence classes directly! It is expected that you will access them through ``MolType`` objects. The molecular types can be accessed via the ``cogent3.get_moltype()`` function. Sequence classes depend on information from the ``MolType`` that is **only** available after ``MolType`` has been imported. Sequences are intended to be immutable. This is not enforced by the code for performance reasons, but don't alter the ``MolType`` or the sequence data after creation.

DNA and RNA sequences
---------------------

.. authors, Gavin Huttley, Kristian Rother, Patrick Yannul, Tom Elliott, Tony Walters, Meg Pirrung

Creating a DNA sequence from a string
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All sequence and alignment objects have a molecular type, or ``MolType`` which provides key properties for validating sequence characters. Here we use the ``DNA`` ``MolType`` to create a DNA sequence.

.. jupyter-execute::

    from cogent3 import DNA

    my_seq = DNA.make_seq("AGTACACTGGT")
    my_seq
    print(my_seq)
    str(my_seq)

Creating a RNA sequence from a string
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import RNA

    rnaseq = RNA.make_seq("ACGUACGUACGUACGU")

Converting to FASTA format
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import DNA

    my_seq = DNA.make_seq("AGTACACTGGT")
    print(my_seq.to_fasta())

Convert a RNA sequence to FASTA format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import RNA

    rnaseq = RNA.make_seq("ACGUACGUACGUACGU")
    rnaseq.to_fasta()

Creating a named sequence
^^^^^^^^^^^^^^^^^^^^^^^^^

You can also use a convenience ``make_seq()`` function, providing the moltype as a string.

.. jupyter-execute::

    from cogent3 import make_seq

    my_seq = make_seq("AGTACACTGGT", "my_gene", moltype="dna")
    my_seq
    type(my_seq)

Setting or changing the name of a sequence
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import make_seq

    my_seq = make_seq("AGTACACTGGT", moltype="dna")
    my_seq.name = "my_gene"
    print(my_seq.to_fasta())

Complementing a DNA sequence
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import DNA

    my_seq = DNA.make_seq("AGTACACTGGT")
    print(my_seq.complement())

Reverse complementing a DNA sequence
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    print(my_seq.rc())

The ``rc`` method name is easier to type

.. jupyter-execute::

    print(my_seq.rc())

.. _translation:

Translate a ``DnaSequence`` to protein
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import DNA

    my_seq = DNA.make_seq("GCTTGGGAAAGTCAAATGGAA", "protein-X")
    pep = my_seq.get_translation()
    type(pep)
    print(pep.to_fasta())

Converting a DNA sequence to RNA
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import DNA

    my_seq = DNA.make_seq("ACGTACGTACGTACGT")
    print(my_seq.to_rna())

Convert an RNA sequence to DNA
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import RNA

    rnaseq = RNA.make_seq("ACGUACGUACGUACGU")
    print(rnaseq.to_dna())

Testing complementarity
^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import DNA

    a = DNA.make_seq("AGTACACTGGT")
    a.can_pair(a.complement())
    a.can_pair(a.rc())

Joining two DNA sequences
^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import DNA

    my_seq = DNA.make_seq("AGTACACTGGT")
    extra_seq = DNA.make_seq("CTGAC")
    long_seq = my_seq + extra_seq
    long_seq
    str(long_seq)

Slicing DNA sequences
^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    my_seq[1:6]

Getting 3rd positions from codons
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The easiest approach is to work off the ``cogent3`` ``ArrayAlignment`` object.

We'll do this by specifying the position indices of interest, creating a sequence ``Feature`` and using that to extract the positions.

.. jupyter-execute::

    from cogent3 import DNA

    seq = DNA.make_array_seq("ATGATGATGATG")
    pos3 = seq[2::3]
    assert str(pos3) == "GGGG"

Getting 1st and 2nd positions from codons
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this instance we can use the annotatable sequence classes.

.. jupyter-execute::

    from cogent3 import DNA

    seq = DNA.make_seq("ATGATGATGATG")
    indices = [(i, i + 2) for i in range(len(seq))[::3]]
    pos12 = seq.add_feature("pos12", "pos12", indices)
    pos12 = pos12.get_slice()
    assert str(pos12) == "ATATATAT"

Return a randomized version of the sequence
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

   print rnaseq.shuffle()
   ACAACUGGCUCUGAUG

Remove gaps from a sequence
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. jupyter-execute::

    from cogent3 import RNA

    s = RNA.make_seq("--AUUAUGCUAU-UAu--")
    print(s.degap())