Metadata-Version: 2.1
Name: treemaker
Version: 1.4
Summary: A python tool for generating a Newick formatted tree from alist of classifications
Home-page: https://github.com/SimonGreenhill/treemaker
Author: Simon J. Greenhill
Author-email: simon@simon.net.nz
License: BSD
Description: # treemaker
        
        A Python library for creating a Newick formatted tree from a set of classification strings (e.g. a taxonomy)
        
        [![Build Status](https://travis-ci.org/SimonGreenhill/treemaker.svg?branch=master)](https://travis-ci.org/SimonGreenhill/treemaker)
        [![Coverage Status](https://coveralls.io/repos/SimonGreenhill/treemaker/badge.svg?branch=master&service=github)](https://coveralls.io/github/SimonGreenhill/treemaker?branch=master)
        [![DOI](https://zenodo.org/badge/22704/SimonGreenhill/treemaker.svg)](https://zenodo.org/badge/latestdoi/22704/SimonGreenhill/treemaker)
        [![status](http://joss.theoj.org/papers/19eae6958062fc8a72d8a02efdaf8b23/status.svg)](http://joss.theoj.org/papers/19eae6958062fc8a72d8a02efdaf8b23)
        
        ```treemaker``` is a Python library to convert a text-based classification schema into a Newick file for use in phylogenetic and bioinformatic programs.
        
        Research in linguistics or cultural evolution often produces or uses tree taxonomies or classifications. However, these are usually not in a format readily available for use in programs that can understand and manipulate trees. For example, the global taxonomy of languages published by the [Ethnologue](https://www.ethnologue.com/) classifies languages into families and subgroups using a taxonomy string e.g. the language [Kalam](https://www.ethnologue.com/language/kmh) is classified as "Trans-New Guinea, Madang, Kalam-Kobon", while [Mauwake](https://www.ethnologue.com/language/mhl) is classified as "Trans-New Guinea, Madang, Croisilles, Pihom", and [Kare](https://www.ethnologue.com/language/kmf) is "Trans-New Guinea, Madang, Croisilles, Kare". This classification indicates that while all these languages are part of the Madang subgroup of the Trans-New Guinea language family, Kare and Mauwake are more closely related (as they belong to the Croisilles subgroup).
        
        Other publications use a tabular indented format to demarcate relationships, such as the example in Figure 1 from Stephen Wurm's classification of his proposed Yele-Solomons language phylum (Wurm 1975).
        
        Both the taxonomy string and tabular format however are hard to load into software packages that can analyse, compare, visualise and manipulate trees. ```treemaker``` aims to make this easy by converting taxonomic data into [Newick](https://en.wikipedia.org/wiki/Newick_format) and Nexus (Maddison 1997) formats commonly used by phylogenetic manipulation programs.
        
        ## Converting a Taxonomy to a Tree:
        
        ```treemaker``` can convert a text file with a taxonomy to a tree. These taxonomies can easily be obtained from Ethnologue or manually entered, such as this example from Wurm's (outdated) classification of Yele-Solomons in Figure 1:
        
        ```text
        Bilua       Yele-Solomons, Central Solomon
        Baniata     Yele-Solomons, Central Solomon
        Lavukaleve  Yele-Solomons, Central Solomon
        Savosavo    Yele-Solomons, Central Solomon
        Kazukuru    Yele-Solomons, Kazukuru
        Guliguli    Yele-Solomons, Kazukuru
        Dororo      Yele-Solomons, Kazukuru
        Yele        Yele-Solomons
        ```
        
        ``treemaker`` can then generate a Newick representation:
        
        ```text
        ((Baniata,Bilua,Lavukaleve,Savosavo),(Dororo,Guliguli,Kazukuru),Yele);
        ```
        
        ...which can then be loaded into phylogenetic programs to visualise or manipulate as in Figure 2.
        
        ```treemaker``` has been used to enable the analyses in (Bromham et al. 2018), and a number of forthcoming articles.
        
        
        ![Example of a language taxonomy in indented format from Wurm (1975).](wurm1975.png)
        
        ![Tree visualisation of the relationships between the putative Yele-Solomons languages.](tree.png)
        
        
        ## Installation:
        
        Installation is only a pip install away:
        
        ```shell
        pip install treemaker
        ```
        
        Or from git:
        
        ```shell
        git clone https://github.com/SimonGreenhill/treemaker/ treemaker
        cd treemaker
        python setup.py install
        ```
        
        ## Usage: Command line:
        
        Basic usage: 
        
        ```shell
        > treemaker
        
        usage: treemaker [-h] [-o OUTPUT] [-m {nexus,newick}] [--labels] input
        ```
        
        e.g. Given a text file:
        
        ```
        LangA   Indo-European, Germanic
        LangB   Indo-European, Germanic
        LangC   Indo-European, Romance
        LangD   Indo-European, Anatolian
        ```
        
        ... then you can build a taxonomy/classification tree from that as follows:
        
        ```shell
        > treemaker classification.txt
        (LangD,(LangA,LangB),LangC);
        
        # with nodelabels:
        > treemaker --labels classification.txt
        (LangD,(LangA,LangB)Germanic,LangC)Indo-European;
        
        > treemaker -m nexus classification.txt
        
        #NEXUS
        
        begin trees;
           tree root = (LangD,(LangA,LangB),LangC);
        end;
        ```
        
        To write to file:
        
        ```shell
        > treemaker classification.txt
        (LangD,(LangA,LangB),LangC);
        
        > treemaker classification.txt -o classification.nex
        ```
        
        
        ## Usage: Library:
        
        ```python
        from treemaker import TreeMaker
        ```
        
        ### generate a tree manually:
        
        ```python
        from treemaker import TreeMaker
        
        t = TreeMaker()
        t.add('A1', 'family a, subgroup 1')
        t.add('A2', 'family a, subgroup 2')
        t.add('B1a', 'family b, subgroup 1')
        t.add('B1b', 'family b, subgroup 1')
        t.add('B2', 'family b, subgroup 2')
        
        print(t.write())
        ```
        
        ### Add from a list:
        
        ```python
        from treemaker import TreeMaker
        
        taxa = [
            ('A1', 'family a, subgroup 1'),
            ('A2', 'family a, subgroup 2'),
            ('B1a', 'family b, subgroup 1'),
            ('B1b', 'family b, subgroup 1'),
            ('B2', 'family b, subgroup 2'),
        ]
        
        t = TreeMaker()
        t.add_from(taxa)
        
        print(t.write())
        
        ```
        
        ## API Documentation:
        
        The API is [documented here](https://simongreenhill.github.io/treemaker/build/html/index.html).
        
        ## Running treemaker's tests:
        
        To run treemaker's tests simply run:
        
        ```shell
        > make test
        # or
        > python setup.py test
        # or
        > python treemaker/test_treemaker.py
        ```
        
        
        ## Version History:
        
        * v1.4: fix bug with no terminating semicolon in nexus file output.
        * v1.3: add nodelabels support, add some rudimentary input checking.
        
        ## Support:
        
        For questions on how to use or update this, feel free to [open an issue](https://github.com/SimonGreenhill/treemaker/issues). I'll get to it as soon as I can. 
        
        ## Acknowledgements:
        
        Thank you to [Richard Littauer](https://github.com/RichardLitt), [Mitsuhiro Nakamura](https://github.com/mnacamura), and [Dillon Niederhut](https://github.com/deniederhut).
        
        ## References:
        
        * Bromham, Lindell, Xia Hua, Marcel Cardillo, Hilde Schneemann, & Simon J. Greenhill. 2018. “[Parasites and Politics: Why Cross-Cultural Studies Must Control for Relatedness, Proximity and Covariation](https://doi.org/10.1098/rsos.181100).” Open Science 5 (8). https://doi.org/10.1098/rsos.181100.
        * Maddison, D R, D L Swofford, & Wayne P. Maddison. 1997. “[Nexus: An Extensible File Format for Systematic Information](https://doi.org/10.1093/sysbio/46.4.590).” Systematic Biology 46 (4): 590–621. https://doi.org/10.1093/sysbio/46.4.590.
        * Wurm, S. A. 1975. “[The East Papuan Phylum in General](https://doi.org/http://dx.doi.org/10.15144/PL-C38).” In New Guinea Area Languages and Language Study: Papuan Languages and the New Guinea Linguistic Scene, edited by S. A. Wurm. Canberra: Pacific Linguistics. https://doi.org/http://dx.doi.org/10.15144/PL-C38.
        
Keywords: phylogenetics newick taxonomy
Platform: UNKNOWN
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Description-Content-Type: text/markdown
