# -*- coding: utf-8 -*-
from setuptools import setup

packages = \
['pycashier']

package_data = \
{'': ['*']}

install_requires = \
['rich-click>=1.2.1,<2.0.0',
 'rich>=12.0.0,<13.0.0',
 'ruamel.yaml>=0.17.21,<0.18.0']

entry_points = \
{'console_scripts': ['pycashier = pycashier.cli:cli']}

setup_kwargs = {
    'name': 'pycashier',
    'version': '0.3.0',
    'description': 'cash in on expressed barcode tags',
    'long_description': '[![Forks][forks-shield]][forks-url]\n[![Stargazers][stars-shield]][stars-url]\n[![Issues][issues-shield]][issues-url]\n[![MIT License][license-shield]][license-url]\n\n# Pycashier\n\nTool for extracting and processing DNA barcode tags from Illumina sequencing.\n\nDefault parameters are designed for use by the [Brock Lab](https://github.com/brocklab) to process data generated from\nClonMapper lineage tracing experiments, but is extensible to other similarly designed tools.\n\n\n### Bioconda Dependencies\n- [cutadapt](https://github.com/marcelm/cutadapt) (sequence extraction)\n- [starcode](https://github.com/gui11aume/starcode) (sequence clustering)\n- [fastp](https://github.com/OpenGene/fastp) (merging/quality filtering)\n- [pysam](https://github.com/pysam-developers/pysam) (sam file conversion to fastq)\n\n### Pip/conda-forge Dependencies\n- [rich](https://github.com/Textualize/rich)\n- [rich-click](https://github.com/ewels/rich-click)\n- [ruamel.yaml](https://sourceforge.net/p/ruamel-yaml/code/ci/default/tree/)\n\n## Installation\nIt\'s recommended to use [conda](https://docs.conda.io/en/latest/)/[mamba](https://github.com/mamba-org/mamba) to install and manage the dependencies for this package.\n\n```bash\nconda install -c conda-forge -c bioconda cutadapt fastp pysam starcode pycashier\n```\n\nYou can also use the included `environment.yml` to create your environment and install everything you need.\n\n```bash\nconda env create -f https://raw.githubusercontent.com/brocklab/pycashier/main/environment.yml\nconda activate cashierenv\n```\n\nAdditionally you may install with pip. Though it will be up to you to ensure all the\ndependencies you would install from bioconda are on your path and installed correctly.\n`Pycashier` will check for them before running.\n\n```bash\npip install pycashier\n```\n\n## Usage\n\nAs of `v0.3.0` the interface of `pycashier` has changed. Previously a positional argument was used to indicate the source directory and additional flags would set the operation.\nNow `pycashier` uses `click` and a series of commands.\n\nAs always use though use `pycashier --help` and additionally `pycashier <COMMAND> --help` for the full list of parameters.\n\nSee below for a brief explanation of each command.\n\n### Extract\n\nThe primary use case of pycashier is extracting 20bp sequences from illumina generated fastq files.\nThis can be accomplished with the below command where `./fastqs` is a directory containing all of your fastq files.\n\n```bash\npycashier extract -i ./fastqs\n```\n\n`Pycashier` will attempt to extract file names from your `.fastq` files using the first string delimited by a period.\n\nFor example:\n- `sample1.fastq`: sample1\n- `sample2.metadata_pycashier.will.ignore.fastq`: sample2\n\nAs `pycashier extract` runs, two directories will be generated `./pipeline` and `./outs`, configurable with `-p/--pipeline` and `-o/--output` respectively.\n\nYour `pipeline` directory will contain all files and data generated while performing barcode extraction and clustering.\nWhile `outs` will contain a single `.tsv` for each sample with the final barcode counts.\n\nExpected output of `pycashier extract`:\n\n```bash\nfastqs\n└── sample.raw.fastq\npipeline\n├── qc\n│\xa0\xa0 ├── sample.html\n│\xa0\xa0 └── sample.json\n├── sample.q30.barcode.fastq\n├── sample.q30.barcodes.r3d1.tsv\n├── sample.q30.barcodes.tsv\n└── sample.q30.fastq\nouts\n└── sample.q30.barcodes.r3d1.min176_off1.tsv\n```\n\n*NOTE*: If you wish to provide `pycashier` with fastq files containing only your barcode you can supply the `--skip-trimming` flag.\n\n### Merge\n\nIn some cases your data may be from paired-end sequencing. If you have two fastq files per sample\nthat overlap on the barcode region they can be combined with `pycashier merge`.\nthat overlap on the barcode region they can be combined with `pycashier merge`.\n\n\n```bash\npycashier merge -i ./fastqgz\n```\n\nBy default your output will be in `mergedfastqs`. Which you can then pass back to `pycashier` with `pycashier extract -i mergedfastqs`.\n\nFor single read, files are `<sample>.fastq` now they should both contain R1 and R2 and additionally may be gzipped.\n\nFor example:\n- `sample.raw.R1.fastq.gz`,`sample.raw.R2.fastq.gz`: sample\n- `sample.R1.fastq`,`sample.R2.fastq`: sample\n- `sample.fastq`: fail, not R1 and R2\n\n\n### Scrna\n\nIf your DNA barcodes are expressed and detectable in 10X 3\'-based transcriptomic sequencing,\nthen you can extract these tags with `pycashier` and their associated umi/cell barcodes from the `cellranger` output.\n\nFor `pycashier scrna` we extract our reads from sam files.\nThis file can be generated using the output of `cellranger count`.\nFor each sample you would run:\n```\nsamtools view -f 4 $CELLRANGER_COUNT_OUTPUT/sample1/outs/possorted_genome_bam.bam > sams/sample1.unmapped.sam\n```\nThis will generate a sam file containing only the unmapped reads.\n\nThen similar to normal barcode extraction you can pass a directory of these unmapped sam files to pycashier and extract barcodes. You can also still specify extraction parameters that will be passed to cutadapt as usual.\n\n*Note*: The default parameters passed to cutadapt are unlinked adapters and minimum barcode length of 10 bp.\n\n```\npycashier scrna -i sams\n```\n\nWhen finished the `outs` directory will have a `.tsv` containing the following columns: Illumina Read Info, UMI Barcode, Cell Barcode, gRNA Barcode\n\n### Combine\n\nThis command can be used if you wish to generate a combined tsv from all files including headers and sample information.\nBy default it uses `./outs` for input and `./combined.tsv` for output.\n\n## Config File\n\nAs of `v0.3.0` you may generate and supply `pycashier` with a yaml config file using `-c/--config`.\nThe expected structure is each command followed by key value pairs of flags with hypens replaced by underscores:\n\n```yaml\nextract:\n  input: fastqs\n  threads: 10\n  unqualified_percent: 100\nmerge:\n  input: rawfastqgz\n  output: fastqs\n  fastp_args: "-t 1"\n```\n\nThe order of precedence for arguments is command line > config file > defaults.\n\nFor example if you were to use the above `config.yml` with `pycashier extract -c config.yml -t 15`.\nThe value used for threads would be 15.\nYou can confirm the parameter values as they will be printed prior to any execution.\n\nFor convenience, you can update/create your config file with `pycasher COMMAND --save-config [explicit|full] -c config.yml`.\n\n"Explicit" will only save parameters already included in the config file or specified at runtime.\n"Full" will include all parameters again maintaining preset values in config or specified at runtime.\n\n## Non-Configurable Defaults\n\nSee below for the non-configurable flags provided to external tools in each command. Refer to their documentation regarding the purpose of these flags.\n\n### Extract\n\n- `fastp`: `--dont_eval_duplication`\n- `cutadapt`: `--max-n=0 -n 2`\n\n### Merge\n\n- `fastp`: `-m -c -G -Q -L`\n\n### Scrna\n\n- `cutadapt`: `--max-n=0 -n 2`\n\n## Usage notes\nPycashier will **NOT** overwrite intermediary files. If there is an issue in the process, please delete either the pipeline directory or the requisite intermediary files for the sample you wish to reprocess. This will allow the user to place new fastqs within the source directory or a project folder without reprocessing all samples each time.\n- If there are reads from multiple lanes they should first be concatenated with `cat sample*R1*.fastq.gz > sample.R1.fastq.gz`\n- Naming conventions:\n    - Sample names are extracted from files using the first string delimited with a period. Please take this into account when naming sam or fastq files.\n    - Each processing step will append information to the input file name to indicate changes, again delimited with periods.\n\n\n## Acknowledgments\n\n[Cashier](https://github.com/russelldurrett/cashier) is a tool developed by Russell Durrett for the analysis and extraction of expressed barcode tags.\nThis version like it\'s predecessor wraps around several command line bioinformatic tools to pull out expressed barcode tags.\n\n\n\n[forks-shield]: https://img.shields.io/github/forks/brocklab/pycashier.svg?style=flat\n[forks-url]: https://github.com/brocklab/pycashier/network/members\n[stars-shield]: https://img.shields.io/github/stars/brocklab/pycashier.svg?style=flat\n[stars-url]: https://github.com/brocklab/pycashier/stargazers\n[issues-shield]: https://img.shields.io/github/issues/brocklab/pycashier.svg?style=flat\n[issues-url]: https://github.com/brocklab/pycashier/issues\n[license-shield]: https://img.shields.io/github/license/brocklab/pycashier.svg?style=flat\n[license-url]: https://github.com/brocklab/pycashier/blob/main/LICENSE\n',
    'author': 'Daylin Morgan',
    'author_email': 'daylinmorgan@gmail.com',
    'maintainer': None,
    'maintainer_email': None,
    'url': 'https://github.com/brocklab/pycashier/',
    'packages': packages,
    'package_data': package_data,
    'install_requires': install_requires,
    'entry_points': entry_points,
    'python_requires': '>=3.7,<3.10',
}


setup(**setup_kwargs)
