Metadata-Version: 1.0
Name: pfdo_run
Version: 2.0.2
Summary: Run arbitrary CLI on each nested dir of an inputdir
Home-page: https://github.com/FNNDSC/pfdo_med2image
Author: FNNDSC
Author-email: dev@babymri.org
License: MIT
Description: pfdo_run 2.0.2
        ==================
        
        .. image:: https://badge.fury.io/py/pfdo_med2image.svg
            :target: https://badge.fury.io/py/pfdo_med2image
        
        .. image:: https://travis-ci.org/FNNDSC/pfdo_med2image.svg?branch=master
            :target: https://travis-ci.org/FNNDSC/pfdo_med2image
        
        .. image:: https://img.shields.io/badge/python-3.5%2B-blue.svg
            :target: https://badge.fury.io/py/pfdo_med2image
        
        .. contents:: Table of Contents
        
        
        Quick Overview
        --------------
        
        -  ``pfdo_run`` demonstrates how to use ``pftree`` to traverse directory trees and execute a user-specified CLI operation at each directory level (that optionally contains files of interest).
        
        Overview
        --------
        
        ``pfdo_run`` leverages the ``pftree`` callback coding contract to target a specific directory with specific files in an arbitrary file tree. At each target directory, a user-specified CLI is executed on the files in that nested target directory.
        
        Installation
        ------------
        
        Dependencies
        ~~~~~~~~~~~~
        
        The following dependencies must be present on your host system or python3 virtual env (they are installed automatically when ``pfdo_run`` is pulled from PyPI):
        
        -  ``pfmisc`` (various misc modules and classes for the pf* family of objects)
        -  ``pftree`` (create a dictionary representation of a filesystem hierarchy)
        -  ``pfdo``   (the base module that does the core interfacing with ``pftree``)
        
        Using ``PyPI``
        ~~~~~~~~~~~~~~
        
        The best method of installing this script and all of its dependencies is
        by fetching it from PyPI:
        
        .. code:: bash
        
                pip3 install pfdo_run
        
        CLI specification
        -----------------
        
        Any text in the CLI prefixed with a percent char ``%`` is interpreted in one of two ways.
        
        First, any CLI argument to ``pfdo_run`` itself can be accessed via ``%``. Thus, for example, a ``%outputDir`` in the ``--exec`` string will be expanded to the ``outputDir`` of ``pfdo_run``.
        
        Secondly, three internal '%' variables are available:
        
        * ``%inputWorkingDir``  - the current input tree working directory
        * ``%outputWorkingDir`` - the current output tree working directory
        * ``%inputWorkingFile`` - the current file being processed
        
        These internal variables allow for contextual specification of values. For example, a simple CLI touch command could be specified as
        
        .. code:: bash
        
            --exec "touch %outputWorkingDir/%inputWorkingFile"
        
        or a command to convert an input ``png`` to an output ``jpg`` using the ImageMagick ``convert`` utility
        
        .. code:: bash
        
            --exec "convert %inputWorkingDir/%inputWorkingFile
                            %outputWorkingDir/%inputWorkingFile.jpg"
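        
        As a rough illustration of the substitution described above, the following Python sketch expands ``%`` tags in an ``--exec`` string from a dictionary of values. It is not the ``pfdo_run`` implementation: the ``expand_tags`` helper, its regular expression, and the sample paths are assumptions made for this example only.
        
        .. code:: python
        
            import re
        
            def expand_tags(template, context):
                """Replace each %name in template with context[name] (hypothetical helper)."""
                def lookup(match):
                    name = match.group(1)
                    # Unknown names are left untouched rather than raising an error.
                    return str(context.get(name, match.group(0)))
                # A tag is '%' followed by a letter and then word characters.
                return re.sub(r"%([A-Za-z]\w*)", lookup, template)
        
            # Sample values; real runs would take these from the CLI and the tree walk.
            context = {
                "outputDir":        "/tmp/out",
                "inputWorkingDir":  "/tmp/in/sub",
                "outputWorkingDir": "/tmp/out/sub",
                "inputWorkingFile": "slice.png",
            }
        
            cmd = expand_tags(
                "convert %inputWorkingDir/%inputWorkingFile "
                "%outputWorkingDir/%inputWorkingFile.jpg",
                context,
            )
            print(cmd)
        
        In this sketch a tag ends at the first character that is not a letter, digit, or underscore, which is why the trailing ``.jpg`` survives as literal text.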
        
        Special Functions
        -----------------
        
        Furthermore, ``pfdo_run`` offers the ability to apply some internal functions to a tag. The template for specifying a function to apply is:
        
        .. code::
        
            %_<functionName>[|arg1|arg2|...]_<tag>
        
        thus, a function is identified by a ``<functionName>`` that is prefixed and suffixed by an underscore ``_`` and appears in front of the tag to process. Possible args to the ``<functionName>`` are separated by pipe ``|`` characters.
        
        For example a string snippet that contains
        
        .. code:: bash
        
            %_strrepl|.|-_inputWorkingFile.txt
        
        will replace all occurrences of ``.`` in the ``%inputWorkingFile`` with ``-``. Also of interest, the trailing ``.txt`` is preserved in the final pattern for the result.
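        
        To make the template concrete, the Python sketch below parses a function specification into its name, arguments, and tag, and then applies ``strrepl``. It is only an illustration under stated assumptions: the ``SPEC`` pattern, the helper names, and the sample filename are hypothetical, and arguments are assumed to contain neither ``|`` nor ``_``.
        
        .. code:: python
        
            import re
        
            # %_<functionName>[|arg1|arg2|...]_<tag>  (pattern is an assumption of this sketch)
            SPEC = re.compile(r"%_([A-Za-z0-9]+)((?:\|[^|_]*)*)_([A-Za-z]\w*)")
        
            def parse_function_spec(text):
                """Return (functionName, [args], tag) for the first spec in text, or None."""
                match = SPEC.search(text)
                if match is None:
                    return None
                name, raw_args, tag = match.groups()
                # raw_args looks like '|.|-'; drop the empty piece before the first pipe.
                return name, raw_args.split("|")[1:], tag
        
            def apply_strrepl(text, context):
                """Expand every well-formed strrepl spec in text using values from context."""
                def expand(match):
                    name, raw_args, tag = match.groups()
                    args = raw_args.split("|")[1:]
                    if name != "strrepl" or len(args) != 2 or tag not in context:
                        return match.group(0)  # leave anything unrecognised as-is
                    return context[tag].replace(args[0], args[1])
                return SPEC.sub(expand, text)
        
            spec = parse_function_spec("%_strrepl|.|-_inputWorkingFile.txt")
            result = apply_strrepl(
                "%_strrepl|.|-_inputWorkingFile.txt",
                {"inputWorkingFile": "scan.001.dcm"},  # sample value
            )
            print(spec, result)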
        
        The following functions are available:
        
        .. code:: html
        
            %_md5[|<len>]_<tagName>
            Apply an 'md5' hash to the value referenced by <tagName> and optionally
            return only the first <len> characters.
        
            %_strmsk|<mask>_<tagName>
            Apply a simple mask pattern to the value referenced by <tagName>. Chars
            that are "*" in the mask are passed through unchanged. The mask and its
            target should be the same length.
        
            %_strrepl|<target>|<replace>_<tagName>
            Replace the string <target> with <replace> in the value referenced by
            <tagName>.
        
            %_rmext_<tagName>
            Remove the "extension" of the value referenced by <tagName>. This
            of course only makes sense if the <tagName> denotes something with
            an extension!
        
            %_name_<tag>
            Replace the value referenced by <tag> with a name generated by the
            faker module.
        
        Functions cannot currently be nested.
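        
        The behaviour of most of these functions can be approximated with the Python standard library. The sketch below is an illustration rather than the code ``pfdo_run`` ships: the ``f_*`` names are hypothetical, and the reading of ``strmsk`` (a non-``*`` mask character replaces the corresponding character of the value) is an assumption based on the description above. The ``name`` function is omitted because it relies on the third-party faker module.
        
        .. code:: python
        
            import hashlib
            import os
        
            def f_md5(value, length=None):
                """md5 hex digest of value, optionally cut to the first `length` chars."""
                digest = hashlib.md5(value.encode("utf-8")).hexdigest()
                return digest if length is None else digest[:int(length)]
        
            def f_strmsk(value, mask):
                """Keep chars of value where the mask has '*'; otherwise use the mask char.
        
                zip() stops at the shorter string, so mask and value should match in length.
                """
                return "".join(v if m == "*" else m for v, m in zip(value, mask))
        
            def f_strrepl(value, target, replace):
                """Replace every occurrence of target in value with replace."""
                return value.replace(target, replace)
        
            def f_rmext(value):
                """Strip the last extension, e.g. 'image.jpg' becomes 'image'."""
                return os.path.splitext(value)[0]
        
            print(f_md5("abc", 8))
            print(f_strmsk("subject01", "*******XX"))
            print(f_strrepl("scan.001.dcm", ".", "-"))
            print(f_rmext("image.jpg"))
        
        Note that ``os.path.splitext`` removes only the final extension, so a value such as ``archive.tar.gz`` would become ``archive.tar`` in this sketch.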
        
        Command line arguments
        ----------------------
        
        .. code:: html
        
        
            -I|--inputDir <inputDir>
            Input base directory to traverse.
        
            -O|--outputDir <outputDir>
            The output root directory that will contain a tree structure identical
            to the input directory, and each "leaf" node will contain the analysis
            results.
        
            --exec <CLIcmdToExec>
            The command line expression to apply at each directory node of the
            input tree. See the CLI SPECIFICATION section for more information.
        
            [-i|--inputFile <inputFile>]
            An optional <inputFile> specified relative to the <inputDir>. If
            specified, then do not perform a directory walk, but process only
            this file.
        
            [-f|--fileFilter <someFilter1,someFilter2,...>]
            An optional comma-delimited string to filter out files of interest
            from the <inputDir> tree. Each token in the expression is applied in
            turn over the space of files in a directory location, and only files
            that contain this token string in their filename are preserved.
        
            [-d|--dirFilter <someFilter1,someFilter2,...>]
            Similar to the `fileFilter` but applied over the space of leaf nodes
            in directory paths. A directory must contain at least one file
            to be considered.
        
            If a directory leaf node contains a string that corresponds to any of
            the filter tokens, a special "hit" is recorded in the file hit list,
            "%d-<leafnode>". For example, a directory of
        
                                /some/dir/in/the/inputspace/here1234
        
            with a `dirFilter` of `1234` will create a "special" hit entry of
            "%d-here1234" to tag this directory for processing.
        
            In addition, if a directory is filtered through, all the files in
            that directory will be added to the filtered file list. If no files
            are to be added, passing an explicit file filter with an "empty"
            single string argument, i.e. `--fileFilter " "`, is advised.
        
            [--analyzeFileIndex <someIndex>]
            An optional string to control which file(s) in a specific directory
            to which the analysis is applied. The default is "-1" which implies
            *ALL* files in a given directory. Other valid <someIndex> are:
        
                    "m":   only the "middle" file in the returned file list
                    "f":   only the first file in the returned file list
                    "l":   only the last file in the returned file list
                    "<N>": the file at index N in the file list. If this index
                           is out of bounds, no analysis is performed.
        
                    "-1":  all files.
        
            [--outputLeafDir <outputLeafDirFormat>]
            If specified, will apply the <outputLeafDirFormat> to the output
            directories containing data. This is useful to blanket describe
            final output directories with some descriptive text, such as
            'anon' or 'preview'.
        
            This is a formatting spec, so
        
                    --outputLeafDir 'preview-%s'
        
            where %s is the original leaf directory node, will prefix each
            final directory containing output with the text 'preview-' which
            can be useful in describing some features of the output set.
        
            [--threads <numThreads>]
            If specified, break the innermost analysis loop into <numThreads>
            threads.
        
            [--noJobLogging]
            If specified, then suppress the logging of per-job output. Usually
            each job that is run will have, in the output directory, three
            additional files:
        
                %inputWorkingFile-returncode
                %inputWorkingFile-stderr
                %inputWorkingFile-stdout
        
            By specifying this option, the above files are not recorded.
        
            [-x|--man]
            Show full help.
        
            [-y|--synopsis]
            Show brief help.
        
            [--json]
            If specified, output a JSON dump of final return.
        
            [--followLinks]
            If specified, follow symbolic links.
        
            -v|--verbosity <level>
            Set the app verbosity level.
        
                0: No internal output;
                1: Run start / stop output notification;
                2: As with level '1' but with simpleProgress bar in 'pftree';
                3: As with level '2' but with list of input dirs/files in 'pftree';
                5: As with level '3' but with explicit file logging for
                        - read
                        - analyze
                        - write
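        
        The selection logic behind ``--fileFilter``, ``--analyzeFileIndex`` and ``--outputLeafDir`` can be pictured with the short Python sketch below. It is an approximation under stated assumptions, not the ``pfdo_run`` source: the helper names are hypothetical, a file is assumed to be kept when it contains *any* filter token, and the "middle" file is assumed to sit at index ``len(files) // 2``.
        
        .. code:: python
        
            def filter_files(filenames, file_filter):
                """Keep files whose name contains at least one comma-separated token."""
                tokens = [t for t in file_filter.split(",") if t]
                if not tokens:
                    return list(filenames)
                return [f for f in filenames if any(t in f for t in tokens)]
        
            def select_files(filenames, index="-1"):
                """Pick files according to an --analyzeFileIndex style string."""
                if not filenames:
                    return []
                if index == "-1":
                    return list(filenames)                       # all files
                if index == "m":
                    return [filenames[len(filenames) // 2]]      # assumed "middle"
                if index == "f":
                    return [filenames[0]]
                if index == "l":
                    return [filenames[-1]]
                try:
                    n = int(index)
                except ValueError:
                    return []
                # Out-of-bounds indices select nothing, so no analysis is performed.
                return [filenames[n]] if 0 <= n < len(filenames) else []
        
            files = ["a.jpg", "b.jpg", "c.png", "d.jpg"]   # sample directory listing
            kept = filter_files(files, "jpg")
            print(kept)
            print(select_files(kept, "l"))
            # --outputLeafDir is a printf-style format applied to the leaf directory name.
            print("preview-%s" % "here1234")
        
        With these assumptions, a filter of a single space (``" "``) keeps no files unless a filename actually contains a space, which matches the advice given for ``--dirFilter`` above.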
        
        
        Examples
        --------
        
        Perform a ``pfdo_run`` down some input directory and convert all input ``jpg`` files to ``png`` in the output tree:
        
        .. code:: bash
        
            pfdo_run                                                \
                -I /var/www/html/data --fileFilter jpg              \
                -O /var/www/html/png                                \
                --exec "convert %inputWorkingDir/%inputWorkingFile
                %outputWorkingDir/%_rmext_inputWorkingFile.png"     \
                --threads 0 --printElapsedTime
        
        The above will find all files in the tree structure rooted at ``/var/www/html/data`` that also contain the string ``jpg`` anywhere in the filename. For each file found, a ``convert`` conversion will be called, storing a converted file in the same tree location in the output directory as the original input.
        
        Note the special construct, ``%_rmext_inputWorkingFile.png``: the ``%_rmext_`` designates a built-in function to apply to the tag value, in this case to "remove the extension" from the ``%inputWorkingFile`` string.
        
        Consider an example where only one file in a branched inputdir
        space is to be preserved:
        
        .. code:: bash
        
            pfdo_run                                                \
                -I $(pwd)/raw -O $(pwd)/out                         \
                -d 100307 -f " "                                    \
                --exec "cp %inputWorkingDir/brain.mgz
                %outputWorkingDir/brain.mgz"                        \
                --threads 0 --verbosity 3 --noJobLogging
        
        Here, the input directory space is pruned for a directory leaf node that contains the string ``100307``. The exec command essentially copies the file ``brain.mgz`` in that target directory to the corresponding location in the output tree.
        
        Finally the elapsed time and a JSON output are printed.
        
        
Platform: UNKNOWN
