Metadata-Version: 2.1
Name: molstruct
Version: 1.0.1
Summary: Convert chemical molecule data CSV files to structured data formats
Home-page: https://github.com/lszeremeta/molstruct
Author: Łukasz Szeremeta
Author-email: l.szeremeta.dev@gmail.com
License: MIT
Description: # <img src="https://raw.githubusercontent.com/lszeremeta/molstruct/master/logo/molstruct.png" alt="Molstruct logo" width="300">
        
        Converts chemical molecule data [Comma Separated Values (CSV)](https://en.wikipedia.org/wiki/Comma-separated_values) files to structured data formats - [JSON-LD](https://json-ld.org/), [RDFa](http://rdfa.info/) and [Microdata](https://schema.org/docs/gs.html). Supported
        CSV columns: `identifier`, `name`, `inChIKey`, `inChI`, `smiles`, `url`, `iupacName`, `molecularFormula`, `molecularWeight`, `monoisotopicMolecularWeight`, `description`, `disambiguatingDescription`, `image`, `additionalType`, `alternateName` and `sameAs`.  Works from CLI on Python 3.2 and above. Molstruct is lightweight. No additional dependencies are required.
        
        ## What are structured data
        
        Structured data are additional data placed on websites. They are not visible to ordinary internet users, but can be easily processed by machines. There are 3 formats that we can use to save structured data - [JSON-LD](https://json-ld.org/), [RDFa](http://rdfa.info/) and [Microdata](https://www.w3.org/TR/microdata/). Molstruct supports them all and use [MolecularEntitly](https://bioschemas.org/types/MolecularEntity/) type.
        
        ## Where to find a CSV file with molecule data
        
        There are many possibilities. The easiest way is to download a CSV file from one of the chemical databases, e.g. [DrugBank](https://www.drugbank.ca/releases/latest#open-data). You can also create the CSV file yourself.
        
        ## Installation
        
        You can install the Molstruct from [PyPI](https://pypi.org/project/molstruct/):
        
            pip install molstruct
        
        Python 3.2 and above are supported. No additional dependencies are required.
        
        ## Usage
        
            usage: molstruct [-h] (-jh | -j | -r | -m) [-i IDENTIFIER] [-n NAME] [-ink INCHIKEY]
                             [-in INCHI] [-s SMILES] [-u URL] [-iu IUPACNAME]
                             [-f MOLECULARFORMULA] [-w MOLECULARWEIGHT]
                             [-mw MONOISOTOPICMOLECULARWEIGHT] [-d DESCRIPTION]
                             [-dd DISAMBIGUATINGDESCRIPTION] [-img IMAGE] [-at ADDITIONALTYPE]
                             [-an ALTERNATENAME] [-sa SAMEAS] [-c] [-l LIMIT]
                             file
        
        ### Positional arguments
        
            file                  CSV file with molecule data to convert
        
        ### Optional arguments
        
              -h, --help            show this help message and exit
              -jh, --jsonldhtml     JSON-LD with HTML output
              -j, --jsonld          JSON-LD output
              -r, --rdfa            RDFa output
              -m, --microdata       Microdata output
              -i IDENTIFIER, --identifier IDENTIFIER
                                    identifier column name (identifier by default), Text
              -n NAME, --name NAME  name column name (name by default), Text
              -ink INCHIKEY, --inChIKey INCHIKEY
                                    inChIKey column name (inChIKey by default), Text
              -in INCHI, --inChI INCHI
                                    inChI column name (inChI by default), Text
              -s SMILES, --smiles SMILES
                                    smiles column name (smiles by default), Text
              -u URL, --url URL     url column name (url by default), URL type
              -iu IUPACNAME, --iupacName IUPACNAME
                                    iupacName column name (iupacName by default), Text
              -f MOLECULARFORMULA, --molecularFormula MOLECULARFORMULA
                                    molecularFormula column name (molecularFormula by
                                    default), Text
              -w MOLECULARWEIGHT, --molecularWeight MOLECULARWEIGHT
                                    molecularWeight column name (molecularWeight by
                                    default), Mass e.g. 0.01 mg)
              -mw MONOISOTOPICMOLECULARWEIGHT, --monoisotopicMolecularWeight MONOISOTOPICMOLECULARWEIGHT
                                    monoisotopicMolecularWeight column name
                                    (monoisotopicMolecularWeight by default), Mass e.g.
                                    0.01 mg
              -d DESCRIPTION, --description DESCRIPTION
                                    description column name (description by default), Text
              -dd DISAMBIGUATINGDESCRIPTION, --disambiguatingDescription DISAMBIGUATINGDESCRIPTION
                                    disambiguatingDescription column name
                                    (disambiguatingDescription by default), Text
              -img IMAGE, --image IMAGE
                                    image column name (image by default), URL
              -at ADDITIONALTYPE, --additionalType ADDITIONALTYPE
                                    additionalType column name (additionalType by
                                    default), URL
              -an ALTERNATENAME, --alternateName ALTERNATENAME
                                    alternateName column name (alternateName by default),
                                    Text
              -sa SAMEAS, --sameAs SAMEAS
                                    sameAs column name (sameAs by default), URL
              -c, --columns         Use only columns with renamed names
              -l LIMIT, --limit LIMIT
                                    Maximum number of results
        
        Available options may vary depending on the version. To display all available options with their descriptions use ``molstruct -h``.
        
        ## Examples
        
            molstruct --rdfa data.csv
        Returns simple HTML with added RDFa. Assumes that the column names in CSV file are the default ones.
        
            molstruct --microdata -f "formula" data.csv
        Returns simple HTML with added Microdata. Assumes that the column names in CSV file are the default ones but replaces default `molecularformula` column name by `formula`.
        
            molstruct --microdata --columns --id "CAS" --name "Common name" --inChIKey "Standard InChI Key" --limit 50 "drugbank vocabulary.csv"
        Returns simple HTML with added Microdata. When generating a file, only selected columns will be taken into account. A limit of 50 molecules has been specified.
        
            molstruct --microdata --columns --id "CAS" --name "Common name" --inChIKey "Standard InChI Key" --limit 50 "drugbank vocabulary.csv" > output.html
        Do the same as example above but save results to `output.html`.
        
        ## Contribution
        
        Would you like to improve this project? Great! We are waiting for your help and suggestions. If you are new in open source contributions, read [How to Contribute to Open Source](https://opensource.guide/how-to-contribute/).
        
        ## License
        
        Distributed under [MIT license](https://github.com/lszeremeta/molstruct/blob/master/LICENSE).
        
        ## See also
        
        These projects can also be useful:
        
        * [SDFEater](https://github.com/lszeremeta/SDFEater) - Always hungry SDF chemical file format parser with many output formats
        * [MEgen](https://github.com/domel/MEgen) - Convenient online form to generate structured data about molecules
        
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.2
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Internet
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Chemistry
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: General
Classifier: Topic :: Text Processing :: Markup
Classifier: Topic :: Utilities
Requires-Python: >=3.2
Description-Content-Type: text/markdown
