Metadata-Version: 2.1
Name: rdkit_to_params
Version: 1.0.2
Summary: Create or modify Rosetta params files (topology files) from scratch, RDKit mols or another params file.
Home-page: https://github.com/matteoferla/rdkit_to_params
Author: Matteo Ferla
Author-email: matteo.ferla@gmail.com
License: MIT
Description: # RDKit to params
        Create or modify Rosetta params files (topology files) from scratch, RDKit mols or another params file.
        
        > RDKit is actually an optional module, but most of the useful functionality comes from it!
        
        To install from pip type:
        
            pip install rdkit-to-params
            
        To install the latest version (probably the same) from GitHub
        
            git clone https://github.com/matteoferla/rdkit_to_params.git
            pip install .
        
        ## Legal thingamabob
        The author, Matteo Ferla, is not affiliated with either Rosetta or RDKit and the presence of the latter's
        name in the package's title is completely coincidental.
        And yes, I am copying my legal mumbojumbo from South Park.
        
        ## Rationale
        This is a fresh rewrite of ``mol_to_params.py``. For three reasons:
        
        * I cannot share my 2to3 port and modded module-version of ``mol_to_params.py`` due to licence.
        * I want to modify `params` files and more as opposed to use a standalone script.
        * RDKit does not save ``mol2`` files, yet knows about atom names and Gasteiger-Massilli charges and more...
        
        It sounds mad, but did not actually take too long.
        
        ## Roundtrip
        Native amino acid params files can be found in the Rosetta folder
        `rosetta/main/database/chemical/residue_type_sets/fa_standard/residue_types/l-caa`
        Let's do a roundtrip changing an atomname:
        
            import pyrosetta
            pyrosetta.init(extra_options='-mute all') # required for test
            from rdkit_to_params import Params
            
            p = Params.load('PHE.params')
            p.IO_STRING[0].name3 = 'PHX'
            p.IO_STRING[0].name1 = 'Z'
            p.AA = 'UNK'  #If it's not one of the twenty (plus extras), UNK!
            del p.ROTAMER_AA[0]
            p.rename_atom(' CB ', ' CX ')
            p.dump('fake.params')
            p.test().dump_pdb('test.pdb')
            
        `p.test()` returns a pyrosetta pose. The static method `params_to_pose('something.params', name3)` accepts a params file
            
            import nglview
            pose = Params.params_to_pose('some_topology_I_found.params', name3)
            view = nglview.show_rosetta(pose)
            view
        
        ## From mol object
        ### Requirements
        For the sake of sanity, `EmbedMolecule`, `Chem.AddHs(mol)` or any other operation is assumed to have been done beforehand.
        And that the user is going to do `Chem.MolToPDBFile(mol)` or `Chem.MolToPDBBlock(mol)` or use the bound methods of `Params`,
        `dump_pdb` and `dump_pdb_conf` (see below).
        
        The molecule should preferably be **not** Kekulised.
        3letter name of residue is either from the title row (``_Name``) if a 3letter word or from the PDBInfo or 'LIG'.
        
        Dummy atom (*/R) is assumed to be a CONNECT —ligand only atm.
        
        Here is a conversion to an amino acid from a SMILES (quickest way):
        
            import pyrosetta
            pyrosetta.init(extra_options='-mute all')
            from rdkit_to_params import Params
            p = Params.from_smiles('*C(=O)C(Cc1ccccc1)[NH]*', #recognised as amino acid.
                    name='PHX', #optional.
                    atomnames={3: 'CZ'} #optional, rando atom name as see in previous edit
                    )
            print(p.is_aminoacid()) # True
            p.dump('fake.params')
            p.test().dump_pdb('test.pdb')
            Chem.MolToPDBFile(mol, 'ref.pdb')
                    
        Here is a conversion to a ligand the circuitous way, just for fun:
        
            import pyrosetta
            pyrosetta.init(extra_options='-mute all')
            # note that pyrosetta needs to be started before rdkit.
            from rdkit_to_params import Params
            # make the molecule in RDKit or chemdraw or download it or whatever.
            mol = Chem.MolFromSmiles('NC(C(=O)O)Cc1ccccc1')
            mol = AllChem.AddHs(mol)
            AllChem.EmbedMolecule(mol)
            AllChem.MMFFOptimizeMolecule(mol)
            # add names to the mol beforehand
            Params.add_names(mol, names=['N', 'CA', 'C', 'O', 'OXT', 'CB'], name='PHZ')
            # parameterise
            p = Params.from_mol(mol, name='PHZ')
            p.test().dump_pdb('test.pdb')
            Chem.MolToPDBFile('ref.pdb')
         
        The class method `add_names` is based upon atom index
        (which is derived from the SMILES or sdf/mol file unless atoms have been replaced).
        The instance method `rename_by_template` accepts a substructure and a list of atom names in the order they are in the substructure.
            
        Note that conformer generation is not fully automatic and is not done by default.
        
            # make your conformers as you desire
            AllChem.EmbedMultipleConfs(mol, numConfs=10) # or whatever you choose. This is a somewhat important decision.
            AllChem.AlignMolConformers(mol) # I do not know if the conformers need to be aligned for Rosetta
            # params time!
            p = Params.from_mol(mol, name='LIG') 
            p.dump_pdb_conf('LIG_conf.pdb')
            p.PDB_ROTAMERS.append('LIG_conf.pdb')
            p.dump('my_params.params')
            
        Note `dump_pdb` and `dump_pdb_conf` will save the molecule(s) without the dummy atoms, to stop this add `stripped=False`.
        
        ## From SMILES string
        The above is actually a bit convoluted for example purposes as `Params.from_smiles`, accepts a SMILES string.
        
        ## From SMILES string and PDB for names    
        In some cases one has a PDB file with a ligand that needs parameterising.
        Assuming one has also the smiles of the ligand (PubChem has an super easy search), one can do
        
            p = Params.from_smiles_w_pdbfile(pdb_file, smiles, 'XXX') # the name has to match.
            
        The smiles does not need to match full. It can contain more atoms or even one`*` (CONNECT).
        The smiles gets parameterised. So be suse to add correct charges properly —hydrogens are added.
        It could be used for scaffold hopping, but if position matters so much,
        you may be interested in [Fragmenstein](https://github.com/matteoferla/Fragmenstein).
        
        For more see [autogenerated documentation](sphinx-documentation.md). Sphinx with markdown cannot deal with typehinting,
        so checking the code might be clearer.
        
        ### DIY
        
        If you have two mol objects from whatever routes, the basic operation is:
        
            p = Params.load_mol(mol, generic=False, name='LIG')
            p.rename_from_template(template) # or whatever middle step
            
            p.convert_mol()
        
        Note that `convert_mol` should be called once and is already called in the two `from_XXX` classmethods.
        
            p = Params.from_mol(...)
            p.convert_mol() # No!!!
            p.mol # is the mol...
            p2 = Params.load_mol(p.mol)
            p2.convert_mol() # Yes
        
        ## Constraints
        
        The selfstanding class `Constraints` is for generating constraint files, which are a must with covalent attachments
        in order to stop janky topologies.
        The class is instantiated with a pair of SMILES, each with at least a real atom and with one attachment point,
        the first is the ligand and the second is its peptide target. The names of the heavy atoms and the Rosetta residue "numbers".
        
            from rdkit_to_params import Constraints
            c = Constraints(smiles=('*C(=N)', '*SC'), names= ['*', 'CX', 'NY', '*', 'SG', 'CB'], ligand_res= '1B', target_res='145A')
            c.dump('con.con')
            # individual strings can be accessed
            c.atom_pair_constraint
            c.angle_constraint
            c.dihedral_constaint
            c.custom_constaint # if you want to add your own before `str`, `.dumps`, `.dump`.
        
        Do note that to make covalent links work in Rosetta, NGL and a few other places you need a LINK record, here is a f-string
        for it:
        
            f'LINK        {target_atom: >4} {target_resn: >3} {p_chain[:1]} {target_resi: >3}                '+\
            f'{ligand_atom: >4} {ligand_resn: >3} {ligand_chain[:1]} {ligand_resi: >3}     1555   1555  1.8\n'
        
        This is not to be confused with CCP4 REFMAC's `LINKR`, which are however easy to covert.
        Alternatively, you can add it after importing the pose, _cf._ `pose.residue(lig_pos).connect_map`.
        
        ## Bond order
        It is worth mentioning that the bond order specified in the topology file in the `BOND_ORDER` lines is mostly ignored 
        and the bond order is derived from the rosetta types that get assigned. 
        To extract and correct a ligand, consider the following
        
            
            # pose to string
            buffer = pyrosetta.rosetta.std.stringbuf()
            pose.dump_pdb(pyrosetta.rosetta.std.ostream(buffer))
            pdbblock = buffer.str()
            # get the residue
            mol = Chem.MolFromPDBBlock(pdbblock, proximityBonding=False, removeHs=False)
            ligand = Chem.SplitMolByPDBResidues(mol, whiteList=[params.NAME])[params.NAME]
            # fix bond order
            template = AllChem.DeleteSubstructs(params.mol, Chem.MolFromSmiles('*'))
            AllChem.AssignBondOrdersFromTemplate(template, ligand)
        
        ## Caveat: I do not know many things!
        
        ### Chemical
        I suspect I am doing stuff weirdly and I am meant to create ligands via ``pyrosetta.rosetta.core.chemical`` and not via params files... If this is so let me know. I don't mind knowing I made a mistake!
        
        ### Generic
        I like this generic atom type business, but I am not sure how to use them in RL.
        ``from_mol(mol, generic=True)`` will make generic atom types.
        I made several guesses with the classic atom types and I am sure many things are wrong...
        
        ### Rings and cis-trans
        I don't really understand what `CUT_BOND` does. It has to do with rings, 
        `ADD_RING` is not implemented in the `from_mol` conversion as I think it's an old command.
        Does a cis-trans tautomer bond (say `C(=O)-C=O`) gets a `CHI` entry? I am assuming no, but not sure.
        
        ## To Do
        I have not coded yet, because I forgot:
        * ~~an auto-assignment of `NBR_ATOM` and `NBR_RADIUS` for `from_mol`.~~
        * add rotamer line in `from_mol`
        * change option to override starting atom.
        * tweak the logic of `NAME` after some thinking.
        * ~~output constrain file for the CONNECT atom.~~
        * make a webpage to do the conversion from mol/sdf/pdb/SMILES —suggestions for free JS molecule editor?
        
        The `from_mol` class method recognises `*[NH]CC(~O)*` and assigns it as a backbone properly.
        However, `Chem.MolFromSmiles('*[NH]CC(~O)*')` cannot be embedded, so is a bit of a horrible one for users to use.
        Cystathionine and similar amino acids are the problem as I cannot simply make an amino acid backbone be recognised,
        however if protonated as is the case '[NH1]C[CH0](=O)'
        Maybe the `CC(=O)NCC(=O)NC` option may be a better choice after all.
Platform: UNKNOWN
Requires-Python: >3.6
Description-Content-Type: text/markdown
