rhapsody.features package

This subpackage contains modules for computing features from multiple sources, e.g. Uniprot sequences, PDB structures, Pfam domains and EVmutation precomputed data.

rhapsody.features.queryUniprot(*args, n_attempts=3, dt=1, **kwargs)[source]

Redefines the ProDy function queryUniprot() so that a missing internet connection is detected.

class rhapsody.features.UniprotMapping(acc, recover_pickle=False, **kwargs)[source]

Bases: object

__init__(acc, recover_pickle=False, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

refresh()[source]

Refresh imported Uniprot records and mappings, and delete precomputed alignments.

getFullRecord()[source]

Returns the output from queryUniprot()

getPDBrecords()[source]

Returns a dictionary containing only the ‘dbReference’ records related to PDB, extracted from the full Uniprot record.

getPDBmappings(PDBID=None)[source]

Returns a list of dictionaries, with mappings of the Uniprot sequence onto single PDB chains. For each PDB chain, the residue intervals retrieved from the Uniprot database are parsed into a list of tuples (‘chain_sel’) corresponding to endpoints of individual segments. NB: ‘@’ stands for ‘all chains’, following Uniprot naming convention.
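The interval parsing described above can be sketched as follows. This is only an illustration, not the Rhapsody implementation: the helper name parse_chain_intervals is hypothetical, and the input format ('A/B=1-100, C=150-200', with chain IDs separated by '/' and segments by ',') is an assumption about how the Uniprot record lists PDB chains.

```python
def parse_chain_intervals(chains_field):
    """Parse a string such as 'A/B=1-100, C=150-200' into a dict that maps
    each chain ID to a list of (start, end) tuples.

    Hypothetical helper: the input format is an assumption."""
    mappings = {}
    for segment in chains_field.split(','):
        chain_ids, interval = segment.strip().split('=')
        start, end = (int(x) for x in interval.split('-'))
        # the same interval applies to every chain listed before '='
        for chain_id in chain_ids.split('/'):
            mappings.setdefault(chain_id, []).append((start, end))
    return mappings


# '@' is kept as an ordinary key, standing for 'all chains'
example = parse_chain_intervals('A/B=1-100, A=150-200, @=5-20')
```

Here each value plays the role of the ‘chain_sel’ list: one tuple of endpoints per segment.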

alignSinglePDB(PDBID, chain='longest')[source]

Aligns the Uniprot sequence with the sequence from the given PDB entry.

alignCustomPDB(PDB, chain='all', title=None, recover=False)[source]

Aligns the Uniprot sequence with the sequence from the given PDB.

alignAllPDBs(chain='longest')[source]

Aligns the Uniprot sequence with the sequences of all PDBs in the Uniprot record.

mapSingleResidue(resid, check_aa=False, depth='best')[source]

Maps a single amino acid in the Uniprot sequence to PDB structures. If ‘check_aa’ is True, only PDB residues with the wild-type amino acid are returned. The ‘depth’ argument controls the extent of the search: if ‘matching’, information from the Uniprot record is used to determine which PDBs contain the given residue; if ‘best’ (default), only the longest chain is considered and printed, to save time; if ‘all’, a thorough search among all PDBs is performed (slow). The matching PDB residues are sorted in descending order of sequence identity between the respective chain and the Uniprot sequence.

mapSingleRes2CustomPDBs(resid, check_aa=False)[source]

Map an amino acid in the Uniprot sequence to aligned custom PDBs. If ‘check_aa’ is True, it will return only PDB residues with the wild-type amino acid.

setAlignAlgorithm(align_algorithm=1, gap_open_penalty=-0.5, gap_ext_penalty=-0.1, refresh=True)[source]

Set the Biopython alignment algorithm used for aligning Uniprot sequence to PDB sequences. All precomputed alignments will be deleted.

savePickle(filename=None, folder=None, store_custom_PDBs=False)[source]
recoverPickle(filename=None, folder=None, days=30, **kwargs)[source]
resetTimestamp()[source]
calcEvolProperties(resid='all', refresh=False, folder=None, max_cols=None, max_seqs=25000, **kwargs)[source]

Computes Evol properties, i.e. Shannon entropy, Mutual Information and Direct Information, from Pfam Multiple Sequence Alignments, for a given residue or, by default (resid='all'), for all residues.
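To make the first of these quantities concrete, the sketch below computes the Shannon entropy of each column of a toy alignment. It is a simplified illustration and not the routine used by Rhapsody or ProDy: the function name column_entropy, the use of the natural logarithm and the decision to ignore gap characters are all assumptions.

```python
import math
from collections import Counter


def column_entropy(column, gap_chars='-.'):
    """Shannon entropy (natural log) of one alignment column.

    Gaps are ignored here, which is an assumption of this sketch."""
    residues = [c for c in column if c not in gap_chars]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# toy alignment: 4 sequences, 4 columns
msa = ['ACDA', 'ACEA', 'ACFA', 'GC-A']
columns = [''.join(seq[i] for seq in msa) for i in range(len(msa[0]))]
entropies = [column_entropy(col) for col in columns]
```

Fully conserved columns get zero entropy, while the third column, with three different residues and one gap, gets the highest value.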

rhapsody.features.mapSAVs2PDB(SAV_coords, custom_PDB=None, refresh=False, status_file=None, status_prefix=None)[source]
rhapsody.features.seqScanning(Uniprot_coord, sequence=None)[source]

Returns a list of SAVs. If the string ‘Uniprot_coord’ is just a Uniprot ID, the list will contain all possible amino acid substitutions at all positions in the sequence. If ‘Uniprot_coord’ also includes a specific position, the list will only contain all possible amino acid variants at that position. If ‘sequence’ is ‘None’ (default), the sequence will be downloaded from Uniprot.
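The enumeration described above can be sketched in a few lines. This is an illustration only: the function scan_sequence is hypothetical, the accession 'P00000' is a placeholder, and the sequence is passed in explicitly rather than downloaded. The output strings follow the 'accession position wt mut' layout used elsewhere on this page (e.g. 'P17516 135 G E').

```python
AMINO_ACIDS = 'ACDEFGHIKLMNPQRSTVWY'  # the 20 standard amino acids


def scan_sequence(acc, sequence, position=None):
    """List all single amino acid variants of a sequence, either at
    every position or at one 1-based position (hypothetical helper)."""
    if position is None:
        positions = range(1, len(sequence) + 1)
    else:
        positions = [position]
    savs = []
    for pos in positions:
        wt = sequence[pos - 1]
        for mut in AMINO_ACIDS:
            if mut != wt:  # skip the wild-type residue itself
                savs.append(f'{acc} {pos} {wt} {mut}')
    return savs


all_savs = scan_sequence('P00000', 'MKV')               # every position
site_savs = scan_sequence('P00000', 'MKV', position=2)  # position 2 only
```

Each position contributes 19 variants, so a sequence of length N yields 19·N SAVs.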

rhapsody.features.printSAVlist(input_SAVs, filename)[source]
class rhapsody.features.PDBfeatures(PDB, n_modes='all', recover_pickle=False, **kwargs)[source]

Bases: object

A class for deriving structural and dynamical properties from a PDB structure.

Parameters
  • PDB (Atomic, str) – an object or a PDB code identifying a PDB structure.

  • n_modes (int, str) – number of GNM/ANM modes to be computed.

  • recover_pickle (bool) – whether or not to recover precomputed pickle, if found

__init__(PDB, n_modes='all', recover_pickle=False, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

getPDB()[source]

Returns the parsed PDB structure as an AtomGroup object.

refresh()[source]

Deletes all precomputed ENM models and features, and resets time stamp.

recoverPickle(folder=None, filename=None, days=30, **kwargs)[source]

Looks for precomputed pickle for the current PDB structure.

Parameters
  • folder (str) – path of folder where pickles are stored. If not specified, pickles will be searched for in the local Rhapsody installation folder.

  • filename (str) – name of the pickle. If not specified, the default filename 'PDBfeatures-[PDBID].pkl' will be used. If a PDBID is not found, user must specify a valid filename.

  • days (int) – number of days after which a pickle will be considered too old and won’t be recovered.
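The role of the days parameter can be illustrated with a small freshness check. This is a sketch under assumptions, not the class's actual logic: the helper is_pickle_fresh is hypothetical, and it assumes the pickled object carries a datetime timestamp (as suggested by resetTimestamp()).

```python
import datetime


def is_pickle_fresh(timestamp, days=30, now=None):
    """Return True if a pickle stamped at 'timestamp' is at most 'days'
    days old (hypothetical helper)."""
    now = now or datetime.datetime.now()
    return (now - timestamp) <= datetime.timedelta(days=days)


reference = datetime.datetime(2024, 1, 31)
recent = is_pickle_fresh(datetime.datetime(2024, 1, 10), now=reference)
stale = is_pickle_fresh(datetime.datetime(2023, 12, 1), now=reference)
```

A pickle that fails such a check would be ignored and the features recomputed.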

savePickle(folder=None, filename=None)[source]

Stores a pickle of the current class instance. The pickle will contain all information and precomputed features, but not GNM and ANM models. In case a PDBID is missing, the parsed PDB AtomGroup is stored as well.

Parameters
  • folder (str) – path of the folder where the pickle will be saved. If not specified, the local Rhapsody installation folder will be used.

  • filename (str) – name of the pickle. By default, the pickle will be saved as 'PDBfeatures-[PDBID].pkl'. If a PDBID is not defined, the user must provide a filename.

Returns

pickle path

Return type

str
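The filename rule described above can be sketched as follows. The helper pickle_path is hypothetical and the default folder is a placeholder for the Rhapsody installation folder; only the 'PDBfeatures-[PDBID].pkl' pattern comes from the documentation.

```python
import os


def pickle_path(pdb_id=None, folder='.', filename=None):
    """Build the path of a PDBfeatures pickle (hypothetical helper).

    Without an explicit filename, a PDB ID is required to build the
    default name."""
    if filename is None:
        if pdb_id is None:
            raise ValueError('a filename is required when no PDBID is set')
        filename = f'PDBfeatures-{pdb_id}.pkl'
    return os.path.join(folder, filename)


default_path = pickle_path(pdb_id='1ABC', folder='cache')
custom_path = pickle_path(folder='cache', filename='my_model.pkl')
```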

resetTimestamp()[source]
setNumModes(n_modes)[source]

Sets the number of ENM modes to be computed. If different from the number provided at instantiation, any precomputed features will be deleted.

calcGNM(chID, env='chain')[source]

Builds GNM model for the selected chain.

Parameters
  • chID (str) – chain identifier

  • env (str) – environment model, i.e. 'chain', 'reduced' or 'sliced'

Returns

GNM model

Return type

GNM

calcANM(chID, env='chain')[source]

Builds ANM model for the selected chain.

Parameters
  • chID (str) – chain identifier

  • env (str) – environment model, i.e. 'chain', 'reduced' or 'sliced'

Returns

ANM model

Return type

ANM

calcGNMfeatures(chain='all', env='chain', GNM_PRS=True)[source]

Computes GNM-based features.

Parameters
  • chain (str) – chain identifier

  • env (str) – environment model, i.e. 'chain', 'reduced' or 'sliced'

  • GNM_PRS (bool) – whether or not to compute features based on Perturbation Response Scanning analysis

calcANMfeatures(chain='all', env='chain', ANM_PRS=True, stiffness=True, MBS=False)[source]

Computes ANM-based features.

Parameters
  • chain (str) – chain identifier

  • env (str) – environment model, i.e. 'chain', 'reduced' or 'sliced'

  • ANM_PRS (bool) – whether or not to compute features based on Perturbation Response Scanning analysis

  • stiffness (bool) – whether or not to compute stiffness with MechStiff

  • MBS (bool) – whether or not to compute Mechanical Bridging Score

calcDSSP(chain='whole')[source]

Runs DSSP on the PDB structure.

Parameters

chain (str) – chain identifier. If 'whole', the whole complex will be considered

Returns

modified PDB object with DSSP properties added as additional attributes, accessible via method getData()

Return type

AtomGroup

calcSASA(chain='all')[source]

Computes Solvent Accessible Surface Area of single chains with DSSP algorithm.

Parameters

chain (str) – chain identifier

calcDeltaSASA(chain='all')[source]

Computes the difference between Solvent Accessible Surface Area of an isolated chain and of the same chain seen in the complex.

Parameters

chain (str) – chain identifier
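As a minimal numerical illustration of this difference, the sketch below subtracts per-residue SASA values in the complex from those of the isolated chain. The function delta_sasa and the numbers are made up for the example, and the sign convention (isolated minus complex) is an assumption based on the wording above.

```python
def delta_sasa(sasa_isolated, sasa_in_complex):
    """Per-residue SASA difference, isolated chain minus complex
    (sign convention assumed)."""
    return [iso - cpx for iso, cpx in zip(sasa_isolated, sasa_in_complex)]


# made-up per-residue values, in squared Angstroms
sasa_isolated = [120.0, 45.5, 80.0]
sasa_in_complex = [120.0, 10.5, 62.0]
deltas = delta_sasa(sasa_isolated, sasa_in_complex)
```

With this convention, residues buried at an interface have a positive value, and residues unaffected by complex formation have zero.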

calcSelFeatures(chain='all', resid=None, sel_feats=None)[source]

Computes selected PDB-based features for all chains in the PDB structure, for a specific chain or for a single residue. Available features are listed in PDB_FEATS.

Parameters
  • chain (str) – chain identifier

  • resid (int) – residue number. If selected, a single chain must be also specified

  • sel_feats – list of feature names. If None, all features in PDB_FEATS will be computed

Returns

a dictionary, containing names and values (or error messages) of selected features, for each chain or residue

Return type

dict

rhapsody.features.calcPDBfeatures(mapped_SAVs, sel_feats=None, custom_PDB=None, refresh=False, status_file=None, status_prefix=None)[source]
rhapsody.features.queryPolyPhen2(filename, dump=True, prefix='pph2', fasta_file=None, fix_isoforms=False, ignore_errors=False, **kwargs)[source]
rhapsody.features.parsePolyPhen2output(pph2_output)[source]

Import PolyPhen-2 results directly from the output of ‘queryPolyPhen2’ or from a file (in ‘full’ format).

rhapsody.features.getSAVcoords(parsed_lines)[source]

Extracts SAV Uniprot coordinates as provided by the user. If not possible, the Uniprot coordinates computed by PolyPhen-2 will be returned. A string containing the original submitted SAV is returned as well.

rhapsody.features.calcPolyPhen2features(PolyPhen2output)[source]
rhapsody.features.recoverEVmutFeatures(SAVs)[source]

Compute EVmutation features by fetching precomputed scores from the downloaded local folder. If multiple values are found for a given variant, the average will be taken.

Parameters

SAVs (list or tuple of strings) – list of SAV coordinates, e.g. 'P17516 135 G E'.

Returns

an array of EVmutation features for each SAV

Return type

NumPy structured array
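The SAV coordinate format and the averaging rule can be sketched as follows. This is not the actual lookup code: parse_sav and average_scores are hypothetical helpers, the score table stands in for the downloaded EVmutation data, and the numbers are invented.

```python
from statistics import mean


def parse_sav(sav):
    """Split a SAV string such as 'P17516 135 G E' into its fields."""
    acc, pos, wt, mut = sav.split()
    return acc, int(pos), wt, mut


def average_scores(sav, score_table):
    """Average all scores found for a variant, or NaN if none is found
    (hypothetical helper)."""
    values = score_table.get(parse_sav(sav), [])
    return mean(values) if values else float('nan')


# invented scores: two entries found for the same variant
score_table = {('P17516', 135, 'G', 'E'): [-6.0, -4.0]}
avg = average_scores('P17516 135 G E', score_table)
missing = average_scores('P17516 136 A V', score_table)
```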

rhapsody.features.calcPfamFeatures(SAVs, status_file=None, status_prefix=None)[source]
rhapsody.features.calcBLOSUMfeatures(SAV_coords)[source]

Submodules

rhapsody.features.BLOSUM module

This module defines a function for deriving features from a precomputed BLOSUM substitution matrix.

rhapsody.features.BLOSUM.BLOSUM_FEATS = ['BLOSUM']

Features computed from BLOSUM62 substitution matrix.

rhapsody.features.BLOSUM.calcBLOSUMfeatures(SAV_coords)[source]

rhapsody.features.EVmutation module

This module defines a function for deriving coevolutionary features from precomputed EVmutation scores.

rhapsody.features.EVmutation.EVMUT_FEATS = ['EVmut-DeltaE_epist', 'EVmut-DeltaE_indep', 'EVmut-mut_aa_freq', 'EVmut-wt_aa_cons']

List of features derived from EVmutation database of precomputed coevolution-based scores.

rhapsody.features.EVmutation.recoverEVmutFeatures(SAVs)[source]

Compute EVmutation features by fetching precomputed scores from the downloaded local folder. If multiple values are found for a given variant, the average will be taken.

Parameters

SAVs (list or tuple of strings) – list of SAV coordinates, e.g. 'P17516 135 G E'.

Returns

an array of EVmutation features for each SAV

Return type

NumPy structured array

rhapsody.features.PDB module

This module defines a class that organizes the calculation of PDB-based structural and dynamical features in a single place, and a function that applies it to a list of SAVs mapped to PDB coordinates.

rhapsody.features.PDB.STR_FEATS = ['SASA', 'SASA_in_complex', 'Delta_SASA']

List of available structural features.

rhapsody.features.PDB.DYN_FEATS = ['GNM_MSF', 'ANM_MSF', 'GNM_effectiveness', 'GNM_sensitivity', 'ANM_effectiveness', 'ANM_sensitivity', 'stiffness']

List of available dynamical features.

rhapsody.features.PDB.PDB_FEATS = ['SASA', 'SASA_in_complex', 'Delta_SASA', 'GNM_MSF-chain', 'GNM_MSF-reduced', 'GNM_MSF-sliced', 'ANM_MSF-chain', 'ANM_MSF-reduced', 'ANM_MSF-sliced', 'GNM_effectiveness-chain', 'GNM_effectiveness-reduced', 'GNM_effectiveness-sliced', 'GNM_sensitivity-chain', 'GNM_sensitivity-reduced', 'GNM_sensitivity-sliced', 'ANM_effectiveness-chain', 'ANM_effectiveness-reduced', 'ANM_effectiveness-sliced', 'ANM_sensitivity-chain', 'ANM_sensitivity-reduced', 'ANM_sensitivity-sliced', 'stiffness-chain', 'stiffness-reduced', 'stiffness-sliced']

List of available PDB-based structural and dynamical features. The latter can be computed by using three different environment models.
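The relation between the three lists above can be reproduced with a short sketch. The values of STR_FEATS and DYN_FEATS are copied from this page; the ENVS list and the comprehension are an assumption about how the combined list could be built, not necessarily how the module defines it.

```python
STR_FEATS = ['SASA', 'SASA_in_complex', 'Delta_SASA']
DYN_FEATS = ['GNM_MSF', 'ANM_MSF', 'GNM_effectiveness', 'GNM_sensitivity',
             'ANM_effectiveness', 'ANM_sensitivity', 'stiffness']
ENVS = ['chain', 'reduced', 'sliced']  # the three environment models

# structural features as they are, dynamical ones once per environment
PDB_FEATS = STR_FEATS + [f'{feat}-{env}' for feat in DYN_FEATS for env in ENVS]
```

This gives 3 structural plus 7 × 3 dynamical names, i.e. the 24 entries listed in PDB_FEATS.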

class rhapsody.features.PDB.PDBfeatures(PDB, n_modes='all', recover_pickle=False, **kwargs)[source]

Bases: object

A class for deriving structural and dynamical properties from a PDB structure.

Parameters
  • PDB (Atomic, str) – an object or a PDB code identifying a PDB structure.

  • n_modes (int, str) – number of GNM/ANM modes to be computed.

  • recover_pickle (bool) – whether or not to recover precomputed pickle, if found

__init__(PDB, n_modes='all', recover_pickle=False, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

getPDB()[source]

Returns the parsed PDB structure as an AtomGroup object.

refresh()[source]

Deletes all precomputed ENM models and features, and resets time stamp.

recoverPickle(folder=None, filename=None, days=30, **kwargs)[source]

Looks for precomputed pickle for the current PDB structure.

Parameters
  • folder (str) – path of folder where pickles are stored. If not specified, pickles will be searched for in the local Rhapsody installation folder.

  • filename (str) – name of the pickle. If not specified, the default filename 'PDBfeatures-[PDBID].pkl' will be used. If a PDBID is not found, user must specify a valid filename.

  • days (int) – number of days after which a pickle will be considered too old and won’t be recovered.

savePickle(folder=None, filename=None)[source]

Stores a pickle of the current class instance. The pickle will contain all information and precomputed features, but not GNM and ANM models. In case a PDBID is missing, the parsed PDB AtomGroup is stored as well.

Parameters
  • folder (str) – path of the folder where the pickle will be saved. If not specified, the local Rhapsody installation folder will be used.

  • filename (str) – name of the pickle. By default, the pickle will be saved as 'PDBfeatures-[PDBID].pkl'. If a PDBID is not defined, the user must provide a filename.

Returns

pickle path

Return type

str

resetTimestamp()[source]
setNumModes(n_modes)[source]

Sets the number of ENM modes to be computed. If different from the number provided at instantiation, any precomputed features will be deleted.

calcGNM(chID, env='chain')[source]

Builds GNM model for the selected chain.

Parameters
  • chID (str) – chain identifier

  • env (str) – environment model, i.e. 'chain', 'reduced' or 'sliced'

Returns

GNM model

Return type

GNM

calcANM(chID, env='chain')[source]

Builds ANM model for the selected chain.

Parameters
  • chID (str) – chain identifier

  • env (str) – environment model, i.e. 'chain', 'reduced' or 'sliced'

Returns

ANM model

Return type

ANM

calcGNMfeatures(chain='all', env='chain', GNM_PRS=True)[source]

Computes GNM-based features.

Parameters
  • chain (str) – chain identifier

  • env (str) – environment model, i.e. 'chain', 'reduced' or 'sliced'

  • GNM_PRS (bool) – whether or not to compute features based on Perturbation Response Scanning analysis

calcANMfeatures(chain='all', env='chain', ANM_PRS=True, stiffness=True, MBS=False)[source]

Computes ANM-based features.

Parameters
  • chain (str) – chain identifier

  • env (str) – environment model, i.e. 'chain', 'reduced' or 'sliced'

  • ANM_PRS (bool) – whether or not to compute features based on Perturbation Response Scanning analysis

  • stiffness (bool) – whether or not to compute stiffness with MechStiff

  • MBS (bool) – whether or not to compute Mechanical Bridging Score

calcDSSP(chain='whole')[source]

Runs DSSP on the PDB structure.

Parameters

chain (str) – chain identifier. If 'whole', the whole complex will be considered

Returns

modified PDB object with DSSP properties added as additional attributes, accessible via method getData()

Return type

AtomGroup

calcSASA(chain='all')[source]

Computes Solvent Accessible Surface Area of single chains with DSSP algorithm.

Parameters

chain (str) – chain identifier

calcDeltaSASA(chain='all')[source]

Computes the difference between Solvent Accessible Surface Area of an isolated chain and of the same chain seen in the complex.

Parameters

chain (str) – chain identifier

calcSelFeatures(chain='all', resid=None, sel_feats=None)[source]

Computes selected PDB-based features for all chains in the PDB structure, for a specific chain or for a single residue. Available features are listed in PDB_FEATS.

Parameters
  • chain (str) – chain identifier

  • resid (int) – residue number. If selected, a single chain must be also specified

  • sel_feats – list of feature names. If None, all features in PDB_FEATS will be computed

Returns

a dictionary, containing names and values (or error messages) of selected features, for each chain or residue

Return type

dict

rhapsody.features.PDB.calcPDBfeatures(mapped_SAVs, sel_feats=None, custom_PDB=None, refresh=False, status_file=None, status_prefix=None)[source]

rhapsody.features.Pfam module

This module defines a function for computing conservation and coevolution properties of an amino acid substitution from a Pfam multiple sequence alignment.

rhapsody.features.Pfam.PFAM_FEATS = ['entropy', 'ranked_MI']

List of features computed from Pfam multiple sequence alignments.

rhapsody.features.Pfam.calcPfamFeatures(SAVs, status_file=None, status_prefix=None)[source]

rhapsody.features.PolyPhen2 module

This module defines functions for querying the PolyPhen-2 online tool, parsing its output and deriving features that will be used by the Rhapsody classifiers.

rhapsody.features.PolyPhen2.PP2_FEATS = ['wt_PSIC', 'Delta_PSIC']

List of features derived from PolyPhen-2’s output.

rhapsody.features.PolyPhen2.queryPolyPhen2(filename, dump=True, prefix='pph2', fasta_file=None, fix_isoforms=False, ignore_errors=False, **kwargs)[source]
rhapsody.features.PolyPhen2.parsePolyPhen2output(pph2_output)[source]

Import PolyPhen-2 results directly from the output of ‘queryPolyPhen2’ or from a file (in ‘full’ format).

rhapsody.features.PolyPhen2.getSAVcoords(parsed_lines)[source]

Extracts SAV Uniprot coordinates as provided by the user. If not possible, the Uniprot coordinates computed by PolyPhen-2 will be returned. A string containing the original submitted SAV is returned as well.

rhapsody.features.PolyPhen2.calcPolyPhen2features(PolyPhen2output)[source]

rhapsody.features.Uniprot module

This module defines a class and related functions for mapping Uniprot sequences to PDB and Pfam databases.

rhapsody.features.Uniprot.queryUniprot(*args, n_attempts=3, dt=1, **kwargs)[source]

Redefines the ProDy function queryUniprot() so that a missing internet connection is detected.

class rhapsody.features.Uniprot.UniprotMapping(acc, recover_pickle=False, **kwargs)[source]

Bases: object

__init__(acc, recover_pickle=False, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

refresh()[source]

Refresh imported Uniprot records and mappings, and delete precomputed alignments.

getFullRecord()[source]

Returns the output from queryUniprot()

getPDBrecords()[source]

Returns a dictionary containing only the ‘dbReference’ records related to PDB, extracted from the full Uniprot record.

getPDBmappings(PDBID=None)[source]

Returns a list of dictionaries, with mappings of the Uniprot sequence onto single PDB chains. For each PDB chain, the residue intervals retrieved from the Uniprot database are parsed into a list of tuples (‘chain_sel’) corresponding to endpoints of individual segments. NB: ‘@’ stands for ‘all chains’, following Uniprot naming convention.

alignSinglePDB(PDBID, chain='longest')[source]

Aligns the Uniprot sequence with the sequence from the given PDB entry.

alignCustomPDB(PDB, chain='all', title=None, recover=False)[source]

Aligns the Uniprot sequence with the sequence from the given PDB.

alignAllPDBs(chain='longest')[source]

Aligns the Uniprot sequence with the sequences of all PDBs in the Uniprot record.

mapSingleResidue(resid, check_aa=False, depth='best')[source]

Maps a single amino acid in the Uniprot sequence to PDB structures. If ‘check_aa’ is True, only PDB residues with the wild-type amino acid are returned. The ‘depth’ argument controls the extent of the search: if ‘matching’, information from the Uniprot record is used to determine which PDBs contain the given residue; if ‘best’ (default), only the longest chain is considered and printed, to save time; if ‘all’, a thorough search among all PDBs is performed (slow). The matching PDB residues are sorted in descending order of sequence identity between the respective chain and the Uniprot sequence.

mapSingleRes2CustomPDBs(resid, check_aa=False)[source]

Map an amino acid in the Uniprot sequence to aligned custom PDBs. If ‘check_aa’ is True, it will return only PDB residues with the wild-type amino acid.

setAlignAlgorithm(align_algorithm=1, gap_open_penalty=-0.5, gap_ext_penalty=-0.1, refresh=True)[source]

Set the Biopython alignment algorithm used for aligning Uniprot sequence to PDB sequences. All precomputed alignments will be deleted.

savePickle(filename=None, folder=None, store_custom_PDBs=False)[source]
recoverPickle(filename=None, folder=None, days=30, **kwargs)[source]
resetTimestamp()[source]
calcEvolProperties(resid='all', refresh=False, folder=None, max_cols=None, max_seqs=25000, **kwargs)[source]

Computes Evol properties, i.e. Shannon entropy, Mutual Information and Direct Information, from Pfam Multiple Sequence Alignments, for a given residue or, by default (resid='all'), for all residues.

rhapsody.features.Uniprot.mapSAVs2PDB(SAV_coords, custom_PDB=None, refresh=False, status_file=None, status_prefix=None)[source]
rhapsody.features.Uniprot.seqScanning(Uniprot_coord, sequence=None)[source]

Returns a list of SAVs. If the string ‘Uniprot_coord’ is just a Uniprot ID, the list will contain all possible amino acid substitutions at all positions in the sequence. If ‘Uniprot_coord’ also includes a specific position, the list will only contain all possible amino acid variants at that position. If ‘sequence’ is ‘None’ (default), the sequence will be downloaded from Uniprot.

rhapsody.features.Uniprot.printSAVlist(input_SAVs, filename)[source]