rhapsody.predict package

This subpackage contains the core Rhapsody class, the main interface, and related functions needed for obtaining predictions from trained classifiers.

class rhapsody.predict.Rhapsody(query=None, query_type='SAVs', queryPolyPhen2=True, **kwargs)[source]

Bases: object

A class implementing the Rhapsody algorithm for pathogenicity prediction of human missense variants. It can also be used to compare its results with those of other prediction tools, namely PolyPhen-2 and EVmutation.

__init__(query=None, query_type='SAVs', queryPolyPhen2=True, **kwargs)[source]

Initialize a Rhapsody object with a list of SAVs (optional).

Parameters
  • query (str, list) –

    Single Amino Acid Variants (SAVs) in Uniprot coordinates.

    • if None, the SAV list can be imported later by using .importPolyPhen2output(), .queryPolyPhen2() or .setSAVs()

    • if query_type = 'SAVs' (default), query should be a filename, a string or a list/tuple of strings containing Uniprot SAV coordinates in the format 'P17516 135 G E'. The string can also be a single Uniprot sequence identifier (e.g. 'P17516') or the coordinate of a specific site in a sequence (e.g. 'P17516 135'), in which case all 19 possible amino acid substitutions at each specified position will be analyzed.

    • if query_type = 'PolyPhen2', query should be a filename containing the output from PolyPhen-2, usually named pph2-full.txt

  • query_type (str) – 'SAVs' or 'PolyPhen2'

  • queryPolyPhen2 (bool) – if True, the PolyPhen-2 online tool will be queried with the list of SAVs
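
The three accepted forms of a 'SAVs' query string can be told apart by their number of fields. The following is a minimal sketch of that distinction, not Rhapsody's own parser: the helper name classify_query, the loose accession pattern and the labels it returns are assumptions made for illustration.

```python
import re

# Loose, assumed pattern for a Uniprot accession such as 'P17516';
# Rhapsody's own validation may be stricter or different.
ACCESSION = re.compile(r"[A-Z][A-Z0-9]{5,9}")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues


def classify_query(text):
    """Return 'sequence', 'site' or 'SAV' for one query string (hypothetical helper)."""
    fields = text.split()
    if not fields or not ACCESSION.fullmatch(fields[0]):
        raise ValueError(f"unrecognized accession in query: {text!r}")
    if len(fields) == 1:
        return "sequence"          # e.g. 'P17516'
    if not fields[1].isdigit():
        raise ValueError(f"position must be an integer: {text!r}")
    if len(fields) == 2:
        return "site"              # e.g. 'P17516 135'
    if len(fields) == 4 and fields[2] in AMINO_ACIDS and fields[3] in AMINO_ACIDS:
        return "SAV"               # e.g. 'P17516 135 G E'
    raise ValueError(f"unrecognized query format: {text!r}")


for query in ("P17516", "P17516 135", "P17516 135 G E"):
    print(query, "->", classify_query(query))
```

A check of this kind can be run on user input before the strings are passed as query to Rhapsody.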

setSAVs(query)[source]
queryPolyPhen2(query, filename='rhapsody-SAVs.txt')[source]
importPolyPhen2output(filename)[source]
getSAVcoords()[source]
setFeatSet(featset)[source]
setCustomPDB(custom_PDB)[source]
setTrueLabels(true_label_dict)[source]
getUniprot2PDBmap(filename='rhapsody-Uniprot2PDB.txt', print_header=True, refresh=False)[source]

Maps each SAV to the corresponding resid in a PDB chain.

getPDBcoords()[source]
getUniqueSAVcoords()[source]
calcFeatures(filename='rhapsody-features.txt', refresh=False)[source]
importFeatMatrix(struct_array)[source]
exportTrainingData(refresh=False)[source]
importPrecomputedExtraFeatures(features_dict)[source]
importClassifiers(classifier, aux_classifier=None, force_env=None)[source]
getPredictions(SAV='all', classifier='best', PolyPhen2=True, EVmutation=True, PDBcoords=False, refresh=False)[source]
getResAvgPredictions(resid=None, classifier='best', PolyPhen2=True, EVmutation=True, refresh=False)[source]
printPredictions(classifier='best', PolyPhen2=True, EVmutation=True, filename='rhapsody-predictions.txt', print_header=True)[source]
writePDBs(PDBID=None, predictions='best', path_prob=True, filename_prefix='rhapsody-PDB', refresh=False)[source]
savePickle(filename='rhapsody-pickle.pkl')[source]
rhapsody.predict.calcPredictions(feat_matrix, clsf, SAV_coords=None)[source]
rhapsody.predict.rhapsody(query, query_type='SAVs', main_classifier=None, aux_classifier=None, custom_PDB=None, force_env=None, refresh=False, log=True, **kwargs)[source]

Obtain Rhapsody pathogenicity predictions for a list of human missense variants ([ref]).

Parameters
  • query (str, list) –

    Single Amino Acid Variants (SAVs) in Uniprot coordinates

    • if query_type = 'SAVs' (default), it should be a filename, a string or a list/tuple of strings containing Uniprot SAV coordinates in the format 'P17516 135 G E'. The string can also be a single Uniprot sequence identifier (e.g. 'P17516') or the coordinate of a specific site in a sequence (e.g. 'P17516 135'), in which case all 19 possible amino acid substitutions at each specified position will be analyzed.

    • if query_type = 'PolyPhen2', it should be a filename containing the output from PolyPhen-2, usually named pph2-full.txt

  • query_type (str) – 'SAVs' or 'PolyPhen2'

  • main_classifier (str) – main classifier’s filename. If None, the default full Rhapsody classifier will be used

  • aux_classifier (str) – auxiliary classifier’s filename. If both main_classifier and aux_classifier are None, the default reduced Rhapsody classifier will be used

  • custom_PDB (str, AtomGroup) – a PDB ID, a filename or an Atomic object to be used for computing structural and dynamical features, instead of the PDB structure automatically selected by the program

  • force_env (str) – force a specific environment model for GNM/ANM calculations, among 'chain', 'reduced' and 'sliced'. If None (default), the model of individual dynamical features will match that found in the classifier’s feature set

  • refresh (bool) – if True, precomputed features and PDB mappings found in the working directory will be ignored and computed again

  • log (bool) – if True, log messages will be saved in rhapsody-log.txt

ref

Ponzoni L, Bahar I. Structural dynamics is a determinant of the functional significance of missense variants. PNAS 2018 115 (16) 4164-4169.
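
As an illustration of the parameters listed above, the sketch below validates a few options before forwarding them to rhapsody(). The helper names build_rhapsody_kwargs and run_rhapsody are hypothetical and not part of the package. The import is done lazily, so the validation part can be tried without Rhapsody installed.

```python
# Values documented for the corresponding parameters of rhapsody().
ALLOWED_QUERY_TYPES = ("SAVs", "PolyPhen2")
ALLOWED_ENVS = (None, "chain", "reduced", "sliced")


def build_rhapsody_kwargs(query_type="SAVs", force_env=None,
                          refresh=False, log=True):
    """Check option values and return them as keyword arguments (hypothetical helper)."""
    if query_type not in ALLOWED_QUERY_TYPES:
        raise ValueError(f"query_type must be one of {ALLOWED_QUERY_TYPES}")
    if force_env not in ALLOWED_ENVS:
        raise ValueError(f"force_env must be one of {ALLOWED_ENVS}")
    return {
        "query_type": query_type,
        "force_env": force_env,
        "refresh": bool(refresh),
        "log": bool(log),
    }


def run_rhapsody(query, **options):
    """Forward validated options to rhapsody(); requires the rhapsody package."""
    from rhapsody.predict import rhapsody  # imported only when actually called
    return rhapsody(query, **build_rhapsody_kwargs(**options))


print(build_rhapsody_kwargs(force_env="chain", refresh=True))
```

With Rhapsody installed and its default classifiers available, a call such as run_rhapsody('P17516 135 G E', force_env='chain') would then run the prediction with the main and auxiliary classifiers left at their defaults.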

rhapsody.predict.print_sat_mutagen_figure(filename, rhapsody_obj, res_interval=None, PolyPhen2=True, EVmutation=True, extra_plot=None, fig_height=8, fig_width=None, dpi=300, min_interval_size=15, html=False, main_clsf='main', aux_clsf='aux.')[source]

Submodules

rhapsody.predict.core module

This module defines the main class used for running the pre-trained classifiers and organizing their predictions.

class rhapsody.predict.core.Rhapsody(query=None, query_type='SAVs', queryPolyPhen2=True, **kwargs)[source]

Bases: object

A class implementing the Rhapsody algorithm for pathogenicity prediction of human missense variants. It can also be used to compare its results with those of other prediction tools, namely PolyPhen-2 and EVmutation.

__init__(query=None, query_type='SAVs', queryPolyPhen2=True, **kwargs)[source]

Initialize a Rhapsody object with a list of SAVs (optional).

Parameters
  • query (str, list) –

    Single Amino Acid Variants (SAVs) in Uniprot coordinates.

    • if None, the SAV list can be imported later by using .importPolyPhen2output(), .queryPolyPhen2() or .setSAVs()

    • if query_type = 'SAVs' (default), query should be a filename, a string or a list/tuple of strings containing Uniprot SAV coordinates in the format 'P17516 135 G E'. The string can also be a single Uniprot sequence identifier (e.g. 'P17516') or the coordinate of a specific site in a sequence (e.g. 'P17516 135'), in which case all 19 possible amino acid substitutions at each specified position will be analyzed.

    • if query_type = 'PolyPhen2', query should be a filename containing the output from PolyPhen-2, usually named pph2-full.txt

  • query_type (str) – 'SAVs' or 'PolyPhen2'

  • queryPolyPhen2 (bool) – if True, the PolyPhen-2 online tool will be queried with the list of SAVs
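
When a query names only a site, such as 'P17516 135', the 19 substitutions mentioned above are the 20 standard amino acids minus the wild-type residue. The sketch below enumerates them explicitly. The helper expand_site is hypothetical, and the wild-type residue is supplied by the caller here, whereas Rhapsody determines it on its own.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard residues


def expand_site(accession, position, wild_type):
    """List all 19 SAV strings for one site (hypothetical helper)."""
    if wild_type not in set(AMINO_ACIDS):
        raise ValueError(f"unknown wild-type residue: {wild_type!r}")
    return [
        f"{accession} {position} {wild_type} {mutant}"
        for mutant in AMINO_ACIDS
        if mutant != wild_type
    ]


savs = expand_site("P17516", 135, "G")
print(len(savs))   # number of substitutions generated for the site
print(savs[:3])    # first few SAV strings
```

The resulting list uses the same 'P17516 135 G E' format accepted by the query parameter, so it could be passed to the class directly or through .setSAVs().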

setSAVs(query)[source]
queryPolyPhen2(query, filename='rhapsody-SAVs.txt')[source]
importPolyPhen2output(filename)[source]
getSAVcoords()[source]
setFeatSet(featset)[source]
setCustomPDB(custom_PDB)[source]
setTrueLabels(true_label_dict)[source]
getUniprot2PDBmap(filename='rhapsody-Uniprot2PDB.txt', print_header=True, refresh=False)[source]

Maps each SAV to the corresponding resid in a PDB chain.

getPDBcoords()[source]
getUniqueSAVcoords()[source]
calcFeatures(filename='rhapsody-features.txt', refresh=False)[source]
importFeatMatrix(struct_array)[source]
exportTrainingData(refresh=False)[source]
importPrecomputedExtraFeatures(features_dict)[source]
importClassifiers(classifier, aux_classifier=None, force_env=None)[source]
getPredictions(SAV='all', classifier='best', PolyPhen2=True, EVmutation=True, PDBcoords=False, refresh=False)[source]
getResAvgPredictions(resid=None, classifier='best', PolyPhen2=True, EVmutation=True, refresh=False)[source]
printPredictions(classifier='best', PolyPhen2=True, EVmutation=True, filename='rhapsody-predictions.txt', print_header=True)[source]
writePDBs(PDBID=None, predictions='best', path_prob=True, filename_prefix='rhapsody-PDB', refresh=False)[source]
savePickle(filename='rhapsody-pickle.pkl')[source]
rhapsody.predict.core.calcPredictions(feat_matrix, clsf, SAV_coords=None)[source]

rhapsody.predict.figures module

This module defines functions for generating figures that help inspect the predictions obtained from the main class.

rhapsody.predict.figures.print_sat_mutagen_figure(filename, rhapsody_obj, res_interval=None, PolyPhen2=True, EVmutation=True, extra_plot=None, fig_height=8, fig_width=None, dpi=300, min_interval_size=15, html=False, main_clsf='main', aux_clsf='aux.')[source]

rhapsody.predict.main module

This module defines the standard interface for running the Rhapsody prediction algorithm.

rhapsody.predict.main.rhapsody(query, query_type='SAVs', main_classifier=None, aux_classifier=None, custom_PDB=None, force_env=None, refresh=False, log=True, **kwargs)[source]

Obtain Rhapsody pathogenicity predictions for a list of human missense variants ([ref]).

Parameters
  • query (str, list) –

    Single Amino Acid Variants (SAVs) in Uniprot coordinates

    • if query_type = 'SAVs' (default), it should be a filename, a string or a list/tuple of strings containing Uniprot SAV coordinates in the format 'P17516 135 G E'. The string can also be a single Uniprot sequence identifier (e.g. 'P17516') or the coordinate of a specific site in a sequence (e.g. 'P17516 135'), in which case all 19 possible amino acid substitutions at each specified position will be analyzed.

    • if query_type = 'PolyPhen2', it should be a filename containing the output from PolyPhen-2, usually named pph2-full.txt

  • query_type (str) – 'SAVs' or 'PolyPhen2'

  • main_classifier (str) – main classifier’s filename. If None, the default full Rhapsody classifier will be used

  • aux_classifier (str) – auxiliary classifier’s filename. If both main_classifier and aux_classifier are None, the default reduced Rhapsody classifier will be used

  • custom_PDB (str, AtomGroup) – a PDB ID, a filename or an Atomic object to be used for computing structural and dynamical features, instead of the PDB structure automatically selected by the program

  • force_env (str) – force a specific environment model for GNM/ANM calculations, among 'chain', 'reduced' and 'sliced'. If None (default), the model of individual dynamical features will match that found in the classifier’s feature set

  • refresh (bool) – if True, precomputed features and PDB mappings found in the working directory will be ignored and computed again

  • log (bool) – if True, log messages will be saved in rhapsody-log.txt

ref

Ponzoni L, Bahar I. Structural dynamics is a determinant of the functional significance of missense variants. PNAS 2018 115 (16) 4164-4169.