rhapsody.predict package
This subpackage contains the core Rhapsody class, the main interface and related functions needed for obtaining predictions from trained classifiers.
- class rhapsody.predict.Rhapsody(query=None, query_type='SAVs', queryPolyPhen2=True, **kwargs)
Bases: object
A class implementing the Rhapsody algorithm for pathogenicity prediction of human missense variants, which can also be used to compare its results with those of other prediction tools, namely PolyPhen-2 and EVmutation.
- __init__(query=None, query_type='SAVs', queryPolyPhen2=True, **kwargs)
Initialize a Rhapsody object with a list of SAVs (optional).
- Parameters
query (str, list) – Single Amino Acid Variants (SAVs) in Uniprot coordinates. If None, the SAV list can be imported at a later moment by using .importPolyPhen2output(), .queryPolyPhen2() or .setSAVs().
If query_type = 'SAVs' (default), query should be a filename, a string or a list/tuple of strings containing Uniprot SAV coordinates, in the format 'P17516 135 G E'. The string can also be just a single Uniprot sequence identifier (e.g. 'P17516') or the coordinate of a specific site in a sequence (e.g. 'P17516 135'), in which case all 19 possible amino acid substitutions at the specified positions will be analyzed.
If query_type = 'PolyPhen2', query should be a filename containing the output from PolyPhen-2, usually named pph2-full.txt.
query_type (str) – 'SAVs' or 'PolyPhen2'
queryPolyPhen2 (bool) – if True, the PolyPhen-2 online tool will be queried with the list of SAVs
- getUniprot2PDBmap(filename='rhapsody-Uniprot2PDB.txt', print_header=True, refresh=False)
Maps each SAV to the corresponding resid in a PDB chain.
- getPredictions(SAV='all', classifier='best', PolyPhen2=True, EVmutation=True, PDBcoords=False, refresh=False)
- getResAvgPredictions(resid=None, classifier='best', PolyPhen2=True, EVmutation=True, refresh=False)
- printPredictions(classifier='best', PolyPhen2=True, EVmutation=True, filename='rhapsody-predictions.txt', print_header=True)
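The three query formats accepted by the constructor (a full SAV, a single site, or a whole sequence identifier) can be illustrated with a small normalization routine. This is a hypothetical sketch, not Rhapsody's own parser: the helper name expand_query and its sequence argument are assumptions, the latter standing in for the Uniprot sequence the program would need in order to know the wild-type residue at each position.

```python
from typing import List, Tuple

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# (Uniprot accession, 1-based position, wild-type aa, mutant aa)
SAV = Tuple[str, int, str, str]


def expand_query(entry: str, sequence: str) -> List[SAV]:
    """Hypothetical helper: turn one query string into a list of SAVs.

    'P17516 135 G E' -> that single SAV
    'P17516 135'     -> all 19 substitutions at position 135
    'P17516'         -> all 19 substitutions at every position
    `sequence` is assumed to be supplied by the caller and is only used
    to look up the wild-type residue at each position.
    """
    fields = entry.split()
    if len(fields) == 4:
        acc, pos, wt, mut = fields
        return [(acc, int(pos), wt, mut)]
    if len(fields) == 2:
        acc, positions = fields[0], [int(fields[1])]
    elif len(fields) == 1:
        acc, positions = fields[0], list(range(1, len(sequence) + 1))
    else:
        raise ValueError(f"unsupported query format: {entry!r}")

    savs: List[SAV] = []
    for pos in positions:
        wt = sequence[pos - 1]  # Uniprot positions are 1-based
        # Every amino acid except the wild type gives 19 substitutions.
        savs.extend((acc, pos, wt, aa) for aa in AMINO_ACIDS if aa != wt)
    return savs


# Example: a site query on a toy 4-residue sequence.
site_savs = expand_query("P17516 3", "MKGA")
print(len(site_savs), site_savs[0])  # 19 ('P17516', 3, 'G', 'A')
```

A list or tuple query would simply be the concatenation of expand_query() over its entries.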
- rhapsody.predict.rhapsody(query, query_type='SAVs', main_classifier=None, aux_classifier=None, custom_PDB=None, force_env=None, refresh=False, log=True, **kwargs)
Obtain Rhapsody pathogenicity predictions for a list of human missense variants ([ref]).
- Parameters
query (str, list) – Single Amino Acid Variants (SAVs) in Uniprot coordinates.
If query_type = 'SAVs' (default), it should be a filename, a string or a list/tuple of strings containing Uniprot SAV coordinates, in the format 'P17516 135 G E'. The string can also be just a single Uniprot sequence identifier (e.g. 'P17516') or the coordinate of a specific site in a sequence (e.g. 'P17516 135'), in which case all 19 possible amino acid substitutions at the specified positions will be analyzed.
If query_type = 'PolyPhen2', it should be a filename containing the output from PolyPhen-2, usually named pph2-full.txt.
query_type (str) – 'SAVs' or 'PolyPhen2'
main_classifier (str) – main classifier’s filename. If None, the default full Rhapsody classifier will be used.
aux_classifier (str) – auxiliary classifier’s filename. If both main_classifier and aux_classifier are None, the default reduced Rhapsody classifier will be used.
custom_PDB (str, AtomGroup) – a PDB ID, a filename or an Atomic object to be used for computing structural and dynamical features, instead of the PDB structure automatically selected by the program.
force_env (str) – force a specific environment model for GNM/ANM calculations, among 'chain', 'reduced' and 'sliced'. If None (default), the model of individual dynamical features will match that found in the classifier’s feature set.
refresh (bool) – if True, precomputed features and PDB mappings found in the working directory will be ignored and computed again.
log (bool) – if True, log messages will be saved in rhapsody-log.txt.
- ref
Ponzoni L, Bahar I. Structural dynamics is a determinant of the functional significance of missense variants. PNAS 2018 115 (16) 4164-4169.
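The defaults for main_classifier, aux_classifier and force_env described above can be summarized as a small resolution step. This is a minimal sketch of the documented rules only, not Rhapsody's code: the helper names and the placeholder classifier labels are hypothetical, and the case where only main_classifier is given is assumed to run without an auxiliary classifier, since the documentation does not say otherwise.

```python
from typing import Optional, Tuple

# Placeholder labels: the actual default classifier files are not named here.
DEFAULT_FULL = "default-full-classifier"
DEFAULT_REDUCED = "default-reduced-classifier"
VALID_ENVS = ("chain", "reduced", "sliced")


def resolve_classifiers(main_classifier: Optional[str] = None,
                        aux_classifier: Optional[str] = None,
                        ) -> Tuple[str, Optional[str]]:
    """Apply the documented defaults; returns (main, aux)."""
    if main_classifier is None:
        # The reduced classifier is the default only when neither file was given.
        if aux_classifier is None:
            aux_classifier = DEFAULT_REDUCED
        main_classifier = DEFAULT_FULL
    return main_classifier, aux_classifier


def check_force_env(force_env: Optional[str]) -> Optional[str]:
    """None keeps each feature's own environment model; otherwise it must be a valid name."""
    if force_env is not None and force_env not in VALID_ENVS:
        raise ValueError(f"force_env must be one of {VALID_ENVS} or None, got {force_env!r}")
    return force_env


print(resolve_classifiers())  # ('default-full-classifier', 'default-reduced-classifier')
```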
- rhapsody.predict.print_sat_mutagen_figure(filename, rhapsody_obj, res_interval=None, PolyPhen2=True, EVmutation=True, extra_plot=None, fig_height=8, fig_width=None, dpi=300, min_interval_size=15, html=False, main_clsf='main', aux_clsf='aux.')
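A saturation mutagenesis figure typically shows the predicted score of every amino acid substitution at every residue in a chosen interval. Assuming that layout, the sketch below builds only the underlying grid from a dictionary of per-SAV scores restricted to a res_interval; the helper name, the dictionary layout and the row order are hypothetical, and the plotting performed by print_sat_mutagen_figure() is not reproduced.

```python
from typing import Dict, List, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sat_mutagen_grid(scores: Dict[Tuple[int, str], float],
                     res_interval: Tuple[int, int],
                     ) -> List[List[Optional[float]]]:
    """Return a 20 x N grid: one row per mutant amino acid, one column per
    residue in the inclusive res_interval. Cells with no score (e.g. the
    wild-type residue, or a missing prediction) are None."""
    start, end = res_interval
    if start > end:
        raise ValueError("res_interval must be (start, end) with start <= end")
    positions = range(start, end + 1)
    return [[scores.get((pos, aa)) for pos in positions] for aa in AMINO_ACIDS]


# Two scored substitutions over residues 10-11.
grid = sat_mutagen_grid({(10, "A"): 0.9, (11, "W"): 0.1}, (10, 11))
```

After converting None to NaN, a grid like this could be rendered as a heatmap with any plotting library.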
Submodules
rhapsody.predict.core module
This module defines the main class used for running the pre-trained classifiers and organizing their predictions.
- class rhapsody.predict.core.Rhapsody(query=None, query_type='SAVs', queryPolyPhen2=True, **kwargs)
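getResAvgPredictions() is listed without a description. Judging only from its name, it presumably reports predictions averaged over all substitutions at each residue. The sketch below shows that kind of per-residue aggregation under this assumption; the helper name residue_averages and the input tuple layout are hypothetical, and missing (NaN) scores are skipped.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# (accession, position, wild-type aa, mutant aa, score)
Prediction = Tuple[str, int, str, str, float]


def residue_averages(predictions: Iterable[Prediction]) -> Dict[Tuple[str, int], float]:
    """Hypothetical helper: mean score per (accession, position).
    NaN scores are ignored; sites whose scores are all NaN are omitted."""
    by_site: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for acc, pos, _wt, _mut, score in predictions:
        if not math.isnan(score):
            by_site[(acc, pos)].append(score)
    return {site: mean(vals) for site, vals in by_site.items()}


avg = residue_averages([
    ("P1", 1, "A", "C", 0.25),
    ("P1", 1, "A", "D", 0.75),
    ("P1", 2, "G", "E", 1.0),
    ("P1", 3, "K", "R", float("nan")),
])
print(avg)  # {('P1', 1): 0.5, ('P1', 2): 1.0}
```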
rhapsody.predict.figures module
This module defines functions for generating figures that help inspect the predictions obtained from the main class.
rhapsody.predict.main module
This module defines the standard interface for running the Rhapsody prediction algorithm.
- rhapsody.predict.main.rhapsody(query, query_type='SAVs', main_classifier=None, aux_classifier=None, custom_PDB=None, force_env=None, refresh=False, log=True, **kwargs)