Have a look at the demonstration provided by OpenHelix.
The global query enables the user to retrieve Swiss-Prot entries, diseases and variants from a disease, a protein/gene name, a Swiss-Prot accession number, or a variant identifier (FTID or rsID).
If the text entered is a MeSH disease term or a MeSH descriptor identifier (DUI), the returned Swiss-Prot entries and variants are those indexed with the given MeSH descriptor or its children.
If the text is a MIM number or a Swiss-Prot disease, the entries returned are those for which the given disease or MIM number has been extracted from the Swiss-Prot disease comment line.
If the text is a gene name, a protein name or an accession number, the protein is returned together with its diseases and variants, provided that it is a human protein with at least one variant or one disease association.
If the text is a variant identifier (UniProtKB FTID or dbSNP rsID), the corresponding protein is returned together with the diseases associated specifically with this variant.
If the text entered does not correspond to any identifier, protein or gene name, or exact MeSH disease term, the proteins returned are those whose disease (a MeSH term or a disease extracted from the disease comment line) contains the text.
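The global query first has to decide what kind of text it received. The sketch below shows one way to do that by pattern matching. It is an illustration under assumptions, not SwissVar's code: the function `classify_query` and the category labels are hypothetical, the FTID, rsID, DUI and MIM patterns are simplified guesses at each identifier's shape, and only the accession pattern follows the regular expression published by UniProt.

```python
import re

# Checked in order; the first pattern that matches the whole token wins.
# Simplified, hypothetical formats, except the UniProt accession pattern,
# which follows the regular expression documented by UniProt.
PATTERNS = [
    ("ftid", re.compile(r"VAR_\d{6}")),
    ("rsid", re.compile(r"rs\d+", re.IGNORECASE)),
    ("mesh_dui", re.compile(r"D\d{6}(\d{3})?")),
    ("mim", re.compile(r"\d{6}")),
    ("accession", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}")),
]


def classify_query(text: str) -> str:
    token = text.strip()
    for kind, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return kind
    # Gene, protein and disease names cannot be told apart by their shape;
    # they need a dictionary lookup, with a substring search as the fallback.
    return "name_or_free_text"
```

For example, `classify_query("P04637")` returns `"accession"` and `classify_query("TP53")` falls through to `"name_or_free_text"`. Ordering matters: the DUI pattern is tried before the accession pattern so that a `D` followed by digits is read as a MeSH identifier.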
The disease query enables the user to retrieve Swiss-Prot entries and variants from a disease.
If the disease entered is a MeSH disease term or a MeSH descriptor identifier (DUI), the returned Swiss-Prot entries and variants are those indexed with the given MeSH descriptor or its children.
If the disease entered does not correspond to a MeSH term, or if it is a MIM number, the entries returned are those for which the given disease or MIM number has been extracted from the Swiss-Prot disease comment line.
The user can enter one disease, several MeSH descriptor identifiers (DUIs) or several MIM numbers, separated by spaces.
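Expanding a MeSH descriptor to "its children" can be done with MeSH tree numbers: a descriptor lies below another one when one of its tree numbers starts with the other's tree number followed by a dot. The sketch below assumes that "children" means all descendants, which is our reading rather than a documented SwissVar rule. The descriptor UIs, tree numbers and the `index` mapping are hypothetical placeholders, not real MeSH or Swiss-Prot data.

```python
from typing import Dict, List, Set

# Hypothetical miniature MeSH index: descriptor UI -> tree numbers.
# Real descriptors can have several tree numbers; these are placeholders.
TREE: Dict[str, List[str]] = {
    "D_PARENT": ["C99.100"],
    "D_CHILD": ["C99.100.200"],
    "D_GRANDCHILD": ["C99.100.200.300"],
    "D_OTHER": ["C99.400"],
}


def descriptor_and_descendants(dui: str, tree: Dict[str, List[str]] = TREE) -> Set[str]:
    """Return dui plus every descriptor whose tree number lies below one of dui's."""
    # The trailing dot stops "C99.100" from matching an unrelated "C99.1000".
    prefixes = [number + "." for number in tree[dui]]
    result = {dui}
    for other, numbers in tree.items():
        if any(n.startswith(p) for n in numbers for p in prefixes):
            result.add(other)
    return result


def entries_for_disease(dui: str, index: Dict[str, Set[str]],
                        tree: Dict[str, List[str]] = TREE) -> Set[str]:
    """index maps a descriptor UI to the Swiss-Prot accessions indexed with it."""
    hits: Set[str] = set()
    for descriptor in descriptor_and_descendants(dui, tree):
        hits |= index.get(descriptor, set())
    return hits
```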
Disease file upload
The file can contain diseases, MeSH descriptor identifiers (DUIs) or MIM numbers, each on a new line.
Proteins and variants linked to disease
Proteins and variants linked to the disease are searched. This means that all the proteins implicated in the disease are returned, even if none of their variants is known to be associated with the disease.
Variants linked to disease
Only proteins with variants known to be associated with the disease are searched.
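The difference between the two search modes can be sketched as a filter over protein records. The `Protein` record and the `search` function are hypothetical: they only model the rule stated above, namely that the first mode accepts a protein-level disease link while the second requires at least one variant carrying the association.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Protein:
    accession: str
    diseases: Set[str]  # diseases linked to the protein itself
    variant_diseases: Dict[str, Set[str]] = field(default_factory=dict)  # FTID -> diseases


def search(proteins: List[Protein], disease: str,
           variants_only: bool) -> List[Tuple[str, List[str]]]:
    """Return (accession, disease-linked FTIDs) pairs for one of the two modes."""
    results = []
    for p in proteins:
        linked = sorted(ftid for ftid, ds in p.variant_diseases.items() if disease in ds)
        if variants_only:
            # "Variants linked to disease": a variant must carry the association.
            if linked:
                results.append((p.accession, linked))
        elif disease in p.diseases or linked:
            # "Proteins and variants linked to disease": a protein link suffices.
            results.append((p.accession, linked))
    return results
```

A protein implicated in the disease but without any disease-associated variant appears, with an empty variant list, only in the first mode.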
The Medical Subject Headings (MeSH) terminology is a controlled-vocabulary thesaurus used for indexing biomedical and health-related documents. It is maintained and used by the National Library of Medicine (see the MeSH Home Page).
About two thirds of the Swiss-Prot entries known to be implicated in a disease have been automatically mapped to the MeSH terminology (Mottaz, A. et al. (2008) BMC Bioinformatics. Apr 29;9 Suppl 5:S3.).
The general query enables the user to retrieve Swiss-Prot entries and variants using Swiss-Prot accession number or identifier, protein name or gene name.
If the searched protein is not a human Swiss-Prot protein with variant or disease annotation, it will not be found (see Protein not found).
The user can enter one gene or protein name, or several Swiss-Prot accession numbers or identifiers separated by spaces.
General file upload
The file can contain Swiss-Prot accession numbers, identifiers, protein names or gene names, each on a new line.
The variant query enables the user to search for variants with specific molecular characteristics. The Swiss-Prot variants are systematically classified into three categories: "polymorphism", "disease" or "unclassified".
- Polymorphism: A variant is classified as "Polymorphism" if no disease association has been reported;
- Disease: A variant is classified as "Disease" when it is found in patients and a disease association is reported in the literature. However, this classification is not a definitive assessment of pathogenicity;
- Unclassified: A variant is classified as "Unclassified" if its disease association remains unclear.
The user can enter one or several variant identifiers, such as Swiss-Prot FTIDs or dbSNP rsIDs, separated by spaces.
The file can contain one or several variant identifiers, such as Swiss-Prot FTIDs or dbSNP rsIDs, each on a new line.
Substitution amino acids
For the desired variants, the user can specify the wild-type residue, the mutated residue, or both. Polar amino acids include: Arginine, Lysine, Aspartate, Glutamate, Asparagine and Glutamine. Hydrophobic amino acids include: Valine, Isoleucine, Leucine, Methionine, Phenylalanine, Tryptophan and Cysteine.
The user can specify a threshold on the BLOSUM score of the desired variants. The BLOSUM score is the entry of a BLOSUM matrix for the corresponding wild-type to variant amino acid change. It is a log-odds score: the logarithm of the ratio between the frequency at which the two amino acids are observed aligned in related protein sequences and the frequency expected if they were paired by chance. The BLOSUM62 substitution matrix is used; it contains scores for all possible exchanges of one amino acid with another.
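The residue filter can be sketched with the two classes exactly as defined above. The function names `allowed` and `matches`, and the idea that a filter is either a class name or a single one-letter residue code, are assumptions made for illustration.

```python
# Residue classes exactly as defined above (one-letter codes).
CLASSES = {
    "polar": set("RKDENQ"),         # Arg, Lys, Asp, Glu, Asn, Gln
    "hydrophobic": set("VILMFWC"),  # Val, Ile, Leu, Met, Phe, Trp, Cys
}


def allowed(spec):
    """A spec is a class name, a one-letter residue code, or None (any residue)."""
    if spec is None:
        return None
    return CLASSES.get(spec, {spec})


def matches(wild_type, mutant, wt_spec=None, mut_spec=None):
    """True if both the wild-type and the mutated residue satisfy their filters."""
    for residue, spec in ((wild_type, wt_spec), (mutant, mut_spec)):
        residues = allowed(spec)
        if residues is not None and residue not in residues:
            return False
    return True


variants = [("R", "W"), ("L", "P"), ("E", "K"), ("G", "A")]
polar_to_hydrophobic = [v for v in variants if matches(*v, "polar", "hydrophobic")]
```

Here only the Arg to Trp change survives: Glu to Lys stays within the polar class, and Pro is in neither class as defined on this page.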
Lowest score: -4 (low probability of substitution), highest score: 11 (high probability of substitution)
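The log-odds idea can be made concrete. BLOSUM62 is expressed in half-bit units, so a score is twice the base-2 logarithm of observed over expected pair frequency, rounded; the expected frequency of an unordered pair of different residues is 2·p_a·p_b. The frequencies passed to `log_odds_half_bits` below are toy values, not the BLOCKS data the matrix was built from, and the excerpt table holds only a handful of BLOSUM62 entries. `passes_threshold` is a hypothetical stand-in for the score filter.

```python
import math

# A few BLOSUM62 entries; the matrix is symmetric, so each pair is stored once.
BLOSUM62_EXCERPT = {
    ("I", "V"): 3, ("F", "Y"): 3, ("R", "K"): 2, ("D", "E"): 2,
    ("G", "L"): -4, ("P", "W"): -4, ("W", "W"): 11,
}


def log_odds_half_bits(q_ab, p_a, p_b, same_residue):
    """Score in half-bits: round(2 * log2(observed / expected))."""
    expected = p_a * p_b if same_residue else 2 * p_a * p_b
    return round(2 * math.log2(q_ab / expected))


def blosum62(a, b):
    """Look up a pair in either order; None if it is not in the excerpt."""
    return BLOSUM62_EXCERPT.get((a, b), BLOSUM62_EXCERPT.get((b, a)))


def passes_threshold(wild_type, mutant, minimum):
    score = blosum62(wild_type, mutant)
    return score is not None and score >= minimum
```

A pair observed four times more often than chance scores `log_odds_half_bits(0.02, 0.05, 0.05, False)`, that is 4; a pair seen exactly as often as chance scores 0, and rarer-than-chance pairs go negative.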
The user can specify a threshold on the conservation score of the desired variants. The score is a decimal number between 0 and 1. It was calculated using orthologous sequences from the Orthologous Matrix (OMA) project (Schneider, A. (2007) Bioinformatics. 23(16): 2180-2182.). The computation involves several steps:
- Identify the OMA group to which the UniProt sequence belongs;
- Perform a multiple sequence alignment of all the sequences belonging to the OMA group identified above using the MAFFT alignment program (Katoh, K.; Misawa, K.; Kuma, K.; Miyata, T. (2002) Nucleic Acids Res. 30(15): 3059-3066.);
- Compute the diversity of the alignment, as well as the conservation score of each residue (or position) of the UniProt sequence, using the program SCORECONS (Valdar, W.S. (2002) Proteins. 48(2): 227-241.).
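The last step, scoring each position of an alignment, can be illustrated with a deliberately simpler measure than SCORECONS. The sketch below swaps in one minus the normalised Shannon entropy of each column, which also yields 1.0 for a fully conserved column and smaller values as diversity grows; it is not the valdar01 score used by SCORECONS, which additionally weights sequences and uses a substitution matrix. The alignment is assumed to be a list of equal-length gapped strings with the UniProt sequence first.

```python
import math
from collections import Counter


def column_conservation(column):
    """1 - normalised Shannon entropy of a column (gap counts as a 21st symbol).

    A simple stand-in for illustration, not the SCORECONS valdar01 score.
    """
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
    return 1.0 - entropy / math.log2(21)


def conservation_profile(alignment, query_index=0):
    """Score every non-gap position of the query (e.g. UniProt) sequence."""
    query = alignment[query_index]
    profile = []
    for i, residue in enumerate(query):
        if residue == "-":
            continue  # gap in the query: no residue to score
        column = [seq[i] for seq in alignment]
        profile.append((residue, round(column_conservation(column), 3)))
    return profile
```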
Protein features in sequence neighbourhood
The user can find variants that are close in sequence to a feature, and can specify the distance threshold between the mutated residue and the feature as a number of residues.
The user can find variants that have been mapped onto an experimentally determined three-dimensional (3D) structure.
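A sequence-neighbourhood filter can be sketched as follows. How SwissVar measures the distance to a multi-residue feature is not stated, so measuring to the nearest end of the feature (0 inside it) is an assumption, as are the `Feature` record and the function names. The feature labels in the test are illustrative.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    kind: str   # feature type, e.g. an active site or a domain
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def sequence_distance(position: int, feature: Feature) -> int:
    """Residues between a position and the nearest end of a feature (0 if inside)."""
    if feature.start <= position <= feature.end:
        return 0
    return min(abs(position - feature.start), abs(position - feature.end))


def variants_near_feature(positions: List[int], features: List[Feature],
                          kind: str, max_distance: int) -> List[int]:
    """Keep variant positions within max_distance residues of a feature of that kind."""
    return [pos for pos in positions
            if any(f.kind == kind and sequence_distance(pos, f) <= max_distance
                   for f in features)]
```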
3D homology models
The user can find variants for which at least one protein homology model is available. The models were constructed using ProModII, the core program of SWISS-MODEL (Guex, N. and Peitsch, M.C. (1997) Electrophoresis 18: 2714-2723.).
Protein homology models were constructed only for proteins that have a suitable structural template in the Protein Data Bank (PDB). The sequence identity between the Swiss-Prot protein sequence and the PDB template is at least 70%. In addition, only crystal structures with a resolution better than 2.5 Å are selected as templates. When several suitable templates exist, an additional selection step keeps only templates that differ significantly from each other, i.e. that display a root mean square deviation (RMSD) of more than 1.5 Å.
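The template criteria above translate into a filter followed by a diversity step. The thresholds (at least 70% identity, crystal structure, resolution better than 2.5 Å, pairwise RMSD above 1.5 Å) come from the text; the greedy order (best resolution first), the `Template` record and the caller-supplied `rmsd` function are assumptions, since the page does not say how the diversity step is carried out.

```python
from typing import Callable, List, NamedTuple


class Template(NamedTuple):
    pdb_id: str
    identity: float    # % sequence identity with the Swiss-Prot sequence
    resolution: float  # in Å; smaller is better
    is_crystal: bool   # X-ray crystal structure


def select_templates(candidates: List[Template],
                     rmsd: Callable[[Template, Template], float]) -> List[Template]:
    """Keep suitable templates that differ from each other by RMSD > 1.5 Å."""
    suitable = [t for t in candidates
                if t.is_crystal and t.identity >= 70.0 and t.resolution < 2.5]
    # Assumption: consider the best-resolved templates first.
    suitable.sort(key=lambda t: t.resolution)
    selected: List[Template] = []
    for t in suitable:
        if all(rmsd(t, s) > 1.5 for s in selected):
            selected.append(t)
    return selected
```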
The user can choose to retrieve variants whose wild-type residue is surface accessible or buried, by specifying the solvent-accessible surface area (SAS). The SAS is calculated using the MSMS program (Sanner, M.F., Olson, A.J. & Spehner, J.C. (1996). Biopolymers, 38:305-320.). A variant is considered surface accessible if the SAS of its wild-type residue is greater than 0.
The user can choose to retrieve variants whose wild-type residue is involved in a protein-protein interface.
A residue is considered to be involved in the interface if one of its atoms lies within a distance r of an atom of a residue in another protein chain. In the "carbon alpha" method, only the alpha carbon atom of each residue is considered and r is set to 6 Å. In the "Van der Waals" method, all atoms are taken into account and r is set to 4.5 Å.
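The interface definition can be sketched directly from the two cutoffs. The `Atom` record and the method keys are hypothetical, and treating "within a distance r" as "at most r" is an assumption. The all-pairs loop is quadratic and only meant to show the rule; a real implementation over full PDB chains would use a spatial grid or k-d tree.

```python
import math
from typing import List, NamedTuple, Set, Tuple


class Atom(NamedTuple):
    chain: str
    resnum: int
    name: str                       # PDB atom name, e.g. "CA"
    xyz: Tuple[float, float, float]


CUTOFFS = {"carbon_alpha": 6.0, "van_der_waals": 4.5}  # Å, from the text above


def interface_residues(atoms: List[Atom], method: str) -> Set[Tuple[str, int]]:
    """(chain, residue number) pairs with an atom within r of another chain's atom."""
    r = CUTOFFS[method]
    if method == "carbon_alpha":
        atoms = [a for a in atoms if a.name == "CA"]  # alpha carbons only
    found = set()
    for i, a in enumerate(atoms):
        for b in atoms[i + 1:]:
            if a.chain != b.chain and math.dist(a.xyz, b.xyz) <= r:
                found.add((a.chain, a.resnum))
                found.add((b.chain, b.resnum))
    return found
```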
Protein features in 3D neighbourhood
For the desired variants, the user can specify a feature that is close to the wild-type residue in the 3D structure. The distance radius between the wild-type residue and the feature can be chosen by the user between 3 and 6 Å. The mapping of the Swiss-Prot features onto 3D structures was performed using SSMap (David, F.P.A. and Yip, Y.L. (2008) BMC Bioinformatics, 9:391). Only variants that have been mapped onto an experimentally resolved 3D structure can be retrieved.
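Once features have been mapped onto a structure, the 3D neighbourhood query reduces to a distance test. The sketch assumes the mapped coordinates are already available (that part is what SSMap provides) and uses the minimum interatomic distance between the wild-type residue and the feature's residues; that distance definition and the function names are assumptions, while the 3 to 6 Å range comes from the text.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def min_distance(atoms_a: List[Coord], atoms_b: List[Coord]) -> float:
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def features_near_residue(wt_atoms: List[Coord],
                          feature_atoms: Dict[str, List[Coord]],
                          radius: float) -> List[str]:
    """Names of features with an atom within radius Å of the wild-type residue."""
    if not 3.0 <= radius <= 6.0:
        raise ValueError("radius must be between 3 and 6 Å")
    return sorted(name for name, coords in feature_atoms.items()
                  if coords and min_distance(wt_atoms, coords) <= radius)
```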
The downloadable table contains:
Accession: The Swiss-Prot accession number.
Entry name: The Swiss-Prot entry name.
Disease: The disease extracted from the Swiss-Prot disease comment line.
MeSH descriptor: The MeSH descriptor unique identifier (DescriptorUI).
Feature identifier: The Swiss-Prot sequence feature identifier (FTID) of the variant.
Variant: The name of the variant, according to the HGVS recommendations.
rsID: The dbSNP variant identifier.
PDB structure identifier: The PDB structure that contains the variant residue, chosen according to the structural definition of the variant residue environment.
PDB chain: The chain of the PDB structure which contains the variant residue.
PDB position: The position in the PDB chain of the variant residue.
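A downloaded table can be loaded with Python's standard `csv` module. That the file's header row uses exactly the column names listed above is an assumption, as are the helper names; the row in the test is a placeholder, not real SwissVar data.

```python
import csv
import io

# Column names as listed above; the exact header strings are an assumption.
COLUMNS = ["Accession", "Entry name", "Disease", "MeSH descriptor",
           "Feature identifier", "Variant", "rsID",
           "PDB structure identifier", "PDB chain", "PDB position"]


def parse_swissvar_table(text):
    """Parse tab-delimited text into dicts; empty cells become None."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Empty cells (e.g. no rsID or no PDB mapping) are normalised to None.
    return [{k: (v or None) for k, v in row.items()} for row in reader]


def variants_by_accession(rows):
    """Group feature identifiers (FTIDs) by Swiss-Prot accession."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["Accession"], set()).add(row["Feature identifier"])
    return grouped
```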
Protein not found
SwissVar gives access to human Swiss-Prot proteins with variant or disease annotation. A protein may not be found for several reasons:
- The protein has no variant or disease annotated in Swiss-Prot.
- The protein is not a human protein.
- The protein is in UniProtKB/TrEMBL and not in UniProtKB/Swiss-Prot.
OMIM not found
SwissVar contains only MIM numbers describing phenotypes (OMIM entries marked # or +).
MeSH descriptor not found
SwissVar contains only MeSH descriptors from the 'Diseases' and 'Psychiatry and Psychology' trees.
You can directly access the results in XML or tab-delimited format by using the URL 'http://swissvar.expasy.org/cgi-bin/swissvar/result' with the parameter 'format' set to xml, tab or html. Without any other parameter, all the proteins, diseases and variants are returned. You can also specify a value for the global_textfield parameter.
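A request URL can be built with the standard library so that the query text is encoded correctly. The sketch only builds the URL from the base address and the two parameters named above; the helper name is hypothetical, and it assumes that `global_textfield` takes the same text as the global query. Fetching the result (for instance with `urllib.request.urlopen`) is left out.

```python
from urllib.parse import urlencode

BASE_URL = "http://swissvar.expasy.org/cgi-bin/swissvar/result"


def swissvar_result_url(fmt="tab", query=None):
    """Build a result URL; without a query, everything is requested."""
    if fmt not in ("xml", "tab", "html"):
        raise ValueError("format must be xml, tab or html")
    params = {"format": fmt}
    if query is not None:
        params["global_textfield"] = query  # spaces are encoded as '+'
    return BASE_URL + "?" + urlencode(params)
```

For example, `swissvar_result_url("tab", "breast cancer")` ends with `format=tab&global_textfield=breast+cancer`.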