Motivation

Disulfide bonds stabilize protein structure and play important roles in protein function. Their formation requires an oxidizing environment, so their stability depends on the ambient redox potential, which may differ among subcellular compartments. Several methods are available to predict the bonding state of cysteines and their connectivity patterns. However, none of them takes protein subcellular localization into account.

DisLocate is a method based on Grammatical-Restrained Hidden Conditional Random Fields (GRHCRFs) [1] and Support Vector Regression (SVR) for the prediction of cysteine connectivity patterns in a protein chain. The method consists of two prediction steps:

  • Cysteine bonding-state prediction
  • Connectivity pattern prediction

GRHCRFs are employed in the first step to predict cysteine bonding states. The method takes advantage of the protein subcellular localization as predicted by the BaCelLo predictor [2]. This information, in combination with the Position Specific Scoring Matrix (PSSM), has been shown to be relevant for the prediction of cysteine bonding states [3].

For the second step, an SVR-based method is used to estimate bonding probabilities for pairs of cysteines. The estimate is based on three descriptors: PSSM, Cysteine Separation Distance (CSD) and Relative Order of Cysteines (ROC). The estimated probabilities are then used as edge weights of a fully connected graph whose nodes are cysteines and whose edges represent candidate disulfide bonds. The most probable connectivity pattern is finally predicted by solving a Maximum-Weight Perfect Matching problem on the graph. For more information see [3].
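The matching step can be illustrated with a minimal sketch. It is not the DisLocate implementation: the function names, the example positions and the pair weights are hypothetical, and exhaustive enumeration is swapped in for a dedicated matching algorithm, so it is only practical for a handful of cysteines. It also assumes that only cysteines predicted as bonded enter the graph, so their number is even.

```python
from typing import Dict, Iterator, List, Tuple

Pair = Tuple[int, int]


def perfect_matchings(nodes: List[int]) -> Iterator[List[Pair]]:
    """Yield every way of pairing up all nodes (expects an even count)."""
    if not nodes:
        yield []
        return
    first, rest = nodes[0], nodes[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in perfect_matchings(remaining):
            yield [(first, partner)] + tail


def best_pattern(positions: List[int], weights: Dict[Pair, float]) -> List[Pair]:
    """Return the perfect matching with the largest total edge weight.

    `weights` maps (i, j) with i < j to an estimated bonding probability.
    Assumption: these values stand in for the SVR output.
    """
    if len(positions) % 2 != 0:
        raise ValueError("a perfect matching needs an even number of cysteines")
    ordered = sorted(positions)
    return max(
        perfect_matchings(ordered),
        key=lambda matching: sum(weights[pair] for pair in matching),
    )


# Hypothetical cysteine positions and pair weights, for illustration only.
example_positions = [3, 10, 25, 40]
example_weights = {
    (3, 10): 0.2, (3, 25): 0.9, (3, 40): 0.1,
    (10, 25): 0.3, (10, 40): 0.8, (25, 40): 0.4,
}
example_pattern = best_pattern(example_positions, example_weights)
```

With these made-up weights, the three possible pairings have total weights 0.6, 1.7 and 0.4, so the selected pattern links positions 3-25 and 10-40.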

The DisLocate web server takes as input a single protein sequence in FASTA format and returns the predicted connectivity pattern. Only sequences containing at least two cysteines are accepted; sequences containing invalid characters, i.e. non-alphabetic characters or letters that do not denote amino acids, are rejected. Results are delivered as soon as possible and are kept on our servers for one week after they become available.
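The input rules above can be checked locally before submitting a sequence. The sketch below is only an approximation of the server-side checks: the function names are hypothetical, and the accepted alphabet is assumed to be the 20 standard one-letter amino-acid codes, which may be stricter or looser than what the server actually accepts.

```python
from typing import List

# Assumption: only the 20 standard one-letter amino-acid codes are valid.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def read_single_fasta(text: str) -> str:
    """Return the sequence from a FASTA string holding exactly one record."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("expected exactly one sequence")
    return "".join(lines[1:]).upper()


def cysteine_positions(seq: str) -> List[int]:
    """Validate the sequence and return 1-based positions of its cysteines."""
    invalid = sorted(set(seq) - STANDARD_AA)
    if invalid:
        raise ValueError(f"invalid characters: {''.join(invalid)}")
    positions = [i + 1 for i, aa in enumerate(seq) if aa == "C"]
    if len(positions) < 2:
        raise ValueError("at least two cysteines are required")
    return positions


# Made-up toy record, for illustration only.
demo_sequence = read_single_fasta(">demo\nMKCA\nCGTC\n")
demo_positions = cysteine_positions(demo_sequence)
```

The returned positions are the nodes that a connectivity predictor would then try to pair.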

Datasets

Materials used in [3] can be found here:

References