Hi all, I have found a lot of programs for predicting unstructured regions in proteins. I tried a few (DisEMBL, Disopred, MFDp), but I can't decide which one to use. Does anyone have suggestions regarding these prediction programs?
There are different programs available, and they often reach different results. The reason for this discrepancy is not only the mathematical approach but, more importantly, the definition of disorder. Some concepts treat short loops of 7 residues with high side-chain mobility as disordered (e.g. because they are not properly resolved in an X-ray structure), while others consider such a segment a 'loop within a globular domain' and focus on long stretches without any secondary structure. All of these definitions have their merits, and which one is best depends on your planned use of the data.
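To illustrate how much the definition matters, here is a minimal sketch that turns per-residue disorder scores into regions. It assumes scores between 0 and 1 with a cutoff of 0.5 (a common convention, but check what your predictor recommends), and a hypothetical `min_length` parameter that stands in for the "long stretches only" definition. The same toy profile gives two regions with no length filter and only one with a 30-residue minimum.

```python
from typing import List, Optional, Tuple


def disordered_regions(scores: List[float], threshold: float = 0.5,
                       min_length: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) pairs (0-based, end-exclusive) for runs of residues
    with score >= threshold that are at least min_length residues long."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, s in enumerate(scores):
        if s >= threshold:
            if start is None:
                start = i  # a disordered run begins here
        elif start is not None:
            if i - start >= min_length:  # keep the run only if long enough
                regions.append((start, i))
            start = None
    # close a run that extends to the C-terminus
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions


# Toy profile (made-up numbers): a 7-residue mobile loop and a 35-residue stretch
scores = [0.2] * 20 + [0.8] * 7 + [0.2] * 20 + [0.9] * 35 + [0.2] * 10
print(disordered_regions(scores))                 # short loops count as disorder
print(disordered_regions(scores, min_length=30))  # only long stretches count
```

The 30-residue cutoff is just an assumed example of a "long disorder" definition; pick whatever length matches your own working definition.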
My own interest in protein disorder is for the identification of short linear motifs that are often found in those regions. Among the ~10 different programs I evaluated, I got the best results from IUpred and Globplot (using the B-factor scale, not the default). When I did this analysis I was working for a company, so a few programs had to be excluded on license grounds. Again, which program works best for you depends on the underlying concept of disorder.
If you are really serious about this issue, I can give two recommendations:
Assemble a set of positive test cases (proteins that are known to have disordered regions by your working definition) and negative test cases (proteins that don't), and test a number of different programs (and different scales for programs that give you a choice). By comparing the outputs, you get a good impression of what the programs really score.
Run multiple predictors over the sequence and then calculate a consensus (e.g. by accepting only regions that are predicted as disordered by at least 3 out of 5 predictors). When doing that, make sure that you exclude programs/scales that use the 'wrong' definition of disorder.
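For the first recommendation (benchmarking against positive and negative test cases), one way to compare programs is to pool the residues of all test proteins and score each predictor per residue. This is a rough sketch under assumptions: your annotation and each program's output are already converted to True/False per residue, and sensitivity, specificity and the Matthews correlation coefficient (MCC) are the metrics you care about. The predictor outputs below are made up.

```python
from math import sqrt
from typing import Dict, List


def evaluate(truth: List[bool], pred: List[bool]) -> Dict[str, float]:
    """Per-residue comparison of predicted vs. annotated disorder."""
    if len(truth) != len(pred):
        raise ValueError("annotation and prediction must have the same length")
    tp = sum(t and p for t, p in zip(truth, pred))
    tn = sum(not t and not p for t, p in zip(truth, pred))
    fp = sum(not t and p for t, p in zip(truth, pred))
    fn = sum(t and not p for t, p in zip(truth, pred))
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sens, "specificity": spec, "mcc": mcc}


# Residues pooled from positive and negative test proteins (True = disordered
# according to your working definition); outputs of two hypothetical programs.
truth = [False] * 6 + [True] * 4
program_a = [False] * 5 + [True] * 5                # over-predicts by one residue
program_b = [False] * 6 + [True] * 2 + [False] * 2  # misses half the region

for name, pred in {"program_a": program_a, "program_b": program_b}.items():
    print(name, evaluate(truth, pred))
```

Scoring per residue is only one choice; if you mainly care whether a whole region is found, a per-region comparison may suit your definition better.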
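The consensus idea from the second recommendation can be sketched as a per-residue vote. This assumes each predictor's output has already been turned into True/False calls with that program's own cutoff. The predictor names and calls are hypothetical, and the `exclude` argument is one way to drop programs whose definition of disorder doesn't match yours.

```python
from typing import Dict, FrozenSet, List


def consensus(predictions: Dict[str, List[bool]], min_votes: int = 3,
              exclude: FrozenSet[str] = frozenset()) -> List[bool]:
    """Mark a residue as disordered if at least min_votes of the
    non-excluded predictors call it disordered."""
    used = {name: calls for name, calls in predictions.items()
            if name not in exclude}
    lengths = {len(calls) for calls in used.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one predictor, all covering the same sequence")
    n = lengths.pop()
    return [sum(calls[i] for calls in used.values()) >= min_votes
            for i in range(n)]


# Five hypothetical predictors on a 6-residue toy sequence ("1" = disordered)
raw = {"pred_a": "111000", "pred_b": "110000", "pred_c": "111100",
       "pred_d": "010001", "pred_e": "101000"}
predictions = {name: [c == "1" for c in s] for name, s in raw.items()}

print(consensus(predictions))                               # 3 out of 5
print(consensus(predictions, exclude=frozenset({"pred_e"})))  # 3 out of 4
```

Excluding a predictor changes the result at residue 3 here, which shows why it is worth deciding up front which programs match your definition.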