EVAcon: evaluation criteria

Version 2.0, September 2004

All servers are now evaluated in the contact prediction category: comparative modelling, ab initio, threading, meta and contact specialists.

The distance between two residues is calculated as the distance in Angstroms (Å) between their Cb atoms (Ca for Gly). For a given set of predicted contacts, five main parameters are calculated.

Contact evaluation. Two residues are defined to be in contact when the distance between their Cb atoms (Ca for Gly) is <= 8.0 Å.
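
A minimal Python sketch of this contact definition, for illustration only (it is not EVAcon's own code). It assumes each residue is given as a hypothetical (name, CA coordinates, CB coordinates) tuple and keeps every pair whose representative atoms are at most 8.0 Å apart:

```python
import math
from itertools import combinations

CONTACT_CUTOFF = 8.0  # Angstroms, per the contact definition above

def representative_atom(residue):
    """Return CB coordinates, or CA coordinates for glycine (which has no CB)."""
    name, ca, cb = residue
    return ca if name == "GLY" else cb

def contact_pairs(residues, cutoff=CONTACT_CUTOFF):
    """Return all (i, j) index pairs, i < j, whose CB-CB distance is <= cutoff."""
    atoms = [representative_atom(r) for r in residues]
    return {
        (i, j)
        for i, j in combinations(range(len(atoms)), 2)
        if math.dist(atoms[i], atoms[j]) <= cutoff
    }

# Hypothetical three-residue example (coordinates in Angstroms).
residues = [
    ("ALA", (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    ("GLY", (5.0, 0.0, 0.0), None),
    ("LEU", (20.0, 0.0, 0.0), (20.0, 0.0, 1.0)),
]
print(contact_pairs(residues))  # {(0, 1)}
```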

Accuracy (Acc). The ratio between the number of correctly predicted contacts and the total number of predicted contacts.

Acc= nt / n

Where nt is the number of correctly predicted contacts and n is the total number of predicted contacts.
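
As a small worked example, Acc can be computed as below. This Python sketch assumes contacts are stored as (i, j) index tuples with i < j; the function name is illustrative only:

```python
def accuracy(predicted, observed):
    """Acc = nt / n, with pairs stored as (i, j) tuples with i < j."""
    predicted = set(predicted)
    if not predicted:
        raise ValueError("accuracy is undefined for an empty prediction")
    nt = len(predicted & set(observed))  # correctly predicted contacts
    return nt / len(predicted)

# Hypothetical data: 2 of the 4 predicted pairs are real contacts.
predicted = [(1, 10), (2, 20), (3, 30), (4, 40)]
observed = [(1, 10), (3, 30), (5, 50)]
print(accuracy(predicted, observed))  # 0.5
```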

Improvement over random (Imp). The ratio between the accuracy of the prediction and the accuracy of a random prediction (i.e. predicting all the pairs in the protein as contacts).

Imp= Acc / (C/N)

Where N is the total number of residue pairs in the protein, excluding pairs that are close in the sequence (see below), and C is the number of observed contacts among those N pairs.
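
The sketch below (Python, illustrative names) counts N for a chain of a given length and then computes Imp. It assumes the sequence separation of a pair (i, j) is j - i, which is one reading of the separation rule described further on:

```python
def count_pairs(length, min_seqsep):
    """N: pairs (i, j), i < j, in a chain of `length` residues with j - i >= min_seqsep."""
    return sum(length - d for d in range(min_seqsep, length))

def improvement_over_random(acc, n_contacts, n_pairs):
    """Imp = Acc / (C / N); C / N is the accuracy of predicting every pair as a contact."""
    if n_contacts == 0:
        raise ValueError("Imp is undefined when there are no observed contacts")
    return acc / (n_contacts / n_pairs)

n_pairs = count_pairs(100, 24)  # hypothetical 100-residue target, seqsep >= 24
print(n_pairs)  # 2926
print(improvement_over_random(0.5, 50, 1000))  # ~10: ten times better than random
```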

Coverage (Cov). The ratio between the number of correctly predicted contacts and the number of contacts observed in the experimental structure.

Cov= nt / C
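
A matching Python sketch for coverage, again assuming (i, j) tuples with i < j and using a hypothetical function name. Where accuracy divides by the number of predicted pairs, coverage divides by the number of observed contacts:

```python
def coverage(predicted, observed):
    """Cov = nt / C: fraction of the observed contacts that were predicted."""
    observed = set(observed)
    if not observed:
        raise ValueError("coverage is undefined when there are no observed contacts")
    nt = len(set(predicted) & observed)  # correctly predicted contacts
    return nt / len(observed)

# Hypothetical data: 2 of the 3 observed contacts were predicted.
predicted = [(1, 10), (2, 20), (3, 30), (4, 40)]
observed = [(1, 10), (3, 30), (5, 50)]
print(coverage(predicted, observed))  # 0.6666666666666666
```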

Distance distribution of the predicted contacts (Xd). The weighted harmonic average of the difference between the distance distribution of the predicted contacts and that of all pairs.

Xd= SUM_{i=1..15} (Pip - Pia) / (di * 15)

Where the sum runs over all the distance bins. There are 15 distance bins covering the range from 0 to 60 Å. di is the distance representing bin i, taken as its upper limit and normalised by 60 Å. Pip is the percentage of predicted pairs whose distance falls in bin i, and Pia is the same percentage for all the pairs. Defined in this way, Xd > 0 indicates the positive cases, in which the distances of the predicted contacts are shifted towards shorter values than those of all pairs (see J. Mol. Biol. (1997), 271:511-523).
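
A Python sketch of Xd under stated assumptions: the 15 bins are taken to be of equal width (4 Å), bin k is taken to cover ((k-1)*4, k*4] Å, and distances above 60 Å are counted in the last bin. None of these details is specified above, and the function names are hypothetical:

```python
import math

N_BINS = 15
MAX_DIST = 60.0  # Angstroms

def bin_percentages(distances, n_bins=N_BINS, max_dist=MAX_DIST):
    """Percentage of distances in each bin; bin k covers ((k-1)*w, k*w], w = max_dist / n_bins."""
    if not distances:
        raise ValueError("need at least one distance")
    width = max_dist / n_bins
    counts = [0] * n_bins
    for d in distances:
        # Assumption: distances above max_dist are counted in the last bin.
        k = min(max(math.ceil(d / width) - 1, 0), n_bins - 1)
        counts[k] += 1
    return [100.0 * c / len(distances) for c in counts]

def xd(pred_distances, all_distances, n_bins=N_BINS, max_dist=MAX_DIST):
    """Xd = SUM_i (Pip - Pia) / (di * n_bins), with di = bin upper limit / max_dist."""
    p_pred = bin_percentages(pred_distances, n_bins, max_dist)
    p_all = bin_percentages(all_distances, n_bins, max_dist)
    width = max_dist / n_bins
    total = 0.0
    for k in range(n_bins):
        d_k = (k + 1) * width / max_dist  # normalised upper limit of bin k+1
        total += (p_pred[k] - p_all[k]) / (d_k * n_bins)
    return total

# Hypothetical distances (Angstroms) for the predicted pairs and for all pairs.
pred = [3.0, 3.5]
every = [3.0, 3.5, 30.0, 50.0]
print(xd(pred, every))  # positive: the predicted pairs are shifted to short distances
```

With equal 4 Å bins, di * 15 equals the bin index i, so the sum reduces to SUM (Pip - Pia) / i; the short-distance bins therefore carry the largest weight.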

For the calculation of these parameters, both the predicted pairs of residues and all the pairs in the protein are grouped into three sets according to the sequence separation of the two residues in a pair, i.e. the number of residues between them along the linear sequence of the protein: seqsep>=6, seqsep>=12 and seqsep>=24. These sets are nested (every pair with seqsep>=24 also belongs to the other two). Acc, Imp, Cov and Xd are evaluated for each of the three sets.
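
A Python sketch of this grouping. It assumes seqsep is computed as |i - j|; if seqsep is read strictly as the number of intervening residues, subtract one. The helper names are hypothetical:

```python
SEQSEP_THRESHOLDS = (6, 12, 24)

def seqsep(pair):
    """Sequence separation of a residue pair (assumed here to be |i - j|)."""
    i, j = pair
    return abs(i - j)

def split_by_seqsep(pairs, thresholds=SEQSEP_THRESHOLDS):
    """Map each threshold t to the pairs with seqsep >= t (the sets are nested)."""
    return {t: [p for p in pairs if seqsep(p) >= t] for t in thresholds}

pairs = [(1, 5), (1, 10), (1, 20), (1, 40)]
print(split_by_seqsep(pairs))
# {6: [(1, 10), (1, 20), (1, 40)], 12: [(1, 20), (1, 40)], 24: [(1, 40)]}
```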

Delta evaluation. Percentage of predicted contacts that lie within |delta| residues, measured along the sequence, of an experimental contact (see Ortiz et al., 1999, Proteins 3:177-185). This is done for |delta| = 0, 1, 2, 3, 4 and 5. A predicted contact between residues i and j is thus considered correct if there is any real contact between a residue in the range [i-delta, i+delta] and a residue in the range [j-delta, j+delta]. An evaluation with delta = 0 is therefore equivalent to the standard contact evaluation. Acc, Imp and Cov are evaluated here as well.
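
A Python sketch of the delta criterion for the percentage of correct predictions. The function names are hypothetical, and contacts are assumed to be (i, j) index tuples; both orientations of an observed pair are checked:

```python
def correct_within_delta(pair, observed, delta):
    """True if an observed contact joins [i-delta, i+delta] and [j-delta, j+delta]."""
    i, j = pair
    return any(
        (a, b) in observed or (b, a) in observed
        for a in range(i - delta, i + delta + 1)
        for b in range(j - delta, j + delta + 1)
    )

def delta_percentage(predicted, observed, delta):
    """Percentage of predicted contacts that are correct within |delta| residues."""
    observed = set(observed)
    predicted = list(predicted)
    if not predicted:
        raise ValueError("no predicted contacts")
    hits = sum(correct_within_delta(p, observed, abs(delta)) for p in predicted)
    return 100.0 * hits / len(predicted)

# Hypothetical data: one real contact and three predictions near it.
observed = [(10, 30)]
predicted = [(10, 30), (11, 31), (13, 30)]
for delta in range(6):
    print(delta, round(delta_percentage(predicted, observed, delta), 1))
```

With delta = 0 only the exact pair counts, as in the standard evaluation; larger windows progressively accept the shifted predictions.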

Predictors can either submit a set of residue pairs predicted to be in contact, or send all the pairs in the protein with a score associated with each pair (see file format). In the second case, the list is sorted by score and evaluations are made on different numbers of top-ranked pairs, chosen as a function of the protein length L: the first 2L, L, L/2, L/5 and L/10 pairs. All these calculations are performed for the three subsets of pairs explained above (seqsep>=6, 12 and 24). Targets for contact prediction are split into different sets according to their sequence length. The main evaluation criterion is the set with the largest sequence separation (seqsep>=24).
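
A Python sketch of the top-ranked selection for a scored submission. It assumes the list has already been restricted to one seqsep subset, that fractional counts such as L/5 are rounded down, and that ties keep their submission order (Python's sort is stable); all names are hypothetical:

```python
def top_ranked_pairs(scored_pairs, length):
    """Sort (i, j, score) triples by score and keep the top 2L, L, L/2, L/5 and L/10 pairs."""
    ranked = sorted(scored_pairs, key=lambda t: t[2], reverse=True)
    cutoffs = {
        "2L": 2 * length,
        "L": length,
        "L/2": length // 2,  # assumption: fractional counts are rounded down
        "L/5": length // 5,
        "L/10": length // 10,
    }
    return {label: [(i, j) for i, j, _ in ranked[:n]] for label, n in cutoffs.items()}

# Hypothetical scored list for a 20-residue target (higher score = more confident).
scored = [(k, k + 6, 1.0 / (k + 1)) for k in range(14)]
selection = top_ranked_pairs(scored, length=20)
print(selection["L/10"])  # [(0, 6), (1, 7)]
```

If fewer pairs are submitted than a cutoff asks for (here 14 pairs against 2L = 40), the sketch simply returns all of them.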