Eva-CM, a web server for CONTINUOUS and AUTOMATED evaluation of protein
structure prediction servers, works by continually taking new structures
from the Protein Data Bank (PDB), submitting them to various participating
modeling servers, accepting the returned predictions, evaluating the
predictions, and displaying the results on the web.
CRITERIA:
The intent for EVA is to have a small number of simple criteria, arranged
in a hierarchical manner from coarse to fine, measuring the main aspects
of comparative modeling. These aspects include fold recognition, alignment,
overall accuracy, core accuracy, loop accuracy, and sidechain accuracy.
A second aim is for the distributions of these criteria, as determined by
EVA, to serve as predictors of the accuracy of a comparative model when
the actual structure is not known.
1) Does the model have the correct fold?
The model has the correct fold when a structure-based alignment of the
model with the actual target structure aligns at least 30% of the residues.
The alignment is defined by the least-squares superposition that maximizes
the subset of the CE (Shindyalov, I.N., Bourne, P.E. 1998.
Protein Eng. 11 pp739-47.) equivalences corresponding to pairs of identical
Calpha atoms within 3.5A of each other. This is obtained by the MODELLER
"SUPERPOSE REFINE_FIT = on" command applied to the CE model-target alignment,
with the Calpha-Calpha distance cutoff of 3.5A, and removing the equivalences
between non-identical residues from the refined list. Two residues are
identical when they are the same residue in the sequence.
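As an illustration only, the fold test in (1) can be sketched in a few lines
of Python. This is not the EVA or MODELLER code: it assumes the model and
target Calpha coordinates have already been superposed, that the structure
alignment is supplied as a list of residue-index pairs, and that the 30%
is taken over the number of target residues. The names aligned_fraction and
has_correct_fold are hypothetical.

```python
import math

CUTOFF = 3.5         # Calpha-Calpha distance cutoff in Angstroms (from the text)
MIN_FRACTION = 0.30  # minimum fraction of aligned residues (from the text)


def aligned_fraction(model_ca, target_ca, equivalences, n_residues, cutoff=CUTOFF):
    """Fraction of residues that are identical and within the cutoff.

    model_ca, target_ca: dict mapping residue index -> (x, y, z), assumed
        to be already superposed.
    equivalences: list of (model_index, target_index) pairs, standing in
        for the structure-based alignment.
    n_residues: denominator; assumed here to be the number of target residues.
    """
    count = 0
    for i, j in equivalences:
        if i != j:
            # Equivalences between non-identical residues are removed.
            continue
        if math.dist(model_ca[i], target_ca[j]) <= cutoff:
            count += 1
    return count / n_residues


def has_correct_fold(model_ca, target_ca, equivalences, n_residues):
    """Apply the 30% threshold to the aligned fraction."""
    return aligned_fraction(model_ca, target_ca, equivalences, n_residues) >= MIN_FRACTION
```

The same fraction is what criterion (4) reports as the alignment accuracy.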
2) If a model does not have the correct fold, did a modeling server
fail because there is no related fold or because it did not find the correct
fold?
In other words, is the modeled sequence related structurally to any
of the known protein structures? The answer is obtained by using CE to
determine if the actual target structure has at least 30% of its residues
aligned within 3.5A with any of the known structures. This
additional fold assignment qualification penalizes the modeling server
that returns to a user a model for a sequence unrelated to any of the known
structures.
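The decision described in (1) and (2) can be summarized with a small
sketch. It is only one reading of the text: the function name classify_case
and the three labels are hypothetical, and both input fractions are assumed
to have been computed beforehand (the model against the target, and the
target against its best match among the known structures).

```python
FOLD_THRESHOLD = 0.30  # 30% of residues aligned within 3.5 A (from the text)


def classify_case(model_fraction, best_known_fraction, threshold=FOLD_THRESHOLD):
    """Classify one prediction according to criteria (1) and (2).

    model_fraction: fraction of residues aligned between model and target.
    best_known_fraction: largest fraction of target residues aligned with
        any known structure.
    """
    if model_fraction >= threshold:
        return "correct fold"
    if best_known_fraction >= threshold:
        # A related known structure existed, but the server did not find it.
        return "related fold missed"
    # No known structure is related to the target.
    return "no related fold"
```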
3) Difficulty of the modeling case.
The difficulty level is obtained by comparing
the modeled sequence with the sequence of the structurally most similar
known structure. The most similar known structure is defined as the structure
that has the largest number of residues aligned with the target structure,
as defined in point 2. Once the closest template is found by a CE search,
the difficulty level is based on a comparison of the pairwise alignment
derived from the structure alignment with the pairwise alignment of the
two sequences produced by the ALIGN command in MODELLER.
4) Alignment accuracy.
Alignment accuracy is the fraction of aligned residues defined in (1).
Note that this does not measure the accuracy of the input alignment, but
the fraction of the residues that are modeled "correctly", some of them
possibly because loop modeling worked well.
5) Overall accuracy.
The RMSD and DRMS errors are calculated separately for all pairs of
identical Calpha, main-chain, side-chain, and all atoms from the model and
the target. Each RMSD is calculated after a least-squares superposition of
the atoms being compared, using the MODELLER SUPERPOSE command. Four RMSDs
are thus reported, one for each of these atom sets.
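For readers unfamiliar with the two measures, here is a minimal sketch. It
assumes the usual definitions: RMSD as the root-mean-square distance
between equivalent atoms, and DRMS as the root-mean-square difference
between corresponding intramolecular distances. It does not perform the
least-squares superposition (the rmsd function expects coordinates that
are already superposed) and it is not the MODELLER implementation.

```python
import math
from itertools import combinations


def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) points.

    The points are assumed to be already superposed; no fitting is done.
    """
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


def drms(a, b):
    """Distance RMS: compares all intramolecular distances of a and b.

    Needs no superposition because only internal distances are used.
    """
    pairs = list(combinations(range(len(a)), 2))
    total = sum(
        (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2 for i, j in pairs
    )
    return math.sqrt(total / len(pairs))
```

Because DRMS uses only internal distances, it is unchanged when one
structure is translated or rotated, whereas the RMSD of unfitted
coordinates is not.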
6) Accuracy of the correctly aligned positions.
The RMSD and DRMS errors are calculated separately for all pairs of
aligned Calpha, mainchain, sidechain, and all atoms from the model and
the target. The aligned residues are defined in (1), thus the same comment
as in (4) also applies here. Each RMSD is calculated after a least-squares
superposition of the atoms that are compared.
7) Accuracy of the protein core.
The core region is defined by the target residues that are buried
or in secondary structure according to MODELLER (WRITE_DATA command). A
residue is buried when less than 20% of its surface area is exposed. The
RMSD and DRMS are calculated for core Calpha atoms, mainchain atoms, sidechain
atoms, and all atoms. Rotamer classes are also compared (chi1, chi2, chi3,
chi4, as well as chi1+2). The MODELLER COMPARE command is used. The rotamer
results are reported separately for the 20 residue types.
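The core definition in (7) amounts to a simple filter, sketched below. The
input format is an assumption made for the example: each residue is a dict
with a hypothetical "exposure_pct" field (percent of surface area exposed)
and an "ss" field, where "H" and "E" are assumed to mark helix and strand
and anything else counts as not being in secondary structure. In EVA these
values come from MODELLER, which is not called here.

```python
BURIED_CUTOFF = 20.0  # percent exposed surface area (from the text)


def core_residues(residues, buried_cutoff=BURIED_CUTOFF, ss_codes=("H", "E")):
    """Return the ids of residues that are buried or in secondary structure."""
    core = []
    for res in residues:
        buried = res["exposure_pct"] < buried_cutoff
        in_secondary_structure = res["ss"] in ss_codes
        if buried or in_secondary_structure:
            core.append(res["id"])
    return core
```

Residues not returned by this filter would form the non-core regions
treated in (8).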
8) Accuracy of the non-core regions.
Each non-core region, or "loop", is treated separately. The local and
global RMSDs are calculated for their Calpha, main-chain, side-chain and
all atoms in the model and the target. The global superposition is defined
as the least-squares superposition of the Calpha atoms aligned as defined
in (1).
10) Stereochemical quality of the model.
Several stereochemical criteria are also tabulated, including the Procheck
criteria for Phi/Psi values and H-bonds, as well as the stereochemical
g-factor.
11) Contribution of a model.
All returned models are assessed and they contribute equally to the
server average.
The Eva Team