
SRSWWW Exercises
 

A. Simple Queries

  1. Search SWISS-PROT for proteins involved in cancer. First search in the Description field, then try other fields such as Keywords.
  2. Search for the SWISS-PROT entry with accession number P19558.
  3. Search for all authors with surname "Smith" in SWISS-PROT.
  4. Search for all SWISS-PROT proteins involved in cancer which were published by someone named "Smith". Do this search first as a combination of searches in the "Query Manager" and again from the query form.
  5. How many non-EST sequences exist in EMBL? ( Use the 'Division' index)
  6. How many human sequences exist in EMBL and SWISS-PROT? What is the problem with just using 'human' in the query form with the standard settings?
  7. Find all sequences in EMBL which were released this year. ( Select the 'Date' field and use the dd-mmm-yy(yy) format)
  8. Find all sequences in both SWISS-PROT and SWISSNEW that were created between 1 January 1996 and 1 July 1996.
  9. Find the EMBL sequences that were published in Biochemistry, Vol. 21, pages 1453 to 1463 in 1982. ( Use the field information page to get further help)
  10. Search all dihydrofolate reductases in SWISS-PROT.
  11. Search all dihydrofolate reductases in SWISS-PROT with sequences of length between 500 and 700.
  12. Search 'kinase' in the 'Description' index of SWISS-PROT. Why are some of the found entries not kinases? Find a word that, when found together with 'kinase', strongly indicates that the protein is not a kinase at all. ( Select the 'Description' field to be displayed in the entry list)
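
Hint for A.7 and A.8: the sketch below only illustrates how the dd-mmm-yy(yy) date format and an inclusive date range can be read. It is plain Python, not SRS itself. The entry names and dates are invented, and treating a two-digit year as 19yy is an assumption made for this example.

```python
from datetime import date

# Month abbreviations as used in the dd-mmm-yy(yy) format.
MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def parse_date(text):
    """Parse 'dd-mmm-yy' or 'dd-mmm-yyyy' into a datetime.date."""
    day, month, year = text.strip().split("-")
    year_number = int(year)
    if len(year) == 2:
        year_number += 1900  # assumption: two-digit years mean 19yy
    return date(year_number, MONTHS[month.lower()], int(day))


def created_between(entries, start, end):
    """Return the names of entries whose date lies in [start, end]."""
    first, last = parse_date(start), parse_date(end)
    return [name for name, created in entries
            if first <= parse_date(created) <= last]


# Invented (name, creation date) pairs, for illustration only.
toy_entries = [
    ("ENTRY_A", "15-Dec-1995"),
    ("ENTRY_B", "01-Jan-1996"),
    ("ENTRY_C", "30-Jun-96"),
    ("ENTRY_D", "02-Jul-1996"),
]

print(created_between(toy_entries, "01-Jan-1996", "01-Jul-1996"))
```

Both end points are included in the range here; check the field information page to see how SRS itself treats range boundaries.
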
B. Browsing Indices
  1. How many spellings exist in the 'Keywords' index of EMBL for the name(s) of the ribulose bisphosphate carboxylase? ( Use the wildcard '*' liberally, anywhere in the search word)
  2. Find out if the 'Description' and 'Keywords' indices of SWISS-PROT contain any words with spaces. ( Use a search expression with a wildcard ('*') at the end and the beginning)
  3. What is the shortest author name in SWISS-PROT? ( Use wildcards '?')
  4. Search 'homeobox*' in 'AllText' of all sequence databanks. How many indices are searched altogether? Is the query 'homeobox*' suitable to find all proteins containing a homeobox?
  5. Use a regular expression search to find all words consisting of 'nif' and another character ( Don't forget to put the regular expression within '/'s)
  6. How many entries with 5-digit sequence length exist in EMBL? Why is the index browser not very useful for searching integers?
  7. For which species exist at least 1000 entries in SWISS-PROT? ( Use the fact that organism names contain a space which higher level taxa mostly don't have)
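
Hint for section B: the sketch below shows the general meaning of the wildcards '*' (any run of characters) and '?' (exactly one character), and of a regular expression such as 'nif.' applied to whole words. It uses Python's standard fnmatch and re modules on an invented word list; the exact matching rules of the SRS index browser may differ, so treat it as an illustration only.

```python
import re
from fnmatch import fnmatchcase

# Invented index words, for illustration only.
index_words = ["homeobox", "homeobox-like", "nif", "nifa", "nifh",
               "nifen", "li", "smith"]


def wildcard_search(pattern, words):
    """Return the words matching a pattern with '*' and '?' wildcards."""
    return [word for word in words if fnmatchcase(word, pattern)]


def regex_search(pattern, words):
    """Return the words matched in full by a regular expression."""
    compiled = re.compile(pattern)
    return [word for word in words if compiled.fullmatch(word)]


print(wildcard_search("homeobox*", index_words))  # any continuation
print(wildcard_search("??", index_words))         # exactly two characters
print(regex_search("nif.", index_words))          # 'nif' plus one character
```

Note that '*' also matches the empty string, so 'homeobox*' finds 'homeobox' itself, while each '?' always consumes exactly one character.
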
C. Subentry Queries
  1. How many SWISS-PROT sequences exist with transmembrane regions? ( Use the 'transmem' feature key)
  2. How many transmembrane regions exist in SWISS-PROT?
  3. How many transmembrane regions in SWISS-PROT are shorter than 10 amino acids or longer than 50?
  4. How many pseudogenes annotated as CDS (CoDing Sequence) exist in EMBL?
  5. Retrieve the set of all human transmembrane segments and save them to your directory using the view "FastaSeqs".
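
Hint for C.1 to C.3: these exercises turn on the difference between counting entries and counting subentries (features). The sketch below makes that difference concrete on invented data; the entry names and feature positions are hypothetical, and computing a feature's length as end - start + 1 is an assumption of this example.

```python
# Invented entries, each with a list of (feature key, start, end).
toy_entries = {
    "PROT_A": [("TRANSMEM", 10, 30), ("TRANSMEM", 45, 52)],
    "PROT_B": [("DOMAIN", 1, 100)],
    "PROT_C": [("TRANSMEM", 5, 70)],
}


def transmem_features(entries):
    """Collect every TRANSMEM feature as (entry name, start, end)."""
    return [(name, start, end)
            for name, features in entries.items()
            for key, start, end in features
            if key == "TRANSMEM"]


def feature_length(feature):
    """Length in residues, counting both end positions."""
    _, start, end = feature
    return end - start + 1


features = transmem_features(toy_entries)

# Number of subentries (features) versus number of parent entries.
parent_entries = {name for name, _, _ in features}

# Features shorter than 10 or longer than 50 residues.
unusual = [f for f in features if not 10 <= feature_length(f) <= 50]

print(len(features), len(parent_entries), len(unusual))
```

The same entry can contribute several features, so the feature count is usually larger than the entry count.
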
D. Using Views
  1. Create a view for EMBL in the query form that displays the ID and Description line and the sequence in PIR format, then perform a new search for all mouse sequences.
  2. Create another view for EMBL in the query form that displays the ID, AccNumber, Description and DBOrigin in a table and again search for all mouse sequences.
  3. Search in the Description field of SWISS-PROT for all "uroplakin" sequences and compare the hydrophilicity plots. ( Use the "Rao and Argos transmembrane index" in the 'proteinMap' view)
E. Performing Links
  1. Which fraction of EMBLNEW are updates of EMBL entries?
  2. Create a set of all human sequences in EMBL and EMBLNEW but remove the entries in EMBL that have been updated in EMBLNEW.
  3. What fraction of SWISSPROT entries is linked to PROSITE?
  4. How many PROSITE entries are NOT linked to SWISSPROT? Why?
  5. How many SWISSPROT entries are NOT linked to EMBL? How many of them are fragments?
  6. For how many unique reactions catalysed by an enzyme do we have its tertiary structure in PDB? ( Enzyme reactions are described in the databank ENZYME)
  7. How many PDB entries exist with a resolution of at least 2 Ångström and that have calcium binding sites? ( Search SWISS-PROT to find entries with calcium binding sites and PDBFINDER to search for entries with the correct resolution) ( At the moment subentries cannot be linked to another databank but they can be converted to the parent databank using a link to "parent")
  8. Search the dihydrofolate reductase family in PROSITE and link it to SWISSPROT. If you compare with the set from A.10, are the two sets the same?
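
Hint for E.1 and E.2: a link between two databanks can be thought of as a set operation on entry identifiers. The sketch below shows this on invented accession numbers. It assumes that an EMBLNEW entry counts as an update of an EMBL entry when both share the same accession number; this is an assumption of the example, not a description of how SRS builds its links.

```python
# Invented accession numbers, for illustration only.
embl = {"X001", "X002", "X003", "X004"}
emblnew = {"X002", "X004", "Y100"}

# EMBLNEW entries that update an existing EMBL entry (assumed to be
# those whose accession number also occurs in EMBL).
updates = emblnew & embl
fraction_updates = len(updates) / len(emblnew)

# Combined set: everything in EMBLNEW, plus the EMBL entries that
# have not been superseded by an update.
combined = ({("EMBL", acc) for acc in embl - emblnew}
            | {("EMBLNEW", acc) for acc in emblnew})

print(sorted(updates), round(fraction_updates, 2), len(combined))
```

Entries that are NOT linked, as asked in E.4 and E.5, correspond to a set difference in the same way.
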

Rodrigo López Serrano (European Bioinformatics Institute, Cambridge).