Search SWISS-PROT for proteins involved in cancer. First search in the
Description field then try other fields such as Keywords.
Search for the SWISS-PROT entry with accession number P19558.
Search for all authors with surname "Smith" in SWISS-PROT.
Search for all SWISS-PROT proteins involved in cancer which were published
by someone named "Smith". Do this search first as a combination of searches
in the "Query Manager" and again from the query form.
How many non-EST sequences exist in EMBL? (Use the 'Division' index.)
How many human sequences exist in EMBL and SWISS-PROT? What is the problem
with just using 'human' in the query form with the standard settings?
Find all sequences in EMBL which were released this year. (Select the 'Date'
field and use dd-mmm-yy(yy): format.)
Find all sequences in both SWISS-PROT and SWISSNEW that were created between
1 January 1996 and 1 July 1996.
Find the EMBL sequences that were published in Biochemistry, Vol. 21, pages
1453 to 1463 in 1982. (Use the field information page to get further help.)
Search for all dihydrofolate reductases in SWISS-PROT.
Search for all dihydrofolate reductases in SWISS-PROT with sequences of
length between 500 and 700.
Search for 'kinase' in the 'Description' index of SWISS-PROT. Why are some
of the found entries not kinases? Find a word that, when found together
with 'kinase', strongly indicates that the protein is not a kinase at all.
(Select the 'Description' field to be displayed in the entry list.)
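The date exercises above depend on writing dates in the dd-mmm-yy(yy) form.
As a minimal sketch, and not part of SRS itself, the following Python helper
formats a date that way. The helper name `srs_date`, the zero-padded day and
the capitalised English month abbreviation are assumptions; check the field
information page for what the 'Date' field actually accepts.

```python
from datetime import date

# English month abbreviations, spelled out to avoid locale-dependent output.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def srs_date(d: date) -> str:
    """Format a date as dd-mmm-yyyy (hypothetical helper, not an SRS function)."""
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year:04d}"


# Boundaries of the creation-date exercise (1 January 1996 to 1 July 1996).
start = srs_date(date(1996, 1, 1))
end = srs_date(date(1996, 7, 1))
print(start, end)
```

The two strings can then be typed into the 'Date' field as the lower and
upper limits of the range.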
B. Browsing Indices
How many spellings exist in the 'Keywords' index of EMBL for the name(s)
of the ribulose bisphosphate carboxylase? (Use the wildcard '*' liberally,
anywhere in the search word.)
Find out if the 'Description' and 'Keywords' indices of SWISS-PROT contain
any words with spaces. (Use a search expression with a wildcard ('*')
at the end and the beginning.)
What is the shortest author name in SWISS-PROT? (Use wildcards '?'.)
Search for 'homeobox*' in 'AllText' of all sequence databanks. How many
indices are searched altogether? Is the query 'homeobox*' suitable to find
all proteins containing a homeobox?
Use a regular expression search to find all words consisting of 'nif' and
another character. (Don't forget to put the regular expression within '/'s.)
How many entries with a 5-digit sequence length exist in EMBL? Why is the
index browser not very useful for searching integers?
For which species do at least 1000 entries exist in SWISS-PROT? (Use the
fact that organism names contain a space, which higher-level taxa mostly
do not.)
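The difference between the '?' wildcard, the '*' wildcard and a regular
expression can be tried out away from the index browser. The sketch below
uses Python's `fnmatch` and `re` modules as stand-ins; it is not the SRS
matcher, and SRS may treat case or anchoring differently. The word list is
invented for illustration.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical index words, invented for illustration only.
words = ["nif", "nifa", "nifh", "nifhd", "nifk", "unif"]

# '?' stands for exactly one character.
one_more = [w for w in words if fnmatchcase(w, "nif?")]

# '*' stands for any run of characters, including none.
any_tail = [w for w in words if fnmatchcase(w, "nif*")]
anywhere = [w for w in words if fnmatchcase(w, "*nif*")]

# Regular expression: 'nif' followed by exactly one character,
# matched against the whole word.
regex_hits = [w for w in words if re.fullmatch(r"nif.", w)]

print(one_more)
print(any_tail)
print(anywhere)
print(regex_hits)
```

Note that a trailing '*' also matches the bare word, which is why the
shortest-name exercise is better served by a fixed number of '?' characters.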
C. Subentry Queries
How many SWISS-PROT sequences exist with transmembrane regions? (Use the
'transmem' feature key.)
How many transmembrane regions exist in SWISS-PROT?
How many transmembrane regions in SWISS-PROT are shorter than 10 amino
acids or longer than 50?
How many pseudogenes annotated as CDS (CoDing Sequence) exist in EMBL?
Retrieve the set of all human transmembrane segments and save them to your
directory using the view "FastaSeqs".
D. Using Views
Create a view for EMBL in the query form that displays the ID and Description
lines and the sequence in PIR format, then perform a new search for all mouse
sequences.
Create another view for EMBL in the query form that displays the ID, AccNumber,
Description and DBOrigin in a table and again search for all mouse sequences.
Search in the Description field of SWISS-PROT for all "uroplakin" sequences
and compare the hydrophilicity plots. (Use the "Rao and Argos transmembrane
index" in the 'proteinMap' view.)
E. Performing Links
What fraction of EMBLNEW entries are updates of EMBL entries?
Create a set of all human sequences in EMBL and EMBLNEW but remove the
entries in EMBL that have been updated in EMBLNEW.
What fraction of SWISSPROT entries is linked to PROSITE?
How many PROSITE entries are NOT linked to SWISSPROT? Why?
How many SWISSPROT entries are NOT linked to EMBL? How many of them are
fragments?
For how many unique reactions catalysed by an enzyme do we have its tertiary
structure in PDB? (Enzyme reactions are described in the databank ENZYME.)
How many PDB entries exist with a resolution of at least 2 Angstrom and
that have calcium binding sites? (Search SWISS-PROT to find entries with
calcium binding sites and PDBFINDER to search for entries with the correct
resolution. At the moment subentries cannot be linked to another databank,
but they can be converted to the parent databank using a link to "parent".)
Search the dihydrofolate reductase family in PROSITE and link it to SWISSPROT.
If you compare with the set from A.3, are the two sets the same?
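The link exercises in this section come down to set logic: intersecting two
sets, subtracting one from another, and taking a ratio. The sketch below
shows only that logic in Python. It does not use SRS, the identifiers are
invented, and matching an EMBLNEW update to its EMBL entry by a shared
identifier is an assumption made for the example.

```python
# Hypothetical entry identifiers, invented for illustration only.
embl_ids = {"HS001", "HS002", "HS003", "HS004"}
emblnew_ids = {"HS003", "HS004", "HS900"}

# EMBLNEW entries that replace an existing EMBL entry.
updates = emblnew_ids & embl_ids
fraction_updates = len(updates) / len(emblnew_ids)

# Combined set: every EMBLNEW entry, plus the EMBL entries that
# have not been superseded by an update.
kept_from_embl = embl_ids - emblnew_ids
combined = ([("EMBL", i) for i in sorted(kept_from_embl)]
            + [("EMBLNEW", i) for i in sorted(emblnew_ids)])

print(sorted(updates), fraction_updates)
print(combined)
```

The same pattern applies to the "NOT linked" questions: subtract the linked
subset from the full databank and count what remains.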
Rodrigo López Serrano (European Bioinformatics Institute,
Cambridge).