Methods for Database Searches
 Sequence-based Methods for Functional Assignment


              


Pairwise and Multiple Sequence Alignments and Similarity Searches


Introduction
Alignments are fundamental to most bioinformatics methods. Good alignments can be used to infer the common ancestry of a set of sequences, to make predictions about function and structure, and to identify conserved regions that may be of structural or functional importance.
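As a small illustration of how an alignment exposes conserved regions, the sketch below reports the columns of a multiple alignment in which every sequence carries the same residue. The function name conserved_columns and the toy alignment are invented for this example, and real tools use softer conservation measures than strict identity.

```python
def conserved_columns(alignment):
    """Return 0-based indices of columns where all sequences share one residue.

    Columns containing a gap character ('-') are never reported as conserved.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i in range(length):
        column = {seq[i] for seq in alignment}
        if len(column) == 1 and "-" not in column:
            conserved.append(i)
    return conserved


# Toy alignment, invented for illustration only.
toy_alignment = [
    "MKT-AYIAKQR",
    "MKTSAYLAKQR",
    "MRT-AYIGKQR",
]
print(conserved_columns(toy_alignment))
```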

In this part of the practical we will encounter some basic alignment techniques using server-based sequence database search methods and multiple sequence alignment tools.



Tools and Databases





Patterns and Profiles, Protein Motifs and Domains


Introduction

 
In general, protein sequences can be grouped into clusters. It is possible to identify groups of sequences that are maximally similar to one another and minimally similar to the sequences in other clusters. These clusters of evolutionarily related protein sequences are called families.

The sequence information from alignments of these clusters or families can be combined in order to search for function or for more distantly related sequences (remote homologues). Evolutionary information from aligned homologous sequences that has been combined in this way is known as a profile.
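One simple way to picture a profile is as a table of residue frequencies for each alignment column, which can then be used to score other sequences. The sketch below is a minimal version of that idea, not the scheme used by any particular server: the function names, the pseudocount of 1, the uniform background frequency of 0.05, and the toy family are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Build per-column residue frequencies from an ungapped or gapped alignment.

    Gap characters are ignored; a pseudocount keeps unseen residues above zero.
    """
    length = len(alignment[0])
    profile = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment if seq[i] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def score(profile, segment, background=0.05):
    """Sum of log-odds scores for a segment as long as the profile.

    Assumes a uniform background frequency and standard residues only.
    """
    if len(segment) != len(profile):
        raise ValueError("segment and profile must have the same length")
    return sum(
        math.log2(column[aa] / background)
        for column, aa in zip(profile, segment)
    )


# Toy family, invented for illustration only.
family = ["GDSGGP", "GDSGGA", "GDSGGP", "GNSGGP"]
profile = build_profile(family)
print(score(profile, "GDSGGP"))  # a family-like segment
print(score(profile, "AKLVWY"))  # an unrelated segment scores lower
```

Because the profile records which substitutions the family tolerates at each position, a sequence that differs from every individual family member can still score well, which is the intuition behind using profiles to find remote homologues.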

Another general property of sequences is that sequence similarity may be restricted to short stretches of sequence called domains and motifs.

The definition of these conserved sub-sequences depends on their size and function. Domains are stretches of sequence that appear as structural modules, often within many proteins. Motifs are short conserved sub-sequences that often correspond to active or functional sites. Motifs can be used to help predict function and in the identification of remote homologues.
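Short motifs are often written as patterns. The sketch below converts a pattern written in PROSITE-style syntax into a Python regular expression and scans a sequence with it. The helper name prosite_to_regex and the test sequence are assumptions for this example, and the conversion is deliberately simplified: it handles the common elements (x, [..], {..}, repeat counts, and the < and > anchors outside brackets) but not every corner of the syntax.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a regular expression string.

    Simplified: assumes '<' and '>' appear only as anchors at the pattern ends.
    """
    pattern = pattern.rstrip(".")          # patterns conventionally end with '.'
    parts = []
    for element in pattern.split("-"):     # '-' separates pattern positions
        element = element.replace("{", "[^").replace("}", "]")  # {P} -> [^P]
        element = element.replace("x", ".")                     # x -> any residue
        element = element.replace("(", "{").replace(")", "}")   # x(4) -> .{4}
        element = element.replace("<", "^").replace(">", "$")   # terminal anchors
        parts.append(element)
    return "".join(parts)


# A P-loop style pattern and a short illustrative sequence.
regex = prosite_to_regex("[AG]-x(4)-G-K-[ST].")
sequence = "MTEYKLVVVGAGGVGKSALTIQ"
match = re.search(regex, sequence)
print(regex)
if match:
    print(match.start(), match.group())
```

A pattern of this kind matches or fails outright, so it suits short, strongly conserved sites; more divergent regions are usually better described by a profile.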

In the second part of the practical we will go through several examples to illustrate the concepts of patterns, profiles, motifs, domains and families. Each of the exercises is centered on the analysis of a specific sequence. The examples include results from database searches and from the application of a range of tools.



Tools and Databases

  • InterPro - an integrated database of protein families, domains, motifs and functional sites.
  • Blocks - multiply aligned ungapped segments for the most highly conserved regions of proteins.
  • Motif - a server that scans databases to find motifs or patterns and that can generate sequence profiles.
  • Pfam - multiple sequence alignments and HMMs of protein domains and families.
  • PRINTS - database of groups of conserved motifs, or protein fingerprints.
  • ProDom - protein domain families automatically generated from SWISS-PROT and TrEMBL.
  • PROSITE - database of protein families and domains defined by functional sites, patterns and profiles.
  • SMART - Simple Modular Architecture Research Tool for the identification of domains.
  • COGs database - clusters of orthologous groups of sequences, determined by comparing sequences from whole genomes.






 

Michael Tress
Protein Design Group