Entrez Help Document: Nucleotide - Protein - Genome - Structure

Entrez Help Document

Last modified : September 7, 2000

Nucleotide - Protein - Genome - Structure - PopSet

Table of Contents

Introduction

The Databases

Nucleotide

Protein

Genome

Structure

PopSet

Database Interlinking

Limits

Index

History

Clipboard

Summary Matrices (Print hint: Open and print this link separately)

Creating a Web Link (Print hint: Open and print this link separately)

Searching

Subject Searching

Phrase Searching

Searching for Authors

Searching for Unique Identifiers

Searching by Molecular Weight

Range Searching

Truncating

Combining Sets

Refining Your Search

Search Fields and Qualifiers

Boolean Operators

Using Limits

Using the Indexes

Using Your History

Writing Advanced Search Statements

Displaying and Saving Results

Display Button, Show Button, and Display Formats

Selecting Documents, Displaying Them or Accessing Their Links

Details Button, Add To Clipboard, and Save

Link Out

Introduction

Welcome to the Entrez Help document. The purpose of this document is to provide assistance in using Entrez on the World Wide Web (WWW). Entrez integrates the scientific literature, DNA and protein sequence databases, 3-D protein structure data, population study data sets, and assemblies of complete genomes into a tightly coupled system. Help using the literature component of Entrez, known as PubMed, is also available. Go to PubMed Help.
This help document is organized as follows:
Introduction - describes the new Entrez WWW homepage, its databases and features
Searching - introduces and demonstrates basic search techniques
Refining Your Search - demonstrates advanced search techniques using limits, indexes, and histories and includes help with Writing Advanced Search Statements
Displaying and Saving Results - explains the various display formats, how to save results and how to link to related information in other databases
Link Out - introduces this newest Entrez feature and explains how to use it
The new PubMed/Entrez homepage

Available databases are shown on the black menu bar across the top of the page, beneath the NCBI logo. Your choices are: PubMed, Nucleotide, Protein, Structure, Genome, and PopSet. The scope of these databases is explained in The Databases section of this introduction. The available databases are also displayed on the "Search" pull-down menu as shown below. Databases are selected from the black menu bar or the pull-down menu. In this example, PubMed is the chosen database, as shown on the pull-down menu and by the large "PubMed" at the top of the page adjacent to NCBI (National Center for Biotechnology Information) and National Library of Medicine (NLM).
Database pull-down menu

The search query box appears directly beneath the "Search" pull-down menu. Search terms are typed into this box and executed by selecting the "Go" button (or by pressing the "return" button on your computer keyboard). The "Clear" button erases search terms in the query box. Use it to begin a new search.
There are links to Limits, Index, History, and Clipboard below the search query box. These are defined in this introduction and summarized in the Summary Matrices at the end of this introduction. See also the Using Limits, Using the Indexes, Using Your History, and Details Button, Add To Clipboard, and Save sections of this document for more information.

The Databases

Nucleotide Database
The Nucleotide database contains sequence data from GenBank, EMBL, and DDBJ, the members of the tripartite, international collaboration of sequence databases. EMBL is the European Molecular Biology Laboratory at Hinxton Hall, UK; DDBJ is the DNA Database of Japan in Mishima, Japan. Sequence data is also incorporated from the Genome Sequence Data Base (GSDB), Santa Fe, NM. Patent sequences are incorporated through arrangements with the U.S. Patent and Trademark Office (US PTO), and via the collaborating international databases from other international patent offices.

Protein Database
The Protein database contains sequence data from the translated coding regions of DNA sequences in GenBank, EMBL, and DDBJ, as well as protein sequences submitted to PIR, SWISSPROT, PRF, and the Protein Data Bank (PDB) (sequences from solved structures).

Genome Database
The Genomes database provides views for a variety of genomes, complete chromosomes, contiged sequence maps, and integrated genetic and physical maps.

Structure Database
The Structure database or Molecular Modeling Database (MMDB) contains experimental data from crystallographic and NMR structure determinations. The data for MMDB are obtained from the Protein Data Bank (PDB). The NCBI has cross-linked structural data to bibliographic information, to the sequence databases, and to the NCBI taxonomy.
Use the NCBI 3D structure viewer, Cn3D, for easy interactive visualization of molecular structures from Entrez.

PopSet Database
The PopSet database contains aligned sequences submitted as a set resulting from a population, phylogenetic, or mutation study describing such events as evolution and population variation. The PopSet database contains both nucleotide and protein sequence data.

Database Interlinking
What makes Entrez more powerful than many services is that most of its records are linked to other records, both within a given database (such as Nucleotide) and between databases. Links within a database are called "neighbors" (e.g., Nucleotide neighbors).
Protein and Nucleotide neighbors are determined by performing similarity searches using the BLAST algorithm to compare the entry amino acid or DNA sequence to all other amino acid or DNA sequences in the database.
Links between databases are also possible. Nucleotide sequence records in the Nucleotide database are linked to the PubMed citation of the article in which the sequences were published. Protein sequence records are linked to the nucleotide sequence from which the protein was translated.
See Displaying and Saving Results for more information on links within and between databases.

Limits

Limits are ways to restrict a search to a defined subset of the database. Limits can be set to restrict a search to a particular database field (e.g., the author field). Limits can be set to search everything but a particular type of data (e.g., exclude patent records). Alternatively, limits can be set to only search a particular type of data (e.g., Genomic RNA/DNA) or to only search data from a particular source (e.g., EMBL). Date limits and sequence length limits are also possible.
The nature of Limits makes them dependent on the database fields. Because the Entrez databases are different, Limits available for each database are also different. See the Limits Available by Database Summary in the Summary Matrices section of this introduction. See also the Using Limits section of this document for help in using limits in your search.
Limits available for the Nucleotide database

Index

Indexes are alphabetical lists of terms from searchable database fields. When indexes are displayed, they provide a way to browse the terms by which records and/or data are described. Entrez not only lets you browse indexes, you can also select terms to search directly from them.
As with limits, the indexes available for a particular database are dependent on the searchable fields of that database. See the Indexes Available by Database in the Summary Matrices section of this introduction.
The view below displays the entries listed alphabetically under "bacter" in the "Organism" index of the Nucleotide database. Specific indexes are selected from the "View Index" pull-down menu. Indexes are searchable by typing search terms in the query box. They are also browsable by selecting the "Up" and "Down" buttons to scroll. See the Using the Indexes section of this document for help in using indexes in your search.
Nucleotide database "All Fields" Index

Available indexes for the Nucleotide database are shown below.
Nucleotide "View Index" pull-down menu

History

History provides a record of the searches performed during a search session. Histories are database specific. Each time search terms are typed into the query box and the search is executed, the search terms, the time the search was executed and the search results are saved automatically in the History for that database. Additionally, each search is numbered. The History can be recalled at any time during a search session, but histories are lost after one hour of inactivity. Histories can be used to review, revise or combine the results of earlier searches. See the Using Your History section of this document for help in using your search history.
The History of a search session in Nucleotide database

Clipboard

The Clipboard is a temporary place where search results can be saved by users. Clipboards are database specific. Search results are not saved automatically. Each database clipboard is limited to 500 items, and items saved to the clipboard are lost after one hour of inactivity. Items can be displayed and saved from the Clipboard. See the Details Button, Add To Clipboard, and Save section of this document for help in adding records to and using records on your clipboard.
A Clipboard from the Nucleotide database

Searching

Select an Entrez database and enter one or more search terms in the query box (e.g., 16S RNA).
Query Box for Nucleotide Database

Subject Searching

Subject terms are automatically combined with AND. In the example above, the search query 16S RNA retrieves all records that contain both terms (16S AND RNA). See Boolean Operators for more information on combining terms with Boolean Operators.

Phrase Searching

To force Entrez to search for a phrase, enclose the phrase in double quotes (" "). For example, "16S RNA" retrieves only 86 documents, compared to the subject search 16S AND RNA, which retrieves more than 24,000 documents.
Using quotes forces Entrez to check a phrase list against which the search terms are matched. It is not adjacency searching. If the search phrase is not in the phrase list, Entrez treats the terms as though they are not in quotes and automatically combines them (AND).
Although phrase searching is useful, it should be used with caution, because enclosing search terms in quotes restricts the documents retrieved to only those with exact matches to the text string within the quotes. In this example, documents with the term 16S RNA are retrieved but documents with the term 16S RNA gene are not.
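The phrase-list fallback described above can be sketched in Python. This is only an illustration: the phrase list below is a hypothetical stand-in, and the function name translate_quoted is invented for this example. The real Entrez phrase list and matching rules are not reproduced in this document.

```python
# Hypothetical phrase list; the real Entrez phrase list is not shown here.
PHRASE_LIST = {"16s rna", "response element"}


def translate_quoted(query, phrase_list=PHRASE_LIST):
    """Translate a double-quoted query the way this section describes."""
    phrase = query.strip().strip('"')
    if phrase.lower() in phrase_list:
        # Known phrase: it is searched as a single unit.
        return f'"{phrase}"'
    # Unknown phrase: the quotes are ignored and the terms are ANDed.
    return " AND ".join(phrase.split())


print(translate_quoted('"16S RNA"'))
print(translate_quoted('"16S RNA gene"'))
```

With this stand-in list, the first query stays a phrase, while the second falls back to 16S AND RNA AND gene because it is not in the list.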

Searching for Authors

Enter author names in the format: last name plus initials (e.g., johnson d). Do not use punctuation. This format instructs Entrez to search only the author field. Entrez automatically truncates on the author's name to account for varying initials and designations such as Jr. or 2nd. If only a last name is used in the query box (e.g., johnson), Entrez will search All Fields for that term.

Searching for Unique Identifiers

Unique identifiers can be accession numbers, which apply to a complete sequence record, or sequence identification numbers, which apply to the individual sequences within a record.
The format of accession numbers varies, depending upon the source database. (As noted above, in The Databases section, each data domain in Entrez contains records from a number of different sources.) Some examples of typical accession number formats are below. The Sample GenBank Record contains additional detail about accession numbers.

Type of Record: Sample Accession Format

GenBank/EMBL/DDBJ Nucleotide Sequence Records:
One letter followed by five digits, e.g., U12345
Two letters followed by six digits, e.g., AF123456

GenPept Sequence Records (which contain the amino acid translations from GenBank/EMBL/DDBJ records that have a coding region feature annotated on them):
Three letters and five digits, e.g., AAA12345

Protein Sequence Records from SwissProt and PIR:
One letter and five digits, e.g., P12345

Protein Sequence Records from PRF:
A series of digits (often six or seven) followed by a letter, e.g., 1901178A

RefSeq Nucleotide Sequence Records:
Two letters, an underscore, and six digits, e.g.:
mRNA records (NM_*): NM_000492
genomic DNA contigs (NT_*): NT_000347

RefSeq Protein Sequence Records:
Two letters (NP), an underscore, and six digits, e.g., NP_000483

Protein Structure Records:
PDB accessions generally contain one digit followed by three letters, e.g., 1TUP
MMDB ID numbers generally contain four digits, e.g., 3973
The record for the Tumor Suppressor P53 Complexed With Dna can be retrieved by either number above.
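As a rough aid for recognizing these formats, the sample patterns above can be written as regular expressions. The sketch below covers only the sample formats listed in this section, and the names ACCESSION_PATTERNS and guess_record_types are invented for illustration. Real accessions may take other forms, so treat it as a guess rather than a validator.

```python
import re

# Patterns transcribed from the sample formats listed in this section.
ACCESSION_PATTERNS = [
    ("GenBank/EMBL/DDBJ nucleotide", re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}")),
    ("GenPept protein", re.compile(r"[A-Z]{3}\d{5}")),
    ("SwissProt/PIR protein", re.compile(r"[A-Z]\d{5}")),
    ("PRF protein", re.compile(r"\d+[A-Z]")),
    ("RefSeq nucleotide", re.compile(r"N[MT]_\d{6}")),
    ("RefSeq protein", re.compile(r"NP_\d{6}")),
    ("PDB structure", re.compile(r"\d[A-Z]{3}")),
    ("MMDB ID", re.compile(r"\d{4}")),
]


def guess_record_types(accession):
    """Return every record type whose sample format matches the identifier."""
    text = accession.strip().upper()
    return [name for name, pattern in ACCESSION_PATTERNS
            if pattern.fullmatch(text)]


for sample in ["U12345", "AF123456", "AAA12345", "1901178A",
               "NM_000492", "NP_000483", "1TUP", "3973"]:
    print(sample, guess_record_types(sample))
```

Note that a one-letter, five-digit identifier matches both a nucleotide and a protein format, so the function returns a list of candidates instead of a single answer.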

There are two types of sequence identification numbers:

GI numbers: a series of digits assigned consecutively by NCBI to each sequence it processes

Version numbers: the accession number followed by a dot and a version number

For example, the RefSeq record for the Homo sapiens cystic fibrosis transmembrane conductance regulator has the accession number NM_000492. The record contains one nucleotide sequence and one amino acid translation, which have the following sequence identifiers:

nucleotide sequence:
GI: 6995995
VERSION: NM_000492.2

protein translation:
GI: 6995996
VERSION: NP_000483.2
If a sequence changes in any way, it receives a new GI number, and the version number is incremented by one. The Sample GenBank Record contains additional detail about GI and Version sequence identification numbers.
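The relationship between an accession number and its Version identifier can be illustrated with a few lines of Python. The helper names split_version and next_version are invented for this sketch, which handles only the "accession.version" form described above.

```python
def split_version(version):
    """Split a Version identifier into (accession, version number)."""
    accession, dot, number = version.rpartition(".")
    if not dot or not accession or not number.isdigit():
        raise ValueError(f"not an accession.version identifier: {version!r}")
    return accession, int(number)


def next_version(version):
    """Return the Version identifier a changed sequence would receive."""
    accession, number = split_version(version)
    return f"{accession}.{number + 1}"


print(split_version("NM_000492.2"))
print(next_version("NM_000492.2"))
```

GI numbers are not modeled here, because a new GI is simply the next number NCBI assigns and cannot be derived from the old one.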

Searching by Molecular Weight

NCBI implemented a new "Molecular Weight" search field in July 2000 for searches of the Entrez Protein database at the request of the mass spectrometry group at NIH. Dr. Lewis Pannell provided technical advice.
The Molecular Weight field can be queried as a single molecular weight:
002002 [Molecular Weight]
or a range of weights:
002002:002009 [Molecular Weight]
or either expression can be combined with other Entrez search terms, for example, to limit by organism:
002002:002009 [Molecular Weight] AND human [Organism]
Note that molecular weight must be entered as a fixed six-digit field, filled with leading zeros (not the letter O). The square brackets can contain the full spelling of the search field, as in the examples above, or the abbreviation [MOLWT] in upper or lower case.
Note also that where cleavage products are annotated with features, the molecular weight of each cleavage product is calculated, not the molecular weight of the whole protein. Thus you may retrieve a large protein when querying with a small molecular weight; be sure to check the feature table of the protein record to see if it has cleavage products.
How the Molecular Weight is calculated:

If cleavage products are annotated, molecular weight is calculated for each cleavage product, not for the whole protein. Cleavage products are not consistently annotated, but we have done our best to detect the annotations across different database styles. For example, cleavage products are annotated as "matp" in GenBank, but as "Region" with /region_name="Mature chain" in SWISSPROT. Note that this means more than one molecular weight may point to a single protein record!

If only a signal peptide is annotated, it is removed, and the molecular weight is calculated on the rest of the protein.

If there are no such features on the protein, then the molecular weight for the whole protein is calculated. In this case, a check is made for an initial Met, and it is not included in the calculation if found.

If completely unknown amino acids (e.g., "X") are found, a molecular weight is not calculated. Ambiguous amino acids are calculated as one of their possible forms:

B means D or N -- molecular weight is calculated as D

Z means E or Q -- molecular weight is calculated as E
Molecular weight is calculated as part of the indexing process for protein records in Entrez. The weights are present only in the Molecular Weight index, and are not shown explicitly on the protein sequence records.
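The rules above for a protein without cleavage-product or signal-peptide features can be sketched in Python. This is not the Entrez indexing code. The residue masses are standard average masses in daltons, and it is an assumption that Entrez uses average (not monoisotopic) masses. The function name molecular_weight is invented for this example. To apply the cleavage-product or signal-peptide rules, pass only the relevant chain and set strip_initial_met to False.

```python
# Standard average residue masses in daltons (assumed; the source does not
# say which mass table Entrez uses).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water molecule is added for the free chain termini


def molecular_weight(sequence, strip_initial_met=True):
    """Return the weight of a chain, or None if it cannot be calculated."""
    seq = sequence.upper()
    # Ambiguous residues are calculated as one of their possible forms.
    seq = seq.replace("B", "D").replace("Z", "E")
    # Whole protein with no features: an initial Met is not included.
    if strip_initial_met and seq.startswith("M"):
        seq = seq[1:]
    # Unknown residues such as X (or an empty chain) give no weight.
    if not seq or any(residue not in RESIDUE_MASS for residue in seq):
        return None
    return sum(RESIDUE_MASS[residue] for residue in seq) + WATER


print(molecular_weight("MAGB"))
print(molecular_weight("AXA"))
```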

Range Searching

Range searching can be done on three data elements: accession numbers [ACCN], sequence length [SLEN], and molecular weights [MOLWT]. The range operator is the colon (:), and the appropriate field qualifier should be included in square brackets after the second term. Field qualifiers are case insensitive, so either [ACCN] or [accn] will work. It is not necessary to include a space between the search term and the field qualifier, although that can be done, if desired.
Example searches:
Range of accession numbers:
AF114696:AF114714[ACCN]
Note: It is not possible to search for a range of sequence identification numbers (known as GI numbers and Version numbers).
Range of sequence lengths:
3000:4000[SLEN]
Range of molecular weights (can be searched in the Protein database):
002002:002009[MOLWT]
Note: Molecular weights must be expressed in six digits, filled with leading zeros (not letter O). Additional information about searching by molecular weight is included above.
Also, a range search can be combined with other Entrez search terms, for example, to limit by organism:
002002:002009[MOLWT] AND human[ORGN]
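If you generate range queries from a script, the formatting rules above are easy to get wrong, especially the six-digit zero padding for molecular weights. The sketch below only builds query strings for pasting into the query box. The helper names range_query and molwt_term are invented for illustration.

```python
def range_query(low, high, qualifier):
    """Build a range search term such as 3000:4000[SLEN]."""
    return f"{low}:{high}[{qualifier}]"


def molwt_term(weight):
    """Format a molecular weight as a six-digit, zero-filled term."""
    if not 0 <= weight <= 999999:
        raise ValueError("molecular weight must fit in six digits")
    return f"{weight:06d}"


print(range_query("AF114696", "AF114714", "ACCN"))
print(range_query(3000, 4000, "SLEN"))
print(range_query(molwt_term(2002), molwt_term(2009), "MOLWT")
      + " AND human[ORGN]")
```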

Truncating

Truncating search terms is a convenient way to find all the records that contain terms that begin with a given text string. Place an asterisk (*) at the end of a search term to find all records with a term that begins with that text string. For example, the truncated search term "immunoglob*" will retrieve all records in the database that contain the word immunoglobulin, immunoglobulins, immunoglobin, and immunoglobins.
Entrez searches the first 150 variations of a truncated term. If a truncated term produces more than 150 variations, which is possible with terms like "bact*," Entrez gives the following warning:
"Wildcard search for 'bact*' used only the first 150 variations. Lengthen the root word to search for all endings."
Truncation does not match across a space: phrases in which a space follows the root will NOT be retrieved. For example, if you search "chromo*," the documents retrieved will contain terms like chromobacterium but not chromo helicase.
Left-hand truncation (e.g., "*bacterium") is not possible.
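The truncation rules in this section can be modeled as a prefix lookup in a sorted term index. The following is a minimal sketch under that assumption; it is not how Entrez is implemented, and the name expand_truncated and the sample index are invented for illustration. The 150-variation limit is the one stated above.

```python
from bisect import bisect_left

MAX_VARIATIONS = 150  # limit stated in this section


def expand_truncated(term, index_terms, limit=MAX_VARIATIONS):
    """Return (variations used, True if more than `limit` were found)."""
    if term.startswith("*"):
        raise ValueError("left-hand truncation is not possible")
    if not term.endswith("*"):
        raise ValueError("a truncated term must end with an asterisk")
    root = term[:-1].lower()
    terms = sorted(t.lower() for t in index_terms)
    variations = []
    # Jump to the first entry that could start with the root, then scan.
    for candidate in terms[bisect_left(terms, root):]:
        if not candidate.startswith(root):
            break
        if " " in candidate[len(root):]:
            continue  # a space after the root is never matched
        variations.append(candidate)
    return variations[:limit], len(variations) > limit


index = ["chromo helicase", "chromobacterium", "chromosome", "chrysanthemum"]
print(expand_truncated("chromo*", index))
```

The second value in the result corresponds to the situation in which Entrez shows the "used only the first 150 variations" warning.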

Combining Sets

Use your search History to combine documents retrieved with different search terms at different times during your search session. For example, search the Nucleotide database for HIV. This search retrieves 37,549 documents. Now search the Nucleotide database for protease. This search retrieves 13,061 documents. Now click on the History for the Nucleotide database.

The results for the HIV and protease search terms are saved as Search Sets #1 and #2, respectively. In the query box, type #1 AND #2 and select Go. This search combines the documents in Search Set #1 (HIV) with the documents in Search Set #2 (protease) and retrieves only those documents that are in both sets (i.e., 3,156 documents).
Click on History again and note Search Set #3 (#1 AND #2).
Remember, this History is for the Nucleotide database only and it will be lost after one hour of inactivity. See Boolean Operators and Using Your History for more information and examples.
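Combining search sets with AND is a set intersection. The sketch below models a History as numbered sets of document IDs. The class name SearchHistory and the document IDs are made up for illustration and do not correspond to real records or to the counts quoted above.

```python
class SearchHistory:
    """A toy model of a per-database History of numbered search sets."""

    def __init__(self):
        self.sets = {}

    def add(self, doc_ids):
        """Save a result set and return its search set number."""
        number = len(self.sets) + 1
        self.sets[number] = frozenset(doc_ids)
        return number

    def combine_and(self, first, second):
        """Equivalent of typing '#first AND #second' in the query box."""
        return self.add(self.sets[first] & self.sets[second])


history = SearchHistory()
hiv = history.add({101, 102, 103, 104})       # Search Set #1
protease = history.add({103, 104, 105})       # Search Set #2
combined = history.combine_and(hiv, protease)  # Search Set #3
print(combined, sorted(history.sets[combined]))
```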

Refining Your Search

Sometimes it is necessary to refine your search statement by using the Limit, Index and History options of a given Entrez database. The key to using these options, especially the Limit and Index options, is a better understanding of the Entrez databases' search fields and Boolean Operators.

Boolean Operators

Boolean Operators used in Entrez are:
AND: To AND two search terms together instructs Entrez to find all documents that contain BOTH terms.
OR: To OR two search terms together instructs Entrez to find all documents that contain EITHER term.
NOT: To NOT two search terms together instructs Entrez to find all documents that contain search term 1 BUT NOT search term 2.
The Entrez search rules and syntax for using Boolean operators are:
1. The Boolean operators AND, OR, and NOT must be entered in UPPERCASE (e.g., promoters OR response elements).
2. Entrez processes all Boolean operators in a left-to-right sequence. The order in which Entrez processes a search statement can be changed by enclosing individual concepts in parentheses. The terms inside the parentheses are processed first as a unit and then incorporated into the overall strategy. For example, the search statement: g1p3 AND (response element OR promoter) is processed by Entrez by ORing the terms response element OR promoter first and then ANDing the resulting set of documents with g1p3.
3. Click on the Details button to see how Entrez translated and executed your search strategy.
4. See Writing Advanced Search Statements for more information on using Boolean Operators and Entrez Search Field Qualifiers.
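The effect of left-to-right processing and of parentheses can be seen with ordinary Python sets. This is a sketch of the rule only, not of the Entrez query parser. The function name evaluate_left_to_right and the document IDs are invented for illustration.

```python
def evaluate_left_to_right(first, *steps):
    """Apply (operator, document set) pairs strictly from left to right."""
    result = set(first)
    for operator, docs in steps:
        if operator == "AND":
            result &= docs
        elif operator == "OR":
            result |= docs
        elif operator == "NOT":
            result -= docs
        else:
            raise ValueError(f"unknown operator: {operator}")
    return result


# Made-up document IDs for each search term.
g1p3 = {1, 2}
response_element = {2, 3}
promoter = {1, 4}

# g1p3 AND response element OR promoter (no parentheses)
flat = evaluate_left_to_right(g1p3, ("AND", response_element), ("OR", promoter))

# g1p3 AND (response element OR promoter): the parentheses are processed first
inner = evaluate_left_to_right(response_element, ("OR", promoter))
grouped = evaluate_left_to_right(g1p3, ("AND", inner))

print(sorted(flat), sorted(grouped))
```

With these sets, the ungrouped statement also returns document 4, which contains promoter but not g1p3. The parenthesized statement returns only documents that contain g1p3.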
Details Button

Details Screen

Using Limits

Limits are used to refine search results to retrieve only the most relevant documents. In other words, limits remove unneeded or unwanted documents. This section provides examples for using limits to:

Limit a search to a particular database field

Exclude certain kinds of sequences

Limit the search to a particular molecule type

Limit the search to a particular gene location

Display only the master or only the parts of segmented sets of sequences

Limit the search to records from a particular sequence database

Limit the search by date

Using more than one limit at a time

See Summary Matrices to review the limits available for each database.

Limit a search to a particular database field
Example: You are only interested in nucleotide sequences from the mouse.
1. Select the Nucleotide database from the black menu bar or the Search pull-down menu.
2. Select Limits.
3. In the "Limited To:" section, select Organism from the Search Field pull-down menu.
4. Type "mouse" without quotes in the query box and select Go.
On the results screen note that the check box next to Limits is checked indicating that limits are selected and active. Beneath the check box the selected and active limits are highlighted in yellow (i.e., Field: Organism).
Searching for "Mouse" in the Organism Search Field

Example: You are only interested in protein sequences that are less than 50 amino acids in length:
1. Select the Protein database from the black menu bar or the Search pull-down menu.
2. Select Limits.
3. In the "Limited To:" section, select Sequence Length from the Search Field pull-down menu.
4. Type "0:50" without quotes in the query box and select Go.
On the results screen note that the check box next to Limits is checked indicating that limits are selected and active. Beneath the check box the selected and active limits are highlighted in yellow (i.e., Field: Sequence Length).

Exclude certain kinds of sequences
Example: You are interested in mitochondrial carriers but you do not want the EST sequences:
1. Select the Nucleotide database from the black menu bar or the Search pull-down menu.
2. Type "mitochondrial carrier" without quotes in the query box.
3. Select Limits.
4. In the "Limited To:" section, check the box next to "Exclude ESTs" and select Go.
On the results screen note that the check box next to Limits is checked indicating that limits are selected and active. Beneath the check box the selected and active limits are highlighted in yellow (i.e., Limits: Exclude ESTs).
In the Nucleotide database you can exclude EST, STS, GSS, working drafts, and/or Patent sequences. In the Protein database you can exclude Patent sequences.

Limit the search to a particular molecule type
Example: You are only interested in Cryptosporidium ribosomal RNA sequences.
1. Select the Nucleotide database from the black menu bar or the Search pull-down menu.
2. Type "cryptosporidium" without quotes in the query box.
3. Select Limits.
4. In the "Limited To:" section, select the "Molecule" pull-down menu and choose rRNA and select Go.
On the results screen note that the check box next to Limits is checked indicating that limits are selected and active. Beneath the check box the selected and active limits are highlighted in yellow (i.e., Limits: rRNA).

Limit the search to a particular gene location
Example: You are interested in the genes in the chloroplast of flowering plants.
1. Select the Nucleotide database from the black menu bar or the Search pull-down menu.
2. Type "flowering plants" without quotes in the query box.
3. Select Limits.
4. In the "Limited To:" section, select the "Gene Location" pull-down menu and choose Chloroplast and select Go.
On the results screen note that the check box next to Limits is checked indicating that limits are selected and active. Beneath the check box the selected and active limits are highlighted in yellow (i.e., Limits: Chloroplast).

Display only the master or only the parts of segmented sets of sequences
Example: You are interested in the cystic fibrosis transmembrane conductance regulator (CFTR) gene. You know that there are several segmented sets of sequences associated with the CFTR gene. But you are only interested in displaying the master record of any segmented sets associated with the CFTR gene.
1. Select the Nucleotide database from the black menu bar or the Search pull-down menu.
2. Type "cftr" without quotes in the query box.
3. Select Limits.
4. In the "Limited To:" section, select the "Segmented Sequences" pull down menu and choose show only master of set and select Go.
On the results screen note that the check box next to Limits is checked indicating that limits are selected and active. Beneath the check box the selected and active limits are highlighted in yellow (i.e., Limits: Show only master of set).
Please note that this option does not allow you to limit the documents retrieved to only those containing segmented sequences. It simply allows you to control how segmented sets of sequences are displayed.

Limit the search to records from a particular sequence database
Example: You are interested only in cysteine phosphatase protein sequences submitted directly to PIR.
1. Select the Protein database from the black menu bar or the Search pull-down menu.
2. Type "cysteine phosphatase" without quotes in the query box.
3. Select Limits.
4. In the "Limited To:" section, select the "Only From" pull-down menu and choose PIR and select Go.
On the results screen note that the check box next to Limits is checked indicating that limits are selected and active. Beneath the check box the selected and active limits are highlighted in yellow (i.e., Limits: PIR).

Limit the search by date
Example: You want to see any nucleotide sequences from pigs added to the database (or updated) in the last 30 days.
1. Select the Nucleotide database from the black menu bar or the Search pull-down menu.
2. Type "pigs" without quotes in the query box.
3. Select Limits.
4. In the "Limited To:" section, select Organism from the Search Field pull-down menu.
5. And in the "Limited To:" section, select the "Modification Date" pull down menu and choose 30 days and select Go.
On the results screen note that the check box next to Limits is checked indicating that limits are selected and active. Beneath the check box the selected and active limits are highlighted in yellow (i.e., Field: Organism, Limits: 30 days).
Example: You want to retrieve all mouse or human protein sequences added to the database (or updated) during 1997.
1. Select the Protein database from the black menu bar or the Search pull-down menu.
2. Type "mouse OR human" without quotes in the query box.
3. Select Limits.
4. In the "Limited To:" section, select Organism from the Search Field pull-down menu.
5. Also in the "Limited To:" section, select the "Modification Date" pull-down menu and choose Modification Date (as opposed to Publication Date). In the date boxes, type the dates in the format YYYY/MM/DD. You can tab from box to box in the date fields. The From date is 1997/01/01 and the To date is 1997/12/31. Select Go.
On the results screen note that the check box next to Limits is checked indicating that limits are selected and active. Beneath the check box the selected and active limits are highlighted in yellow (i.e., Field: Organism, Limits: Modification Date, from 1997/01/01 to 1997/12/31).

Using more than one limit at a time
As shown in the last two examples, you can use more than one limit at a time. Here is one more example using multiple limit features in an Entrez search.
Example: You are interested in the protein translations of human GenBank nucleotide sequences added to the protein database (or updated) in the last 30 days. You do not want patent records.
1. Select the Protein database from the black menu bar or the Search pull-down menu.
2. Type "human" without quotes in the query box.
3. Select Limits.
4. In the "Limited To:" section, select Organism from the Search Field pull-down menu.
5. On the same screen, select the exclude patents check box, select GenBank from the Only From pull-down menu, and finally select 30 days from the Modification Date pull-down menu and select Go.
On the results screen note that the check box next to Limits is checked indicating that limits are selected and active. Beneath the check box the selected and active limits are highlighted in yellow (i.e., Field: Organism, Limits: Exclude patents, 30 days, GenBank).

Using the Indexes

Indexes are used to browse and/or select the terms by which records and/or data are described. This section provides examples for using indexes to:

Examine Search Field Indexes

Browse, Select and Search Terms

Select, Combine and Search Multiple Terms

Select, Combine and Search Multiple Terms from Multiple Indexes

See Summary Matrices to review the indexes available for each database. Also please review how to use Boolean Operators.

Examine Search Field Indexes
Example: Examine the kind of information indexed in the Nucleotide database's Properties index.
1. Select the Nucleotide database.
2. Select Index.
3. Select the Properties index from the View Index pull-down menu.
4. Type "0" (the number zero) without quotes in the View Index query box and select View.
Because index entries are listed alphabetically, the number zero instructs Entrez to begin the index display at the very first entry (i.e., biomol genomic).
The first few entries of the Nucleotide database's Properties index

Use the scroll bar to view further entries. Use the "Down" and "Up" buttons to display the next set of entries in either direction. The Properties search field and its corresponding index are very useful. This field contains information about the GenBank division to which the record belongs (e.g., gbdiv inv). It also describes the molecule type and location, as well as such things as whether the sequence is part of a population study or segmented set.
Compare the Nucleotide database's Properties index to the Properties index of the other databases. A Properties index is not available for the Structure database.
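Why typing "0" starts the display at the first entry can be seen by treating an index as a sorted list. The sketch below assumes plain string ordering, in which digits sort before letters. The function name view_index and the entry "biomol mrna" are invented for illustration; the other entries are taken from this document.

```python
from bisect import bisect_left


def view_index(entries, start_text, page_size=5):
    """Return the page of index entries that begins at or after start_text."""
    ordered = sorted(entry.lower() for entry in entries)
    start = bisect_left(ordered, start_text.lower())
    return ordered[start:start + page_size]


properties = ["gbdiv inv", "biomol genomic", "gbdiv est", "biomol mrna"]

print(view_index(properties, "0", page_size=2))      # starts at the first entry
print(view_index(properties, "gbdiv", page_size=2))  # jumps to the gbdiv entries
```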
Example: Examine the kind of information indexed in the Genome database's Feature key index.
1. Select the Genome database.
2. Select Index.
3. Select the Feature key index from the View Index pull-down menu.
4. Type "0" (the number zero) without quotes in the View Index query box and select View.
The first few entries of the Genomes database's Feature key index

Use the scroll bar to view the entries. Use the "Down" and "Up" buttons to display the next set of entries in either direction. The Feature key search field and its corresponding index are also very useful. This field contains information about the biological features of the nucleotide sequences as annotated by submitters and database staff. For more information on the Feature key field, please see the Feature Table Definitions at .....
Compare the Genome database's Feature key index to the Nucleotide and PopSet databases' Feature key index. Feature key indexes are not available for the Protein and Structure databases.
Example: Go directly to an entry in the Structure database's Text Word index.
1. Select the Structure database.
2. Select Index.
3. Select the Text Word index from the View Index pull-down menu.
4. Type "heart" without quotes in the View Index query box and select View.
Entries found under "heart" in the Structure database's Text Word index

Browse, Select and Search Terms
Example: You want to search all sequences in the GenBank EST division.
The GenBank divisions are indexed in the Properties field of the Nucleotide and Genome databases. ESTs are found in the Nucleotide database.
1. Select the Nucleotide database.
2. Select Index.
3. Select the Properties index from the View Index pull-down menu.
4. Type "gbdiv" without quotes in the View Index query box and select View.
5. View the list of entries and locate the "gbdiv est" entry.
6. Select the gbdiv est entry by clicking on it once.
7. Select the gbdiv est entry as a search term by clicking "AND." Note that the term is now located in the Search query box as "gbdiv est"[Properties].
8. Select Go to execute this search.
Browsing, selecting and searching from the Nucleotide database's Properties index

Select, Combine and Search Multiple Terms
Example: You want all the population sets for humans, mice and drosophila.
1. Select the PopSet database.
2. Select Index.
3. Select the Organism index from the View Index pull-down menu.
4. Type "human" without quotes in the View Index query box and select View.
5. View the list of entries and locate the "human" entry.
6. Select the "human" entry by clicking on it once.
7. Select the "human" entry as a search term by clicking "AND." Note that the term is now located in the Search query box as "human"[Organism].
8. Type "mouse" without quotes in the View Index query box and select View.
9. View the list of entries and locate the "mouse" entry.
10. Select the "mouse" entry by clicking on it once.
11. Select the "mouse" entry as a search term by clicking "OR." Note that the term is now located in the Search query box with the human term (i.e., "human"[Organism] OR "mouse"[Organism]).
12. Repeat steps 8-11 above for drosophila so that the final search statement in the query box is:
"human"[Organism] OR "mouse"[Organism] OR "drosophila"[Organism]
13. Select Go to execute this search.
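
The text that accumulates in the Search query box during the steps above can be sketched in Python. This is only an illustration of the resulting statement, not of how Entrez works internally: the helper names `index_term` and `add_term` are hypothetical, and the rule that the first selected term is added without an operator is an assumption drawn from the examples in this section.

```python
def index_term(term, field):
    """Format one index entry the way it appears in the query box."""
    return f'"{term}"[{field}]'


def add_term(query, term, field, operator="AND"):
    """Append an index entry to the query text (hypothetical helper).

    Assumption: the first term is added bare, whichever button is clicked,
    as in the examples where the first click on "AND" yields a single term.
    """
    if operator not in ("AND", "OR"):
        raise ValueError(f"unsupported operator: {operator}")
    entry = index_term(term, field)
    return entry if not query else f"{query} {operator} {entry}"


query = ""
query = add_term(query, "human", "Organism", "AND")
query = add_term(query, "mouse", "Organism", "OR")
query = add_term(query, "drosophila", "Organism", "OR")
print(query)
```

Calling `add_term` with entries from different indexes (for instance the Organism index and then the Text Word index) produces a statement that mixes fields in the same way.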

Select, Combine and Search Multiple Terms from Multiple Indexes
Example: You want all protein kinase sequences from pigs.
1. Select the Protein database.
2. Select Index.
3. Select the Organism index from the View Index pull-down menu.
4. Type "pig" without quotes in the View Index query box and select View.
5. View the list of entries and locate the "pig" entry.
6. Select the "pig" entry by clicking on it once.
7. Select the "pig" entry as a search term by clicking "AND." Note that the term is now located in the Search query box as "pig"[Organism].
8. Select the Text Word index from the View Index pull-down menu.
9. Type "kinase" without quotes in the View Index query box and select View.
10. View the list of entries and locate the "kinase" entry.
11. Select the "kinase" entry by clicking on it once.
12. Select the "kinase" entry as a search term by clicking "AND." Note that the term is now located in the Search query box as "kinase"[Text Word] and that the final search statement in the query box is:
"pig"[Organism] AND "kinase"[Text Word]
13. Select Go to execute this search.
REMEMBER that Entrez processes complex search statements using Boolean Operators in a specific order as described in the Boolean Operators section above. You can always check the Details button to see how your final search statements are executed.

Using Your History

History provides a record of the searches performed during a search session. This section provides examples for using your search history to:

Review a Search Session and Combine Results

Refine search results

Please review how to use Boolean Operators.

Review a Search Session and Combine Results
Example: Search for Streptomyces, Pseudomonas, and glucanase and then use History to combine results.
1. Select the Protein database.
2. Type "streptomyces" in the query box and select Go.
3. Select Clear.
4. Type "pseudomonas" in the query box and select Go.
5. Select Clear.
6. Type "glucanase" in the query box and select Go.
7. Select History.
8. Review your search History and results. Note that each search statement is numbered. Also note the time and number of results for each search statement.
9. Combine the results of your earlier searches using the search numbers and Boolean operators. For example: (#1 OR #2) AND #3. Select Go.
10. Select History to once again review your search History and results.
Protein database Glucanase Search History
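
One way to picture what (#1 OR #2) AND #3 does is to treat each numbered search as a set of records. The sketch below uses Python sets with made-up record IDs (the IDs and the `history` dictionary are purely hypothetical); OR corresponds to a union and AND to an intersection.

```python
# Hypothetical result sets: History number -> set of made-up record IDs.
history = {
    1: {"P1", "P2", "P3"},  # streptomyces
    2: {"P3", "P4"},        # pseudomonas
    3: {"P2", "P4", "P9"},  # glucanase
}

# (#1 OR #2) AND #3: take the union first, then intersect with #3.
combined = (history[1] | history[2]) & history[3]
print(sorted(combined))
```

Only the records that mention glucanase and at least one of the two organisms remain in the combined set.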

Although search Histories are database specific, the History numbering system is continuous across all databases searched during a single search session. For instance, suppose you have just finished searching the Protein database using the example above and next want to search the Structure database for similar information. You cannot use your Protein database search History in the Structure database. However, as you start searching the Structure database, Entrez numbers the search sets sequentially, continuing from the last search query executed in any database. Therefore, in this example, the first search query executed in the Structure database is numbered search #30. The next search query executed is numbered search #31 and so on. Entrez will save a maximum of 100 queries at a time.
A final note on search histories: if you run the same query in the same database during the same search session, the search set is saved in the History only once. It is saved under the first set number, not a later one.
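
The numbering rules described above can be modeled with a small Python class. This is a toy sketch, not Entrez's implementation: the class name `SearchHistory` and its methods are hypothetical, queries are assumed to match only when their text is identical, and the 100-query maximum is not modeled because the text does not say what happens when it is reached.

```python
class SearchHistory:
    """Toy model of History numbering within one search session."""

    def __init__(self):
        self._next_number = 1
        self._entries = {}  # (database, query) -> search number

    def record(self, database, query):
        """Return the search number for a query.

        A query repeated in the same database keeps its first number;
        numbering is shared across all databases.
        """
        key = (database, query)
        if key not in self._entries:
            self._entries[key] = self._next_number
            self._next_number += 1
        return self._entries[key]

    def for_database(self, database):
        """List the (number, query) pairs recorded for one database."""
        return sorted(
            (number, query)
            for (db, query), number in self._entries.items()
            if db == database
        )


session = SearchHistory()
first = session.record("Protein", "streptomyces")
session.record("Protein", "pseudomonas")
session.record("Protein", "glucanase")
repeat = session.record("Protein", "streptomyces")     # same query, same database
structure = session.record("Structure", "glucanase")   # numbering continues
print(first, repeat, structure)
print(session.for_database("Structure"))
```

The numbers in this sketch start at 1 for simplicity, so they will not match the #30 shown in the example session above.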

Refine search results
Example: You are interested in any DNA sequences of the mouse fas antigen.
1. Select the Nucleotide database.
2. Type "mouse AND fas" without quotes in the query box and select Go.
3. The search retrieves 135 documents. You do not want to review all 135 documents and decide you are really interested in any sequences with annotated exons or introns.
4. Select History.
5. Refine the results of your search using the search number and Boolean operators. For example: #5 AND (exon OR intron). Select Go.
6. Select History to once again review your search History and results. Refining the search has reduced the number of retrieved documents to 14.
Mouse fas antigen Search History

Writing Advanced Search Statements

Complex search statements can be written and executed directly from the query box of any of the five databases, as long as you obey some simple rules and use the correct syntax.
Perform a search by specifying the search terms, their fields, and the Boolean operations to perform on the terms. Use the following syntax:
term [field] OPERATOR term [field]
Where term(s) is/are the search terms, the field(s) are the Search Fields and Qualifiers, and the OPERATOR(s) are the Boolean Operators. Remember that Boolean operators are normally processed left to right. If you want part of your Boolean expression to be processed out of order, enclose it in parentheses.
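
To see why parentheses matter, the left-to-right rule can be imitated with Python sets. The function `evaluate_left_to_right` and the sets `a`, `b`, and `c` are hypothetical stand-ins for result sets; the sketch only illustrates the ordering rule stated above and makes no claim about how Entrez evaluates a statement internally.

```python
def evaluate_left_to_right(operands, operators):
    """Fold result sets strictly left to right, with no operator precedence."""
    if len(operators) != len(operands) - 1:
        raise ValueError("need exactly one operator between each pair of operands")
    result = set(operands[0])
    for op, operand in zip(operators, operands[1:]):
        if op == "AND":
            result &= operand   # keep records present in both
        elif op == "OR":
            result |= operand   # keep records present in either
        elif op == "NOT":
            result -= operand   # drop records present in the operand
        else:
            raise ValueError(f"unknown operator: {op}")
    return result


a = {1, 2}
b = {2, 3}
c = {3, 4}

# a OR b AND c, processed left to right, behaves like (a OR b) AND c.
left_to_right = evaluate_left_to_right([a, b, c], ["OR", "AND"])

# Parentheses force the AND to be done first: a OR (b AND c).
grouped = a | evaluate_left_to_right([b, c], ["AND"])

print(sorted(left_to_right), sorted(grouped))
```

The two results differ, which is why a statement that mixes operators is safer to write with explicit parentheses.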
Example: Find all human nucleotide sequences with D-loop annotations.
In the Nucleotide database use the following expression -
D-loop[FKEY] AND human[ORGN]
Example: Find all human protein sequences with lengths between 50 and 60 amino acids and that were entered into the database during 1999.
In the Protein database use the following expression -
human[ORGN] AND 50[SLEN]:60[SLEN] AND 1999[MDAT]
Example: Find drosophila population studies published in the Journal of Molecular Evolution.
In the PopSet database use the following expression -
j mol evol[JOUR] AND drosophila[ORGN]
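
If you generate statements like these from a script, small formatting helpers keep the syntax consistent. The sketch below is a minimal illustration in Python; the function names `fielded`, `field_range`, and `join_terms` are hypothetical, and the code only builds the statement text shown in the three examples above. It does not submit anything to Entrez.

```python
def fielded(term, field):
    """Format a single term with its field qualifier, e.g. human[ORGN]."""
    return f"{term}[{field}]"


def field_range(low, high, field):
    """Format a range search, e.g. 50[SLEN]:60[SLEN]."""
    return f"{fielded(low, field)}:{fielded(high, field)}"


def join_terms(operator, *parts):
    """Join already formatted parts with one Boolean operator."""
    return f" {operator} ".join(parts)


d_loop = join_terms("AND", fielded("D-loop", "FKEY"), fielded("human", "ORGN"))
short_proteins = join_terms(
    "AND",
    fielded("human", "ORGN"),
    field_range(50, 60, "SLEN"),
    fielded(1999, "MDAT"),
)
popsets = join_terms("AND", fielded("j mol evol", "JOUR"), fielded("drosophila", "ORGN"))

print(d_loop)
print(short_proteins)
print(popsets)
```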

Displaying and Saving Results

Entrez displays search results as shown below:
Search results for "hiv protease," Nucleotide Database

The Search query box provides a summary of the database searched and the search terms as entered (i.e., "Search Nucleotide for hiv protease"). No Limits are applied as the Limits check box is not checked.

Display Button, Show Button, and Display Formats

Display Button - The default display format is the Summary format shown in the example above.
To change the Display format, select an alternate format from the format (i.e., Summary) pull-down menu and click the Display button.
To view the "graphical view" click on the accession number to display the GenBank report format. On the GenBank report format, click on the accession number one more time. The Entrez graphical view is displayed. Alternatively, select the Brief view from the display pull-down menu and click the Display button. Entrez will provide the graphical view this way as well.
Show Button - The default number of documents displayed is 20. The total number of pages is displayed to the far right of the Show button (i.e., Select page: 1 2). In this example, 30 documents were retrieved, and since we are displaying 20 documents at a time, there are two pages. The Select page: numbers are hotlinked to enable quick navigation from one page to the next.
To change the number of documents displayed per page, select an alternate number from the number (i.e., 20) pull-down menu and click the Display button.
Change the Display format to Brief and Show 50 documents per page. Note that the number of pages changes to one and there are no hotlinks to other pages since all 30 documents retrieved are displayed on page one.
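
The page counts in this example follow from dividing the number of retrieved documents by the Show setting and rounding up. A minimal Python sketch of that arithmetic (the function name `page_count` is hypothetical):

```python
import math


def page_count(total_documents, per_page):
    """Number of result pages needed to show every document."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    return math.ceil(total_documents / per_page)


print(page_count(30, 20))  # 30 documents, 20 per page
print(page_count(30, 50))  # 30 documents, 50 per page
```

With 30 documents, a Show setting of 20 gives two pages and a setting of 50 gives one, matching the behavior described above.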
See the Display formats table for a summary of the Display formats available by database.

Selecting Documents, Displaying Them or Accessing Their Links

A closer look at the results screen reveals more display options.
Closer Look at Display

Please note the check box to the left of each numbered result. Check boxes are used to select individual documents from a set of documents retrieved. Once selected, the documents can be displayed (in various formats), saved to the Clipboard, or saved to a local disk. Select documents 1, 3, and 5 by clicking their check boxes. Documents are deselected by clicking the check box again.
Select Documents Using Check Boxes

Display documents 1, 3, and 5 in FASTA format by selecting Display FASTA and then clicking Display.
Display FASTA format of Selected Documents

For FASTA output that can be used easily in other applications, select the Text button. The Text button uses your browser to display the sequence in FASTA format. See the example below. Copy and paste the sequence from the browser to other applications. Also see the section below on saving to local disk for information on saving more usable data formats from Entrez.
The Text Button display of FASTA format of Selected Documents

Also note that on every results display screen, document Accession numbers are always hotlinked. If selected, these hotlinks also display the graphical view. In the Structure and PopSet databases, these links display the Structure and PopSet summaries, not the graphical view. Select the hotlink to the graphical view for Accession Number AJ010487. From the Graphic Summary, in the left menu bar, link to the GenBank, FASTA, or Feature Table.
Display Graphic Summary of Individual Document (AJ010487)

On the Graphic Summary for AJ010487, click on the interval 1 - 297 to see the full graphical presentation, including all annotated features, of this sequence.
Display Full Graphical View of AJ010487

Finally, note the links to other databases to the far right of the results display. Select the PubMed link for Accession Number AJ002505.
Display PubMed Link of Individual Document

Details Button, Add To Clipboard, and Save

Details Button - Click the Details button to display your search strategy as translated using Entrez's search and syntax rules. The Details window also contains error messages, when applicable. Note that the Details window reports the database searched, the number of documents retrieved (with a hotlink to the documents) and your search statement as written (i.e., not translated by Entrez). Within the Details window, you can modify and resubmit your search strategy. Submit the modified search query by selecting the Search button.
Details Button

Adding to the Clipboard - Select documents 1, 3, and 5 from the results set by clicking on the check box adjacent to the document number. Then click the Add to Clipboard button. Note that 3 items were added to the Clipboard. You are also reminded that the Clipboard is limited to 500 items and that these three items will be lost after one hour of inactivity during a single search session. Also, please note that the document numbers for these items (i.e., documents 1, 3, and 5) are now shown in green to indicate that they are on the Clipboard. This feature is useful because as you continue to search, if these documents are retrieved through other search strategies their document numbers will appear in green to indicate that they are already on the Clipboard.
Adding to the Clipboard

Retrieving documents from the Clipboard - Select the Clipboard button. The items on the Clipboard are displayed in the default Summary format. Note the documents are renumbered, but the numbers are in green to indicate that the items are on the Clipboard. Also please note that you can display Clipboard items in all available formats and you can link to document neighbors or related items in other databases. Items are removed from the Clipboard by selecting the items using the checkbox and selecting the Remove from Clipboard button.
Retrieving Items from the Clipboard

Saving to a local disk - Select the Save button at the top (or bottom) of the results display screen next to the Text button. Documents can also be saved from the Clipboard in the same manner described here. Before clicking the Save button, decide two things: Which documents you want and in what format. After selecting your documents by clicking on the check boxes and choosing the format using the format pull-down menu, click the Save button. You are prompted to name the file to which the results are saved on your local drive. If you do not select specific documents, all documents in the results set are saved. In the example below, documents 2, 3, 4, 6, and 9 will be saved to disk in the FASTA format. If these documents were not selected, all 30 documents (i.e., the entire retrieved set) would be saved to disk in the FASTA format.
Saving Selected Documents to Local Disk

Note that when you need FASTA formatted data, saving to disk provides the most useable FASTA format available from Entrez. Shown below is one of the 5 documents saved to disk in FASTA format in the above example. This format is ready for use in other applications.
>gi|2582741|emb|AJ002508.1|HIVAJ2508 Human immunodeficiency virus type 1 protease gene (isolate B07 week 48)
CCTCARRTCACTCTTTGGCARCGACCCCTCGTCACAATAAAGATAGGGGGGCAACTAAAGGAAGCTCTAT
TAGATACAGRAGCAGATGATACAGTAKTAGAAGAMATRASTTTRCCAGGAAGRTGGAAACCAAAAATGAT
AGGGGGAATTGGAGGTTTTWTCMAAGTAAGACAGTATGATCAGATACTCRTAGAAATCTGTGGRCATAAA
GCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGMTTG
GTTGCACTYTAAATTTT
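
As an illustration of why the saved format is convenient for other applications, the record above can be read with a few lines of Python. This is a minimal sketch, assuming the saved file has one header line beginning with ">" followed by sequence lines; the function name `parse_fasta` and the variable `SAVED` are hypothetical, and the sketch does not interpret the individual header fields beyond splitting them at the "|" characters.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# The record shown above, as it would appear in the saved file.
SAVED = """\
>gi|2582741|emb|AJ002508.1|HIVAJ2508 Human immunodeficiency virus type 1 protease gene (isolate B07 week 48)
CCTCARRTCACTCTTTGGCARCGACCCCTCGTCACAATAAAGATAGGGGGGCAACTAAAGGAAGCTCTAT
TAGATACAGRAGCAGATGATACAGTAKTAGAAGAMATRASTTTRCCAGGAAGRTGGAAACCAAAAATGAT
AGGGGGAATTGGAGGTTTTWTCMAAGTAAGACAGTATGATCAGATACTCRTAGAAATCTGTGGRCATAAA
GCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGMTTG
GTTGCACTYTAAATTTT
"""

header, sequence = parse_fasta(SAVED)[0]
identifiers, description = header.split(" ", 1)
fields = identifiers.split("|")

print(fields)
print(description)
print(len(sequence))
```

The same function handles a file containing several saved documents, because each new ">" line starts a new record.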
Printing - Use the Print function of your Web browser. As with saving to local disk, before printing, decide two things: which documents you want to print and in what format. Because you are using the Web browser print function, you can only print documents that are displayed. Therefore, consider increasing the number of documents displayed per page so that all the documents you want to print are displayed on one page. Print hints: To save paper, consider using the Text or Save buttons before printing. Doing so eliminates everything except the actual data you need (e.g., it removes the Entrez search interface and menu bars). If you use the Text button, print from your web browser. If you use the Save button, print from another application on your machine.

Link Out

Linkout is a service that provides links from PubMed citations to full-text journal articles, biological data, sequence centers, etc. These other resources provide a URL, a resource name, and a brief description of their web site, which PubMed uses to create the links to their sites. User registration, a subscription fee, or some other type of fee may be required to access the full text of articles in some journals using this feature. Information for developers is available at: http://www.ncbi.nlm.nih.gov:80/entrez/query/static/linkoutoverview.html