Introduction
Bioremediation requires the integration of huge amounts of data from
different sources: chemical structure and reactivity of organic
compounds; sequence, structure and function of proteins (enzymes);
comparative genomics; environmental biology; etc.
MetaRouter is a system for maintaining heterogeneous
information related to Biodegradation in a framework that allows its
administration and mining (application of methods for extracting new
data). It is an application intended for laboratories working in this
area which need to maintain public and private data, linked internally
and with external databases, and to extract new information from it.
This is the On-line Help. Take a look at the User Manual for more
detailed information on the system.
The user interacts with the system through a web interface. The main menu
is shown on the left (1).
Database queries
In this section, you can query the data contained in the database
using simple web-based forms or complex SQL sentences.
Compound queries
All the information on chemical compounds contained in the database
can be retrieved here.
At the top of the form, a list of all the compounds and their synonyms is shown. The user can
select one or more compounds from this list (2). Synonyms are indented in this list.
The rest of the form allows selection of compounds by part of their name,
by part of their SMILES code (3), by a
range of molecular weights or by a range of values of an associated property
(solubility, density, etc.; see Compound administration and (10)).
The SMILES code can be useful for selecting compounds with given
chemical characteristics or substructures. For example, C=O will retrieve the compounds
containing a carbonyl group; CCCCC will retrieve compounds with
5 or more linear saturated carbons. Such a search is not exhaustive, as the same
functional group or substructure can be described with more than one
SMILES string.
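The caveat about non-exhaustive results can be illustrated with a minimal sketch. It assumes that searching "by part of the SMILES code" is a plain substring match on the stored string; how MetaRouter actually performs the match is not described here. The compound table below is hypothetical and only serves the illustration.

```python
# Hypothetical compound table: name -> SMILES string as it might be stored.
COMPOUNDS = {
    "acetaldehyde": "CC=O",
    "acetone": "CC(=O)C",   # carbonyl written inside a branch
    "formaldehyde": "O=C",  # same group, written in the reverse order
    "pentane": "CCCCC",
    "hexane": "CCCCCC",
}

def search_by_smiles(fragment, compounds=COMPOUNDS):
    """Return the names whose SMILES string contains `fragment` (assumed substring match)."""
    return sorted(name for name, smiles in compounds.items() if fragment in smiles)

print(search_by_smiles("C=O"))    # ['acetaldehyde']
print(search_by_smiles("CCCCC"))  # ['hexane', 'pentane']
```

Under this assumption, acetone and formaldehyde are missed by "C=O" even though both contain a carbonyl group, because their strings spell it as "(=O)" and "O=C". Trying alternative spellings of the same substructure may recover such compounds.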
To retrieve the information for the selected compounds or those that
match the search criteria, press "Search".
For each compound, the following information is shown: name (and synonyms), SMILES code
(3), formula, image of the chemical
structure, canonical three-dimensional (3D) structure in PDB format
(4), molecular weight, list of properties and associated
values (10) and UMBBD code (5)
(this is an active link to the page for this compound in the UMBBD
database).
"Find degradative pathway" runs the
PathFinder
system for the compound.
The results are arranged into pages that can be accessed
through the links in:
xx compound(s) found. Page(s) 1 2 3 ...
For querying compounds from the database using SQL sentences, see
SQL queries.
Reaction queries
All the information on chemical reactions contained in the database
can be retrieved here.
It is possible to search for reactions by compound(s) acting as substrate(s),
by those acting as products, by enzymes implicated and by organisms
where these enzymes are present (see Enzyme
queries).
Use the "Add" and "Remove" buttons to fill the substrates and products lists
with the desired compounds. The "Add" button opens a dialog box where you can
select compounds directly from the full list or search by part of their names.
This search also covers the synonyms list.
If more than one substrate, product or enzyme is selected, they
can be combined with AND/OR (at the bottom of the lists).
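As a rough illustration of the AND/OR combination, the sketch below filters a hypothetical list of reactions by their substrates. The data structure and the function name are assumptions made for this example and do not reflect how MetaRouter stores or queries reactions.

```python
# Hypothetical reactions, each with its set of substrates.
REACTIONS = [
    {"id": "r1", "substrates": {"A", "B"}},
    {"id": "r2", "substrates": {"A"}},
    {"id": "r3", "substrates": {"C"}},
]

def search_by_substrates(reactions, selected, mode="AND"):
    """Return ids of reactions whose substrates contain all (AND) or any (OR) selected compounds."""
    selected = set(selected)
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")
    hits = []
    for reaction in reactions:
        if mode == "AND":
            ok = selected <= reaction["substrates"]      # every selected compound is a substrate
        else:
            ok = bool(selected & reaction["substrates"])  # at least one selected compound is a substrate
        if ok:
            hits.append(reaction["id"])
    return hits

print(search_by_substrates(REACTIONS, {"A", "B"}, "AND"))  # ['r1']
print(search_by_substrates(REACTIONS, {"A", "B"}, "OR"))   # ['r1', 'r2']
```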
By pressing the "Search" button, the reactions matching the
search criteria are shown. The results are also arranged into
pages as in the case of the "Compound Queries".
For each reaction, the chemical structures of substrates and
products, the name of the enzyme and the UMBBD code
(5) of the reaction are shown. All these
items are hyperlinked to the database information for compounds
(see Compound queries), to the database
information for enzymes (see Enzyme
queries) and to the UMBBD page for the reaction
respectively.
If Include link to "administration" is checked, a link is included for each
one of the reactions that goes directly to the administration page for that
reaction. See
Reaction administration.
For querying reactions from the database using SQL sentences, see
SQL queries.
Enzyme queries
It is possible to select the enzymes you are interested in
directly from the full list or to search by certain criteria: part of the
enzyme name, values in the four positions of the EC code, and
organisms where the enzyme is present.
For example, if you want to look for all the oxidoreductases
present in
Pseudomonas putida that are in the database,
enter a "1" as the first position of the EC code, select
Pseudomonas putida
in the list of organisms and press "Search".
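The search in this example can be pictured as a filter on EC code positions plus an organism. The following is only a sketch: the records are hypothetical, and the names `Enzyme` and `search_enzymes` are invented for the illustration rather than taken from MetaRouter.

```python
from typing import NamedTuple

class Enzyme(NamedTuple):
    name: str
    ec: tuple             # four EC positions; None stands for "-"
    organisms: frozenset  # organisms where the enzyme is present

# Hypothetical records, for illustration only.
ENZYMES = [
    Enzyme("enzyme A", (1, 14, 12, 11), frozenset({"Pseudomonas putida"})),
    Enzyme("enzyme B", (1, 1, None, None), frozenset({"Escherichia coli"})),
    Enzyme("enzyme C", (3, 8, 1, 5), frozenset({"Pseudomonas putida"})),
]

def search_enzymes(enzymes, ec_pattern=(None, None, None, None), organism=None):
    """Keep enzymes matching every filled EC position and, if given, the organism."""
    hits = []
    for enzyme in enzymes:
        ec_ok = all(want is None or want == have
                    for want, have in zip(ec_pattern, enzyme.ec))
        organism_ok = organism is None or organism in enzyme.organisms
        if ec_ok and organism_ok:
            hits.append(enzyme.name)
    return hits

# "1" in the first EC position (oxidoreductases) and Pseudomonas putida as organism.
print(search_enzymes(ENZYMES, ec_pattern=(1, None, None, None),
                     organism="Pseudomonas putida"))  # ['enzyme A']
```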
For all the enzymes matching the search criteria, the following
information is shown: enzyme name, UMBBD code (5), EC code and the organisms where
the corresponding gene is present. The UMBBD code is linked to
the information contained in UMBBD for that enzyme, the EC code
is linked to the corresponding entry in the ENZYME database (6)
and the name of each organism is
linked to the corresponding entry in the sequence database (sequence of that enzyme
in that organism) through the SRS system (7).
There is also a link ("Associated
reactions") to the list of reactions where that enzyme is
involved (see Reaction queries).
For querying enzymes from the database using SQL sentences, see
SQL queries.
SQL queries
It is possible to interrogate the MetaRouter database directly
using SQL (8)
sentences. This requires knowledge of database technology and SQL
syntax but allows you to carry out complex queries just by typing a few words. It is
intended for expert users.
For constructing these sentences you need to
know the data model of the database: name of the tables, relations, etc.
(see Database schema). It is not possible to modify
the database here since only select commands are allowed (see
SQL administration for modifying the database
directly via SQL). The sentences should end with a ";".
For example, this sentence will show all the enzymes in the database that
are substrate specific, that is, the fourth position of their EC code is a number (like
1.1.2.3 and not, for example, 1.2.-.-, which would represent a general enzymatic class):
select * from enzyme where ec_code4 is not null;
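To try this kind of query away from the server, the sketch below runs the same condition against a small in-memory SQLite table using Python's standard sqlite3 module. This is a stand-in only: MetaRouter uses PostgreSQL, the rows are hypothetical, and the columns ec_code1 to ec_code3 are assumed by analogy with ec_code4 (check the Database schema for the real column names).

```python
import sqlite3

# In-memory stand-in for the enzyme table; columns ec_code1..ec_code3 are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table enzyme (
        enzyme_id integer primary key,
        name      text,
        ec_code1  integer,
        ec_code2  integer,
        ec_code3  integer,
        ec_code4  integer
    )
""")
conn.executemany(
    "insert into enzyme values (?, ?, ?, ?, ?, ?)",
    [
        (1, "enzyme A", 1, 1, 2, 3),        # fully specified: 1.1.2.3
        (2, "enzyme B", 1, 2, None, None),  # general class: 1.2.-.-
    ],
)

# Same condition as in the example sentence: fourth EC position is filled in.
rows = conn.execute("select name from enzyme where ec_code4 is not null;").fetchall()
print(rows)  # [('enzyme A',)]
```

Here a "-" position is represented as NULL, which is what the `is not null` test in the example sentence suggests.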
Database administration
In this section, the user can modify the contents of the database
(delete, add or modify records)
using simple web-based forms or complex SQL sentences.
Since the actions here can modify the database, this section is
password-protected. See
SQL administration for information on how to change
the username/password.
Compound administration
Use this section to delete compounds, add new ones or modify the
information for a given compound.
- Modifying the information for a compound: Select the compound
you want to modify in "Compound(s)" by picking it (or any of its synonyms) from the full list or by
searching by part of the name in the boxes above. On pressing "View", the
information for this compound is shown and you can modify it (see "Compound
information" below).
Press "Update" at the bottom of the page to save the modifications in the database.
- Deleting compound(s): Select one or more compounds and press
"Delete" at the bottom of the page.
- Inserting a compound: Insert the information you have for the
compound (see "Compound
information" below) and press "Insert" at the bottom of the page.
Compound information. The only information required is the name; the
rest is optional.
- Name: Any string of characters and symbols, including spaces.
- Synonyms: One synonym per line. Any string of characters and symbols, including spaces.
- Minnesota code: The identifier of the compound in the
UMBBD (5)
database. cNNNN where N represents a digit (0-9).
- SMILES code: The SMILES
(3)
codification of the chemical structure. Any string of characters but
space.
- Chemical formula: For example C6H6Cl2O5.
- Molecular weight: A real value. For example 123.3
- Image: A graphic file (GIF, JPG, ...) containing the image
of the compound. Use the "Browse" button to search for the file.
It is better to use images with white or transparent background and smaller
than 300x300 pixels. You can get
the image for a given compound by:
- 3D structure: The three-dimensional structure of the compound
(coordinates of the atoms) in PDB format. You can browse the file
containing the structure or copy/paste it into the text box.
You can obtain the canonical 3D structure
from the SMILES (3) string with the CORINA system
(http://www2.chemie.uni-erlangen.de/software/corina/).
- Compound properties: Use the "Compound properties" button to assign
or change
values of existing properties for the current compound. To create new
properties, use "Properties administration" (see below).
To change or assign property values, type the values in the
corresponding boxes and press "Update". Property values must be real numbers
(10).
To remove property values, check
the corresponding check-boxes and press "Remove". Press "Done" when
finished.
- Properties administration: Use this section to create or modify
existing properties (not the values associated to the compounds, see
above).
To remove one or more properties check
the corresponding check-boxes and press "Remove". To change the name of a
property edit the corresponding text and press "Update". To insert a new
property type its name in the blank text-box and press "Insert". Press "Done" when
finished.
Reaction administration
Use this section to add, delete or modify reactions in the database.
- Modifying the information for a reaction: Choose the reaction
you want to modify and press "View". The substrates, products and enzyme
involved in the reaction are marked in the corresponding lists. Change
this information (selecting and un-selecting substrates, products and
enzyme) and press "Update".
You can also find the reaction you want to modify in
Reaction queries
and follow the Administration link.
- Deleting reaction(s): Choose the reaction or reactions you want
to remove from the list and press "Delete".
- Adding new reactions: Select the substrates, products and enzyme
involved (if it is known) from the corresponding lists and press "Insert".
- As in the other cases, use the "Add" and "Remove" buttons to fill the lists
of substrates and products with the desired compounds.
You can select an enzyme directly from the
main list or search by its name in the box above.
- Use "New compound" or "New enzyme" to insert compounds or enzymes in
the database if they are not there, before creating a reaction
involving them.
See
Compound administration and
Enzyme administration.
- To insert a reaction that goes from a
compound(s) to the standard metabolism (end-points in biodegradation
pathways), do not select any compound as product. For example, if you select A
as substrate and no products, you are inserting the reaction
A ---> InMet (InMet: intermediate metabolism, standard metabolism)
Enzyme administration
Use this section to add, delete or modify enzymes in the database.
- Modifying the information for an enzyme: Choose the enzyme from
the main list (or search for it by its name in the box above) and press
"View". Modify the information (see below) and press "Update".
- Deleting enzyme(s): Select one or more enzymes as described in
the previous point and press "Delete".
- Adding a new enzyme: Fill in the fields for which you have
information and press "Insert". Only "Enzyme name" is required; the
rest is optional.
Enzyme information
- Enzyme name: Any string of characters and symbols, including spaces.
- EC code: An integer (or "-" to indicate generic class) in the
four positions. You can obtain EC codes from the ENZYME database
(6).
- Minnesota code: The code in the
UMBBD database (5). eNNNN where N
represents a digit (0-9).
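The field formats listed above can be checked before submitting a form. The sketch below is one possible check and not part of MetaRouter. It assumes that "eNNNN" means exactly four digits and that EC positions are written separated by dots, as in 1.2.-.-; the function names are invented for the example.

```python
import re

# Assumed formats: "e" followed by exactly four digits; EC positions are integers or "-".
MINNESOTA_ENZYME_CODE = re.compile(r"e[0-9]{4}")
EC_POSITION = re.compile(r"[0-9]+|-")

def is_valid_minnesota_enzyme_code(code):
    """True if `code` looks like eNNNN."""
    return MINNESOTA_ENZYME_CODE.fullmatch(code) is not None

def is_valid_ec_code(code):
    """True if `code` has four dot-separated positions, each an integer or '-'."""
    parts = code.split(".")
    return len(parts) == 4 and all(EC_POSITION.fullmatch(p) for p in parts)

print(is_valid_ec_code("1.1.2.3"))               # True
print(is_valid_ec_code("1.2.-.-"))               # True
print(is_valid_ec_code("1.2.3"))                 # False
print(is_valid_minnesota_enzyme_code("e0123"))   # True
print(is_valid_minnesota_enzyme_code("c0123"))   # False
```

The same pattern with a leading "c" instead of "e" would cover the cNNNN compound codes described in Compound administration.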
Use the button "Organisms and DB entries" to link the selected enzyme to a set of
organisms (the organisms where this enzyme has been found) and to the corresponding sequence-database entries:
- To delete the link between the enzyme and the organism or organisms
selected in the list, press "Delete".
- To add a link between the enzyme, an organism and optionally a database
entry, select the organism in the list, type a sequence identifier in
"sequence DB entry code" and press "Insert". Sequence identifiers from
SWISSPROT, TREMBL and TREMBLNEW are accepted.
- To change the DB identifier select the organism, press "View", modify
the identifier and press "Update".
Press "Back" to return to the enzyme administration form.
SQL administration
It is possible to modify the MetaRouter database
using SQL (8)
sentences. This requires knowledge of database technology and SQL
syntax but allows you to carry out complex modifications just by typing a few words. It is
intended for expert users.
For constructing these sentences you need to
know the data model of the database: name of the tables, relations, etc.
(see Database schema). The sentences should end with a ";".
Only insert, update and delete commands are allowed here.
For example, the following sentence would delete all the organism
associations for the enzyme(s) which contain enzymeX in their names:
delete from belong_to where enzyme_id in (select enzyme_id from enzyme where name like '%enzymeX%');
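Because a delete of this kind cannot be undone from the form, it can help to rehearse it on a copy first. The sketch below replays the sentence against an in-memory SQLite database with Python's standard sqlite3 module, counting the affected rows before deleting them. It is a stand-in only: MetaRouter uses PostgreSQL, the rows are hypothetical, and the organism_id column of belong_to is an assumption (see the Database schema for the real columns).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal stand-in tables; organism_id is an assumed column name.
conn.executescript("""
    create table enzyme (enzyme_id integer primary key, name text);
    create table belong_to (enzyme_id integer, organism_id integer);
    insert into enzyme values (1, 'enzymeX alpha'), (2, 'other enzyme');
    insert into belong_to values (1, 10), (1, 11), (2, 10);
""")

subquery = "select enzyme_id from enzyme where name like '%enzymeX%'"

# Preview: how many associations would the delete remove?
to_delete = conn.execute(
    f"select count(*) from belong_to where enzyme_id in ({subquery});"
).fetchone()[0]
print(to_delete)  # 2

# The delete itself, committed when the "with" block ends without error.
with conn:
    deleted = conn.execute(
        f"delete from belong_to where enzyme_id in ({subquery});"
    ).rowcount
print(deleted)  # 2

remaining = conn.execute("select enzyme_id, organism_id from belong_to;").fetchall()
print(remaining)  # [(2, 10)]
```

Note that `like` is case-insensitive for ASCII text in SQLite by default but case-sensitive in PostgreSQL, so a rehearsal like this may match more names than the real database would.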
Changing the username/password for administration:
Use the following sentence in SQL administration to change the username
(to uuuuu) and password (to ppppp)
for database administration:
insert into admin_user values ('uuuuu','ppppp');
Applications
MetaRouter was designed as a framework that can host programs for mining
the data in the database described above. These programs are added to the
system as applications.
PathFinder
PathFinder is a system for locating biodegradative pathways for a set of compounds,
that is, pathways that go from those compounds to the standard metabolism. It
can also locate pathways between two sets of compounds.
This system uses the reactions included in the database for locating those
pathways.
In PathFinder, a "state" is a set of compounds. The system walks from one
state to another using the reactions. For example, if we have a state with the
compounds A, B and C and we apply the reaction B --> H we end up in the state
(A,H,C). If we apply to the same initial state (A,B,C) the reaction B --> C we end up in
the state (A,C). The final goal is to reach the state (InMet), that is, everything
goes to the standard metabolism.
(9)
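The state transitions in this example can be written down with plain sets. The sketch below is an illustration of the idea only, not PathFinder's implementation. The function name is invented, and treating "InMet" as an ordinary member of the state is an assumption made to keep the example short.

```python
INMET = "InMet"  # standard (intermediate) metabolism

def apply_reaction(state, substrates, products):
    """Return the state reached by applying a reaction, or None if it does not apply.

    A reaction applies when all its substrates are in the state. A reaction
    with no products is taken to lead to the standard metabolism (InMet).
    """
    state = frozenset(state)
    substrates = frozenset(substrates)
    if not substrates <= state:
        return None
    produced = frozenset(products) or frozenset({INMET})
    return (state - substrates) | produced

start = {"A", "B", "C"}
print(sorted(apply_reaction(start, {"B"}, {"H"})))  # ['A', 'C', 'H']
print(sorted(apply_reaction(start, {"B"}, {"C"})))  # ['A', 'C']
print(sorted(apply_reaction({"A"}, {"A"}, set())))  # ['InMet']
print(apply_reaction(start, {"Z"}, {"H"}))          # None
```

Because a state is a set, applying B --> C to (A,B,C) yields (A,C) rather than (A,C,C), which matches the example in the text.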
The PathFinder input form allows you to select the compound or compounds you
want to degrade (initial state) and optionally an additional set of final
compound(s). As in the other forms, you can select the compounds directly from the list or search
by part of their names. If you un-select the "to standard metabolism" checkbox, the system
will try to locate pathways from the initial set of compounds to the final set.
Once you have selected the compound(s) you want to degrade, press "Find pathway
...".
All the possible degradative pathways for the selected compound(s) are shown with the
corresponding connections between them, as a network of reactions.
By default,
only the images of the compounds and the reaction arrows are shown in the representation. You can select
which elements you want to represent (Image, Compound name, Formula, Molecular
weight, Smile Code, Minnesota Code, Enzyme and property values) by checking them in the "Show" box
and pressing "Redraw". For example, for large and confusing pathways you can switch off
the representation of images and switch on the representation of names.
Compounds and reaction arrows can be colored by a given compound property or
EC code. For that, select the coloring criteria in "Color compounds by" and
"reactions by" and press "Redraw". In this case you can switch the
representation of the color scale on and off with the "scale" checkbox.
You can represent all the pathways (default), only the shortest one,
the ones where the involved enzymes are present in a given organism(s) or the
ones where the involved compounds have the value of a property within a given
range. For that,
select the option you want in the "Restrict" box and press "Redraw".
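One common way to find a shortest pathway in a reaction network is a breadth-first search. The sketch below shows that technique on a hypothetical set of single-substrate reactions. How PathFinder itself selects the shortest pathway is not described here, so this is an illustration of the concept, not of the program.

```python
from collections import deque

INMET = "InMet"  # standard metabolism, the usual end-point

# Hypothetical reactions as (substrate, product) pairs.
REACTIONS = [("A", "B"), ("B", "C"), ("C", INMET), ("A", "D"), ("D", INMET)]

def shortest_pathway(start, reactions, goal=INMET):
    """Breadth-first search: return the compounds along a shortest pathway, or None."""
    successors = {}
    for substrate, product in reactions:
        successors.setdefault(substrate, []).append(product)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in successors.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_pathway("A", REACTIONS))  # ['A', 'D', 'InMet']
```

Restricting the pathways to a given organism could be sketched the same way, by filtering the reaction list before the search.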
In the representations, the compounds (image, name, etc) are hyperlinked to the
corresponding compound information pages in the database (see above), the reaction
arrows are linked to the reaction information pages and the enzyme names to the
enzyme information pages.
Press "New Run" to run PathFinder for another compound(s).
If you want to export the image containing the representation of the pathways, use the "Save image" button.
DO NOT use the "save image as..." feature of the web browser. You can add a title to the image.
Example:
Let's inspect the possible degradative pathways for toluene.
First select "Toluene" in the list of initial compounds. For that, you can type
"toluene" in the search box (which will fill the search list with all the compounds
containing "toluene" in their names) and then look for "Toluene" there. On pressing
"Find degradative pathway" you will see the degradative network
for toluene in a large representation. Move the scroll bars in your web browser to
navigate through the representation. If you switch off "image" and switch on "name"
and "enzyme" you get an easier representation with only the names of the compounds
and the enzymes involved. Go back to the original representation by switching on
"images" and switching off "names". Then select "shortest one" and press "Redraw". You see
that, despite the large number of possible pathways, the shortest degradative
pathway for toluene is composed of only 4 reactions. To see which pathways could be
carried out by Pseudomonas putida,
select "Show by"-"Organisms", select this bacterium in the list of
organisms and then press "Redraw".
Notes
(1) Using a web interface is a great advantage since the system can run on a
central machine and be used from any small computer with just a web
browser. However, it has the disadvantage that some of the features (frames,
fonts, etc.) depend on the browser and version used (MS-Explorer, Netscape,
...). If the interface looks odd, try adjusting the preferences of
your web browser (font type and size, etc.).
The server closes the connection after a period of user inactivity (idle time, about half an hour).
(2) The procedure for multiple selection on lists depends on the web browser
used. Try just clicking on more than one item, or combining with the [SHIFT] or
[CTRL] keys.
(3) SMILES is a system for coding chemical compounds as linear strings of
ASCII characters. It was developed by Daylight Chemical Information
Systems, Inc. (http://www.daylight.com/).
More information: http://www.daylight.com/smiles/f_smiles.html
(4) It is possible to configure the web browser to automatically open a
program for visualizing 3D structures (moving, rotating, etc.) when
clicking on a PDB file, in the same way as it opens MS-Word on clicking a
DOC file, for example.
For that you have to install such a program, e.g.
RasMol (http://www.umass.edu/microbio/rasmol/index2.htm) or
Chime (http://www.umass.edu/microbio/chime/) (both free), and then configure your web browser to use that
program when clicking on PDB files. Chime is easier to configure since
it can be installed as a plugin for Netscape or MS-Explorer.
(5) The University of Minnesota Biocatalysis/Biodegradation Database
(http://umbbd.ahc.umn.edu/) is
the largest resource of information about Biodegradation on the Internet.
(6) ENZYME is a repository of information on enzymes
(nomenclature, sequence, etc.) (http://www.expasy.ch/enzyme/).
Bairoch A. The ENZYME database in 2000. Nucleic Acids Res. 28:304-305 (2000).
(7) SRS is a system for indexing, connecting and querying Molecular Biology
databases (http://srs.ebi.ac.uk/).
Although the system belongs to Lion Bioscience (http://www.lionbioscience.com/),
they maintain a free academic version.
(8) SQL (Structured Query Language) was developed by IBM as a
standard language for interrogating relational databases. It is now
implemented in most commercial and free database systems with minor
differences. See http://www.sql.org/.
The variant used in MetaRouter is that implemented in
PostgreSQL (http://www.postgresql.org/).
(9) Working with "states" (sets of compounds) attempts to simulate
an environment with a set of pollutants where a given reaction, carried out by a given
bacterium, can modify one of the pollutants but not the others, which "moves" the
system to another "state" (another set of compounds) where another bacterium can act,
etc.
One could ask which enzymes are needed to end up in the state
InMet (all degraded), which bacteria have them, etc.
But don't worry too much about that: if you just start with a one-compound
state you will get the standard representations of reactions and pathways.
(10) Five properties are included in the original MetaRouter installation:
density, melting point (°C), boiling point (°C), water
solubility (mg/100mL) and evaporation rate. When only qualitative solubility
information was available, the following numerical values were assigned:
"insoluble": 0.0; "slightly soluble": 0.1; "soluble": 10.0 and "very
soluble": 100.0.
You can define new properties and assign their values for the compounds in
Compound administration.
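The qualitative-to-numerical convention in note (10) can be captured in a small lookup, which may be handy when preparing solubility values for new compounds. The sketch below uses the four values given in the note; the function name and the handling of other input are assumptions made for the example.

```python
# Numerical values assigned to qualitative solubility descriptions, as given in note (10).
QUALITATIVE_SOLUBILITY = {
    "insoluble": 0.0,
    "slightly soluble": 0.1,
    "soluble": 10.0,
    "very soluble": 100.0,
}

def solubility_value(raw):
    """Return a real number for a qualitative term or a numeric string (mg/100mL).

    Raises ValueError for anything else, since property values must be real numbers.
    """
    text = str(raw).strip().lower()
    if text in QUALITATIVE_SOLUBILITY:
        return QUALITATIVE_SOLUBILITY[text]
    return float(text)

print(solubility_value("Very soluble"))  # 100.0
print(solubility_value("insoluble"))     # 0.0
print(solubility_value("12.5"))          # 12.5
```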
Database schema
This is the data model implemented in the MetaRouter relational database. You need
it mainly for constructing SQL sentences in
SQL queries and
SQL administration.
NOTES: role in is_part_of represents the role of the
compound in the reaction where it is acting (1: substrate; 2:product).
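As an illustration of the role codes, the sketch below groups hypothetical is_part_of rows into substrates and products per reaction. Only the role column and its two values come from the note above; the other column names and the rows themselves are assumptions.

```python
ROLE_SUBSTRATE = 1
ROLE_PRODUCT = 2

# Hypothetical is_part_of rows as (reaction_id, compound, role).
ROWS = [
    (1, "A", ROLE_SUBSTRATE),
    (1, "B", ROLE_PRODUCT),
    (2, "B", ROLE_SUBSTRATE),  # no product row: the reaction goes to InMet
]

def group_by_reaction(rows):
    """Return {reaction_id: {"substrates": [...], "products": [...]}}."""
    reactions = {}
    for reaction_id, compound, role in rows:
        entry = reactions.setdefault(reaction_id, {"substrates": [], "products": []})
        if role == ROLE_SUBSTRATE:
            entry["substrates"].append(compound)
        elif role == ROLE_PRODUCT:
            entry["products"].append(compound)
        else:
            raise ValueError(f"unknown role: {role}")
    return reactions

print(group_by_reaction(ROWS))
# {1: {'substrates': ['A'], 'products': ['B']}, 2: {'substrates': ['B'], 'products': []}}
```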
Help & on-line support
- The Help section allows you to access the corresponding section of this document
in a context-sensitive way.
- Take a look at the User Manual for more detailed information.
- The On-line support option allows you to send questions or comments by
e-mail to the
MetaRouter team at ALMA Bioinformatics, S.L. Please indicate your full name,
address, e-mail and telephone number when sending questions by e-mail.
Acknowledgments
We acknowledge Dr. Victor de Lorenzo, the members of his lab and ALMA
Bioinformatics' staff for fruitful discussions.
The University of Minnesota Biocatalysis/Biodegradation Database (UMBBD)
(5) was the main public source of information used for the initial
filling of the database.
About
MetaRouter v1.1
David Guijas & Florencio Pazos
in collaboration with Dr. Victor de Lorenzo's lab (CNB)
ALMA Bioinformatics, S.L.
MetaRouter represents data derived, with permission, from the
University of Minnesota Biocatalysis/Biodegradation Database
(UM-BBD, http://umbbd.ahc.umn.edu/), obtained in May 2002.
ALMA Bioinformatics, S.L.
Centro Empresarial Euronova,
Ronda de Poniente, 4 - 2nd floor, Unit C-D
28760 Tres Cantos, Madrid, Spain
www.almabioinfo.com
alma@almabioinfo.com
Telephone: +34 91 141 71 50
Fax: +34 91 806 03 49