MetaRouter v1.0. Help. ALMA Bioinformatica

Introduction
Database queries
Database administration
Applications
- PathFinder
Notes
Database schema
Help & on-line support
Acknowledgments
About

Introduction

Bioremediation requires the integration of huge amounts of data from different sources: chemical structure and reactivity of organic compounds; sequence, structure and function of proteins (enzymes); comparative genomics; environmental biology, etc.

MetaRouter is a system for maintaining heterogeneous information related to Biodegradation in a framework that allows its administration and mining (application of methods for extracting new data). It is an application intended for laboratories working in this area which need to maintain public and private data, linked internally and with external databases, and to extract new information from it.

This is the On-line Help. Take a look at the User Manual for more detailed information on the system.

The user interacts with the system through a web interface. The main menu is shown on the left (1).

Database queries

In this section, you can query the data contained in the database using simple web-based forms or complex SQL sentences.

Compound queries

All the information on chemical compounds contained in the database can be retrieved here.

At the top of the form, a list with all the compounds and their synonyms is shown. The user can select one or more compounds from this list (2). Synonyms are indented in this list.
The rest of the form allows selection of compounds by part of their name, by part of their SMILES code (3), by a range of molecular weight or by a range of values of an associated property (solubility, density, etc.; see Compound administration and (10)). The SMILES code can be useful for selecting compounds with given chemical characteristics or substructures. For example, C=O will retrieve the compounds containing a carbonyl group; CCCCC will retrieve compounds with 5 or more linear saturated carbons. This search is not exhaustive, as the same functional group or substructure can be described with more than one SMILES string.
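
Why the SMILES search is not exhaustive can be seen in a short Python sketch. It assumes the search is a plain text match on the stored SMILES string (how the form actually performs the match is not described in this help), and the compound list is made up for illustration.

```python
# Hypothetical compound list: name -> stored SMILES string (illustrative only).
compounds = {
    "acetaldehyde": "CC=O",
    "acetaldehyde (alternative spelling)": "O=CC",
    "acetone": "CC(=O)C",
    "hexane": "CCCCCC",
    "toluene": "Cc1ccccc1",
}

def search_by_smiles(fragment, table):
    """Return the names whose SMILES string contains the given fragment."""
    return sorted(name for name, smiles in table.items() if fragment in smiles)

# "C=O" finds only one of the three carbonyl-containing entries.
carbonyl_hits = search_by_smiles("C=O", compounds)
# "CCCCC" finds the linear chain but not the aromatic ring of toluene.
chain_hits = search_by_smiles("CCCCC", compounds)
```

In this sketch the entries written as O=CC and CC(=O)C also contain a carbonyl group but are missed, so it can pay to try more than one way of writing the fragment.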

To retrieve the information for the selected compounds or those that match the search criteria, press "Search".
For each compound, the following information is shown: name (and synonyms), SMILES code (3), formula, image of the chemical structure, canonical three-dimensional (3D) structure in PDB format (4), molecular weight, list of properties and associated values (10) and UMBBD code (this is an active link to the page for this compound in the UMBBD database (5)).
"Find degradative pathway" runs the PathFinder system for the compound.

The results are arranged into pages that can be accessed through the links in:
xx compound(s) found. Page(s) 1 2 3 ...

For querying compounds from the database using SQL sentences, see SQL queries.

Reaction queries

All the information on chemical reactions contained in the database can be retrieved here.

It is possible to search for reactions by compound(s) acting as substrate(s), by those acting as products, by enzymes implicated and by organisms where these enzymes are present (see Enzyme queries). Use the "Add" and "Remove" buttons to fill the substrates and products lists with the desired compounds. The "Add" button opens a dialog box where you can select compounds directly from the full list or search by part of their names. This search is performed in the synonyms list too. If more than one substrate, product or enzyme is selected, they can be combined with AND/OR (at the bottom of the lists).
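
The difference between the AND and OR options can be sketched in Python as follows. This is only an illustration under assumed semantics (AND: the reaction involves all selected compounds; OR: at least one of them); the reaction identifiers and data are hypothetical.

```python
# Hypothetical reactions: id -> set of substrate names (illustrative only).
reactions = {
    "r1": {"toluene", "oxygen"},
    "r2": {"toluene"},
    "r3": {"benzoate", "oxygen"},
}

def find_reactions(selected, table, mode="AND"):
    """Return ids of reactions whose substrates include all (AND)
    or at least one (OR) of the selected compounds."""
    selected = set(selected)
    if mode == "AND":
        return sorted(rid for rid, subs in table.items() if selected <= subs)
    if mode == "OR":
        return sorted(rid for rid, subs in table.items() if selected & subs)
    raise ValueError("mode must be 'AND' or 'OR'")

and_hits = find_reactions(["toluene", "oxygen"], reactions, "AND")
or_hits = find_reactions(["toluene", "oxygen"], reactions, "OR")
```

With these data, AND returns only r1, while OR returns all three reactions.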

By pressing the "Search" button, the reactions matching the search criteria are shown. The results are also arranged into pages as in the case of the "Compound Queries".
For each reaction, the chemical structures of substrates and products, the name of the enzyme and the UMBBD code (5) of the reaction are shown. All these items are hyperlinked to the database information for compounds (see Compound queries), to the database information for enzymes (see Enzyme queries) and to the UMBBD page for the reaction respectively.

If Include link to "administration" is checked, a link is included for each one of the reactions that goes directly to the administration page for that reaction. See Reaction administration.

For querying reactions from the database using SQL sentences, see SQL queries.

Enzyme queries

It is possible to select the enzymes you are interested in directly from the full list or to search by certain criteria: part of the enzyme name, values in the 4 positions of the EC code, and organisms where the enzyme is present.
For example, if you want to look for all the oxidoreductases present in Pseudomonas putida that are in the database, enter a "1" as the first position of the EC code, select Pseudomonas putida in the list of organisms and press "Search".
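
The oxidoreductase example can be pictured with the following Python sketch. It assumes that EC positions left empty in the form match any value; the enzyme records are hypothetical and only serve to show the filtering.

```python
def matches_ec(ec_code, pattern):
    """Compare a four-position EC code with a search pattern.
    Positions set to None in the pattern match anything."""
    positions = ec_code.split(".")
    return all(want is None or want == have
               for want, have in zip(pattern, positions))

# Hypothetical records: (enzyme name, EC code, organisms).
enzymes = [
    ("enzyme A", "1.14.12.11", {"Pseudomonas putida"}),
    ("enzyme B", "3.1.1.45", {"Pseudomonas putida"}),
    ("enzyme C", "1.2.-.-", {"Escherichia coli"}),
]

# "1" in the first EC position, any value elsewhere, restricted to one organism.
hits = [name for name, ec, organisms in enzymes
        if matches_ec(ec, ("1", None, None, None))
        and "Pseudomonas putida" in organisms]
```

Only "enzyme A" satisfies both criteria: "enzyme B" is not in EC class 1 and "enzyme C" is not linked to Pseudomonas putida.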

For all the enzymes matching the search criteria the following information is shown: enzyme name, UMBBD (5) code, EC code and the organisms where the corresponding gene is present. The UMBBD code is linked to the information contained in UMBBD for that enzyme, the EC code is linked to the corresponding entry in the ENZYME (6) database and the name of each organism is linked to the corresponding entry in the sequence database (sequence of that enzyme in that organism) through the SRS system (7). There is also a link ("Associated reactions") to the list of reactions where that enzyme is involved (see Reaction queries).

For querying enzymes from the database using SQL sentences, see SQL queries.

SQL queries

It is possible to directly interrogate the MetaRouter database using SQL (8) sentences. This requires knowledge of database technology and SQL syntax but allows you to carry out complex queries just by typing a few words. It is intended for expert users.

For constructing these sentences you need to know the data model of the database: name of the tables, relations, etc. (see Database schema). It is not possible to modify the database here since only select commands are allowed (see SQL administration for modifying the database directly via SQL). The sentences should end with a ";".

For example, this sentence will show all the enzymes in the database that are substrate specific, that is, those whose 4th EC position is a number (like 1.1.2.3, and not, for example, 1.2.-.-, which would represent a general enzymatic class):

select * from enzyme where ec_code4 is not null;
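
To try the sentence outside MetaRouter, the following Python sketch runs it against a small in-memory table. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the table layout is an assumption: only the table name, name, enzyme_id and ec_code4 appear in this help, so the other columns are hypothetical (see Database schema for the real model).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified layout; ec_code1..ec_code3 are hypothetical column names.
conn.execute(
    "create table enzyme (enzyme_id integer, name text, "
    "ec_code1 integer, ec_code2 integer, ec_code3 integer, ec_code4 integer)"
)
conn.executemany(
    "insert into enzyme values (?, ?, ?, ?, ?, ?)",
    [
        (1, "specific enzyme", 1, 1, 2, 3),      # EC 1.1.2.3
        (2, "generic class", 1, 2, None, None),  # EC 1.2.-.-
    ],
)

# The sentence from the example above, unchanged.
rows = conn.execute("select * from enzyme where ec_code4 is not null;").fetchall()
names = [row[1] for row in rows]
```

Only the row with a value in the 4th EC position is returned; the generic class stored with a null value is skipped.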

Database administration

In this section, the user can modify the contents of the database (delete, add or modify records) using simple web-based forms or complex SQL sentences.

Since the actions here can modify the database, this section is protected by password. Take a look at SQL administration for information on how to change the username/password.

Compound administration

Use this section to delete compounds, add new ones or modify the information for a given compound.

Modifying the information for a compound: Select the compound you want to modify in "Compound(s)" by picking it (or any of its synonyms) from the full list or by searching by part of the name in the boxes above. On pressing "View", the information for this compound is shown and you can modify it (see "Compound information" below). Press "Update" at the bottom of the page to include the modifications in the database.
Deleting compound(s): Select one or more compounds and press "Delete" at the bottom of the page.
Inserting a compound: Insert the information you have for the compound (see "Compound information" below) and press "Insert" at the bottom of the page.

Compound information. The only information needed is the name; the rest is optional.

Name: Any string of characters and symbols, including spaces.
Synonyms: One synonym per line. Any string of characters and symbols, including spaces.
Minnesota code: The identifier of the compound in the UMBBD (5) database. cNNNN where N represents a digit (0-9).
SMILES code: The SMILES (3) codification of the chemical structure. Any string of characters without spaces.
Chemical formula: For example C6H6Cl2O5.
Molecular weight: A real value. For example 123.3.
Image: A graphic file (GIF, JPG, ...) containing the image of the compound. Use the "Browse" button to search for the file. It is better to use images with white or transparent background and smaller than 300x300 pixels. You can get the image for a given compound by:
- Drawing it yourself.
- Using a chemical drawing program.
- Retrieving it from public resources, like ChemFinder (http://chemfinder.cambridgesoft.com/).
- Using systems for generating images from the SMILES string (3). For example: http://www.daylight.com/daycgi/depict
3D structure: The three-dimensional structure of the compound (coordinates of the atoms) in PDB format. You can browse the file containing the structure or copy/paste it into the text box.
You can obtain the canonical 3D structure from the SMILES (3) string with the CORINA system (http://www2.chemie.uni-erlangen.de/software/corina/).
Compound properties: Use the button "Compound properties" to assign or change values of existing properties for the current compound. For creating new properties use "Properties administration" (see below).
For changing or assigning values for the properties type the values in the corresponding boxes and press "Update". Property values must be real numbers (10). For removing property values check the corresponding check-boxes and press "Remove". Press "Done" when finished.
Properties administration: Use this section to create or modify existing properties (not the values associated to the compounds, see above).
To remove one or more properties check the corresponding check-boxes and press "Remove". To change the name of a property edit the corresponding text and press "Update". To insert a new property type its name in the blank text-box and press "Insert". Press "Done" when finished.
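
The field rules listed under "Compound information" can be checked before filling in the form with a sketch like the one below. It is not part of MetaRouter: the function and field names are hypothetical, the code c0001 is a made-up example, and only the rules stated above are tested (name required, Minnesota code cNNNN, no spaces in the SMILES code, real-valued molecular weight).

```python
import re

def check_compound(fields):
    """Return a list of problems found in a compound record (empty if none)."""
    problems = []
    if not fields.get("name"):
        problems.append("name is required")
    code = fields.get("minnesota_code")
    if code and not re.fullmatch(r"c[0-9]{4}", code):
        problems.append("Minnesota code must look like cNNNN")
    smiles = fields.get("smiles")
    if smiles and " " in smiles:
        problems.append("SMILES code must not contain spaces")
    weight = fields.get("molecular_weight")
    if weight is not None:
        try:
            float(weight)
        except (TypeError, ValueError):
            problems.append("molecular weight must be a real value")
    return problems

ok = check_compound({"name": "Toluene", "minnesota_code": "c0001",
                     "smiles": "Cc1ccccc1", "molecular_weight": "92.14"})
bad = check_compound({"name": "", "minnesota_code": "x12", "smiles": "C C"})
```

The first record passes all checks; the second one fails on the name, the Minnesota code and the SMILES code.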

Reaction administration

Use this section to add, delete or modify reactions in the database.

Modifying the information for a reaction: Choose the reaction you want to modify and press "View". The substrates, products and enzyme involved in the reaction are marked in the corresponding lists. Change this information (selecting and un-selecting substrates, products and enzyme) and press "Update".
You can also find the reaction you want to modify in Reaction queries and follow the Administration link.
Deleting reaction(s): Choose the reaction or reactions you want to remove from the list and press "Delete".
Adding new reactions: Select the substrates, products and enzyme involved (if it is known) from the corresponding lists and press "Insert".
- As in the other cases, use the "Add" and "Remove" buttons to fill the lists of substrates and products with the desired compounds. You can select an enzyme directly from the main list or search by its name in the box above.
- Use "New compound" or "New enzyme" to insert compounds or enzymes in the database if they are not there, before creating a reaction involving them. See Compound administration and Enzyme administration.
- IMPORTANT: To insert a reaction that goes from a compound(s) to the standard metabolism (end-points in biodegradation pathways), do not select any compound as product. For example, if you select A as substrate and no products, you are inserting the reaction
  A ---> InMet (InMet: intermediate metabolism, standard metabolism)

Enzyme administration

Use this section to add, delete or modify enzymes in the database.

Modifying the information for an enzyme: Choose the enzyme from the main list (or search for it by its name in the box above) and press "View". Modify the information (see below) and press "Update".
Deleting enzyme(s): Select one or more enzymes as described in the previous point and press "Delete".
Adding a new enzyme: Fill in the fields for which you have information and press "Insert". Only "Enzyme name" is required; the rest is optional.

Enzyme information

Enzyme name: Any string of characters and symbols, including spaces.
EC code: An integer (or "-" to indicate generic class) in the four positions. You can obtain EC codes from the ENZYME database (6).
Minnesota code: The code in the UMBBD database (5). eNNNN where N represents a digit (0-9).

Use the button "Organisms and DB entries" to link the selected enzyme to a set of organisms (the organisms where this enzyme has been found) and to the corresponding sequence-database entries:

To delete the link between the enzyme and the organism or organisms selected in the list, press "Delete".
To add a link between the enzyme, an organism and optionally a database entry, select the organism in the list, type a sequence identifier in "sequence DB entry code" and press "Insert". Sequence identifiers from SWISSPROT, TREMBL and TREMBLNEW are accepted.
To change the DB identifier select the organism, press "View", modify the identifier and press "Update".

Press "Back" to return to the enzyme administration form.

SQL administration

It is possible to modify the MetaRouter database using SQL (8) sentences. This requires knowledge of database technology and SQL syntax but allows you to carry out complex modifications just by typing a few words. It is intended for expert users.

For constructing these sentences you need to know the data model of the database: name of the tables, relations, etc. (see Database schema). The sentences should end with a ";". Only insert, update and delete commands are allowed here.

For example, the following sentence would delete all the organism associations for the enzyme(s) which contain enzymeX in their names:

delete from belong_to where enzyme_id in (select enzyme_id from enzyme where name like '%enzymeX%');
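
The effect of this sentence can be rehearsed on throw-away data before running it on the real database. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; the organism_id column and the sample rows are assumptions made for the illustration. Note that like is case-insensitive for ASCII letters in SQLite by default, whereas it is case-sensitive in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table enzyme (enzyme_id integer, name text)")
# organism_id is a hypothetical column name for the organism link.
conn.execute("create table belong_to (enzyme_id integer, organism_id integer)")
conn.executemany("insert into enzyme values (?, ?)",
                 [(1, "enzymeX hydrolase"), (2, "other enzyme")])
conn.executemany("insert into belong_to values (?, ?)",
                 [(1, 10), (1, 11), (2, 10)])

# The sentence from the example above, unchanged.
conn.execute("delete from belong_to where enzyme_id in "
             "(select enzyme_id from enzyme where name like '%enzymeX%');")

remaining = conn.execute(
    "select enzyme_id, organism_id from belong_to").fetchall()
```

Both organism associations of the enzyme whose name contains enzymeX are removed; the association of the other enzyme is kept.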

Changing the username/password for administration:

Use the following sentence in SQL administration to change the username (to uuuuu) and password (to ppppp) for database administration:

insert into admin_user values ('uuuuu','ppppp');

Applications

MetaRouter was designed as a framework in which to include programs for mining the data of the database described above. This is done through applications that can be added to the system.

PathFinder

PathFinder is a system for locating biodegradative pathways for a set of compounds, that is, pathways that go from those compounds to the standard metabolism. It can also locate pathways between two sets of compounds.

This system uses the reactions included in the database for locating those pathways.
In PathFinder, a "state" is a set of compounds. The system walks from one state to another using the reactions. For example, if we have a state with the compounds A, B and C and we apply the reaction B --> H, we end up in the state (A,H,C). If we apply the reaction B --> C to the same initial state (A,B,C), we end up in the state (A,C). The final goal is to reach the state (InMet), that is, everything goes to the standard metabolism (9).
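
The two state transitions of this example can be reproduced with a few lines of Python. This is a minimal sketch of the idea of "states", not PathFinder's actual code; the function name is hypothetical, and a reaction without products is treated as going to InMet, as described in Reaction administration.

```python
IN_MET = "InMet"  # standard (intermediate) metabolism

def apply_reaction(state, substrates, products):
    """Apply a reaction to a state (a set of compounds).

    Returns the new state, or None if a substrate is not in the state.
    A reaction with no products leads to InMet.
    """
    substrates = frozenset(substrates)
    products = frozenset(products) or frozenset([IN_MET])
    if not substrates <= state:
        return None
    return (state - substrates) | products

start = frozenset({"A", "B", "C"})
after_b_to_h = apply_reaction(start, {"B"}, {"H"})  # state (A,H,C)
after_b_to_c = apply_reaction(start, {"B"}, {"C"})  # state (A,C)
```

Because a state is a set, producing a compound that is already present (C in the second case) does not duplicate it, which is why the state shrinks to (A,C).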

The PathFinder input form allows you to select the compound or compounds you want to degrade (initial state) and optionally an additional set of final compound(s). As elsewhere, you can select the compounds directly from the list or search by part of their names. If you un-select the "to standard metabolism" checkbox the system will try to locate pathways from the initial set of compounds to the final set.
Once you have selected the compound(s) you want to degrade, press "Find pathway ...".

All the possible degradative pathways for the selected compound(s) are shown with the corresponding connections between them, like a network of reactions.
By default, only the images of the compounds and the reaction arrows are shown in the representation. You can select which elements you want to represent (Image, Compound name, Formula, Molecular weight, Smile Code, Minnesota Code, Enzyme and property values) by checking them in the "Show" box and pressing "Redraw". For example, for large and confusing pathways you can switch off the representation of images and switch on the representation of names.
Compounds and reaction arrows can be colored by a given compound property or EC code. For that, select the coloring criteria in "Color compounds by" and "reactions by" and press "Redraw". In this case you can switch the representation of the color scale on and off with the "scale" checkbox.
You can represent all the pathways (default), only the shortest one, the ones where the involved enzymes are present in given organism(s) or the ones where the involved compounds have the value of a property within a given range. For that, select the option you want in the "Restrict" box and press "Redraw".
In the representations, the compounds (image, name, etc) are hyperlinked to the corresponding compound information pages in the database (see above), the reaction arrows are linked to the reaction information pages and the enzyme names to the enzyme information pages.
Press "New Run" to run PathFinder for other compound(s).
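
The "shortest one" restriction can be understood as a search for the smallest number of reactions leading from the initial state to (InMet). The sketch below uses a breadth-first search over states for that purpose; this is an assumption chosen for illustration, since the algorithm PathFinder really uses is not described in this help, and the reactions are hypothetical.

```python
from collections import deque

IN_MET = "InMet"  # standard (intermediate) metabolism

# Hypothetical reactions: (substrates, products); no products means "to InMet".
REACTIONS = [
    ({"A"}, {"B"}),
    ({"B"}, {"C"}),
    ({"C"}, set()),
    ({"A"}, {"C"}),
]

def shortest_pathway(initial, reactions, goal=frozenset([IN_MET])):
    """Breadth-first search over states.
    Returns the list of reactions of a shortest pathway, or None."""
    start = frozenset(initial)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for subs, prods in reactions:
            if not subs <= state:
                continue  # reaction not applicable in this state
            new_state = frozenset((state - subs) | (prods or {IN_MET}))
            if new_state not in seen:
                seen.add(new_state)
                step = (sorted(subs), sorted(prods))
                queue.append((new_state, path + [step]))
    return None

path = shortest_pathway({"A"}, REACTIONS)
```

With these data the pathway A --> C --> InMet (2 reactions) is returned instead of the longer A --> B --> C --> InMet.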

If you want to export the image containing the representation of the pathways, use the "Save image" button. DO NOT use the "save image as..." feature of the web browser. You can add a title to the image.

Example:
Let's inspect the possible degradative pathways for toluene.
First select "Toluene" in the list of initial compounds. For that, you can type "toluene" in the search box (which will fill the search list with all the compounds containing "toluene" in their names) and then look for "Toluene" there. On pressing "Find degradative pathway" you will see the degradative network for toluene in a large representation. Move the scroll bars in your web browser to navigate through the representation. If you switch off "image" and switch on "name" and "enzyme" you get an easier representation with only the names of the compounds and the enzymes involved. Go back to the original representation by switching on "images" and switching off "names". Then select "shortest one" and press "Redraw". You see that, despite the large number of possible pathways, the shortest degradative pathway for toluene is composed of only 4 reactions. To see which pathways could be carried out by Pseudomonas putida select "Show by"-"Organisms", select this bacterium in the list of organisms and then press "Redraw".

Notes

(1) Using a web interface is a great advantage since the system can run on a central machine and be used from any small computer with just a web browser. However, it has the disadvantage that some of the features (frames, fonts, etc.) depend on the browser and version used (MS-Explorer, Netscape, ...). If the interface looks odd, try adjusting the preferences in your web browser (font type and size, etc.).
The server closes the connection after some time of user inactivity (idle time, ~1/2 hour).

(2) The procedure for multiple selection on lists depends on the web browser used. Try just clicking on more than one item, or combining with the [SHIFT] or [CTRL] keys.

(3) SMILES is a system for coding chemical compounds as linear strings of ASCII characters. It was developed by Daylight Chemical Information Systems, Inc. (http://www.daylight.com/).
More information: http://www.daylight.com/smiles/f_smiles.html

(4) It is possible to configure the web browser for automatically opening a program for visualizing 3D structures (moving, rotating, etc.) when clicking on a PDB file, in the same way as it opens MS-Word on clicking a DOC file, for example. For that you have to install any such program, e.g. RasMol (http://www.umass.edu/microbio/rasmol/index2.htm) or Chime (http://www.umass.edu/microbio/chime/) (both free), and then configure your web browser to use that program when clicking on PDB files. Chime is easier to configure since it can be installed as a plugin for Netscape or MS-Explorer.

(5) The University of Minnesota Biocatalysis/Biodegradation Database (http://umbbd.ahc.umn.edu/) is the largest resource of information about Biodegradation on the Internet.

(6) ENZYME is a repository of information on enzymes (nomenclature, sequence, etc.) (http://www.expasy.ch/enzyme/).
Bairoch A. The ENZYME database in 2000. Nucleic Acids Res. 28:304-305(2000).

(7) SRS is a system for indexing, connecting and querying Molecular Biology databases (http://srs.ebi.ac.uk/). Although the system belongs to Lion Bioscience (http://www.lionbioscience.com/) they maintain a free academic version.

(8) SQL (Structured Query Language) was developed by IBM as a standard language for interrogating relational databases. It is now implemented in most commercial and free database systems with minor differences. See http://www.sql.org/.
The variant used in MetaRouter is that implemented in PostgreSQL (http://www.postgresql.org/).

(9) Working with "states" (sets of compounds) attempts to simulate an environment with a set of pollutants where a given reaction, carried out by a given bacterium, can modify one of the pollutants but not the others. This "moves" the system to another "state" (another set of compounds) where another bacterium can act, etc. One could ask which enzymes are needed to end up in the state InMet (all degraded), which bacteria have them, etc.
But don't worry too much about that: if you just start with a one-compound state you will get the standard representations of reactions and pathways.

(10) Five properties are included in the original MetaRouter installation: density, melting point (°C), boiling point (°C), water solubility (mg/100mL) and evaporation rate. When only qualitative solubility information was available, the following numerical values were assigned: "insoluble": 0.0; "slightly soluble": 0.1; "soluble": 10.0 and "very soluble": 100.0. You can define new properties and assign their values for the compounds in Compound administration.
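
If you prepare your own solubility data and want to follow the same convention, the mapping of note (10) can be written down as in this sketch. The four numerical values are the ones given above; the function name and the sample entries are hypothetical.

```python
# Numerical values assigned to qualitative water-solubility labels (note 10).
QUALITATIVE_SOLUBILITY = {
    "insoluble": 0.0,
    "slightly soluble": 0.1,
    "soluble": 10.0,
    "very soluble": 100.0,
}

def solubility_value(raw):
    """Convert a solubility entry to a real number.
    Qualitative labels use the table above; anything else must be numeric."""
    text = raw.strip().lower()
    if text in QUALITATIVE_SOLUBILITY:
        return QUALITATIVE_SOLUBILITY[text]
    return float(text)

values = [solubility_value(v) for v in ["Insoluble", "52.6", "very soluble"]]
```

Keep in mind that the values obtained from labels are conventions for sorting and coloring, not measurements.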

Database schema

This is the data model implemented in the MetaRouter relational database. You need it mainly for constructing SQL sentences in SQL queries and SQL administration.

NOTES: role in is_part_of represents the role of the compound in the reaction where it is acting (1: substrate; 2: product).

Help & on-line support

The Help section allows you to access the corresponding section of this document in a context-sensitive way.
Take a look at the User Manual for more detailed information.
The On-line support option allows you to send questions or comments by e-mail to the MetaRouter team at ALMA Bioinformatics, S.L. Please indicate your full name, address, email and telephone when sending questions by e-mail.
On-line support

Acknowledgments

We acknowledge Dr. Victor de Lorenzo, the members of his lab and ALMA Bioinformatics' staff for fruitful discussions.

The University of Minnesota Biocatalysis/Biodegradation Database (UMBBD) (5) was the main public source of information used for the initial filling of the database.

About

MetaRouter v1.1

David Guijas & Florencio Pazos
in collaboration with Dr. Victor de Lorenzo's lab (CNB)

ALMA Bioinformatics, S.L.

MetaRouter represents data derived, with permission, from the University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD, http://umbbd.ahc.umn.edu/), obtained in May 2002.

ALMA Bioinformatics, S.L.
Centro Empresarial Euronova,
Ronda de Poniente, 4 - 2nd floor, Unit C-D
28760 Tres Cantos, Madrid, Spain
www.almabioinfo.com
alma@almabioinfo.com
Telephone: +34 91 141 71 50
Fax: +34 91 806 03 49

MetaRouter v1.1 On-line Help