Tutorial: Metannogen Data Management

This advanced tutorial demonstrates how metabolic networks are generated from the Metannogen datasets and exported in text mode. In text mode the graphical user interface is not involved, which allows integration into script pipelines.

If you want to annotate an existing SBML file with Metannogen, you need another tutorial: Annotate SBML files.

If you just want to export a plain SBML file without additional information, you do not need this tutorial. Just use Menu-bar>File>Export ... .

This tutorial explains how two types of information, the manually curated datasets and additional text files, are merged to generate the export file. It does not explain how an existing SBML file is annotated with Metannogen annotations.
This tutorial explains SBML output. However, since the command line option -toSBML can be replaced by another export option, the tutorial applies to any output format.
SBML is a standard format for metabolic networks, and the exported SBML file is probably sufficient for your needs. For self-defined data formats, however, you will need to change the export plugin. This is described in the tutorial Customize the export.

The tutorial requires basic knowledge of the command line, in particular of the command echo.

Background

The Metannogen datasets hold the expert knowledge entered by the data curators through Metannogen. Usually, a dataset encodes one or several metabolic reactions and contains the evidence, in the form of citations and database references, that a certain reaction exists in the described biological system.

Though the datasets are stored as flat text lines, Metannogen is internally object oriented. The object model is created from the datasets and from optional files given as command line arguments. It contains reaction and metabolite objects. These two object types are exposed by the API and can be referred to by user-defined scripts. This API is also used to generate the output files in different formats and allows users to customize the output to their specific needs.

There exist two SBML exporters. One creates SBML directly from the reaction and species objects. The other first creates libSBML objects to use the export function of libSBML.

Metannogen provides only a few specific fields: dataset identifier, equation of the metabolic reaction, name of the curator, compartment and EC class. To enter structured information for which no specific field exists in Metannogen, variable declarations inside the multi-line text field are used. The variable declarations follow a defined syntax (the attribute files shown later in this tutorial use declarations of the form $name="value"). The information is assigned to Reaction objects, where it can be retrieved with the getAttribute("Name of variable") method. This makes the information available for the export of the network.

An increasing amount of information used for network simulations may be automatically computed or extracted from existing data sources. It is not recommended to store automatically generated information together with manually curated information. Instead, it is recommended to keep automatically generated information in separate files. Metannogen is able to merge the information from text files with that from datasets to generate the final model.

Preparation

All command lines in this tutorial can be executed directly in the terminal. This requires a compatible command line interpreter, as included in modern computer systems such as Macintosh and Linux. MS-Windows users need to install one.
The first command, "mkdir", creates the directory "~/metannogenTutorial/" in the home directory. The wget lines download Metannogen and the example dataset file. On Macintosh the command for downloading files is not wget but curl; please substitute "wget -N" by "curl -O".
mkdir ~/metannogenTutorial
cd ~/metannogenTutorial
wget -N http://www.bioinformatics.org/strap/metannogen/metannogen.jar
wget -N http://www.bioinformatics.org/strap/metannogen/tutorialData/myDatasets.datasets
    

Starting the graphical user interface

The file "myDatasets.datasets" contains a few datasets which can be viewed and edited with Metannogen. Start Metannogen by typing the following command. Be prepared for large amounts of data to be loaded when Metannogen is run for the first time.
java -Xmx200M -jar metannogen.jar  -networks KEGG 
    
A form for selecting the dataset source appears. When the program is started for the first time, the KEGG metabolic network is downloaded (60 megabytes), as specified by the command line option "-networks KEGG". There are two radio buttons specifying whether datasets should be loaded from a file or from a URL. Select "File" and enter the file path:
~/metannogenTutorial/myDatasets.datasets
For your convenience, this text field offers file path completion with the Tab key. When Metannogen is running, the KEGG database is displayed in the left panel. Expand the Glycolysis tree node for a list of all reactions in the pathway Glycolysis. Those with a traffic light have a corresponding dataset. These datasets can be opened by clicking the tree nodes. Datasets can also be accessed from the tab "Datasets". Now quit the program. At this point modifications can be saved to the file "myDatasets.datasets".

This tutorial is about using the program in batch mode. From now on we will not use the graphical features any more.

SBML Output

To shorten the command lines, it is recommended to define an alias for their constant part:
alias RunMetannogen="java -Xmx100M -jar metannogen.jar -stdout  -networks KEGG  -datasets myDatasets.datasets " 
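Since the purpose of text mode is integration into script pipelines, note that bash does not expand aliases in non-interactive scripts by default. A shell function is one possible substitute. The following is a minimal sketch, not part of Metannogen: the function name run_metannogen and the override variable JAVA are hypothetical names chosen for this illustration.

```shell
# Sketch of a function equivalent to the RunMetannogen alias.
# JAVA is a hypothetical override variable (not a Metannogen feature);
# when it is unset, the "java" executable found on the PATH is used.
run_metannogen() {
    "${JAVA:-java}" -Xmx100M -jar metannogen.jar -stdout \
        -networks KEGG -datasets myDatasets.datasets "$@"
}

# Usage inside a script: run_metannogen -toSBML output.sbml
```

Extra arguments are passed through unchanged via "$@", so the function can be used wherever this tutorial writes RunMetannogen.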
    
Now use the -toSBML option to create the SBML output.
RunMetannogen -toSBML output.sbml 
    
This line produces the SBML file "output.sbml" from the dataset file "myDatasets.datasets". Please look at it with a text viewer to get an understanding of the XML structure.
less output.sbml
    
or
more output.sbml
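For a quick sanity check in a script, the number of reactions in the export can be counted instead of reading the file. This is a rough sketch under the assumption that the exporter writes each opening reaction element on its own line; count_reactions is a hypothetical helper name.

```shell
# count_reactions FILE
# Count the lines containing an opening <reaction ...> tag in an SBML file.
# grep -c counts matching lines, not matches, so the result is only exact
# if no line holds more than one reaction element (an assumption).
count_reactions() {
    grep -c '<reaction[ >]' "$1"
}

# Usage: count_reactions output.sbml
```

The pattern requires a space or ">" after the tag name, so closing tags and the listOfReactions element are not counted.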
    

Additional XML attributes

In the comment field of dataset R00235 the attribute "myAttribute" is defined. Please view the myDatasets.datasets file and locate this variable declaration. In the SBML specification, "myAttribute" does not denote a specific attribute. To include this non-standard attribute in the output, the option "-useReactionAttributes" is used. It takes a space-separated list of attribute names.
RunMetannogen  -toSBML output.sbml  -useReactionAttributes myAttribute
fgrep myAttribute output.sbml
    
As a result, the attribute myAttribute defined in the dataset R00235 is included in the two localized reactions of this dataset: nucleus and mitochondrial matrix. Attributes can also be defined in separate files. This allows computed data to be kept separate from manually curated data.
echo -e 'R00235\t $anotherAttribute="one more" $yetAnother="another"' > attributes.txt
RunMetannogen  -toSBML output.sbml  -useReactionAttributes myAttribute anotherAttribute yetAnother -attributesOverride attributes.txt
fgrep anotherAttribute output.sbml
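The behaviour of "echo -e" differs between shells; some print the "-e" literally instead of treating it as an option. If the attribute file is written from a script, printf is a more portable way to produce the tab character. The following sketch writes the same line as the echo command above.

```shell
# Write the attribute line with printf: \t becomes a tab, \n ends the line.
# The single quotes keep the shell from expanding $anotherAttribute and
# $yetAnother, so they reach the file as literal text.
printf 'R00235\t $anotherAttribute="one more" $yetAnother="another"\n' > attributes.txt
```

The reaction identifier and the declarations are separated by a tab, which can be checked with "cut -f1 attributes.txt".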
    
It is also possible to confine attributes to one compartment. Example with mitochondrial matrix:
echo -e 'R00235\t $anotherAttribute@mitoMx="one more" $yetAnother="another"' > attributes.txt
RunMetannogen  -toSBML output.sbml  -useReactionAttributes  myAttribute anotherAttribute yetAnother -attributesOverride attributes.txt
fgrep anotherAttribute output.sbml
    
To reduce typing, it is possible to specify more than one compartment suffix. The following three declarations are equivalent. Obviously, the last variant is the shortest:
  1. echo -e 'R00235\t $myAttribute@nuc="hello world" '> attributes.txt
    echo -e 'R00235\t $myAttribute@mitoMx="hello world"'>> attributes.txt
    						
  2. echo -e 'R00235\t $myAttribute@nuc="hello world"  $myAttribute@mitoMx="hello world"'> attributes.txt
    						
  3. echo -e 'R00235\t $myAttribute@nuc,mitoMx="hello world"'> attributes.txt
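When attribute values are computed by a script rather than typed by hand, the lines can be generated by a small helper. This is a minimal sketch that only reproduces the line format shown in the examples above; attr_line is a hypothetical helper name and not part of Metannogen.

```shell
# attr_line ID NAME COMPARTMENTS VALUE
# Print one line in the attribute file format used above:
#   ID<TAB> $NAME@COMPARTMENTS="VALUE"
# COMPARTMENTS is a comma-separated list such as "nuc,mitoMx".
attr_line() {
    printf '%s\t $%s@%s="%s"\n' "$1" "$2" "$3" "$4"
}

# Reproduces the third variant of the list above.
attr_line R00235 myAttribute nuc,mitoMx "hello world" > attributes.txt
```

Further lines can be appended with ">>" in a loop over the computed values.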
    						
The attribute name "$HL" is reserved: it contains space-separated strings to be highlighted in publication abstracts.

Substitutions of Metabolites

Under certain circumstances some metabolites with distinct identifiers might be considered identical. This is when metabolite dictionaries come into play. Consider D-glucose, beta-D-glucose and alpha-D-glucose, which have the KEGG identifiers C00031, C00221 and C00267, respectively. These three forms are interconverted spontaneously by a process called mutarotation. In a stoichiometric network they might be considered as one pool. This is achieved by representing all forms by only one identifier, here C00031. The following creates the appropriate hash map (synonyms: hash table, dictionary).
echo C00221  C00031 >  dictGlucose.txt
echo C00267  C00031 >> dictGlucose.txt
    
Or, if you prefer a compact form, the following is equivalent:
echo C00221 C00267  C00031 > dictGlucose.txt
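To see how the compact form relates to the two-line form, a dictionary file can be expanded into one pair per line. This sketch assumes, as the two equivalent examples above suggest, that the last identifier on each line is the replacement and all preceding identifiers are the ones to be replaced; expand_dict is a hypothetical helper name.

```shell
# expand_dict FILE
# Print one "old new" pair per line.
# Assumption: in each line the last field is the replacement identifier
# and every preceding field is an identifier to be replaced.
expand_dict() {
    awk '{ for (i = 1; i < NF; i++) print $i, $NF }' "$1"
}

# Usage: expand_dict dictGlucose.txt
```

Applied to the compact dictGlucose.txt, this prints the two pairs "C00221 C00031" and "C00267 C00031", one per line.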
    
This hash table is loaded with the option -dictionaryOfSpecies:
RunMetannogen  -toSBML output.sbml  -dictionaryOfSpecies dictGlucose.txt
fgrep "speciesType id" output.sbml
    
Inspect the list of species in the output.sbml file. It should contain neither C00221 nor C00267, only C00031.
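In a script pipeline this check can be automated. The following sketch searches the exported file for identifiers that should have been substituted; absent_from is a hypothetical helper name. It uses a plain substring search, which is adequate here because KEGG compound identifiers have a fixed length.

```shell
# absent_from FILE ID...
# Succeed (exit status 0) only if none of the given identifiers occurs in FILE.
absent_from() {
    file=$1
    shift
    for id in "$@"; do
        # -F: fixed string, -q: no output, only the exit status is used
        if grep -q -F -- "$id" "$file"; then
            echo "still present: $id" >&2
            return 1
        fi
    done
    return 0
}

# Usage: absent_from output.sbml C00221 C00267 && echo "substitution applied"
```

The same helper can be used after a compartment substitution by passing the replaced compartment names instead of metabolite identifiers, provided these names do not also occur as substrings elsewhere in the file.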

Substitutions of Compartments

Imagine the nucleus and the cytosol should be treated as one compartment in the simulation. We give the union of both compartments the name "cytoOrNuc":
echo cyto cytoOrNuc  > dictCompart.txt
echo  nuc cytoOrNuc >> dictCompart.txt
    
Again, this can be contracted to:
echo cyto nuc cytoOrNuc > dictCompart.txt
    
This dictionary is loaded with the option -dictionaryOfCompartments:
RunMetannogen  -toSBML output.sbml  -dictionaryOfCompartments dictCompart.txt
fgrep cytoOrNuc output.sbml
    

Customization of the SBML-output

The SBML format is under continuous development to include novel data types required by the bioinformatics community. Metannogen offers the possibility to adapt its export functions to special needs. Adapting the output format at source code level is relatively easy and is discussed in the tutorial Customize Export.