Below we describe each step of the pipeline in some detail. All datasets generated by the individual stages of the processing pipeline are made available as downloads. Appendix 11 lists the available files for each dataset.

Users can download DRISEE values as a tab-separated file. The first line of the file contains headers for the values in the second line. The second line contains DRISEE percent error values for A substitutions (A_err), T substitutions (T_err), C substitutions (C_err), G substitutions (G_err), N substitutions (N_err), insertions and deletions (InDel_err), and the Total DRISEE Error. The third line contains headers for all remaining lines. Rows 4 and onward present the DRISEE counts for the indexed position across all considered bins of ADRs. Column values represent the number of reads that match an A, T, C, G, N, or InDel at the indicated position relative to the appropriate consensus sequence, followed by the number of reads that do not match an A, T, C, G, N, or InDel.
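
The layout described above can be read with a few lines of Python. This is a minimal sketch, not an official MG-RAST tool: the function name parse_drisee is hypothetical, the column names are taken from whatever the file's own header rows contain, and the example assumes the first three lines are exactly the two header rows and one summary row described in the text.

```python
import csv
import io


def _num(value):
    """Convert a field to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def parse_drisee(text):
    """Split DRISEE tab-separated text into (summary, per_position_counts).

    Assumed layout (from the description above, not verified against a file):
      line 1   headers for the summary values
      line 2   percent error values
      line 3   headers for the per-position rows
      line 4+  one row of counts per position
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if len(rows) < 3:
        raise ValueError("expected at least three non-empty lines")

    summary_header, summary_values, count_header = rows[0], rows[1], rows[2]
    # Pair each summary header with its value from the second line.
    summary = {name: _num(v) for name, v in zip(summary_header, summary_values)}
    # Every remaining row becomes a dict keyed by the third-line headers.
    counts = [
        {name: _num(v) for name, v in zip(count_header, row)}
        for row in rows[3:]
    ]
    return summary, counts
```

To use it, read the downloaded file into a string and pass it to parse_drisee; the first return value holds the percent error values and the second is a list with one dictionary per position.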

After the profiles have been downloaded, the analysis no longer depends on the MG-RAST server resources; it runs on the computer the browser is running on. This is achieved via the JavaScript functionality in your browser (please make sure it is enabled). Data is also stored in memory, which gives you a good reason to maximize the memory (RAM) of the machine you are running the analysis on.

The heatmap/dendrogram (Figure 5.22) allows an enormous amount of information to be presented in a visual form that is amenable to human interpretation. Dendrograms are trees that indicate similarities between annotation vectors. The MG-RAST heatmap/dendrogram has two dendrograms, one indicating the similarity/dissimilarity among metagenomic samples (x-axis dendrogram) and another indicating the similarity/dissimilarity among annotation categories (e.g., functional roles; the y-axis dendrogram). A distance metric is evaluated between every possible pair of sample abundance profiles. A clustering algorithm (e.g., Ward-based clustering) then produces the dendrogram trees. Each square in the heatmap/dendrogram represents the abundance level of a single category in a single sample. The values used to generate the heatmap/dendrogram figure can be downloaded as a table by clicking on the download button.
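
To make the clustering step concrete, the following sketch builds a dendrogram for a handful of samples with a naive Ward-style agglomeration. It is an illustration only and not the code MG-RAST runs: the function names, the use of squared Euclidean distance between centroids, and the nested-tuple tree representation are all assumptions made for this example.

```python
from itertools import combinations


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    na, nb = a["size"], b["size"]
    dist2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return na * nb / (na + nb) * dist2


def ward_cluster(profiles):
    """Return a nested-tuple tree for a dict of {sample name: abundance vector}."""
    if not profiles:
        raise ValueError("at least one profile is required")
    clusters = [
        {"tree": name, "size": 1, "centroid": list(vec)}
        for name, vec in profiles.items()
    ]
    while len(clusters) > 1:
        # Evaluate every pair of clusters and pick the cheapest merge.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]),
        )
        a, b = clusters[i], clusters[j]
        size = a["size"] + b["size"]
        # The merged centroid is the size-weighted mean of the two centroids.
        centroid = [
            (a["size"] * x + b["size"] * y) / size
            for x, y in zip(a["centroid"], b["centroid"])
        ]
        merged = {"tree": (a["tree"], b["tree"]), "size": size, "centroid": centroid}
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]["tree"]
```

Calling ward_cluster on sample profiles gives the tree for one axis; passing the transposed table (one vector per annotation category) gives the tree for the other axis. For real datasets a library routine would be a better choice than this quadratic-per-merge loop.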

MG-RAST versions 1 and 2 had this type of output, but MG-RAST v3 does not. MG-RAST version 3 has been optimized for large (Gbase+) datasets, and per-read annotation for large datasets is extremely bulky and difficult to interpret. The per-read annotations are not stored in a file on the server, but can be downloaded using the MG-RAST API.

Every completed MG-RAST dataset has a page where you can download the files produced by the different stages of the analysis; to reach it, click on the link on the metagenome overview page. Datasets that have been made public have links to an FTP site at the top of this download page, where you can access additional information.


