About ToppCluster

ToppCluster is a tool for performing multi-cluster gene functional enrichment analyses on large-scale data (microarray experiments with many time-points, cell-types, tissue-types, etc.). ToppCluster facilitates co-analysis of multiple gene lists and yields as output a rich functional map showing the shared and list-specific functional features. The output can be visualized in tabular, heatmap or network formats using built-in options as well as third-party software. ToppCluster computes functional enrichment with the hypergeometric test, using the gene list enrichment analysis option available in ToppGene (Chen, Xu et al. 2007).
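
As an illustration of the hypergeometric test mentioned above, the following minimal Python sketch computes an upper-tail enrichment P-value for one gene list against one annotation. It is not ToppCluster or ToppGene code; the function name, the background size and all numbers are assumptions made for the example.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """Return P(X >= overlap) for a hypergeometric draw.

    population: number of genes in the background (assumed value)
    annotated:  background genes that carry the annotation
    sample:     genes in the input cluster
    overlap:    cluster genes that carry the annotation
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20,000 background genes, 100 annotated,
# a 50-gene cluster, 5 of which carry the annotation.
p_value = hypergeom_enrichment_p(20000, 100, 50, 5)
print(f"enrichment P-value: {p_value:.3g}")
```

A small P-value here means the overlap is larger than expected if the cluster had been drawn at random from the background.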

ToppCluster input can be one of two types:

as separate clusters of genes, which can be added and labeled one after another
as a two-column list, with genes in the first column and gene cluster labels in the second column

Various parameters can be selected for the tests, such as the P-value cutoff and the multiple testing correction method. One or more annotation sources can be included in the results, with a choice of 17 different annotation types. Results can also be filtered by the minimum and maximum number of genes present in an annotation. Results can be obtained in tabular formats as comma-separated values, tab-separated values, HTML table or Microsoft Excel format. It is also possible to get the results in three visualization formats: a standard heatmap in a PDF file generated using R (R-Development-Core-Team 2007), TreeView (Eisen, Spellman et al. 1998; Saldanha 2004) heatmap format files, or Cytoscape (Shannon, Markiel et al. 2003) importable network formats.

Sections

1. The ToppCluster Interface

2. Using ToppCluster

3. HTML Table Output Results

4. Network Generator Output

5. TreeView Clustered Data Output

6. References

1. The ToppCluster Interface

The ToppCluster interface is user-friendly and provides easy options to perform a comparative enrichment analysis. View the sections and their associated numbers to learn more about the ToppCluster interface.

1. Select the gene identifier and choose a cluster label.

2. Enter your gene cluster.

3. Add another cluster.

4. Submit the gene clusters.

5. Alternate input method.

2. Using ToppCluster

1. Select what gene identifier is being used. Your choices include:

HGNC Symbol (Official).
HGNC Symbol and Synonyms.
Entrez ID.
Ensembl ID.
UniProt.

2. Paste your gene cluster list in the input list box.

3. If you need to add more gene clusters, click on the add cluster button.

4. Click next to submit the input gene clusters and proceed to the next stage.

5. Alternate entry method in the form of a two-column list, with genes in the first column and gene cluster labels in the second column. Columns can be separated by tab, comma, semi-colon or vertical bar.
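
The two-column format described in step 5 can be prepared programmatically. The sketch below, in Python, converts labeled clusters to a two-column list and back; the function names and cluster labels are hypothetical, and the parser simply splits on the first of the four separators listed above.

```python
import csv
import io
import re


def clusters_to_two_column(clusters, delimiter="\t"):
    """Write one 'gene<delimiter>cluster label' line per gene."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    for label, genes in clusters.items():
        for gene in genes:
            writer.writerow([gene, label])
    return buffer.getvalue()


def two_column_to_clusters(text):
    """Group genes by label, accepting tab, comma, semi-colon or vertical bar."""
    clusters = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parts = re.split(r"[\t,;|]", line, maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"expected two columns, got: {line!r}")
        gene, label = (part.strip() for part in parts)
        clusters.setdefault(label, []).append(gene)
    return clusters


# Hypothetical clusters labeled by time-point.
example = {"day1": ["TP53", "MYC"], "day2": ["BRCA1"]}
print(clusters_to_two_column(example), end="")
```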

Clicking the "Next" button on either input method takes you to the parameter selection screen.

6. Select the annotations and cutoffs.

7. Choose the desired output format.

8. Submit for analysis.

6. By default, all annotations are selected. Select the annotations you want included in the output and, for each one, the correction method, P-value cutoff and gene limits (the minimum and maximum number of genes allowed for an annotation). Your choices of correction methods include:

Bonferroni - Sets the significance cutoff to the P-value cutoff divided by the number of tests. For example, if the P-value cutoff is 0.05 and there are 100 tests, the significance cutoff would be set to 0.0005. The Bonferroni correction may be quite conservative, sometimes yielding a high false negative rate.
FDR - Controlling the False Discovery Rate (FDR), or the expected proportion of false positives among the significant results, is another frequently used approach. FDR correction is less stringent than Bonferroni; it may yield more false positives but far fewer false negatives.
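
To make the difference between the two corrections concrete, here is a minimal Python sketch. The Bonferroni function follows the description above. The FDR function uses the Benjamini-Hochberg step-up procedure, which is an assumption: this page does not say which FDR procedure ToppCluster applies. The P-values are made up for the example.

```python
def bonferroni_significant(pvalues, alpha=0.05):
    """Flag P-values at or below alpha divided by the number of tests."""
    if not pvalues:
        return []
    cutoff = alpha / len(pvalues)
    return [p <= cutoff for p in pvalues]


def benjamini_hochberg_significant(pvalues, alpha=0.05):
    """Flag P-values accepted by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose P-value is <= alpha * k / m.
    max_rank = 0
    for rank, index in enumerate(order, start=1):
        if pvalues[index] <= alpha * rank / m:
            max_rank = rank
    # Everything ranked at or below that rank is called significant.
    significant = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= max_rank:
            significant[index] = True
    return significant


# Made-up P-values for ten tests.
example_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print("Bonferroni:", sum(bonferroni_significant(example_pvalues)))
print("FDR (BH):  ", sum(benjamini_hochberg_significant(example_pvalues)))
```

With these ten values, the Bonferroni cutoff of 0.005 keeps one result while the Benjamini-Hochberg procedure keeps two, which illustrates why FDR is described as less stringent.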

7. There are two types of output formatting available: Interactive and Batch. In Interactive formatting, results are displayed on the screen or made available for immediate download once the system has processed your input. Batch formatting sends results to an email address of your choice and includes options not available in the standard Interactive format.

Interactive

Comma Separated Values - Output data in a CSV file using commas to separate columns.
Tab Separated Values - Output data in a file using tabs to separate columns.
HTML Table - Output data in HTML tables to be displayed in a web browser. See section on HTML Table Output Results.
Network Generator - Output data in an interactive HTML table where results can be selected and exported to a Cytoscape importable XGMML network file or a static PNG image.

Batch

Comma Separated Values - Output data in a CSV file using commas to separate columns.
Tab Separated Values - Output data in a file using tabs to separate columns.
Microsoft Excel Format - Output data to be opened in Excel spreadsheet format.
Clustered Data (Zipped) - Output data in TreeView importable format in a compressed zip file.
PDF Heatmap - Output data exported to a PDF file with a heatmap.

Note: When you select Batch format, the system will ask you for an email address. Once you enter the email address you want the results to be sent to, a confirmation message will be shown similar to the following:

The job has been started. The results will be sent to EMAIL_ADDRESS_YOU_ENTERED.

You will receive an email from bmi@cchmc.org.

Select the formatting you want to use for output.
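
Once a tab-separated results file has been downloaded, it can be post-processed with a few lines of Python. The sketch below groups rows by annotation category. It assumes the file has a header row containing a "Category" column, as the HTML table output does; the sample rows and the function name are made up for the example.

```python
import csv
import io
from collections import defaultdict


def rows_by_category(tsv_text, category_column="Category"):
    """Group result rows by the value of their category column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    grouped = defaultdict(list)
    for row in reader:
        grouped[row[category_column]].append(row)
    return dict(grouped)


# Made-up sample standing in for a downloaded results file.
sample_tsv = (
    "Category\tID\tTitle (or Source)\n"
    "GO: Biological Process\tGO:0006915\tapoptotic process\n"
    "GO: Biological Process\tGO:0008283\tcell population proliferation\n"
    "Pathway\tP0001\thypothetical pathway\n"
)

for category, rows in rows_by_category(sample_tsv).items():
    print(category, len(rows))
```

For a comma-separated download, the same approach applies with delimiter="," in place of the tab.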

8. Click Run.

3. HTML Table Output Results

The HTML Table Output Results show a table of multiple columns and rows such as Category, ID, Title (or Source), Verbose ID, and many other columns.

Back to Start
Each of the interactive HTML output pages contains a "Back to Start" link. Click it to return to your original ToppCluster screen.

Shareable Link
Each of the interactive HTML output pages contains a ‘shareable link’ or a ‘long term link’ to retrieve output directly at a later time or to share the output with a collaborator. The results associated with a link are stored for 30 days from the time of generation.

Click on "Shareable Link" to create a stored session. Then highlight and copy the link to share.

4. Network Generator Output

The initial Network Generator Output screen shows a table of multiple columns and rows such as Category, ID, Title (or Source), Verbose ID, and many other columns.

Navigate
The Navigation drop-down menu lists all the annotation types available in your Extended HTML Table. Select an annotation type and the page jumps to that category.

Links
The Links section provides two options:

Back to Start - Return to your original ToppCluster screen.
Shareable Link - Retrieve output directly at a later time or share the output with a collaborator. The results associated with a link are stored for 30 days from the time of generation.

Highlighting
The Highlighting section provides an option to highlight genes in the "Gene Set" column. If a p-value is checked, all genes associated with that p-value are highlighted on the entire results page. To highlight genes, click on the "Highlight genes" check box.

Notice the genes highlighted red in the image below.

Select All
The checkbox in the header row next to the "Title (or Source)" column allows you to select all the checkboxes on the page.

Network Generator
After selecting all or some of the results that you want included in the network output, click on "Next".

Network Generator Page
The Network Generator page allows you to select properties like the type of network, the layout algorithm and the file format.

Summary
Summary shows a count of the number of boxes you've checked in the previous screen. It is possible to go back in your browser and select more boxes.

Method
Two types of networks can be generated:

Gene Level - This is a complete network including the input gene list names, the enriched features and the corresponding genes.
Abstracted - This is an abstract view which excludes the genes from the network, retaining only the enriched features related to input gene list names via edges that are weighted by the significance score.
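
The difference between the two network types can be sketched as edge lists in Python. This is an illustration only: the exact topology ToppCluster produces is not specified here, so the sketch assumes that gene-level networks link each gene list to its genes and each gene to its enriched feature, and that the significance score is the negative base-10 logarithm of the P-value. The input tuples are made up.

```python
import math


def gene_level_edges(hits):
    """Edges gene list -> gene and gene -> enriched feature (assumed topology)."""
    edges = set()
    for cluster, feature, _p_value, genes in hits:
        for gene in genes:
            edges.add((cluster, gene))
            edges.add((gene, feature))
    return sorted(edges)


def abstracted_edges(hits):
    """Edges gene list -> feature, weighted by -log10(P) (assumed score)."""
    return [
        (cluster, feature, -math.log10(p_value))
        for cluster, feature, p_value, _genes in hits
    ]


# Made-up enrichment hits: (gene list name, feature, P-value, genes).
example_hits = [
    ("day1", "apoptotic process", 1e-5, ["TP53", "BAX"]),
    ("day2", "apoptotic process", 1e-3, ["TP53"]),
]

print(len(gene_level_edges(example_hits)), "gene-level edges")
print(len(abstracted_edges(example_hits)), "abstracted edges")
```

The abstracted form stays small however many genes each list contains, which is why it suits an overview of shared and list-specific features.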

Layout
Five layout options are available:

Kamada-Kawai - JUNG implementation of the Kamada-Kawai algorithm.
Fruchterman-Reingold - JUNG implementation of the Fruchterman-Reingold algorithm.
Spring - JUNG implementation of the Spring layout.
Circle - Lay all nodes in a circle (JUNG implementation).
Meyer's Self-Organizing - JUNG implementation of Meyer's "Self Organizing Map" layout.

File format
Three formats are available:

XGMML - XGMML is an XML based graph representation format compatible with Cytoscape.
PNG - Network in PNG image format.
Text - Network data in a simple text format.
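
For readers unfamiliar with XGMML, the Python sketch below builds a minimal XGMML document from a weighted edge list using only the standard library. It shows the general shape of the format (a graph element containing node and edge elements) and is not a reproduction of the files ToppCluster exports; the attribute name "weight" and the function name are assumptions.

```python
import xml.etree.ElementTree as ET

XGMML_NS = "http://www.cs.rpi.edu/XGMML"


def build_xgmml(label, edges):
    """Return a minimal XGMML string for (source, target, weight) edges."""
    ET.register_namespace("", XGMML_NS)
    graph = ET.Element(f"{{{XGMML_NS}}}graph", {"label": label, "directed": "0"})
    node_ids = {}
    # Emit each distinct node once, before any edge that refers to it.
    for source, target, _weight in edges:
        for name in (source, target):
            if name not in node_ids:
                node_ids[name] = str(len(node_ids) + 1)
                ET.SubElement(
                    graph,
                    f"{{{XGMML_NS}}}node",
                    {"id": node_ids[name], "label": name},
                )
    for source, target, weight in edges:
        edge = ET.SubElement(
            graph,
            f"{{{XGMML_NS}}}edge",
            {
                "source": node_ids[source],
                "target": node_ids[target],
                "label": f"{source} - {target}",
            },
        )
        # Store the edge weight as a typed attribute (hypothetical name).
        ET.SubElement(
            edge,
            f"{{{XGMML_NS}}}att",
            {"name": "weight", "type": "real", "value": str(weight)},
        )
    return ET.tostring(graph, encoding="unicode")


xml_text = build_xgmml(
    "example network",
    [("day1", "apoptotic process", 5.0), ("day2", "apoptotic process", 3.0)],
)
print(xml_text)
```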

Preview
Preview shows a smaller version of the PNG format in the "Preview" window below.

Using XGMML in Cytoscape

An example XGMML file can be downloaded here.

In Cytoscape, choose "File" > "Import" > "Network (multiple file types)".

In the "Import Network" dialog, choose "Select".

In the "Import Network Files" dialog, locate and choose the XGMML file; then, back in the "Import Network" dialog, choose "Import".

The network can be viewed and analyzed further in Cytoscape.

5. TreeView Clustered Data Output

TreeView format CDT files can be obtained by choosing the "Clustered Data (zipped)" option from the ToppCluster output formats. An email address needs to be provided; the results are emailed to this address.

Once computed and generated, a ZIP file containing the CDT file set is emailed to the email address provided. Additionally, an online Java TreeView link to the results is provided. This link is made available for 21 days.

An example CDT file set can be downloaded here.

Unzip the CDT files to a convenient location.

In Java TreeView, choose "File" > "Open".

In the "Open" dialog, locate the unzipped Clustered Data files, select the CDT file, and click "Open".

The clustered data tree can be viewed and analyzed further in TreeView.
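
The CDT file can also be inspected outside TreeView. The Python sketch below reads the first .cdt file found in a zip archive as tab-delimited text and returns the header row and the remaining rows. The function name and archive name are hypothetical, and the sketch makes no assumption about which columns or special rows the file contains.

```python
import csv
import io
import zipfile


def read_cdt_from_zip(zip_source):
    """Return (header, rows) from the first .cdt file in a zip archive.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        cdt_names = [n for n in archive.namelist() if n.lower().endswith(".cdt")]
        if not cdt_names:
            raise ValueError("no .cdt file found in archive")
        with archive.open(cdt_names[0]) as handle:
            text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
            rows = list(csv.reader(text, delimiter="\t"))
    if not rows:
        raise ValueError("the .cdt file is empty")
    return rows[0], rows[1:]


# Usage with a hypothetical archive name:
# header, rows = read_cdt_from_zip("toppcluster_results.zip")
# print(header)
# print(len(rows), "rows")
```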

6. References

Chen, J., H. Xu, et al. (2007). "Improved human disease candidate gene prioritization using mouse phenotype." BMC Bioinformatics 8: 392.

Eisen, M. B., P. T. Spellman, et al. (1998). "Cluster analysis and display of genome-wide expression patterns." Proc Natl Acad Sci U S A 95(25): 14863-8.

Saldanha, A. J. (2004). "Java Treeview--extensible visualization of microarray data." Bioinformatics 20(17): 3246-8.

R-Development-Core-Team (2007). "R: A language and environment for statistical computing." R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

Shannon, P., A. Markiel, et al. (2003). "Cytoscape: a software environment for integrated models of biomolecular interaction networks." Genome Res 13(11): 2498-504.