GeneSpring Data Analysis
Frequently Asked Questions (FAQ)
- What types of data can I use with GeneSpring?
- How do I load data?
- How can I filter my dataset on fold change?
- How do I compare two Gene lists?
- How can I export a Gene list?
- How can I merge two Gene lists?
- How can I create a New Experiment from existing samples?
- How do I perform a one-way ANOVA statistical comparison?
- How do I perform Clustering operations?
- How do I import KEGG pathways in GeneSpring?
- How do I view data in a box plot?
- How do I filter genes on a volcano plot?
- How do I view data in MA plot (for two-color arrays)?
- How do I use GeneSpring-R plug-ins?
Note: Agilent has created a series of animated step-by-step tutorials that cover some of the same topics as this FAQ. If you are a new GeneSpring user, these tutorials may be very useful, as each goes into much more detail than this document can. These tutorials can be found on Agilent's website.
What types of data can I use with GeneSpring?
GeneSpring accepts data in the form of tab-delimited text files. It will automatically recognize and load the following types of data:
- Axon GenePix Pro 2 & 3
- Clontech Atlas Image 1-Color
- Clontech Atlas Image 2-Color
- Packard Biochip ScanArray/QuantArray
- Incyte GEMTools 2.4
- BioDiscovery Imagene 4
GeneSpring will also accept any other type of data; however, you will have to manually designate the type of information in each column using the Column Editor. For more information about the Column Editor, see the Manual Data Loading section.
How do I load data?
First, determine whether GeneSpring will automatically recognize the format of your data by checking the list above. If your data is recognizable, continue with Automatic Data Loading; otherwise, scroll down to Manual Data Loading.
Automatic Data Loading:
Begin by dragging and dropping a single data file into the main GeneSpring Window. You should see the following window appear:
GeneSpring should already have the type of data you are using selected in the Choose File Format menu. Click on the button to Create a New Genome. Generally it is a good idea to create a new genome for every new dataset to avoid confusion. Select a name for your genome by typing into the Choose a Name field. Click Next…
§ Next, you should see the following window:
Select all of the data files you are interested in loading and click Add >>. Click Next… to proceed.
You should now see the Sample Attributes window (pictured below).
It is not necessary to fill in any information; however, if you are likely to share this data with others, it may be helpful to include some of the specifications for them. When you are finished, click Next…
Following this, you will see a window indicating that GeneSpring is creating the samples. After that is finished, you will see the following window:
Click Yes. You will see another window informing you that GeneSpring is creating an experiment. Once it is done you will be given the option to name the experiment. Select a name and click Save.
† You will now see the New Experiment Checklist window. Begin by clicking on Normalizations…
You will now see the Normalizations window. Depending on the type of data you are using, certain default normalizations will already be selected. Generally it is a good idea to use GeneSpring’s defaults; however, if you would like to perform different or additional normalizations, there are many options. Make sure to pay attention to any warnings. Once you are finished, click OK.
You will now come back to the Checklist window. Click Parameters… The following window will appear:
In GeneSpring, parameters are used to classify each sample and split the samples into groups. To add a new parameter, click New Parameter… From the resulting menu, select a parameter from the list or select Custom Parameter and click OK. Generally it is simpler to create a custom parameter. Give the new parameter a name, and for each sample assign a value. For example, if you are loading four samples, two wildtype and two mutants, you would name your parameter “Type”. You would then type wt into the corresponding wildtype boxes and mut into the corresponding mutant boxes. These labels are simply examples. You can name your parameters anything, and as long as you label the correct samples with the same name, they will be treated as replicates. Once you are done, click Save.
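The idea that samples sharing a parameter value are treated as replicates can be sketched outside GeneSpring. The following is only an illustration, not GeneSpring's own code; the sample names and the `group_replicates` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample names mapped to their value for the "Type" parameter.
sample_type = {
    "sample_1": "wt",
    "sample_2": "wt",
    "sample_3": "mut",
    "sample_4": "mut",
}


def group_replicates(parameter_values):
    """Group sample names by parameter value.

    Samples that share the same value end up in the same group,
    which mirrors how identically labeled samples act as replicates.
    """
    groups = defaultdict(list)
    for sample, value in parameter_values.items():
        groups[value].append(sample)
    return dict(groups)


replicates = group_replicates(sample_type)
print(replicates)
```

Note that a label typed inconsistently (for example "WT" for one sample and "wt" for another) would produce two separate groups in this sketch, which is why the labels must match exactly.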
Back in the Checklist window, click Experiment Interpretation… You will see the following window:
This window allows you to change the way your data is displayed and analyzed. In the Analysis section, the defaults are generally the best options. If you would like your data analysis to be based on Fold Changes or a simple Signal/Control ratio, make the appropriate change. In this section you may also exclude “Marginal” or “Absent” labeled data from your analysis. The most important section in this window is “How to Display Parameters”. It is in this section that you define which parameter you will use for your comparisons. If you are interested in comparing the wildtype and mutants created earlier in the parameters section, set the Parameter “Type” to Continuous, and all other Parameters to Do Not Display. It is generally a good idea to view only one parameter at a time. Once you are finished, click Save.
Back in the Checklist window, select the final option: Error Model… The following window will appear:
The Cross-Gene Error Model generates t-test p-values for each gene as well as standard deviation and standard error. For more specific information see this document provided by Agilent. GeneSpring’s default is to base the Cross-Gene Error Model on Deviation from 1.0. If you have replicates for each condition of the parameter you are using for your analysis, you can obtain better statistical results by basing the error model on Replicates instead: select Replicates and choose the parameter you wish to use. Click OK.
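To make the terms standard deviation and standard error concrete, here is a minimal sketch that computes both for one gene's replicate measurements. It does not attempt to reproduce the Cross-Gene Error Model itself; the `replicate_error` helper and the example values are assumptions for illustration.

```python
import math
import statistics


def replicate_error(values):
    """Return (mean, standard deviation, standard error) for replicates.

    Uses the sample standard deviation (n - 1 denominator), so at
    least two replicate values are required.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    se = sd / math.sqrt(len(values))
    return mean, sd, se


# Hypothetical normalized values for one gene in three replicates.
mean, sd, se = replicate_error([1.0, 2.0, 3.0])
print(mean, sd, se)
```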
This should bring you back to the Checklist window. Click Close. You are now done loading your data and you may begin your analysis.
Manual Data Loading:
The first step in manually loading data files is to put your files in the correct format. Open your file in Microsoft Excel and look at the columns. If you are using two-color chips, you will need to create a new file containing the following columns:
- Channel 1 Intensity (Median Background-Subtracted Intensity is Recommended)
- Channel 2 Intensity (Median Background-Subtracted Intensity is Recommended)
- A Unique Identifier for each Gene (this could be an Accession Number or Clone ID, etc.)
- A description of each gene (to make the genes easily identifiable)
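If you prefer a script to Excel, a file with the four columns listed above could be written as tab-delimited text along these lines. This is a sketch only: the column headers, gene names, and intensity values are hypothetical, and GeneSpring does not require these particular header names.

```python
import csv
import io

# Hypothetical column headers; the Column Editor is where each
# column's role is actually assigned.
HEADER = ["Gene ID", "Description", "Channel 1 Intensity", "Channel 2 Intensity"]

# Hypothetical rows: identifier, description, and two intensities.
rows = [
    ("GENE_A", "example gene A", 1520.0, 760.0),
    ("GENE_B", "example gene B", 310.5, 1242.0),
]


def write_tab_delimited(data_rows, handle):
    """Write the header and rows to a file-like object, tab-separated."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(data_rows)


buffer = io.StringIO()
write_tab_delimited(rows, buffer)
text = buffer.getvalue()
print(text)
```

To write to disk instead of an in-memory buffer, pass a file opened with `open("two_color.txt", "w", newline="")`, where the file name is again only an example.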
Once you have created the file in Excel, save it as tab-delimited text. Drag and drop the newly created file into the main GeneSpring window. You will now see the following window:
Notice that the file format selected is Custom. Any time that you attempt to load data in an unrecognizable format, Custom will be the only choice. Because it is a good idea to keep each new data set in a different Genome, select Create a New Genome and give it a name. Click Next…
You will now see the Column Editor Window:
For each column, set the corresponding value in the Function row. Set the intensity column for the experimental channel to Intensity and the intensity column for the control channel to Ctrl Chnl. Set the unique identifier to Gene Identifier and your description column to Description. If your file contains a flag column, set it to Flags, and in the Flag Values section set the values for Present, Absent, and Marginal (usually 0, -100, and -50 respectively). Once you are finished, click Next…
This will bring you to the point where you select other data files to load. Make sure that all of your data files are in exactly the same format as the one you just loaded, and follow the instructions under Automatic Data Loading from the mark (§).
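The flag values mentioned above amount to a small lookup table. The sketch below uses the values the text describes as usual (0, -100, -50); your own files may use different codes, and the `classify_flag` helper is hypothetical.

```python
# Flag codes described in the text as the usual values; adjust them
# to whatever your own data files actually contain.
FLAG_MEANING = {0: "Present", -100: "Absent", -50: "Marginal"}


def classify_flag(value):
    """Translate a numeric flag (number or numeric string) to its label."""
    return FLAG_MEANING.get(int(float(value)), "Unknown")


print(classify_flag(0), classify_flag("-100"), classify_flag(-50.0))
```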
How can I filter my dataset on fold change?
For One Color Data:
From the Filtering menu, select Filter on Fold Change… You will see the following window:
In the column to the left, select the Gene List you are interested in using as the pool of genes that will be compared, and click Choose Gene List >>. Also in the left-hand column, select either the two samples you wish to compare or two cases of the selected parameter and set them by clicking on Choose Condition 1 >> and Choose Condition 2 >>. Select the data type you would like to use in the analysis. Normalized Data is the default and the recommended value. Next, choose the type of comparison you are interested in. You can view those genes which are greater in Condition 1, those which are greater in Condition 2, those equal in Condition 1 and Condition 2, or those of Condition 1 which are either greater or less than Condition 2 by the selected fold change value. Set the fold difference using the slider or type it into the box. Once you have everything set, click Save… to make a list of the genes that meet your criteria.
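The comparison types described above can be expressed as a simple ratio test. This is a rough sketch under the assumption that the values are positive normalized intensities; the function name, mode names, and example data are hypothetical, and the "equal" comparison is left out for brevity.

```python
def filter_on_fold_change(data, fold, mode="either"):
    """Return genes whose condition 1 / condition 2 ratio meets the threshold.

    data: {gene: (condition_1_value, condition_2_value)}, values > 0.
    mode: "greater_in_1", "greater_in_2", or "either".
    """
    selected = []
    for gene, (cond1, cond2) in data.items():
        ratio = cond1 / cond2
        up = ratio >= fold            # Condition 1 higher by at least `fold`
        down = ratio <= 1.0 / fold    # Condition 2 higher by at least `fold`
        if mode == "greater_in_1" and up:
            selected.append(gene)
        elif mode == "greater_in_2" and down:
            selected.append(gene)
        elif mode == "either" and (up or down):
            selected.append(gene)
    return selected


# Hypothetical normalized values for three genes.
example = {"g1": (4.0, 1.0), "g2": (1.0, 1.1), "g3": (0.5, 2.0)}
print(filter_on_fold_change(example, 2.0))
```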
For Two Color Data:
Begin by selecting Scatter Plot from the View menu. You should see the following:
Right-Click and select Display Options… The “Horizontal Axis” and “Vertical Axis” tabs will allow you to set which data each axis represents. Once these are set to the two samples or conditions you are interested in comparing, click on the “Lines to Graph” tab. Here you can set the fold change lines to any value you wish. Click OK. Right-Click on the region you are interested in: for genes in which the vertical condition exceeds the horizontal condition by more than the threshold, use the region above the lines; for genes in which the horizontal condition exceeds the vertical condition, use the region below the lines. From the resulting menu, click Select Genes Within Lines. This will highlight the genes you are interested in. Right-Click again and select Make List From Selected Genes. A window will pop up giving you the opportunity to name and save a new gene list. If you are interested in making a list of those genes above and below the fold change lines, simply right-click on both regions, using the Select Genes Within Lines command for each, and continue as described.
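The regions above and below the fold change lines can be described numerically. The sketch below assumes the lines are y = fold * x and y = x / fold on untransformed values; the `classify_region` helper and the example points are hypothetical.

```python
def classify_region(horizontal, vertical, fold):
    """Locate a gene relative to the lines y = fold * x and y = x / fold."""
    if vertical > fold * horizontal:
        return "above"      # vertical condition higher by more than `fold`
    if vertical < horizontal / fold:
        return "below"      # horizontal condition higher by more than `fold`
    return "between"


# Hypothetical (horizontal, vertical) values for three genes.
points = {"gene_a": (1.0, 3.0), "gene_b": (3.0, 1.0), "gene_c": (1.0, 1.5)}
FOLD = 2.0

above = [g for g, (x, y) in points.items() if classify_region(x, y, FOLD) == "above"]
below = [g for g, (x, y) in points.items() if classify_region(x, y, FOLD) == "below"]
print(above, below)
```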
How do I compare two Gene lists?
In the left-hand column, right-click on the first gene list you are interested in comparing. Highlight Venn Diagram and from the resulting menu select Left (Red). Next, right-click on the second gene list and select Right (Green). If you wish, you may repeat the process for a third gene list, this time selecting Bottom (Blue). Once these are selected, highlight the “all genes” gene list so that in the bottom right corner of your Venn diagram “all genes” appears. The purpose of highlighting “all genes” is to provide a universal list that contains all the genes held in each of the lists you are comparing. If you are comparing only lists derived from a certain other list, you may highlight that list instead of “all genes”. The Venn diagram should look very similar to the following:
You may also create new gene lists from the overlapping regions. Simply right-click and select the appropriate Make list of genes in… command.
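Each region of a two-list Venn diagram corresponds to a set operation. The following sketch shows the correspondence with hypothetical gene names; it is an illustration, not how GeneSpring computes the diagram.

```python
# Hypothetical gene lists; "all_genes" plays the role of the universal list.
all_genes = {"g1", "g2", "g3", "g4", "g5"}
left = {"g1", "g2", "g3"}
right = {"g2", "g3", "g4"}

in_both = left & right                 # overlapping region
only_left = left - right               # left circle only
only_right = right - left              # right circle only
in_neither = all_genes - (left | right)  # outside both circles

print(sorted(in_both), sorted(only_left), sorted(only_right), sorted(in_neither))
```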
How can I export a Gene list?
Exporting a gene list is very simple. Double-Click on a gene list in the left-hand column to bring up the Gene List Inspector window:
This window gives you a number of export options. Clicking on Save to File… allows you to save the entire gene list to a tab-delimited text file. Clicking Print List… allows you to print the list, and clicking Copy to Clipboard allows you to copy the entire list. You can also select and copy a specific group of genes by control- or shift-clicking. Once copied, the data can be pasted directly into Microsoft Excel. When you are finished, click OK.
How can I merge two Gene lists?
From the Edit menu, select Edit Gene List… You will see the following window:
Select the first of the two gene lists you wish to merge. Click the Add All button. Select the second, and repeat. You can also choose only certain genes contained in a list by selecting them in the Filter Results box and clicking Add. Once you are finished adding genes, click Save Gene List. If you have performed other types of analysis on the same dataset and you wish to create a gene list from a list in an external file, simply copy the list and paste it into the Gene List Editor window under the Type a List tab. Once the list is pasted, it will appear under the Filter Results heading.
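Conceptually, merging two gene lists is a union in which each gene appears once. A minimal sketch, with a hypothetical `merge_gene_lists` helper and made-up gene names:

```python
def merge_gene_lists(first, second):
    """Combine two gene lists, keeping first-seen order and dropping repeats."""
    return list(dict.fromkeys([*first, *second]))


merged = merge_gene_lists(["g1", "g2"], ["g2", "g3"])
print(merged)
```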
How can I create a New Experiment from existing samples?
From the Experiment menu, select Create New Experiment… The Select Samples window will appear:
In this window, select each sample you would like in your new experiment, and click Add. The different tabs on the left-hand side give you many options for filtering out samples you are not interested in, allowing you to choose from a more select group. Once you have added all of the samples you are interested in, click Next… You will then be given the option of creating new parameters for the experiment and performing normalizations. Both of these procedures are described in the data loading section, starting at the mark (†).
How do I perform a one-way ANOVA statistical comparison?
Agilent has created a very comprehensive guide explaining the basics of one-way ANOVA statistical comparisons. They have also produced a helpful guide explaining multiple testing corrections that will be useful in performing effective statistical tests.
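For readers who want to see what a one-way ANOVA computes, the textbook F statistic for a single gene can be sketched as follows. This is the standard formula, not a description of GeneSpring's implementation; the `one_way_anova_f` helper and the example values are hypothetical.

```python
import statistics


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of value groups.

    F = (between-group mean square) / (within-group mean square).
    Requires at least two groups and more values than groups.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum(
        sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical expression values for one gene in three conditions.
f_stat = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(f_stat)
```

Turning the F statistic into a p-value requires the F distribution; if SciPy is available, `scipy.stats.f_oneway` returns both the statistic and the p-value.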
How do I perform Clustering operations?
Begin by selecting any of the different types of clustering under Tools >> Clustering… This will bring up the following window:
This window contains every type of clustering that GeneSpring offers. Agilent’s QT Clustering guide gives a good overview of clustering in general.
How do I import KEGG pathways in GeneSpring?
To import KEGG pathways, go to ftp://ftp.genome.ad.jp/pub/kegg/pathways/ with a web browser that supports FTP. Download the folder of the specific organism that you are interested in onto your local computer. In GeneSpring, select File -> New Pathway -> import new KEGG pathways. A more detailed tutorial can be found on Agilent’s website.
How do I view data in a box plot?
GeneSpring GX v7.3 offers a box plot view of the data. A box plot can be produced in GeneSpring through View -> Graph -> Box Plot. The box plot view is used to look at, and compare, the distributions of expression values for the samples or conditions in an experiment. You can compare the distribution of the arrays before and after normalization.
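A box plot summarizes each distribution by its minimum, quartiles, and maximum. The sketch below computes such a summary for one sample; the quartile convention used here (the "inclusive" method of Python's `statistics.quantiles`, Python 3.8 or later) is an assumption and may differ from the one GeneSpring uses, and the example values are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical expression values for one sample.
summary = five_number_summary([1, 2, 3, 4, 5])
print(summary)
```

Computing the summary for the same sample before and after normalization gives a numeric version of the comparison the box plot shows visually.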
How do I filter genes on a volcano plot?
Volcano plots allow you to visualize the relationship between fold-change and statistical significance. Volcano plots are used to visually represent differential expression between two different conditions and can be used in publications to provide a visual summary of p-values and fold-change values.
The following steps for filtering on a volcano plot are taken from the GeneSpring GX user manual:
1. Select Filter on Volcano Plot from the Filtering menu to open the Filter on Volcano Plot window.
2. To select a gene list other than the current selection, do one of the following. Only genes that are members of the selected gene list are subjected to the filter.
- Right-click on the gene list to use in Navigator and select Set from the shortcut menu.
- Click on the gene list to use in Navigator, then click the Choose Gene List button.
3. To select the first condition, do one of the following:
- Right-click on the condition to use in Navigator and select Set from the shortcut menu.
- Click on the condition to use in Navigator, then click the Choose Condition 1 button.
4. To select the second condition, do one of the following:
- Right-click on the condition to use in Navigator and select Set from the shortcut menu.
- Click on the condition to use in Navigator, then click the Choose Condition 2 button.
5. Select one of the following test types: Parametric test-assume variances equal, Parametric test-don't assume variances equal, Parametric test-use all available error estimates, or Non-parametric test.
*Note: If the same condition is selected for Condition 1 and Condition 2, then t-test p-value is used as the test type.
6. Select the multiple testing correction algorithm to use: Bonferroni, Bonferroni Step Down (Holm), Westfall and Young Permutation (slow), Benjamini and Hochberg False Discovery Rate, or None.
7. Select one of the following values to associate with the gene list: Fold Change (fold-change between Condition 1 and Condition 2) or P-Value (p-value from statistical comparison of Condition 1 and Condition 2).
*Note: If Condition 1 and Condition 2 are different, then the graph is the -log10(p-value) vs. the log2(fold change). If Condition 1 and Condition 2 are the same, then the graph is the -log10(p-value) vs. the log2(normalized ratio).
8. Set the minimum fold difference for the filter by entering it in the box or by adjusting the horizontal slider.
9. Set the minimum P-value for the filter by entering it in the box or by adjusting the vertical slider.
10. (optional) Select the Main Window view option to lock the preview pane view to that of the main window. Otherwise the best view for the current filter is used.
11. (optional) Clear the Interactive Update check box if you don't want the graphical display to be dynamically updated with genes that pass the current filter settings. In this case, you won't be given any indication of how many genes pass the filter until you click the Save button. Clearing this option may help if the system becomes sluggish when working with large experiments.
12. Click the Save button to save the filter results as a gene list, for use in other filter steps or analyses.
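The filter described in the steps above can be sketched in a few lines: adjust the p-values, convert each gene to its volcano coordinates, and keep the genes beyond both cutoffs. This is an illustration only. It implements the standard Benjamini and Hochberg adjustment and assumes that a gene passes when its p-value is at or below the cutoff; the function names and example data are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def volcano_filter(genes, min_fold, max_p):
    """Keep genes passing both cutoffs.

    genes: {name: (fold_change, p_value)} with fold_change > 0.
    Returns {name: (log2 fold change, -log10 p-value)}.
    """
    passed = {}
    for name, (fold, p) in genes.items():
        x = math.log2(fold)
        y = -math.log10(p)
        if abs(x) >= math.log2(min_fold) and p <= max_p:
            passed[name] = (x, y)
    return passed


# Hypothetical fold changes and p-values for four genes.
genes = {
    "g1": (4.0, 0.001),
    "g2": (1.2, 0.0001),
    "g3": (0.25, 0.01),
    "g4": (8.0, 0.5),
}
passed = volcano_filter(genes, min_fold=2.0, max_p=0.05)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.2])
print(sorted(passed), adjusted)
```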
How do I view data in MA plot (for two-color arrays)?
An MA plot can provide insight into the quality of two-color array hybridizations and reveal any dye bias. A tutorial on how to create an MA plot can be found on Agilent’s website.
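For reference, the two axes of an MA plot are conventionally the log ratio (M) and the average log intensity (A) of the two channels. The sketch below uses that common definition; which channel goes in the numerator is a convention, and the `ma_values` helper and example intensities are hypothetical.

```python
import math


def ma_values(red, green):
    """Return (M, A) for one spot from its two channel intensities.

    M = log2(red / green)          (log ratio)
    A = 0.5 * log2(red * green)    (average log intensity)
    Both intensities must be positive.
    """
    m = math.log2(red / green)
    a = 0.5 * math.log2(red * green)
    return m, a


m_value, a_value = ma_values(8.0, 2.0)
print(m_value, a_value)
```

With no dye bias, M values scatter around zero across the whole range of A; a trend in M as A changes suggests intensity-dependent bias.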
How do I use GeneSpring-R plug-ins?
The GeneSpring GX R-integration package allows you to expand the capability of GeneSpring by integrating R programs seamlessly into GeneSpring. With the integration package you can use R and BioConductor scripts in GeneSpring's graphical environment. More information can be found here.