Differential gene expression analysis

Modified on Wed, 10 Jan at 11:30 AM

TABLE OF CONTENTS

Introduction
1 Creating a plot
2 Selecting data
3 Setting parameters
4 Performing the differential gene analysis
5 Differential gene analysis interactive plot page
6 Differential analysis video tutorial
7 Useful links

Introduction

Differential analysis is one of the most commonly performed analyses when interpreting omics data. Differential analysis means taking the (normalized) data and performing statistical analysis to discover quantitative changes in expression/abundance levels between experimental groups. For example, for each feature in the data we can perform statistical testing to decide whether an observed difference (change) in expression/abundance is significant or not, which means checking whether the change is greater than what would be expected due to natural random variation. Basically, differential analysis is used to determine whether there are any features that are significantly different between two groups.

The UniApp offers four methods for differential analysis: limma, MAST, t-test and Wilcoxon. Limma and MAST are based on linear models, a highly flexible statistical technique that can accommodate a large variety of experimental designs and correct for confounding variables. The results are summarized in a volcano plot, and the full results can be explored in an interactive table.

1 Creating a plot

As a first step of the analysis, a plot must be created by clicking on the create plot icon. This will lead you to a section where the analysis of interest can be selected.

In order to ensure efficient organization, a name and description must be assigned to the analysis under the appropriate fields. Subsequently, select "Variation Analysis" under "Choose algorithm to run your analysis". Press "Select Algorithm" which will lead to the second column opening on the left.

2 Selecting data

Now the required project, track and plot can be selected. Differential gene analysis is based on the results of Data Pretreatment, so make sure to select the right track element. Press "Select Data" when the required fields are filled out and the third column will open on the left.

3 Setting parameters

Design: here you can define between which groups you wish to perform the differential analysis. Groups are defined by metadata categories.
Algorithm: the type of model to use, either Limma or MAST. Both are generalized linear models. Limma is a popular method for analyzing both bulk and single-cell datasets, while MAST is more accurate for zero-inflated single-cell data.

3.1 Differential analysis design

In the Design tab you can choose between which metadata groups you wish to perform the differential analysis.

Observation: sets the metadata variable whose groups you wish to compare. Note that this is supported only for categorical variables.
Reference group: sets the reference group of the selected metadata variable. This is usually the control or healthy group of your experiment. Genes expressed highly in the reference group will appear as downregulated, with a negative logFC. You can select multiple groups.
Experimental group: sets the experimental group of the selected metadata variable. This is usually the treatment or diseased group of your experiment. Genes expressed highly in the experimental group will appear as upregulated, with a positive logFC. You can select multiple groups.
Covariates: the covariates to include during the modelling. The covariates can be chosen from the variables present in the metadata.
Scaling: In the case of metabolomics and proteomics data you will see this additional option under scaling (the following options are available: None, Auto, Center, Scale, Range, Pareto, Vast, Level).
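
As a rough illustration of what some of these scaling options usually mean, the sketch below applies the definitions of Center, Auto, Pareto and Range that are conventional in the metabolomics literature. This is an assumption: the UniApp may define these options differently, and the function name scale_feature is hypothetical. Scale, Vast and Level are left out.

```python
import math
from statistics import mean, stdev


def scale_feature(values, method="auto"):
    """Scale one feature's values across observations.

    Conventional definitions (assumed, not taken from the UniApp):
      center: x - mean
      auto:   (x - mean) / sd
      pareto: (x - mean) / sqrt(sd)
      range:  (x - mean) / (max - min)
    A constant feature (sd or range of zero) raises ZeroDivisionError.
    """
    mu = mean(values)
    centered = [v - mu for v in values]
    if method == "center":
        return centered
    if method == "auto":
        sd = stdev(values)  # sample standard deviation
        return [c / sd for c in centered]
    if method == "pareto":
        sd = stdev(values)
        return [c / math.sqrt(sd) for c in centered]
    if method == "range":
        span = max(values) - min(values)
        return [c / span for c in centered]
    raise ValueError(f"unsupported scaling method: {method}")


abundances = [2.0, 4.0, 6.0]
for name in ("center", "auto", "pareto", "range"):
    print(name, scale_feature(abundances, name))
```

Auto scaling gives every feature unit variance, while Pareto scaling shrinks large features less aggressively, which is why it is often preferred when large fold changes should stay visible.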

There must be no overlap of observations/groups between the reference and experimental groups. Additionally, you cannot select as a covariate the same group you are using to perform the comparison.
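
The two constraints above can be expressed as a small check. This is only a sketch: the function validate_design and its arguments are hypothetical and are not part of the UniApp, and it assumes that "the same group" refers to the metadata variable chosen under Observation.

```python
def validate_design(reference, experimental, observation, covariates):
    """Return a list of problems found in a differential analysis design.

    reference / experimental: group names selected from the observation variable.
    observation: name of the metadata variable being compared.
    covariates: names of metadata variables used as covariates.
    """
    problems = []
    if not reference:
        problems.append("no reference group selected")
    if not experimental:
        problems.append("no experimental group selected")
    # A group cannot be on both sides of the comparison.
    overlap = set(reference) & set(experimental)
    if overlap:
        problems.append(f"groups selected on both sides: {sorted(overlap)}")
    # The compared variable cannot also be a covariate.
    if observation in covariates:
        problems.append(f"'{observation}' is both the observation and a covariate")
    return problems


print(validate_design(["control"], ["treated"], "condition", ["batch"]))
print(validate_design(["control", "day1"], ["day1"], "condition", ["condition"]))
```

An empty list means the design passes both checks.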

3.2 Differential analysis algorithm

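Sets the type of model to use, either Limma or MAST. Both are generalized linear models (see "Understanding linear models" under Useful links for more information). Limma is a popular method for analyzing bulk datasets, while MAST is more accurate for zero-inflated single-cell data. The t-test and Wilcoxon options are simpler two-group tests: the t-test compares the group means, while the Wilcoxon test is a non-parametric alternative based on ranks.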
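
To give an idea of what the t-test and Wilcoxon methods named in the Introduction compute for a single feature, the sketch below calculates a Welch t statistic and a rank-sum U statistic using only the Python standard library. It assumes the Welch (unequal variance) form of the t-test and the rank-sum (Mann-Whitney) form of the Wilcoxon test; the UniApp may use other variants, and the function names are hypothetical. Converting the statistics to p-values is omitted.

```python
import math
from statistics import mean, variance


def welch_t(reference, experimental):
    """Welch t statistic; positive when the experimental mean is higher."""
    se = math.sqrt(variance(reference) / len(reference)
                   + variance(experimental) / len(experimental))
    return (mean(experimental) - mean(reference)) / se


def rank_sum_u(reference, experimental):
    """Mann-Whitney U statistic of the experimental group (ties share ranks)."""
    pooled = sorted(list(reference) + list(experimental))
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j and all receive the mean rank.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    n = len(experimental)
    rank_sum = sum(rank_of[v] for v in experimental)
    return rank_sum - n * (n + 1) / 2


reference = [1.0, 2.0, 3.0]
experimental = [4.0, 5.0, 6.0]
print(welch_t(reference, experimental))
print(rank_sum_u(reference, experimental))
```

The U statistic ranges from 0 to the product of the two group sizes; it reaches the maximum when every experimental value exceeds every reference value.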

4 Performing the differential gene analysis

When all the parameters are set, click the Run button to compute the differential gene analysis.

As soon as the differential analysis is computed, the plot you just created will appear in your track. You can click on the "View interactively" button to explore the results of the differential gene analysis in the interactive plot page.

5 Differential gene analysis interactive plot page

As soon as the differential analysis is computed, a volcano plot will appear. A volcano plot shows the relationship between the p-values of a statistical test and the magnitude of the difference in expression/abundance values between the reference and experimental groups. Each dot in the plot is a feature. The y-axis shows the -log10 p-values and the x-axis shows the (log) fold change. By default, significant features (p < 0.05) are shown in blue and non-significant features (p >= 0.05) in grey.
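
The mapping from a result row to a point on the volcano plot can be sketched as follows. The function volcano_points and the example values are hypothetical; only the axis definitions and the default 0.05 threshold come from the description above.

```python
import math


def volcano_points(results, alpha=0.05):
    """Turn (feature, log_fc, p_value) rows into volcano plot coordinates.

    A p-value of exactly 0 has no finite -log10 and raises ValueError.
    """
    points = []
    for feature, log_fc, p_value in results:
        points.append({
            "feature": feature,
            "x": log_fc,                 # (log) fold change
            "y": -math.log10(p_value),   # small p-values end up high on the plot
            "significant": p_value < alpha,
        })
    return points


rows = [("geneA", 2.0, 0.001), ("geneB", -0.3, 0.5)]
for point in volcano_points(rows):
    print(point)
```

Because of the -log10 transform, the 0.05 threshold corresponds to a horizontal line at about y = 1.3.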

5.1 Plot parameters

You can also look at the differential analysis table by clicking on Show data table in the Visualization - Plot parameters tab. The table is interactive and sortable by clicking on column headers.

The columns in the table are:

Feature: the name of the feature.

Log fold change: represents the magnitude of the difference between the reference and experimental groups. If the value is positive, we say that the feature is upregulated in the experimental group (and downregulated in the reference group). If the value is negative, we say that the feature is downregulated in the experimental group (and upregulated in the reference group). It is important to note that this value actually represents the log fold change only if the data is in log-space. If the data was not log transformed, this value is just the difference between the experimental group mean and the reference group mean.
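
The difference between the two cases can be seen in a small worked example. The values below are made up for illustration, and log2 is assumed as the log base.

```python
import math
from statistics import mean


def mean_difference(reference, experimental):
    """Experimental group mean minus reference group mean."""
    return mean(experimental) - mean(reference)


ref_raw = [100.0, 400.0]
exp_raw = [400.0, 1600.0]

# On log2-transformed data the difference of means is a log2 fold change.
ref_log = [math.log2(v) for v in ref_raw]
exp_log = [math.log2(v) for v in exp_raw]
log_fc = mean_difference(ref_log, exp_log)
print(log_fc)        # about 2, i.e. a four-fold increase

# On untransformed data the same calculation is just a difference of means.
raw_diff = mean_difference(ref_raw, exp_raw)
print(raw_diff)      # 750.0
```

A log2 fold change of 2 corresponds to a four-fold ratio (of geometric means), whereas the value of 750 on the raw scale depends on the measurement units and cannot be read as a fold change.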

Average expression: the average expression/abundance across all observations used in the differential analysis.

P-value: the significance of the result. Usually the significance threshold is set at 0.05.

Adjusted p-value: the p-value adjusted for multiple testing with the Benjamini-Hochberg procedure, which controls the false discovery rate (FDR).
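
For reference, the standard Benjamini-Hochberg adjustment can be sketched in a few lines. This illustrates the general procedure, not the UniApp's own code; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values never
    # decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each raw p-value is multiplied by the number of tests divided by its rank, so the smallest p-values receive the largest correction.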

5.2 Highlight features

In the Highlight features tab you can set the parameters to decide which features to highlight on the volcano plot. This can be done either by metric or by providing a custom list of features.

5.2.1 Highlight features - gene metrics

When highlighting features by metric you can select the following parameters:

Filter on adjusted p-value: toggles filtering on adjusted p-value.

p-value threshold: sets p-value threshold for a feature to be highlighted.

Minimal absolute log fold change value: sets minimal absolute logFC value for a feature to be highlighted.

Highlight color: sets color of highlighted features.

Background color: sets color of background features.
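
The effect of the metric-based parameters can be sketched as a simple filter. The function select_highlighted and the dictionary keys are hypothetical, and whether the UniApp treats the thresholds as inclusive or exclusive is an assumption here.

```python
def select_highlighted(rows, p_threshold=0.05, min_abs_log_fc=1.0, use_adjusted=True):
    """Return the names of the features that would be highlighted.

    rows: dicts with 'feature', 'log_fc', 'p_value' and 'adj_p_value' keys.
    use_adjusted: mirrors the 'Filter on adjusted p-value' toggle.
    """
    key = "adj_p_value" if use_adjusted else "p_value"
    return [
        row["feature"]
        for row in rows
        if row[key] < p_threshold and abs(row["log_fc"]) >= min_abs_log_fc
    ]


table = [
    {"feature": "geneA", "log_fc": 2.1, "p_value": 0.001, "adj_p_value": 0.004},
    {"feature": "geneB", "log_fc": -1.5, "p_value": 0.030, "adj_p_value": 0.090},
    {"feature": "geneC", "log_fc": 0.2, "p_value": 0.002, "adj_p_value": 0.004},
]
print(select_highlighted(table))
print(select_highlighted(table, use_adjusted=False))
```

Features that fail either condition are drawn in the background color.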

5.2.2 Highlight features - custom gene list

When highlighting features by a custom gene list you can select the following parameters:

Background color: sets color of background features.

Number of custom groups: sets the number of custom groups of features. For each custom group you can provide the list of features and set the color.

Custom gene set color: sets color for the custom gene set highlighted.

Features to highlight: here you can input a list of features to highlight, or search for a feature to select.

5.3 Marker format and color for numerical variables

In the Marker format and color tab you can change the appearance of the markers on the volcano plot.

Marker symbol: change marker symbol.
Marker size: adjust marker size.
Marker opacity: adjust marker opacity.

5.4 Details

The Details tab contains additional options for customizing your plot.

5.4.1 Grid style

Show grid: toggles grid.
Grid width: adjusts grid width.
Grid color: changes grid color.
Border width: changes width of plot border.

5.4.2 Title style

Title: sets plot title.

Title font size: adjusts plot title font size.

Legend position x-direction: changes plot title position on the x axis.

Legend position y-direction: changes plot title position on the y axis.

5.4.3 Plot margins

Margin bottom: sets bottom margin.

Margin left: sets left margin.

Margin right: sets right margin.

Margin top: sets top margin.

Padding: adjusts margin padding.

5.5 Axes style

Here you can edit the axis style for the x, y and z axes.

Axis label: sets axis label.

Axis padding: adjusts axis padding.

Invert axis: inverts axis.

Dimension to plot on axis: sets the dimension to plot on the axis. In PCA, for example, you can use this option to generate plots from different principal components.

5.6 Export settings

Here you can prepare your plot for export.

Export format: sets plot file format.

Width of plot: adjusts plot width.

Height of plot: adjusts plot height.

File name: sets the file name for the exported plot.

6 Differential analysis video tutorial

Expected soon

7 Useful links

Differential analysis via linear models (limma)
Differential analysis via linear models (MAST)
Understanding linear models
Statistical tests
P-values
Adjusted p-values
Volcano plot