Differential gene expression analysis

Modified on Wed, 10 Jan 2024 at 11:30 AM

Introduction


    Differential analysis is one of the most commonly performed analyses when interpreting omics data. Differential analysis means taking the (normalized) data and performing statistical analysis to discover quantitative changes in expression/abundance levels between experimental groups. For example, for each feature in the data we can perform statistical testing to decide whether an observed difference (change) in expression/abundance is significant or not, which means checking whether the change is greater than what would be expected due to natural random variation. Basically, differential analysis is used to determine whether there are any features that are significantly different between two groups.


    The UniApp offers several statistical methods for this (limma, MAST, t-test and Wilcoxon). The linear-model-based methods (limma and MAST) are highly flexible: they can accommodate a large variety of experimental designs, correct for confounding variables, etc. The results are summarized in a volcano plot, and the full results can also be explored through an interactive table.


1 Creating a plot


As a first step of the analysis, a plot must be created by clicking on the create plot icon. This will lead you to a section where the analysis of interest can be selected.


 

In order to ensure efficient organization, a name and description must be assigned to the analysis under the appropriate fields. Subsequently, select "Variation Analysis" under "Choose algorithm to run your analysis". Press "Select Algorithm" which will lead to the second column opening on the left.


2 Selecting data

Now the required project, track and plot can be selected. Differential gene analysis is based on the results of Data Pretreatment, so make sure to select the right track element. Press "Select Data" once the required fields are filled out, and the third column will open on the left.


3 Setting parameters


  • Design: here you can define between which groups you wish to perform the differential analysis. Groups are defined by metadata categories.
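  • Algorithm: the type of model to use, either Limma or MAST. Both are based on linear models: Limma fits a linear model per feature with empirical Bayes moderation of the variances, while MAST fits a two-part generalized linear (hurdle) model. Limma is a popular method for analyzing both bulk and single cell datasets, while MAST is better suited to zero-inflated single cell data.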

3.1 Differential analysis design



In the Design tab you can choose between which metadata groups you wish to perform the differential analysis. 

  • Observation: sets the metadata variable whose groups you wish to compare. Note that only categorical variables are supported.
  • Reference group: sets the reference group of the selected metadata variable. This is usually the control or healthy group of your experiment. Genes expressed highly in the reference group will appear as downregulated, with a negative logFC. You can select multiple groups.
  • Experimental group: sets the experimental group of the selected metadata variable. This is usually the treatment or diseased group of your experiment. Genes expressed highly in the experimental group will appear as upregulated, with a positive logFC. You can select multiple groups.
  • Covariates: the covariates to include during the modelling. The covariates can be chosen from the variables present in the metadata.
  • Scaling: for metabolomics and proteomics data, an additional Scaling option is available with the following choices: None, Auto, Center, Scale, Range, Pareto, Vast, Level.
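
The scaling options share their names with transformations that are widely used for metabolomics and proteomics data. The sketch below shows how these are commonly defined for a single feature. It is an illustration only: the function name `scale_feature` is hypothetical, and the UniApp's exact definitions may differ (in particular "Scale" is assumed here to mean division by the standard deviation without centering).

```python
import math
import statistics


def scale_feature(values, method="auto"):
    """Scale one feature's values using a commonly used definition of `method`.

    Assumption: these are textbook definitions, not necessarily the UniApp's.
    Needs at least two values, a non-zero standard deviation for the
    sd-based methods, and a non-zero mean for "level".
    """
    if method == "none":
        return list(values)

    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    centered = [v - mean for v in values]

    if method == "center":   # subtract the mean
        return centered
    if method == "scale":    # assumed: divide by sd, no centering
        return [v / sd for v in values]
    if method == "auto":     # center, then divide by sd (z-score)
        return [c / sd for c in centered]
    if method == "range":    # center, then divide by (max - min)
        return [c / (max(values) - min(values)) for c in centered]
    if method == "pareto":   # center, then divide by sqrt(sd)
        return [c / math.sqrt(sd) for c in centered]
    if method == "vast":     # auto scaling weighted by mean / sd
        return [(c / sd) * (mean / sd) for c in centered]
    if method == "level":    # center, then divide by the mean
        return [c / mean for c in centered]
    raise ValueError(f"unknown scaling method: {method}")


abundances = [2.0, 4.0, 6.0]  # made-up values for one feature
for name in ("center", "auto", "range", "pareto", "vast", "level"):
    print(name, scale_feature(abundances, name))
```

Auto and Pareto scaling reduce the dominance of highly abundant features to different degrees, which is why the choice of scaling can change which features stand out in the differential analysis.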


There must be no overlap of observations/groups between the reference and experimental groups. Additionally, the metadata variable used for the comparison cannot also be selected as a covariate.
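
These two rules can be written down as a small check. The following is a minimal sketch, not the UniApp's own validation; the function `validate_design` and the example metadata names ("condition", "batch", "healthy", "treated") are hypothetical.

```python
def validate_design(observation, reference, experimental, covariates=()):
    """Return a list of problems with a design; an empty list means it is valid.

    observation:  name of the metadata variable being compared
    reference:    group names used as the reference
    experimental: group names used as the experimental groups
    covariates:   names of metadata variables used as covariates
    """
    problems = []

    # Rule 1: a group cannot be on both sides of the comparison.
    overlap = set(reference) & set(experimental)
    if overlap:
        problems.append(f"groups on both sides: {sorted(overlap)}")

    # Rule 2: the compared variable cannot also be a covariate.
    if observation in covariates:
        problems.append(f"'{observation}' is both the observation and a covariate")

    return problems


print(validate_design("condition", ["healthy"], ["treated"], ["batch"]))
print(validate_design("condition", ["healthy", "mild"], ["mild", "severe"], ["condition"]))
```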


3.2 Differential analysis algorithm


4 Performing the differential gene analysis

When the parameters are all set up, click on the Run button to compute the differential gene analysis.

As soon as the differential analysis is computed, the plot you just created will appear in your track. You can click on the "View interactively" button to explore the results of the differential gene analysis in the interactive plot page.

5 Differential gene analysis interactive plot page


As soon as the differential analysis is computed, a volcano plot will appear. A volcano plot shows the relationship between the p-values of a statistical test and the magnitude of the difference in expression/abundance values between the reference and experimental groups. Each dot in the plot is a feature. The y-axis shows the -log10 p-values, and the x-axis shows the (log) fold change. By default, blue marks the significant features (p < 0.05) and grey marks the non-significant features (p >= 0.05).
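
To make the axes concrete, the sketch below computes the position and default color of a few features. The feature names and numbers are made up, and `volcano_point` is a hypothetical helper, not part of the UniApp.

```python
import math


def volcano_point(log_fc, p_value, alpha=0.05):
    """Return (x, y, significant) for one feature on a volcano plot.

    x is the log fold change, y is -log10(p-value).
    A p-value of exactly 0 cannot be plotted this way (log10(0) is undefined).
    """
    x = log_fc
    y = -math.log10(p_value)
    return x, y, p_value < alpha


# Made-up results: feature -> (log fold change, p-value)
results = {
    "GeneA": (2.1, 0.0004),
    "GeneB": (-0.3, 0.62),
    "GeneC": (-1.8, 0.01),
}

for feature, (log_fc, p) in results.items():
    x, y, significant = volcano_point(log_fc, p)
    color = "blue" if significant else "grey"
    print(f"{feature}: x={x:+.2f}, y={y:.2f}, color={color}")
```

Because of the -log10 transform, smaller p-values appear higher on the plot, so the most interesting features tend to sit in the top-left and top-right corners.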

 

5.1 Plot parameters



You can also look at the differential analysis table by clicking on Show data table in the Visualization - Plot parameters tab. The table is interactive and sortable by clicking on column headers.
 
The columns in the table are:
Feature: the name of the feature.
Log fold change: represents the magnitude of the difference between the reference and experimental groups. If the value is positive, we say that the feature is upregulated in the experimental group (and downregulated in the reference group). If the value is negative, we say that the feature is downregulated in the experimental group (and upregulated in the reference group). It is important to note that this value actually represents the log fold change only if the data is in log-space. If the data was not log transformed, this value is just the difference between the experimental group mean and the reference group mean.
Average expression: the average expression/abundance across all observations used in the differential analysis.
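P-value: the significance of the result, i.e. the probability of seeing a difference at least this large if there were no real difference between the groups. Usually the significance threshold is set at 0.05.
Adjusted p-value: adjusted p-values calculated with the Benjamini-Hochberg procedure (false discovery rate, FDR).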
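
The log fold change and adjusted p-value columns can be illustrated with a short sketch. It assumes log-transformed input and uses made-up numbers; the function names are hypothetical, and the values reported by limma or MAST come from fitted models, so they will generally not equal this simple difference of means when covariates are included.

```python
import statistics


def log_fold_change(reference, experimental):
    """Difference of group means; a log fold change if the data is in log-space."""
    return statistics.mean(experimental) - statistics.mean(reference)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping the running
    # minimum of p * n / rank so adjusted values never decrease with rank.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# One feature measured on a log2 scale in three reference and three experimental samples.
print(log_fold_change([5.0, 5.0, 5.0], [6.0, 7.0, 8.0]))

# Raw p-values for four features.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

On a log2 scale, a log fold change of 2 corresponds to a four-fold higher expression in the experimental group. Adjusted p-values are never smaller than the raw p-values, which is why fewer features pass the threshold when filtering on the adjusted value.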

5.2 Highlight features

In the Highlight features tab you can set the parameters to decide which features to highlight on the volcano plot. This can be done either by metric or by providing a custom list of features.

5.2.1 Highlight features - gene metrics


When highlighting features by metric you can select the following parameters:


Filter on adjusted p-value: toggles filtering on adjusted p-value.
p-value threshold: sets the p-value threshold below which a feature is highlighted.
Minimal absolute log fold change value: sets minimal absolute logFC value for a feature to be highlighted.
Highlight color: sets color of highlighted features. 
Background color: sets color of background features. 
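
Taken together, these parameters act as a filter on the results table. The sketch below is an assumption about how such a filter could combine them, not the UniApp's implementation: the column names, the default values and the choice of a strict `<` for the p-value and `>=` for the fold change are all hypothetical.

```python
def is_highlighted(row, p_threshold=0.05, min_abs_log_fc=1.0, use_adjusted=True):
    """Decide whether one row of the results table should be highlighted.

    row: dict with hypothetical keys "log_fc", "p_value" and "adj_p_value".
    use_adjusted mirrors the "Filter on adjusted p-value" toggle.
    """
    p = row["adj_p_value"] if use_adjusted else row["p_value"]
    return p < p_threshold and abs(row["log_fc"]) >= min_abs_log_fc


# Made-up rows from a results table.
table = [
    {"feature": "GeneA", "log_fc": 2.1, "p_value": 0.0004, "adj_p_value": 0.004},
    {"feature": "GeneB", "log_fc": -0.3, "p_value": 0.01, "adj_p_value": 0.04},
    {"feature": "GeneC", "log_fc": -1.8, "p_value": 0.02, "adj_p_value": 0.07},
]

for row in table:
    color = "highlight" if is_highlighted(row) else "background"
    print(row["feature"], color)
```

In this example GeneC passes on the raw p-value but not on the adjusted one, so toggling "Filter on adjusted p-value" changes whether it is highlighted.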


5.2.2 Highlight features - custom gene list


When highlighting features by a custom gene list you can select the following parameters:

Background color: sets color of background features.
Number of custom groups: sets the number of custom groups of features. For each custom group you can provide the list of features and set the color. 
Custom gene set color: sets color for the custom gene set highlighted.
Features to highlight: here you can input a list of features to highlight, or search for a feature to select.


5.3 Marker format and color for numerical variables


In the Marker format and color tab you can change the appearance of the markers on the volcano plot. 

Marker symbol: change marker symbol.
Marker size: adjusts marker size.
Marker opacity: adjust marker opacity.

5.4 Details 


The Details tab contains additional options for customizing your plot.

5.4.1 Grid style


Show grid: toggles grid.
Grid width: adjusts grid width.
Grid color: changes grid color.
Border width: changes width of plot border. 

5.4.2 Title style

Title: sets plot title.
Title font size: adjusts plot title font size.
Legend position x-direction: changes plot title position on the x axis.
Legend position y-direction: changes plot title position on the y axis.

5.4.3 Plot margins

Margin bottom: sets bottom margin.
Margin left: sets left margin.
Margin right: sets right margin.
Margin top: sets top margin.
Padding: adjusts margin padding.

5.5 Axes style

Here you can edit the axis style for the x, y and z axes.

Axis label: sets axis label.
Axis padding: adjusts axis padding.
Invert axis: inverts axis.
Dimension to plot on axis: sets the dimension to plot on the axis. In PCA, for example, you can use this option to generate plots from different principal components.

5.6 Export settings


Here you can prepare your plot for export.

Export format: sets plot file format.
Width of plot: adjusts plot width.
Height of plot: adjusts plot height.
File name: sets file name for exported plot.


6 Differential analysis video tutorial

Expected soon

7 Useful links


