Gene association analysis

Modified on Mon, 16 Oct 2023 at 01:33 PM

Quantify the association between a gene of interest and all other genes


When analyzing (single-cell) gene expression data, we may be interested in quantifying correlations between genes. A strong correlation may indicate a regulatory mechanism between the genes, or that the two genes respond similarly to the same stimulus. Either way, a large positive (or negative) association between genes may point to a possible interaction that is worth studying further.


The gene association analysis module lets you compute the association between a specific gene (the gene of interest, or GOI) and all other genes. Associations can be quantified across all samples at once, or separately for different subgroups of samples. In the latter case, association values across subgroups can be combined with meta-analysis to obtain a single association value for each gene.
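The exact meta-analysis method used by the module is not described here. Purely as an illustration, one common way to pool correlation coefficients is a fixed-effect average on Fisher's z scale, weighting each subgroup by its sample size minus 3. The function name and the example values below are hypothetical.

```python
import math

def combine_correlations(per_group):
    """Pool per-subgroup correlations with a fixed-effect Fisher z average.

    per_group: list of (r, n) pairs, where r is the correlation in a
    subgroup (strictly between -1 and 1) and n is the number of samples
    in that subgroup (n > 3).
    Illustrative choice only; not necessarily what the module does.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for r, n in per_group:
        z = math.atanh(r)      # Fisher z-transform of the correlation
        weight = n - 3         # inverse of the approximate variance of z
        weighted_sum += weight * z
        total_weight += weight
    return math.tanh(weighted_sum / total_weight)  # back to the r scale

# Hypothetical correlations of one gene with the GOI in three subgroups
pooled = combine_correlations([(0.62, 40), (0.48, 25), (0.70, 60)])
print(round(pooled, 3))
```

Larger subgroups pull the pooled value toward their own estimate, so the result always lies between the smallest and largest subgroup correlation.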


We use the expression "gene association analysis" to describe this analysis in the context of a concrete example. However, this module can be used for estimating associations between any type of omics measurements, including metabolomics, proteomics, etc. 


1 Creating a Plot


The first step of the analysis is to create a plot by clicking on the create plot icon. This will lead to a section where the algorithm of interest can be selected, in this case gene association analysis. 

 


To keep your plots organized, give the analysis a name and a description in the corresponding fields. Under "Choose algorithm to run your analysis", select "gene association analysis".


2 Selecting data 


In the field "Choose track element", the input can be a Normal or an Engineered matrix. For more information about the data to use as input, see the section on Useful concepts.

Using the "Select Cells" button, you can choose the observations to use as input. For more information see the section on Cell/sample selection.   



It is strongly advised to use normalized and scaled data, and to ensure that any batch effect has been removed from the data.
- Using unnormalized or unscaled data makes it impossible to compare association values across genes
- Batch effects can create spurious associations as well as hide real ones, in either case undermining the validity of the analysis


3 Setting parameters 

Once all input tracks have been selected, the Set parameters field will be displayed with the following tabs: design, feature of interest and association type.

 

3.1 Setting parameters - Design 


Here you can define whether the gene association analysis should be performed separately for subgroups of samples or on all samples together. To work with all samples, leave "Design" empty. Otherwise, choose a categorical column in your metadata; samples will be partitioned according to the groups defined by that column.


3.2 Setting parameters - Feature of interest


Feature of interest: this is the gene to compare against all other genes. The analysis will compute the correlation between your gene of interest and every other gene present in the dataset.


3.3 Univariate association


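Univariate correlation can be computed with the classical formula of Pearson's correlation coefficient:

$$ r = \frac{\sum_{i=1}^{n} (G_i - \bar{G})(g_i - \bar{g})}{\sqrt{\sum_{i=1}^{n} (G_i - \bar{G})^2}\,\sqrt{\sum_{i=1}^{n} (g_i - \bar{g})^2}} $$

Here $G_i$ represents the expression level of the gene of interest in sample i, while $g_i$ is the expression of the other gene in sample i; $\bar{G}$ and $\bar{g}$ are the corresponding means over the n samples.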


Spearman's correlation uses exactly the same formula, but the expression values are replaced with their respective ranks.
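The relationship between the two coefficients can be sketched in plain Python. This is an illustration rather than the module's code; in particular, giving tied values their average rank is an assumption, since tie handling is not described here.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the block while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result

def spearman(x, y):
    """Spearman's correlation: Pearson's formula applied to the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical expression values: a monotonic but non-linear relationship
G = [1.0, 2.0, 3.0, 4.0, 5.0]
g = [1.0, 4.0, 9.0, 16.0, 25.0]
print(round(pearson(G, g), 3))
print(round(spearman(G, g), 3))
```

Because Spearman's coefficient only looks at ranks, it reaches 1 for any strictly increasing relationship, whereas Pearson's coefficient reaches 1 only when the relationship is linear.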


3.3.1 Correlation type - Pearson vs Spearman 


If you choose to perform a univariate analysis, you can also specify whether Pearson's or Spearman's formula should be used.



4 Performing the gene association analysis


Once all parameters are set, click the Run button to run the analysis.

As soon as the analysis is over, a new table will appear in your track. 

By default, the genes are ranked according to their score, from most positive to most negative, so that the ranking can be used as input for downstream analyses such as GSEA or rank-based meta-analysis.
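The default ordering can be reproduced outside the platform with a simple sort. The sketch below uses hypothetical gene names and scores.

```python
def rank_genes(scores):
    """Order genes from the most positive to the most negative score.

    scores: dict mapping gene name -> association coefficient.
    Returns a list of (rank, gene, score) tuples, with ranks starting at 1.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(rank, gene, score) for rank, (gene, score) in enumerate(ordered, start=1)]

# Hypothetical association scores against the gene of interest
scores = {"GENE_B": 0.98, "GENE_C": -0.97, "GENE_D": 0.12}
for rank, gene, score in rank_genes(scores):
    print(rank, gene, score)
```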

5 Visualization

The results are displayed as a table. The table is interactive and sortable. The columns in the table are:

  • Ranks: the rank of the genes
  • All other columns: ranked list of genes in each group, based on association coefficients
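The layout described above can be sketched as follows, assuming per-group scores are available as dictionaries. The function name, group names and values are hypothetical; only the "Ranks" column name is taken from the table description.

```python
def build_rank_table(scores_by_group):
    """Lay out per-group gene rankings side by side.

    scores_by_group: dict mapping group name -> {gene: coefficient}.
    Returns a list of rows; each row holds the rank and, for every group,
    the gene found at that rank in that group.
    """
    ranked = {
        group: sorted(scores, key=scores.get, reverse=True)
        for group, scores in scores_by_group.items()
    }
    n_rows = min(len(genes) for genes in ranked.values())
    return [
        {"Ranks": i + 1, **{group: genes[i] for group, genes in ranked.items()}}
        for i in range(n_rows)
    ]

# Hypothetical association scores in two groups
scores_by_group = {
    "ctrl": {"GENE_B": 0.9, "GENE_C": -0.8, "GENE_D": 0.1},
    "treated": {"GENE_B": 0.2, "GENE_C": 0.7, "GENE_D": -0.5},
}
for row in build_rank_table(scores_by_group):
    print(row)
```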


