Gene Regulatory Network (GRN) Reconstruction

Modified on Fri, 14 Jun at 2:01 PM

TABLE OF CONTENTS

1 Identifying possible regulatory relationships
2 Creating a Plot
3 Selecting data
4 Setting parameters
- 4.1 Gene of interest
- 4.2 Feature selection
- 4.3 Method
5 Performing the GRN reconstruction analysis
6 Network visualization
- 6.1 Filtering on importance score
- 6.2 GRN table view

1 Identifying possible regulatory relationships

Single-cell gene expression data can be used to identify gene regulatory mechanisms. A gene regulatory mechanism (or relationship) is present whenever a gene (typically a transcription factor) regulates the expression level of a second gene. Such regulation often happens through a protein encoded by the regulator that binds to the promoter of the regulated gene. When several such regulatory relationships are identified, they can be summarized and visualized as a network, namely a gene regulatory network (GRN).

The GRN reconstruction module identifies candidate gene regulatory relationships by analyzing cross-sectional, observational data. We call these relationships "candidate" to underline that this type of analysis does not provide definitive evidence for the existence of a regulatory relationship between genes. Experimental evidence (e.g., knock-out experiments) is needed to confirm them.

2 Creating a Plot

The first step of the analysis is to create a plot by clicking on the create plot icon.

This will lead you to the Create plot page. Here you can start by filling in the plot name and plot template.

You are then taken to a section where you can select the algorithm of interest, in this case Gene regulatory network.

3 Selecting data

In the field "Choose track element", you can select the input for the gene regulatory network algorithm. The algorithm can accept any normalized omic data.

Using the "Select observations" button, you can choose the observations to use as input. For more information see the section on Cell/sample selection.

It is strongly advised to use normalized data and to ensure that any batch effect has been removed from the data.
- Using unnormalized data makes it impossible to compare expression values across genes.
- Batch effects can create spurious associations between genes, as well as hide real ones, either way undermining the validity of the analysis.
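The module only requires the input to be normalized and does not prescribe a method here. As an illustration of why raw counts mislead, the sketch below applies one common normalization (scaling every cell to the same total, then taking log(1 + x)) to two hypothetical cells that have identical gene proportions but different sequencing depth. The function name and the `target_sum` of 10,000 are assumptions, and batch-effect removal is not covered.

```python
import math

def log_normalize(counts, target_sum=10_000):
    """Scale each cell's counts to the same total, then apply log(1 + x).

    counts: list of cells, each a list of raw counts in the same gene order.
    target_sum: hypothetical per-cell total chosen for this sketch.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            normalized.append([0.0] * len(cell))  # empty cell: nothing to scale
            continue
        scale = target_sum / total
        normalized.append([math.log1p(count * scale) for count in cell])
    return normalized

# Two hypothetical cells with the same gene proportions but 10x different depth.
raw = [[10, 30, 60], [100, 300, 600]]
norm = log_normalize(raw)
print(norm[0] == norm[1])  # True: the depth difference is gone
```

Before normalization the second cell looks ten times more active for every gene; afterwards both cells are identical, so remaining differences reflect biology rather than sequencing depth.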

4 Setting parameters

Once all input tracks have been selected, the "Set parameters" field is displayed with the following tabs: "Gene of interest", "Feature selection" and "Method".

4.1 Gene of interest

Here you can select one or more hub genes of your gene regulatory network. The correlations between these genes and all of their predicted interacting partners will be calculated and visualized.

4.2 Feature selection

Feature selection is the process of choosing a subset of genes or molecular features from a larger pool of candidates for further analysis, with the goal of reducing noise, improving computational efficiency and enhancing the interpretability of the network. There are multiple methods for feature selection in GRN analysis, including statistical techniques (e.g. highly variable genes) and approaches based on biological knowledge. Here you can choose one of the following options:

- Highly variable: includes the top n most highly variable genes.
- Metabolic: includes genes involved in metabolism.
- Transcription factors: includes genes that are known transcription factors.
- Transporters: includes genes that are known membrane transporters.
- Cell surface: includes genes whose products are localized on the cell surface.
- Custom: includes only the genes that the user selects.
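The "Highly variable" and "Custom" options can be sketched in a few lines. This is a simplification under stated assumptions: it ranks genes by plain variance across cells, whereas the module may use a different variability measure, and the function names, gene names and values are hypothetical. The annotation-based options (metabolic, transcription factors, transporters, cell surface) presumably work like Custom, with the gene list taken from a curated annotation instead of from the user.

```python
from statistics import pvariance

def highly_variable_genes(expr, gene_names, n_top):
    """Return the n_top genes with the highest variance across cells.

    expr: list of cells, each a list of normalized expression values
    gene_names: gene names in the same order as the columns of expr
    """
    scored = []
    for j, name in enumerate(gene_names):
        column = [cell[j] for cell in expr]
        scored.append((pvariance(column), name))
    # Highest variance first; ties broken by name so the result is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:n_top]]

def custom_genes(gene_names, user_list):
    """Keep only the user-selected genes that are present in the data."""
    wanted = set(user_list)
    return [gene for gene in gene_names if gene in wanted]

genes = ["GeneA", "GeneB", "GeneC"]
expr = [[1.0, 5.0, 2.0], [1.0, 0.0, 2.5], [1.0, 9.0, 1.5]]
print(highly_variable_genes(expr, genes, 2))  # ['GeneB', 'GeneC']
print(custom_genes(genes, ["GeneC", "GeneZ"]))  # ['GeneC']
```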

4.3 Method

We use an established method, GRNBoost2, to infer candidate gene regulatory relationships. In short, for each gene, GRNBoost2 quantifies how important each of the other genes is for predicting its expression. This is done with a series of gradient boosting regression models (GBMs). A characteristic of GBMs is that they can capture both linear and non-linear regulatory relationships among genes. GRNBoost2's final output is a list of candidate gene-gene relationships, in each of which one gene was important for predicting the expression value of the other gene. Each relationship is accompanied by a dimensionless number quantifying the strength of the association between the two genes. More information on the GRNBoost2 algorithm is available in its original publication.
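To make the principle concrete, the sketch below implements a deliberately simplified version in plain Python: for each target gene, it boosts depth-1 regression trees ("stumps") on all other genes under squared-error loss and credits each split's reduction in error to the gene used for that split. This is not GRNBoost2 itself, which is distributed in the arboreto library and adds refinements such as deeper trees, subsampling and early stopping; all function names and the toy data are hypothetical.

```python
# Simplified sketch: gradient boosting with stumps and squared-error loss.
# Illustrates the idea behind GRNBoost2; it is NOT the GRNBoost2 implementation.

def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean (the error a split can reduce)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_stump(features, residuals):
    """Find the single-feature threshold split that most reduces the residual error."""
    base = sse(residuals)
    best = None  # (gain, feature_index, threshold, left_value, right_value)
    for f, column in enumerate(features):
        # Every distinct value except the largest, so both sides are non-empty.
        for threshold in sorted(set(column))[:-1]:
            left = [r for x, r in zip(column, residuals) if x <= threshold]
            right = [r for x, r in zip(column, residuals) if x > threshold]
            gain = base - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, f, threshold, mean(left), mean(right))
    return best

def boosted_importances(features, target, n_rounds=50, learning_rate=0.1):
    """Boost stumps on the target; accumulate each feature's error reduction."""
    prediction = [mean(target)] * len(target)
    importances = [0.0] * len(features)
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(target, prediction)]
        stump = best_stump(features, residuals)
        if stump is None or stump[0] <= 0:
            break  # no split improves the fit any further
        gain, f, threshold, left_value, right_value = stump
        importances[f] += gain
        prediction = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for p, x in zip(prediction, features[f])
        ]
    return importances

def infer_edges(expr, gene_names, **boost_kwargs):
    """Use every gene as a target and all other genes as candidate regulators.

    Returns (regulator, target, importance) triples, strongest first.
    """
    columns = [[cell[j] for cell in expr] for j in range(len(gene_names))]
    edges = []
    for t, target in enumerate(gene_names):
        regulator_idx = [j for j in range(len(gene_names)) if j != t]
        features = [columns[j] for j in regulator_idx]
        scores = boosted_importances(features, columns[t], **boost_kwargs)
        for j, score in zip(regulator_idx, scores):
            if score > 0:
                edges.append((gene_names[j], target, score))
    edges.sort(key=lambda edge: -edge[2])
    return edges

# Toy data (hypothetical): GeneB switches on when GeneA is high; GeneC is unrelated.
gene_a = [0, 1, 2, 3, 4, 5, 6, 7]
gene_b = [0, 0, 0, 0, 10, 10, 10, 10]
gene_c = [3, 1, 4, 1, 5, 9, 2, 6]
genes = ["GeneA", "GeneB", "GeneC"]
expr = [list(cell) for cell in zip(gene_a, gene_b, gene_c)]
edges = infer_edges(expr, genes)  # GeneA is the only regulator credited for GeneB
```

Like GRNBoost2's output, the result is a ranked list of (regulator, target, score) triples whose scores carry no direction; the sign has to come from a separate step.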

While GRNBoost2 estimates the strength of gene-gene regulatory relationships, it does not indicate whether the regulating gene enhances or suppresses the expression of the regulated gene. We use Pearson's correlation coefficient to distinguish between the two types of relationship: a positive coefficient indicates an enhancing relationship, while a negative coefficient suggests suppression of the regulated gene's expression.
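A minimal sketch of this sign step, assuming the coefficient is computed between the regulator's and the target's expression values across the selected cells. The function names are hypothetical, and labeling a coefficient of exactly zero (or a constant gene) as "undetermined" is an assumption of this sketch, not documented behavior.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equally long expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # undefined for a constant gene; treated as "no direction" here
    return cross / (sd_x * sd_y)

def edge_direction(regulator_expr, target_expr):
    """Label an edge as enhancing or suppressing from the sign of r."""
    r = pearson(regulator_expr, target_expr)
    if r > 0:
        return "enhancing", r
    if r < 0:
        return "suppressing", r
    return "undetermined", r  # assumption of this sketch

regulator = [1.0, 2.0, 3.0, 4.0]
print(edge_direction(regulator, [2.0, 4.0, 6.0, 8.0]))  # enhancing, r close to 1
print(edge_direction(regulator, [8.0, 6.0, 4.0, 2.0]))  # suppressing, r close to -1
```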

5 Performing the GRN reconstruction analysis

When all parameters are set up, click the Run button to run the analysis.

As soon as the analysis is over, a new plot will appear in your track. You can click on the "View interactively" button to explore the results of the network analysis in the interactive plot page.

6 Network visualization

The analysis results are visualized as a network, where each node represents a gene and each edge represents a regulatory relationship. Hub genes and their interacting partners are shown in different colors. Red arrows signify positive regulation (higher expression of the regulator brings higher expression of the regulated gene), while green arrows signify negative regulation (higher expression of the regulator brings lower expression of the regulated gene).

Initially, only the hub genes specified on the Create plot page and all of their interacting partners are visualized.

You can choose how many hub genes are plotted in the "Select network hub genes" menu.
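The displayed network can be thought of as a filter over the full list of candidate relationships. The hedged sketch below keeps only edges that involve a selected hub gene and assigns each the arrow color described above (red for a positive correlation, green for a negative one). The data layout and function name are hypothetical, and the grey color for a correlation of exactly zero is an assumption, since no color is documented for that case.

```python
def hub_subnetwork(edges, hub_genes, correlations):
    """Keep edges that touch a hub gene and attach an arrow color to each.

    edges: (regulator, target, importance) triples
    correlations: dict mapping (regulator, target) to Pearson's r
    """
    hubs = set(hub_genes)
    styled = []
    for regulator, target, importance in edges:
        if regulator not in hubs and target not in hubs:
            continue  # edge does not involve any selected hub gene
        r = correlations.get((regulator, target), 0.0)
        if r > 0:
            color = "red"  # positive regulation
        elif r < 0:
            color = "green"  # negative regulation
        else:
            color = "grey"  # assumption: no documented color for r == 0
        styled.append((regulator, target, importance, color))
    return styled

edges = [("GeneA", "GeneB", 0.9), ("GeneC", "GeneA", 0.4), ("GeneC", "GeneD", 0.7)]
correlations = {("GeneA", "GeneB"): 0.8, ("GeneC", "GeneA"): -0.5, ("GeneC", "GeneD"): 0.3}
view = hub_subnetwork(edges, ["GeneA"], correlations)
print(view)  # GeneC -> GeneD is dropped because it does not touch GeneA
```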

6.1 Filtering on importance score

You can filter the displayed interacting partners based on the importance score of their interactions in the "Data to show" menu.

The importance score quantifies the weight of the regulator gene in predicting the expression of the target gene. Under the hood, each regulator/target gene relationship is evaluated through Gradient Boosting Machines. This algorithm builds several models ("decision trees") that attempt to estimate the expression of the target gene on the basis of its candidate regulators. The importance score indicates to what extent the regulator helps decrease the uncertainty in predicting the expression value of the target gene across all models.

Put more simply, the higher a regulator's importance score, the more it helps predict the expression of the target gene.

These scores are dimensionless and have no straightforward, direct interpretation. They are mainly used to order the candidate regulators of each gene from the most relevant (high importance score) to the least relevant (low importance score). Note also that the original importance scores are scaled between 0 and 1 for ease of visualization; this transformation does not change the interpretation of the scores.

For example, a score of 0.5 simply means that the gene is a less likely regulator of the target gene than a gene with a score of 1. Unfortunately, no further interpretation can be provided beyond this relative ranking.
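The exact 0-1 scaling used by the module is not documented here, so the sketch below assumes, for illustration only, that raw scores are divided by the largest one. It shows that such rescaling leaves the ranking of regulators unchanged, and how a minimum-score filter similar in spirit to the "Data to show" menu might select edges. Function names and numbers are hypothetical.

```python
def scale_to_unit(edges):
    """Divide each importance by the largest one, so positive scores fall in (0, 1].

    Assumption: max-scaling is used only for illustration; the module's exact
    0-1 transformation is not documented. Scores are assumed to be positive.
    """
    if not edges:
        return []
    top = max(score for _, _, score in edges)
    return [(regulator, target, score / top) for regulator, target, score in edges]

def filter_by_importance(edges, min_score):
    """Keep edges whose (scaled) importance is at least min_score."""
    return [edge for edge in edges if edge[2] >= min_score]

def ranking(edges):
    """Regulator names ordered from highest to lowest importance."""
    return [regulator for regulator, _, _ in sorted(edges, key=lambda e: -e[2])]

# Hypothetical raw importances for three candidate regulators of GeneB.
raw = [("GeneA", "GeneB", 12.0), ("GeneC", "GeneB", 6.0), ("GeneD", "GeneB", 3.0)]
scaled = scale_to_unit(raw)
kept = filter_by_importance(scaled, 0.5)
print(ranking(raw) == ranking(scaled))  # True: scaling does not reorder regulators
```

A scaled score of 0.5 here only says that GeneC contributes half as much to predicting GeneB as GeneA does in this model; it is not a probability.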

6.2 GRN table view

In the table view you can explore the underlying results of the gene regulatory network.