Gene Regulatory Network (GRN) Reconstruction

Modified on Sun, 07 May 2023 at 07:44 PM

Identifying possible regulatory relationships


Single-cell gene expression data can be used to identify gene regulatory mechanisms. A gene regulatory mechanism (or relationship) is present whenever a gene (typically a transcription factor) regulates the expression level of a second gene. Such regulation often happens through a protein, encoded by the regulator, that binds to the promoter of the regulated gene. When several such regulatory relationships are identified, they can be summarized and visualized as a network, namely a gene regulatory network (GRN).


The GRN reconstruction module allows you to identify candidate gene regulatory relationships by analyzing cross-sectional, observational data. We call these relationships "candidate" to underline that this type of analysis does not provide definitive evidence for the existence of a regulatory relationship between genes. Experimental evidence (e.g., knock-out experiments) is needed to confirm them.


1 Creating a Plot


The first step of the analysis is to create a plot by clicking the create plot icon. This opens a section where you can select the algorithm of interest, in this case Network analysis.

 


To keep your plots organized, a name and a description must be entered in the appropriate fields. Under "Choose algorithm to run your analysis", select "GRN Reconstruction (GRNBoost2)".


2 Selecting data 


In the "Choose track element" field, the input can be a Normal or an Engineered matrix. For more information about the data to use as input, see the section on Useful concepts.

Using the "Select Cells" button, you can choose the observations to use as input. For more information, see the section on Cell/sample selection.



It is strongly advised to use normalized data and to ensure that any batch effect has been removed from the data.
- Using unnormalized data makes it impossible to compare expression values across genes.
- Batch effects can create spurious associations between genes, as well as hide real ones, in either case undermining the validity of the analysis.
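As an illustration only, one common way to normalize single-cell counts is to rescale each cell to the same total count and then apply a log(1 + x) transform. The function name `normalize_counts` and the target total of 10,000 are assumptions made for this sketch, not necessarily what the platform does internally; batch-effect removal is not covered here.

```python
import math


def normalize_counts(counts, target_sum=10_000.0):
    """Library-size normalization followed by log1p (illustrative sketch).

    counts: list of cells, each a list of raw gene counts in the same gene order.
    Returns a new matrix of log(1 + rescaled count) values.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # A cell with no counts cannot be rescaled; keep it as zeros.
            normalized.append([0.0] * len(cell))
            continue
        # Rescale so that every cell sums to target_sum before the log transform.
        scale = target_sum / total
        normalized.append([math.log1p(x * scale) for x in cell])
    return normalized


# Toy counts (2 cells x 2 genes); the second cell was sequenced 10x deeper.
raw = [[1, 3], [10, 30]]
print(normalize_counts(raw))
```

In this toy example the second cell has ten times the sequencing depth of the first, yet after normalization both cells have identical values, which is what makes expression levels comparable.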


3 Setting parameters 

Once all input tracks have been selected, the Set parameters field is displayed with the following tabs: Number of highly variable genes and Genes to visualize.

 

3.1 Setting parameters - Number of highly variable genes 

GRN reconstruction algorithms are computationally demanding. In particular, computation time scales approximately quadratically with the number of genes considered. For example, if analyzing 500 genes takes 100 seconds on your data, analyzing twice as many genes (1000) will take four times longer (400 seconds). To keep computation times within an acceptable range, it is advisable to include only the most variable genes in the analysis. The rationale behind this strategy is that genes with nearly constant expression are unlikely to be involved in any detectable regulatory relationship.
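The arithmetic in the example above can be reproduced with a one-line estimate. This sketch assumes runtime is exactly proportional to the square of the gene count; real runs also depend on the number of cells and on the hardware, and `estimate_runtime` is a hypothetical helper, not part of the platform.

```python
def estimate_runtime(known_genes, known_seconds, new_genes):
    """Estimate runtime assuming time grows with the square of the gene count."""
    return known_seconds * (new_genes / known_genes) ** 2


print(estimate_runtime(500, 100, 1000))  # 400.0
```

Doubling the gene count quadruples the estimated time, so going from 500 to 2000 genes would multiply it by sixteen.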


If you are not sure how many genes to include initially, you can start with a low number, e.g., 500, and raise it later if you deem it appropriate for your analysis.
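A minimal sketch of how the most variable genes could be picked, assuming plain variance across cells as the ranking criterion. The function `top_variable_genes` is hypothetical, and dedicated single-cell tools typically use more refined, dispersion-based criteria, so this only illustrates the idea.

```python
from statistics import pvariance


def top_variable_genes(expression, gene_names, n_top):
    """Return the n_top gene names with the highest variance across cells.

    expression: list of cells, each a list of (normalized) values in gene order.
    gene_names: gene names in the same order as the columns of expression.
    """
    # Transpose: one tuple of values per gene, across all cells.
    per_gene = list(zip(*expression))
    variances = [pvariance(values) for values in per_gene]
    # Rank gene indices from most to least variable.
    ranked = sorted(range(len(gene_names)), key=lambda i: variances[i], reverse=True)
    return [gene_names[i] for i in ranked[:n_top]]


# Toy data: 3 cells x 3 genes; gene "A" is constant, "B" varies the most.
expr = [[1.0, 5.0, 2.0], [1.0, 0.0, 2.5], [1.0, 9.0, 1.5]]
print(top_variable_genes(expr, ["A", "B", "C"], 2))
```

The constant gene "A" is dropped first, matching the rationale that genes with nearly constant expression carry little information about regulation.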



3.2 Setting parameters - Genes to visualize

Even with a relatively low number of genes (e.g., 500), it may be difficult to discern any relevant pattern if the whole network is visualized at once. Here you can select a few genes around which the network visualization will be built: the selected genes are shown along with any other gene that regulates them or is directly regulated by them. This makes it easier to inspect the regulatory mechanisms that involve the genes of interest. Leave this field blank to visualize the whole network.
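The neighborhood filter described above can be sketched as follows, assuming the network is stored as a list of (regulator, target, importance) tuples. The function names and the toy edges (with made-up importance values) are hypothetical; the sketch keeps only edges that touch a selected gene, which is an assumption about how edges between two neighboring genes are handled.

```python
def neighborhood_edges(edges, selected):
    """Keep edges in which the regulator or the target is a selected gene.

    edges: list of (regulator, target, importance) tuples.
    selected: collection of gene names; if empty, the whole network is kept.
    """
    if not selected:
        return list(edges)
    chosen = set(selected)
    return [e for e in edges if e[0] in chosen or e[1] in chosen]


def nodes_of(edges):
    """Collect every gene that appears as a regulator or a target."""
    return {gene for regulator, target, _ in edges for gene in (regulator, target)}


# Toy network with made-up importance scores.
edges = [("HIF1A", "ENO1", 3.2), ("MYC", "HIF1A", 1.1), ("MYC", "CDK4", 2.0)]
sub = neighborhood_edges(edges, ["HIF1A"])
print(sub)
print(nodes_of(sub))
```

Selecting HIF1A keeps its target ENO1 and its regulator MYC, while the MYC to CDK4 edge is dropped because it does not involve HIF1A.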


4 Methodology


We use an established method, GRNBoost2, to infer candidate gene regulatory relationships. In short, for each gene, GRNBoost2 attempts to quantify which other genes are most important for predicting its expression. This is done by fitting a series of gradient boosting regression models (GBMs), one per target gene. A characteristic of GBMs is that they can capture both linear and non-linear relationships among genes. The final product of GRNBoost2 is a list of candidate gene-gene relationships, in each of which one gene was important for predicting the expression of the other. Each relationship is accompanied by a dimensionless score quantifying the strength of the association between the two genes. More information on the GRNBoost2 algorithm is available in its original publication.


While GRNBoost2 estimates the strength of gene-gene regulatory relationships, it does not indicate whether the regulating gene enhances or suppresses the expression of the regulated gene. We use Pearson's correlation coefficient to distinguish between these two types of relationships: a positive coefficient indicates an enhancing relationship, while a negative coefficient suggests that the regulator suppresses the expression of the regulated gene.
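The sign assignment can be illustrated with a short sketch: compute Pearson's r between the regulator's and the target's expression across cells and read off its sign. The helper names `pearson` and `edge_sign`, and the "undetermined" label for a zero or undefined correlation, are assumptions made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant gene; report 0.0.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def edge_sign(regulator_values, target_values):
    """Label an edge by the sign of the correlation across cells."""
    r = pearson(regulator_values, target_values)
    if r > 0:
        return "enhancing", r
    if r < 0:
        return "suppressing", r
    return "undetermined", r


# Toy expression of one regulator and two targets across 4 cells.
print(edge_sign([1, 2, 3, 4], [2, 4, 6, 8]))
print(edge_sign([1, 2, 3, 4], [8, 6, 4, 2]))
```

Note that the correlation only supplies the direction of the effect; the strength of the relationship still comes from the GRNBoost2 score.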


5 Performing the GRN reconstruction analysis


When all parameters are set, click the Run button to start the analysis.

As soon as the analysis is complete, a new table will appear in your track. Click the "View interactively" button to explore the results of the network analysis in the interactive plot page.

6 Network visualization

The analysis results are visualized as a network, where each node represents a gene and each edge represents a regulatory relationship. Red arrows indicate positive regulation (higher expression of the regulator is associated with higher expression of the regulated gene), while green arrows indicate negative regulation (higher expression of the regulator is associated with lower expression of the regulated gene).


Initially, only the genes indicated in the "Genes to visualize" field of the "Set parameters" panel are visualized, along with the other genes directly connected to them. Furthermore, for each selected gene only the strongest relationships are shown. You can modify both (a) the genes to visualize and (b) the maximum number of edges to visualize. Below, the same network is shown after restricting the maximum number of edges to visualize to 5.
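Limiting the view to the strongest relationships can be sketched as keeping, for each selected gene, the edges with the highest importance scores. This assumes the maximum number of edges applies per selected gene and that edges are (regulator, target, importance) tuples; `strongest_edges` and the toy scores are hypothetical.

```python
def strongest_edges(edges, selected, max_edges):
    """For each selected gene, keep its max_edges highest-scoring edges.

    edges: list of (regulator, target, importance) tuples.
    Edges shared by two selected genes are included only once.
    """
    kept = []
    seen = set()
    for gene in selected:
        touching = [e for e in edges if gene in (e[0], e[1])]
        # Strongest relationships first.
        touching.sort(key=lambda e: e[2], reverse=True)
        for edge in touching[:max_edges]:
            if edge not in seen:
                seen.add(edge)
                kept.append(edge)
    return kept


# Toy network with made-up importance scores.
edges = [
    ("HIF1A", "ENO1", 3.2),
    ("MYC", "HIF1A", 1.1),
    ("HIF1A", "VWA1", 0.5),
    ("MYC", "CDK4", 2.0),
]
print(strongest_edges(edges, ["HIF1A"], 2))
```

Lowering `max_edges` trims the weakest connections first, which is why a smaller limit produces a sparser, easier-to-read network.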


Clicking one of the nodes opens the corresponding GeneCards page, as shown below for the VWA1 gene.



Finally, clicking an edge opens the StringDB page listing the known interactions between the two genes connected by that edge. Below is the page corresponding to the edge connecting ENO1 and HIF1A.

 

