1 Identifying possible regulatory relationships
Single-cell gene expression data can be used to identify gene regulatory mechanisms. A gene regulatory mechanism (or relationship) is present whenever a gene (typically a transcription factor) regulates the expression level of a second gene. Such regulation often happens through a protein encoded by the regulator that binds to the promoter of the regulated gene. When several such regulatory relationships are identified, they can be summarized and visualized as a network, namely a gene regulatory network (GRN).
The GRN reconstruction module allows you to identify candidate gene regulatory relationships by analyzing cross-sectional, observational data. We call these relationships "candidate" to underline that this type of analysis does not provide definitive evidence that a regulatory relationship exists between two genes. Experimental evidence (e.g., knock-out experiments) is needed to confirm them.
2 Creating a Plot
The first step of the analysis is to create a plot by clicking on the create plot icon. This opens a section where you can select the algorithm of interest, in this case Gene regulatory network.
3 Selecting data
In the field "Choose track element", you can select the input for the gene regulatory network algorithm. The algorithm only accepts normalized scRNA-seq data.
Using the "Select observations" button, you can choose the observations to use as input. For more information see the section on Cell/sample selection.
It is strongly advised to use normalized data and to ensure that any batch effect has been removed from the data:
- Using unnormalized data makes it impossible to compare expression values meaningfully, since differences in sequencing depth between cells are mixed with biological differences.
- Batch effects can create spurious associations between genes, as well as hide real ones, either way undermining the validity of the analysis.
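Both reasons can be illustrated with simulated data. The sketch below assumes a cells x genes matrix of raw counts; the depth normalization shown (scale every cell to a common total, then apply log1p) is a common convention but not necessarily the one applied by the platform, and per-batch mean centering is a deliberately crude stand-in for real batch-correction methods. All function names are hypothetical.

```python
import numpy as np

def normalize_log1p(counts, target_sum=1e4):
    """Scale every cell (row) to the same total count, then apply log1p.

    counts: cells x genes array of raw counts (assumed layout).
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / totals * target_sum)

def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def center_per_batch(values, batch):
    """Crude batch correction for illustration only: subtract each batch's mean."""
    out = np.array(values, dtype=float)
    for b in np.unique(batch):
        mask = batch == b
        out[mask] -= out[mask].mean()
    return out

# Two cells with the same composition but a 10-fold difference in depth
print(normalize_log1p([[1, 3], [10, 30]]))  # both rows become identical

# Simulated data: two genes that are independent within each batch
rng = np.random.default_rng(0)
n = 500
batch = np.repeat([0, 1], n)
gene_a = rng.normal(size=2 * n) + 3.0 * batch  # technical shift in batch 1
gene_b = rng.normal(size=2 * n) + 3.0 * batch  # same shift, no biological link

r_raw = pearson(gene_a, gene_b)
r_corrected = pearson(center_per_batch(gene_a, batch), center_per_batch(gene_b, batch))
print(f"with batch effect: r = {r_raw:.2f}")
print(f"after per-batch centering: r = {r_corrected:.2f}")
```

The shared shift alone produces a clearly positive correlation between two unrelated genes, which disappears once the batch means are removed.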
4 Setting parameters
Once all input tracks have been selected, the "Set parameters" field is displayed with the following tabs: "Gene of interest" and "Method".
4.1 Gene of interest
Here you can select the hub gene of your gene regulatory network. The correlations between this gene and each of its predicted interacting partners are calculated and visualized. Correlations between the interacting partners themselves are not considered.
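The restriction to hub-centred relationships can be sketched as a simple filter over candidate edges. This assumes edges are stored as (regulator, target, importance) tuples; the function names, the edge format and the gene names `HUB`, `G1`, `G2` are hypothetical.

```python
def hub_subnetwork(edges, hub):
    """Keep only candidate edges that involve the hub gene.

    edges: iterable of (regulator, target, importance) tuples (hypothetical format).
    Edges between two non-hub partners are dropped.
    """
    return [(reg, tgt, imp) for reg, tgt, imp in edges if hub in (reg, tgt)]

def partners(edges, hub):
    """Return the genes linked to the hub, whether as regulator or as target."""
    return {tgt if reg == hub else reg for reg, tgt, _ in hub_subnetwork(edges, hub)}

candidate_edges = [
    ("HUB", "G1", 5.2),
    ("G2", "HUB", 3.1),
    ("G1", "G2", 4.0),  # partner-partner edge: not considered
]
star = hub_subnetwork(candidate_edges, "HUB")
print(star)
print(sorted(partners(candidate_edges, "HUB")))
```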
4.2 Feature selection
Feature selection is the process of choosing a subset of genes or molecular features from a larger pool of candidates for further analysis, with the goal of reducing noise, improving computational efficiency, and enhancing the interpretability of the network. Feature selection for GRN analysis can rely on statistical criteria (e.g., highly variable genes) or on biological knowledge (e.g., curated lists of transcription factors). Here you can choose one of the following options:
- Highly variable - includes the top n most highly variable genes.
- Metabolic - includes genes involved in metabolism.
- Transcription factors - includes genes that are known transcription factors.
- Transporters - includes genes that are known membrane transporters.
- Cell surface - includes genes whose products are localized on the cell surface.
- Custom - includes only genes that the user selects.
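Two kinds of feature selection listed above can be sketched in a few lines: a variance ranking for the "Highly variable" option and an intersection with a curated gene list for the knowledge-based options. This assumes a cells x genes matrix of normalized expression; ranking by plain variance is a simplification (dedicated tools often rank genes by mean-adjusted dispersion), and the function names and curated list are hypothetical.

```python
import numpy as np

def top_variable_genes(expr, gene_names, n_top):
    """Return the n_top genes with the highest variance across cells.

    expr: cells x genes matrix of normalized expression (assumed layout).
    """
    variances = np.asarray(expr, dtype=float).var(axis=0)
    order = np.argsort(variances)[::-1][:n_top]  # descending variance
    return [gene_names[i] for i in order]

def select_from_list(gene_names, curated):
    """Keep genes present in a curated set, preserving the original order."""
    return [g for g in gene_names if g in curated]

expr = [[0, 1, 5], [0, 1, 0], [0, 2, 5], [0, 2, 0]]
genes = ["G1", "G2", "G3"]
print(top_variable_genes(expr, genes, 2))
print(select_from_list(genes, {"G1", "G3"}))  # hypothetical curated list
```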
We use an established method, GRNBoost2, to infer candidate gene regulatory relationships. In short, for each gene, GRNBoost2 estimates how important each of the other genes is for predicting its expression. It does so by fitting gradient boosting regression models (GBMs), one per target gene. A characteristic of GBMs is that they can capture both linear and non-linear relationships among genes. The final output of GRNBoost2 is a list of candidate gene-gene relationships, in each of which one gene (the candidate regulator) contributed to predicting the expression of the other (the target). Each relationship is accompanied by a dimensionless importance score quantifying the strength of the association between the two genes. More information on the GRNBoost2 algorithm is available in its original publication.
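The idea behind GRNBoost2 can be illustrated with a simplified, self-contained sketch: for each target gene, a gradient boosting model built from depth-1 regression trees (stumps) predicts its expression from all other genes, and each gene's importance is the total reduction in squared error achieved by splits on it. This is not GRNBoost2 itself; the reference implementation is `grnboost2` in the arboreto Python package, which among other differences can restrict candidate regulators to a list of transcription factors. The fixed number of rounds, the learning rate, the quantile-based thresholds and all function names here are illustrative assumptions.

```python
import numpy as np

def fit_stump(X, residual, n_thresholds=19):
    """Best single split (depth-1 regression tree) of the residuals.

    Returns (gain, feature, threshold, left_value, right_value) or None,
    where gain is the reduction in the sum of squared errors.
    """
    base_sse = float(((residual - residual.mean()) ** 2).sum())
    quantiles = np.linspace(0.05, 0.95, n_thresholds)
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(np.quantile(X[:, j], quantiles)):
            left = X[:, j] <= thr
            if left.all() or not left.any():
                continue  # skip splits that leave one side empty
            rl, rr = residual[left], residual[~left]
            sse = float(((rl - rl.mean()) ** 2).sum() + ((rr - rr.mean()) ** 2).sum())
            if best is None or base_sse - sse > best[0]:
                best = (base_sse - sse, j, thr, rl.mean(), rr.mean())
    return best

def boosted_importances(X, y, n_rounds=50, learning_rate=0.1):
    """Squared-loss gradient boosting with stumps; importance = summed gain per feature."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    pred = np.full_like(y, y.mean())
    importance = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        residual = y - pred  # negative gradient of the squared loss
        stump = fit_stump(X, residual)
        if stump is None:
            break
        gain, j, thr, left_value, right_value = stump
        importance[j] += gain
        pred = pred + learning_rate * np.where(X[:, j] <= thr, left_value, right_value)
    return importance

def infer_network(expr, gene_names, **kwargs):
    """One boosted model per target gene; every other gene is a candidate regulator."""
    expr = np.asarray(expr, dtype=float)
    edges = []
    for t, target in enumerate(gene_names):
        others = [k for k in range(len(gene_names)) if k != t]
        importances = boosted_importances(expr[:, others], expr[:, t], **kwargs)
        for k, score in zip(others, importances):
            if score > 0:
                edges.append((gene_names[k], target, float(score)))
    return sorted(edges, key=lambda e: e[2], reverse=True)

# Simulated example: G2 depends non-linearly on G1, G3 is unrelated noise
rng = np.random.default_rng(1)
g1 = rng.normal(size=300)
g2 = g1 ** 2 + 0.1 * rng.normal(size=300)
g3 = rng.normal(size=300)
edges = infer_network(np.column_stack([g1, g2, g3]), ["G1", "G2", "G3"])
scores = {(reg, tgt): s for reg, tgt, s in edges}
for reg, tgt, s in edges[:3]:
    print(f"{reg} -> {tgt}: importance {s:.1f}")
```

The quadratic dependence of G2 on G1 illustrates why tree-based boosting is attractive here: the relationship is strongly predictive even though it is not linear.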
While GRNBoost2 estimates the strength of gene-gene regulatory relationships, it does not indicate whether the regulating gene enhances or suppresses the expression of the regulated gene. We therefore use Pearson's correlation coefficient to distinguish between the two types of relationship: a positive coefficient indicates an enhancing relationship, while a negative coefficient suggests that the regulator suppresses the expression of the regulated gene.
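Assigning a direction to each edge from Pearson's correlation can be sketched as follows. The sketch assumes (regulator, target, importance) edge tuples and a cells x genes matrix whose columns follow `gene_names`; the function name and the "undetermined" label for a zero or undefined coefficient are hypothetical.

```python
import numpy as np

def label_edge_signs(edges, expr, gene_names):
    """Attach Pearson's r and a direction label to each (regulator, target, importance) edge.

    expr: cells x genes matrix whose columns follow the order of gene_names.
    """
    column = {g: i for i, g in enumerate(gene_names)}
    expr = np.asarray(expr, dtype=float)
    labelled = []
    for reg, tgt, importance in edges:
        r = float(np.corrcoef(expr[:, column[reg]], expr[:, column[tgt]])[0, 1])
        if r > 0:
            direction = "enhancing"
        elif r < 0:
            direction = "suppressing"
        else:
            # hypothetical label for r == 0 or undefined r (e.g. a constant gene)
            direction = "undetermined"
        labelled.append((reg, tgt, importance, r, direction))
    return labelled

expr = [[1, 2, 4], [2, 4, 3], [3, 6, 2], [4, 8, 1]]  # columns: G1, G2, G3
edges = [("G1", "G2", 1.0), ("G1", "G3", 0.5)]
for row in label_edge_signs(edges, expr, ["G1", "G2", "G3"]):
    print(row)
```

Pearson's r measures linear association, so an edge with a strongly non-monotonic shape (e.g., quadratic) can receive an r close to zero and therefore an uninformative sign.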
5 Performing the GRN reconstruction analysis
When all parameters are set up, click the Run button to run the analysis.
As soon as the analysis is over, a new plot will appear in your track. You can click on the "View interactively" button to explore the results of the network analysis in the interactive plot page.
6 Network visualization
The analysis results are visualized as a network, where each node represents a gene and each edge represents a candidate regulatory relationship. The hub gene and its interacting partners are color-coded separately. Red arrows signify positive regulatory relationships (higher expression of the regulator is associated with higher expression of the regulated gene), while green arrows signify negative regulation (higher expression of the regulator is associated with lower expression of the regulated gene).
Initially, all interacting partners of the hub gene are visualized. You can filter them by the importance score (i.e., strength) of their interactions in the "Data to show" menu.
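The importance filter and the arrow colors can be mimicked in a few lines. This is a hypothetical sketch of the logic rather than the viewer's code; the tuple layout, the `min_importance` parameter name and the grey fallback for a zero correlation are assumptions.

```python
def edges_to_show(edges, min_importance):
    """Keep edges with importance >= min_importance and pick an arrow color.

    edges: (regulator, target, importance, r) tuples (hypothetical format).
    Red marks positive r, green negative r; "grey" for r == 0 is a placeholder assumption.
    """
    shown = []
    for reg, tgt, importance, r in edges:
        if importance < min_importance:
            continue
        color = "red" if r > 0 else "green" if r < 0 else "grey"
        shown.append((reg, tgt, color))
    return shown

edges = [("HUB", "G1", 5.2, 0.8), ("HUB", "G2", 1.1, -0.6), ("G3", "HUB", 3.0, -0.4)]
print(edges_to_show(edges, min_importance=2.0))
```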