In the process of analyzing and understanding the data, a crucial step is to characterize the biological states present in the data. For example, in single cell RNA-seq datasets, the objective is often to find and characterize the true biological states (clusters) of the data. The Clustering analysis in combination with the Brute force analysis can be used to find these biological states. To characterize the biological states, we developed the Marker gene analysis. The output of the Marker gene analysis is a ranked list of the most important features for each biological state.
The Marker gene analysis uses a three-step approach to obtain the ranked marker feature lists for each biological state:
- It performs a pair-wise differential analysis of all states against all other states separately and ranks the results of each pair-wise comparison by log fold change.
- The most upregulated features receive the lowest rank numbers (top ranking marker features) and the most downregulated features receive the highest rank numbers (this can be inverted, depending on whether you want to obtain the most upregulated or the most downregulated features).
- For each state, the rank numbers of all features are combined through a meta-analysis. Learn more about marker gene meta-analysis in the section on Useful concepts.
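The three steps above can be illustrated with a minimal Python sketch. It is not the tool's implementation: the toy data, the function names and the use of a simple ratio of group means as the differential analysis are all assumptions made for illustration, and the rank product is only one of the available aggregation methods.

```python
from math import log2, prod

# Hypothetical toy data: mean expression of each feature in each state.
mean_expr = {
    "state_A": {"gene1": 8.0, "gene2": 1.0, "gene3": 2.0},
    "state_B": {"gene1": 1.0, "gene2": 4.0, "gene3": 2.0},
    "state_C": {"gene1": 2.0, "gene2": 1.0, "gene3": 8.0},
}

def pairwise_ranks(target, other, invert=False):
    """Steps 1 and 2: rank features by log2 fold change of target vs. other.

    Rank 1 is the most upregulated feature unless invert is True.
    A ratio of means stands in for a real differential analysis (assumption).
    """
    lfc = {
        gene: log2(mean_expr[target][gene] / mean_expr[other][gene])
        for gene in mean_expr[target]
    }
    ordered = sorted(lfc, key=lfc.get, reverse=not invert)
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}

def marker_ranking(target, invert=False):
    """Step 3: combine the ranks from all pairwise comparisons of one state."""
    others = [state for state in mean_expr if state != target]
    per_comparison = [pairwise_ranks(target, other, invert) for other in others]
    # Rank product as the aggregation method (assumption: lower is better).
    combined = {
        gene: prod(ranks[gene] for ranks in per_comparison)
        for gene in mean_expr[target]
    }
    return sorted(combined, key=combined.get)

for state in mean_expr:
    print(state, marker_ranking(state))
```

In this toy example each state has one feature that is upregulated against both other states, so that feature ends up at the top of the state's marker list.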
The marker gene analysis can be computed on both the original features (e.g. genes) and the engineered features (e.g. pathways).
1 Creating a Plot
The first step of the analysis is to create a plot by clicking on the create plot icon. This will lead to a section where the algorithm of interest can be selected, in this case Cluster Marker Gene Analysis.
To keep the plots organized, a name and description must be assigned to the analysis in the appropriate fields. Under "Choose algorithm to run your analysis", "Marker Gene Analysis" must be selected.
2 Selecting data
In the field "Choose track element", the input can be Normal or Engineered. For more information about the data to use as input, see the section on Useful concepts.
Using the "Select Cells" button, you can choose the observations to use as input. For more information see the section on Cell/sample selection.
3 Setting parameters
Once all input tracks have been selected, design and algorithm parameters can be specified.
3.1 Setting parameters - Design
Here you can define the groups for which the marker gene analysis is performed. Groups are defined by metadata categories.
- Observation: the metadata variable whose groups you wish to compare. Note that this is supported only for categorical variables.
- Covariates: the covariates to include during the modelling. The covariates can be chosen from the variables present in the metadata.
In the "Observation" drop-down menu you can only select categorical metadata variables. However, categorical variables that have 400 or more unique values are not available, because calculating the pairwise differential analyses needed for the marker gene analysis would take a very long time. This limitation is in place to ensure optimal performance and run time.
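To give a sense of why the number of groups matters, the following sketch counts the pairwise comparisons for a given number of groups. It assumes that each unordered pair of groups is compared once; whether the tool computes each pair once or once per direction is not stated here, and the function name is hypothetical.

```python
from math import comb

def n_pairwise_comparisons(n_groups):
    """Number of unordered pairs of groups (assumption: one comparison per pair)."""
    return comb(n_groups, 2)

for n in (10, 50, 400):
    print(n, "groups ->", n_pairwise_comparisons(n), "pairwise comparisons")
```

The count grows quadratically: 10 groups give 45 comparisons, while 400 groups would already require 79,800.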
3.2 Setting parameters - Algorithm
Select the type of marker list to obtain from the Type dropdown menu. It can be one of the following:
- Unique (upregulated) and Unique (downregulated) should be used to obtain the set of markers that are uniquely upregulated or downregulated (respectively) in each biological state.
- Enriched should be used to obtain the (non-unique) set of markers that are consistently upregulated in each biological state.
- With Custom, you can define which type of marker list you would like to obtain. If the type of marker list to obtain is set to Custom, you need to set the following parameters:
  - Metric upper limit: for each biological state, the features that have a log fold change above this number (when compared to at least one other biological state) will be filtered out.
  - Metric lower limit: for each biological state, the features that have a log fold change below this number (when compared to at least one other biological state) will be filtered out.
  - Invert rank: by default, the features are ranked from the most upregulated (rank 1) to the most downregulated (rank n). To do the opposite, use this option.
- These three parameters are also used to define the Unique (upregulated), Unique (downregulated) and Enriched types:
  - Unique (upregulated): Metric upper 100, Metric lower 0, Invert rank - no.
  - Unique (downregulated): Metric upper 0, Metric lower -100, Invert rank - yes.
  - Enriched: Metric upper 100, Metric lower -100, Invert rank - no.
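The effect of the two metric limits can be sketched as follows. This is an illustration of the filtering rule described above, not the tool's code: the function name, the toy log fold changes and the dictionary layout are assumptions, and values exactly equal to a limit are assumed to be kept.

```python
# Parameter values of the predefined types, as listed above.
PRESETS = {
    "Unique (upregulated)": {"upper": 100, "lower": 0, "invert_rank": False},
    "Unique (downregulated)": {"upper": 0, "lower": -100, "invert_rank": True},
    "Enriched": {"upper": 100, "lower": -100, "invert_rank": False},
}

def filter_features(lfc_by_feature, upper, lower):
    """Keep features whose log fold change stays within [lower, upper]
    in every pairwise comparison; one value outside the limits removes the feature."""
    return [
        feature
        for feature, lfcs in lfc_by_feature.items()
        if all(lower <= value <= upper for value in lfcs)
    ]

# Hypothetical log fold changes of one state against two other states.
lfc_state_a = {
    "gene1": [3.0, 2.0],    # upregulated in both comparisons
    "gene2": [-2.0, 0.5],   # mixed direction
    "gene3": [-1.0, -2.0],  # downregulated in both comparisons
}

for name, params in PRESETS.items():
    kept = filter_features(lfc_state_a, params["upper"], params["lower"])
    print(name, kept)
```

With a lower limit of 0, only features that are upregulated against every other state survive, which is what makes the resulting markers unique to that state. The Enriched limits keep the mixed-direction feature as well and leave the ordering to the rank aggregation.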
4 Performing the marker gene analysis
When all parameters are set, you can click on the Run button to compute the marker gene analysis.
As soon as the marker gene analysis is computed, the marker gene table you just created will appear in your track. The table is interactive and sortable. You can click on the "View interactively" button to explore the results of the marker gene analysis in the interactive plot page.
The columns in the table are:
- Rank: the rank of the features on that row.
- All other columns: the groups you chose to include in the marker gene analysis.
Results of the marker gene analysis can be further visualized with a feature heatmap, for example by passing it the top 5 or 10 features. Additionally, the marker gene lists can make great supplementary tables in your research publication.
5 Marker gene analysis results customization
In the visualization box, you can customize the marker gene table in many ways.
5.1 Marker gene analysis results selection
After you click on Table settings, you can define the following parameters:
- Ranking method: the meta-analysis aggregation method to use to obtain the ranks. The choice is between Product, Sum, Median, Fisher, Range, Standard deviation, Variance and Absolute. Learn more about these ranking methods in the section on Useful concepts.
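As a rough illustration of how the choice of ranking method can change the result, the sketch below aggregates the same per-comparison ranks with Product, Sum and Median. The toy ranks and function names are hypothetical, lower aggregated scores are assumed to rank first, and the remaining methods are left out because their exact definitions are not given in this section.

```python
from math import prod
from statistics import median

# Three of the listed ranking methods; the others are intentionally not sketched.
AGGREGATORS = {
    "Product": prod,
    "Sum": sum,
    "Median": median,
}

# Hypothetical ranks of three features in two pairwise comparisons of one state.
ranks = {
    "gene1": [1, 6],
    "gene2": [3, 3],
    "gene3": [2, 4],
}

def aggregate(ranks_by_feature, method):
    """Order features by their aggregated rank (assumption: lower score first)."""
    score = {f: AGGREGATORS[method](r) for f, r in ranks_by_feature.items()}
    return sorted(score, key=score.get)

for method in AGGREGATORS:
    print(method, aggregate(ranks, method))
```

Here gene1 ranks first under Product because a single rank of 1 dominates the product, while Sum and Median favor gene2, which is ranked consistently in both comparisons.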
5.2 Marker gene analysis results export
In the export tab you can export your marker gene analysis results.
6 Useful Links