Rule-based meta-analysis

Modified on Wed, 13 Sep, 2023 at 3:01 PM

TABLE OF CONTENTS

Introduction
1 Algorithm settings
2. Accessing Results

Introduction

A common goal of gene expression studies is to find an association between the expression of a certain gene and some biological phenomenon. The associations predicted by gene expression studies need to be confirmed with expensive and laborious wet lab validation techniques, so it is worthwhile to narrow the focus of validation efforts to only a handful of genes of interest. Usually this is done by applying a threshold to some kind of ranked list of genes, such as the results of a marker gene or differential gene expression analysis. This approach can be strengthened further by combining multiple ranked gene lists, each with its own threshold, with additional criteria on gene meta-data. To this end we can use the rule-based meta-analysis, in which we specify the rules (i.e. thresholds) that genes have to pass in multiple ranked gene lists and the gene meta-data values a gene needs to have in order to be considered a gene of interest for further study.

1 Algorithm settings

1.1 Creating a plot

As a first step of the analysis, a plot must be created by clicking on the create plot icon. This will lead you to a section where the analysis of interest (in this case Rule-based meta-analysis) can be selected.

The analysis can be organized by assigning a name and a description in the respective fields. Subsequently, under "choose algorithm to run your analysis", gene prioritization must be selected.

1.2 Selecting data

The UniApp performs the rule-based meta-analysis by combining the results obtained from differential analysis, geneset enrichment analysis, variation analysis, text mining, patent mining (Section [link]), cluster marker genes analysis, gene association analysis or results of other rank-based meta analyses. In addition to rank-based results, the rule-based meta-analysis can also use gene meta-data databases as input.

1.3 Setting Parameters

In this example, parameters need to be set for all the different analyses to be executed:

DGEA: Differential gene expression analysis
Marker genes: cluster marker genes
Variation analysis: based on Scent algorithm
Geneset enrichment analysis
Gene prio: gene prioritization
Text mining mentions: the number of times the genes were mentioned in the literature
Databases | Genecards: selecting specific metadata for the genes using the Genecards database

1.3.1 DGEA - Differential gene expression analysis

For DGEA ranked list the user can select:
- Name: name of the plot
- Absolute: p-value Rank and log2FC cutoffs
- Rank: p-value and Rank cutoffs
- Number or Percentage of top genes based on p-value and log2FC criteria
- MoSCoW Criteria: select whether it is a Must have, Should have, Could have, Won't have
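
The difference between absolute and rank cutoffs can be sketched in a few lines of Python. This is only an illustration and not the UniApp implementation: the gene names, the values, the function names and the choice to rank by ascending p-value and to test the absolute value of log2FC are all assumptions made for the example.

```python
import math

# Hypothetical DGEA results: gene -> (p-value, log2 fold change).
dgea = {
    "GENE_A": (0.001, 2.5),
    "GENE_B": (0.020, -1.8),
    "GENE_C": (0.300, 0.4),
    "GENE_D": (0.004, 1.1),
}

def pass_absolute(results, max_p, min_abs_log2fc):
    """Genes whose p-value and |log2FC| both satisfy the absolute cutoffs."""
    return {
        gene
        for gene, (p, log2fc) in results.items()
        if p <= max_p and abs(log2fc) >= min_abs_log2fc
    }

def pass_rank(results, top_n=None, top_percent=None):
    """Genes in the top N (or top percentage) when ranked by ascending p-value."""
    if top_n is None and top_percent is None:
        raise ValueError("give either top_n or top_percent")
    ranked = sorted(results, key=lambda gene: results[gene][0])
    if top_n is None:
        # Assumption: a percentage is rounded up to a whole number of genes.
        top_n = math.ceil(len(ranked) * top_percent / 100)
    return set(ranked[:top_n])

absolute_hits = pass_absolute(dgea, max_p=0.05, min_abs_log2fc=1.0)
rank_hits = pass_rank(dgea, top_n=2)
percent_hits = pass_rank(dgea, top_percent=50)
print(sorted(absolute_hits), sorted(rank_hits), sorted(percent_hits))
```

An absolute rule keeps every gene that meets the cutoffs, so the number of passing genes depends on the data, whereas a rank rule always keeps a fixed number or fraction of the list.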

1.3.2 Marker genes

For marker gene ranked list the user can select the following:

Which clusters to include under select columns
Rank (top number) or top percentage of genes criteria for all clusters together
Or set separate criteria for individual clusters
MoSCoW Criteria: select whether it is a Must have, Should have, Could have, Won't have

1.3.3 Variation analysis

For Variation analysis ranked list the user can select the following:

Rank cutoff
Number or Percentage of top genes based on the analysis results
Invert rank: specifies whether the ranks should be inverted
MoSCoW Criteria: select whether it is a Must have, Should have, Could have, Won't have
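
As a rough sketch of what inverting the ranks could mean for a top-N rule, the example below assumes that rank 1 goes to the highest score by default and that inverting reverses that order. The scores, gene names and the function `top_genes` are hypothetical, and the actual ranking direction used by the UniApp may differ.

```python
# Hypothetical variation scores per gene.
scores = {"GENE_A": 0.9, "GENE_B": 0.1, "GENE_C": 0.5}

def top_genes(scores, top_n, invert=False):
    """Return the top_n genes; by assumption the highest score ranks first
    unless invert is True, in which case the lowest score ranks first."""
    ranked = sorted(scores, key=scores.get, reverse=not invert)
    return ranked[:top_n]

default_top = top_genes(scores, top_n=2)
inverted_top = top_genes(scores, top_n=2, invert=True)
print(default_top, inverted_top)
```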

1.3.4 Geneset enrichment analysis

For Geneset enrichment analysis ranked list, the user can select the following:

Absolute Min and Max p-value cutoff
Absolute Min and Max log2FC cutoff
Absolute MoSCoW Criteria: select whether it is a Must have, Should have, Could have, Won't have
Rank Top p-value and Top log2FC values of genes based on the analysis results

1.3.5 Text mining mentions

For text mining ranked lists the user can select the rank (top # or top % of genes) criteria. TBD

1.3.6 Gene prioritization

For gene prioritization ranked list, the user can select:

Absolute: user can select the minimum and maximum cutoff values for the AUC and score
Rank: top number/percentage of genes based on AUC and score

MoSCoW Criteria: select whether it is a Must have, Should have, Could have, Won't have

1.3.7 Databases | GeneCards

For databases like GeneCards, the user defines if the query gene has appropriate gene metadata values as described in the selected database.

When setting up the rule-based meta-analysis, you can set the rules for the gene metadata table via a gene selection pivot table. You can use it to select your genes of interest based on whether they satisfy gene metadata criteria. You can make selections on both categorical and numerical variables. The resulting gene selection table will display all your genes of interest and whether they pass your selection criteria.

You can finalize your gene selection by clicking the Finalize gene selection button in the Finalize gene selection tab. Genes that pass your selection criteria have the value "True", while those that do not have the value "False".
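
The True/False outcome of a metadata selection can be illustrated with a small Python sketch. The metadata fields (`category`, `score`), their values and the function `select_genes` are invented for this example and do not reflect the actual GeneCards columns or the UniApp internals.

```python
# Hypothetical gene metadata table: gene -> metadata fields.
metadata = {
    "GENE_A": {"category": "Protein Coding", "score": 12.5},
    "GENE_B": {"category": "Pseudogene", "score": 30.0},
    "GENE_C": {"category": "Protein Coding", "score": 2.0},
}

def select_genes(metadata, categorical, numeric_ranges):
    """Mark each gene True if it meets every categorical and numerical criterion.

    categorical:    field -> set of allowed values
    numeric_ranges: field -> (minimum, maximum), both inclusive
    """
    selection = {}
    for gene, fields in metadata.items():
        ok_categorical = all(
            fields.get(field) in allowed for field, allowed in categorical.items()
        )
        ok_numeric = all(
            fields.get(field) is not None and low <= fields[field] <= high
            for field, (low, high) in numeric_ranges.items()
        )
        selection[gene] = ok_categorical and ok_numeric
    return selection

selection = select_genes(
    metadata,
    categorical={"category": {"Protein Coding"}},
    numeric_ranges={"score": (10.0, 100.0)},
)
print(selection)
```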

2. Accessing Results

Once you have performed the rule-based meta-analysis, it will appear as an interactive plot in your track. Click on the View interactively button to explore the results of the rule-based meta-analysis via a pivot table. Additionally, you can visualize and assess the similarity of the sets of features that have passed the thresholds in the rule-based meta-analysis with a Venn diagram, UpSet plot or Jaccard similarity.

2.1 Pivot table

This will be the initial setup of the pivot table:

The Name variable contains the names of the specific rules you have set. The ThreshholdTest variable indicates which analysis type served as the basis for the rule. The Moscow variable indicates which MoSCoW criterion was applied to each of the rules.

You can drag and drop the labels into the pivot table to set it up, like so:

From the Select input menu, you can select the genes whose results you would like to examine in the pivot table. For example, we can search for the ABCA5 gene:

Once selected you can see the gene is added to the pivot table as a new variable. Drag it into the pivot table to see if ABCA5 passed the set rules:

The gene ABCA5 has only passed the gene prioritization rule. You can customize this pivot table further by adding more genes, subsetting the rules or visualizing the results in different ways.
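
The kind of table the pivot view presents can be approximated with the following Python sketch. The field names mirror the pivot table variables (Name, ThreshholdTest, Moscow), but the rule names, the gene sets and the function `gene_rows` are made up for illustration and are not taken from a real analysis.

```python
# Hypothetical rules with the set of genes that passed each one.
rules = [
    {"Name": "DGEA top 100", "ThreshholdTest": "DGEA",
     "Moscow": "Must have", "passed": {"GENE_A", "GENE_B"}},
    {"Name": "Marker genes top 5%", "ThreshholdTest": "Marker genes",
     "Moscow": "Should have", "passed": {"GENE_B"}},
    {"Name": "Gene prio AUC", "ThreshholdTest": "Gene prio",
     "Moscow": "Could have", "passed": {"ABCA5", "GENE_A"}},
]

def gene_rows(rules, gene):
    """One row per rule, with an extra column telling whether the gene passed."""
    return [
        {
            "Name": rule["Name"],
            "ThreshholdTest": rule["ThreshholdTest"],
            "Moscow": rule["Moscow"],
            gene: gene in rule["passed"],
        }
        for rule in rules
    ]

rows = gene_rows(rules, "ABCA5")
for row in rows:
    print(row)

# Names of the rules the selected gene has passed.
passed_rules = [row["Name"] for row in rows if row["ABCA5"]]
```

Adding a gene through the Select input menu corresponds to adding one more True/False column of this kind.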

2.2 Venn diagram

To examine the set similarity of features that have passed thresholds in the rule-based meta-analysis as a Venn diagram, click on the Venn diagram button. From the Datasets to plot menu in the Select input tab, you can select which datasets to plot on the Venn diagram. For each of the datasets, only genes that have passed the set threshold criteria will be included in the Venn diagram.

The Venn diagram is only practical up to a certain number of input sets. If you exceed this number a message will be displayed informing you to reduce the number of input sets.

2.3 UpSet plot

To examine the set similarity of features that have passed thresholds in the rule-based meta-analysis as an UpSet plot, click on the UpSet plot button. From the Datasets to plot menu in the Select input tab, you can select which datasets to plot on the UpSet plot. For each of the datasets, only genes that have passed the set threshold criteria will be included in the UpSet plot.

The UpSet plot is only practical up to a certain number of input sets. If you exceed this number a message will be displayed informing you to reduce the number of input sets.

The UpSet plot shows intersections in a matrix, with the rows of the matrix corresponding to the sets and the columns to the intersections between these sets, like in the example below. The sizes of the sets and of the intersections are represented as bar graphs. In each column, the cell of a set is filled in if that set is part of the intersection. If a column has multiple filled-in cells, they are connected with a line, which denotes the intersection of multiple sets.

When interpreting the UpSet plot it is important to remember that the intersection size is calculated using the "distinct" mode. See image below:
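
As a hedged illustration of the "distinct" mode as it is commonly defined for UpSet plots, each gene is counted only in the intersection made up of exactly the sets it belongs to, so the intersection sizes add up to the total number of unique genes. The set names, gene names and the function `distinct_intersections` below are hypothetical.

```python
from collections import Counter

# Hypothetical sets of genes that passed each rule.
gene_sets = {
    "DGEA": {"GENE_A", "GENE_B", "GENE_C", "GENE_F"},
    "Markers": {"GENE_B", "GENE_C", "GENE_D"},
    "GenePrio": {"GENE_C", "GENE_E"},
}

def distinct_intersections(gene_sets):
    """Count genes per exact combination of sets ("distinct" mode)."""
    membership = {}
    for set_name, genes in gene_sets.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(set_name)
    # Each gene contributes to exactly one combination of sets.
    return Counter(frozenset(names) for names in membership.values())

counts = distinct_intersections(gene_sets)
for combination, size in counts.items():
    print(sorted(combination), size)
```

In this example GENE_C counts only towards the three-way intersection and not towards the DGEA and Markers pair, which is why the pair has size 1 rather than 2.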

2.4 Jaccard similarity

The Jaccard similarity coefficient is used to examine the similarity of sets. To examine the set similarity of features that have passed thresholds in the rule-based meta-analysis using Jaccard similarity, click on the Jaccard similarity plot button. From the Datasets to plot menu in the Select input tab, you can select which datasets to plot on the Jaccard similarity plot. For each of the datasets, only genes that have passed the set threshold criteria will be included in the Jaccard similarity plot.

The plot is generated in the following way. First, the pair-wise Jaccard similarity coefficients are calculated for the sets specified as input. Then a principal component analysis is performed using the table of pair-wise Jaccard similarity coefficients as input. Finally, a 3D PCA plot is rendered using the first three principal components. Each set is represented as a dot on the plot and can be color-coded for groups. Sets (i.e. dots) that are closer together have higher set similarity.
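
The first step, the table of pair-wise Jaccard similarity coefficients, can be sketched as follows. The Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. The set names, gene names and function names are hypothetical, and the PCA and 3D rendering steps are not shown here.

```python
# Hypothetical sets of genes that passed each rule.
gene_sets = {
    "DGEA": {"GENE_A", "GENE_B", "GENE_C"},
    "Markers": {"GENE_B", "GENE_C", "GENE_D"},
    "GenePrio": {"GENE_E"},
}

def jaccard(a, b):
    """Jaccard similarity: |a & b| / |a | b|."""
    union = a | b
    # Convention assumed here: two empty sets get a similarity of 0.0.
    return len(a & b) / len(union) if union else 0.0

def pairwise_jaccard(gene_sets):
    """Symmetric table of Jaccard coefficients for every pair of sets."""
    names = sorted(gene_sets)
    return {
        row: {col: jaccard(gene_sets[row], gene_sets[col]) for col in names}
        for row in names
    }

table = pairwise_jaccard(gene_sets)
for name, row in table.items():
    print(name, row)
```

Each row of this table describes one set by its similarity to all other sets, which is the kind of input a principal component analysis can then reduce to three dimensions.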