Gene expression entropy

Modified on Wed, 13 Sep, 2023 at 2:26 PM


Introduction

Gene variation analysis measures how restricted or ubiquitous the expression of each gene is within a dataset. For example, candidate genes for drug discovery are often preferred to have restricted expression, which increases precision and helps avoid side effects.


The algorithm uses a statistical concept known as entropy. Entropy measures the ‘surprise’ in the possible outcomes of a variable: the more deterministic the outcome, the less ‘surprise’. For example, a gene that is expressed at the same level in all cells (e.g. housekeeping genes) carries little ‘surprise’ and receives a low entropy score, because its expression pattern is always the same. Conversely, a cell-type-specific gene has very restricted expression, which makes its expression in a given cell harder to predict and hence more ‘surprising’, increasing its entropy score.
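The article does not give the exact formula behind the score. One reading consistent with the description above is the Shannon entropy, H = −Σ p_i log2 p_i, of a gene's expression level across cells, where p_i is the fraction of cells whose expression falls into bin i. The minimal Python sketch below assumes that reading; the function name `expression_entropy`, the equal-width binning and the bin count are illustrative choices, not the platform's implementation.

```python
import math
from collections import Counter


def expression_entropy(values, n_bins=10):
    """Shannon entropy (in bits) of one gene's expression values across cells.

    Hypothetical sketch: values are discretised into `n_bins` equal-width bins
    and the entropy of the resulting histogram is returned. The platform's
    actual binning and normalisation are not documented in this article.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Identical expression in every cell: the outcome is fully predictable.
        return 0.0
    width = (hi - lo) / n_bins
    # Map each value to a bin index; the clamp puts the maximum value in the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    counts = Counter(bins)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


housekeeping = [5.0] * 100              # same level in all 100 cells
restricted = [0.0] * 80 + [8.0] * 20    # expressed in only 20 of 100 cells

print(expression_entropy(housekeeping))            # 0.0
print(round(expression_entropy(restricted), 3))    # 0.722
```

Under this reading the uniformly expressed gene scores zero, while the gene that is switched on in only a subset of cells scores higher, matching the ordering described in the article.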


Variation analysis returns a list of features (e.g. gene names, cell identifiers) and their associated heterogeneity or homogeneity scores, sorted in descending order. A high heterogeneity score denotes high entropy (e.g. genes with restricted expression), while a high homogeneity score denotes low entropy.


1. Performing the Analysis


1.1 Creating a plot


As the first step of the analysis, create a plot by clicking the create plot icon. This opens a section where the analysis of interest (in this case, gene prioritisation analysis) can be selected.


To keep your work organised, a name and description must be entered for the analysis in the corresponding fields. Then select "Variation Analysis" under "Choose algorithm to run your analysis" and press "Select Algorithm"; the second column will open on the left.


1.2 Selecting data

Now select the required project, track and plot. Variation analysis runs on the results of Data Pretreatment, so make sure to select the corresponding track element. Press "Select Data" once the required fields are filled out, and the third column will open on the left.


1.3 Selecting parameters

Press "Gene variation analysis type" to display the available parameters. There are two options: "Heterogeneity" and "Homogeneity". Selecting "Heterogeneity" returns a list in which the top-ranking features are the most heterogeneous, in other words the most variably expressed. Selecting "Homogeneity" returns a list in which the top-ranking features have the highest homogeneity score, meaning they have a low entropy score and are expressed homogeneously.
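The ordering produced by the two options can be sketched as a sort on entropy: descending for "Heterogeneity" and ascending for "Homogeneity", since a high homogeneity score corresponds to low entropy. The helper `rank_features`, the feature names and the entropy values below are hypothetical and only illustrate the ordering.

```python
def rank_features(entropy_by_feature, mode="Heterogeneity"):
    """Return (feature, entropy) pairs in the order described for each mode.

    Hypothetical helper: "Heterogeneity" puts the highest-entropy (most
    variably expressed) features first; "Homogeneity" puts the lowest-entropy
    (most uniformly expressed) features first.
    """
    if mode not in ("Heterogeneity", "Homogeneity"):
        raise ValueError(f"unknown mode: {mode!r}")
    descending = mode == "Heterogeneity"
    return sorted(entropy_by_feature.items(), key=lambda kv: kv[1], reverse=descending)


# Made-up entropy values for illustration only.
scores = {"gene_a": 0.05, "gene_b": 0.10, "gene_c": 0.72, "gene_d": 0.90}

print([name for name, _ in rank_features(scores, "Heterogeneity")])
# ['gene_d', 'gene_c', 'gene_b', 'gene_a']
print([name for name, _ in rank_features(scores, "Homogeneity")])
# ['gene_a', 'gene_b', 'gene_c', 'gene_d']
```

The two options return the same features in opposite order, so choosing between them only changes which end of the entropy range appears at the top of the list.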



1.4 Running the analysis



Once the parameters are selected, press "Run" at the top right corner of the window. You will be automatically returned to the list of plots and a new placeholder for your analysis will be visible.


2. Accessing the results of the analysis



After running the analysis, your "Plots" folder should look like the snippet above, containing the new plot (named "Variation Analysis" in this example). Find your new plot and press "Select".



The selected analysis will appear at the top of the list, and a new option, "View Interactively", will become available. Press "View Interactively" and wait for your results to load.


3. Interactive plot


Once the results have loaded, a list of features and their corresponding entropy scores will appear. To see the export options, press the "Export" tab. The table can be exported in CSV, TSV or XLSX format.
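If you want to post-process an exported table outside the platform, a CSV or TSV export can be read with Python's standard `csv` module. This sketch assumes the export has a header row with one feature column and one score column; the names `feature` and `score` (and the helper `top_features`) are placeholders, so check the header of your own file. XLSX files would need a third-party reader and are not covered here.

```python
import csv
import io


def top_features(handle, n=3, delimiter="\t", feature_col="feature", score_col="score"):
    """Read an exported results table and return the top n (feature, score) rows.

    The column names are placeholders: pass the names used in your export's
    header. Rows are re-sorted by score (descending) in case the file was
    reordered after export.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    rows = [(row[feature_col], float(row[score_col])) for row in reader]
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:n]


# In-memory stand-in for an exported TSV file (made-up values).
example_tsv = "feature\tscore\ngene_a\t0.91\ngene_b\t0.40\ngene_c\t0.75\ngene_d\t0.12\n"
print(top_features(io.StringIO(example_tsv), n=2))
# [('gene_a', 0.91), ('gene_c', 0.75)]
```

For a file on disk, open it with `open(path, newline="")` and pass the handle; for a CSV export, pass `delimiter=","`.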


4. Useful Links


Example uses of entropy-based variation analysis:
