TABLE OF CONTENTS
- 1 Creating a plot
- 2 Selecting data
- 3 Setting parameters
- 4 Performing the trajectory inference
- 5 Trajectory inference interactive plot page and settings
- 6 Useful links
In many processes, cells change continuously throughout time. For example, a cell could change from one cell type to another cell type, or a young and immature cell could evolve into a specialized cell (differentiation processes). Ideally, the expression levels of each individual cell should be tracked to see how it changes over time. The problem is that cells are destroyed (lysed) during the sequencing process where the RNA is extracted, making it impossible to track the expression profile of each individual cell throughout time. Instead, we would need to sample at multiple time-points and obtain snapshots of the gene expression profiles of the different cell types.
Taking snapshots at several time points can be time-consuming and expensive. However, the cells in a single snapshot are themselves at different points in their life cycle. This means that the expression trajectories of the cells can be reconstructed from just one snapshot, since each snapshot may contain cells at varying points along the developmental progression. Trajectory inference algorithms are statistical methods that order the cells along one or more trajectories representing the underlying developmental processes. This ordering is referred to as pseudotime.
Differentiation trajectories that can be modelled by pseudotime can be cyclical, linear, bifurcating or multifurcating, or take the form of a tree or of a connected or disconnected graph.
While there are trajectory inference algorithms capable of inferring all of the trajectories mentioned above, we propose that each complex trajectory should be deconstructed into multiple linear trajectories for a more comprehensive understanding of the underlying biological processes.
The UniApp enables you to perform linear pseudotime analysis (SCORPIUS), in which the computed trajectory is linear. With this method, you can find out how the cells are ordered over time, and you can also see how certain features (e.g. genes) are expressed over time.
These pseudotime algorithms always find a differentiation trajectory, even when the trajectory itself does not make sense at all. Before trying to compute a pseudotime trajectory, you have to think about the biological processes at play in the data, and whether it makes sense for these biological processes to have a differentiation trajectory between them.
Scenario: In your scRNA-seq dataset of the bone marrow you have identified three populations of interest based on canonical marker genes. These are the hematopoietic stem cells, common myeloid progenitor cells and common lymphoid progenitor cells. You have observed a phenotypic continuum on the dimension reduction plot, which hints at a bifurcating differentiation trajectory from the stem cells to the common myeloid progenitor cells and the common lymphoid progenitor cells. You would now like to know which genes are driving this differentiation trajectory. To answer this question, you can use the trajectory inference module to examine each of these two trajectories separately: first the trajectory from stem cells to the common myeloid progenitor cells, and then the trajectory from stem cells to the common lymphoid progenitor cells.
As a first step of the analysis, a plot must be created by clicking on the create plot icon in your analysis track. This will lead to the create plot page. First, enter the plot name and fill in the plot template to provide the proper context for performing this analysis:
Next, you can choose the "Trajectory inference" algorithm from the "Choose algorithm to run your analysis" menu.
The trajectory inference algorithm only accepts normalized scRNA-seq data pretreatment analysis steps as input. In the "Choose a project" menu you can choose the project from which to select the input. The input data can then be selected from the "Choose track element" menu. To confirm your selection, click on the "Select track element" button. Observations can be selected via the "Select observations" button.
- PCA dimensions: the components (dimensions) to calculate during the principal component analysis (PCA) step.
- K: the k parameter for the k-means clustering step.
- Stretch factor: a stretch factor for the endpoints of the trajectory curve, allowing the curve to grow to avoid bunching at the end.
- Random seed: since the algorithm has some stochastic elements, the same random seed must be used to reproduce the same results when using the same parameters.
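To make the idea of a linear pseudotime concrete, the sketch below orders simulated cells along their first principal component. It is a deliberately simplified stand-in, not the SCORPIUS algorithm used by the UniApp (which, going by the parameters above, also involves a k-means step and a fitted trajectory curve). The function name `linear_pseudotime`, the simulated data and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def linear_pseudotime(expr, n_dims=3):
    """Toy linear ordering: project cells onto the first principal component.

    expr is a (cells x genes) matrix of normalized expression values.
    Returns the PCA embedding and a pseudotime rescaled to [0, 1].
    """
    centered = expr - expr.mean(axis=0)
    # PCA via singular value decomposition of the centered matrix
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    embedding = u[:, :n_dims] * s[:n_dims]
    axis = embedding[:, 0]
    pseudotime = (axis - axis.min()) / (axis.max() - axis.min())
    return embedding, pseudotime

# Simulated data (hypothetical): 200 cells, 50 genes that each change
# linearly with a hidden "true" time, plus noise. A fixed seed makes
# the simulation reproducible.
rng = np.random.default_rng(42)
true_time = rng.uniform(0, 1, size=200)
slopes = rng.normal(0, 2, size=50)
expr = np.outer(true_time, slopes) + rng.normal(0, 0.3, size=(200, 50))

embedding, pseudotime = linear_pseudotime(expr, n_dims=3)
corr = np.corrcoef(true_time, pseudotime)[0, 1]
print(f"absolute correlation with simulated time: {abs(corr):.2f}")
```

Because the sign of a principal component is arbitrary, the ordering may come out reversed, so only the absolute correlation with the simulated time is meaningful in this sketch.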
These algorithms do not infer the trajectory direction, only the trajectory itself.
The trajectory inference interactive plot page can display three types of plots: trajectory plot, expression line plot and expression dot plot.
The trajectory plot shows you the inferred pseudotime trajectory. Each dot on the plot is a cell and the black line is the inferred trajectory.
Once the trajectory inference result is available, you can color code the trajectory plot by any feature or variable you want. To color code the plot by a metadata variable, you need to select Metadata, and then select the variable you want to visualize. To color code the plot by a feature (e.g. a gene), you need to select Original feature, and then select the feature of interest.
The trajectory plot can be customized in different ways. Clicking on "Visualization settings" enables you to change the structure of the plot (dot size, grid, etc.), and the color coding of the clusters.
Both the expression line plot and the expression dot plot use the same way of selecting input. If you want to visualize a feature, you need to select "Original feature", and then select the feature of interest.
The expression plot can be color coded by any metadata variable you want. To color code the plot by a metadata variable, you need to select Metadata, and then select the variable you want to visualize.
Numeric metadata variables cannot be used to color code the expression plot.
The parameters exclusive to the expression dot plot are:
- Dots size: how big/small the dots representing the original data will be on the plot.
The parameters exclusive to the expression line plot (the smoothed line plot) are:
- Color: the color scheme to use to color code the clusters.
- Smoothed expression line width: how thick the smoothed expression line should be.
- Show original size: whether or not to show the original (non-smoothed) data in the background.
- Dots size: if you decide to show the original data, this parameter defines how big/small the dots representing the original data will be on the plot.
The parameters common to both the expression line and expression dot plot are:
- Quantilize groups: smooth the groups together to obtain a clearer line plot. More on this in the next subsection.
- Invert pseudotime: whether or not to invert the pseudotime axis.
- Scale expression (0 to 100): whether or not to scale the expression from 0 to 100 (where 0 is mapped to the original minimum value, and 100 to the maximum original value).
- Hide legend: whether to hide the legend or not.
- Hide grid: whether to hide the grid or not.
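The "Invert pseudotime" and "Scale expression (0 to 100)" options can be pictured as two simple transformations. The sketch below is one plausible reading of their descriptions, not the UniApp's actual code; the function names `invert_pseudotime` and `scale_expression` and the example values are hypothetical.

```python
import numpy as np

def invert_pseudotime(pseudotime):
    """Flip the pseudotime axis while keeping its original range."""
    pseudotime = np.asarray(pseudotime, dtype=float)
    return pseudotime.max() + pseudotime.min() - pseudotime

def scale_expression(values):
    """Rescale so the original minimum maps to 0 and the maximum to 100."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # Constant expression: no range to stretch (assumed to map to 0)
        return np.zeros_like(values)
    return 100.0 * (values - lo) / (hi - lo)

# Hypothetical values for three cells
pseudotime = np.array([0.0, 0.25, 1.0])
expression = np.array([2.0, 4.0, 6.0])

inverted = invert_pseudotime(pseudotime)  # [1.0, 0.75, 0.0]
scaled = scale_expression(expression)     # [0.0, 50.0, 100.0]
print(inverted, scaled)
```

Inverting only changes the direction in which the cells are read along the axis; their relative spacing is unchanged.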
The "Quantilize groups" option should only be used if the groups clearly appear in succession along the pseudotime axis (for example, when group 1 is in the first part of the plot, group 2 in the middle, and group 3 in the last part). If that is not the case (i.e. the groups are randomly distributed), this option must not be used, since it would produce nonsensical results. Remember that this option is mainly intended to improve the visualization of the data: if the data after quantilization shows something completely different from the data before quantilization, then this option must not be used.
To make patterns in the expression data easier to find, we use LOESS regression to smooth the original data into a line.
LOESS regression is a nonparametric technique that uses locally weighted regression to fit a smooth curve through data points. The procedure originated as LOWESS (LOcally WEighted Scatterplot Smoother). LOESS is based on the idea that any smooth function can be well approximated in a small neighborhood by a low-order polynomial. LOESS can be useful for fitting a line to noisy or sparse data points, and can reveal trends in data that might be difficult to model with a parametric curve (as in linear regression).
The main idea of LOESS is to iteratively fit a low-degree polynomial to a subset of the data, for each point in the dataset. The polynomial is fit using weighted least squares, giving more weight to points near the point whose response is being estimated and less weight to points further away. The LOESS fit is completed after the regression function values have been computed for each of the n data points.
The low-degree polynomials fit to each subset of the data are almost always of first degree (local linear regression) or second degree (local polynomial fit). Using a zero-degree polynomial (Nadaraya-Watson estimator, local constant fitting) turns LOESS into a weighted moving average. Such a simple local model might work well in some situations, but may not always approximate the underlying function well enough.
To decide how the regression should be performed, you can set the following parameters:
- Regression model: the LOESS regression model to use. You can choose between Local linear regression (linear fit), Local polynomial fit (polynomial fit) and Nadaraya-Watson estimator (local constant fitting).
- Regression span: the fraction of the data used to perform each local regression. 0.75 means that 75% of the data is used at each iteration.
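The LOESS procedure and its two parameters can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the UniApp's implementation: the function name `loess` is hypothetical, the tricube weight function is the conventional LOESS choice rather than something this article specifies, and the three regression models are mapped here to polynomial degrees 1 (local linear regression), 2 (local polynomial fit) and 0 (Nadaraya-Watson estimator).

```python
import numpy as np

def loess(x, y, span=0.75, degree=1):
    """Minimal LOESS: one weighted polynomial fit per data point.

    span   -- fraction of the data used for each local fit
    degree -- 0 (local constant), 1 (local linear) or 2 (local quadratic)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Number of neighbors per local fit, at least enough to fit the polynomial
    k = min(n, max(degree + 1, int(np.ceil(span * n))))
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]  # the k nearest points
        max_dist = dist[idx].max()
        scaled = dist[idx] / max_dist if max_dist > 0 else np.zeros(k)
        weights = (1 - scaled**3) ** 3  # tricube: nearby points count more
        # np.polyfit applies its weights to the residuals before squaring,
        # so sqrt(weights) yields weighted least squares with these weights.
        coeffs = np.polyfit(x[idx] - x[i], y[idx], deg=degree,
                            w=np.sqrt(weights))
        fitted[i] = coeffs[-1]  # x is centered on x[i], so this is the fit there
    return fitted

# Simulated example (hypothetical data): a noisy linear trend over pseudotime
rng = np.random.default_rng(0)
pseudotime = np.sort(rng.uniform(0, 1, 100))
expression = 2.0 * pseudotime + rng.normal(0, 0.2, 100)
smoothed = loess(pseudotime, expression, span=0.75, degree=1)
```

A larger span averages over more cells and gives a smoother, flatter line; a smaller span follows local changes more closely but also follows more of the noise.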