Visualizing the location of single nucleotide polymorphisms (SNPs)
Single nucleotide polymorphisms (SNPs) are the most common type of genomic variant. A SNP consists of the substitution of a single nucleotide at a specific locus in the genome, for example from adenine (A) to cytosine (C). Each SNP can have different effects on the transcription and translation machinery of one or more genes, depending on its position (e.g., intron, enhancer, or exon) and on whether it alters the amino acid sequence. The effect of a variant can be negligible, as in the case of synonymous variants, or impactful, as in the case of frameshift and nonsense variants, and in some cases the consequences of these variants can lead to disease.
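To make the difference between negligible and impactful variants concrete, the following minimal sketch classifies a single-base substitution inside a codon using the standard genetic code. The function name `classify_substitution` and its arguments are hypothetical and chosen for illustration only; this is not the annotation method used by ClinVar, gnomAD, or the platform. It assumes the codon is given on the coding strand, and it does not cover frameshift variants, which arise from insertions or deletions rather than substitutions.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(codon, position, alt_base):
    """Classify a single-base substitution within a codon (illustrative helper).

    codon:    reference codon on the coding strand, e.g. "GAA"
    position: 0-based position of the substituted base within the codon
    alt_base: the alternative nucleotide
    """
    codon = codon.upper()
    alt_base = alt_base.upper()
    if codon not in CODON_TABLE or position not in (0, 1, 2) or alt_base not in BASES:
        raise ValueError("expected a DNA codon, a position in 0..2, and a DNA base")
    if codon[position] == alt_base:
        raise ValueError("the alternative base equals the reference base")

    mutated = codon[:position] + alt_base + codon[position + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[mutated]

    if ref_aa == alt_aa:
        return "synonymous"   # amino acid unchanged
    if alt_aa == "*":
        return "nonsense"     # premature stop codon introduced
    if ref_aa == "*":
        return "stop_lost"    # original stop codon removed
    return "missense"         # amino acid changed


print(classify_substitution("GAA", 2, "G"))  # synonymous: GAA and GAG both encode glutamate
print(classify_substitution("GAA", 0, "T"))  # nonsense: TAA is a stop codon
print(classify_substitution("GAA", 1, "T"))  # missense: GTA encodes valine
```

The same nucleotide change can therefore fall into different categories depending on where in the codon it occurs.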
Lolliplots visualize the location of SNPs within and around a specific gene. In particular, we provide two distinct lolliplots:
- Gene lolliplot: the first type of plot focuses on the genomic region of the chosen gene. All SNPs occurring in this region are visualized, specifying their known effect on transcripts/proteins and possible clinical relevance.
- Enhancer / promoter lolliplot: the second type of plot visualizes SNPs that are located in the enhancer / promoter regions surrounding the selected gene, along with the transcription factors predicted to bind in those same regions.
1 Selecting the type of lolliplot
The first step of the analysis is to create a plot by clicking on the create plot icon. This will lead to a section where the algorithm of interest can be selected. In the case of lolliplots, you can select either "Gene lolliplot" or "Enhancer / promoter lolliplot". Regardless of which algorithm you select, you can customize the name and description fields to keep your plots well organized.
2 Selecting data
Lolliplots are built from information collected from publicly available online repositories.
In particular, information about SNPs is fetched from ClinVar and the Genome Aggregation Database (gnomAD). ClinVar is a large initiative sponsored by the US National Library of Medicine, which has the objective of cataloguing and characterizing all variations in the human genome. gnomAD is an initiative that aims at harmonizing sequencing data from a variety of large-scale projects. Among other information, gnomAD provides frequency estimates for SNP variants.
Information on promoter / enhancer regions, as well as the transcription factors predicted to bind to these regions, is extracted from the GeneCards repository.
If you selected "Gene lolliplot", you should indicate which versions of ClinVar and gnomAD to use, as shown below:
If you selected "Enhancer / promoter lolliplot", you should indicate the GeneCards version as well:
As a rule of thumb, it makes sense to always select the most recent version of each database, i.e., the version with the highest number. The most common reason for selecting an earlier version is to faithfully reproduce past analyses.
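If you keep track of database versions in your own scripts, note that "highest number" should be read numerically, not alphabetically: sorted as text, "2.9" would rank above "2.10". The sketch below shows one way to pick the most recent version from a list of labels. The helper names and the version labels are hypothetical, and the format of the labels (dot-separated numbers, optionally prefixed by a letter) is an assumption, not the platform's documented scheme.

```python
import re


def version_key(label):
    """Turn a label such as 'v2.1.1' into (2, 1, 1) so versions compare numerically."""
    return tuple(int(part) for part in re.findall(r"\d+", label))


def latest_version(labels):
    """Return the label with the highest numeric version (illustrative helper)."""
    labels = list(labels)
    if not labels:
        raise ValueError("no versions to choose from")
    return max(labels, key=version_key)


# Made-up version labels, for illustration only.
print(latest_version(["v2.9", "v2.10", "v2.1.1"]))  # v2.10
```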
3 Setting parameters
Once the input databases have been selected, the Set parameters field will be displayed. If you have selected "Gene lolliplot", you only need to specify the genes for which the plot should be created, as shown below. A separate plot is created for each selected gene.
Conversely, if you have selected "Enhancer / promoter lolliplot", you should indicate how to limit the number of transcription factors to be included in the plot.
There can be hundreds of transcription factors predicted to bind in the proximity of a gene. Moreover, each transcription factor may bind multiple times in the vicinity of a gene, at different enhancer or promoter regions. Attempting to include all of them in a single lolliplot may result in a cluttered, hard-to-read plot.
We provide two mechanisms to indicate which transcription factors should be included in the enhancer / promoter lolliplot:
- Specifying the names of the transcription factors to plot. The user can directly select the transcription factors to include in the plot from the list of those predicted to bind in the vicinity of the gene (according to GeneCards).
- Specifying the number of transcription factors to plot. In this case, the transcription factors are first ranked according to how many distinct locations in the vicinity of the gene they are predicted to bind. The top n transcription factors are then plotted, where n is the number indicated by the user.
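The ranking described in the second option can be sketched as follows. This is a minimal illustration, not the platform's actual implementation: the function name, the input format (pairs of transcription factor name and region identifier), the example records, and the alphabetical tie-breaking rule are all assumptions made for the example.

```python
from collections import defaultdict


def top_transcription_factors(binding_sites, n):
    """Return the n transcription factors with the most distinct binding regions.

    binding_sites: iterable of (tf_name, region_id) pairs (hypothetical format).
    Ties are broken alphabetically by name; this rule is an assumption.
    """
    if n < 0:
        raise ValueError("n must be non-negative")

    regions_by_tf = defaultdict(set)
    for tf_name, region_id in binding_sites:
        # A set keeps each region once, so duplicate records are not double counted.
        regions_by_tf[tf_name].add(region_id)

    ranked = sorted(regions_by_tf, key=lambda tf: (-len(regions_by_tf[tf]), tf))
    return ranked[:n]


# Made-up binding records, for illustration only.
sites = [
    ("CTCF", "enh1"), ("CTCF", "enh2"), ("CTCF", "prom1"),
    ("SP1", "prom1"), ("SP1", "prom1"),   # same region reported twice
    ("YY1", "enh1"), ("YY1", "prom1"),
]
print(top_transcription_factors(sites, 2))  # ['CTCF', 'YY1']
```

Counting distinct regions rather than raw records means that a transcription factor reported several times for the same enhancer or promoter is not ranked higher than it should be.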
If you are not sure how many transcription factors should be included, you can start with a low number, for example 5, and then raise it afterwards if your analysis requires it.
4 Creating the lolliplot
When all parameters are set, you can click on the Run button to complete the analysis.
As soon as the analysis is over, a new table will appear in your track.
5 Lolliplot visualization
Once the lolliplot has been created, the user can...