TABLE OF CONTENTS
- Navigation
- 1. Dataset name
- 2. Creating a new experiment and data upload - Analysis technology
- 3. Upload data matrices
- 4. Creating a new experiment and data upload - Annotate your files
- 5. Setting parameters
- 6. Completing dataset upload
- 7. Examples of datasets for each technology type
NOTICE: All data is expected to be clear of sensitive information that should not be distributed such as patient names, home addresses, or other (direct/indirect) traceable personal data.
NOTICE: After clicking "Proceed to data staging", it is prudent to stay at your computer, since the upload process will have to start over from the beginning if the application goes into sleep mode. We are currently working on improving this step and will inform our customers once the improvement is implemented.
Navigation
In this section, we will explain how to start your journey with UniApp. This is one of the most crucial steps of your data analysis, as the correctness of your data upload will dictate how your downstream analyses will go. Double checking and the four eyes principle are warranted. In the first step, the data and metadata of your experiment should be uploaded from your device by clicking on Upload dataset.
This will bring you to a wizard that helps you upload and annotate your dataset.
1. Dataset name
In the first step, you have to provide your dataset name. You can choose any name, but it is a good practice to have a unique and informative name.
2. Creating a new experiment and data upload - Analysis technology
In the second step, the technology used to generate the experiment's data must be stated.
In case you are uploading data that needs to be processed by our consulting team, select 'Upload to Unicle consulting'.
3. Upload data matrices
The data matrix and metadata need to be uploaded in the second column. If you upload data to our consulting team, please upload one compressed/zipped file as this will decrease upload time.
How to compress files on your computer: select all files that you would like to upload, right-click to open the context menu, and select 'Compressed (zipped) folder'. This will create a compressed folder containing all your selected files. Alternatively, you can use software such as https://www.7-zip.org/ .
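If you prefer to script this step, the same kind of archive can be built with Python's standard library. This is a minimal sketch, not part of UniApp: the function name `zip_files` and the file names in the usage comment are placeholders chosen for illustration.

```python
import zipfile
from pathlib import Path


def zip_files(paths, archive_path):
    """Bundle the given files into one compressed .zip archive."""
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in map(Path, paths):
            # Store each file under its bare name so the archive has no nested folders.
            archive.write(path, arcname=path.name)
    return archive_path


# Example call (hypothetical file names):
# zip_files(["counts.csv", "metadata.csv"], "upload.zip")
```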
For all other uploads you have to upload exactly two files: data matrix and metadata. But first, ensure that your data matrix and metadata are formatted correctly.
3.1 Basic data matrix format (bulk, scRNAseq, microarray, proteomics)
The data matrix must be in the following form:
Feature | Observation 1 | Observation 2 | Observation 3 | Observation 4 |
---|---|---|---|---|
Feature 1 | 0 | 2 | 0 | 56 |
Feature 2 | 20 | 12 | 20 | 0 |
Feature 3 | 0 | 25 | 31 | 15 |
Feature 4 | 7 | 32 | 7 | 40 |
Feature 5 | 6 | 0 | 6 | 0 |
Feature 6 | 7 | 0 | 7 | 17 |
The features and observations should be unique (UniApp will handle duplicates, but duplicated names are not recommended). Empty and missing values are not allowed, except for metabolomics and proteomics data, where they should be indicated with NA. The data must be uploaded as a .csv or a .txt file; other formats are not supported. The first column of the data is dedicated to the feature IDs, while all the other columns are dedicated to the expression/abundance of each feature in each sample. The features can be your gene, metabolite, or protein IDs (or names), while the observations are your sample names or cell names.
The expression/abundance of each feature must be in plain numeric format (using the scientific notation is not allowed).
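The rules above can be checked locally before uploading. The sketch below is an illustration, not UniApp's own validation: the function name `check_data_matrix`, the comma delimiter, and the regular expression for "plain numeric" are assumptions. It reports duplicate names, empty values, and values in scientific notation.

```python
import csv
import io
import re

# Matches plain numbers such as 7, -3, or 0.25, but not 1e-5 or 2.5E3.
PLAIN_NUMBER = re.compile(r"^-?\d+(\.\d+)?$")


def check_data_matrix(text, delimiter=",", allow_na=False):
    """Return a list of human-readable problems found in a data matrix."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = []

    observations = header[1:]
    if len(set(observations)) != len(observations):
        problems.append("duplicate observation names in header")

    features = [row[0] for row in body if row]
    if len(set(features)) != len(features):
        problems.append("duplicate feature IDs in first column")

    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} columns, found {len(row)}"
            )
            continue
        for value in row[1:]:
            value = value.strip()
            # NA is only acceptable for metabolomics and proteomics uploads.
            if allow_na and value == "NA":
                continue
            if not PLAIN_NUMBER.match(value):
                problems.append(f"line {line_no}: invalid value {value!r}")
    return problems
```

An empty list means none of these checks failed. Pass `allow_na=True` for metabolomics or proteomics matrices.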
The metadata should be in the following form:
Observation | Condition | Batch |
---|---|---|
Observation 1 | Control | 1 |
Observation 2 | Control | 2 |
Observation 3 | Treatment | 1 |
Observation 4 | Treatment | 2 |
As with the data, the metadata can be uploaded as a CSV or a TXT file. Other formats (such as Excel or raw data files) are not supported. In the example above, only the Condition and Batch columns are present; however, your metadata can have many columns (e.g. clinical data). The first column of the metadata is dedicated to the observation names, while all the other columns are dedicated to any relevant information associated with the observations (groups, progression-free survival, clusters, etc.). The more columns containing relevant information, the better.
Check if your data and metadata match before uploading.
The observation names in the metadata must match the observation names in the data file. If the observations do not match, the data file will be used as the ground truth to generate a metadata file that is consistent with the data. This means that observations that are in the metadata but not in the data will be removed, and observations that are in the data but not in the metadata will be added (with empty entries).
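To preview what this reconciliation would do to your metadata, the described behaviour can be sketched as follows. This is an illustration based only on the description above, not UniApp's actual implementation; `reconcile_metadata` and its dictionary-based input are assumptions.

```python
def reconcile_metadata(data_observations, metadata):
    """Rebuild metadata so it has exactly the observations found in the data.

    data_observations: observation names from the data matrix header.
    metadata: dict mapping observation name -> dict of column values.
    """
    # Collect all metadata column names in first-seen order.
    columns = []
    for row in metadata.values():
        for column in row:
            if column not in columns:
                columns.append(column)

    reconciled = {}
    for name in data_observations:
        if name in metadata:
            reconciled[name] = {c: metadata[name].get(c, "") for c in columns}
        else:
            # Present in the data only: keep it, with empty entries.
            reconciled[name] = {c: "" for c in columns}
    # Observations present in the metadata only are never copied, so they are dropped.
    return reconciled
```

Comparing the result with your original metadata shows which observations would be dropped or left empty, which is usually a sign of a typo in an observation name.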
After you have selected your data matrices, click upload files.
When working with large files, making a compressed/zipped csv or txt (.csv.zip/.txt.zip) file for upload will result in shorter uploading times.
3.2 10X data (scRNAseq)
10X data can be uploaded directly to the UniApp for single cell RNA data. The following files need to be uploaded to the UniApp:
- mtx file
- gene file
- barcode file
- metadata matrix
3.3 GCT
GCT files can be directly uploaded to our system.
3.4 Anndata
Anndata can be uploaded as an .h5ad file.
3.5 Seurat objects
Seurat objects can be uploaded as .rds files.
3.6 Metabolomics
Data for metabolomics experiments looks very similar to the basic data uploads, with the addition of an HMDBID column that needs to be left empty.
Feature | HMDBID | Observation 1 | Observation 2 | Observation 3 |
---|---|---|---|---|
Feature 1 | | 2 | 0 | 56 |
Feature 2 | | 12 | 20 | 0 |
Feature 3 | | 25 | 31 | 15 |
Feature 4 | | 32 | 7 | 40 |
Feature 5 | | 0 | 6 | 0 |
Feature 6 | | 0 | 7 | 17 |
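If your matrix is already in the basic format, the empty HMDBID column can be added programmatically. The helper below is a minimal sketch that assumes the matrix has been read into a list of rows (header first); the name `add_empty_hmdbid_column` is hypothetical.

```python
def add_empty_hmdbid_column(rows):
    """Insert an empty HMDBID column directly after the feature column."""
    header, *body = rows
    new_header = [header[0], "HMDBID", *header[1:]]
    # Each data row gets an empty string in the new second position.
    new_body = [[row[0], "", *row[1:]] for row in body]
    return [new_header, *new_body]
```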
For the metadata of metabolomics, an extra column should be added named 'injection order'.
Observation | Injection order | Condition |
---|---|---|
Observation 1 | 1 | Control |
Observation 2 | 2 | Control |
Observation 3 | 3 | Treatment |
Observation 4 | 4 | Treatment |
3.7 Spatial single cell 10X
For the time being, all spatial uploads have to go through our team.
The following files need to be uploaded:
- 10X file
- json file
- image
3.8 Gene metadata
The gene metadata is a versatile dataset type designed to store a broad array of gene information. Within this file format, the initial column contains gene identifiers, while subsequent columns are flexible, capable of holding categorical or numerical values.
This dataset type can be visualized and explored using the gene metadata table algorithm. Additionally, it can be used as input to the rank and rule-based meta-analysis, allowing you to integrate any available gene information with the omics results you have created within the UniApp.
An example of a correctly formatted gene metadata file is shown below:
Gene | Gene type | Disease score | Source |
---|---|---|---|
BRCA1 | Oncogene | 0.84 | CancerDatabase |
BRCA1 | Oncogene | 0.90 | Gene Expression Atlas |
TP53 | Tumor suppressor | 0.77 | Gene Expression Atlas |
EGFR | Receptor | 0.23 | Gene Expression Atlas |
Notice that duplicate gene identifiers are allowed. These duplicates can be handled in downstream analyses.
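Before deciding how to treat duplicates downstream, it can help to list which identifiers occur more than once. The snippet below is a small sketch under the assumption that the file has been read into a list of rows with the header first; `duplicated_genes` is a hypothetical helper, not a UniApp function.

```python
from collections import Counter


def duplicated_genes(rows):
    """Return gene identifiers that appear more than once, with their counts."""
    counts = Counter(row[0] for row in rows[1:])  # skip the header row
    return {gene: n for gene, n in counts.items() if n > 1}
```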
4. Creating a new experiment and data upload - Annotate your files
Once you have annotated your files, click on the "Proceed to data staging" button.
5. Setting parameters
In the setting parameters tab you will be able to annotate your data in data staging: specify data matrix type, organism of origin and gene name identifier.
In the Parameters data tab you can define your data matrix type more precisely and provide information that is used by certain downstream algorithms.
- Select data matrix type: specify data matrix type, meaning if the data matrix has been previously normalized or not. A data matrix can be either raw or normalized. You can check if your data matrix has been previously normalized by taking a look at the digits in your data matrix. If the numbers in your data matrix are predominantly integers, no normalization was previously performed. This means you should perform data normalization in the subsequent Data pretreatment module. If the numbers are predominantly decimals the matrix is already normalized. In this case you can skip data normalization in the Data pretreatment module.
- Select organism: specify the organism from which the data was derived. Current options are: human, mouse or rat. You can identify human data by the use of all capital letters for the features. Murine features are usually written with one capital letter followed by lowercase letters.
- Select gene identifier: select type of gene identifier used in the first column of the data matrix.
MAKE SURE TO DOUBLE CHECK THIS STEP WITH YOUR SUPERVISOR IF NECESSARY, WRONG ANNOTATIONS WILL LEAD TO ERRORS IN YOUR DOWNSTREAM ANALYSIS OR WILL IMPEDE DATA UPLOAD.
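The two rules of thumb from the parameter list above (mostly integers means raw data; all-capital feature names suggest human data) can be sketched as quick checks. These are rough heuristics for a first look, not what UniApp does internally; the function names and the 0.5 cut-off for "predominantly" are assumptions, and the result should still be confirmed by someone who knows the data.

```python
def guess_matrix_type(values, threshold=0.5):
    """Guess 'raw' if values are predominantly integers, else 'normalized'."""
    values = list(values)
    if not values:
        raise ValueError("no values to inspect")
    # float(...).is_integer() is True for 7 and 7.0, False for 7.25.
    integer_share = sum(float(v).is_integer() for v in values) / len(values)
    return "raw" if integer_share > threshold else "normalized"


def guess_organism_from_symbols(symbols, threshold=0.5):
    """Guess the organism from the capitalization style of feature names."""
    symbols = [s for s in symbols if any(ch.isalpha() for ch in s)]
    if not symbols:
        return "unknown"
    # Human style: every letter is a capital (e.g. BRCA1).
    upper = sum(s == s.upper() for s in symbols)
    # Murine style: one capital, then lowercase letters (e.g. Brca1).
    capitalized = sum(
        s[0].isupper() and s[1:] == s[1:].lower() and s != s.upper()
        for s in symbols
    )
    if upper / len(symbols) > threshold:
        return "human"
    if capitalized / len(symbols) > threshold:
        return "mouse or rat"
    return "unknown"
```

The capitalization check cannot tell mouse from rat, so it returns "mouse or rat" for murine-style names.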
6. Completing dataset upload
Click on 'Complete Dataset Upload' to successfully upload your new dataset.
7. Examples of datasets for each technology type
Here you can find a link to an example dataset (metadata + data) for each technology type. You can use these as templates for your own data, to check whether upload works in the UniApp, or to compare against your own data files when investigating a possible problem with an upload.
7.1 Bulk RNA seq
https://drive.google.com/drive/folders/1Tm70Zv-DqsghuLAQAa-g28VRlGqAra_L
7.2 Micro array gene expression
https://drive.google.com/drive/folders/1ois5--P5fZ5HDiouY-DWgk1Q1gK19bDs
7.3 Gene expression spatial data
Expected soon.
7.4 Single cell RNA seq
https://drive.google.com/drive/folders/1t3eM3YxVm6NoGr3GBTvlAh53xUoBih2k
7.5 Proteomics
https://drive.google.com/drive/folders/13Ix4-0ZWUnC_u3GHmYfbP-sVTTsN5_9l
7.6 Metabolomics
https://drive.google.com/drive/folders/137CahatOugkvx1wd5MngG3hOP7cv3AC8