Single-cell seq data preprocessing: how to detect the gene/transcript distribution for each single cell
12 months ago
sreekalasn • 0

Hello everyone, I have a log(TPM+1) expression matrix for 14,000 cells and 23,000 genes (GSE87544). In the paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5782816/#SD9), the authors analysed 14,000 cells and reduced the data to 3,000 cells and 2,000 genes before using Seurat for cell clustering.

I am new to single-cell sequencing and still learning. I would appreciate help with pre-processing single-cell seq data (or, as in this case, finding the gene/transcript distribution per cell), since I could not find sources that discuss data pre-processing in detail.

Thank you very much!

scRNA-seq • 155 views
12 months ago
United States

A good primer on pre-processing single-cell RNA-seq data is Aaron Lun's paper, together with the numerous simpleSingleCell vignettes (start with the "UMI" or "Droplet-based data" ones).

A good intro focused on QC of scRNA-seq data is also part of the scater package documentation.


Thank you so much. I found these sources very useful.

12 months ago
geek_y 9.7k
Barcelona/CRG/London/Imperial

The 2,000 genes are probably the most variable genes across cells, which are then used for PCA followed by t-SNE/UMAP.
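To make the idea concrete, here is a minimal sketch that ranks genes by their variance across cells and keeps the top few. It is an illustration only: the function name `top_variable_genes` and the toy matrix are made up, and plain variance ranking is a simplification. Seurat's own variable-gene selection models the mean-variance relationship, so do not expect this to reproduce the paper's gene list.

```python
from statistics import pvariance


def top_variable_genes(expr, gene_names, n_top):
    """Return the n_top gene names with the highest variance across cells.

    expr is a list of rows, one per gene; each row holds that gene's
    expression value (e.g. log(TPM+1)) in every cell.
    NOTE: plain variance ranking is an assumption made for illustration,
    not the method used by Seurat or by the paper.
    """
    variances = [pvariance(row) for row in expr]
    # Sort gene indices from most to least variable.
    ranked = sorted(range(len(gene_names)),
                    key=lambda i: variances[i], reverse=True)
    return [gene_names[i] for i in ranked[:n_top]]


# Toy matrix: 4 hypothetical genes x 5 cells (made-up values).
genes = ["geneA", "geneB", "geneC", "geneD"]
expr = [
    [0.0, 0.1, 0.0, 0.1, 0.0],  # nearly constant
    [0.0, 5.0, 0.0, 6.0, 0.0],  # strongly variable
    [2.0, 2.0, 2.0, 2.0, 2.0],  # constant
    [1.0, 3.0, 1.0, 2.0, 1.0],  # moderately variable
]
hvg = top_variable_genes(expr, genes, n_top=2)
print(hvg)
```

On a real matrix you would pass `n_top=2000` and feed only the selected rows into PCA.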

The cell-filtering criteria should be described in the methods section of the paper. Common reasons to filter cells out of scRNA-seq data include abnormally high UMI counts, a high fraction of mitochondrial gene expression, a low number of detected genes, low sequencing depth, and doublets. The details also depend on the version of Seurat.
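A rough sketch of how such per-cell QC metrics can be computed from a genes x cells count matrix follows. Everything named here is an assumption for illustration: the helper names `cell_qc` and `keep_cell`, the `mt-` gene-name prefix for mitochondrial genes, and the thresholds, which are arbitrary and have to be chosen per dataset.

```python
def cell_qc(counts, gene_names, mito_prefix="mt-"):
    """Compute per-cell QC metrics from a genes x cells count matrix.

    The mito_prefix default is an assumption; check how mitochondrial
    genes are named in your own annotation.
    """
    n_cells = len(counts[0])
    is_mito = [name.lower().startswith(mito_prefix) for name in gene_names]
    metrics = []
    for c in range(n_cells):
        column = [row[c] for row in counts]
        total = sum(column)
        mito = sum(v for v, m in zip(column, is_mito) if m)
        metrics.append({
            "total_counts": total,                       # depth / UMI count
            "n_genes": sum(1 for v in column if v > 0),  # genes detected
            "mito_fraction": mito / total if total else 0.0,
        })
    return metrics


def keep_cell(m, min_genes, max_counts, max_mito_fraction):
    """Apply simple hard thresholds (illustrative values only)."""
    return (m["n_genes"] >= min_genes
            and m["total_counts"] <= max_counts
            and m["mito_fraction"] <= max_mito_fraction)


# Toy data: 4 genes x 4 cells (made-up counts).
genes = ["Actb", "Gapdh", "mt-Co1", "Snap25"]
counts = [
    [10, 0, 200, 6],
    [5, 0, 150, 2],
    [1, 9, 20, 0],
    [4, 1, 180, 4],
]
metrics = cell_qc(counts, genes)
kept = [i for i, m in enumerate(metrics)
        if keep_cell(m, min_genes=3, max_counts=500, max_mito_fraction=0.2)]
print(kept)
```

In this toy example the second cell is dropped for few detected genes and a high mitochondrial fraction, and the third for an abnormally high total count (a possible doublet). Dedicated doublet detection needs more than a count threshold.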

A quick read of the paper turns up: "From the 14,000 cells analyzed, 3,319 cells have more than 2,000 genes detectable in a single cell".
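That criterion can be applied directly to a log(TPM+1) matrix, as in the sketch below. It assumes "detectable" means a non-zero value (log(TPM+1) > 0 exactly when TPM > 0); the paper may define detection differently. The function name and the toy matrix are hypothetical, and the cutoff is scaled down to fit the toy data.

```python
def genes_detected_per_cell(log_tpm):
    """Count, for each cell, the genes with non-zero expression.

    log_tpm is a genes x cells matrix of log(TPM+1) values.
    Treating any value > 0 as "detected" is an assumption.
    """
    n_cells = len(log_tpm[0])
    return [sum(1 for row in log_tpm if row[c] > 0) for c in range(n_cells)]


# Toy matrix: 4 genes x 3 cells (made-up values).
log_tpm = [
    [0.0, 1.2, 3.1],
    [2.3, 0.0, 0.4],
    [0.0, 0.0, 1.9],
    [4.0, 0.0, 2.2],
]
detected = genes_detected_per_cell(log_tpm)

# The paper's cutoff is 2,000 genes; 2 is used here for the toy matrix.
min_genes = 2
selected = [c for c, n in enumerate(detected) if n > min_genes]
print(detected, selected)
```

Plotting a histogram of `detected` for the full matrix gives the per-cell gene distribution asked about in the question.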

It's sad that you did not make even a minimal effort to read the paper you are interested in.

I did go through the paper multiple times. However, the authors have not described in detail how they filtered the data and found the "highly variable genes". They reference another article, but again, I could not understand the filtering part. Hence, I posted the question here hoping to receive some help. Thanks for the heads-up on the plausible factors for filtering scRNA data.
