Question

Single-cell RNA-seq data preprocessing: how to determine the gene/transcript distribution for each cell

0

4.9 years ago

sreekalasn • 0

Hello everyone, I have an expression matrix of log(TPM+1) values for 14,000 cells and 23,000 genes (GSE87544). In the paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5782816/#SD9), the authors analysed 14,000 cells and reduced the data to 3,000 cells and 2,000 genes before using Seurat for cell clustering.

I am new to single-cell sequencing and still learning. I would appreciate help with pre-processing single-cell sequencing data (or with finding the gene/transcript distribution, as in this case), since I could not find sources that discuss data pre-processing in detail.

Thank you very much!

scRNA-seq • 2.1k views

updated 4.9 years ago by Friederike 8.9k • written 4.9 years ago by sreekalasn • 0

score 3 · Answer 1 · 2019-05-21

4.9 years ago

Friederike 8.9k

A good primer on pre-processing single-cell RNA-seq data is Aaron Lun's paper and the numerous simpleSingleCell vignettes (starting from "UMI" or "Droplet-based data").

A good intro focused on QC of scRNA-seq data is also part of the scater package documentation.

0

Thank you so much. I found these sources very useful.

4.9 years ago by sreekalasn • 0

score 1 · Answer 2 · 2019-05-21

4.9 years ago

GouthamAtla 12k

The 2,000 genes could be the most variable genes across cells, which are then used for PCA followed by t-SNE/UMAP.
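As a rough illustration of what "most variable genes" can mean, here is a minimal sketch that ranks genes by their plain variance across cells and keeps the top few. This is an assumption made for illustration, not the paper's procedure: Seurat's own variable-gene selection models the mean-variance relationship, which this sketch does not. The function name `top_variable_genes` and the toy matrix are made up.

```python
import numpy as np

def top_variable_genes(expr, n_top):
    """Return column indices of the n_top genes with the highest variance.

    expr: 2-D array, rows = cells, columns = genes (e.g. log(TPM+1) values).
    """
    variances = np.asarray(expr, dtype=float).var(axis=0)
    # argsort is ascending, so reverse it to rank the most variable genes first
    order = np.argsort(variances)[::-1]
    return order[:n_top]

# Toy data (made up): 5 cells x 4 genes
expr = np.array([
    [0.0, 1.0, 5.0, 2.0],
    [0.0, 1.1, 0.0, 2.0],
    [0.0, 0.9, 4.0, 2.0],
    [0.0, 1.0, 0.0, 2.5],
    [0.0, 1.0, 6.0, 1.5],
])

hvg = top_variable_genes(expr, n_top=2)
reduced = expr[:, hvg]  # matrix restricted to the selected genes, e.g. as PCA input
```

On the real matrix you would call it with `n_top=2000`, after making sure cells are rows and genes are columns (transpose if the file is stored the other way round).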

The cell filtering criteria should be described in the methods section of the paper. Common reasons to filter out cells in scRNA-seq data include abnormally high UMI counts, a high proportion of mitochondrial gene expression, a low number of detected genes, low sequencing depth, and doublets. The exact procedure also depends on the version of Seurat.
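A minimal sketch of how such per-cell QC metrics could be computed and turned into a filter. It assumes a raw count matrix with cells as rows (not the log(TPM+1) matrix from the question, where per-cell totals are not informative) and assumes mitochondrial genes can be recognised by an `mt-` name prefix. The function names, gene names, and all thresholds are hypothetical and chosen only to make the toy example work; a high total count is only a crude proxy for doublets.

```python
import numpy as np

def qc_metrics(counts, gene_names, mito_prefix="mt-"):
    """Per-cell total counts, number of detected genes, and mitochondrial fraction."""
    counts = np.asarray(counts, dtype=float)
    # Assumption: mitochondrial genes share a common name prefix
    is_mito = np.array([g.lower().startswith(mito_prefix) for g in gene_names])
    total = counts.sum(axis=1)
    n_genes = (counts > 0).sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    # Cells with zero total counts get a mitochondrial fraction of 0
    mito_frac = np.divide(mito, total, out=np.zeros_like(total), where=total > 0)
    return total, n_genes, mito_frac

def keep_cells(total, n_genes, mito_frac, min_genes, max_total, max_mito_frac):
    """Boolean mask of cells that pass all three (hypothetical) thresholds."""
    return (n_genes >= min_genes) & (total <= max_total) & (mito_frac <= max_mito_frac)

# Toy data (made up): 4 cells x 4 genes
gene_names = ["Actb", "Gapdh", "mt-Co1", "Pomc"]
counts = np.array([
    [10, 5, 1, 4],         # passes all thresholds
    [1, 0, 9, 0],          # few genes, mostly mitochondrial
    [200, 150, 10, 140],   # abnormally high total
    [0, 2, 0, 0],          # almost empty
])

total, n_genes, mito_frac = qc_metrics(counts, gene_names)
keep = keep_cells(total, n_genes, mito_frac,
                  min_genes=3, max_total=100, max_mito_frac=0.2)
filtered = counts[keep]
```

In practice the thresholds are usually chosen after looking at the distribution of each metric across cells, rather than fixed in advance.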

A quick read of the paper finds: "From the 14,000 cells analyzed, 3,319 cells have more than 2,000 genes detectable in a single cell".
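A filter like the one quoted above could be sketched as follows. This assumes that "detectable" means a value above zero in the log(TPM+1) matrix (log(0+1) = 0 for an undetected gene); the paper may use a different cutoff. It also assumes cells are rows and genes are columns. The function names and the toy matrix are made up, and `min_genes=2` stands in for the 2,000 in the quote.

```python
import numpy as np

def genes_detected_per_cell(log_tpm, min_value=0.0):
    """Count, for each cell (row), the genes with expression above min_value."""
    return (np.asarray(log_tpm) > min_value).sum(axis=1)

def filter_cells_by_gene_count(log_tpm, min_genes):
    """Keep cells with more than min_genes detected genes."""
    log_tpm = np.asarray(log_tpm)
    n_detected = genes_detected_per_cell(log_tpm)
    keep = n_detected > min_genes
    return log_tpm[keep], n_detected

# Toy data (made up): 4 cells x 5 genes of log(TPM+1) values
log_tpm = np.array([
    [0.0, 1.2, 0.0, 3.4, 0.5],
    [0.0, 0.0, 0.0, 2.1, 0.0],
    [1.0, 0.7, 2.2, 0.1, 4.0],
    [0.0, 0.0, 1.5, 0.3, 0.0],
])

filtered, n_detected = filter_cells_by_gene_count(log_tpm, min_genes=2)
```

`n_detected` is the per-cell distribution of detected genes, so a histogram of it shows where a cutoff such as 2,000 falls relative to the bulk of the cells.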

It's sad that you did not make even a minimal effort to read the paper you are interested in.

0

I did go through the paper multiple times. However, the authors have not described in detail how they filtered the data and found the "highly variable genes". They have referenced another article, but again, I could not understand the filtering part. Hence, I posted the question here hoping to receive some help. Thanks for the heads-up on the plausible criteria for filtering scRNA-seq data.

4.9 years ago by sreekalasn • 0