RNASeq differential expression: How to deal with few genes with extremely high expression levels
7.1 years ago
komal.rathi ★ 4.1k

Hi everyone,

I have a set of 4 KO and 4 Control samples from mice and I am performing a differential expression analysis on them. My pipeline is STAR alignment to mm10 followed by RSEM. I then take the expected counts from RSEM and normalize them with voom, because I want to test for differential gene expression using limma.

Here are some QC plots:

Boxplot of Samples before and after normalization: https://ibb.co/d1czbF

PCA of Samples before and after normalization: https://ibb.co/bE85GF

Reads distribution: https://ibb.co/j6R33v

First, my controls and KOs do not group as I would expect. But my main concern is the gene expression: you can see in the read distribution plots that there are a few genes with extremely high expression levels. Of the 20 most highly expressed genes, 8 are on chromosome M. I am normalizing the expression levels using voom (limma), but I wanted to know whether this distribution will affect any downstream differential expression results and, if so, how I can fix it.

Thanks!

RNA-Seq mtDNA limma voom • 1.9k views
7.1 years ago

> But my main concern is the gene expression - you can see in the read distribution plots that there are a few genes with extremely high expression levels.

This is very typical of RNA-seq experiments. Plot the counts on a log scale if you want to see anything in the distribution of the un-normalized counts.
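To make the point about the log scale concrete, here is a minimal sketch in plain Python. The counts are made up for illustration (they are not from the poster's data), and the pseudocount of 1 is just one common choice for keeping zero counts defined.

```python
import math
import statistics

# Hypothetical raw counts for one sample: most genes are modest,
# while a couple of genes (e.g. mitochondrial ones) are enormous.
counts = [0, 3, 12, 25, 40, 58, 90, 150, 400, 250000, 900000]

# log2(count + 1); the pseudocount of 1 is an assumption that avoids log2(0).
log_counts = [math.log2(c + 1) for c in counts]

# Compare how far the largest value sits from the median on each scale.
raw_ratio = max(counts) / statistics.median(counts)
log_ratio = max(log_counts) / statistics.median(log_counts)

print(f"raw  max/median: {raw_ratio:.0f}")
print(f"log2 max/median: {log_ratio:.2f}")
```

On the raw scale the top genes squash everything else against the axis; on the log scale the bulk of the distribution becomes visible in a histogram or boxplot.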

> First my controls and KOs do not group as I would expect.

If you look at the PCA after normalization, you can see that sample 1142 HP is a clear outlier that completely dominates PC1. On PC2, however, the control and KO samples group relatively well.

> I wanted to know if this distribution will affect any downstream differential expression results and if yes, how can I fix it?

Voom normalization is robust to highly expressed genes, so it should be OK.
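As a rough illustration of why a few dominant genes need not distort the rest, here is a small Python sketch. It uses the log-CPM transform described for voom, log2((count + 0.5) / (library size + 1) * 1e6), and compares a plain total-count library size with an effective library size. The scaling factor below is a deliberately simplified median-of-ratios stand-in, not the TMM method that edgeR's calcNormFactors implements, and all counts are hypothetical.

```python
import math
import statistics


def log_cpm(counts, lib_size):
    """log2 counts per million, with the 0.5 and 1 offsets described for voom."""
    return [math.log2((c + 0.5) / (lib_size + 1.0) * 1e6) for c in counts]


# Hypothetical counts: the first four genes are unchanged between samples,
# the last one stands in for a dominant mitochondrial gene.
sample_a = [100, 200, 300, 400, 50_000]
sample_b = [100, 200, 300, 400, 500_000]

# Plain library sizes are the column totals.
total_a, total_b = sum(sample_a), sum(sample_b)

# Simplified stand-in for a TMM-style factor (an assumption for this sketch):
# the median per-gene ratio of proportions, sample B relative to sample A.
ratios = [(b / total_b) / (a / total_a) for a, b in zip(sample_a, sample_b)]
factor_b = statistics.median(ratios)
effective_b = total_b * factor_b

# Difference in log-CPM for the first (unchanged) gene, B minus A.
naive_diff = log_cpm(sample_b, total_b)[0] - log_cpm(sample_a, total_a)[0]
scaled_diff = log_cpm(sample_b, effective_b)[0] - log_cpm(sample_a, total_a)[0]

print(f"total-count library size:  {naive_diff:+.2f} log2 units")
print(f"effective library size:    {scaled_diff:+.2f} log2 units")
```

With total-count scaling the unchanged gene looks strongly down in sample B, purely because one gene inflated the library size; with the effective library size the difference essentially disappears. In a limma workflow the analogous step is usually computing normalization factors on the count object before calling voom.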


Thanks for the answer, that was very helpful. I was wondering whether my downstream analysis was affected because I found only 8 differentially expressed genes, and I am used to seeing hundreds.
