For low-coverage RNA-seq, how many assigned reads is the bare minimum for differential gene expression analysis?
5.2 years ago
senowinski ▴ 30

I have low-coverage RNA-seq of human tissue, with ~6 million reads per sample aligned using STAR. Across the 84 samples, the number of reads assigned to genes ranges from 2 to 7 million. What is the bare minimum number of reads I can use for differential gene expression analysis? What is a sensible cut-off? Ideally I would like to retain as many samples as possible.


Depends on the genome. For example, you need more read depth for human alignments than you do for fly alignments.

What is the bare minimum number of reads I can use for differential gene expression analysis?

There's not really a bare minimum. It depends on how sensitive your analysis needs to be, on sequencing quality (how many good reads remain after processing), and on genome size, as I mentioned already.

You should go ahead with the differential expression analysis. That part doesn't take long, and if you decide to do more sequencing, you will already have the differential expression pipeline set up.


These are human alignments. When you say to go ahead with the differential gene expression analysis, do you think I should try it with all the samples?


Well, as I say, it depends if they are outliers on metrics other than read count.


What is your organism? Six million reads is low coverage for human, but it is not for yeast, for example. And how are the 84 samples distributed within treatments? Literature shows biological replicates are more important than read depth per sample when it comes to statistical power.


We normally talk about reads in the sample, rather than reads assigned to genes. A dirty little secret that people often don't talk about is that often only around a third (total ribo-depleted) to two thirds (polyA) of reads map to exons. So when someone says they have 20M polyA reads, they probably only really have 13M assigned to exons.
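To make that arithmetic concrete, here is a minimal Python sketch. The one-third and two-thirds fractions are just the rough rules of thumb quoted above, not measured values, and the function and dictionary names are made up for illustration. The second function inverts the estimate, which gives a feel for what total depth a given number of gene-assigned reads roughly corresponds to.

```python
# Rule-of-thumb fractions of reads that map to exons, taken from the post above.
# These are assumptions: real values vary with library prep and annotation.
EXON_FRACTION = {
    "polyA": 2 / 3,
    "ribo_depleted": 1 / 3,
}


def expected_exon_reads(total_reads: int, library_type: str) -> int:
    """Estimate how many of `total_reads` end up assigned to exons."""
    return round(total_reads * EXON_FRACTION[library_type])


def equivalent_total_reads(assigned_reads: int, library_type: str) -> int:
    """Invert the estimate: total depth implied by a number of exon-assigned reads."""
    return round(assigned_reads / EXON_FRACTION[library_type])


if __name__ == "__main__":
    # 20M polyA reads in the sample: about 13M assigned to exons.
    print(expected_exon_reads(20_000_000, "polyA"))
    # 2M reads assigned to genes, expressed as approximate total polyA depth.
    print(equivalent_total_reads(2_000_000, "polyA"))
```

So under these assumed fractions, 2-7M assigned reads would correspond to roughly 3-10M total reads for a polyA library, which is how the numbers would usually be quoted.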

I'd normalise your samples with DESeq2's rlog and see which samples stand out on the PCA/MDS. Do you have two-million-read samples that are a million miles away from all the other samples? Do they have other things wrong with them (GC distribution, over-represented sequences, etc.)? If your low-coverage samples cluster on a PCA/MDS with the high-coverage ones, I'd probably use them. If they are miles away, I'd discard them.
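The check described above could be sketched roughly like this in Python with NumPy. To be clear about the assumptions: this is not DESeq2's rlog. It swaps in a much simpler log2 counts-per-million transform as a stand-in, runs PCA via SVD, and flags samples that sit far from the bulk in PC space. All function names, the simulated data, and the 5-MAD threshold are made up for illustration, so treat it as a sketch of the idea and not a replacement for looking at the actual PCA plot.

```python
import numpy as np


def log_cpm(counts, prior=1.0):
    """log2 counts-per-million for a genes x samples matrix (simple stand-in, not rlog)."""
    lib_sizes = counts.sum(axis=0, keepdims=True)
    return np.log2(counts / lib_sizes * 1e6 + prior)


def pca_scores(log_expr, n_components=2):
    """Return a samples x n_components matrix of PCA scores."""
    # Center each gene across samples, then decompose the samples x genes matrix.
    centered = log_expr - log_expr.mean(axis=1, keepdims=True)
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def flag_outliers(scores, n_mads=5.0):
    """Flag samples far from the median position in PC space (arbitrary MAD rule)."""
    dist = np.linalg.norm(scores - np.median(scores, axis=0), axis=1)
    med = np.median(dist)
    mad = np.median(np.abs(dist - med))
    if mad == 0:
        return np.zeros(dist.shape, dtype=bool)
    return dist > med + n_mads * 1.4826 * mad


def simulate_counts(n_genes=500, n_samples=12, bad_sample=0, seed=0):
    """Toy data: depths from 2M to 7M reads, one sample with distorted composition."""
    rng = np.random.default_rng(seed)
    proportions = rng.lognormal(mean=0.0, sigma=1.0, size=n_genes)
    proportions /= proportions.sum()
    depths = np.linspace(2e6, 7e6, n_samples)
    expected = np.outer(proportions, depths)  # genes x samples
    # Distort the first 100 genes in one sample, then restore its original depth.
    expected[:100, bad_sample] *= 8.0
    expected[:, bad_sample] *= depths[bad_sample] / expected[:, bad_sample].sum()
    return rng.poisson(expected)


if __name__ == "__main__":
    counts = simulate_counts()
    scores = pca_scores(log_cpm(counts))
    print(flag_outliers(scores))
```

In the toy data the distorted sample is also the shallowest one, so the point of comparison is the next-shallowest samples: if they sit with the deep samples on the first two components, depth alone is not what is pushing a sample away. With real data you would feed in your own gene-by-sample count matrix in place of `simulate_counts()`.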

As was pointed out by @h.mon, a lot of power in RNA-seq comes from replicates rather than read number.
