Question

Good mean coverage but big std in coverage

0

5.6 years ago

Pin.Bioinf ▴ 340

Hello, I have computed the coverage of my reads with bamQC and, for one of my samples, I got:

>>>>>>> Coverage

     mean coverageData = 33.5984X
     std coverageData = 4,599.979X

     There is a 1,3% of reference with a coverageData >= 1X
     There is a 1,14% of reference with a coverageData >= 2X
     There is a 0,85% of reference with a coverageData >= 3X

A mean of 33X would be good for RNA-seq analysis, but such a large std and such a low percentage of the reference above 3X seem very bad. What should my conclusion be? If the mean coverage is high but the standard deviation is this large, is the coverage bad? Or is the mean coverage still good despite the huge std?

Please, help me interpret these results.

Thank you

bamqc qualimap coverage rnaseq • 2.6k views

0

What is the motivation for this analysis, that is, what question are you trying to answer? With the information given it is hard to interpret anything, because the actual question is missing.

5.6 years ago by ATpoint 81k

0

I want to know if my sample is good enough to use for differential expression analysis. I have read that 2X is enough coverage for that. What do you think?

5.6 years ago by Pin.Bioinf ▴ 340

0

Fold coverage is generally a useless concept in RNAseq.

5.6 years ago by Devon Ryan 104k

0

So how do I know if my FASTQ file is good enough for differential expression analysis? In this sample, only 4.7% of the reads mapped to the reference.

5.6 years ago by Pin.Bioinf ▴ 340

0

4.7% is crazy low; there is obviously something wrong. Focus on figuring out why such a small percentage of your reads are aligning.

5.6 years ago by Devon Ryan 104k

0

Okay, thank you, but I have read that 2X is the minimum mapped coverage for using the reads in differential expression analysis. I want to know if this FASTQ is good enough or if I need to repeat the sequencing of this sample.

5.6 years ago by Pin.Bioinf ▴ 340

2

I don't know where you read that, but please follow guidelines such as this one:

Bioconductor RNA-seq workflow: gene-level exploratory analysis and differential expression

5.6 years ago by WouterDeCoster 47k

0

Hello Corentin, thank you for your detailed answer. Then, looking at the % of uniquely mapped reads should be enough? How can I decide if the percentage of uniquely mapped reads is good enough for analysis?

5.6 years ago by Pin.Bioinf ▴ 340

0

Please use ADD COMMENT or ADD REPLY to respond to earlier posts, so that the thread remains logically structured and easy to follow. I have now moved your post, but as you can see the result is not optimal. Adding an answer should only be used for providing a solution to the question asked.

5.6 years ago by WouterDeCoster 47k

0

"Good enough" is subjective and depends on your experiment, your genome, your read quality, etc. But the higher the percentage of mapped reads, the better.

You can look in the literature and tutorials for additional steps to check and analyse your data. RNA-seq is a popular method, and a lot of resources are available that will explain it more comprehensively and better than I can.

5.6 years ago by Corentin ▴ 600

0

This is the table of some of the samples after mapping:

Sample Name    % Aligned    M Aligned
I_S1           4.7%         1.0
I_S2           48.7%        3.2
I_S3           49.0%        3.0
I_S4           46.6%        7.6
I_S6           82.4%        49.3

5.6 years ago by Pin.Bioinf ▴ 340

2

Something appears to have gone wrong with your first 4 samples; presumably they're heavily contaminated or something similar. BLAST some of the reads that didn't align to try to determine what went wrong.
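
As a rough sketch of that step, the Python snippet below turns the first few records of a FASTQ file of unaligned reads into FASTA text that can be pasted into a BLAST search form. The function name `fastq_head_to_fasta` and the file name `unmapped.fastq` are made up for illustration. It assumes the unaligned reads were already extracted (for example with `samtools fastq -f 4`) and that each record spans exactly four lines.

```python
from itertools import islice


def fastq_head_to_fasta(fastq_lines, n_reads=20):
    """Convert the first n_reads FASTQ records into FASTA-formatted text.

    fastq_lines: any iterable of lines (an open file handle works).
    Assumes unwrapped 4-line FASTQ records.
    """
    records = []
    line_iter = iter(fastq_lines)
    while len(records) < n_reads:
        chunk = list(islice(line_iter, 4))  # header, sequence, '+', qualities
        if len(chunk) < 4:
            break  # end of input (or a truncated final record)
        header, sequence = chunk[0].strip(), chunk[1].strip()
        if not header.startswith("@"):
            raise ValueError(f"Unexpected FASTQ header: {header!r}")
        records.append(f">{header[1:]}\n{sequence}")
    return "\n".join(records) + ("\n" if records else "")


# Hypothetical usage; 'unmapped.fastq' is a placeholder file name:
# with open("unmapped.fastq") as handle:
#     print(fastq_head_to_fasta(handle, n_reads=20))
```

If most of the hits come from another organism, or from rRNA or adapter sequence, that points to the likely cause of the low alignment rate.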

5.6 years ago by Devon Ryan 104k

score 2 · Answer 1 · 2018-09-14

Hi Pin.Bioinf,

You've tagged rnaseq. In RNA-Seq data, the coverage is not uniformly distributed. The exons of expressed transcripts have coverage, while introns and intergenic regions have nearly none. On top of that, differences in transcript abundance lead to different read-depth levels. Thus, don't use genome-wide coverage statistics to judge RNA-Seq data.

Cheers,

Michael

score 2 · Answer 2 · 2018-09-14

Hello,

From this report alone, it would seem that almost all your reads mapped to a very small portion of your genome. Keep in mind that RNA-seq reads map mainly to the transcribed, mostly protein-coding, parts of the genome, which are estimated (in humans) at about 2% of the total genome size. Moreover, in RNA-seq the coverage depends on the expression level of each transcript: a heavily expressed transcript will have very deep coverage, which could explain the high std.
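
To make that arithmetic concrete, here is a minimal Python sketch with made-up numbers (not the asker's data). The helper `coverage_summary` and the toy depth vector are assumptions for illustration only. It shows that a handful of extremely deep positions can produce a respectable mean, a std far larger than the mean, and a tiny covered fraction all at once.

```python
from statistics import fmean, pstdev


def coverage_summary(depths, min_depth=1):
    """Return (mean depth, population std, fraction of positions >= min_depth)."""
    mean = fmean(depths)
    std = pstdev(depths)
    breadth = sum(d >= min_depth for d in depths) / len(depths)
    return mean, std, breadth


# Toy reference of 10,000 positions (invented values):
#   9,870 positions with no reads (introns, intergenic regions, silent genes)
#     120 positions at 2X (weakly expressed exons)
#      10 positions at 30,000X (one extremely abundant transcript)
toy_depths = [0] * 9870 + [2] * 120 + [30000] * 10

mean, std, breadth = coverage_summary(toy_depths)
print(f"mean = {mean:.1f}X, std = {std:.1f}X, >=1X = {breadth:.1%}")
```

In this toy case the mean is about 30X even though only 1.3% of positions have any reads, so the mean on its own says very little about how the reads are distributed.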

If you want a QC statistic on your alignment, you can check how many of your reads mapped to the reference (for example with samtools flagstat).
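
A small Python sketch of that check is shown below. It assumes `samtools` is installed and on the PATH, and that the report uses the usual `N + M label` line layout. The helper names `run_flagstat` and `mapped_fraction` and the example report are invented for illustration.

```python
import re
import subprocess


def run_flagstat(bam_path):
    """Run `samtools flagstat` on a BAM file and return its text report."""
    result = subprocess.run(
        ["samtools", "flagstat", bam_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def mapped_fraction(flagstat_text):
    """Return mapped / total for QC-passed records in a flagstat report."""
    total = mapped = None
    for line in flagstat_text.splitlines():
        match = re.match(r"(\d+) \+ (\d+) (.*)", line)
        if not match:
            continue
        qc_passed, label = int(match.group(1)), match.group(3)
        if label.startswith("in total"):
            total = qc_passed
        elif label.startswith("mapped"):  # skips the "primary mapped" line
            mapped = qc_passed
    if not total or mapped is None:
        raise ValueError("Could not find total/mapped counts in the report")
    return mapped / total


# Made-up report, only to show the expected layout:
EXAMPLE_REPORT = """\
1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
47 + 0 mapped (4.70% : N/A)
"""
print(f"{mapped_fraction(EXAMPLE_REPORT):.1%} of records mapped")
```

Note that flagstat counts alignment records, so secondary and supplementary alignments are included in both numbers. With a real file, `mapped_fraction(run_flagstat("sample.bam"))` would give the figure for that sample, where `sample.bam` is a placeholder name.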

For more insight, you can visualize the alignment with a tool like IGV (https://software.broadinstitute.org/software/igv/). Just load your sorted and indexed BAM file and you will see the number of reads mapped along the reference (in your case, probably long regions of 0X coverage and then peaks where a transcript is expressed).

Regards,