The optimal minMQS parameter in featureCounts for RNA-Seq quantification
5.2 years ago
Gary ▴ 480

I have six mouse RNA-Seq samples, aligned with STAR and quantified with featureCounts. The featureCounts command I ran is below.

rc <- featureCounts(
  files = c("Chuong753.bam", "Chuong754.bam", "Chuong755.bam",
            "Chuong756.bam", "Chuong757.bam", "Chuong758.bam"),
  annot.ext = "mm10ncbiRefSeqCurated.gtf",
  isGTFAnnotationFile = TRUE,
  GTF.featureType = "exon",
  GTF.attrType = "gene_id",
  minMQS = 10,
  strandSpecific = 0,
  nthreads = 6,
  verbose = TRUE
)

I found that a large number of reads are reported as Unassigned_MappingQuality (details below). Should I set minMQS = 3 or 0 to increase the number of Assigned reads? Many thanks.

> rc$stat
                          Status Chuong753 Chuong754 Chuong755 Chuong756 Chuong757 Chuong758
1                       Assigned  28552790  19795064  26274194  26601820  21264775  21703604
2            Unassigned_Unmapped         0         0         0         0         0         0
3      Unassigned_MappingQuality  10233734   6718392  10369784   9299763   9249801  12905108
4             Unassigned_Chimera         0         0         0         0         0         0
5      Unassigned_FragmentLength         0         0         0         0         0         0
6           Unassigned_Duplicate         0         0         0         0         0         0
7        Unassigned_MultiMapping         0         0         0         0         0         0
8           Unassigned_Secondary         0         0         0         0         0         0
9         Unassigned_Nonjunction         0         0         0         0         0         0
10         Unassigned_NoFeatures   2596367   1732967   2577948   1882419   2235492   2757615
11 Unassigned_Overlapping_Length         0         0         0         0         0         0
12          Unassigned_Ambiguity    237389    166711    228953    216951    178357    218899
5.2 years ago
Gordon Smyth ★ 7.0k

The question was also asked on Bioconductor

https://support.bioconductor.org/p/117597/

and was answered by the developer. The recommendation is not to set a minMQS filter at all for RNA-seq analyses.
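For reference, here is a minimal sketch of the call from the question with the mapping-quality filter switched off. It assumes the Rsubread package is installed and that the BAM and GTF files named in the question are in the working directory. The guard and the variable names (`bam_files`, `gtf_file`, `fc_args`) are illustrative additions, not part of the developer's answer. Since `minMQS = 0` is the featureCounts default, simply dropping the argument should have the same effect.

```r
# BAM and annotation file names are taken from the question.
bam_files <- sprintf("Chuong%d.bam", 753:758)
gtf_file <- "mm10ncbiRefSeqCurated.gtf"

# Same arguments as the original call, except minMQS = 0 (no MAPQ filter).
fc_args <- list(
  files = bam_files,
  annot.ext = gtf_file,
  isGTFAnnotationFile = TRUE,
  GTF.featureType = "exon",
  GTF.attrType = "gene_id",
  minMQS = 0,
  strandSpecific = 0,
  nthreads = 6,
  verbose = TRUE
)

# Only run when Rsubread and all input files are actually available.
if (requireNamespace("Rsubread", quietly = TRUE) &&
    all(file.exists(c(bam_files, gtf_file)))) {
  rc <- do.call(Rsubread::featureCounts, fc_args)
  print(rc$stat)
}
```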


Thank you for pointing that out.

In this situation, I think the cross-post was helpful (since my interpretation differed from the one given in that answer), and it is definitely useful to have both sets of answers accessible here!

5.2 years ago

Unfortunately, I can't really say for certain what you should do, since that can vary from project to project.

I would typically let downstream results inform upstream decisions iteratively: if you see something that clearly looks wrong, change one parameter at a time and check its effect on that specific problem.
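One way to track the effect of a parameter change is to express each status category as a percentage of all reads per sample, then repeat the calculation after rerunning featureCounts with a different setting. The sketch below uses the non-zero rows of the `rc$stat` table from the question (minMQS = 10); the shortened row labels and the variable names are illustrative.

```r
# Non-zero rows of rc$stat from the question (minMQS = 10).
samples <- paste0("Chuong", 753:758)
stat <- rbind(
  Assigned       = c(28552790, 19795064, 26274194, 26601820, 21264775, 21703604),
  MappingQuality = c(10233734, 6718392, 10369784, 9299763, 9249801, 12905108),
  NoFeatures     = c(2596367, 1732967, 2577948, 1882419, 2235492, 2757615),
  Ambiguity      = c(237389, 166711, 228953, 216951, 178357, 218899)
)
colnames(stat) <- samples

# Percentage of each sample's reads that falls into each category.
pct <- round(100 * sweep(stat, 2, colSums(stat), "/"), 1)
print(pct)
```

Running the same summary on the `stat` table from a second run gives a direct before/after comparison for each sample.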

I'm also not sure about your reference. I would typically expect this workflow to use an annotated genome reference. However, if this is a transcriptome alignment and/or a de novo assembly plus annotation, perhaps that could be part of the cause?

I'm also not entirely certain how serious this issue is: if you pick a threshold (such as FPKM > 0.1), do you detect expression in >60% of your genes? Or, what fraction of genes still have at least a certain number of counts (1, 10, 50, etc.)?
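The count-threshold check can be sketched as follows. The count matrix here is simulated purely for illustration; in practice it would be `rc$counts` from featureCounts. Treating a gene as detected when it reaches the threshold in at least one sample is an assumption of this sketch, and other definitions (for example, all samples or the row sum) are equally reasonable.

```r
# Hypothetical toy count matrix (genes x samples); in practice use rc$counts.
set.seed(1)
counts <- matrix(rnbinom(6000, mu = 20, size = 0.5), nrow = 1000, ncol = 6)

# Largest count observed for each gene across the samples.
max_per_gene <- apply(counts, 1, max)

# Fraction of genes that reach each threshold in at least one sample.
thresholds <- c(1, 10, 50)
frac_detected <- sapply(thresholds, function(th) mean(max_per_gene >= th))
names(frac_detected) <- thresholds
print(frac_detected)
```

Comparing these fractions between a run with minMQS = 10 and a run with minMQS = 0 would show whether the filter materially changes how many genes are detected.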
