Question

The optimal minMQS parameter in featureCounts for RNA-Seq quantification

5.2 years ago

Gary ▴ 480

I have six mouse RNA-Seq samples. I used STAR for the alignment and featureCounts for the quantification. The featureCounts command I ran is below.

rc <- featureCounts(
  files = c("Chuong753.bam", "Chuong754.bam", "Chuong755.bam",
            "Chuong756.bam", "Chuong757.bam", "Chuong758.bam"),
  annot.ext = "mm10ncbiRefSeqCurated.gtf",
  isGTFAnnotationFile = TRUE,
  GTF.featureType = "exon",
  GTF.attrType = "gene_id",
  minMQS = 10,
  strandSpecific = 0,
  nthreads = 6,
  verbose = TRUE
)

I found that there are a lot of Unassigned_MappingQuality reads (details below). Should I set minMQS to 3 or 0 to increase the number of Assigned reads? Many thanks.

> rc$stat
                          Status Chuong753 Chuong754 Chuong755 Chuong756 Chuong757 Chuong758
1                       Assigned  28552790  19795064  26274194  26601820  21264775  21703604
2            Unassigned_Unmapped         0         0         0         0         0         0
3      Unassigned_MappingQuality  10233734   6718392  10369784   9299763   9249801  12905108
4             Unassigned_Chimera         0         0         0         0         0         0
5      Unassigned_FragmentLength         0         0         0         0         0         0
6           Unassigned_Duplicate         0         0         0         0         0         0
7        Unassigned_MultiMapping         0         0         0         0         0         0
8           Unassigned_Secondary         0         0         0         0         0         0
9         Unassigned_Nonjunction         0         0         0         0         0         0
10         Unassigned_NoFeatures   2596367   1732967   2577948   1882419   2235492   2757615
11 Unassigned_Overlapping_Length         0         0         0         0         0         0
12          Unassigned_Ambiguity    237389    166711    228953    216951    178357    218899

featureCounts minMQS quantification RNA-Seq • 2.3k views

updated 5.2 years ago by Gordon Smyth ★ 7.0k • written 5.2 years ago by Gary ▴ 480

score 3 · Answer 1 · 2019-02-05

5.2 years ago

Gordon Smyth ★ 7.0k

The question was also asked on Bioconductor (https://support.bioconductor.org/p/117597/) and was answered by the developer. The recommendation is not to set a minMQS filter at all for RNA-seq analyses.
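
For illustration only, here is a minimal sketch of what "no minMQS filter" could look like, reusing the arguments from the call in the question. The wrapper name count_without_mapq_filter is hypothetical and not part of Rsubread; only Rsubread::featureCounts itself is a real function. As far as I know, minMQS = 0 is also the default, so leaving the argument out should behave the same way.

```r
# Hypothetical wrapper (not part of Rsubread): same arguments as the call in
# the question, but with the mapping-quality filter switched off.
count_without_mapq_filter <- function(bam_files, gtf_file, threads = 6) {
  Rsubread::featureCounts(
    files = bam_files,
    annot.ext = gtf_file,
    isGTFAnnotationFile = TRUE,
    GTF.featureType = "exon",
    GTF.attrType = "gene_id",
    minMQS = 0,          # 0 means reads are not filtered on mapping quality
    strandSpecific = 0,
    nthreads = threads,
    verbose = TRUE
  )
}

# File names taken from the question.
bam_files <- paste0("Chuong", 753:758, ".bam")

# Uncomment to run; this needs Rsubread installed and the BAM/GTF files present.
# rc <- count_without_mapq_filter(bam_files, "mm10ncbiRefSeqCurated.gtf")
```

Comparing rc$stat from this run against the table in the question would show how many reads move out of Unassigned_MappingQuality and where they end up.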


Thank you for pointing that out.

In this situation, I think a cross-post was helpful (since my interpretation was different from the one given in that answer), and it is definitely useful to have access to both sets of answers here!

5.2 years ago by Charles Warden 8.2k

score 1 · Answer 2 · 2019-02-05

Unfortunately, I can't say for certain what you should do; that can vary from project to project.

I would typically let downstream results inform upstream decisions, iterating as needed. If you see something that clearly looks wrong, you can focus on how changing a parameter affects that specific problem.
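
As a rough sketch of one way to track the effect of a parameter change, the rc$stat table can be turned into per-sample fractions and compared between runs. The numbers below are copied from the non-zero rows of the table in the question; the helper name stat_fractions is hypothetical, and it assumes the first column holds the status labels, as in the rc$stat output shown above.

```r
# Non-zero rows of rc$stat, copied from the question.
stat <- data.frame(
  Status = c("Assigned", "Unassigned_MappingQuality",
             "Unassigned_NoFeatures", "Unassigned_Ambiguity"),
  Chuong753 = c(28552790, 10233734, 2596367, 237389),
  Chuong754 = c(19795064, 6718392, 1732967, 166711),
  Chuong755 = c(26274194, 10369784, 2577948, 228953),
  Chuong756 = c(26601820, 9299763, 1882419, 216951),
  Chuong757 = c(21264775, 9249801, 2235492, 178357),
  Chuong758 = c(21703604, 12905108, 2757615, 218899)
)

# Hypothetical helper: divide each status count by the sample's total reads.
stat_fractions <- function(stat) {
  counts <- as.matrix(stat[, -1])      # drop the Status column
  rownames(counts) <- stat$Status
  sweep(counts, 2, colSums(counts), "/")
}

fractions <- stat_fractions(stat)
print(round(fractions, 3))
```

Running the same helper on rc$stat from a second featureCounts run with a different minMQS would give a like-for-like comparison of the Assigned fraction per sample.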

I'm also not sure about your reference. I would typically expect this to be for an annotated genome reference. However, if this is transcriptome alignment and/or a de novo assembly + annotation, perhaps that could be part of the cause?

I'm also not entirely certain how serious this issue is. If you pick a threshold (such as FPKM > 0.1), do you detect expression in >60% of your genes? Or, what fraction of genes still have at least a certain number of counts (1, 10, 50, etc.)?
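
A minimal sketch of the count-based check, in case it helps. The function name frac_genes_detected and the toy matrix are made up for illustration. I am also assuming that a gene "has at least N counts" when its count in at least one sample reaches N; a different rule (for example, the sum across samples) may suit your project better.

```r
# Hypothetical helper: fraction of genes whose maximum count across samples
# reaches each threshold.
frac_genes_detected <- function(counts, thresholds = c(1, 10, 50)) {
  gene_max <- apply(counts, 1, max)
  fracs <- vapply(thresholds, function(t) mean(gene_max >= t), numeric(1))
  setNames(fracs, paste0(">=", thresholds))
}

# Toy counts matrix (invented values): 4 genes x 2 samples.
toy_counts <- matrix(
  c( 0,  0,
     5, 12,
    60, 40,
     1,  0),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("s1", "s2"))
)

detected <- frac_genes_detected(toy_counts)
print(detected)  # fractions 0.75, 0.50, 0.25 for thresholds 1, 10, 50
```

With real data, frac_genes_detected(rc$counts) could be run on the output of each featureCounts setting to see whether the choice of minMQS changes the number of detectable genes in any meaningful way.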