Question

htseq-count: why do I get more __no_feature reads when I change the --nonunique option?

6.2 years ago

bjmnpnd • 0

Hello everyone,

I am using htseq-count (version 0.9.1) on single-end RNA-seq data. I ran the analysis once with --nonunique none and once with --nonunique all, but I obtained different results.

--nonunique none

htseq-count -f bam -a 0 -s no --nonunique none -m union -r pos -t gene -i Name No_treated_qual_MPOB_sort.bam my_file.bam my_file.gff3 > output_nonunique_none.txt

--nonunique all

htseq-count -f bam -a 0 -s no --nonunique all -m union -r pos -t gene -i Name No_treated_qual_MPOB_sort.bam my_file.bam my_file.gff3 > output_nonunique_all.txt

In both cases I have 17 344 033 SAM alignment pairs processed.

But the __no_feature and __ambiguous counts differ between these two command lines, and I don't understand why.

From my reading of the manual, I expected the same __no_feature and __ambiguous counts from both command lines, with only the counts for reads that map to a gene being different.

Result with --nonunique none

__no_feature                  3198841
__ambiguous                   187272
__too_low_aQual               0
__not_aligned                 1582688
__alignment_not_unique        1449930
sum_of_reads_map_against_gene 10925302
total                         17344033

Result with --nonunique all

__no_feature                  4114357
__ambiguous                   203260
__too_low_aQual               0
__not_aligned                 1582688
__alignment_not_unique        1449930
sum_of_reads_map_against_gene 11851412
total                         19201647

Can someone explain why I obtain more __no_feature and __ambiguous reads with --nonunique all?

RNA-Seq rna-seq • 2.8k views

ADD COMMENT • link updated 6.2 years ago by Carlo Yague 8.6k • written 6.2 years ago by bjmnpnd • 0

score 1 · Answer 1 · 2018-02-22

6.2 years ago

Carlo Yague 8.6k

Well, it is simple: multi-mappers are only assigned to features with --nonunique all, and each of their alignments can overlap 0, 1 or more features. Therefore, the 1449930 non-unique alignments are also distributed across the "no feature", "ambiguous" and "map_against_gene" categories, which is why those numbers and the total go up.
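To make the bookkeeping concrete, here is a small arithmetic check in Python using only the numbers posted in the question. The variable names are mine, and the interpretation in the comments is a reading of the numbers, not something taken from the htseq-count source.

```python
# Counts copied from the question (the "genes" key is the poster's
# sum_of_reads_map_against_gene row).
none = {"__no_feature": 3198841, "__ambiguous": 187272, "genes": 10925302}
all_ = {"__no_feature": 4114357, "__ambiguous": 203260, "genes": 11851412}
not_unique = 1449930  # __alignment_not_unique, identical in both runs

# Per-category increase when switching from "none" to "all".
delta = {k: all_[k] - none[k] for k in none}
extra_total = sum(delta.values())

# Whatever is left after accounting for the non-unique alignments.
residual = extra_total - not_unique

print(delta)        # {'__no_feature': 915516, '__ambiguous': 15988, 'genes': 926110}
print(extra_total)  # 1857614, the difference between the two totals
print(residual)     # 407684
```

The total grows by 1857614, which is more than the 1449930 non-unique alignments. The remaining 407684 is roughly twice the __ambiguous count in the "all" run (203260). That would be consistent with each ambiguous read also being counted once for every feature it overlaps in this mode, but treat that as an assumption to check against the manual.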

ADD COMMENT • link 6.2 years ago by Carlo Yague 8.6k

I don't understand. The manual says: __alignment_not_unique : reads (or read pairs) with more than one reported alignment.

I read that as meaning these reads map to 2 or more features, not to 0 (no_feature) or 1.

ADD REPLY • link 6.2 years ago by bjmnpnd • 0

Let me clarify:

A read can align to multiple sections of the genome, generating multiple alignments. "Not unique" refers to the number of alignments, not the number of features. When you include those reads in the analysis, each of their alignments is compared to the gene annotation separately, and each alignment can correspond to 0 (no feature), 1 (map against gene) or more (ambiguous) features.

To summarize:

1 read -----> alignment 1 -----> ambiguous OR map against gene OR no feature
       -----> alignment 2 -----> ambiguous OR map against gene OR no feature
       -----> alignment n -----> ambiguous OR map against gene OR no feature

Usually, multimappers (reads with multiple alignments) are omitted from the analysis because you cannot tell which section of the genome they come from.
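The scheme above can be sketched as a toy counter in Python. This is not HTSeq's implementation: the function `toy_count`, the tuple layout and the gene names are all made up for illustration, and the behaviour of the "all" mode (ambiguous alignments also counted for each overlapping feature, `__alignment_not_unique` still incremented) is an assumption based on the numbers in the question.

```python
from collections import Counter

def toy_count(alignments, nonunique):
    """Toy model. alignments: list of (read_id, n_alignments, features)."""
    counts = Counter()
    for read_id, n_alignments, features in alignments:
        if n_alignments > 1:
            # The read has several alignments: it is a multimapper.
            counts["__alignment_not_unique"] += 1
            if nonunique == "none":
                continue  # multimappers are not compared to the annotation
        if len(features) == 0:
            counts["__no_feature"] += 1
        elif len(features) == 1:
            counts[features[0]] += 1
        else:
            counts["__ambiguous"] += 1
            if nonunique == "all":
                # Assumed: also count the alignment for every feature it overlaps.
                for feature in features:
                    counts[feature] += 1
    return counts

# Hypothetical data: r1-r3 align once, r4 aligns twice.
alignments = [
    ("r1", 1, ["geneA"]),
    ("r2", 1, []),
    ("r3", 1, ["geneA", "geneB"]),
    ("r4", 2, ["geneB"]),
    ("r4", 2, []),
]

counts_none = toy_count(alignments, "none")
counts_all = toy_count(alignments, "all")
print(dict(counts_none))
print(dict(counts_all))
```

In this toy example the second alignment of r4 overlaps no gene, so `__no_feature` goes from 1 to 2 when switching to "all", and the sum of all rows goes from 5 to 9 even though the input is the same five alignments. That is the same pattern as in the question.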