Number of reads in different alignments
1
0
Entering edit mode
5.7 years ago

Dear all,

Just to confirm: it is normal to have a different number of reads when aligning the same reads against two different genomes?

I think the answer is YES because the aligner generates different hits (reads) according to the reference genome. Also, the number of reads of the starting fastq file is different from that of the SAM file, thus SAM does not report a number that is tethered to the starting file.

For instance, I have:

x=$(zcat <file>_1.fq.gz | wc -l)
L=$(($x/4))
echo $L
1 190 389 447 # same number on the paired file
samtools flagstat <align_first_reference>.sam |  grep 'total' | cut -d+ -f1
2 383 795 990
samtools flagstat <align_second_reference>.sam |  grep 'total' | cut -d+ -f1
2 381 177 452

Thank you.

alignment SAM • 1.1k views
ADD COMMENT
0
Entering edit mode

You also need to account for secondary alignments that are bound to be different as well.

ADD REPLY
3
Entering edit mode
5.7 years ago
Joe 21k

If they're different genomes, of course. The sequence won't be exactly the same so it's to be expected that different numbers will map.

ADD COMMENT
1
Entering edit mode

Also, filled gaps in newer versions of the reference may cause reads to become multimappers and potentially trigger the aligner to flag them as unmapped, given they align to too many regions. HISAT2 has such an issue if I remember correctly, producing notably different alignments when using hg19 vs hg38. What kind of references are you using?

ADD REPLY

Login before adding your answer.

Traffic: 3280 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6