Bowtie: number of reads with a reported alignment and number of reported alignments don't match
5.4 years ago

Hi, I have been running bowtie on several libraries, and every time I check the results, the number of reads with at least one reported alignment doesn't match the number of alignments reported to 1 output stream(s). I don't understand why.

I use the following command:

bowtie -v 1 -S -a INDEX my_library.fq bowtie_output.sam

Bowtie output:

reads processed: 14500699

reads with at least one reported alignment: 11397231 (78.60%)

reads that failed to align: 3103468 (21.40%)

Reported 301564237 alignments to 1 output stream(s)

As you can see, the number of reads with at least one reported alignment (11397231) and the number of reported alignments (301564237) don't match. Moreover, the latter is even bigger than the number of reads processed. Where does this come from, and how can I fix it?

Any help would be appreciated. Thanks

I have updated my answer. If it solves your question, you can accept it. Thanks.

Could you retry without the filtering and reporting options, like this:

bowtie -S INDEX my_library.fq bowtie_output.sam

Yes, if I remove all the filters the numbers are now equal, but I still don't know what this means.

reads processed: 14500699

reads with at least one reported alignment: 11896128 (82.04%)

reads that failed to align: 2604571 (17.96%)

Reported 11896128 alignments to 1 output stream(s)
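
For what it's worth, the summary can also be checked programmatically. The sketch below is only an illustration: `parse_bowtie_summary` is a hypothetical helper (not part of bowtie), and it assumes the summary has exactly the four-line shape pasted in this thread.

```python
import re

def parse_bowtie_summary(text):
    """Extract the four counts from a bowtie 1 summary (hypothetical helper)."""
    patterns = {
        "processed": r"reads processed:\s*(\d+)",
        "aligned": r"reads with at least one reported alignment:\s*(\d+)",
        "failed": r"reads that failed to align:\s*(\d+)",
        "alignments": r"Reported\s+(\d+)\s+alignments",
    }
    counts = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"summary line for {key!r} not found")
        counts[key] = int(match.group(1))
    return counts

# Summary of the run without -v and -a, copied from the post above.
default_run = """\
reads processed: 14500699
reads with at least one reported alignment: 11896128 (82.04%)
reads that failed to align: 2604571 (17.96%)
Reported 11896128 alignments to 1 output stream(s)
"""

counts = parse_bowtie_summary(default_run)
# True when every aligned read contributed exactly one reported alignment.
one_per_read = counts["alignments"] == counts["aligned"]
print(counts, one_per_read)
```

When `one_per_read` is true, as it is here, each aligned read was reported once, which is what you would expect when bowtie is not asked to report every valid alignment.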

From the bowtie manual:

Specifying -a instructs Bowtie to report all valid alignments

A multi-mapped read has several "valid alignments", so all of them are reported too.

Try your original command without -a; by default bowtie then reports at most one alignment per read (-k 1).

Please never use a command line whose options were written by someone else. Choose options deliberately, and if you don't know whether an option fits your data or what you want to achieve, simply don't use it.

Furthermore, if you can, prefer bowtie2 over bowtie.
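
To see the difference between "reads with at least one alignment" and "reported alignments" in the SAM file itself (the original command uses -S), something like the following could be used. This is a minimal sketch: the SAM records are made up for illustration, `count_alignments` is a hypothetical helper, and it assumes each reported alignment is one SAM record and that unaligned reads carry the 0x4 flag bit.

```python
from collections import Counter

# Made-up SAM records for illustration only (tab-separated; header lines start with "@").
SAM_TEXT = "\n".join([
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:1000",
    "readA\t0\tchr1\t10\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "readA\t16\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "readA\t0\tchr1\t350\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "readB\t0\tchr1\t42\t255\t4M\t*\t0\t0\tTTGA\tIIII",
    "readC\t4\t*\t0\t0\t*\t*\t0\t0\tGGGG\tIIII",
])

def count_alignments(sam_text):
    """Return a Counter mapping read name -> number of aligned SAM records."""
    per_read = Counter()
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:  # 0x4 = read unmapped
            continue
        per_read[name] += 1
    return per_read

per_read = count_alignments(SAM_TEXT)
reads_with_alignment = len(per_read)         # distinct reads that aligned
reported_alignments = sum(per_read.values())  # all alignment records
print(reads_with_alignment, reported_alignments)
```

In this toy input, readA maps to three places, so two reads account for four alignments. The same kind of gap, on a much larger scale, would explain the numbers in the question.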

5.4 years ago

A read can align to multiple locations (multi-mapping); this is why you get more alignments than reads.

If you add the reads with at least one alignment (11397231) to the reads that failed to align (3103468), you get the number of processed reads (14500699).
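
As a quick sanity check of that arithmetic, here is a small sketch using only the numbers from the original post; the variable names are mine.

```python
# Counts copied from the bowtie summary in the question (-v 1 -S -a run).
processed = 14500699
aligned = 11397231
failed = 3103468
reported = 301564237

# Every processed read either aligned at least once or failed to align.
reads_add_up = (aligned + failed == processed)

# With -a, each aligned read may contribute many alignments.
mean_hits = reported / aligned

print(reads_add_up)
print(f"{mean_hits:.2f} alignments per aligned read on average")
```

The read counts add up, and the reported alignments work out to roughly 26 per aligned read, which is consistent with many reads mapping to multiple locations.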

From the bowtie manual:

$ ./bowtie -a -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT

  • gi|110640213|ref|NC_008253.1| 148810 10:A>G,13:C>G
  • gi|110640213|ref|NC_008253.1| 2852852 8:T>A
  • gi|110640213|ref|NC_008253.1| 4930433 4:G>T,6:C>G
  • gi|110640213|ref|NC_008253.1| 905664 6:A>G,7:G>T
  • gi|110640213|ref|NC_008253.1| 1093035 2:T>G,15:A>T

Specifying -a instructs Bowtie to report all valid alignments, subject to the alignment policy: -v 2. In this case, Bowtie finds 5 inexact hits in the E. coli genome; 1 hit (the 2nd one listed) has 1 mismatch, and the other 4 hits have 2 mismatches. Four are on the reverse reference strand and one is on the forward strand. Note that they are not listed in best-to-worst order.

All 5 alignments are valid and will be reported if you use the -a option.

If you lower the mismatch limit to -v 1, only one result remains:

  • gi|110640213|ref|NC_008253.1| 2852852 8:T>A

So, with or without the -a option, you will get one result for this read.
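
The effect of lowering -v on this example can be mimicked with a toy filter. This is only a sketch: it re-filters the five hits listed in the manual excerpt above by their mismatch count, it does not reproduce bowtie's actual search, and `mismatch_count` and `valid_hits` are hypothetical helper names.

```python
# Hits copied from the manual example quoted above: (reference offset, mismatch descriptor).
HITS = [
    (148810, "10:A>G,13:C>G"),
    (2852852, "8:T>A"),
    (4930433, "4:G>T,6:C>G"),
    (905664, "6:A>G,7:G>T"),
    (1093035, "2:T>G,15:A>T"),
]

def mismatch_count(descriptor):
    """Count comma-separated mismatch entries; an empty descriptor means no mismatch."""
    return len(descriptor.split(",")) if descriptor else 0

def valid_hits(hits, max_mismatches):
    """Keep the offsets of hits with at most max_mismatches mismatches (the -v limit)."""
    return [offset for offset, desc in hits if mismatch_count(desc) <= max_mismatches]

print(valid_hits(HITS, 2))  # all five offsets
print(valid_hits(HITS, 1))  # only the single 1-mismatch hit
```

With a limit of 2, all five hits pass, so -a reports five alignments for one read. With a limit of 1, only the hit at 2852852 remains.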

Yes, I checked that, but the last number (301564237) is way bigger than the number of processed reads and I don't understand where it comes from. In all the other posts I have seen, the number of reads with at least one reported alignment and the number of reported alignments are exactly the same.

In all the other posts I have seen, the two numbers are exactly the same.

Please share the posts you mentioned.

An alignment is not a read: a single read can produce several alignments.

Take a look at this answer: understanding multi-mapped reads

posts:

Also, someone in my lab tried the same with one of their datasets (also single-end reads), and their results look like this:

reads processed: 210944752
reads with at least one reported alignment: 207403255 (98.32%)
reads that failed to align: 3541497 (1.68%)
Reported 207403255 alignments to 1 output stream(s)

By the way, the "other posts" you mention here are terrible examples: they all start with "I have serious issues with my mapping" or similar, so I would certainly not take them as good examples ;). All of them point to issues where the mapping went wrong.

What data are you working on?

How did you generate your index? What was the command line?

Which Bowtie version did you use? And which version did your colleague use?

What was your colleague's command line?

Do you have a complete log file from Bowtie?

It's a small RNA-seq experiment from crustacean embryos. I used

bowtie-build file.fa INDEX_name for the index

I am using bowtie version 1.1.2. My colleague is also using bowtie 1. The command line was the same just -v 1.

Thanks for all the help !

What do you mean by

the same just -v 1

?

In the (unlikely) case that there are only unique mappings (and thus no multi-mappings, as Bastien Hervé correctly points out), you might get this kind of result, yes.

In most 'normal' cases you will end up with something more similar to your original post.

It seems odd to me to have 0 multi-mapped reads out of 210944752.

My best guess is the bowtie version. Maybe the authors changed the meaning of "reported alignments" from uniquely mapped reads in the old version to uniquely mapped + multi-mapped reads in the recent one.
