sam file not showing multi mapped reads
1
1
Entering edit mode
5.6 years ago
Moses ▴ 150

Hi,

I have paired end reads Illumina 1.9 encoding. I am trying to figure out a way to see the reads that are multimapped, from the sam file.

To test this I created a database (an index file) using bowtie2 from a fasta file that contains a whole genome for Bacteroides vulgatus ATCC 8482 species but I copied the same genome twice in the same fasta file before creating the index file. I did this intentionally to check for duplicate reads. I deally any read that will be mapped to this genome should be mapped twice since I have the exact same sequence in my database twice. To create the index file I issued the following command:

$BT2_HOME/bowtie2-build /home/mstambou/represantative_gut_species/duplidate_genomes.fna duplicate_genomes_indices/duplicate_genomes_db

and then to map the reads to the created database I issued the following command:

$BT2_HOME/bowtie2 --threads 24 -x duplicate_genomes_indices/duplicate_genomes_db -1 /home/team/hutternhower14/info/trimmed_1/gut_DNA/SRR769540-trimmed.1.fq -2 /home/team/hutternhower14/info/trimmed_1/gut_DNA/SRR769540-trimmed.2.fq -S X317802115/mapping_duplicate/SRR769540_duplicate.sam

the output of my last command is the following:

4280027 reads; of these:
  4280027 (100.00%) were paired; of these:
    4144369 (96.83%) aligned concordantly 0 times
    2 (0.00%) aligned concordantly exactly 1 time
    135656 (3.17%) aligned concordantly >1 times
    ----
    4144369 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    4144369 pairs aligned 0 times concordantly or discordantly; of these:
      8288738 mates make up the pairs; of these:
        8251346 (99.55%) aligned 0 times
        16 (0.00%) aligned exactly 1 time
        37376 (0.45%) aligned >1 times
3.61% overall alignment rate

This output is in agreement of what I expect to see, it's saying non of the reads (other than 2 for some reason) are mapped 1 time, and most of them (the ones that are mapped) are mapped more than 1 time. But when I later check the sam output file I only see each read ID once (i.e. each read only has one row in the sam output). I thought I should see more than one occurrence of the read IDs that are multi mapped. I really don't understand the reason behind this. I am pasting a sample of my sam file just for reference (the rows are sorted alphabetically using the read IDs).

@HD VN:1.0  SO:unsorted
@SQ SN:CP000139.1   LN:5163189
 @SQ    SN:copy_2   LN:5163189
@PG ID:bowtie2  PN:bowtie2  VN:2.3.4.2  CL:"/home/mstambou/bowtie2/bowtie2-2.3.4.2/bowtie2-align-s --wrapper basic-0 --threads 24 -x duplicate_genomes_indices/duplicate_genomes_db -S X317802115/mapping_duplicate/SRR769540_duplicate.sam -1 /home/team/hutternhower14/info/trimmed_1/gut_DNA/SRR769540-trimmed.1.fq -2 /home/team/hutternhower14/info/trimmed_1/gut_DNA/SRR769540-trimmed.2.fq"
SRR769540.1001033.1 89  copy_2  4066024 0   99M =   4066024 0   GTATCACATTTATCACAGTAGGTCTGATGGCAATCGCATTTATGTGTTTCTCTGGTTTGAACATCTAAAAAGAAAGGAATAAGACAATGGATATGAGCA     
SRR769540.1001033.2 165 copy_2  4066024 0   *   =   4066024 0   TGGCTTACTCTGACGTGCCTGCTCCGTTGAAGGGCTTGGGTATCACATTTATCACAGTAGGTCTGATGGCAATCGCATTTATGTGTTTCTCTGGTTTGAAC   B   
SRR769540.1001041.1 99  copy_2  2179441 1   101M    =   2179495 155 TATCTCCATTAGCGAAAACAGGCGCATGATACATCTTTCCACAGGCCTTTGCATAAGCAATCTGAGTGGACTCTTCATCCAAAGCCTCCCAGCGAGCTTTT   :??BDB:2C<<A;EEEAECD??+1???;B4?B>B*BA   
SRR769540.1001041.2 147 copy_2  2179495 1   101M    =   2179441 -155    AAGCAATCTGAGTGGACTCTTCATCCAAAGCCTCCCAGCGAGCTTTTACACGCATCTCCTGCACTGCTTCATAAGCTAACTTCTGTACGTGAAAGCGATCG   CCAC>A>>>A>C    
SRR769540.1001043.1 101 copy_2  2179939 0   *   =   2179939 0   TAGCGGTTCTGAAGACCAGCTAAGTTCTGGAACACGCATGACATGGACAGATTTCTGGAACGAATGACCGAAAT  7>=AA=7<=B<4,<<7+233+?>=2?= 
SRR769540.1001043.2 153 copy_2  2179939 1   101M    =   2179939 0   CCAATAAATCCCCGTCAACTCCAGATAAAACACCTAAGAGCTTGCAACTGATCGGATAGCTATCAATATAGTTCTTTTAAAAAAGACGCAAAATCTTGCGT       
SRR769540.1001566.1 89  copy_2  4992489 0   7M1I5M1D88M =   4992489 0   GATTTCTCTTTCTCTTTTTAGTTTTCTTAGTCAAAATATGACTATGAAAAGCGTGCTTTCTTTTGATTTTACCTGTTCCGGTAAGGGTAAACCTTTTTTTA   DCDCDCDDCC?CDDDDCDDCDDDDCCDDDDDDEECEC   
SRR769540.1001566.2 165 copy_2  4992489 0   *   =   4992489 0   GTTCCTTAACCTGGCTTAGATTTGTTGTATCAACAGTTGTAGAGTAGCACAGATTTCTCTTTCTCTTTTTAGTTTTCTTAGTCAAAATATGACTATGAAAA       
SRR769540.1001567.1 89  copy_2  4992500 0   101M    =   4992500 0   CTCTTCTTAGTTTTCTTAGTCAAAATGTGACTGTGAAAAGCGTGCTTTCTTTTGATTTTACCTGTTCCGGTAAGGGTAAACCTTTTTTTAGAACCGGAGTT   A?>CCCCACDCCCCACDDDDCDBCC:CAC   
SRR769540.1001567.2 165 copy_2  4992500 0   *   =   4992500 0   TTTACGCTTTTGATTACTTCATTGCTAAGAGTTCCTTAACCTGGCTTACATTTGTTGTATCAACAGTTGTAGAGTAGCACAGATTTCTTTTTCTCTTCTTA       
SRR769540.1001573.1 69  CP000139.1  4992236 0   *   =   4992236 0   CGCTGCGTGAAGTAGCGGCGCTAACATTCAAAAAAAGTAAAACTATGCCAAGATCAGTAAATCATGTTGCTTCAAAAGCAAGAAGAAAGAAAATTTTGAAA       
SRR769540.1001573.2 137 CP000139.1  4992236 0   84M1D17M    =   4992236 0   AAGTAACCTCTGGTTAATTTCAAAATTTTCTTTCTTCTTGCTTTTGAAGCAACATGATTTACTGATCTTGGCATAGTTTTACTTTTTTTGAATGTTAGCGC   C

As you can see each read ID is only present once. Am I issuing the wrong command in bowtie? that maybe it's only showing me the primary hits? Im very new to this field and am sorry in advance for the novice question.

thanks.

bowtie2 SAM • 2.0k views
ADD COMMENT
3
Entering edit mode
5.6 years ago

The default is to show only the primary hits (it's usually what you want). If you want it to report the secondary alignments then use -k, though you may need to filter groups of reads afterward and you're not guaranteed that the primary hit will be the best possible hit that could be reported (though in practice it's highly likely to be).

ADD COMMENT
0
Entering edit mode

Thank you for your answer. Yeah right after the post I read more about bowtie2 and found that out. I used -a parameter that reports all the alignments that it finds for the reads. So your saying if I leave it to the default settings then bowtie2 will guarantee to return the best possible hit for that read?

ADD REPLY
2
Entering edit mode

Yes, by default it'll return the best possible hit (with a few caveats, having to do with seed mismatches and the -D and -R options).

ADD REPLY
0
Entering edit mode

I see. Thanks for all the input

ADD REPLY
0
Entering edit mode

How would it determine what the 'best' mapping is if both regions are (as in this toy model) exactly the same? My initial guess is that it would simply report the first mapping if there are two equally good mappings. But in the above example, copy 2 is reported. Maybe after indexing, copy 2 is placed ahead of copy 1? Or maybe bowtie2 does not report the first mapping but the last mapping?

ADD REPLY
1
Entering edit mode

In the case of equally high scoring alignments the result is pseudo-random.

ADD REPLY

Login before adding your answer.

Traffic: 2269 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6