How to get cigar string from bwa or bowtie
1
0
Entering edit mode
5.2 years ago

I have two sequenes and I want to get the CIGAR string of a local alignment. I've tried with bwa and bowtie

The first one give me this, all *. When I change a nucleotide in the query, all it does is change the sequence reported in the output (8th column), but nothing else!

$ ./bwa mem   ../seq1.fa ../seq2.fa 
[M::bwa_idx_load_from_disk] read 0 ALT contigs
@SQ SN:subject  LN:1141
@PG ID:bwa  PN:bwa  VN:0.7.17-r1198-dirty   CL:./bwa mem ../seq1.fa ../seq2.fa
[M::process] read 1 sequences (13 bp)...
[M::mem_process_seqs] Processed 1 reads in 0.000 CPU sec, 0.000 real sec
subject 4   *   0   0   *   *   0   0   AGCAAAGCAAGTT   *   AS:i:0  XS:i:0
[main] Version: 0.7.17-r1198-dirty
[main] CMD: ./bwa mem ../seq1.fa ../seq2.fa
[main] Real time: 0.002 sec; CPU: 0.006 sec

With bowtie, if I change a base in the query it gets me the same CIGAR (12M). Doesn't seems to be logic to me.

$ bowtie  seq1.fa.fai -f  seq2.fa  --sam
@HD VN:1.0  SO:unsorted
@SQ SN:subject  LN:1141
@PG ID:Bowtie   VN:1.2.2    CL:"/Users/juan/Documents/manu/dev/bioinfoch/2/E_TEs/bowtie-1.2.2-macos-x86_64/bowtie-align-s --wrapper basic-0 seq1.fa.fai -f seq2.fa --sam"
subject 0   subject 331 255 12M *   0   0   AGCAAAGCAAGT    IIIIIIIIIIII    XA:i:0  MD:Z:12 NM:i:0  XM:i:2
# reads processed: 1
# reads with at least one reported alignment: 1 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 1 alignments
➜  E_TEs 
$ bowtie  seq1.fa.fai -f  seq2.fa  --sam
@HD VN:1.0  SO:unsorted
@SQ SN:subject  LN:1141
@PG ID:Bowtie   VN:1.2.2    CL:"/Users/juan/Documents/manu/dev/bioinfoch/2/E_TEs/bowtie-1.2.2-macos-x86_64/bowtie-align-s --wrapper basic-0 seq1.fa.fai -f seq2.fa --sam"
subject 0   subject 331 255 12M *   0   0   AGCAAAGTAAGT    IIIIIIIIIIII    XA:i:1  MD:Z:7C4    NM:i:1  XM:i:2
# reads processed: 1
# reads with at least one reported alignment: 1 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 1 alignments
bowtie • 2.9k views
ADD COMMENT
2
Entering edit mode

The sequence is to short to work with bwa mem.

Further more a M in a CIGAR String means alignment match. This can be a match or a mismatch to the reference. The MD tag gives you the information whether it is a match or a mismatch to the reference.

There are also tools that can convert "normal" CIGAR strings to extended ones. See How to convert from a cigar string to extended cigar string?

ADD REPLY
0
Entering edit mode

first read has sam flag = 4 your read is unmapped, no cigar string.

ADD REPLY
0
Entering edit mode

I see, with bwa mem. What about the CIGAR in bowtie?

ADD REPLY
0
Entering edit mode
5.2 years ago

M means that the nucleotide maps to the REF. Maps here means that there are same number of nucleotide in the REF (i.e. No INDELs), but they can be either match or mismatch

Bowtie: Here 12M means the "map" is over 12 nucleotide sequence (i.e. without INDELs), but it can be either match or mismatch.

If you need to see the differences, see the MD tag MD:Z:7C4 (7 Match, one C mismatch and then 4 match)

BWA: BWA is not able to find any match. Note that bwa is looking for full length match of 13bp (vs bowtie of 12bp).

ADD COMMENT
0
Entering edit mode

is there a way to get a string represeting the alignment straightforward like MATCH-MATCH-MISMATCH

ADD REPLY
0
Entering edit mode

Parse MD tag. Matches will appear as numbers, mismatches as [ATCG]

ADD REPLY

Login before adding your answer.

Traffic: 3195 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6