How to extract sequence message from SAM?
1
0
Entering edit mode
7.0 years ago
Jeason Rad ▴ 30

Hi, all. I have a PE-alignment SAM file, how can I extract the pos/start/end/strand/sequence from each flagment? I used this command-line to extract pos/start/end/strand,

bedtools bamtobed -i reads.bam -bedpe | awk -v OFS="\t" '{if($9=="+"){print $1,$2,$6}else if($9=="-"){print $1,$2,$6}}' > fragments.bed

but how to put the sequence message into it?

SAM BED sequencing • 1.4k views
ADD COMMENT
0
Entering edit mode
cut -f 10 file.sam

can be used to extract sequence, but not suit for pair-end sequence.

chr16   85790994        85791072        +
chr8    17447157        17447198        -
chr1    39340268        39340412        +
chr2    227662155       227662242       +
chr12   93416078        93416128        +

This is the bed file from bamtobed. How can I extract the sequence between col2 & col3 from my BAM file? Any suggestions will be grateful!

ADD REPLY
0
Entering edit mode
7.0 years ago

pipe your output of bamtobed into a loop with samtools faidx:

 echo -e "ref\t6\t22\t+\nref\t8\t18\t+" |  | while IFS=$'\t' read -a B; do echo -ne "${B[0]}\t${B[1]}\t${B[2]}\t${B[3]}\t" &&  samtools faidx reference.fa "${B[0]}:${B[1]}-${B[2]}" | grep -v ">" | tr -d "\n" && echo ; done

note: for minus strand, I haven't rev-complemented the sequence. Furthermore , I'm lazy , as the input is BED, there will be +1 shift for the first base of the DNA.

ADD COMMENT

Login before adding your answer.

Traffic: 2133 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6