Extract specific fragments from SAM and count them
2
1
Entering edit mode
7.0 years ago
Jeason Rad ▴ 30

Hi, friends. I have a .sam file & .vcf SNPs file. I want to extract the sequnece fragments which overlaping given heterozygous SNPs sites. Then I want to classify the fragments by the different base on SNPs site and count the number N0 & NX. Can anybody tell me how to code the shell script or which software can do that.

SNP SAM • 3.0k views
ADD COMMENT
0
Entering edit mode

enter image description here

ADD REPLY
1
Entering edit mode
7.0 years ago
guillaume.rbt ★ 1.0k

If you want to extract reads at a given position you can use samtools (ex chromosome 5 position 513931) :

samtools view -b file.bam chr_5:513931-513932 > reads_position.bam

with a list of positions extracted from your vcf, you could loop over it to extract each position :

for i in $(cat list_positions_SNP); do samtools view -b file.bam ${i} > reads_${i}.bam;done
ADD COMMENT
0
Entering edit mode

Thx, friend. But I puzzle more about how to number No and Nx basing on the different base at SNVs site. I had uploaded a sketch map about it. What I want is not the total fragments on given region, but the number of No & Nx.

ADD REPLY
1
Entering edit mode

maybe this information is already in your vcf file, you should look for the field "AD" (allelic depth), which should report the coverage of both alt and ref alleles

ADD REPLY
1
Entering edit mode

yes, AD is what I want. But my ultimate aim is to classify alt alleles fragments by their start or end sequencing sites (so as ref). enter image description here I mean I need to keep classifying and number the fragments by their positions (if fragment start or end in binding site). In the end, figure out No, Nx, no, nx; Any suggestions will be appreciated.

ADD REPLY
0
Entering edit mode

I use this command-line getting each fragment's pos/start/end/sequence in BAM.

bedtools bamtobed -i reads.bam -bedpe | awk -v OFS="\t" '{if($9=="+"){print $1,$2+4,$6+4,$10}else if($9=="-"){print $1,$2-5,$6-5,$10}}' > fragments.bed
ADD REPLY
0
Entering edit mode

Ok I understand. Indeed it's not that trivial. You have to parse the CIGAR string to know if your read is alt or ref at this position.

This topic has already been discussed here : Extracting Reads Containing A Specific Variant From A Bam File

It seems that the person that have posted the question have written a script that does the trick, maybe you can ask him.

ADD REPLY
0
Entering edit mode
7.0 years ago
BioinfGuru ★ 1.7k

Convert to position sorted bed files and use bedtools intersect

ADD COMMENT
0
Entering edit mode

But if I turn SAM to BED, I don't know what the base on SNPs site. I had uploaded a sketch map about it.

ADD REPLY
0
Entering edit mode

could you re-write "I don't know what the base on SNPs site" please? It makes no sense.

ADD REPLY

Login before adding your answer.

Traffic: 2013 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6