Retrieve The Reads From Bam File
2
0
Entering edit mode
11.6 years ago
rehma.ar ▴ 290

Dear all,

i have a bam file created by mapping(paired ends) a sample file with the referece file using stampy. now i want to retrieve those reads that have one end mapped anywhere in the chromosomes and the other end is mapped to mitochondria. i think i can use samtools for that but not quite sure how.

if anyone has some idea about that plz suggest.

samtools • 10k views
ADD COMMENT
2
Entering edit mode
11.6 years ago
Ketil 4.1k
samtools view something.bam | grep mitochondrion | grep -v '\t=\t'

Where your bam file is named "something.bam" and your mitochondrion sequence is named "mitochondrion", and \t is the Tab character.

ADD COMMENT
1
Entering edit mode

faster: add the flag "-F 2 " for samtools and use awk '($7!="=")' instead of grep

ADD REPLY
0
Entering edit mode

i used this command samtools view -F 2 ABGSA0012MS20U10.sorted.bam Ssc9MT | awk '($7!="=")' yes it is faster but the confusion is that above command given by " Ketil " is slow but the resulting file has more lines. i used following command to count lines of both files wc -l < file.txt

and the result is 452 and 76778. the difference is alot.what should i trust.

ADD REPLY
0
Entering edit mode

not sure if using 'Ssc9MT' will select the reads if any of the forward/reverse is mapped to this segment.

ADD REPLY
0
Entering edit mode

Searching for the specific sequence as part of the samtools command will give you one line per read mapped to that sequence. Using grep will give you one line for each read where the read or its mate mapps to the sequence. The former is probably okay for your purposes. And a lot faster.

ADD REPLY
0
Entering edit mode

If stampy writes out XA tags, you might have the string "Ssc9MT" in the line somewhere other than as the read chromosome name or the pair chromosome name. The -F 2 method requires it to be exactly the read chromosome name, so you'd expect it to give approximately half of the results of the grep method.

ADD REPLY
0
Entering edit mode

yeah matted your logic is right except that grep command also shows the lines when MRNM($7) is equal to "=" that i do not need. which is obtained by awk '($7!="=") in the Pierre Lindenbaum's command.

thank you all for your time and support.

ADD REPLY
0
Entering edit mode
11.5 years ago
rehma.ar ▴ 290

there is another problem now

i am using this command

samtools view -bh -F 2 FILE.bam mt > out.bam

to get the discordant reads(one end aligns to mitochondria and the other to chromosome) and also those reads that align to mitochondria only. out put file is a bam file. but when i want get fastq out of this bam file using command

bamToFastq -i out.bam -fq out.1.fq -fq2 out.2.fq

it shows that there are so many reads that are marked as pairs but the but the other pair is not in the bam file. the error looks like this

*WARNING: Query HWI-ST170_235:2:22:12110:179082#0 is marked as paired, but it's mate does not occur next to it in your BAM file. Skipping.

can anyone plz tell what,s going wrong.

ADD COMMENT
0
Entering edit mode

please, ask this as a new question.

ADD REPLY

Login before adding your answer.

Traffic: 2601 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6