Biostar Beta. Not for public use.
Question: How to remove singletons in a BAM file using samtools
0
Entering edit mode

Dear All,

I want to remove the singletons from the aligned bam file.

Downloaded fastq files using

fastq-dump --split-files SRR1517848

Then aligned the paired end fastq files, using BWA by:

bwa mem hg19.fa R_1.fa R_2.fa -o SRR1517848.sam

Then converted it to BAM format and I used samtools to remove the singletons using : according to the paper, article

samtools view -@ 8 -F 0x04 -b SRR1517848.sam > SRR1517848.bam

Then I looked into the stats using samtools flagstat command by;

samtools flagstat SRR1517848.bam

which gave;

   Before filtering 
    4614120 + 0 in total (QC-passed reads + QC-failed reads)
    4549753 + 0 mapped (98.61% : N/A)
    4609656 + 0 paired in sequencing
    2304828 + 0 read1
    2304828 + 0 read2
    4461356 + 0 properly paired (96.78% : N/A)
    4517788 + 0 with itself and mate mapped
    27501 + 0 singletons (0.60% : N/A)
    39296 + 0 with mate mapped to a different chr
    33576 + 0 with mate mapped to a different chr (mapQ>=5)

And after filtering

4549753 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
4464 + 0 supplementary
0 + 0 duplicates
4549753 + 0 mapped (100.00% : N/A)
4545289 + 0 paired in sequencing
2281318 + 0 read1
2263971 + 0 read2
4461356 + 0 properly paired (98.15% : N/A)
4517788 + 0 with itself and mate mapped
27501 + 0 singletons (0.61% : N/A)
39296 + 0 with mate mapped to a different chr
33576 + 0 with mate mapped to a different chr (mapQ>=5)

So my Questions:

1) So the command to remove the singletons by -F 0x04 have only removed unmapped reads ?

2) In both before and after stats the singletons remain, and what are those?

Before: 27501 + 0 singletons (0.60% : N/A)

After : 27501 + 0 singletons (0.61% : N/A)

3) Is there a way to remove singletons in variant detection GATK pipeline ? what benefits does it have?

Thanks

ADD COMMENTlink 14 months ago neranjan • 20 • updated 14 months ago finswimmer 11k
3
Entering edit mode

As per the SAM flag guide the flag that you are filtering for i.e. 0x04 is for unmapped reads. This explains the removal of unmapped reads.

In case you want to filter for singletons i.e. reads for which the mate is unmapped you an select that as the condition. This gives a SAM flag value of 0x08. You can hence change your command to

samtools view -@ 8 -F 0x08 -b SRR1517848.sam > SRR1517848.bam

This should remove the singleton reads.

ADD COMMENTlink 14 months ago rizoic • 190
Entering edit mode
2

I'd also add the header to the bam file (by adding the -h parameter), to avoid problems in downstream analysis.

ADD REPLYlink 14 months ago
michael.ante
♦ 3.3k
Entering edit mode
0

Thanks when I used the above command it gave singletons removed reads:

4549662 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
4373 + 0 supplementary
0 + 0 duplicates
4522161 + 0 mapped (99.40% : N/A)
4545289 + 0 paired in sequencing
2263971 + 0 read1
2281318 + 0 read2
4461356 + 0 properly paired (98.15% : N/A)
4517788 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
39296 + 0 with mate mapped to a different chr
33576 + 0 with mate mapped to a different chr (mapQ>=5)

As for SAM flag guide when I use 0x08 I am removing mate unmapped (0x8) reads, and with a warning: Flag(s) and 0x8 cannot be set when read is not paired

eg: R1 and R2 are mate pairs, where:
R1 = mapped (mate is unmapped)
R2 = unmapped .
So with this I am removing the mate unmapped reads only ?
i.e so I am removing R2 reads and keeping the R1 ?

2) My Second question is How important is to remove the singletons in GATK variant detection pipe line? Is this a step that I need to be doing? or can I avoid this?

ADD REPLYlink 14 months ago
neranjan
• 20
Entering edit mode
0

Thanks when I used the above command it gave singletons removed reads:

4549662 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
4373 + 0 supplementary
0 + 0 duplicates
4522161 + 0 mapped (99.40% : N/A)
4545289 + 0 paired in sequencing
2263971 + 0 read1
2281318 + 0 read2
4461356 + 0 properly paired (98.15% : N/A)
4517788 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
39296 + 0 with mate mapped to a different chr
33576 + 0 with mate mapped to a different chr (mapQ>=5)

As for SAM flag guide when I use 0x08 I am removing mate unmapped (0x8) reads, and with a warning: Flag(s) and 0x8 cannot be set when read is not paired

eg: R1 and R2 are mate pairs, where:
R1 = mapped (mate is unmapped)
R2 = unmapped .
So with this I am removing the mate unmapped reads only ?
i.e so I am removing R2 reads and keeping the R1 ?

2) My Second question is How important is to remove the singletons in GATK variant detection pipe line? Is this a step that I need to be doing? or can I avoid this?

ADD REPLYlink 14 months ago
neranjan
• 20
Entering edit mode
1

In your e.g. R1 is the singleton read since its mate is unmapped and thus R1 read will be removed when you filter for the flag 0x08.

ADD REPLYlink 14 months ago
rizoic
• 190
Entering edit mode
0

Thanks for the clarification. So why is it important to remove the singleton ?
(So my thinking is::: since its mate can not be mapped to the chromosome , the mapped read can not be accurately determined. )

ADD REPLYlink 14 months ago
neranjan
• 20

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.0