Biostar Beta. Not for public use.
How To Filter Mapped Reads With Samtools
49
Entering edit mode
6.2 years ago
sohadb1357 • 500

Hi

How do I filter a bam file with some tools (Specifically -how I can remain with the unmapped reads only?). I have single-end mapping,

I searched for hours but everywhere I see the suggestion of samtools view -u -f 4 that (as I understood) doing the oposite thing - filtering out the unmapped reads.

Thanks!' Ohad

samtools • 139k views
ADD COMMENTlink
113
Entering edit mode
15 months ago
Netherlands

Hi, You get a bam(machine readable sam) file after mapping, and it contains information about mapped and unmapped reads.

To get the unmapped reads from a bam file use :

samtools view -f 4 file.bam > unmapped.sam, the output will be in sam

to get the output in bam use : samtools view -b -f 4 file.bam > unmapped.bam

To get only the mapped reads use the parameter 'F', which works like -v of grep and skips the alignments for a specific flag.

samtools view -b -F 4 file.bam > mapped.bam

From the manual; there are different int codes you can use with the parameter 'f', based on what you want :

-f INT Only output alignments with all bits in INT present in the FLAG field. INT can be in hex in the format of /^0x[0-9A-F]+/ [0]

Each bit in the FLAG field is defined as:

Flag        Chr     Description
0x0001      p       the read is paired in sequencing
0x0002      P       the read is mapped in a proper pair
0x0004      u       the query sequence itself is unmapped
0x0008      U       the mate is unmapped
0x0010      r       strand of the query (1 for reverse)
0x0020      R       strand of the mate
0x0040      1       the read is the first read in a pair
0x0080      2       the read is the second read in a pair
0x0100      s       the alignment is not primary
0x0200      f       the read fails platform/vendor quality checks
0x0400      d       the read is either a PCR or an optical duplicate

Like for getting the unique reads (a single read mapping at one best position); I use :

-q INT Skip alignments with MAPQ smaller than INT [0]

samtools view -bq 1 file.bam > unique.bam

HTH

ADD COMMENTlink
4
Entering edit mode
samtools view -b -F 4 file.bam > mapped.bam

Does it really get all mapped reads because using the above gives me less reads than:

samtools view -b -F 4 -f 8 file.bam > onlyThisEndMapped.bam
samtools view -b -F 8 -f 4 file.bam > onlyThatEndMapped.bam
samtools view -b -F12 file.bam > bothEndsMapped.bam
samtools merge merged.bam onlyThisEndMapped.bam onlyThatEndMapped.bam bothEndsMapped.bam
ADD REPLYlink
3
Entering edit mode

From my understanding your first command extracts all mapped reads, but does not extract mates that were unmapped. The command in your second set:

samtools view -b -F 8 -f 4 file.bam > onlyThatEndMapped.bam

Extracts UNMAPPED reads who's mate is mapped. Thus, you will have more reads with the second set of commands because you are including unmapped mates of mapped reads.

ADD REPLYlink
1
Entering edit mode

To make sure I have this correct...

The -f options flags to keep the reads 'Only output alignments with all bits set in INT present in the FLAG field' The -F option flags to remove the reads 'Only output alignments with all bits set in INT present in the FLAG field'

using this page I explored the int meanings but I still a bit confused.

In your first command:

samtools view -b -F 4 -f 8 file.bam > onlyThisEndMapped.bam

The reads that are unmapped are removed (-F 4) and the mates that are unmapped are kept (-f 8). Does this mean it extracts the mapped reads and mates that are unmapped (I assume mate means the other read for paired-end reads)?

My first post so go easy on me :)

ADD REPLYlink
1
Entering edit mode

i agree with 5heikki's comment for filtering mapped PE reads if mapped means either both read and mate mapped or either mapped. From what I tested with pysam, this is their default definition of mapped in the fetch() function.

Edit: I changed my mind and realized that the command -f 4 -F 8 doesn't really generate useful alignments ( alignment record about the read itself and read itself being unmapped while its mate being mapped) so it's best to just use -F 4 to get mapped alignments for PE from sam/bam files. If you want to have only read1 mapped, use flag -f 64 -F 4, for read2, use flag -f 128 -F 4.

ADD REPLYlink
0
Entering edit mode

does the below apply to single end sequencing data? samtools view -b -F 4 -f 8 file.bam > onlyThisEndMapped.bam samtools view -b -F 8 -f 4 file.bam > onlyThatEndMapped.bam samtools view -b -F12 file.bam > bothEndsMapped.bam samtools merge merged.bam onlyThisEndMapped.bam onlyThatEndMapped.bam bothEndsMapped.bam

ADD REPLYlink
0
Entering edit mode

After mapping via BWA, you will get a sam file. So do you mean, we need to convert the sam file to bam file and then using samtools view -b -F 4 file.bam > mapped.bam ?

ADD REPLYlink
1
Entering edit mode

You don't need to convert sam to bam file, its upto you as its just compressed version of sam, so saves some space.
Use this with sam,
samtools view -F 4 file.bam > mapped.bam

ADD REPLYlink
0
Entering edit mode

hey dude,

i am really thankful for your useful answer which helped me a lot

ADD REPLYlink
0
Entering edit mode

Hi,

Just a quick question, why does an alignment with a MAPQ score smaller than 1 mean a uniquely mapping read?

For example, I have the following read which is not unique/uniquely mapping (has the XS:i flag). But this has a MAPQ of 6 therefore would not be filtered out with this criteria? Any ideas?

HISEQ2500-09:128:H9FFTADXX:2:   163    scaffold_7359    909    6    100M    =    1189    380    TGATTATAAGGTTTTACATCACACATCTGAAAATTGATTGGGTACTTTCTTAAAAATTACATCATGATTATCATTTTTTTATTGTATGCAATTTTAATAA    @@@DDB?DDHH?DEBHGE??@FGHIIGGHEGDEECHEFGHII?BFBDFEHGHDCBAFGHIGIHIIGAEHCHIIIIIIIFEDCCCC>@DCACCCCCDCCAC    AS:i:-6    XS:i:-5    XN:i:0    XM:i:1    XO:i:0    XG:i:0    NM:i:1    MD:Z:71A28    YS:i:-16    YT:Z:CP
ADD REPLYlink
0
Entering edit mode

sorry, can I use samtools view -b -F 4 file.bam > mapped.bam for accepted_hits.bam produced by cufflinks for paired-end reads????

ADD REPLYlink
2
Entering edit mode

samtools view -b -f 2 file.bam > mappedPairs.bam

ADD REPLYlink
0
Entering edit mode

thank you then by this command i can filter my accepted_hits.bam from tophat

samtools view -b -f 2 accepted_hits.bam > mappedPairs.bam

ADD REPLYlink
3
Entering edit mode
ADD REPLYlink
0
Entering edit mode

thank you so much

ADD REPLYlink
0
Entering edit mode

I was reviewing this question and its answers. I tried your last suggestion in order to get the unique reads (a single read mapping at one best position) in my sam file. But I didn't get precisely the Unique mapped reads. I got things like this, instead :

SRR4293695.122264806    272 chr1    14671   1   29M *   0   0   GGGGGGAAAGGTGTCATGGAGCCCCCTAC   .F)<F7F)FFFFF)<)AFF7FFFF<AAAA   NH:i:3  HI:i:2  AS:i:26 nM:i:1
SRR4293695.122264806    272 chr9    14782   1   29M *   0   0   GGGGGGAAAGGTGTCATGGAGCCCCCTAC   .F)<F7F)FFFFF)<)AFF7FFFF<AAAA   NH:i:3  HI:i:3  AS:i:26 nM:i:1
SRR4293695.122264806    16  chr16   14356   1   29M *   0   0   GGGGGGAAAGGTGTCATGGAGCCCCCTAC   .F)<F7F)FFFFF)<)AFF7FFFF<AAAA   NH:i:3  HI:i:1  AS:i:26 nM:i:1

I am not sure if I have made something wrong. So probably, I should see instead the NH number as is suggested in : Efficiently extract alignments with NH:i:1 field.

ADD REPLYlink
10
Entering edit mode
19 months ago
rgiannico • 100
Italy

I had the same issue but with Paired End Reads, and I solved using samtools and bamToFastq. You can find bamToFastq here: https://code.google.com/p/hydra-sv/ 

  • If you need unmappedR1.fastq (containing both paired and unpaired R1 unmapped reads) and unmappedR2.fastq ( containing both paired and unpaired R2 unmapped reads)

Use samtools -f 4 to extract all unmapped reads:

samtools view -b -f 4 file.bam > file_unmapped.bam
bamToFastq -bam file_unmapped.bam -fq1 unmappedR1.fastq -fq2 unmappedR2.fastq
  • If you need unmappedpairedR1.fastq (containing only paired R1 unmapped reads) and unmappedpairedR2.fastq (containing only paired R2 unmapped reads). Meaning you need all paired reads where at least one of them is unmapped.

Use samtools -F 2 to discard only reads mapped in proper pair:

samtools view -b -F 2 file.bam > file_unmapped.bam
bamToFastq -bam file_unmapped.bam -fq1 unmappedpairedR1.fastq -fq2 unmappedpairedR2.fastq
ADD COMMENTlink
0
Entering edit mode

I tried, but I couldn't install the hydra package, because I can't sudo. Is there a way to install it without root access, as other programs allow? (my server don't allow root access outside of the university)

Thanx!

ADD REPLYlink
0
Entering edit mode

Better to ask Aaron Quinlan, the developer. Try https://groups.google.com/forum/#!forum/bedtools-discuss

ADD REPLYlink
0
Entering edit mode

but with -f 4 only the mates are extracted that did not map, leaving out the part that did map ( -f 8). Or am I wrong? By this the information about pairing is lost, which is the valuable thing about paired-end datasets. There are cases, where one mate maps to one reference, but both do map to another.

ADD REPLYlink
6
Entering edit mode
13 months ago
Paul ♦ 1.3k
European Union

Very good is from Broad Institute their web app for check all bitwise flags:

https://broadinstitute.github.io/picard/explain-flags.html

ADD COMMENTlink
4
Entering edit mode
17 months ago
Lee Katz ♦ 2.9k
Atlanta, GA

bam2fastq is really helpful with this kind of question. Really, really helpful.

ADD COMMENTlink
2
Entering edit mode
13 months ago
WashingtonDC

PROCESSORS=10
Single_End_Layout:
samtools view --threads $PROCESSORS -b -F 4 in.bam > mapped.bam
samtools view --threads $PROCESSORS -b -f 4 in.bam > unmapped.bam
Paired_End_Layout
samtools view --threads $PROCESSORS -b -f 2 in.bam > mapped.bam
samtools view --threads $PROCESSORS -b -F 2 in.bam > unmapped.bam

ADD COMMENTlink
0
Entering edit mode
12 months ago
ibseq12 • 0
London

does the below apply to single end sequencing data?

samtools view -b -F 4 -f 8 file.bam > onlyThisEndMapped.bam samtools view -b -F 8 -f 4 file.bam > onlyThatEndMapped.bam samtools view -b -F12 file.bam > bothEndsMapped.bam samtools merge merged.bam onlyThisEndMapped.bam onlyThatEndMapped.bam bothEndsMapped.bam

ADD COMMENTlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.1