Question

Get all the nucleotides from a bam file that is aligning to specific position in the genome

0

Entering edit mode

5.8 years ago

rsafavi ▴ 60

Hello,

I have an alignment bam file, and I would like to retrieve all the bases aligning to that specific location in the genome. For example, if I am interested in position X in the genome, I would like to extract all the nucleotides that are aligning to that position X from my alignment file

samtools pysam genome position alignment • 5.6k views

ADD COMMENT • link updated 3.7 years ago by Kevin Blighe 87k • written 5.8 years ago by rsafavi ▴ 60

0

Entering edit mode

Hello rsafavi,

why do you like to do this and how should your desired output look like?

The procedure would be to get all reads that overlap the desired position using samtools view. Due to the mapping position given for each read, one can than find the base on the location with respect to the CIGAR value.

fin swimmer

ADD REPLY • link 5.8 years ago by finswimmer 16k

score 2 · Answer 1 · 2018-07-25

2

Entering edit mode

5.8 years ago

Tm ★ 1.1k

If you mean, you want to extract read base which is aligning to particular position of your genome, then you can try samtools mpileup or gatk Pileup. You can explore more about mpileup format here. Samtools mpileup will give you output in form of 5 columns :

Sequence identifier
Position in sequence (starting from 1)
Reference nucleotide at that position
Number of aligned reads covering that position (depth of coverage)
Bases at that position from aligned reads
quality of those bases (OPTIONAL)

From this you can idea about the base present in reference genome and nucleotide of the reads mapping to that particular position.

If you have just handful of positions to check then you can also view your alignment in genome viewer like IGV or Tablet to visualise the bases at that particular position.

ADD COMMENT • link 5.8 years ago by Tm ★ 1.1k

0

Entering edit mode

I am working with Nanopore cDNA data. I tried samtool mpileup before like this:

samtools mpileup intersectedReads.bam -r chr1:13625035-13625035 > intersectedReads.pileup

as a test case, but I got:

chr1 13625035 N 0 * *

chr1 13625036 N 0 * *

which is not correct, since when I look at the genome browser I see two reads aligning to that region

ADD REPLY • link 5.8 years ago by rsafavi ▴ 60

0

Entering edit mode

Its strange. And why 3rd column which represents reference base is "N"?. What is the actual reference nucleotide in that position? I suggest you to give it a try again considering the broad region (~5000k).

ADD REPLY • link 5.8 years ago by Tm ★ 1.1k

score 1 · Answer 2 · 2018-07-26

1

Entering edit mode

5.8 years ago

rsafavi ▴ 60

I figured it out. I had to put -Q 0 ( base quality) for Nanopore data

ADD COMMENT • link 5.8 years ago by rsafavi ▴ 60

score 0 · Answer 3 · 2020-08-22

0

Entering edit mode

3.7 years ago

Kevin Blighe 87k

Posted an answer here: A: Extract Base Based On Position From Bam File

Kevin

ADD COMMENT • link 3.7 years ago by Kevin Blighe 87k