Biostar Beta. Not for public use.
Get all the nucleotides from a bam file that is aligning to specific position in the genome
0
Entering edit mode
15 months ago
rsafavi • 40

Hello,

I have an alignment bam file, and I would like to retrieve all the bases aligning to that specific location in the genome. For example, if I am interested in position X in the genome, I would like to extract all the nucleotides that are aligning to that position X from my alignment file

ADD COMMENTlink
0
Entering edit mode

Hello rsafavi,

why do you like to do this and how should your desired output look like?

The procedure would be to get all reads that overlap the desired position using samtools view. Due to the mapping position given for each read, one can than find the base on the location with respect to the CIGAR value.

fin swimmer

ADD REPLYlink
2
Entering edit mode
17 months ago
toralmanvar • 760

If you mean, you want to extract read base which is aligning to particular position of your genome, then you can try samtools mpileup or gatk Pileup. You can explore more about mpileup format here. Samtools mpileup will give you output in form of 5 columns :

  • Sequence identifier
  • Position in sequence (starting from 1)
  • Reference nucleotide at that position
  • Number of aligned reads covering that position (depth of coverage)
  • Bases at that position from aligned reads
  • quality of those bases (OPTIONAL)

From this you can idea about the base present in reference genome and nucleotide of the reads mapping to that particular position.

If you have just handful of positions to check then you can also view your alignment in genome viewer like IGV or Tablet to visualise the bases at that particular position.

ADD COMMENTlink
0
Entering edit mode

I am working with Nanopore cDNA data. I tried samtool mpileup before like this:

samtools mpileup intersectedReads.bam -r chr1:13625035-13625035 > intersectedReads.pileup

as a test case, but I got:

chr1 13625035 N 0 * *

chr1 13625036 N 0 * *

which is not correct, since when I look at the genome browser I see two reads aligning to that region

ADD REPLYlink
0
Entering edit mode

Its strange. And why 3rd column which represents reference base is "N"?. What is the actual reference nucleotide in that position? I suggest you to give it a try again considering the broad region (~5000k).

ADD REPLYlink
1
Entering edit mode
15 months ago
rsafavi • 40

I figured it out. I had to put -Q 0 ( base quality) for Nanopore data

ADD COMMENTlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.1