Calculate nucleotide frequency for a position in a bam file using Bio::DB Perl module
0
0
Entering edit mode
8.5 years ago

Hi everyone,

I am new to Bioperl so please bear with me.

I have a bam file and a chr-position file that looks like this:

10    1156771
10    37484026
10    78483209
10    82960189

I have to calculate the number of A, T, G, C, N, a, t, g, c (forward strand and reverse strand) for each of the positions in the position file, such that the output looks like:

chr pos      depth A T G C a t g c N
10  1156771  3     1 2 0 0 0 0 0 0 0
10  37484026 5     0 0 3 2 0 0 0 0 0

Now, I know this exact task can be done by:

  1. samtools mpileup -> pileup2indel.pl/Rsamtools or other parsing scripts
  2. bam-readcount (directly takes position file and bam file to generate that output)
  3. other existing tools

And I have done it before too using bam-readcount! But I have been told to do it using Bio::DB::Sam module.

Anyway, the following is my code to generate pileup at a particular position, the only thing I can't figure out now is getting the reference base ($refBase):

#!/usr/bin/perl
use Bio::DB::Sam;
my $sam = Bio::DB::Sam->new(-bam => "sample.bam", -fasta => "hg19.fasta");
my $cb = sub {
        my ($seqid, $pos, $pileups) = @_;
        my $refBase = $sam->segment($seqid,$pos,$pos)->dna;
        # cannot access reference base. $refBase is always empty.
        print "\n$pos\t$refBase=>";
        my @tmp;
        my @strtmp;
        for my $pileup (@$pileups){
                my $al = $pileup->alignment;
                my $strand = $al->strand;
                push(@strtmp,$strand);
                my $qBase = substr($al->qseq, $pileup->qpos, 1);
                # to account for reverse strand
                # if $strand is 1 then $qBase will remain same
                # if $strand is -1 then $qBase will change to small
                if($strand==-1){
                        $qBase = lc $qBase;
                }
                push(@tmp,$qBase);
                #print "$qBase";
        }
        $scalstrand = join("",@strtmp);
        $scal = join("", @tmp);
        print $scal,"\t";
        print $scalstrand,"\t";
        my $len = length($scal);
        my $A = $scal =~ tr/A//;
        my $T = $scal =~ tr/T//;
        my $C = $scal =~ tr/C//;
        my $G = $scal =~ tr/G//;
        my $a = $scal =~ tr/a//;
        my $c = $scal =~ tr/c//;
        my $g = $scal =~ tr/g//;
        my $t = $scal =~ tr/t//;
        # print the depth and counts of A,T,G,C,a,t,g,c
        print "Depth:$len, A:$A, T:$T, C:$C, G:$G, a:$a, t:$t, c:$c, g:$g\n";
        @tmp=();
};
# call pileup for chr1:10000 only
$sam->pileup('1:10000-10000', $cb);

Thanks!

pileup perl bioperl • 3.7k views
ADD COMMENT
0
Entering edit mode
ADD REPLY
0
Entering edit mode

Yes, because they asked me to delete it there as it was not the relevant community. So now it is only on Biostars. Is that ok?

ADD REPLY

Login before adding your answer.

Traffic: 1400 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6