Phred score check for BAM files
0
2
Entering edit mode
7.0 years ago
Shicheng Guo ★ 9.4k

Hi All,

I finish a perl script to check the phred score for bam files. I just want to make sure there is no bugs in it.

Thanks.

Usage: perl bam2phred.pl input.bam

#!/usr/bin/perl
use strict;

if ($ARGV[0] =~ /^-[h?]/) {
    print "Usage: determine-phred FILE
Reads a bam, possibly gzipped and returns the phred-scale, 
  either 64 or 33, based on a quick scan of the data in the file.
";
    exit 0;
}
my $cnt;
my $dphred = 64;
my $input=$ARGV[0];
if ($ARGV[0] =~ /\.bam$/) {
    $ARGV[0] = "samtools view '$ARGV[0]'|";
}
my $qual;
my $comm;
my $fmt;
if (@ARGV > 1) {
    my @mult = @ARGV;
    for my $f (@mult) {
        @ARGV = ($f);
        determine();
        print "$f\t$dphred\n";
    }
} else {
    determine();
    print "$input: $dphred\n";
}

sub determine {
    $_ = <>;
    # bam
    $fmt = 'bam';
    $qual = (split(/\s+/, $_))[10];
    if (!$qual) {
    die "$input: Unknown file format\n";
    }

    my $ssiz=7000;          # sample size
    while($qual) {
        for (my $i =length($qual)/2; $i < length($qual); ++$i) {
            if (ord(substr($qual,$i,1)) < 64) {
                $dphred = 33;
                $cnt=$ssiz; # last
                last;
            }
        }
        $qual = '';
        last if ++$cnt >= $ssiz;    # got enough
        if ($fmt eq 'fq') {
            # fastq
            scalar <>;      # id
            scalar <>;      # read
            scalar <>;      # comment
            $qual = <>;
            chomp $qual;
        } else {
            # sam
            $qual = (split(/\t/, $_))[10];
        }
    }
}
phred score 33 64 bam • 3.3k views
ADD COMMENT
0
Entering edit mode

I am wondering, isn't this step (know the scoring method) necessary before aligning the reads/alignment ? Since some aligners expect us to provide this information.

ADD REPLY
1
Entering edit mode

Yes, the score needs to be known prior to alignment. Also, the sam format specifies that all quality scores be in ASCII-33.

ADD REPLY
0
Entering edit mode

Hi Shicheng Guo, what is the main purpose of this code ? In what kind of situation, this can be used. I assume this can be only used if you want to realign BAM (alignment) file.

ADD REPLY
0
Entering edit mode

I hope to achieve the phred score from bam files which download from public database with this script.

ADD REPLY
0
Entering edit mode

But what is the use of getting the phred score information after alignment has been done (because BAM/SAM will already have phred converted to +33 base quality). Please correct me if I am wrong.

Please check this SAM specification.

ADD REPLY
0
Entering edit mode

The results are not consistent with determine-phred. I'm not sure which one is correct. if there is '?' or numbers, it should be Q33. With this case, determine-phred is right. It seems that the determine-phred accepts a revised sam format ( deleting the lines start with @ ). A phred score reference is here and here. Thank you.

The code is used on condition that you have gotten bam or sam files after mapping. Just change Q63 (if your fastq is) to Q33 for the bam files. @EagleEye .

Update:

  1. ~/local/bam2phred.pl test.bam will output test.bam: 64

  2. samtools view test.bam | gawk 'BEGIN{FS="\t"}!/^@/{split($11, aa, ""); for (i in aa) data[aa[i]]}END{ asorti(data); for (j in data) printf data[j]; print ""}' will output #&'()*+,-./0123456789:;<=>?@ABCDEFGHIJ, which is Q33.

I cannot read perl so far, it seems there should be bugs in your script or the similar one determine-phred.

ADD REPLY

Login before adding your answer.

Traffic: 1679 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6