How to calculate standard deviation of coverage for each scaffold in a pileup file?
0
0
Entering edit mode
6.1 years ago
PollardMD • 0

A previous member of my research group wrote a perl script that printed scaffold length and mean coverage for each scaffold within a .pileup file created by the samtools mpileup function. I would like to add to this script so that the output file will also have a column for the standard deviation of coverage for each scaffold, but unfortunately I have no experience with perl. How would I go about doing this? Here is the current version of the script, for reference:

my $file = "example.pileup";
my $outfile = "example.threecolumn.txt";


my $ds;


open FPIN, "<".$file or die "Cannot open $file";

while (<FPIN>) {

    die unless (/^([^\t]+)\t[^\t]+\t[^\t]+\t([^\t]+)\t/);

    my ($scaf, $num) = ($1, $2);

    if ($ds->{$scaf}) {
        $ds->{$scaf}->{"count"} = $ds->{$scaf}->{"count"} + 1;
    } else {
        $ds->{$scaf}->{"count"} = 1;
        $ds->{$scaf}->{"sum"} = 0;
    }

    $ds->{$scaf}->{"sum"} = ($ds->{$scaf}->{"sum"} + $num);
}

close FPIN;

open FPOUT, ">".$outfile or die;

foreach my $scaf (keys %$ds) {

    print FPOUT $scaf."\t".($ds->{$scaf}->{"sum"} / $ds->{$scaf}->{"count"})."\t".$ds->{$scaf}->{"count"}."\n";

}

close FPOUT;

Many thanks in advance for any suggestions.

coverage pileup perl sequencing • 1.2k views
ADD COMMENT

Login before adding your answer.

Traffic: 2287 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6