Illegal Division by zero
1
0
Entering edit mode
5.6 years ago

I am trying to work on TEclass from https://academic.oup.com/bioinformatics/article/25/10/1329/269651

When I'm using the TEclassBuild.pl I get an Illegal division by zero error at line 179 message.

Looking at the subroutine that handles that;

#-------------------------------------------------------------------------------
# Purpose   : Builds a feature-vector for each sequence. The features are  
#             oligomer frequencies.
# Usage     : process_sequence($string, $number, *filehandle) 
# Arguments : sequence (string), prefix (number), output (filehandle) 
# Returns   : Nothing, prints.
# Globals   : Uses the global variables of the build_tetramer_hash (or pentamer) 
#             subroutines
#*******************
sub process_sequence {
    my $sequence = uc($_[0]);
    my $prefix = $_[1]; # The the SVM "classifier"
    my $OUT = $_[2];    # Filehandle

    # The vector of SVM features. Initially every element is zero
    my @features; 
    foreach my $x (0..$oligos) {
        $features[$x] = 0;
    }
    my $iterations = length($sequence)-$mer-1;
    # Counts the number of occurences of each x-mer in the sequence
    foreach my $x (0..$iterations) {
        if ( exists( $x_mers{substr($sequence, $x, $mer)} ) ) {
            $features[$x_mers{substr($sequence, $x, $mer)}]++;
            $features[$oligos+1]++;
        }
    }
    # Calculates and prints the frequencies for each oligomer
    print $OUT $prefix, "\t";
    foreach my $x (0..$oligos) {
        $features[$x] /= $features[$oligos+1]; 
        print $OUT "", ($x+1), ':', $features[$x], "\t";
    }
    print $OUT "\n";    
}

Line 179 is

    176: # Calculates and prints the frequencies for each oligomer
    177: print $OUT $prefix, "\t";
    178: foreach my $x (0..$oligos) {
    179:    $features[$x] /= $features[$oligos+1]; 
    180:    print $OUT "", ($x+1), ':', $features[$x], "\t";
    181:}

Line 179 shows that they add a +1 to the denominator. Wouldn't that mean that I am not dividing the $features[$x] with 0? Any suggestions? I ran it a couple of times with different parameters but still I receive the same error message.

PERL Transposable Elements Repeats • 4.1k views
ADD COMMENT
1
Entering edit mode

They add +1 to the index of features.

It seems the if statement some 8 lines back is never true.

ADD REPLY
0
Entering edit mode

Thanks for the suggestion. I tried the suggestion below first but if that doesn't work, I'll take a look further into this. Thanks!

ADD REPLY
1
Entering edit mode

Debug the values of the array at different stages of the script. Check that the array is initialized to non-zero values, or that a pseudocount is applied correctly.

ADD REPLY
0
Entering edit mode

Still not sure what's happening. I tried changing

$features[$x] /= $features[$oligos+1];

to

$features[$x] /= $features[$oligos]+1;

Thanks for the tip. The job is now running a bit further. Hopefully this works properly. Thanks

ADD REPLY
1
Entering edit mode

That's not a good idea.

$features[$oligo+1]

does have a special meaning (summing all occurances). You should find out why it is zero, not replace it with something else.

ADD REPLY
0
Entering edit mode
$features[$oligo+1]

I tried redoing this and ensuring that @features does not contain any zero values. It still gives me an illegal division by zero error message.

ADD REPLY
3
Entering edit mode
5.6 years ago
Michael 54k

You are trying to keep the total sum of all occurrences in $features[$oligos+1] (accessing the n+1th element ) with $oligos being some integer, a global variable from out of nowhere.

Note: this part of the array @features has never been initialized as 0. It works anyway in perl, because ($undef_variable)++ == 1 plus a warning (if warnings are on). Note2: The code contains some important variables defined in global scope, therefore one needs to guess what they are meant to do.

$features[$x] /= $features[$oligos+1]; 
<=> $features[$x] = $features[$x] / $features[$oligos+1];

Now, the only way to get a division by zero is that $features[$oligos+1] == 0, which again means if ( exists( $x_mers{substr($sequence, $x, $mer)} ) ) was never True.

If this part of the code is supposed to find your kmers in 'their' kmers $x_mers and further assuming that this is the correct way of pattern matching, that would simply mean that your $sequence does not contain any of the n_mers in $x_mers and the code does not protect against this case.

#-------------------------------------------------------------------------------
# Purpose   : Builds a feature-vector for each sequence. The features are  
#             oligomer frequencies.
# Usage     : process_sequence($string, $number, *filehandle)

# Look at the required arguments, I guess you should have a $number instead


# Arguments : sequence (string), prefix (number), output (filehandle) Not sure what is meant by prefix
# Returns   : Nothing, prints.
# Globals   : Uses the global variables of the build_tetramer_hash (or pentamer) 
#             subroutines
## global %x_mers (a catalogue of some or all k-mers, with the rank of the k-mers among all k-mers?),
## $oligos (possibly the total number of all kmers in %x_mers, not needed), $mer, this is actually a rather bad idea (possibly x_mers is a huge hash)  

#*******************
sub process_sequence {
    my $sequence = uc($_[0]);
    my $prefix = shift; # The the SVM "classifier"
    my $OUT = shift;    # Filehandle
    my $mer = shift; # mer should be a parameter of the fuction
    ## 
    my $total = 0; # total number of found kmers for normalization
    my $n_oligos = scalar(keys %x_mers) ;
    # The vector of SVM features. Initially every element is zero

    my @features = (0) x $n_oligos; # does the same as the foreach

    my $iterations = length($sequence)-$mer-1;

    # just another example of non-defensive programming: what guarantees that 
    # length ($sequence) > $mer-1, what happens if iterations is negative?!?

    die "input sequence must be longer than $mer+1" unless $iterations > 0; # you could argue that the function should simply return silently in this case

    # Counts the number of occurences of each x-mer in the sequence
    # this is probably fine if the sequence is short in comparison to the
    # kmer catalogue, however there should be more efficent ways to count
    # k-mers in perl in general
    foreach my $x (0..$iterations) {
        if ( exists( $x_mers{substr($sequence, $x, $mer)} ) ) {
            $features[$x_mers{substr($sequence, $x, $mer)}]++;
           $total++;
        }
    }
    # Calculates and prints the frequencies for each oligomer
    warn "None of the $mer mers in the input sequence was found in the kmer catalogue, printing all NA frequencies" 
       if ($total < 1);

    print $OUT $prefix, "\t";
    foreach my $x (0..$n_oligos-1) { # note the -1 .. because scalar gives the number entries
        print $OUT "", ($x+1), ':', ($total)? ($features[$x]/$total):'NA', "\t"; # protects against division by zero,  possibly this output should not be used, or print 'NA' or 'inf'
    }
    print $OUT "\n";    
}
ADD COMMENT
0
Entering edit mode

Hi, thanks for the help. I'll try to apply this next. Apologies if I am not good at this. I am trying to pick up this tool and trying it on my data set and the readme's suggested dataset. Both were not working so I'm trying to check if there is a fix for it. I'll try to apply this changes as soon as I can and try to understand it. Again thank you very much. Cheers

ADD REPLY
1
Entering edit mode

Ok, so this is not your code? In this case (aside from flaming the author), just to get a minimal quick fix, just replace line 179 in the original script:

- $features[$x] /= $features[$oligos+1];

+ $features[$x] = ($features[$oligos+1] > 0)? $features[$x] / $features[$oligos+1] : 'NA'; # or 0

But you still should check why your sequence contains no known k-mers.

ADD REPLY
0
Entering edit mode

Hi thanks to your help and the author's help (He got back to me! :D). You guys helped me what was going on. My stuff had masked sequences and that's what's causing the issue. I didn't get it at first but looking at what you said and what he said helped me dig further into the code.

ADD REPLY
0
Entering edit mode

Could you tell me the author's help?I have the same problem.Thanks

ADD REPLY

Login before adding your answer.

Traffic: 2592 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6