Tajima-Nei Distance estimate with BioPerl
3.1 years ago

Hi

I have been trying to estimate the Tajima-Nei distance for my data (the files are linked below).

I'm following the BioPerl documentation for Bio::Align::DNAStatistics: https://metacpan.org/pod/Bio::Align::DNAStatistics

I have 314 aligned sequences in a FASTA file and another file with the list of IDs. FASTA:

>AVP78031.1
----atgttgtttttcttgtttcttcagttcgccttagtaaactc---------------
------------------------------------ccagtgtgttaacttgacaggcag
a----------------accccactcaatcccaattat--actaattcttcacaaagagg
...

IDs:

AVP78031.1
...

I'm using this Perl script to calculate the Tajima-Nei distance for every pairwise comparison (314 * 314):

use strict;
use warnings;
use Bio::AlignIO;
use Bio::Align::DNAStatistics;

# Fall back to the default file names when an argument is missing or empty.
my $file   = length($ARGV[0] // '') ? $ARGV[0] : "NT_MSA_S_protein.fasta";
my $idfile = length($ARGV[1] // '') ? $ARGV[1] : "NT_ID_S_protein.csv";

#### Read the list of sequence IDs from a file
my @contentIDS;

open(my $list, '<', $idfile) or die "Cannot open $idfile: $!";
while (my $l = <$list>) {
  $l =~ s/[\r\n]+//g; # delete newline and CR
  next if (length($l) < 1);
  push @contentIDS, $l;
}
close $list;

#### Compute the distance matrix and print one line per ID pair
my $stats = Bio::Align::DNAStatistics->new();
my $alignin = Bio::AlignIO->new(-format => 'fasta', -file => $file);  ### $file: MSA file
while (my $aln = $alignin->next_aln) {
  my $matrix = $stats->distance(-align => $aln, -method => 'Tajima-Nei');
  # distance() does not return a matrix object when it rejects the alignment
  die "No distance matrix returned for $file\n" unless ref $matrix;
  ### Obtaining values for each pair (DISTANCE!)
  foreach my $aaa (@contentIDS) { ### ID #1
    foreach my $baa (@contentIDS) { ### ID #2
      next if ($aaa eq $baa);
      my $data = $matrix->get_entry($aaa, $baa);
      #($data = 0) if ($data < 0);
      print "DISTANCE\t$aaa\t$baa\t$data\n";
    }
  }
}

exit;
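
For reference, here is a minimal core-Perl sketch of the quantity being estimated, assuming the usual statement of the Tajima-Nei correction: d = -b ln(1 - p/b), with b = (1/2)[1 - sum(q_i^2) + p^2/c] and c = sum over i<j of x_ij^2 / (2 q_i q_j), where p is the proportion of differing sites, q_i the average frequency of base i in the two sequences, and x_ij the relative frequency of the unordered base pair i,j. This is not BioPerl's implementation. The function name `tajima_nei` and the choice to skip every column containing a gap, "n", or any other non-ACGT character are assumptions made for illustration.

```perl
use strict;
use warnings;

# Tajima-Nei distance for two aligned sequences of equal length.
# Columns where either sequence has anything other than A, C, G or T
# (gaps, n, other ambiguity codes) are skipped (pairwise deletion).
# Returns undef when no column is usable or the log argument is <= 0.
sub tajima_nei {
    my ($s1, $s2) = map { uc $_ } @_;
    die "Sequences differ in length\n" if length($s1) != length($s2);

    my (%base, %pair);
    my ($n, $diff) = (0, 0);
    for my $i (0 .. length($s1) - 1) {
        my ($x, $y) = (substr($s1, $i, 1), substr($s2, $i, 1));
        next unless "$x$y" =~ /^[ACGT]{2}$/;
        $n++;
        $base{$x}++;
        $base{$y}++;
        next if $x eq $y;
        $diff++;
        my $key = $x lt $y ? "$x$y" : "$y$x";   # unordered pair, e.g. "AG"
        $pair{$key}++;
    }
    return undef unless $n;      # nothing to compare
    return 0     unless $diff;   # identical over the compared columns

    my $p = $diff / $n;
    my %q = map { $_ => $base{$_} / (2 * $n) } keys %base;
    my $sum_q2 = 0;
    $sum_q2 += $_ ** 2 for values %q;

    my $c = 0;
    for my $ij (keys %pair) {
        my ($i, $j) = split //, $ij;
        my $x_ij = $pair{$ij} / $n;
        $c += $x_ij ** 2 / (2 * $q{$i} * $q{$j});
    }

    my $b_coef = 0.5 * (1 - $sum_q2 + $p ** 2 / $c);
    my $arg    = 1 - $p / $b_coef;
    return undef if $arg <= 0;   # too divergent: distance undefined
    return -$b_coef * log($arg);
}

my $d = tajima_nei("ATGTACGTACGTACGTACGA", "ACGTACGTACGTACGTACGT");
print defined $d ? "d = $d\n" : "distance undefined\n";
```

In this sketch the distance is undefined whenever p/b reaches 1, so very divergent or very short pairs yield no value rather than a number.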



This script works when I try it with a small dataset. However, when I run it on my real data, I get this error message:

MSG: Must provide a DNA alignment to Bio::Align::DNAStatistics, you provided a protein
---------------------------------------------------
Can't locate object method "get_entry" via package "0" (perhaps you forgot to load "0"?) at Tajima-Nei_Distance_NV.pl lin$

This is odd because I checked my data for ambiguous characters: the characters are mostly "atcg" with an occasional "n", unless there are other ambiguous characters that make some sequence look like protein. I really don't understand the message because the FASTA file is clearly a nucleotide alignment.
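
One way to double-check the composition of the alignment is to tally every character that appears in the sequence lines. The sketch below uses only core Perl. The helper names `tally_fasta_chars` and `unexpected_chars` are made up for illustration, and the accepted set (a, c, g, t, n and the gap character) is an assumption about what the data should contain. As far as I understand, BioPerl guesses the alphabet from the residues when none is declared, so a sequence dominated by gaps, "n", or other IUPAC codes might be guessed as protein; treat that as a hypothesis to test, not a confirmed cause.

```perl
use strict;
use warnings;

# Count every character found in the sequence lines of a FASTA stream.
# Header lines (starting with ">") and empty lines are ignored, and
# letters are lower-cased so "A" and "a" are counted together.
sub tally_fasta_chars {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        $line =~ s/[\r\n]+\z//;
        next if $line eq '' or $line =~ /^>/;
        $count{ lc $_ }++ for split //, $line;
    }
    return \%count;
}

# Return, in sorted order, the characters that are not a, c, g, t, n or "-".
sub unexpected_chars {
    my ($count) = @_;
    my @odd = sort grep { $_ !~ /^[acgtn-]$/ } keys %$count;
    return @odd;
}

# Usage: perl tally.pl alignment.fasta   (the script name is hypothetical)
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    my $count = tally_fasta_chars($fh);
    close $fh;
    printf "%s\t%d\n", $_, $count->{$_} for sort keys %$count;
    my @odd = unexpected_chars($count);
    print "Unexpected characters: @odd\n" if @odd;
}
```

Running the same tally per sequence instead of per file would show whether a single record, for example one that is almost entirely gaps, stands out from the rest.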

Link: https://github.com/MauriAndresMU1313/Example_Tajima-Nei_Distance_Bioperl/tree/main

Does anyone have experience using BioPerl to estimate the Tajima-Nei distance?

Any comment or help is welcome! Thanks!

DNA Perl BioPerl Tajima-Nei_Distance • 516 views