Scripts to Calculate Tetranucleotide Frequency
11.7 years ago
Shuixia100 ▴ 120

Hi there,

Does anyone have a script for tetranucleotide frequency calculation? I have tried Biostrings in Bioconductor, but I'm looking for something that can calculate the tetranucleotide frequency from both the forward and reverse strands simultaneously, with output that can be fed into ESOM Tools for binning of metagenomic contigs.

Also, any suggestions on metagenomic binning are very welcome. Is there an established program for binning? I've tried TETRA, but it is just too slow for NGS sequences.

Thanks, Kylie

11.7 years ago

In R, with seqinr:

library(seqinr)
seq=read.fasta("sequence.fa")[[1]]
count(seq,freq=TRUE,wordsize=4) # for the tetranucleotide frequency (edited)
count(rev(comp(seq)),freq=TRUE,wordsize=4) # reverse strand: the same values as above, with each word swapped for its reverse complement
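
To get a single profile covering both strands, as asked in the question, one option is to add the two count vectors before normalising. The following is only a minimal sketch under a few assumptions: the toy sequence stands in for one contig read with read.fasta, the variable names are illustrative, and the sequence contains only a, c, g and t (comp() returns NA for other symbols by default).

```r
library(seqinr)

# Toy sequence standing in for one contig; with real data this would be
# something like read.fasta("sequence.fa")[[1]]
contig <- s2c("acgtacgtttgacca")

# Raw 4-mer counts on the forward strand and on its reverse complement
fwd <- count(contig, wordsize = 4)
rev_strand <- count(rev(comp(contig)), wordsize = 4)

# Both tables list the same 256 words in the same order,
# so they can be added element by element
both <- fwd + rev_strand

# Strand-independent tetranucleotide frequencies (sum to 1)
tetra_freq <- both / sum(both)
```

In the combined vector a word and its reverse complement always have the same value, so the profile does not depend on which strand the contig happens to be reported on. Repeating this for every contig gives one row per contig; converting that table to the input format expected by ESOM Tools is not shown here.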

Hi, I think the OP is asking for counts of all 4-mer words (tetranucleotides), so I think you have to pass wordsize=4 to count.


Sorry, I meant to do that, and then I forgot...


Then what would the script be? Also, do you have some documentation on seqinr for beginners?


Are you unable to click on the link I provided or to read my updated code in my answer?


Sorry, I didn't notice the update. Thanks.

11.7 years ago

As for binning, there's a recent paper entitled "A comparative evaluation of sequence classification programs". The authors assessed not only sensitivity and precision but also runtime performance.
