Binning Tools for Long Reads/Contigs
5.1 years ago
vijinim ▴ 100

The majority of currently available metagenomic binning tools are designed to work with short reads and with contigs assembled from short reads.

Does anyone know of any tools that can bin long reads, or contigs assembled from long reads?

Thank you very much! :)

metagenomics binning long reads contigs • 3.0k views

I think Kraken (and possibly Centrifuge) can take long reads. I’m fairly sure Kraken can work on contigs too.


Thank you very much. I will try it and see. :)


What is the difference between binning long contiguous sequences assembled from short reads and binning long contiguous sequences obtained from long reads?


I believe there is no difference apart from the effects of the error rates of short reads and long reads.

However, I tried to bin a simulated dataset of reads from 2 bacterial genomes (read lengths of 20-21 kb, 10% error rate), and the tool failed to identify two bins. It produced only one bin containing a few sequences, and most of the remaining sequences were left unbinned. The tool used was MaxBin 2.2.4.


And how different were the two genomes? No tool will successfully separate e.g. Escherichia coli O157:H7 Sakai and Escherichia coli O157:H7 EC4115.


I used Escherichia coli CFT073 and Staphylococcus aureus JP080. With short reads, binning the assembled contigs with MaxBin produces 2 bins with good results.

When I ran MaxBin on long reads from the same 2 genomes, however, it gave only 1 bin.


Does MaxBin also use depth of coverage? That could be the reason, as you don't get that dimension with long reads.


In this approach, tetranucleotide frequencies and scaffold coverages are combined to organize metagenomic sequences into individual bins, which are predicted from initial identification of marker genes in assembled sequences.
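To make the composition signal concrete, here is a minimal Python sketch of a tetranucleotide frequency profile. It is not MaxBin's code: the function name `tetranucleotide_frequencies` is made up for illustration, and the sketch does not merge reverse-complement 4-mers, which a real binner may well do.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides in a fixed order, so that profiles
# computed from different sequences are directly comparable.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return the normalized 4-mer frequency vector of a DNA sequence.

    Hypothetical helper for illustration only. Windows containing
    characters other than A, C, G, T (for example N) are ignored.
    """
    seq = seq.upper()
    # Count every overlapping window of length 4.
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    valid = [counts.get(kmer, 0) for kmer in TETRAMERS]
    total = sum(valid)
    if total == 0:
        # Sequence too short or with no unambiguous 4-mers.
        return [0.0] * len(TETRAMERS)
    return [c / total for c in valid]


profile = tetranucleotide_frequencies("ACGTACGTAC")
print(len(profile))
```

Sequences from the same genome tend to have similar profiles, so a binner can compare such vectors between contigs or reads.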

..

Despite careful selection of initialization conditions, the EM algorithm sometimes may still group scaffolds from several composite genomes into one bin. To alleviate this problem, all bins are recursively checked for the median number of marker genes. If the median number of marker genes of any bin is at least 2, the bin will be treated as a dataset waiting to be binned, and the whole EM algorithm will be applied to split the bin.

If MaxBin detects those marker genes at the protein level, I think your 10% simulated error rate would keep most of them from being found, which would lead to a single bin.
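The recursive check quoted above can be sketched roughly as follows. This is only my reading of the description, not MaxBin's implementation: I am assuming "median number of marker genes" means the median, taken over the marker set, of how many copies of each marker were found in a bin. The names `needs_split`, `MARKERS` and `markerA`..`markerC` are placeholders.

```python
from collections import Counter
from statistics import median

# Placeholder marker names; the real marker set is not given in this thread.
MARKERS = ["markerA", "markerB", "markerC"]


def needs_split(marker_hits, marker_set=MARKERS):
    """Return True if the median per-marker copy number in a bin is >= 2.

    marker_hits: marker gene names detected in the bin, one entry per hit.
    marker_set: all marker genes considered (hypothetical list here).
    """
    counts = Counter(marker_hits)
    per_marker = [counts.get(m, 0) for m in marker_set]
    return median(per_marker) >= 2


# A bin holding two genomes: every marker is detected twice.
mixed_bin = ["markerA", "markerA", "markerB", "markerB", "markerC", "markerC"]
print(needs_split(mixed_bin))

# The same bin when sequencing errors hide the marker genes from detection.
noisy_bin = []
print(needs_split(noisy_bin))
```

Under this reading, a bin in which no marker genes are detected never reaches the threshold, so it would never be split, however many genomes it contains.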


Yes, I think this is the issue. I will look for other software to do the binning. Thank you very much for your insights and explanations. :)

5.1 years ago
vijinim ▴ 100

I found a tool named MEGAN-LR, which can bin metagenomic long reads and contigs. Although it does taxonomic binning, I'm going to try it and see.

Thank you all for your insights and ideas. :)


There seems to be no proper tool for de novo (taxonomy-independent) binning of long reads. All the available methods I found are based on taxonomic binning.


Can you not use the information from the LAST alignments for functional binning (assuming that is what you need in addition to taxonomic binning)?


I'm not sure. But what I'm looking for is an alignment-free binning tool, and it seems there is no such tool for long reads at the moment.
