Question: Binning Tools for Long Reads/Contigs
4 months ago
vijinim • 90

The majority of currently available metagenomic binning tools are designed to work with short reads and with contigs assembled from short reads.

Does anyone know of tools that can bin long reads, or contigs assembled from long reads?

Thank you very much! :)


I think Kraken (and possibly Centrifuge) can take long reads. I'm fairly sure Kraken can work on contigs too.

Reply by jrj.healey • 12k, 4 months ago

Thank you very much. I will try it and see. :)

Reply by vijinim • 90, 4 months ago

What is the difference between binning long contiguous sequences assembled from short reads and binning long contiguous sequences obtained from long reads?

Reply by 5heikki • 8.4k, 4 months ago

I believe there is no difference, apart from the effects of the higher error rate of long reads compared with short reads.

However, I tried to bin a simulated dataset of reads from 2 bacterial genomes (read lengths of 20-21 kb, 10% error rate), and the tool failed to separate them into two bins. It produced only one bin containing a few sequences, and most of the remaining sequences were not binned. The tool I used was MaxBin 2.2.4.
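For anyone who wants to reproduce a test like this, here is a toy Python sketch of how such a dataset could be simulated: reads of uniform random length are cut from one strand of a genome, and errors are introduced at a fixed per-base rate, split equally between substitutions, insertions and deletions. The function names and the equal error split are illustrative assumptions; the simulator actually used above is not stated, and dedicated long-read simulators model real error profiles far more closely.

```python
import random

BASES = "ACGT"


def add_errors(seq: str, rate: float, rng: random.Random) -> str:
    """Hypothetical helper: substitution, insertion or deletion, each with probability rate/3 per base."""
    out = []
    for base in seq:
        if rng.random() >= rate:
            out.append(base)  # base copied without error
            continue
        kind = rng.choice(("sub", "ins", "del"))
        if kind == "sub":
            out.append(rng.choice([b for b in BASES if b != base]))
        elif kind == "ins":
            out.append(base)
            out.append(rng.choice(BASES))  # extra random base after the true one
        # "del": the base is simply dropped
    return "".join(out)


def sample_reads(genome: str, n: int, min_len: int, max_len: int,
                 rate: float, seed: int = 0) -> list[str]:
    """Hypothetical helper: draw n error-containing reads of uniform random length from one strand."""
    if len(genome) < max_len:
        raise ValueError("genome shorter than max_len")
    rng = random.Random(seed)
    reads = []
    for _ in range(n):
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, len(genome) - length)
        reads.append(add_errors(genome[start:start + length], rate, rng))
    return reads
```

For example, `sample_reads(genome, n=100, min_len=20_000, max_len=21_000, rate=0.10)` would roughly match the read lengths and error rate described above for one genome, where `genome` is a sequence string loaded elsewhere.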

Reply by vijinim • 90, 4 months ago

And how different were the two genomes? No tool will successfully separate, e.g., Escherichia coli O157:H7 Sakai and Escherichia coli O157:H7 EC4115.

Reply by 5heikki • 8.4k, 4 months ago

I used Escherichia coli CFT073 and Staphylococcus aureus JP080. When I use short reads from these genomes and bin the resulting contigs, MaxBin produces 2 bins with good results.

Similarly, I ran MaxBin on long reads from the same 2 genomes, but it produced only 1 bin.

Reply by vijinim • 90, 4 months ago

Does MaxBin also use depth of coverage? That could be the reason, as you don't get that dimension with long reads.

Reply by 5heikki • 8.4k, 4 months ago

In this approach, tetranucleotide frequencies and scaffold coverages are combined to organize metagenomic sequences into individual bins, which are predicted from initial identification of marker genes in assembled sequences.
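As a rough illustration of the composition half of this feature set, here is a minimal Python sketch that turns a sequence into a tetranucleotide frequency vector, merging each 4-mer with its reverse complement (136 canonical 4-mers). This is a generic sketch under that assumption, not MaxBin's code; how MaxBin itself represents and weights these features is not given in the quoted text.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


# Each 4-mer is merged with its reverse complement, because a sequence may come
# from either strand: 256 4-mers collapse to 136 canonical features.
CANONICAL = sorted({min(k, revcomp(k)) for k in ("".join(p) for p in product(BASES, repeat=4))})
INDEX = {k: i for i, k in enumerate(CANONICAL)}


def tetranucleotide_freq(seq: str) -> list[float]:
    """Normalised canonical tetranucleotide frequencies of one sequence."""
    seq = seq.upper()
    counts = [0] * len(CANONICAL)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if any(b not in BASES for b in kmer):
            continue  # skip windows containing N or other ambiguity codes
        counts[INDEX[min(kmer, revcomp(kmer))]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)
```

Because of the strand merging, a sequence and its reverse complement give identical vectors.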

[...]

Despite careful selection of initialization conditions, the EM algorithm sometimes may still group scaffolds from several composite genomes into one bin. To alleviate this problem, all bins are recursively checked for the median number of marker genes. If the median number of marker genes of any bin is at least 2, the bin will be treated as a dataset waiting to be binned, and the whole EM algorithm will be applied to split the bin.
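The recursive check described in this quote can be sketched as follows. This is a simplified Python illustration, not MaxBin's implementation: `markers_by_seq` (which marker genes were detected on each sequence) is assumed to come from an earlier marker-gene search, and `split_fn` is a hypothetical placeholder standing in for re-running the EM algorithm on a bin. The `max_depth` guard is an added assumption to guarantee termination.

```python
from statistics import median
from typing import Callable, Dict, List

SeqIds = List[str]


def marker_median(bin_ids: SeqIds, markers_by_seq: Dict[str, List[str]],
                  all_markers: List[str]) -> float:
    """Median, over all (non-empty) marker genes, of the copies found in this bin."""
    copies = {m: 0 for m in all_markers}
    for sid in bin_ids:
        for m in markers_by_seq.get(sid, []):
            if m in copies:
                copies[m] += 1
    return median(copies.values())


def refine_bins(bin_ids: SeqIds, markers_by_seq: Dict[str, List[str]],
                all_markers: List[str], split_fn: Callable[[SeqIds], List[SeqIds]],
                max_depth: int = 10) -> List[SeqIds]:
    """Recursively re-split any bin whose median marker copy number is >= 2."""
    if max_depth == 0 or len(bin_ids) < 2:
        return [bin_ids]
    if marker_median(bin_ids, markers_by_seq, all_markers) < 2:
        return [bin_ids]  # looks like a single genome: keep as is
    parts = [p for p in split_fn(bin_ids) if p]
    if len(parts) < 2:
        return [bin_ids]  # the splitter could not separate it further
    result: List[SeqIds] = []
    for part in parts:
        result.extend(refine_bins(part, markers_by_seq, all_markers, split_fn, max_depth - 1))
    return result
```

With a toy `split_fn` that groups IDs by prefix, a bin holding `ecoli_1`, `ecoli_2`, `saur_1` and `saur_2`, where each genome carries one copy of markers A, B and C, has a median copy number of 2 and is split into the two genomes, whose medians are then 1. It also shows why marker detection matters: if errors prevent the marker genes from being found, every copy count is 0, the median never reaches 2, and the bin is never split.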

If MaxBin detects those marker genes at the protein level, I think your 10% simulated error rate would lead to a single bin, because errors at that rate would likely disrupt gene prediction, so few marker genes would be found.

Reply by 5heikki • 8.4k, 4 months ago

Yes, I think this is the issue. I will look for another tool to do the binning. Thank you very much for your insights and explanations. :)

Reply by vijinim • 90, 4 months ago
4 months ago
vijinim • 90

I found a tool named MEGAN-LR that can bin metagenomic long reads and contigs. Although it is based on taxonomic binning, I'm going to try it and see.

Thank you all for your insights and ideas. :)


There seems to be no suitable tool for de novo (taxonomy-independent) binning of long reads. All available methods are based on taxonomic binning.

Reply by vijinim • 90, 9 weeks ago

Can't you use the information from the LAST alignments for functional binning (assuming that is what you need in addition to taxonomic binning)?

Reply by drishti • 0, 7 weeks ago

I'm not sure. What I'm looking for is an alignment-free binning tool, and there seems to be no such tool for long reads at the moment.
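To make the idea of composition-only (alignment-free) binning concrete, here is a toy Python sketch: each read becomes a tetranucleotide frequency vector, and the vectors are grouped with plain k-means. This is not an existing tool and rests on strong assumptions: the number of bins `k` is known in advance, reads are long enough for stable 4-mer statistics, the genomes differ clearly in composition, and sequencing errors are ignored (a 10% error rate would blur these profiles). The names `tnf`, `kmeans` and `bin_reads` are hypothetical.

```python
import math
from itertools import product

BASES = "ACGT"
# All 256 tetranucleotides (not merged with reverse complements, for brevity)
KMERS = ["".join(p) for p in product(BASES, repeat=4)]
INDEX = {k: i for i, k in enumerate(KMERS)}


def tnf(seq: str) -> list[float]:
    """Tetranucleotide frequency vector; windows with non-ACGT symbols are skipped."""
    counts = [0.0] * len(KMERS)
    for i in range(len(seq) - 3):
        j = INDEX.get(seq[i:i + 4])
        if j is not None:
            counts[j] += 1
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def dist(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: list[list[float]], k: int, iters: int = 50) -> list[int]:
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # next centre: the point farthest from all centres chosen so far
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels: list[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centre if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def bin_reads(reads: list[str], k: int) -> list[int]:
    """Assign each read to one of k composition-based bins."""
    return kmeans([tnf(r.upper()) for r in reads], k)
```

On two synthetic genomes with 30% and 70% GC content, error-free reads of a few kilobases should separate cleanly; closely related strains, like the E. coli O157:H7 pair mentioned earlier in the thread, would not.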

Reply by vijinim • 90, 5 weeks ago
