How to use MetaClusterTA for binning metagenomic / metatranscriptomic data?
8.0 years ago
crespialba ▴ 20

I am trying to use MetaClusterTA (http://i.cs.hku.hk/~alse/MetaCluster/download.html) to bin and annotate my metatranscriptomic dataset. To start with, I am testing it on a simulated dataset that I created with NeSSM, with the alignment performed by transabyss.

I followed the instructions in the MetaClusterTA README file: I downloaded the database and created the taxid file in the format described there.

However, when I try to run it, it fails. I have tried several options, but none of them seems to work.

Here is the output:

/software/MetaClusterTA/bin/MetaCluster_TA ~/simulated_dataset/simulation_1.fq ~/simulated_dataset/simulation_2.fq ~/transabyss-final.fa ~/taxids.csv --ReadLen 75 --Species 50 --MaxSpecies 1000

ReadLen:     75
CtgLenThresh:    500
AlignThresh:     76
MC3_Thresh: 0.94
before loading genomes.
0 out of 30269 genomes are loaded.
Finished counting occurences in strings. 
Finished mallocing vectors in nodes. 
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
alba      6834 98.4  3.1 17342688 16812812 pts/5 Sl+ 13:40   0:36 /software/MetaClusterTA/bin/MetaCluster
[...]

Finished mapping k-mers in strings. 
Finished sorting vectors in nodes. 
Finished turning capacity to number of unique id num. 
Genome DB Initialization is finished.
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
alba      6834  100  3.1 17342688 16812812 pts/5 Sl+ 13:40   0:37 /software/MetaClusterTA/bin/MetaCluster
[...]

initializing reads. 
Wed May 11 13:41:01 2016
initializing uset: 
Wed May 11 13:44:50 2016
NumNodes:   0
Size:   0
ReverSize:  512
sumlen: 0
MaxSpecies: 2
MinSpecies: 2
MaxSpecies is too large. We will start with half of the group number.
Size:0, classes:50
Segmentation fault (core dumped)

The aligned file looks like this:

>R184409 1327 23064 153518-,...,157386-
CCGGGCGACGGTTGTCCCGGTTTAAGCGTGCAGGTGGGTGGACCAGGCAAATCCGGTCTGCTGTAACACTGAGGCGTGATGACGAGGCACTACGGTGCTGAAGTGACAGATGCCCTGCTTCCAGGAAAAGCCTCTAAGCATCAGGTAACACGAAATCGTACCCCAAACCGACACAGGTGGTCAGGTAGAGAATACCAAGGCGCTTGAGAGAACTCGGGTGAAGGAACTAGGCAAAATGGTGCCGTAACTTCGGGAGAAGGCACGCTGGCGCGTAGGTGAAGGGACTTGCTCCCGGAGCTGAAGCCAGTCGAAGATACCAGCTGGCTGCAACTGTTTATTAAAAACACAGCACTGTGCAAACACGAAAGTGGACGTATACGGTGTGACGCCTGCCCGGTGCCGGAAGGTTAATTGATGGGGTTATCCGCAAGGAGAAGCTCTTGATCGAAGCCCCGGTAAACGGCGGCCGTAACTATAACGGTCCTAAGGTAGCGAAATTCCTTGTCGGGTAAGTTCCGACCTGCACGAATGGCGTAATGATGGCCAGGCTGTCTCCACCCGAGACTCAGTGAAATTGAACTCGCTGTGAAGATGCAGTGTACCCGCGGCAAGACGGAAAGACCCCGTGAACCTTTACTATAGCTTGACACTGAACATTGAGCCTTGATGTGTAGGATAGGTGGGAGGCTTTGAAGTGTGGACGCCAGTCTGCATGGAGCCAACCTTGAAATACCACCCTTTAATGTTTGATGTTCTAACGTGGACCCGTGATCCGGGTTGCGGACAGTGTCTGGTGGGTAGTTTGACTGGGGCGGTCTCCTCCCAAAGAGTAACGGAGGAGCACGAAGGTGGGCTAATCACGGTTGGACATCGTGAGGTTAGTGCAATGGCATAAGCCCGCTTGACTGCGAGAATGACAATTCGAGCAGGTGCGAAAGCAGGTCATAGTGATCCGGTGGTTCTGAATGGAAGGGCCATCGCTCAACGGATAAAAGGTACTCCGGGGATAACAGGCTGATACCGCCCAAGAGTTCATATCGACGGCGGTGTTTGGCACCTCGATGTCGGCTCATCACATCCTGGGGCTGAAGTAGGTCCCAAGGGTATGGCTGTTCGCCATTTAAAGTGGTACGCGAGCTGGGTTTAGAACGTCGTGAGACAGTTCGGTCCCTATCTGCCGTGGGCGCTGGAGAATTGAGGGGGGCTGCTCCTAGTACGAGAGGACCGGAGTGGACGCATCACTGGTGTTCGGGTTGTCATGCCAATGGCATTGCCCGGTAGCTACGTTCGGAACTGATAACCGCTGAAAGCATCTAAGCGGGAAGCC
>R184410 2220 66038 169714+,...,118906+
GGTAATGACTCCAACTTATTGATAGTGTTTTATGTTCAGATAATGCCCGATGACTTTGTCATGCAGCTCCACCGATTTTGAGAACGACAGCGACTTCCGTCCCAGCCGTGCCAGGTGCTGCCTCAGATTCAGGTTATGCCGCTCAATTCGCTGCGTATATCGCTTGCTGATTACGTGCAGCTTTCCCTTCAGGCGGGATTCATACAGCGGCCAGCCATCCGTCATCCATATCACCACGTCAAAGGGTGACAGCAGGCTCATAAGACGCCCCAGCGTCGCCATAGTGCGTTCACCGAATACGTGCGCAACAACCGTCTTCCGGAGACTGTCATACGCGTAAAACAGCCAGCGCTGGCGCGATTTAGCCCCGACATAGCCCCACTGTTCGTCCATTTCCGCGCAGACGATGACGTCACTGCCCGGCTGTATGCGCGAGGTTACCGACTGCGGCCTGAGTTTTTTAAGTGACGTAAAATCGTGTTGAGGCCAACGCCCATAATGCGTGCAGTTGCCCGGCATCCAACGCCATTCATGGCCATATCAATGATTTTCTGGTGCGTACCGGGTTGAGAAGCGGTGTAAGTGAACTGCAGTTGCCATGTTTTACGGCAGTGAGAGCAGAGATAGCGCTGATGTCCGGCAGTGCTTTTGCCGTTACGCACCACCCCGTCAGTAGCTGAACAGGAGGGACAGCTGATAGAAACAGAAGCCACTGGAGCACCTCAAAAACACCATCATACACTAAATCAGTAAGTTGGCAGCATCACCCACAAAATAGTCATGCATTGTGTGCAATAGAAACAGTTCAGATAAAGATAGGGATTAGACTGGCCCCCTGAATCTCCAGACAACCAGTATCACTTAAATAAGTGATAGTCTTAATACTAGTTTTTAGACTAGTCATTGGAGTACAGATGATTGATGTCTTAGGGCCGGAGAAACGCAGACGGCGTACCACACAGGAAAAGATCGCAATTGTTCAGCAGAGCTTTGAACCGGGGATGACGGTCTCCCTCGTTGCCCGGCAACATGGTGTAGCAGCCAGCCAGTTATTTCTCTGGCGTAAGCAATACCAGGAAGGAAGTCTTACTGCTGTCGCCGCCGGAGAACAGGTTGTTCCTGCCTCTGAACTTGCTGCCGCCATGAAGCAGATTAAAGAACTCCAGCGCCTGCTCGGCAAGAAAACGATGGAAAATGAACTCCTCAAAGAAGCCGTTGAATATGGACGGGCAAAAAAGTGGATAGCGCACGCGCCCTTATTGCCCGGGGATGGGGAGTAAGCTTAGTCAGCCGTTGTCTCCGGGTGTCGCGTGCGCAGTTGCACGTCATTCTCAGACGAACCGATGACTGGATGGATGGCCGCCGCAGTCGTCACACTGATGATACGGATGTGCTTCTCCGTATACACCATGTTATCGGAGAGCTGCCCACGTATGGTTATCGTCGGGTATGGGCGCTGCTTCGCAGACAGGCAGAACTTGATGGTATGCCTGCGATCAATGCCAAACGTGTTTACCGGATCATGCGCCAGAATGCGCTGTTGCTTGAGCGAAAACCTGCTGTACCGCCATCGAAACGGGCACATACAGGCAGAGTGGCCGTGAAAGAAAGCAATCAGCGATGGTGCTCTGACGGGTTCGAGTTCTGCTGTGATAACGGAGAGAGACTGCGTGTCACGTTCGCGCTGGACTGCTGTGATCGTGAGGCACTGCACTGGGCGGTGACTACCGGCGGCTTCAACAGTGAAACAGTACAGGACGTCATGCTGGGAGCGGTGGAACGCCGCTTCGGCAACGATCTTCCGTCGTCTCCAGTGGAGTGGCTGACGGATAATGGTTCATGCTACCGGGCTAATGAAACACGCCAGTTCGCCCGGATGTTGGGACTTGAACCGAAGAACACGGCGGTGCGGAGTCCGGAGAGTAACGGAATAGCAGAGAGCTTCGTGAAAACGATAAAGCGTGACTACATCAGTATCATGCCCAAACCAGACGGGTTAA
CGGCAGCAAAGAACCTTGCAGAGGCGTTCGAGCATTATAACGAATGGCATCCGCATAGTGCGCTGGGTTATCGCTCGCCACGGGAATATCTGCGGCAGCGGGCTTGTAATGGGTTAAGTGATAACAGATGTCTGGAAATATAGGGGCAAATCCACGGGGATACCAGTTCAACCGAAAACGCCAGAGGAGGGGATTACCCGCTGGCAGGGTAAATCTGTGG
[...]
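Since the log reports `NumNodes: 0` and `sumlen: 0` right before the crash, one thing worth ruling out is that the aligned file itself is empty or contains only short records. The following is a minimal sketch for summarizing a FASTA file like the one above. It is not part of MetaClusterTA; the function names are made up for illustration, and the default of 500 is only borrowed from the `CtgLenThresh: 500` line in the log, on the assumption that it is a minimum contig length.

```python
import sys


def read_fasta_lengths(path):
    """Yield (record_id, sequence_length) for each record in a FASTA file.

    Sequences may be wrapped over several lines; the record id is the
    first whitespace-separated token after '>'.
    """
    record_id, length = None, 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if record_id is not None:
                    yield record_id, length
                parts = line[1:].split()
                record_id = parts[0] if parts else ""
                length = 0
            elif record_id is not None:
                length += len(line)
    if record_id is not None:
        yield record_id, length


def summarize(path, min_length=500):
    """Count records, records of at least min_length bases, and total bases."""
    lengths = [length for _, length in read_fasta_lengths(path)]
    return {
        "records": len(lengths),
        "at_least_min_length": sum(1 for n in lengths if n >= min_length),
        "total_bases": sum(lengths),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python fasta_summary.py transabyss-final.fa
    print(summarize(sys.argv[1]))
```

If this reports a sensible number of records and bases, the input file is probably not the cause, and the problem is more likely in how the genomes or reads are loaded.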

The taxid file (taxids.csv) looks like this:

superkingdom_id phylum_id       class_id        order_id        family_id       genus_id        species_id      path_to_genome
2       544448  31969   2085    2092    2129    134821  /home/alba/test/Ureaplasma_parvum_serovar_3_ATCC_700970_uid57711/NC_002162.fna
2157    28890   183963  2235    2236    2239    2242    /home/alba/test/Halobacterium_NRC_1_uid57769/NC_002607.fna
2       1224    1236    91347   543     590     28901   /home/alba/test/Salmonella_enterica_serovar_Typhimurium_LT2_uid57799/NC_003197.fna
2       1224    1236    118969  118968  776     777     /home/alba/test/Coxiella_burnetii_RSA_493_uid57631/NC_002971.fna
2157    28890   183939  2182    2183    2184    39152   /home/alba/test/Methanococcus_maripaludis_S2_uid58035/NC_005791.fna
2       1239    186801  186802  543349  2733    2734    /home/alba/test/Symbiobacterium_thermophilum_IAM_14863_uid58165/NC_006177.fna
2       1224    1236    135623  641     511678  668     /home/alba/test/Vibrio_fischeri_ES114_uid58163/NC_006840.fna
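The log line `0 out of 30269 genomes are loaded.` suggests that none of the genome files listed in the taxid file could be read, so it may be worth checking every row before running MetaClusterTA again. Below is a small sketch that does this. It assumes the layout shown above (a header row, seven numeric taxonomy ids, then the genome path, separated by whitespace, with no spaces inside the paths); the function name `check_taxid_file` is hypothetical and the script does not use any MetaClusterTA code.

```python
import os
import sys

# Seven taxonomy ids plus the genome path, as in the sample rows above.
EXPECTED_COLUMNS = 8


def check_taxid_file(taxid_path):
    """Return a list of (line_number, problem) tuples for a taxid file."""
    problems = []
    with open(taxid_path) as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if fields[0] == "superkingdom_id":
                continue  # skip the header row
            if len(fields) != EXPECTED_COLUMNS:
                problems.append(
                    (line_number,
                     f"expected {EXPECTED_COLUMNS} columns, found {len(fields)}")
                )
                continue
            if not all(field.isdigit() for field in fields[:-1]):
                problems.append((line_number, "non-numeric taxonomy id"))
            genome_path = fields[-1]
            if not os.path.isfile(genome_path):
                problems.append((line_number, f"genome file not found: {genome_path}"))
            elif os.path.getsize(genome_path) == 0:
                problems.append((line_number, f"genome file is empty: {genome_path}"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_taxids.py ~/taxids.csv
    for line_number, problem in check_taxid_file(sys.argv[1]):
        print(f"line {line_number}: {problem}")
```

An empty result only means the rows are well formed and the files exist; it does not prove that the program accepts the file, for example if it expects a different separator or no header row.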
binning clustering metagenomics MetaCluster • 1.6k views