Comparing assembly size, contiguity and BUSCO stats across the genomes of multiple isolates
5.2 years ago
nagarsaggi ▴ 40

I have SPAdes assemblies of 109 isolates of a plant pathogenic fungus, and I have run BUSCO on all of them. I want to compare the size and contiguity of each assembly with the size of its input data. How do I calculate the assembly stats for each isolate and extract them in tabular form? I also want to compare the assembly sizes with the BUSCO stats (complete, partial and duplicate BUSCOs), so how do I extract the BUSCO stats from the "short summary" file of each isolate into a table?

Assembly • 1.3k views
5.2 years ago
jean.elbers ★ 1.7k

You could play around with bash scripting and the bbstats.sh or statswrapper.sh scripts from BBTools/BBMap (https://sourceforge.net/projects/bbmap/) for assembly statistics. Note that these scripts swap N50 and L50 relative to the usual definitions (what they report as N50 is a contig count and what they report as L50 is a length), and likewise for N90 and L90. Getting the BUSCO stats is more of a text manipulation job using GNU core utilities, Perl, sed, awk, etc. If you post an example of the output and the desired result, perhaps someone can help you write a quick script for it.
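If you would rather keep everything in one script, here is a rough Python sketch of both steps. It is not what BBTools does internally, just the usual N50/L50 definitions computed from contig lengths, plus a regular expression for the one-line BUSCO summary. Two assumptions to check against your own files: the summary line is assumed to look like `C:98.6%[S:97.9%,D:0.7%],F:0.7%,M:0.7%,n:290` (newer BUSCO versions may append extra fields), and all function and column names below are made up for illustration.

```python
import csv
import re

# Assumed shape of the BUSCO one-line summary; adjust if your version differs.
BUSCO_LINE = re.compile(
    r"C:([\d.]+)%\[S:([\d.]+)%,D:([\d.]+)%\],F:([\d.]+)%,M:([\d.]+)%,n:(\d+)"
)

# Hypothetical column names for the output table.
COLUMNS = ["isolate", "contigs", "total_bp", "largest_bp", "N50", "L50",
           "complete_pct", "single_pct", "duplicated_pct",
           "fragmented_pct", "missing_pct", "n"]


def read_contig_lengths(fasta_lines):
    """Return the length of every sequence in an iterable of FASTA lines."""
    lengths = []
    current = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths):
    """Contig count, total size, largest contig, N50 (a length), L50 (a count)."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    n50 = l50 = 0
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running * 2 >= total:  # reached at least half of the assembly
            n50, l50 = length, count
            break
    return {"contigs": len(ordered), "total_bp": total,
            "largest_bp": ordered[0] if ordered else 0,
            "N50": n50, "L50": l50}


def parse_busco_summary(text):
    """Extract the BUSCO percentages from the summary text, or None if absent."""
    match = BUSCO_LINE.search(text)
    if match is None:
        return None
    c, s, d, f, m = (float(x) for x in match.groups()[:5])
    return {"complete_pct": c, "single_pct": s, "duplicated_pct": d,
            "fragmented_pct": f, "missing_pct": m, "n": int(match.group(6))}


def isolate_row(name, fasta_path, busco_summary_path):
    """Build one table row for an isolate from its assembly and BUSCO summary."""
    with open(fasta_path) as handle:
        row = {"isolate": name, **assembly_stats(read_contig_lengths(handle))}
    with open(busco_summary_path) as handle:
        row.update(parse_busco_summary(handle.read()) or {})
    return row


def write_table(rows, handle):
    """Write rows as tab-separated text; missing values become NA."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            restval="NA")
    writer.writeheader()
    writer.writerows(rows)
```

To use it, call `isolate_row()` once per isolate with the path to its contigs FASTA and its short summary file, collect the rows in a list, and pass them to `write_table()` with an open output file. The resulting table can then be joined with your input data sizes in R or a spreadsheet.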
