Determining reason for low quality assembly
4.1 years ago
liorglic ★ 1.4k

Hello,
I am running SPAdes v3.14.0 to assemble multiple varieties (individuals) of the same plant species (A. thaliana, estimated genome size 135 Mb). The data (FASTQ) come from public databases (SRA/ENA), with a slightly different data set for each variety. For most varieties the results are more than satisfactory. However, some varieties end up with poor results in terms of N50 and BUSCO scores.

For example, I have two data sets that are very similar in coverage and read length: each has a single PE library, ~40x coverage, and a read length of 51 bp. I apply the same preprocessing steps to both (Trimmomatic for quality trimming and FLASH for PE read merging) and then run SPAdes with the same parameters. With one data set I get an N50 of 13.5 kb and 95% complete BUSCOs, while the other gives an N50 of 3.7 kb and 80% complete BUSCOs.

I ran FastQC on the raw data, and both data sets look fine and of high quality. There is also no significant difference in the fraction of reads filtered or merged during preprocessing. I used GenomeScope to analyze the k-mer content of the raw reads and look for contamination or heterozygosity, but I don't see anything suspicious.
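For reference, here is a minimal Python sketch of how the two numbers compared above (N50 and nominal coverage) can be computed. It is only an illustration: the function names `n50` and `estimated_coverage` are made up for this example, and the coverage formula assumes every read has the same length and ignores trimming and merging.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L
    together cover at least half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


def estimated_coverage(n_reads, read_length, genome_size):
    """Nominal coverage = total sequenced bases / genome size.

    n_reads counts individual reads (both mates of a pair),
    and all reads are assumed to have the same length.
    """
    return n_reads * read_length / genome_size


if __name__ == "__main__":
    # Toy contig lengths, not taken from either assembly.
    print(n50([10, 8, 6, 4, 2]))
    # Toy read count and genome size, chosen only to keep the arithmetic simple.
    print(estimated_coverage(100, 51, 510))
```

Running the same calculation on the contigs from both assemblies, and on the read counts before and after preprocessing, gives a like-for-like comparison of the two data sets.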

Since these are genomes of the same species, I don't believe there is anything in the genomic sequence of a specific variety that should make it much harder to assemble. So the problem probably lies in the data, but how can I tell exactly what went wrong? Any ideas for other things to check, regarding either the raw data or the assembly process?

I've tried asking the SPAdes developers, but didn't get a satisfactory answer. Also, this might be a problem not specific to SPAdes.
I would appreciate any suggestions or ideas. Thanks!

Assembly spades • 774 views