Hello
I'm a beginner in bioinformatics and I have to improve some pretty poor single-cell data (~10-20% coverage against the reference genome of the species in question).
To improve these data, I am following this workflow: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4749706/, which consists of using metagenomic contigs (from the same environment) that are more than 95% similar to the single-cell contigs in order to improve the single-cell contigs.
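To make the filtering step concrete, here is a minimal Python sketch of the 95% similarity cut-off. It is only an illustration, not the workflow's actual code. It assumes the metagenomic contigs were aligned against the single-cell contigs with BLAST and saved in tabular format (`-outfmt 6`), with the metagenomic contig as the query. The function name `filter_hits`, the contig names, and the `min_length` threshold are hypothetical.

```python
import csv
import io


def filter_hits(blast_tab, min_identity=95.0, min_length=500):
    """Return IDs of query contigs with at least one hit passing both thresholds.

    blast_tab: an open text handle on BLAST tabular output (-outfmt 6), where
    column 1 = query ID, column 2 = subject ID, column 3 = percent identity,
    column 4 = alignment length.
    min_length is an assumed extra filter, not taken from the workflow.
    """
    kept = set()
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query_id = row[0]
        identity = float(row[2])
        aln_length = int(row[3])
        if identity >= min_identity and aln_length >= min_length:
            kept.add(query_id)
    return kept


# Hypothetical example data: metagenomic contig, SAG contig, % identity, length
EXAMPLE = (
    "meta_contig_1\tsag_contig_3\t98.20\t1500\n"
    "meta_contig_2\tsag_contig_3\t91.00\t2000\n"
    "meta_contig_3\tsag_contig_7\t99.10\t200\n"
)

kept_contigs = filter_hits(io.StringIO(EXAMPLE))
print(sorted(kept_contigs))
```

In this toy input only `meta_contig_1` passes: the second hit is below 95% identity and the third is shorter than the assumed minimum alignment length. Lowering `min_length` lets more contigs through but also makes short, spurious matches more likely.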
The problem is that the data remain very poor, even when I play with the parameters.
That's why I'm asking for your help: has anyone here worked with poor-quality raw data like this? How can I improve it as much as possible?
Thank you
I don't fully understand the properties of the data set. Is this RNA-seq? If yes, which platform? What types of cells? What exactly do you mean by "10-20% coverage"?
It's a DNA data set. Cells of interest were isolated, and after whole genome amplification the amplified DNA was sequenced in order to obtain SAGs (single amplified genomes).
To evaluate the quality of these genomes, I used the software CheckM, which provides robust estimates of genome completeness and contamination by using collocated sets of genes that are ubiquitous and single-copy within a phylogenetic lineage.
With this software I find a completeness of 20%, 10%, or even less for my 10 SAGs.
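For readers unfamiliar with how a marker-based completeness figure comes about, here is a simplified Python sketch of the idea: each collocated marker set contributes the fraction of its genes that were found, and those fractions are averaged. This is only an illustration as I understand the approach, not CheckM's actual code, and the marker names and the function `completeness` are hypothetical.

```python
def completeness(marker_sets, found_markers):
    """Estimate completeness (in percent) from collocated marker sets.

    marker_sets: list of sets, each holding the IDs of marker genes that are
    expected to occur together (hypothetical IDs in this example).
    found_markers: set of marker IDs detected in the genome.
    Each marker set contributes the fraction of its genes that were found,
    and the fractions are averaged over all sets.
    """
    if not marker_sets:
        raise ValueError("at least one marker set is required")
    total = sum(len(ms & found_markers) / len(ms) for ms in marker_sets)
    return 100.0 * total / len(marker_sets)


# Hypothetical lineage with three marker sets and a genome containing two markers
sets_for_lineage = [{"A", "B"}, {"C"}, {"D", "E", "F", "G"}]
found = {"A", "D"}

estimate = completeness(sets_for_lineage, found)
print(f"{estimate:.1f}% complete")
```

Here the three sets contribute 1/2, 0/1 and 1/4, so the average is 25%. A SAG at 10-20% by this kind of measure is missing most of the expected single-copy genes, which is what the numbers above reflect.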
What DNA did you sequence? Mammalian? Bacterial?
Bacterial genomes (that is why I chose to keep only metagenome contigs that align at over 95% identity to my SAG contigs, in order to stay within the same lineage).
That is about what people get from drop-seq, which is a technique for single cell expression data. Are you working with plain DNA sequence?