Dear community!
I recently got a dataset from an in-house immortalized Drosophila cell line, sequenced with ONT Nanopore. Because the DNA concentration obtained from the line was low, the DNA was amplified with phi29 before sequencing.
However, when analyzing the dataset, I see patterns that I suspect are technical artifacts. The majority of reads consist of short repeats duplicated in tandem. These repeats do not seem to be pure artifact, though, since a BLAST search aligns them to Drosophila melanogaster. A minority of reads seem to represent "somewhat true sequencing", where one portion of the read aligns to a gene but another portion shows a repeatedly occurring tandem repeat of 100-300 bp. The repeats are not the same across all reads: there are several hundred different variants of varying length.
As examples, I've attached some pictures of two different reads below:
Fig 1
Read1: BLAST, tandem repeats aligning to Drosophila melanogaster
Fig 2
Read1: Self-BLAST, the repetitive region is the same across the whole read
Fig 3
Read2: BLAST, example of a read that aligns to a gene but also contains these repeats
Fig 4
Read2: Self-BLAST
My background is in computing and informatics, so I'm weak on the chemical/biological side of things. Could what I observe be due to the amplification, e.g. short DNA fragments being amplified into one long fragment made of tandem copies of itself? Has anyone observed anything like this before? Does anyone have any input as to what may be occurring here?
Thanks in advance! :)