We have generated sequencing data with the direct-cDNA sequencing kit from ONT. During the library prep, before adding the adapters to each cDNA, we performed a ligation step to concatenate cDNAs together.
The aim was to see if we could increase the total number of cDNAs sequenced without increasing the number of molecules sequenced (i.e. an increase in the total number of bases sequenced and in mean read length).
However, I'm having trouble identifying reads that come from such concatenated molecules, and I'm not sure what the most efficient way to find them would be.
I tried mapping my reads to the genome and then looking at the alignments. Correct me if I'm wrong, but I assumed I would see an increase in the number of chimeric reads compared to primary reads? However, I see no such thing and, even worse, I came to realize that I always have 13-15% chimeric reads in datasets generated by direct-cDNA sequencing.
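To make clear what I mean by "chimeric", here is a minimal sketch of how such a fraction could be counted from SAM output. It assumes that a read is called chimeric when it has a supplementary alignment (flag 0x800) or carries an `SA:Z:` tag, which is how the SAM specification represents split alignments. The function name `chimeric_fraction` is only illustrative, and the counting behind the 13-15% figure above may differ.

```python
SECONDARY = 0x100      # secondary alignment (alternative location, not a split)
SUPPLEMENTARY = 0x800  # supplementary alignment (part of a chimeric/split read)
UNMAPPED = 0x4


def chimeric_fraction(sam_lines):
    """Return (mapped_reads, chimeric_reads, fraction) from SAM text lines.

    Assumption: a read is 'chimeric' if any of its records is supplementary
    or carries an SA:Z: tag. Secondary and unmapped records are ignored.
    """
    mapped = set()
    chimeric = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED or flag & SECONDARY:
            continue
        mapped.add(name)
        has_sa = any(f.startswith("SA:Z:") for f in fields[11:])
        if flag & SUPPLEMENTARY or has_sa:
            chimeric.add(name)
    n = len(mapped)
    return n, len(chimeric), (len(chimeric) / n if n else 0.0)
```

It could be called on an open SAM file, e.g. `chimeric_fraction(open("aln.sam"))`, where `aln.sam` is a hypothetical file name. Counting per read name rather than per record avoids inflating the number when one read has several supplementary segments.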
Could it be that some of those chimeric reads are not relevant? If so, how would one filter them out while still retaining reads that actually come from 'real' cDNAs? Nanopore reads being quite noisy, I believe this makes the analysis even more complicated.
Otherwise, would there be a way to split my reads before alignment, or to identify reads that result from concatenated cDNAs?
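One idea for the pre-alignment route, sketched below, is to look for poly(A) or poly(T) stretches located well inside a read, since a junction between two cDNAs might sit next to one. This rests on assumptions I have not verified: that each concatenated cDNA keeps its poly(A) tail, and that the tail survives basecalling as an uninterrupted run. The function name `internal_polya_sites` and the thresholds `min_len` and `edge` are arbitrary illustrative choices.

```python
import re


def internal_polya_sites(seq, min_len=15, edge=50):
    """Return (start, end, base) for A or T runs lying inside the read.

    Assumptions: a run of at least `min_len` identical A or T bases marks a
    candidate poly(A) tail, and runs within `edge` bases of either read end
    are treated as the normal terminal tail, not as a junction.
    """
    seq = seq.upper()
    pattern = re.compile(rf"A{{{min_len},}}|T{{{min_len},}}")
    sites = []
    for match in pattern.finditer(seq):
        start, end = match.span()
        if start >= edge and end <= len(seq) - edge:
            sites.append((start, end, seq[start]))
    return sites
```

Reads with at least one internal site would be candidates for concatenation. Because nanopore homopolymers are often interrupted by basecalling errors, exact runs will miss some tails, so a tolerant match or a lower `min_len` might be needed. Genomic A-rich regions can also produce false positives.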
Thank you for your help.