I got Illumina mate pair data. The insert size distribution is as follows:
- What do you think about this distribution in general?
- This library was done without targeting for a certain insert size length. What is the variation of the insert size if you enrich for a certain size? Do have any source or an example?
- I would like to use this data together with Illumina PE. For example using spades. We want to assemble Plasmids from 90kb to 150kb. Do you think this library is suitable? Would you target a specific size? What techniques do you use?
Perhaps the large inserts are actually not that large, but appear so due to linear representation of molecules that have circular topology? For example, Read 1 can be proximate to the 5'-end of a molecule (fasta file) and Read 2 to the 3'-end of a molecule. Then it appears that your insert size spans the whole molecule, when IRL the reads are actually proximate to each other when the molecule is presented in circular form. You could test this easily by extracting the large insert mates, and then mapping them to a fasta file where you have moved a few 10k bp from the 5'-end of the sequence to the 3'-end of the sequence..