Question

Is this mate pair insert size distribution too broad?

0

8.6 years ago

mschmid ▴ 180

I got Illumina mate pair data. The insert size distribution is as follows:

http://imgur.com/bhKrEpo

What do you think about this distribution in general?
This library was prepared without targeting a specific insert size. How much does the insert size vary if you enrich for a certain size? Do you have a source or an example?
I would like to use this data together with Illumina PE reads, for example with SPAdes. We want to assemble plasmids of 90 kb to 150 kb. Do you think this library is suitable? Would you target a specific size? What techniques do you use?

spades illumina mate-pair • 3.8k views

ADD COMMENT • link updated 19 months ago by Ram 43k • written 8.6 years ago by mschmid ▴ 180

0

Perhaps the large inserts are not actually that large, but only appear so because molecules with circular topology are represented linearly? For example, Read 1 can map near the 5'-end of a molecule (in the fasta file) and Read 2 near the 3'-end. The insert then appears to span the whole molecule, when in reality the reads are close to each other once the molecule is viewed as a circle. You could test this easily by extracting the large-insert mates and mapping them to a fasta file in which you have moved a few tens of kbp from the 5'-end of the sequence to the 3'-end.
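A minimal sketch of that test in Python, assuming the plasmid reference is a single sequence held as a plain string. `rotate_circular` and `circular_insert_size` are hypothetical helper names, not part of any tool mentioned in this thread. The first function produces the shifted reference to remap against; the second shows why a pair that straddles the linear origin looks huge even though the mates are close on the circle.

```python
def rotate_circular(seq: str, shift: int) -> str:
    """Move the first `shift` bases of `seq` to its end (new linear origin)."""
    if not seq:
        return seq
    shift %= len(seq)
    return seq[shift:] + seq[:shift]


def circular_insert_size(pos1: int, pos2: int, length: int) -> int:
    """Shorter of the two ways around a circle of `length` bp between two positions."""
    linear = abs(pos2 - pos1)
    return min(linear, length - linear)


# Toy example: mates mapped 500 bp and 99,500 bp into a 100 kb circular plasmid.
print(circular_insert_size(500, 99_500, 100_000))  # 1000, not 99000
print(rotate_circular("ACGTACGTTT", 3))            # TACGTTTACG
```

If the very large inserts collapse to small values after remapping against the rotated reference, they were most likely an artifact of where the circle was cut.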

ADD REPLY • link 8.6 years ago by 5heikki 11k

score 0 · Answer 1 · 2015-10-06

0

8.6 years ago

Carlo Yague 8.7k

EDIT : I wrote this answer for a paired-end library. OP's question concerned mate-pair. My bad.

1- The inserts are MUCH too long. Are you sure the mates are paired correctly? I had a similar distribution once, but that was because I was pairing my reads incorrectly.

2- I get this kind of variation with Illumina paired-end RNA-seq:
[Image: Bioanalyzer profile]

It's best if you can compare experimental results (such as this bioanalyzer profile) with the computation of insert size from your reads.

3- I don't know, I'll let others answer this one :)
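For the comparison suggested in point 2, here is a minimal sketch of computing insert sizes from mapped reads, assuming the alignments are available as SAM text. It reads the TLEN column (column 9) and the FLAG bits defined in the SAM specification; `insert_sizes` is a hypothetical helper name, and how TLEN is reported can differ between aligners, so treat the numbers as approximate.

```python
import statistics
from typing import Iterable, List


def insert_sizes(sam_lines: Iterable[str]) -> List[int]:
    """Collect one positive TLEN per read pair from SAM-formatted lines."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        if not flag & 0x1:                # read is not paired
            continue
        # Skip unmapped read, unmapped mate, secondary and supplementary records.
        if flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        tlen = int(fields[8])
        # Keep pairs on the same reference; TLEN > 0 counts each pair once.
        if fields[6] == "=" and tlen > 0:
            sizes.append(tlen)
    return sizes


example = [
    "@SQ\tSN:plasmid\tLN:100000",
    "r1\t99\tplasmid\t100\t60\t50M\t=\t3000\t2950\tACGT\tIIII",
    "r1\t147\tplasmid\t3000\t60\t50M\t=\t100\t-2950\tACGT\tIIII",
    "r2\t73\tplasmid\t100\t60\t50M\t=\t100\t0\tACGT\tIIII",
]
sizes = insert_sizes(example)
print(sizes)                     # [2950]
print(statistics.median(sizes))  # 2950
```

On real data you would pass an open SAM file instead of the toy list and look at the median and spread of the returned values.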

ADD COMMENT • link 8.6 years ago by Carlo Yague 8.7k

0

What do you mean by much too long? You also have a peak at 10 kbp. Your peak is just narrower (were you targeting this length?). And the broader peak is the PE fraction of your mate pair library, I guess?

ADD REPLY • link 8.6 years ago by mschmid ▴ 180

0

The peaks at 35 and 10380 bp are the marker peaks, irrelevant here. The broader peak represents the sizes of my cDNA library prior to sequencing (adaptors + insert). Since adaptors are ~120 bp, my inserts are mostly between 80 and 900 bp, which is reasonable in my case (paired-end RNA-seq). But perhaps you have a whole different kind of library.

ADD REPLY • link 8.6 years ago by Carlo Yague 8.7k

2

Mate pair libraries are meant to have much longer inserts than PE libraries.

ADD REPLY • link 8.6 years ago by 5heikki 11k

0

5heikki, thanks, I understand that. My question was more whether the distribution is what you would expect, and whether you would try to narrow it for de novo assembly.

ADD REPLY • link 8.6 years ago by mschmid ▴ 180

0

Oh, sorry, I misinterpreted. But you seem to have paired-end data. I have Illumina Mate Pair (http://www.illumina.com/documents/products/technotes/technote_nextera_matepair_data_processing.pdf). Right?

ADD REPLY • link 8.6 years ago by mschmid ▴ 180

0

Whoops, my bad !

ADD REPLY • link 8.6 years ago by Carlo Yague 8.7k

0

While I agree with what you say, Carlo, in my experience Bioanalyser plots of the library often look different from the insert size distribution of the sequenced reads, for all sorts of reasons.

ADD REPLY • link 8.6 years ago by John 13k