How to treat the unpaired data after trimmomatic
8.0 years ago
liu.huand • 0

Hi, I'm totally new here and totally new to bioinformatics (I think I technically started learning this last week).

The story goes like this: I got my ATAC-seq fastq data last Monday and started turning these fastq files into peaks. I learnt how to trim, align, and visualize my data, and did a little QC afterward.

I chose Trimmomatic on my university's Galaxy (ILLUMINACLIP with the Nextera paired-end adapters) to trim my fastq files, and got 4 files: 2 paired and 2 unpaired. I aligned only the paired fastq files with Bowtie2 -X 2000 and got a mapping rate of 90%. I converted the BAM files (the default Bowtie2 output in our Galaxy) into bedGraph and then TDF for visualization in IGV. I plotted the distribution of reads around the TSSs of my annotated genome and saw strong enrichment near the TSS.

OK, then a weird thing happened. I plotted the insert size distribution using Picard tools and got this plot:

[Figure: insert size distribution from Picard; image not shown]

It appeared that I had lost all the inserts smaller than 120 bp, which are actually the nucleosome-free regions that I need most.

Then I guessed that some of my data must not have been mapped, so I went back to my fastq files and found that the unpaired files generated by Trimmomatic are huge. For example, each of the paired files is 8 GB, while one unpaired R1 file is 4.5 GB and the other is 10 MB.

I wonder whether my nucleosome-free regions simply lie in these unpaired data, and how I can combine the unpaired data with the paired data generated by Trimmomatic?

Thanks,
Huan

alignment sequence • 6.8k views

Map the unpaired reads as single-end data. You should also look at which sequences get trimmed off: low-quality bases, 3'-end adapter, or something else? To investigate the issue, an alternative is to map the untrimmed reads with a local mapper such as bowtie2 --local or bwa-mem. These mappers won't try to align the adapter sequences. Sometimes this is easier when you are not sure what Trimmomatic is doing to your data.
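To make the suggestion concrete, here is a minimal sketch that only builds the two bowtie2 command lines (paired and single-end) as argument lists. The index name and file names are hypothetical placeholders, not taken from the original post. The flags used (`-x`, `-1`, `-2`, `-U`, `-X`, `-S`, `--local`) are standard bowtie2 options, but check them against the version installed on your Galaxy or cluster.

```python
import shlex


def bowtie2_paired_cmd(index, r1, r2, out_sam, max_fragment=2000, local=False):
    """Build a bowtie2 command for reads that are still properly paired."""
    cmd = ["bowtie2"]
    if local:
        cmd.append("--local")  # local mode, as suggested for untrimmed reads
    cmd += ["-X", str(max_fragment), "-x", index, "-1", r1, "-2", r2, "-S", out_sam]
    return cmd


def bowtie2_single_cmd(index, unpaired, out_sam, local=False):
    """Build a bowtie2 command that maps orphaned reads as single-end data."""
    cmd = ["bowtie2"]
    if local:
        cmd.append("--local")
    cmd += ["-x", index, "-U", unpaired, "-S", out_sam]
    return cmd


if __name__ == "__main__":
    # Hypothetical index and file names, for illustration only.
    commands = (
        bowtie2_paired_cmd("genome_index", "R1_paired.fastq.gz",
                           "R2_paired.fastq.gz", "paired.sam"),
        bowtie2_single_cmd("genome_index", "R1_unpaired.fastq.gz",
                           "unpaired.sam"),
    )
    for cmd in commands:
        print(shlex.join(cmd))  # shlex.join needs Python 3.8+
```

The printed lines can be pasted into a shell, or the lists can be passed to `subprocess.run`. Keeping the two runs separate makes it easy to compare how many short fragments each input contributes.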


Sorry, I'm not sure how to upload the figure. You can find it at http://postimg.org/image/m1fmud4u9/


Well, congratulations on your first week. It looks like you have already learned a lot. Good luck!

8.0 years ago

The insert size refers to the size of the DNA fragment that is sequenced. Instruments can only sequence fragments over a certain size.


Hi Istvan, thanks for the reply. So you mean it's possible that my instrument only sequenced inserts larger than 120 bp and discarded fragments shorter than that? I was using a HiSeq 2500 with 125 bp PE reads, and I will find out the minimum size they sequenced. Thanks!


I can't help but suspect that you are misusing the insert size concept and interpreting it as something else that you are interested in measuring. I would suggest posting a new question in which you disentangle your question from Trimmomatic, adapter clipping, etc. These only confuse the issue and are not related to what you need.


Hi Istvan, I solved this. I did misunderstand fragment size and insert size for paired-end data at the very beginning, but the figure I posted truly is the insert size distribution. I spent a whole day trying different Trimmomatic parameters to trim my fastq files, followed by Bowtie2 mapping. I found that if I set Trimmomatic's keepBothReads option to true, so that the reverse read is not dropped, I get tons of inserts smaller than 100 bp, which is what I need for my experiment.
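For anyone who lands here with the same problem, the toy simulation below sketches why this setting matters. When a fragment is shorter than the read length, both reads run through into the adapter. Once the adapter is clipped, the reverse read is just the reverse complement of the forward read. The Trimmomatic manual says that in palindrome mode the reverse read is dropped by default for this reason, so the forward read lands in the unpaired output unless keepBothReads is true. The fragment sequence, the read length, and the numeric ILLUMINACLIP thresholds are illustrative assumptions, not the values used in this thread. The adapter string is only a Nextera-style stand-in.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def simulate_pair(fragment, adapter, read_len):
    """Simulate R1/R2 for one fragment; short fragments read into the adapter."""
    r1 = (fragment + adapter)[:read_len]
    r2 = (reverse_complement(fragment) + adapter)[:read_len]
    return r1, r2


def illuminaclip_step(adapter_fasta, seed_mismatches=2, palindrome_threshold=30,
                      simple_threshold=10, min_adapter_length=8,
                      keep_both_reads=True):
    """Format an ILLUMINACLIP step string; the numeric values are assumptions."""
    return "ILLUMINACLIP:{}:{}:{}:{}:{}:{}".format(
        adapter_fasta, seed_mismatches, palindrome_threshold,
        simple_threshold, min_adapter_length,
        "true" if keep_both_reads else "false")


# Toy 40 bp fragment sequenced with 50 bp reads (made-up values).
FRAGMENT = "ACGTTGCAAGGCTTAACCGGATATCGCGTACGATCGTAGC"
ADAPTER = "CTGTCTCTTATACACATCT"  # Nextera-style stand-in, illustration only
READ_LEN = 50

r1, r2 = simulate_pair(FRAGMENT, ADAPTER, READ_LEN)

# Idealized adapter clipping: keep only the bases that come from the fragment.
r1_clipped = r1[:len(FRAGMENT)]
r2_clipped = r2[:len(FRAGMENT)]

# After clipping, R2 carries no information beyond R1.
print(r1_clipped == reverse_complement(r2_clipped))
print(illuminaclip_step("NexteraPE-PE.fa"))
```

With keepBothReads set to true the short-fragment pairs stay in the paired output, so a paired-end aligner still sees them and they show up in the insert size histogram.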


The smallest fragment is a primer dimer (no insert). We know those cluster (and get sequenced) well.

8.0 years ago
Mike ★ 1.9k

Hi, insert size and fragment size are always confusing. It would be a great help if anybody could explain these terms. Thanks,


This is a good discussion about it: Fragment Size: TLEN vs. isize

I think the take-home message is "they're the same unless they definitely aren't" :)
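As a small worked example of where the two numbers can differ, here is a sketch under one common convention. It is an assumption, since usage varies between tools and papers. "Fragment length" is the span from the leftmost to the rightmost aligned base of the pair. "Inner distance" is what is left between the two reads. The function names and coordinates are made up for illustration.

```python
def fragment_length(leftmost_start, rightmost_end):
    """Span of the whole sequenced fragment, using 1-based inclusive coordinates."""
    return rightmost_end - leftmost_start + 1


def inner_distance(fragment_len, read1_len, read2_len):
    """Unsequenced gap between the mates; negative when the reads overlap."""
    return fragment_len - read1_len - read2_len


# Hypothetical pair: leftmost base at 1001, rightmost base at 1200, 2 x 125 bp reads.
frag = fragment_length(1001, 1200)
gap = inner_distance(frag, 125, 125)
print(frag, gap)
```

A negative inner distance simply means the two reads overlap, which is expected for the short nucleosome-free fragments discussed in this thread.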
