Paired-end RNA Seq data: How to deal with unpaired data after trimmomatic
7.7 years ago

Hi everybody,

What is the best practice for dealing with the unpaired reads produced when trimming paired-end RNA-Seq data, i.e. when only one mate of a pair makes it through the trimming?

I have seen people recommend using only the remaining paired data (and ignoring the often small unpaired files), but I am afraid of losing crucial data. I could easily process the paired set and the two unpaired sets per sample separately.

My analysis pipeline is

fastqc - trimmomatic - fastqc - STAR - featureCounts - voom/limma

If trying to use all data, at what point would you recommend to put everything together (and how)?

Many thanks!

RNA-Seq rna-seq trimming • 5.0k views

Hi guys,

Thanks for the quick replies. The unpaired reverse reads are next to nothing (0.2% or something); the unpaired forward reads are usually more like 2-5%. Does this sound normal to you?
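For anyone wanting to put numbers like these side by side across samples, here is a minimal Python sketch. It assumes you already have the four counts per sample (input pairs, both mates surviving, forward only, reverse only), for example from the trimming log or by counting FASTQ records. The function names are made up for illustration and are not part of any tool.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly 4 lines per record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def survival_summary(n_input_pairs, n_both, n_fwd_only, n_rev_only):
    """Return each outcome as a percentage of the input read pairs."""
    if n_input_pairs <= 0:
        raise ValueError("n_input_pairs must be positive")
    # Pairs where neither mate survived are whatever is left over.
    n_dropped = n_input_pairs - n_both - n_fwd_only - n_rev_only
    counts = {
        "both": n_both,
        "forward_only": n_fwd_only,
        "reverse_only": n_rev_only,
        "dropped": n_dropped,
    }
    return {name: 100.0 * n / n_input_pairs for name, n in counts.items()}
```

The input-pair count could come from `count_fastq_reads` on the raw forward file, and the other three from the paired and unpaired output files.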


There is no "normal". Ideally you should not have any. But this is biology and you live with what you have :-)


If you use BBDuk for trimming paired reads, you will not end up with any singletons, which can make the processing easier. Reads will either be retained as pairs or discarded as pairs. In situations where one read is trimmed down to nothing, the pair is discarded if a minimum length restriction is used. If no limitation is set, the read will be trimmed down to a minimum length of 1bp, so it will still be present and the fastq file will be valid and correctly paired, but it will typically be ignored downstream and only its mate will be used (since 1bp reads don't map).
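The "retained as pairs or discarded as pairs" idea can be illustrated with a toy filter in Python. This is only a sketch of the concept, not BBDuk's code or interface; `filter_pairs_by_length` is a made-up name and it works on already-trimmed sequences held in memory.

```python
def filter_pairs_by_length(pairs, min_len):
    """Keep a read pair only if BOTH mates are at least min_len long.

    pairs: iterable of (forward_seq, reverse_seq) tuples, already trimmed.
    A pair is dropped as a whole when either mate is too short, so no
    singletons are ever produced.
    """
    return [
        (fwd, rev)
        for fwd, rev in pairs
        if len(fwd) >= min_len and len(rev) >= min_len
    ]
```

With `min_len=1`, a mate trimmed down to a single base still counts as present, so the pair stays and both output files remain in sync.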

7.7 years ago
kissaj ▴ 110

Chuck it, it is broken. It shouldn't be very much (%-wise). If it is, you have a problem.

7.7 years ago
Tao ▴ 530

If you want to keep them, you could put the unpaired reads into a separate singleton file; TopHat accepts singleton input alongside paired-end input. Remember that you should always keep paired reads in the same order in the two paired files after QC, because most aligners, including TopHat, recognize read pairs by their order in the files, not by read ID.
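A quick way to confirm that two paired files are still in sync is to walk through them together and compare read IDs. The Python sketch below assumes plain 4-line FASTQ records and the common header styles where mates share a name and differ only in a `/1` or `/2` suffix or in the comment after the first space. The helper names are hypothetical.

```python
import gzip
from itertools import zip_longest


def normalise_id(header):
    """Reduce a FASTQ header line to the read name shared by both mates."""
    fields = header.strip().split()
    name = fields[0] if fields else ""
    if name.startswith("@"):
        name = name[1:]
    # Older headers mark the mate with a trailing /1 or /2.
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def read_ids(path):
    """Yield the normalised read name of every record in a FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:  # header line of each 4-line record
                yield normalise_id(line)


def first_mismatch(ids_fwd, ids_rev):
    """Return the 0-based index of the first out-of-sync pair, or None."""
    for i, (fwd, rev) in enumerate(zip_longest(ids_fwd, ids_rev)):
        if fwd != rev:  # also catches one file ending before the other
            return i
    return None
```

Something like `first_mismatch(read_ids("sample_1P.fq.gz"), read_ids("sample_2P.fq.gz"))` (file names are placeholders) should return `None` for a correctly paired set.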

7.7 years ago
igor 13k

STAR already performs soft-clipping, so you shouldn't need to trim the reads.


I decided to use Trimmomatic primarily because of adapter contamination in the raw data after demultiplexing.

For what it is worth, I decided to go all the way and use the program to trim bad bases, too, basically using the options from the manual:

ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

Does this seem appropriate to you, or would you rather suggest limiting this to adapter removal and letting STAR soft-clip the rest?
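For context, here is roughly how those steps slot into a full paired-end Trimmomatic call, written as a small Python helper that only builds the argument list. The jar path, thread count, and output file naming are assumptions for illustration. The `quality_trim` switch is a made-up convenience for comparing the full settings with an adapter-only run.

```python
def trimmomatic_pe_command(fwd_in, rev_in, out_prefix,
                           adapters="TruSeq3-PE-2.fa",
                           quality_trim=True,
                           threads=4,
                           jar="trimmomatic.jar"):
    """Build (but do not run) a Trimmomatic PE command as an argument list."""
    steps = [f"ILLUMINACLIP:{adapters}:2:30:10"]
    if quality_trim:
        # Quality-based steps as quoted in the post above.
        steps += ["LEADING:3", "TRAILING:3", "SLIDINGWINDOW:4:15"]
    steps.append("MINLEN:36")
    # PE mode writes four outputs: paired and unpaired for each mate.
    outputs = [
        f"{out_prefix}_1P.fq.gz",  # forward, mate survived
        f"{out_prefix}_1U.fq.gz",  # forward, mate dropped
        f"{out_prefix}_2P.fq.gz",  # reverse, mate survived
        f"{out_prefix}_2U.fq.gz",  # reverse, mate dropped
    ]
    return ["java", "-jar", jar, "PE", "-threads", str(threads),
            fwd_in, rev_in, *outputs, *steps]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Setting `quality_trim=False` keeps only the adapter clipping and the minimum-length filter.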
