Question

Tophat - poor mapping in control vs treament

0

Entering edit mode

8.7 years ago

pveraw • 0

Hi everyone!

I'm analysing results from a RNA-seq experiment. My organism is an annotated yeast that has some effect on vitis. I have around 15 millions of reads per sample obtained with illumina pair-end, 90 bp length.

I have 2 strains, c13 and c24 which were cultivated in a control condition and one treatment, the experiment was performed with three replicates.

In summary I have:

C13_control (R1,R2,R3), C13_treat (R1,R2,R3)

C24_control (R1,R2,R3), C24_treat (R1,R2,R3)

I want to check the effect of my treatment over the yeast with Tuxedo protocol. At this point I have done Tophat alignment with my 12 samples. with my fasta genome indexed as DA and the following pipeline:

tophat -G annotation.gff3 -o out_folder DA c*_control_R*.fastq c*_treatment_R*.fastq

Checking my results I get something curious. Reads from my c13_control sample aligned only in a 4%, while the same strain, but on treatment map a 60% of the reads. In the other hand, read from my c24 strain map in both cases a 60%.

If I ignore this result and I continue the protocol, when I observe my differential expression in c13 the fold change is huge (even 30 times) and almost all the genes are significant, because obviously is comparing high expression in the treatment over my poor mapping in control, so I have the feeling that my control is not being a control. c24 instead, show a fewer number of genes with fold changes lower that 8 times.

I'm trying to find an explanation for this behaviour or find another approach to raise this problem, so I would appreciate any comment!

Thanks guys!

rna-seq • 2.6k views

ADD COMMENT • link updated 18 months ago by Ram 43k • written 8.7 years ago by pveraw • 0

0

Entering edit mode

I've updated your summary slightly, since I think you had the C13 samples listed twice originally.

Have you trimmed adapaters?

ADD REPLY • link 8.7 years ago by Devon Ryan 104k

0

Entering edit mode

Can you do wildcarding like that with tophat? What do your `align_summary.txt` `tophat.log` and `run.log` files look like?

ADD REPLY • link 8.7 years ago by mbio.kyle ▴ 380

0

Entering edit mode

Thank you for replying:

Yes Devon, I trimmed adapters and QC.

Sorry if I confused you mbio.kyle, I just used * to show that I did the same pipeline for every sample. For example for my control/treatment samples c13 replicate1 I did:

tophat -G annotation.gff3 -o out_folder DA c13_control_R1.1.fastq c13_control_R1.2.fastq
tophat -G annotation.gff3 -o out_folder DA c13_treatment_R1.1.fastq c13_treatment_R1.2.fastq

So far my tophat and run log files does not display any errors or warnings, my align summary are for example:

C13_control_R1:

Left reads:
          Input     :  13349110
           Mapped   :    468836 ( 3.5% of input)
            of these:       664 ( 0.1%) have multiple alignments (2 have >20)
Right reads:
          Input     :  13349110
           Mapped   :    468082 ( 3.5% of input)
            of these:       587 ( 0.1%) have multiple alignments (2 have >20)
 3.5% overall read mapping rate.

Aligned pairs:    165921
     of these:       193 ( 0.1%) have multiple alignments
                     141 ( 0.1%) are discordant alignments
 1.2% concordant pair alignment rate.

C13_treatment_R1:

Left reads:
          Input     :  14380800
           Mapped   :   8700067 (60.5% of input)
            of these:     15961 ( 0.2%) have multiple alignments (318 have >20)
Right reads:
          Input     :  14380800
           Mapped   :   8684246 (60.4% of input)
            of these:     15953 ( 0.2%) have multiple alignments (318 have >20)
60.4% overall read mapping rate.

Aligned pairs:   6724170
     of these:     12113 ( 0.2%) have multiple alignments
                   12362 ( 0.2%) are discordant alignments
46.7% concordant pair alignment rate.

Thanks!

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.7 years ago by pveraw • 0

2

Entering edit mode

You should blast a few of the unmapped reads and see if you received the wrong samples.

ADD REPLY • link 8.7 years ago by Devon Ryan 104k

0

Entering edit mode

Thanks Devon, I patiently did a BLAST against nr database with the 12 million reads that didn't map and I only got ~95 reads mappig a bacteria, which seems to be contation, but I still guessing what is the other huge percentage lost.

ADD REPLY • link 8.6 years ago by pveraw • 0

2

Entering edit mode

Maybe throw them at https://onecodex.com/ and see if that reveals any more contamination

ADD REPLY • link 8.6 years ago by mbio.kyle ▴ 380

0

Entering edit mode

:D, you just save me!

ADD REPLY • link 8.6 years ago by pveraw • 0

0

Entering edit mode

Have you run FastQC on your raw data? Do the read qualities look OK?

ADD REPLY • link 8.6 years ago by Chris Cole ▴ 800

0

Entering edit mode

Hey! Yes, I did, actually the person that did this sequencing removed adapters and the quality is very good. But thank you!

ADD REPLY • link 8.6 years ago by pveraw • 0