Question

FastQC and adapters

0

6.9 years ago

ognjen011 ▴ 250

I have several questions about adapters that I haven't been able to answer, some of which came up while doing QC. In the plot of base composition per read position, the first n bases are non-uniform, and the composition then approaches a uniform distribution toward the other end of the read. Why does this happen at all? Why does it happen only at one end?

My first assumption was those are adapters, but they should be at the end of the read, right? And also, there is very little soft-clipping by the aligner (over 99.6% of reads are all matches/mismatches), so there can't be any adapters there, right?
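A check like the one described above can be sketched in a few lines. This is only a minimal illustration, assuming the CIGAR strings have already been extracted from the alignment file; the function names `soft_clipped_bases` and `fraction_fully_matched` and the toy CIGAR list are made up for the example and do not come from any aligner.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_bases(cigar):
    """Number of soft-clipped (S) bases in one CIGAR string."""
    return sum(int(length) for length, op in CIGAR_OP.findall(cigar) if op == "S")

def fraction_fully_matched(cigars):
    """Fraction of aligned reads whose CIGAR contains only M, = or X operations."""
    aligned = [c for c in cigars if c != "*"]  # "*" means no alignment was reported
    if not aligned:
        return 0.0
    full = sum(
        1 for c in aligned
        if all(op in "M=X" for _, op in CIGAR_OP.findall(c))
    )
    return full / len(aligned)

# Hypothetical CIGAR strings, for illustration only.
toy_cigars = ["100M", "5S95M", "100M", "50M2I48M", "*"]
print(fraction_fully_matched(toy_cigars))
print(soft_clipped_bases("3S90M7S"))
```

Note that a low soft-clipping rate only argues against adapter sequence if the aligner was run in a mode that clips unalignable read ends (local alignment) rather than forcing end-to-end alignment.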

I also thought it might be sequence bias in where the DNA breaks during fragmentation, but I've found no support for this, and the effect seems too pronounced to be explained by it.

Thanks in advance!

sequencing sequence quality adapters • 2.9k views

updated 6.9 years ago by mastal511 ★ 2.1k • written 6.9 years ago by ognjen011 ▴ 250

1

Regarding the non-uniform base distribution reported by FastQC, it has been stated in many threads here that this is normal and that its classification as a failure is misleading. I think it has confused many beginners, including myself. The reason is that FastQC's evaluation is not based on empirical data, yet it is presented through a suggestive "traffic light system" (fail, warning, pass), which in my opinion is not a good thing to do. I wish there were a new QC tool that incorporated all the experience we have gained since then.

6.9 years ago by Michael 54k

0

Thank you. Sorry for missing those and asking again, googling and searching here didn't help me initially!

6.9 years ago by ognjen011 ▴ 250

1

I think the question is whether the library preparation is based on random priming. If so, that could explain it. If your reads align at such high rates, I would say forget about the bias; everything is fine.

6.9 years ago by Michael 54k

0

Thank you. Unfortunately, I am having trouble confirming what random priming really is. Is this an example of the method, introduction of random primers for PCR? http://www.biotechniques.com/multimedia/archive/00009/03354st06_9908a.pdf

6.9 years ago by ognjen011 ▴ 250

1

In my (incomplete) understanding of lab processes, random priming is PCR using a mix of all possible primer sequences. Imagine you generate all 4^6 hexamers as oligo primers and use them in a single PCR reaction: any DNA sequence would be amplified.
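To make the 4^6 figure concrete, here is a small sketch that enumerates every possible hexamer. The helper name `all_kmers` is made up for this illustration.

```python
from itertools import product

def all_kmers(k, alphabet="ACGT"):
    """Return every possible k-mer over the alphabet, in lexicographic order."""
    return ["".join(bases) for bases in product(alphabet, repeat=k)]

hexamers = all_kmers(6)
print(len(hexamers))   # 4096, i.e. 4**6
print(hexamers[:3])    # ['AAAAAA', 'AAAAAC', 'AAAAAG']
```

In an idealized mix all 4096 primers would anneal equally well, which is what "random" would mean here.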

6.9 years ago by Michael 54k

score 0 · Answer 1 · 2017-05-14

0

6.9 years ago

mastal511 ★ 2.1k

If you are referring to the FastQC per base sequence content plots, it depends on what type of data you have. For RNA-Seq data, the pattern you see in the first n bases is thought to be due to the random priming not being truly random. I also think that for Nextera data the transposase may favor certain sequences over others.
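For anyone who wants to look at the numbers behind such a plot, the following is a rough sketch of how per-position base fractions could be computed. It is not FastQC's actual implementation; the function names, the plain four-lines-per-record FASTQ assumption, and the toy reads are all assumptions made for illustration.

```python
from collections import Counter

def fastq_sequences(path):
    """Yield the sequence line of each record in an uncompressed FASTQ file.

    Assumes the simple layout of exactly four lines per record.
    """
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:
                yield line.strip()

def per_base_content(reads, n_positions):
    """Fraction of A, C, G and T at each of the first n_positions read positions.

    Other symbols such as N count toward the total, so the four fractions
    may sum to less than 1 at positions that contain them.
    """
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        for i, base in enumerate(read[:n_positions]):
            counts[i][base] += 1
    content = []
    for position_counts in counts:
        total = sum(position_counts.values())
        content.append(
            {b: (position_counts[b] / total if total else 0.0) for b in "ACGT"}
        )
    return content

# Toy reads, for illustration only; real input would come from fastq_sequences(...).
toy_reads = ["ACGT", "ACGA", "AGGT", "TCGT"]
content = per_base_content(toy_reads, 4)
for position, fractions in enumerate(content, start=1):
    print(position, fractions)
```

With priming or transposase bias, the fractions at the first few positions would differ from those further into the read, where they settle toward the overall base composition of the library.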

6.9 years ago by mastal511 ★ 2.1k

0

Thanks for the reply. This is actually DNA-Seq data. Could it be explained in a similar manner?

6.9 years ago by ognjen011 ▴ 250