Question

controls for amplicon data - Which one to use and how?

9.0 years ago

Aurelie MLB ▴ 360

Hello,

I have some DNA-seq data for amplicons. I observe some very odd variants, at a proportion of roughly 30%, that I cannot explain. They lie in regions of quite poor sequencing quality, so in theory I would not trust them. But those exact same variants seem to occur in all clones derived from the same parental clone... I still think this must be a bias, but I am worried.

I was advised to use a control sample. The names that came up were "NA12878" and "Promega reference", but I did not have much luck googling them. I did find a PhiX control that seems to be what I need, but I am not 100% sure how it would work. I have read that PhiX can be added to increase diversity and help the sequencer, but I am not sure why that is useful either.

Could someone help me get some clarity on this, please?

Which control samples are best (or available) for amplicon data?
Should I include a control in every lane to check that the variants are not caused by a lane-specific technical bias?
And then what should I do with it? My first idea is to call all the variants in the control and, if I find the same variants at the same position (within the reads) in my amplicon reads, declare them untrustworthy and due to technical bias. Or are there smarter things to do?

Thanks a lot for any help!!!

amplicon DNA-seq control • 1.9k views

Ram · Answer 1 · 2015-04-16

Spiking in PhiX is standard practice on Illumina sequencers (Illumina even sells it as a product). 5% is a safe level. Add it to all of your low-diversity sequencing runs. There are a number of reasons.

The first reason comes back to the way clusters are identified. When the whole flow cell lane has the same sequence, the base-caller can struggle because the signal intensities are very unbalanced. Moreover, many nearby or overlapping clusters with the same sequence can also confuse the base-caller, because it cannot tell whether it is looking at one single cluster or at several smaller ones.

The second reason is that PhiX lets you determine how accurate the sequencing run was. If you see a high error rate after aligning the PhiX reads, you know there was an issue with the run. You can also look at the base qualities of the mismatching bases.

Generally, for all amplicon work you should do some sort of quality trimming at the 3' end of the reads, and potentially also mask poor bases that appear internally (e.g. caused by a bubble). You should generally be aiming for an error rate of 1/1000, that is, a Phred quality score of 30.
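To make the link between an error rate of 1/1000 and Q30 concrete, here is a minimal Python sketch of the Phred relationship Q = -10 * log10(p), together with a deliberately naive 3' quality trim. The function names, the Phred+33 offset, and the Q30 cutoff are assumptions chosen for illustration. Dedicated trimming tools use more robust approaches (running sums or sliding windows), so treat this as a way to see the idea rather than something to use in production.

```python
import math
from typing import List, Tuple


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call is wrong, given its Phred score."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def decode_qualities(qual_string: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string. Assumes Phred+33 encoding by default."""
    return [ord(c) - offset for c in qual_string]


def trim_3prime(seq: str, quals: List[int], min_q: int = 30) -> Tuple[str, List[int]]:
    """Naive 3' trim: drop trailing bases until one reaches min_q.

    Illustrative simplification only; it stops at the first good base
    from the 3' end and does not look at internal low-quality bases.
    """
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


# Hypothetical read: 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2.
seq = "ACGTACGTAC"
quals = decode_qualities("IIIIIIII5#")

print(phred_to_error_prob(30))    # error probability at Q30
print(error_prob_to_phred(0.001)) # Phred score for a 1/1000 error rate
print(trim_3prime(seq, quals))    # read with the low-quality 3' tail removed
```

With this toy read, the last two bases (Q20 and Q2) fall below the cutoff and are removed, leaving the first eight bases.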