Biostar Beta. Not for public use.
FastQC To Check The Quality Of High Throughput Sequence
13 months ago
Varun Gupta ♦ 1.1k
United States

Hi, I saw the FastQC video under the videos section on Biostar, and I have a question.

Why do I often find that, in the first 12-13 bases, the per base sequence content and per base GC content are quite wavy even though the per base sequence quality is very good? What can be done to fix this?

Have a look at the images:

http://www.freeimagehosting.net/ffniw

http://www.freeimagehosting.net/96lzh

Regards


Can you post a plot or the numerical values?


(+1) It definitely helps to see the FastQC plot.


I added the plots. Have a look.


Can you also tell if this is RNA-Seq data?


The data is RNA-Seq.

2.4 years ago
Ryan Dale 4.8k
Bethesda, MD

With RNA-seq, this can happen due to biases in random hexamer priming during the RT step (explaining the first 6 bases) possibly combined with sequence specificity of the polymerase itself and/or artifacts from end repair (possibly explaining out to 13 bases).

Check out Hansen et al. (2010), "Biases in Illumina transcriptome sequencing caused by random hexamer priming", NAR 38(12):e131, for more info as well as some ideas on how to correct for it.
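To see the effect in numbers rather than in the plot, here is a minimal sketch that tabulates base fractions at each of the first few read positions. It is not how FastQC itself is implemented; the function names and the toy reads are made up for illustration, and it assumes a plain 4-line-per-record FASTQ.

```python
from collections import Counter


def read_fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def per_base_content(sequences, n_positions=13):
    """For each of the first n_positions, return the fraction of A/C/G/T.

    Any other symbol (e.g. N) still counts toward the position total,
    so the four fractions may sum to less than 1.
    """
    counts = [Counter() for _ in range(n_positions)]
    for seq in sequences:
        for i, base in enumerate(seq[:n_positions].upper()):
            counts[i][base] += 1
    fractions = []
    for c in counts:
        total = sum(c.values())
        fractions.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return fractions


# Toy input (hypothetical reads, not real data).
toy = ["@r1", "ACGTAC", "+", "IIIIII", "@r2", "AGGTTC", "+", "IIIIII"]
content = per_base_content(read_fastq_sequences(toy), n_positions=3)
for pos, frac in enumerate(content, start=1):
    print(pos, frac)
```

With a real file you would pass an open file handle to read_fastq_sequences instead of the toy list. If the fractions flatten out after position 12-13, that is consistent with the priming bias described above.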


I read the publication, but it is not clear to me whether it would be better to remove those 13 bp at the 5' end.

What's the best practice?


I think the assumption is that for standard differential expression, any sequence bias in a gene is the same between samples, so it's not a problem. However, it is a problem for estimating expression within a single sample (e.g., FPKM), since the transcripts being compared to each other may have different biases.

Luckily, Cufflinks includes bias correction for this (see, e.g., http://genomebiology.com/2011/12/3/r22/).
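For reference, a minimal sketch of the plain FPKM calculation, to make the point concrete: it normalizes only for transcript length and sequencing depth, so any sequence-specific bias in the fragment counts carries straight through. This is the textbook definition, not Cufflinks' bias-corrected estimate, and the function and argument names are my own.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical numbers: 100 fragments on a 1 kb transcript, 1 million mapped.
example = fpkm(100, 1000, 1_000_000)
print(example)
```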

16 months ago
Philadelphia

Hey Varun,

Have you checked whether those first few bases belong to an adaptor or barcode sequence? If left untrimmed, such sequences can produce the pattern you describe. I may be completely wrong, but go through the FastQC report, and if those sequences show up in the "Overrepresented sequences" section, you need to trim them off.
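A quick way to check this by hand is sketched below, assuming you already have the read sequences in memory. It simply counts the most frequent read starts and shows how a fixed-length 5' trim would look. This is not how FastQC computes its report; the function names and toy reads are hypothetical.

```python
from collections import Counter


def common_prefixes(sequences, k=13, top=5):
    """Return the `top` most frequent k-base read starts with their counts."""
    counts = Counter(seq[:k].upper() for seq in sequences if len(seq) >= k)
    return counts.most_common(top)


def trim_5prime(seq, qual, n):
    """Remove the first n bases and the matching quality scores from one read."""
    return seq[n:], qual[n:]


# Toy reads (made up): three of the four share the same 4-base start.
toy_reads = ["ACGTTTGA", "ACGTCCGA", "ACGTAAGA", "TTGACCGA"]
top_prefixes = common_prefixes(toy_reads, k=4, top=2)
trimmed = trim_5prime("ACGTTTGA", "IIIIHHHH", 4)
print(top_prefixes)
print(trimmed)
```

If one prefix dominates and matches a known adaptor or barcode, trimming makes sense. If the starts are diverse and only the overall composition is skewed, the priming bias discussed in the other answers is the more likely explanation.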

3 months ago
University Park, USA

The origin of the sample also matters. If the sample preparation isolates certain parts of a genome, as in a ChIP-Seq experiment, we can expect that to be reflected in the sequence content of the reads.

3.1 years ago
T • 40
Germany

If you have Illumina sequencing, this is a bias of the random primers used during library preparation and is therefore expected.
