Quality control issues for mRNA sequencing FASTQ files in FastQC: Per Base Sequence Content
5.8 years ago
svlachavas ▴ 790

Dear Community,

I would like to ask for comments and suggestions on interpreting some initial quality control results for FASTQ files. In a current RNA-Seq gene expression project based on 2 different cancer cell lines, mRNA sequencing was performed for 12 samples (HiSeq4000, 2 * 75 bases). Very briefly, the protocol was:

a) purification of polyA-containing mRNA molecules from 1µg total RNA, using poly-T oligos attached to magnetic beads

b) fragmentation using divalent cations under elevated temperature to obtain pieces of approximately 300bp

c) double-stranded cDNA synthesis

d) Illumina adapter ligation and cDNA library amplification by PCR for sequencing.

Based on initial exploratory quality control with FastQC, most samples failed or got a warning in the "Per Base Sequence Content" section. This is evident in the attached figures for one sample (the others look similar). No other quality issues were evident, and the RIN numbers were fine:

[Attached figures: fastq.overall, example1, example2, data.quality.extra]

From a first reading of the plots, one could argue that these failures/warnings are indicative of overrepresented sequences that might have contaminated the library during construction. Is that a reasonable interpretation?

Moreover, is there a way to adjust for this, or is downstream analysis still possible?

Any suggestions or ideas would be appreciated!

fastqc mRNAsequencing RNA-Seq sequence • 1.4k views

Dear WouterDeCoster, is it still not OK?


I don't think a dropbox link will work. Try hosting the image as described in the linked post.


Moreover, is there a way to adjust for this, or is downstream analysis still possible?

What are the downstream analyses you intend to perform?

5.8 years ago

Random priming generates a "non-random" sequence composition at the start of reads. See also this QCFail blog post: Positional sequence bias in random primed libraries
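To check whether the skew is confined to the start of the reads, the per-position base composition can be tabulated directly from the FASTQ file. The following is a minimal sketch, not FastQC's own implementation. It assumes standard 4-line FASTQ records, and the 10 and 20 percentage-point thresholds are taken from my reading of the FastQC documentation for this module (warning and failure on the A/T or G/C difference). All function names are made up for illustration.

```python
from collections import Counter
import gzip


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_sequences(handle):
    """Yield the sequence line of each record (assumes 4 lines per record)."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip().upper()


def per_base_content(seqs, max_len=75):
    """Return one dict per read position with the percentage of A, C, G, T."""
    counts = [Counter() for _ in range(max_len)]
    for seq in seqs:
        for pos, base in enumerate(seq[:max_len]):
            if base in "ACGT":  # N and other symbols are ignored
                counts[pos][base] += 1
    content = []
    for c in counts:
        total = sum(c.values())
        if total == 0:  # no called bases here: treat as end of reads
            break
        content.append({b: 100.0 * c[b] / total for b in "ACGT"})
    return content


def flag_positions(content, warn=10.0, fail=20.0):
    """Label each position by the larger of |A-T| and |G-C| (percentage points).

    The default thresholds are an assumption based on the FastQC docs.
    """
    flags = []
    for pct in content:
        diff = max(abs(pct["A"] - pct["T"]), abs(pct["G"] - pct["C"]))
        flags.append("fail" if diff > fail else "warn" if diff > warn else "pass")
    return flags
```

For example, `flag_positions(per_base_content(read_sequences(open_fastq("sample_R1.fastq.gz"))))` (with a hypothetical file name) returns one label per position. If only the first dozen or so positions are flagged and the rest pass, that pattern is consistent with the priming bias described in the blog post rather than with contamination.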


Thank you very much for the useful link. So you think that downstream analysis is still feasible, or do you think it would be biased?


Since it seems you didn't spend time reading the post carefully (I agree it's easier to just ask me again about it) I'll give you a quote:

Whilst the warnings generated by this problem reflect a real issue it’s not something which can be fixed, and doesn’t seem to have any serious consequences for downstream analysis.


Dear WouterDeCoster,

of course I have read the very useful post, and I also found this specific part. My mistake was not stating the specific goal of my downstream analysis, which would have made your answer more helpful.

Our goal is essentially to test for over- or under-representation of a small RNA motif in specific target genes, comparing the groups of samples. The motifs were created by a previous computational pipeline and have been initially tested with in vitro assays.

That is the reason for my extra question, as the data are not directly intended for DE analysis.
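One way to gauge whether the read-start bias matters for a motif analysis like this is to compute the motif frequency twice, once over whole reads and once with the first bases of every read ignored, and compare the two. The sketch below is only an illustration under stated assumptions: it works on plain read sequences, counts exact (possibly overlapping) matches on the given strand only, and uses `skip=12` as an arbitrary illustrative cutoff. It is not the pipeline described above, and all names are hypothetical.

```python
def count_motif(seq, motif, skip=0):
    """Count exact, possibly overlapping motif hits, ignoring the first `skip` bases."""
    region = seq[skip:]
    count = 0
    start = region.find(motif)
    while start != -1:
        count += 1
        start = region.find(motif, start + 1)
    return count


def motif_rate(seqs, motif, skip=0):
    """Motif hits per 1,000 scanned bases across all reads."""
    hits = 0
    bases = 0
    for seq in seqs:
        hits += count_motif(seq, motif, skip)
        bases += max(len(seq) - skip, 0)
    return 1000.0 * hits / bases if bases else 0.0


def compare_with_and_without_read_starts(seqs, motif, skip=12):
    """Return (rate over full reads, rate with the first `skip` bases ignored)."""
    seqs = list(seqs)  # allow generators to be scanned twice
    return motif_rate(seqs, motif, 0), motif_rate(seqs, motif, skip)
```

If the two rates are close for every sample group, the positional bias is unlikely to drive the group comparison. A large gap would suggest the motif resembles the sequences favoured at read starts, in which case the comparison deserves a closer look.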
