In the FASTQ output of my poly A capture RNA sequencing run, I noticed that about 20% of the reads contain a poly A stretch in the middle of the read (or even close to the front). I would like to understand this better, because with DNA fragments mostly >300 bp and a read length of 100 bp, we were not expecting poly A to show up that frequently in reads. I would appreciate your thoughts.
To my understanding, even though this is poly A capture sequencing, during library preparation only the mRNA tail fragments (those carrying the polyA) are selected. In this selection, the adapter contains a ~15 bp poly T that can bind anywhere on the polyA tail, which can be >200 bp long. My theory is that if it binds towards the 3' end of the polyA tail, a major part of the tail gets amplified in PCR and can potentially pass size selection, while the coding region at the 5' end of the fragment can be short.
E.g.
mRNA-coding-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA(...)AAAAAAAAAAAAAAAAAAAAAAA
and
TTTTTTTTTTTTTTTTT-AdapterSeq
Say the Ts-Adapter starts binding at the highlighted A; then the last 2-3 As will be gone after the first round of PCR, but the rest of the As will remain until sequencing.
Any advice is appreciated. Thank you.
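To put numbers on where the A runs sit within the reads, something like the following could be used. This is only a minimal sketch: the 15-base minimum run length (chosen to match the ~15 bp poly T), the "first third of the read" cutoff for "front", and all function names are my own assumptions, and it assumes plain 4-line FASTQ records.

```python
import re
from collections import Counter

def longest_a_run(seq, min_len=15):
    """Return (start, length) of the longest run of A's with at least
    min_len bases, or None if the read has no such run."""
    best = None
    for m in re.finditer(r"A{%d,}" % min_len, seq.upper()):
        length = m.end() - m.start()
        if best is None or length > best[1]:
            best = (m.start(), length)
    return best

def classify_read(seq, min_len=15):
    """Label a read by where its longest poly A run sits.
    The category boundaries are arbitrary assumptions for illustration."""
    run = longest_a_run(seq, min_len)
    if run is None:
        return "none"
    start, length = run
    if start + length == len(seq):
        return "tail"    # run extends to the last base of the read
    if start < len(seq) / 3:
        return "front"   # run starts in the first third of the read
    return "middle"

def fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()

def summarize(lines, min_len=15):
    """Count reads per poly A position category."""
    return Counter(classify_read(s, min_len) for s in fastq_sequences(lines))
```

With a real file this could be called as `summarize(open("reads.fastq"))`, where `reads.fastq` is a placeholder name. Comparing the "front" and "middle" counts against "tail" would show whether the internal poly A reads are really as common as 20%.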
You should ask questions like this on SeqAnswers.com, which is more geared towards the experimental aspects of NGS.
Okay, will do. Thank you for all the comments you have provided on my recent questions.
You are welcome.