6.4 years ago
jerrywu1987
My data is 75 bp single end. Libraries were prepared using the QuantSeq 3' FWD kit and sequenced using the Illumina NextSeq 500.
I am trying to remove poly-A tails using Trimmomatic v0.32. Does anyone know how I could do this? Thanks very much.
Thanks for your answer. I've never used bbduk.sh before. Since different reads have different numbers of A's, is the literal=AAAAAA option specific to exactly 6 A's, or does it also cover longer runs of A's? I have the same question about Trimmomatic: does a poly-A sequence line define a specific number of A's, or does it represent a run of any length?
Thanks.
Once bbduk finds a stretch of A's, everything to the right of it will be removed if you are using the ktrim=r option (trim to the right after the k-mer match).

I checked my reads and none of them are shorter than k, so why were no k-mers loaded?
Thanks
Can you post a couple of example reads that contain the poly-A?
@NB551191:77:H33JGBGX5:1:11101:16098:1073 1:N:0:CTGCGT GATATTTGTTGTTTTGTAAGTGTATGTATATACTCGTACGTTGAAATTTGAATTCATATGCAAAAAAAAAAAGAAAAAAAAAAAAA
@NB551191:77:H33JGBGX5:1:11101:15600:1235 1:N:0:CAGCGT ATGTTATCGCGGCTACTGGCAAACCTTAAGTGATACGGTATTCTTCTTTTCGGCAAAAAAAAAAAAAAAAAAAAGATCGGAATAGC
@NB551191:77:H33JGBGX5:1:11101:14926:1307 1:N:0:CAGCGT TTGATGCTACTATGCTGTACTCAGGATTCCATGCTGCATTGCGATGCTAAATTAAAGAACCTCTGTTACCTTAAAAAAAAAAAAAA
Hopefully those are not the actual reads, since they seem to be missing Q scores. As long as your reads are in proper FASTQ format, the following will work. Adjust the number of A's so you catch them all.
Complete Fastq format
Will result in this (a.fq contains your sequence)

Sorry for the confusion. Since you pointed out in your last message that the reads lacked Q scores, I have resent them in complete FASTQ format.
Thanks for your code, and it worked!
But I would like to understand why this worked. Why did you reduce k to 7 and not specify a mink value? What is the problem with the code in the bbduk.sh documentation?

Thanks so much!
The BBDuk documentation refers to scanning for regular Illumina adapters, which are long and diverse in sequence, so a longer value of k is appropriate for those. In your case we are looking for a stretch of A's, so I suggested a smaller value of k, which allows runs of 7 or more A's to be found. You may find the in-line help for bbduk.sh useful: just run bbduk.sh without any options and it will be printed to the screen. For most purposes the default parameter values are fine (they are in use even if we don't set them explicitly).

That helped. Thank you so much.
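For anyone who wants to see the idea behind right-trimming at a short poly-A k-mer, here is a minimal Python sketch. It is only an illustration of the concept discussed above, not BBDuk's actual implementation: it assumes exact matching of a run of k A's and cuts the read at the first such run. The function name trim_right_at_polya is made up for this example; the three sequences are the example reads posted earlier in the thread.

```python
def trim_right_at_polya(seq: str, k: int = 7) -> str:
    """Cut a read at the first run of k consecutive A's.

    The run itself and everything to its right are dropped, which mimics
    the idea of a right-trim at a k-mer match. Reads without such a run
    are returned unchanged. (Illustrative only; exact matches, no
    mismatches, no quality handling.)
    """
    idx = seq.find("A" * k)
    return seq if idx == -1 else seq[:idx]


# Example reads from this thread (sequence lines only).
reads = [
    "GATATTTGTTGTTTTGTAAGTGTATGTATATACTCGTACGTTGAAATTTGAATTCATATGCAAAAAAAAAAAGAAAAAAAAAAAAA",
    "ATGTTATCGCGGCTACTGGCAAACCTTAAGTGATACGGTATTCTTCTTTTCGGCAAAAAAAAAAAAAAAAAAAAGATCGGAATAGC",
    "TTGATGCTACTATGCTGTACTCAGGATTCCATGCTGCATTGCGATGCTAAATTAAAGAACCTCTGTTACCTTAAAAAAAAAAAAAA",
]

for read in reads:
    trimmed = trim_right_at_polya(read, k=7)
    print(len(read), "->", len(trimmed), trimmed)
```

Because the match only needs k A's in a row, a single short k-mer catches tails of any length at or above k. Short internal runs such as AAA inside the transcript are left alone, and in the second read the adapter remnant after the tail is removed along with it.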