Small RNA sequencing using Illumina 2-channel SBS: how to deal with poly-G tails?
6 months ago

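I'm working on a small RNA sequencing experiment (150 PE on NovaSeq 6000), and whenever the fragment is shorter than 150 bp, many reads look like this, with a run of Gs filling the rest of the 150 cycles (on two-channel instruments G is the unlabeled "dark" base, so cycles that produce no signal are called as G):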

@A00312:445:H3K2MDSX7:4:2322:15519:9455 1:N:0:GCCAAT
CCTGGGGATAAACTGTAGGCACCATCAATACCCAACGTTCAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCGCGTATGCCGTCTTCTGCTTGAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFF:F,FFFFFFFFFFFFFF,FFFFFF:F,,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

Although this is a known feature of small RNA sequencing with this chemistry (see attached picture), I surprisingly can't find any option in the classic tools (Trimmomatic, cutadapt, etc.) to automatically trim these Gs. Does anyone have a lead, or alternatively some creative code to do it? I'm working with FASTQ files, so a suboptimal but simple-ish method could be to convert them to plain FASTA and trim all the repeated Gs, but I feel like there must already be a program that does this directly on FASTQ files. I'm also not entirely sure how to code it accurately, e.g. after how many Gs it is appropriate to trim.

Thanks Hive mind!
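A minimal sketch of the simple approach described above, working directly on FASTQ (no FASTA conversion) and assuming standard 4-line records. The function name trim_polyg_tail and the default threshold of 10 Gs are hypothetical choices, not taken from any tool. Only exact trailing G runs are trimmed, so a single miscalled base inside the tail stops the trim early.

```shell
#!/usr/bin/env bash
# Hypothetical helper: trim a trailing run of >= MIN_LEN Gs from every read
# in a 4-line FASTQ stream (stdin -> stdout). Exact G matching only; the
# quality line is cut to the same length as the sequence line.
trim_polyg_tail() {
  local min_len="${1:-10}"   # assumed default threshold, adjust as needed
  awk -v min="$min_len" '
    BEGIN { min += 0 }                          # force numeric comparison
    NR % 4 == 2 {                               # sequence line
      keep = length($0)
      # match() sets RSTART/RLENGTH for the trailing G run, if any
      if (match($0, /G+$/) && RLENGTH >= min) keep = RSTART - 1
      print substr($0, 1, keep); next
    }
    NR % 4 == 0 { print substr($0, 1, keep); next }   # quality line
    { print }                                   # header and "+" lines
  '
}
```

Usage: trim_polyg_tail 10 < r1.fq > r1.polyg.fq. Because no reads are dropped, R1 and R2 can be processed separately and stay in sync; reads that were entirely G become empty records, which some downstream tools reject.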


6 months ago

BBDuk has a "trimpolyg" flag. Or better yet, a "trimpolygright" flag, which is what you probably want. However, after adapter trimming the poly-G should disappear anyway, since it comes after the adapter sequence. So I'd do something like:

bbduk.sh in=r1.fq in2=r2.fq out1=trimmed1.fq out2=trimmed2.fq ref=adapters ktrim=r k=21 hdist=1 mink=9 tbo tpe

...and you can add "trimpolygright=6" to that if you want. The downside is that reads in which the adapter was not recognized would still have their poly-G tails removed, leaving a small RNA followed by an adapter containing sequencing errors, which is not usually very helpful. But the option is there.
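To illustrate why adapter trimming alone removes the poly-G tail, here is a minimal exact-match sketch: each read is cut at the first occurrence of an adapter prefix, and everything after it, poly-G included, is discarded. The function name is hypothetical, and the default prefix AGATCGGAAGAGC (the common start of Illumina TruSeq adapters, which also appears in the example read in the question) is an assumption about this library. Unlike the BBDuk command above, it allows no mismatches and misses adapters that are only partially present at the 3' end, which is what hdist and mink handle in BBDuk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: cut every read in a 4-line FASTQ stream at the first
# exact occurrence of an adapter prefix (assumed default: AGATCGGAAGAGC).
# Reads without the prefix pass through unchanged.
cut_at_adapter() {
  local adapter="${1:-AGATCGGAAGAGC}"
  awk -v ad="$adapter" '
    NR % 4 == 2 {                               # sequence line
      pos = index($0, ad)                       # 1-based, 0 if absent
      keep = (pos > 0) ? pos - 1 : length($0)
      print substr($0, 1, keep); next
    }
    NR % 4 == 0 { print substr($0, 1, keep); next }   # quality line
    { print }                                   # header and "+" lines
  '
}
```

Usage: cut_at_adapter < r1.fq > r1.cut.fq. For real data the BBDuk command above is the better choice; this only shows the principle.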

If the included adapter sequences (referenced by setting "ref=adapters") don't match, which is unlikely but possible, you can run:

bbmerge.sh in=r1.fq in2=r2.fq outa=found_adapters.fa

...to get the actual adapter sequences in your library.
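Before running BBMerge, a rough way to check whether the standard adapter prefix shows up at all is to count the reads that contain it. This is a hypothetical helper assuming 4-line FASTQ and the AGATCGGAAGAGC prefix. Because matching is exact, the count is a lower bound, and reads whose insert is at least as long as the read will not contain any adapter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many reads in a FASTQ file contain an exact
# adapter prefix (assumed default: AGATCGGAAGAGC) out of all reads.
count_adapter_reads() {
  local fastq="$1" adapter="${2:-AGATCGGAAGAGC}"
  awk -v ad="$adapter" '
    NR % 4 == 2 { total++; if (index($0, ad) > 0) hits++ }
    END { printf "%d/%d reads contain %s\n", hits, total, ad }
  ' "$fastq"
}
```

Usage: count_adapter_reads r1.fq. A very low fraction in a small RNA library, where most inserts are far shorter than 150 bp, suggests the adapter differs from the assumed prefix.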

6 months ago
size_t

Try fastp with the --trim_poly_g option. From its help text:

force polyG tail trimming, by default trimming is automatically enabled for Illumina NextSeq/NovaSeq data
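Real poly-G tails often contain a few miscalled bases, so a tolerant trimmer is usually preferable to exact matching; in fastp the minimum tail length is set with --poly_g_min_len (10 by default). The sketch below is not fastp's algorithm. It is a simple maximum-scoring-suffix heuristic, assumed here for illustration: scanning from the 3' end, each base scores +1 if it is G and minus a penalty otherwise, the read is cut where the suffix score is highest, and the cut is only made if that suffix is at least MIN_LEN long. The function name and the defaults (10 and 4) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mismatch-tolerant poly-G tail trimming for a 4-line
# FASTQ stream. Suffix score = +1 per G, -PENALTY per other base; the best-
# scoring suffix is removed if it is at least MIN_LEN bases long.
trim_polyg_tolerant() {
  local min_len="${1:-10}" penalty="${2:-4}"
  awk -v min="$min_len" -v pen="$penalty" '
    BEGIN { min += 0; pen += 0 }                # force numeric values
    NR % 4 == 2 {                               # sequence line
      n = length($0); keep = n
      score = 0; best = 0; start = 0
      for (i = n; i >= 1; i--) {
        score += ((substr($0, i, 1) == "G") ? 1 : -pen)
        if (score > best) { best = score; start = i }   # ties keep shorter tail
      }
      if (start > 0 && n - start + 1 >= min) keep = start - 1
      print substr($0, 1, keep); next
    }
    NR % 4 == 0 { print substr($0, 1, keep); next }   # quality line
    { print }                                   # header and "+" lines
  '
}
```

On a tail with one miscalled A, such as ACGT followed by GGGGGGAGGGGGG, this keeps ACGT, whereas trimming only exact trailing G runs of at least 10 would leave the read untouched. Like any poly-G trimmer, it will also eat genuinely G-rich 3' ends.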
