Question

Removing adaptors from smallRNA seq data

4.9 years ago

juan.crescente ▴ 110

Hello!

I know this has been asked in many ways before, but I've been struggling for a while now, so it's time to ask.

I'm trying to use small RNA seq data from: https://bmcplantbiol.biomedcentral.com/articles/10.1186/1471-2229-14-142

These reads are ~34 nt long, so they must contain some kind of adaptor.

They use vectorstrip from the EMBOSS package, but I cannot find a suitable vector file.

I've tried Trimmomatic, but I still get the same read length:

java -jar trimmomatic-0.38.jar SE -phred33 /home/juan/Desktop/juan/bio/mrcv/data/sun/SRR1195024.fastq.gz /home/juan/Desktop/juan/bio/mrcv/data/sun/SRR1195024.trimmed.fastq.gz ILLUMINACLIP:adapters/TruSeq-Small-RNA.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:18

I've tried cutadapt, but I still get the same read length:

cutadapt -a TGGAATTCTCGGGTGCCAAGG -o SRR1195024.trimmed.fastq.gz SRR1195024.fastq.gz

I've tried Trim Galore, but I still get the same read length:

trim_galore --small_rna SRR1195025.fastq.gz .fastq.gz -o SRR1195025.trimm_gal.fastq.gz

Total reads processed:              14,011,412
Reads with adapters:                 8,639,554 (61.7%)
Reads written (passing filters):    14,011,412 (100.0%)

Trim Galore seems to be doing its work (61% of sequences with adapter), but when I open FastQC I see that the sequences are not the expected length; they're all 34 nt.

I expect to see peaks at 21 and 24 nt, but the distribution is as flat as the earth. Any idea what I am doing wrong?

smallRNA adaptor trimming • 3.0k views

ADD COMMENT • link 4.9 years ago by juan.crescente ▴ 110

Convert a subset of the data to FASTA and see whether you can align the reads at the 3' end to identify an adapter sequence. Did you check the methods section to see whether they describe the kit or method used?

$ reformat.sh in=SRR1195024.fastq.gz out=stdout.fa | grep -v "^>" | head -200

ADD REPLY • link 4.9 years ago by GenoMax 141k

Yes, I see nothing with reformat. They do not specify the adapters.

ADD REPLY • link 4.9 years ago by juan.crescente ▴ 110

The adapter is unlikely to be at exactly the same position in every read (if it is indeed at the 3' end), so you may not see it right away without actually aligning the sequences.

I will leave this for you to consider:

There are two papers linked which seem to have sequences etc in their supplementary materials. Have you looked at those?

ADD REPLY • link 4.9 years ago by GenoMax 141k

I'm checking this MSA (multiple sequence alignment) following your feedback: https://mafft.cbrc.jp/alignment/server/spool/_ho.190611011723805E0SZhm924bXDkjdeAHfqVlsfnormal.html

Which papers are those? The ones under Electronic supplementary material?

ADD REPLY • link 4.9 years ago by juan.crescente ▴ 110

When running cutadapt, are you confident that you're using the correct adapter sequence? Adapter-trimming software that runs without trimming anything makes me think you're using an incorrect sequence. Do they specify the sequence in the manuscript? Does FastQC report an overrepresented sequence?

ADD REPLY • link 4.9 years ago by shawn.w.foley ★ 1.3k

I see tons of overrepresented sequences, but I do not get adapter hits anywhere.

ADD REPLY • link 4.9 years ago by juan.crescente ▴ 110

Do a multiple sequence alignment of the last ~15 nucleotides of a few hundred reads and you should be able to identify the sequence of your adapter.

ADD REPLY • link 4.9 years ago by Martombo ★ 3.1k
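
As a lighter-weight complement to a full multiple sequence alignment, one could count the most frequent k-mers in the last ~15 bases of the reads; a k-mer shared by many read tails is a candidate adapter fragment. This is a minimal Python sketch, not something used in the thread: the function name `tail_kmer_counts`, the defaults `tail_len=15` and `k=8`, and the toy reads are all assumptions for illustration.

```python
from collections import Counter


def tail_kmer_counts(reads, tail_len=15, k=8):
    """Count every k-mer found in the last `tail_len` bases of each read."""
    counts = Counter()
    for seq in reads:
        tail = seq[-tail_len:]
        # Slide a window of width k across the tail.
        for i in range(len(tail) - k + 1):
            counts[tail[i:i + k]] += 1
    return counts


# Toy reads (made up): inserts of different lengths followed by the same adapter start.
reads = [
    "ACGTACGTACGTACGTACGTA" + "TGGAATTCTCGGG",
    "TTGACCGATTGACCGATTGACCGA" + "TGGAATTCTC",
    "GGCATTCAGGCATTCAGGCAT" + "TGGAATTCTCGGG",
]

# Print the k-mers shared by the most read tails.
for kmer, n in tail_kmer_counts(reads).most_common(3):
    print(kmer, n)
```

On real data the `reads` list would be replaced by the sequences from a subset of the FASTQ file; k-mers near the top of the ranking can then be compared against known small RNA adapter sequences.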

Done, tried it, and I'm still getting that weird read-length distribution where almost all reads are 34 nt.

=== Summary ===

Total reads processed:              14,011,412
Reads with adapters:                 8,639,554 (61.7%)
Reads written (passing filters):    14,011,412 (100.0%)

Total basepairs processed:   490,399,420 bp
Quality-trimmed:              33,500,011 bp (6.8%)
Total written (filtered):    447,282,744 bp (91.2%)

=== Adapter 1 ===

Sequence: TGGAATTCTCGG; Type: regular 3'; Length: 12; Trimmed: 8639554 times.

ADD REPLY • link updated 4.9 years ago by GenoMax 141k • written 4.9 years ago by juan.crescente ▴ 110
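
The summary above can be sanity-checked with a little arithmetic. The sketch below uses only the figures printed in the report; the one assumption (flagged in the code) is that every removed base not counted as quality-trimmed was removed as adapter. On these numbers the raw reads come out at 35 nt, and each "adapter" hit removes only about one base on average, which would be consistent with very short chance matches at the 3' end rather than real adapter removal. Trim Galore's default stringency (minimum adapter overlap) of 1 base is one way that can happen; this is an inference from the numbers, not something the thread confirms.

```python
# Figures copied from the cutadapt summary above.
total_reads = 14_011_412
reads_with_adapter = 8_639_554
total_bp = 490_399_420
quality_trimmed_bp = 33_500_011
written_bp = 447_282_744

# Average read length before and after trimming.
raw_read_len = total_bp / total_reads
mean_len_after = written_bp / total_reads

# Assumption: bases removed but not reported as quality-trimmed were removed as adapter.
adapter_bp = total_bp - written_bp - quality_trimmed_bp
adapter_bp_per_hit = adapter_bp / reads_with_adapter

print(f"raw read length:             {raw_read_len:.1f} nt")
print(f"mean length after trimming:  {mean_len_after:.1f} nt")
print(f"adapter bases removed:       {adapter_bp:,} bp")
print(f"adapter bases per hit:       {adapter_bp_per_hit:.2f} bp")
```

If real 21 to 24 nt inserts were being recovered, one would expect something closer to 10 or more adapter bases removed per trimmed read.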

Well, it does seem like 60% of the reads were trimmed, right?

ADD REPLY • link 4.9 years ago by Martombo ★ 3.1k

Yes! That part looks good. The problem now is that I still see a single huge peak at 34 nt. I'm expecting to see peaks at 21 and 24 nt (and some more).

ADD REPLY • link 4.9 years ago by juan.crescente ▴ 110
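
To check the length distribution without opening FastQC, the read lengths can be tallied directly. This is a minimal sketch assuming standard four-line FASTQ records; `fastq_sequences` and `length_distribution` are hypothetical helper names, and the toy sequences are made up.

```python
import gzip
from collections import Counter


def fastq_sequences(path):
    """Yield the sequence line of each record in a plain or gzipped FASTQ file.

    Assumes strict four-line records (header, sequence, '+', qualities).
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:
                yield line.strip()


def length_distribution(sequences):
    """Map read length -> number of reads with that length."""
    return Counter(len(seq) for seq in sequences)


# Toy example; with real data one would pass
# fastq_sequences("SRR1195024.trimmed.fastq.gz") instead.
toy = ["A" * 21, "C" * 24, "G" * 21, "T" * 34]
for length, n in sorted(length_distribution(toy).items()):
    print(length, n)
```

Running this on the file before and after trimming shows at a glance whether the trimming step actually moved reads away from the full read length.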

The reads that actually contain the adapter should be the data you are interested in, and the percentage above looks relatively healthy. Separate those reads and then run FastQC on them.

ADD REPLY • link 4.9 years ago by GenoMax 141k

Should I keep only those 61.7% of reads and then quality-trim them? Is there a way to keep only those with Trimmomatic? What I'm seeing is that it keeps all the reads.

ADD REPLY • link 4.9 years ago by juan.crescente ▴ 110
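
The idea of keeping only adapter-containing reads can be sketched in Python as follows. cutadapt itself offers `--discard-untrimmed` together with a minimum overlap (`-O`/`--overlap`) for this purpose; the code below is only an illustration of the logic, using exact matching with no mismatches allowed, which is stricter than what the real tools do. The names `trim_3prime_adapter` and `keep_trimmed`, the `min_overlap=8` default, and the toy reads are assumptions; the adapter is the one from the cutadapt command in the question, and `min_len=18` mirrors the `MINLEN:18` used there.

```python
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"  # adapter from the cutadapt command in the question


def trim_3prime_adapter(seq, adapter, min_overlap=8):
    """Return the insert if the adapter (or its start) is found, else None.

    Exact matching only: the read must contain the full adapter, or end with
    at least `min_overlap` bases of the adapter's 5' end.
    """
    for start in range(len(seq) - min_overlap + 1):
        rest = seq[start:]
        n = min(len(rest), len(adapter))
        if n >= min_overlap and rest[:n] == adapter[:n]:
            return seq[:start]
    return None


def keep_trimmed(reads, adapter, min_overlap=8, min_len=18):
    """Keep only inserts from reads where the adapter was found and enough insert remains."""
    kept = []
    for seq in reads:
        insert = trim_3prime_adapter(seq, adapter, min_overlap)
        if insert is not None and len(insert) >= min_len:
            kept.append(insert)
    return kept


# Toy reads (made up): a 21 nt insert with partial adapter, a read with no
# adapter, and a 10 nt insert that is too short to keep.
toy_reads = [
    "ACGTACGTACGTACGTACGTA" + "TGGAATTCTCGGG",
    "ACGTACGTACGTACGTACGTACGTACGTACGTAC",
    "ACGTACGTAC" + ADAPTER + "AAA",
]
print(keep_trimmed(toy_reads, ADAPTER))
```

Requiring a reasonably long overlap is the key design choice here: with an overlap of only one or two bases, a large fraction of reads will "contain" the adapter by chance.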