Question

How to identify siRNA abundance from transcriptome data

6.6 years ago

akilabioinfo ▴ 10

Hi all,

I have transcriptome data from C. elegans, and I would like to analyse the differential expression of siRNA between wild-type and mutant strains. Is it possible to identify the siRNA using this data?

I have followed the protocol below. Is it correct?

After preprocessing the FASTQ file, I performed the alignment using HISAT2 against C. elegans mRNA and ncRNA (the ncRNA set does not include siRNA).
I extracted the unmapped reads and converted the BAM file to a FASTQ file (because siRNA sequences have not been annotated).
I filtered the FASTQ file using the following criteria (I am looking for G-siRNAs that are 22 nucleotides long): reads must start with G, and read length must be 22.
I have used Salmon/Sailfish for quantification using the annotated GTF.
I used DESeq2 for differential expression analysis.
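
The read-selection step above (keep reads that start with G and are exactly 22 nt long) could be sketched roughly as follows. This is only an illustration, not the poster's actual script: the function names `parse_fastq`, `is_candidate`, and `filter_candidates` are hypothetical, and it assumes uncompressed 4-line FASTQ records in which adapters have already been trimmed, so that read length reflects insert length.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from strict 4-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times consumes the input 4 lines at a time
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header, seq, qual


def is_candidate(seq: str, first_base: str = "G", length: int = 22) -> bool:
    """True if the read has the requested length and 5' base (assumed criteria)."""
    return len(seq) == length and seq[:1].upper() == first_base


def filter_candidates(records: Iterable[FastqRecord]) -> Iterator[FastqRecord]:
    """Keep only records whose sequence passes is_candidate."""
    return (rec for rec in records if is_candidate(rec[1]))
```

With a real file, one would pass an open file handle, e.g. `filter_candidates(parse_fastq(open("unmapped.fastq")))`, where the file name is a placeholder. Note that such a filter only says a read has the right length and first base; it does not show that the read derives from an siRNA.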

Using this procedure, we found 25 differentially expressed genes, but I am not sure whether these reads belong to siRNAs or are fragments of coding regions.

Please help me identify the siRNA from transcriptome data.

Looking forward to your reply.

Thanks and regards, Akila Ranjith, Research Scholar, Department of Biotechnology, Indian Institute of Technology - Madras.

RNA-Seq sequence galaxy • 1.8k views

Is it possible to identify the siRNA using this data?

Without more information about the data, it's impossible to judge. Was the sequencing done in such a way that it also captured fragments the size of siRNAs or siRNA precursors? Was it Illumina, PacBio, or another technology? What was the read length?

I have filtered the fastq file using following criteria

You should seriously consider filtering reads by quality as well.
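
As a rough illustration of what a quality filter could look like, here is a minimal sketch that keeps a read only if its mean Phred score reaches a threshold. It assumes Phred+33 encoded quality strings (the usual encoding for recent Illumina data); the function names and the default cutoff of 30 are placeholders, not a recommendation.

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into per-base Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


def passes_quality(quality: str, min_mean: float = 30.0, offset: int = 33) -> bool:
    """True if the read's mean Phred score is at least min_mean (assumed cutoff)."""
    scores = phred_scores(quality, offset)
    if not scores:
        return False  # an empty read cannot pass
    return sum(scores) / len(scores) >= min_mean
```

A mean-based cutoff is only one possible choice; dedicated trimming tools usually also trim low-quality ends instead of discarding whole reads.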

I have used salmon/sailfish for quantification using annotated GTF

Why did you use an alignment-free method such as Sailfish when you have the advantage of working with one of the organisms with the best-characterized genomes (C. elegans)?

6.6 years ago by Matteo Schiavinato ★ 3.6k

score 0 · Answer 1 · 2017-08-29

6.6 years ago

akilabioinfo ▴ 10

Thanks for your reply.

We outsourced the samples for sequencing. We will confirm whether siRNA precursors were included.

The data are from Illumina HiSeq; read length: 50 bp.
We also considered the base quality.
Can we use Cufflinks or StringTie for quantification?

If I am right, we need siRNA enrichment in the library protocol when sequencing the transcriptome in order to identify siRNA.

Remember to use the "add reply" button at the end of a comment to reply to that comment, so the thread stays organized. I am sure an admin will soon move these two answers of ours to the comments. The "add answer" button should only be used if you have an answer to the problem you posed.

The data are from Illumina HiSeq; read length: 50 bp.

I am no expert in siRNA: is this read length compatible with what you want to discover?
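
One way to look into this is to tabulate read lengths after adapter trimming and see whether short reads are present at all. The sketch below is only illustrative: it assumes you already have the trimmed read sequences as strings, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def length_distribution(sequences: Iterable[str]) -> Counter:
    """Count how many reads there are of each length."""
    return Counter(len(seq) for seq in sequences)


def fraction_with_length(distribution: Counter, length: int = 22) -> float:
    """Fraction of all reads that have exactly the given length."""
    total = sum(distribution.values())
    if total == 0:
        return 0.0  # no reads at all
    return distribution[length] / total
```

If almost all reads remain at the full 50 bp after trimming, the library probably did not capture small RNAs; that interpretation is an assumption to check against the actual library protocol.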

we also considered the base quality

What was your threshold?

Can we use Cufflinks or StringTie for quantification?

You can use whatever floats your boat, but I think (personal opinion) that you should use the best approach your data allow. Cufflinks, HTSeq, and StringTie will all do what you want! However, before choosing the quantification method and the differential expression algorithm, I would make sure that what you need is actually in your data.

we need siRNA enrichment ... in order to identify siRNA

I guess you can say that.

6.6 years ago by Matteo Schiavinato ★ 3.6k

Thanks for your great help. Quality score: 30 and above.

6.6 years ago by akilabioinfo ▴ 10