Question

Mirna And Deseq

1

Entering edit mode

10.3 years ago

bioinforupesh2009.au ▴ 140

Hello all,

Just a quick answer by experts. I am dealing with miRNA-seq data and having some doubt.

1. Filtering by resSig <- res[ res$padj < .1, ]

I am not able to get any significant miRNAs on my sample while by using resSig <- res[ res$pval < 0.05, ]. I am getting about 41 DE miRNAs (up and down). What should I do? May I rigid with only pval makes sense? I mean "May I keep going only filtering by pval < 0.05, is that enough to consider my genes significantly differentially expressed?"

2. After this analysis may go further for filtering by overall counts, plotting Heat map and PCA plots would be good results? This makes any significant sense? I am asking because I am not very expert in statistics.

Thanks, any help would be appreciated

differential-expression deseq • 9.3k views

ADD COMMENT • link updated 13 months ago by Ram 43k • written 10.3 years ago by bioinforupesh2009.au ▴ 140

1

Entering edit mode

do you have any replicate for your analysis?

ADD REPLY • link 10.3 years ago by Martombo ★ 3.1k

0

Entering edit mode

yes for sure...two conditions with 5 replicates each.

ADD REPLY • link 10.3 years ago by bioinforupesh2009.au ▴ 140

0

Entering edit mode

I wouldn't use unadjusted p-values unless really desperate (they're less likely to validate). If you haven't done so already, try doing some independent filtering to increase statistical power.

ADD REPLY • link 10.3 years ago by Devon Ryan 104k

0

Entering edit mode

Thanks for your suggestion but could you please explore your advice. As i am not so expert of R

ADD REPLY • link 10.3 years ago by bioinforupesh2009.au ▴ 140

0

Entering edit mode

I assume you're referring to the independent filtering part. Have a read through this paper from Wolfgang Huber's group and the genefilter Bioconductor package. Also note, this is all done automagically in more recent version of DESeq2, so if you're using that then this may have already been done.

ADD REPLY • link 10.3 years ago by Devon Ryan 104k

0

Entering edit mode

ahhh, thanks for your nice suggestions, now i am going further the section of DESeq manual of 5.1 filtering by overall counts upto Heat Map and PCA plot (for data quality assessment bysample clustering), is this same as in DESeq ????

ADD REPLY • link 10.3 years ago by bioinforupesh2009.au ▴ 140

0

Entering edit mode

They're more or less the same, it's just done automatically in the more recent versions of DESeq2. If you're still using DESeq, do yourself a favor and transition over to DESeq2, it have a number of other improvements.

ADD REPLY • link 10.3 years ago by Devon Ryan 104k

0

Entering edit mode

Dear sir dpryan79 , Thanks for your kind advice, its really nice use DESeq2 instead DESeq, i used and got results that, would like to discuss. in my code i used,

###DE ananlysis

dds <- DESeq(ddsHTSeq)

res <- results(dds)

res <- res[order(res$padj),]

head(res)

####extraction of DE at value ####counts

sum(na.omit(res$padj < 0.1))

[1] 42 ##output

sum(na.omit(res$pval < 0.05))

[1] 120 ##output

###Investigate strongly donwregulated genes

head(res[order(res$log2FoldChange, -res$baseMean),])

down <- (res[order (res$log2FoldChange, -res$baseMean),])

write.csv(down,file="Human_neuro_down_padjless_0.1.csv")

###Investigate strongly upregulated gene

head(res[order (-res$log2FoldChange, -res$baseMean), ] )

upregulated <- (res[order (-res$log2FoldChange, -res$baseMean),])

write.csv(upregulated,file="Human_neuro_upregulated_padjless_0.1.csv")

1) is the correct way, what i have done ??? Also in order to find Down and Up-regulated genes ?? as prev. mentioned in DESeq ?? 2) May i say now after independent filtering, i got 42 gene that are DE at padj < 0.1 ?? or 120 genes are DE at pval < 0.05 ?? pls make me sure, i am bit confused. 3) i have gone through overall counts data transform, plotting Heat map and PCA plots AND Mean counts & Hist of p-value. And i stopped here. But in the last part while running ..

abline(a=0, slope, col="red3",lwd=2) #does not work##error#object 'slope' not found why ???

Thanks and looking forward for your kind advice again

ADD REPLY • link 10.3 years ago by bioinforupesh2009.au ▴ 140

0

Entering edit mode

That looks correct from eye-balling the code at least. You have 42 DE genes, never use unadjusted p-values with data like this unless you have a very good reason. The abline() just plots a straight line on a graph, presumbably with other points. I assume there was a step skipped or a typo, at least if you're just following the vignette. Regardless, that's not something to worry about.

ADD REPLY • link 10.3 years ago by Devon Ryan 104k

0

Entering edit mode

Thank you sir for your quick reply, and what about down about up regulate genes ??? May i have to look over csv file ???

ADD REPLY • link 10.3 years ago by bioinforupesh2009.au ▴ 140

0

Entering edit mode

It's easiest to just look at the sign of the log2 foldchange in the CSV files. You're ordering things by that, but it's usually (in my experience) easiest to just order things by adjusted p-value and then not separate things by direction of change (after all, it's often beneficial to consider both down and up-regulated genes that might be interacting together).

ADD REPLY • link 10.3 years ago by Devon Ryan 104k

score 3 · Answer 1 · 2014-01-10

miRNA DE analysis to my opinion has some significant differences compared to gene DE.

There are numerous issues with almost all parts of the analysis: counting miRNA reads (some algorithms might be too strict and you end up losing power or others can be too loose and you can have increased variance), normalization (miRNA normalization is a debate topic, especially after a couple of RNA articles) and as you have mentioned the DE algorithm.

May I ask how you've ended up with miRNA counts? Did you use an inhouse algorithm or a ready-made tools, such as miRDeep?

The available DE algorithms have not been extensively tested in miRNA-Seq (or small-RNA-Seq) data for me to suggest one that easily even though we use them quite a lot for that specific task.

We've used DESeq many times before for miRNA DE analysis, even though it has proved to be a bit strict. I still always start from that. It helps sometimes to remove really lowly expressed miRNAs, since a fraction of miRNAs hog most of the sequenced reads and the rest get very few counts that will disable them to achieve statistical significance. By removing the miRNAs with really low expression you might increase power for the rest.

I've used DESeq2 from the first days it got out in Bioconductor but I'm still reluctant to put it in my "official" pipeline. I really liked the improvements but I'm waiting for the article to come out or to see some more tests. The only in-depth comparison I can recall at the moment is from (http://bib.oxfordjournals.org/content/early/2013/12/02/bib.bbt086.full) [check the supplement], where I was not convinced that it's that better than the original DESeq. But it's only one publication..

I would suggest also to check out EdgeR.

Cheers,

IV

Edit

If you do PCA you will get a 2D representation of which samples are closer to each other (more similar) than others. It can be useful in numerous situations where you can see if your experiment worked for all replicates, exclude outliers, identify subsets within your samples, etc.

Heatmap of DE miRNAs is also useful, since you can get a nice image of groups of miRNAs that behaved similarly and is usually used for presentations or articles. From the miRNAclusters you can also get some interesting gene subgroups for post analysis.

After the DE analysis you can use a tool for pathway analysis, such as miRPath (http://www.microrna.gr/miRPathv2) that can get you pathways targeted by your DE miRNAs.

End of Edit