RNA-Seq analysis gives only 10 DE genes
2.2 years ago
ste.lu • 40
@ste.lu43755

Hi All,

I have RNA-seq data for 2 cell lines (let's say A and B) in which a gene has been knocked out:

- A WT
- A KO
- B WT
- B KO

I used Salmon to quantify the reads against the reference transcriptome and DESeq2 to perform the differential expression analysis. In the end I get only 10 genes DE between WT and KO. Do you think something is wrong, or is this a plausible result?

RNA-Seq • 448 views

That could be a good result. Since you are generating hypotheses for further testing, it may be quite manageable to build a story around the 10 genes you have identified. But without knowing the complete story, this is about all we can say.

Well, then I'll keep my fingers crossed!

How many replicates do you have per condition and cell line? It could be that the gene has a limited impact on the regulation of other genes or on cellular responses, and that you lack the power to detect modest changes.

I have 2 technical replicates for each biological replicate (A and B in the question). However, something went wrong during library preparation for one sample and made it unusable, so I ended up analyzing only 1 biological replicate against the other. Would you suggest including 2 technical replicates for one cell line against 1 technical replicate for the other cell line?

Which filters for L2FC, padj, baseMean, etc., do you use to define a DE gene?

I consider a gene DE if its padj is below 0.05.
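
How many genes count as "DE" depends directly on this filter. As a minimal sketch, assuming the DESeq2 results table has been exported (for example with write.csv) and loaded into Python, the padj < 0.05 filter, plus the optional log2FoldChange and baseMean thresholds asked about above, could look like the following. The `DEResult` class, the `call_de_genes` function and the gene values are hypothetical illustrations, not part of DESeq2.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DEResult:
    """One row of a DESeq2 results table (hypothetical container)."""
    gene: str
    base_mean: float         # DESeq2 column "baseMean"
    log2_fold_change: float  # DESeq2 column "log2FoldChange"
    padj: Optional[float]    # DESeq2 column "padj"; NA is stored as None


def call_de_genes(results: List[DEResult], padj_cutoff: float = 0.05,
                  min_abs_lfc: float = 0.0,
                  min_base_mean: float = 0.0) -> List[str]:
    """Return the genes passing all thresholds.

    With the defaults only padj < padj_cutoff is applied; the fold-change
    and baseMean thresholds are optional, stricter extras.
    """
    hits = []
    for r in results:
        # DESeq2 reports padj as NA for genes removed by independent
        # filtering or flagged as count outliers; skip those.
        if r.padj is None:
            continue
        if (r.padj < padj_cutoff
                and abs(r.log2_fold_change) >= min_abs_lfc
                and r.base_mean >= min_base_mean):
            hits.append(r.gene)
    return hits


# Made-up rows for illustration only.
results = [
    DEResult("geneA", 850.0, 2.1, 0.001),
    DEResult("geneB", 12.0, 0.3, 0.04),
    DEResult("geneC", 430.0, -1.5, 0.2),
    DEResult("geneD", 5.0, 4.0, None),
]
print(call_de_genes(results))                                        # ['geneA', 'geneB']
print(call_de_genes(results, min_abs_lfc=1.0, min_base_mean=50.0))  # ['geneA']
```

Adding fold-change or baseMean cutoffs can only shrink the list, which is why padj alone is the least restrictive choice.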

This is probably the most liberal filter one can think of, and you still get just 10 DE genes, which suggests that the samples are extremely similar to each other. Have you checked Spearman's correlation? I bet it would be close to 1. Regarding the experimental design you ended up with, please see these posts: http://seqanswers.com/forums/showthread.php?t=31036 https://support.bioconductor.org/p/101210/
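
A minimal sketch of the suggested Spearman check, written in plain Python so it needs no extra packages (in practice `scipy.stats.spearmanr` does the same job). It assumes one count, or normalized count, per gene for each sample, in the same gene order. Spearman's rho is the Pearson correlation of the ranks, with tied values given their average rank. The count values below are made up for illustration.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation (undefined if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical counts for 7 genes in a WT and a KO sample.
wt = [0, 15, 230, 1200, 48, 3, 560]
ko = [1, 14, 250, 1100, 52, 2, 600]
print(round(spearman(wt, ko), 3))  # 1.0
```

Because Spearman only uses ranks, any monotonic transformation such as log2 gives the same result, so it does not matter whether raw or log-scaled values are used.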

2.2 years ago
@ialbert

Always visualize your results by other means; in this case, align the reads against the genome. Once you look at your data in IGV, the answers will be more forthcoming.

Are you getting so few results because the data is perfect and everything comes out the same across all samples? (This is rare.)

Alternatively, perhaps the variability among your replicates is comparable to the variation across conditions, in which case you find so few results because the evidence for a difference between conditions is just not there.
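
To make the replicate-variability point concrete, here is a rough sketch that compares, for a single gene, the WT-vs-KO shift in log2 expression with the scatter among replicates. This is a simplified illustration, not what DESeq2 does internally (DESeq2 fits a negative binomial model with shrunken dispersion estimates). The function name, the pseudocount of 1 and the count values are hypothetical. Note that the within-group spread cannot be estimated from a single replicate per group.

```python
import math
from statistics import StatisticsError, mean, variance
from typing import Sequence, Tuple


def between_vs_within(wt: Sequence[float], ko: Sequence[float],
                      pseudocount: float = 1.0) -> Tuple[float, float]:
    """Compare the condition shift with replicate scatter for one gene.

    wt, ko: normalized counts, at least 2 replicates per group.
    Returns (difference of mean log2 expression KO - WT,
             pooled within-group standard deviation of log2 expression).
    """
    lw = [math.log2(c + pseudocount) for c in wt]
    lk = [math.log2(c + pseudocount) for c in ko]
    diff = mean(lk) - mean(lw)
    # statistics.variance needs >= 2 points, so one replicate per
    # group raises StatisticsError: there is no spread to estimate.
    pooled_var = (((len(lw) - 1) * variance(lw) + (len(lk) - 1) * variance(lk))
                  / (len(lw) + len(lk) - 2))
    return diff, math.sqrt(pooled_var)


# Replicates scatter about as much as the conditions differ.
diff, sd = between_vs_within([100, 140], [130, 95])
print(f"noisy gene: shift {diff:.2f}, replicate SD {sd:.2f}")

# Tight replicates and a roughly 4-fold shift.
diff, sd = between_vs_within([100, 105], [400, 420])
print(f"clear gene: shift {diff:.2f}, replicate SD {sd:.2f}")

try:
    between_vs_within([100], [400])
except StatisticsError:
    print("one replicate per group: within-group spread is not estimable")
```

When the shift is small relative to the replicate SD, no test will call the gene DE, however the threshold is set.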

Finally, I will say that I have had this problem only a few times, when studying brain samples from an excellent scientist whose data always showed nearly perfect, textbook-like consistency across replicates.

"Always visualize your results with other means, in this case align against the genome. Once you look at your data with IGV the answers will be more forthcoming."

Is there a way to get from the Salmon results to something like TopHat output, or do I have to go back and redo the alignment?

"Are you getting so few results because the data is perfect, everything comes out the same across all samples?"

How can I check for this? With something more than a correlation?

The best approach is to align the reads separately against the genome; you can use HISAT2 or even bwa mem for that.

If your samples are similar across conditions, it means there is no expression change. Correlation may not be informative in cases where long, well-correlated regions mask shorter, uncorrelated ones. In addition, correlation captures changes in the same direction, so it may miss changes that go in the same direction but with different magnitudes.

Correlation is useful for noisy data; for data that replicate too well across conditions, it becomes much less useful.
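
Both caveats about correlation can be checked with a toy example. This sketch uses made-up log2 expression values for 12 hypothetical genes and plain Pearson correlation. In case 1, Spearman would also give exactly 1 because the gene ranks do not change.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical log2 expression values for 12 genes in the WT sample.
wt = [float(v) for v in range(1, 13)]

# Case 1: every gene changes in the same direction, by different amounts
# (KO = 1.5 x WT on the log2 scale). The correlation is still exactly 1.
ko_scaled = [1.5 * v for v in wt]

# Case 2: eleven genes unchanged, one gene up by 3 log2 units (8-fold).
# The many unchanged genes keep the correlation high.
ko_one_gene = [4.0] + wt[1:]

print(round(pearson(wt, ko_scaled), 3))    # 1.0
print(round(pearson(wt, ko_one_gene), 3))  # 0.973
```

A correlation near 1 therefore says little about whether individual genes changed, which is why looking at the alignments directly is more informative.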
