Question

pvalues very low in Salmon and edgeR

6.4 years ago

Tania ▴ 180

Hi Everyone

I used Salmon and edgeR. Some of my differentially expressed genes have extremely low p-values, very close to zero. The same genes are also differentially expressed in cuffdiff, but with p-values around 0.001 rather than close to zero, so I am wondering why the significance values are so different. For example, geneA has a p-value of 0.001 in cuffdiff but a very small p-value, close to zero, with Salmon and edgeR.

Is this weird, or is it just because of the difference in the number of background genes between aligning to a genome and mapping to transcripts?

Thanks

RNA-Seq • 2.0k views

ADD COMMENT • link updated 6.2 years ago by Biostar 20 • written 6.4 years ago by Tania ▴ 180

Could you give more information about your input? Maybe even show the commands you're running?

ADD REPLY • link 6.4 years ago by Hussain Ather ▴ 990

Hi @Tania, are you trying to compare "wicked-fast transcript quantification" of Salmon with cuffdiff?

Also, you can search the Trinity group for a similar situation, as they use Salmon and edgeR too.

ADD REPLY • link 6.4 years ago by Farbod ★ 3.4k

I am trying to compare the gene expression results I got from Salmon (using the FMD index) and edgeR with what I got from cuffdiff.

ADD REPLY • link 6.4 years ago by Tania ▴ 180

Hi Tania,

First possibility:

Cuffdiff would have performed its differential expression comparisons on FPKM values; edgeR would have performed its differential expression comparisons on counts normalised by trimmed mean of M-values (TMM). (I hope that you supplied raw counts, not FPKM values, to edgeR?)

Second possibility:

Low sample numbers will produce very low P values.

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k

Thank you. Yes, I supplied counts to edgeR, not FPKM, so I am still not sure why.

ADD REPLY • link 6.4 years ago by Tania ▴ 180

FPKM values, which are normalised and used by Cuffdiff, are fundamentally different from the normalised counts used by edgeR. Evidence that has accumulated over the years implies that, with FPKM values, many false-positive associations will be made through differential expression analysis. This does not fully explain why edgeR is calling lower P values in your data, though.

What are your sample numbers?
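
To make the difference between the two kinds of values concrete, here is a rough Python sketch. It only computes FPKM next to a plain counts-per-million (CPM) value; it does not implement TMM or anything Cuffdiff or edgeR does internally. The gene names, counts, lengths, and library size are made-up numbers for illustration.

```python
def fpkm(count, length_bp, total_mapped):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return count * 1e9 / (length_bp * total_mapped)


def cpm(count, library_size):
    """Counts per million: scales by library size only, not by gene length."""
    return count * 1e6 / library_size


# Hypothetical example: two genes with the same raw count but different lengths.
genes = {"geneA": (500, 2000), "geneB": (500, 500)}  # name -> (count, length in bp)
total_mapped = 20_000_000  # hypothetical library size

for name, (count, length_bp) in genes.items():
    print(name, fpkm(count, length_bp, total_mapped), cpm(count, total_mapped))
```

Both genes have the same raw count and the same CPM, but their FPKM values differ four-fold because of gene length, so a test run on FPKM is working with different numbers than a test run on counts.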

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k

Thanks Kevin. You mean how many samples I have? I have 12 control vs 12 tumor samples. Here is a sample result:

"IGFBPL1",7.14991080788358,5.342277022899,4.68668105170228e-82,2.98151026239127e-78
"TMPRSS6",8.97546800618013,6.99003709974313,4.91094391964278e-65,1.33893378151975e-61

ADD REPLY • link 6.4 years ago by Tania ▴ 180

Hi, that makes sense because low sample numbers will result in low and unreliable P values and adjusted P values, like these. Your number of false-positive associations is higher with fewer samples, even after multiple testing correction. However, you can have confidence that the genes that are most differentially expressed are genuine results. It's the other ones, of lesser statistical significance, about which you need to be careful.
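
For reference, the multiple testing correction mentioned above is typically the Benjamini-Hochberg procedure (the default adjustment in edgeR's `topTags`). Here is a minimal pure-Python sketch of it, run on made-up P values rather than on the data in this thread; the function name is just illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(pvalues)
    # Indices of the P values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values for four genes.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each raw P value is scaled by the number of tests divided by its rank, so the adjusted value can never be smaller than the raw one.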

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k

Got you Kevin, so what do you think is a good cutoff?

Should I use a cutoff of, for example, 0.01 in edgeR, or something even more stringent?

ADD REPLY • link 6.4 years ago by Tania ▴ 180

I would go as low as FDR adjusted P < 0.0001 and absolute log2 fold change > 2.

There is no real way to know the exact best cutoff. You may have to go back and forth with it for a while.
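
As a rough sketch of applying such a cutoff, the Python below filters the two result rows posted earlier in the thread. The column order (gene, logFC, logCPM, PValue, FDR) is an assumption, since the posted rows have no header, and the function name and defaults are only illustrative.

```python
import csv
import io

# The two rows posted above; the column order below is ASSUMED, not confirmed.
RAW = '''"IGFBPL1",7.14991080788358,5.342277022899,4.68668105170228e-82,2.98151026239127e-78
"TMPRSS6",8.97546800618013,6.99003709974313,4.91094391964278e-65,1.33893378151975e-61
'''
COLUMNS = ["gene", "logFC", "logCPM", "PValue", "FDR"]  # assumed layout


def filter_hits(text, fdr_cutoff=1e-4, lfc_cutoff=2.0):
    """Return genes with FDR < fdr_cutoff and |log2 fold change| > lfc_cutoff."""
    hits = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        record = dict(zip(COLUMNS, row))
        if float(record["FDR"]) < fdr_cutoff and abs(float(record["logFC"])) > lfc_cutoff:
            hits.append(record["gene"])
    return hits


print(filter_hits(RAW))
```

Changing `fdr_cutoff` and `lfc_cutoff` and re-running is an easy way to see how many genes survive at each threshold.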