<Cuffdiff> Huge difference between compatible_count and total_count
5.7 years ago

Hi,

I use HISAT2 + Cuffdiff to process my 150 bp paired-end (150PE) mouse RNA-seq data.

Recently, I noticed huge differences between compatible_count and total_count in my results. Many genes are underestimated because their compatible_count is zero.

I've checked for the XS:A:(+/-) tag and it is present in my SAM/BAM files (example below). I also inspected the alignments visually in IGV and nothing looks strange.

I have also tried different Cuffdiff parameters, such as --total-hits-norm or --poisson-dispersion, to see whether anything improves, but they did not change compatible_count. The only progress is that with --total-hits-norm Cuffdiff recognizes the correct number of total counts (see the messages below).

My questions are:

  1. Which features does Cuffdiff use to decide whether a read pair is compatible?
  2. Are there any parameters that increase compatible_count?

Thank you very much for your help.

SAM example:

A00123:18:H3MHFD:1:2162:4182:19413  419 1   3054721 1   137M    =   3054721 -137    CTTAGGGGCTTGAGAAAGTTCTCGCCCTCTCACCTGGGGCCTAAGATTGTATCAAGATAACTATGACAATGGCCTGACCTTTAAGGTTCCGCTTCTAACAATCATAAAGCATCCATAGGACTTCCAGGTACCCGCCC   FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF::FFFFF   AS:i:-5 ZS:i:-5 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:5A131  YS:i:-5 YT:Z:CP XS:A:-  NH:i:3
A00123:18:H3MHFD:1:2162:4182:19413  339 1   3054721 1   137M    =   3054721 -137    CTTAGGGGCTTGAGAAAGTTCTCGCCCTCTCACCTGGGGCCTAAGATTGTATCAAGATAACTATGACAATGGCCTGACCTTTAAGGTTCCGCTTCTAACAATCATAAAGCATCCATAGGACTTCCAGGTACCCGCCC   FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF   AS:i:-5 ZS:i:-5 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:5A131  YS:i:-5 YT:Z:CP XS:A:-  NH:i:3

Default Cuffdiff message:

[12:34:11] Modeling fragment count overdispersion.
> Map Properties:
>   Normalized Map Mass: 749021.00
>   Raw Map Mass: 752500.47
>   Fragment Length Distribution: Empirical (learned)
>                 Estimated Mean: 272.39
>              Estimated Std Dev: 138.57
> Map Properties:
>   Normalized Map Mass: 749021.00
>   Raw Map Mass: 746990.97
>   Fragment Length Distribution: Empirical (learned)
>                 Estimated Mean: 274.52
>              Estimated Std Dev: 131.43
[12:35:12] Calculating preliminary abundance estimates
[12:35:12] Testing for differential expression and regulation in locus.

Cuffdiff message with --total-hits-norm:

[15:24:31] Modeling fragment count overdispersion.
> Map Properties:
>   Normalized Map Mass: 52162345.85
>   Raw Map Mass: 55210687.49
>   Fragment Length Distribution: Empirical (learned)
>                 Estimated Mean: 273.32
>              Estimated Std Dev: 140.81
> Map Properties:
>   Normalized Map Mass: 52162345.85
>   Raw Map Mass: 49297576.52
>   Fragment Length Distribution: Empirical (learned)
>                 Estimated Mean: 273.60
>              Estimated Std Dev: 131.36
[15:25:33] Calculating preliminary abundance estimates
RNA-Seq cuffdiff • 1.0k views
5.7 years ago

Sorry, I found that the problem was caused by a wrong strandedness setting at the HISAT2 mapping step.

After switching to the correct strandedness setting, Cuffdiff reports the expected compatible_count values.
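
For anyone who hits the same symptom, here is a minimal sketch of how to check which transcript strand a paired-end alignment implies under each HISAT2 `--rna-strandness` value (`FR` or `RF`). It assumes the convention described in the HISAT2 manual: with `FR`, read 1 is on the transcript strand, and with `RF`, read 2 is. The function name `implied_transcript_strand` is made up for illustration and is not part of any tool. The two flags are taken from the SAM example in the question.

```python
# SAM flag bits (from the SAM specification).
READ_REVERSE = 0x10    # this read is aligned to the reverse strand
FIRST_IN_PAIR = 0x40   # this read is read 1 of the pair
SECOND_IN_PAIR = 0x80  # this read is read 2 of the pair


def implied_transcript_strand(flag, rna_strandness):
    """Return '+' or '-' for a paired-end alignment.

    Assumed convention (hypothetical helper, not a HISAT2 API):
    'FR' means read 1 is sense to the transcript,
    'RF' means read 2 is sense to the transcript.
    """
    if rna_strandness not in ("FR", "RF"):
        raise ValueError("rna_strandness must be 'FR' or 'RF'")
    if not flag & (FIRST_IN_PAIR | SECOND_IN_PAIR):
        raise ValueError("flag does not describe a paired-end read")

    read_is_forward = not (flag & READ_REVERSE)
    is_read1 = bool(flag & FIRST_IN_PAIR)

    # Which mate lies on the transcript strand depends on the protocol.
    read_is_sense = is_read1 if rna_strandness == "FR" else not is_read1

    # A sense read shares the transcript's strand; an antisense read is opposite.
    transcript_is_forward = read_is_forward == read_is_sense
    return "+" if transcript_is_forward else "-"


# Flags of the two records in the SAM example from the question.
for flag in (419, 339):
    for setting in ("FR", "RF"):
        print(flag, setting, implied_transcript_strand(flag, setting))
# 419 FR -
# 419 RF +
# 339 FR -
# 339 RF +
```

Under these assumptions, the `XS:A:-` tag on both example records is what the `FR` setting would produce for that pair. This alone does not tell you which setting matches the library prep; that has to come from the kit documentation. The Cuffdiff `--library-type` option presumably needs to agree with it as well.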
