Question

Visualizing hg19 and hg38 FPKMs in a single plot

0

7.5 years ago

komal.rathi ★ 4.1k

Hello everyone,

I am trying to plot NCAM1 gene expression from 5 disparate RNA sequencing datasets - all are processed in the same way (STAR -> RSEM) and quantified in terms of FPKM. The problem is that I have 4 datasets mapped to hg38 and one is mapped to hg19. These are the coordinates of NCAM1 in hg19 and in hg38 from UCSC Genome Browser:

hg19: chr11:112,831,969-113,149,158
hg38: chr11:112,961,436-113,275,489

Can I plot NCAM1 expression across these datasets (in one plot) even though they were mapped and quantified using different genome references and annotations?

RNA-Seq • 2.9k views

ADD COMMENT • link updated 7.5 years ago by seidel 11k • written 7.5 years ago by komal.rathi ★ 4.1k

2

7.5 years ago

seidel 11k

If you have FPKM values, then the mapped reads have essentially already been normalized to the appropriate gene structure; as Santosh notes in his comment, "Note also that FPKM is normalized for transcript length". You can put them together in the same plot, but I would state in the legend that one dataset comes from a different reference and annotation. You still run the risk that the odd one out quantifies a different isoform, but unless you can work that out explicitly, the note in the legend is your only safeguard.

ADD COMMENT • link 7.5 years ago by seidel 11k

score 3 · Accepted Answer · 2016-11-08

3

7.5 years ago

Santosh Anand 5.7k

My guess is that from hg19 -> hg38 the only change is in the coordinates, not in the gene structure or annotation per se. If I remember correctly, GENCODE only does a liftOver of coordinates from hg19 -> hg38. In that case you can safely show the expression values in one plot. To be more certain, you can check whether the gene structure is the same in both annotations (it is possible that they quantify different splice forms because different annotations were used).

ADD COMMENT • link 7.5 years ago by Santosh Anand 5.7k
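
The gene-structure check suggested above can be sketched in a few lines of Python. This is only an illustration: the exon coordinates are made up, and `exon_lengths` and `same_structure` are hypothetical helper names. In practice the coordinates would be read from the two GTF files actually used for quantification. Comparing exon lengths rather than raw positions makes the check independent of the coordinate shift between hg19 and hg38.

```python
def exon_lengths(exons):
    """Exon lengths in genomic order, from 1-based inclusive (start, end) pairs."""
    return [end - start + 1 for start, end in sorted(exons)]


def same_structure(exons_a, exons_b):
    """True if both transcripts have the same number of exons with the same lengths."""
    return exon_lengths(exons_a) == exon_lengths(exons_b)


# Hypothetical exon coordinates for one transcript (NOT real NCAM1 values).
hg19_exons = [(1000, 1200), (5000, 5150), (9000, 9400)]
# Same transcript model, shifted by a constant offset in the other assembly.
hg38_exons = [(131000, 131200), (135000, 135150), (139000, 139400)]
# A different splice form: the middle exon is skipped.
alt_exons = [(131000, 131200), (139000, 139400)]

print(same_structure(hg19_exons, hg38_exons))
print(same_structure(hg19_exons, alt_exons))
print(sum(exon_lengths(hg19_exons)), sum(exon_lengths(alt_exons)))
```

If the exon lists differ, the summed exon lengths give a quick sense of how far the transcript lengths, and therefore the length normalization, diverge between the two annotations.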

0

Thanks - so the hg38-based data used GENCODE and the hg19-based data used RefSeq. I don't know if there is much difference in the gene models; looking at the UCSC Genome Browser, they appear to differ only slightly.

ADD REPLY • link 7.5 years ago by komal.rathi ★ 4.1k

0

GENCODE and RefSeq can differ by small amounts at the transcript ends, probably because GENCODE is curated for accurate gene structure (both 5' and 3' ends). But if the overall gene structure is the same, a few bp here or there will not change the FPKM much. (Note also that FPKM is normalized for transcript length.)

ADD REPLY • link 7.5 years ago by Santosh Anand 5.7k
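
The point about length normalization can be made concrete with the textbook FPKM definition. The sketch below uses made-up numbers and a hypothetical helper `fpkm`. It also makes the simplifying assumption that the fragment count stays fixed while the annotated length changes by 50 bp. RSEM's reported FPKM comes from its own model-based estimates and effective lengths, so this is an approximation of the idea, not a reproduction of RSEM's calculation.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical values, chosen only for illustration.
fragments = 5_000
library_size = 40_000_000

fpkm_a = fpkm(fragments, 6_000, library_size)  # transcript model in annotation A
fpkm_b = fpkm(fragments, 6_050, library_size)  # same model, 50 bp longer at the ends

relative_change = abs(fpkm_a - fpkm_b) / fpkm_a
print(f"{fpkm_a:.2f} vs {fpkm_b:.2f} FPKM ({relative_change:.1%} difference)")
```

With these numbers the two values differ by less than 1%, which is small compared with typical differences between datasets. A missing or extra exon would change the length, and the FPKM, far more.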

0

Thanks for clarifying - can you move this to an answer so I can accept it?

ADD REPLY • link 7.5 years ago by komal.rathi ★ 4.1k

0

"can you move this to an answer so I can accept it?"

How to do that?

ADD REPLY • link 7.5 years ago by Santosh Anand 5.7k

0

I'm not sure if you can move it, but I can so I did :p

ADD REPLY • link 7.5 years ago by WouterDeCoster 47k

0

Appreciate that very much, thank you :)

ADD REPLY • link 7.5 years ago by Santosh Anand 5.7k