GRCh38 counts significantly lower than for GRCh37?
5.4 years ago
ab123 ▴ 50

Hi there,

Quick question:

Are gene annotations specific to the sequencing platform used for RNA-Seq?

I'm looking at a dataset produced with an Illumina HiSeq 2500. I've aligned it against both human GRCh38 and the older GRCh37. So far so good: the alignment files are the same size.

When I then count the reads, I get almost 60,000 for the GRCh37 GTF but only 17,000 for the GRCh38 GTF. Both GTF files come from ensembl.org.

I'm not sure what's wrong. Is there any explanation for this?

Many thanks!

RNA-Seq grch annotation ensembl • 1.0k views

alignment files are the same size

File size tells you nothing unless it is an extreme value, so you cannot base a "so far so good" statement on it.
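A more informative check is how many reads actually aligned in each file, which is the kind of number `samtools flagstat` reports. Here is a minimal sketch of the idea in plain Python, assuming the alignments are available as SAM text (for example from `samtools view -h`). The function name and the toy records are made up for illustration.

```python
def count_mapped(sam_lines):
    """Return (mapped, total) over primary alignment records in SAM text."""
    mapped = total = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # column 2 is the bitwise FLAG
        # Skip secondary (0x100) and supplementary (0x800) records so that
        # each read is counted once.
        if flag & 0x900:
            continue
        total += 1
        if not flag & 0x4:  # bit 0x4 set means the read is unmapped
            mapped += 1
    return mapped, total


# Toy records, not real data: one mapped read, one unmapped read,
# and one secondary alignment of the first read.
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\t1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r1\t256\t2\t500\t0\t5M\t*\t0\t0\tACGTA\tIIIII",
]
print(count_mapped(toy_sam))
```

Comparing these two numbers between the GRCh37 and GRCh38 alignments says far more than comparing file sizes.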


60,000 for the GRCh37 GTF but only 17,000 for the GRCh38 GTF

What do those numbers refer to? Genes in your GTF file? Major genome builds are released a few years apart for a reason: a new build can include substantial refinements in information content.
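One way to see what each GTF actually contains is to tally its feature types (column 3). This is a rough sketch, assuming a standard nine-column, tab-separated GTF. The function name and the toy lines are invented for illustration, and the file path in the final comment is hypothetical.

```python
from collections import Counter


def feature_counts(gtf_lines):
    """Tally the feature type (column 3) of every record in a GTF."""
    counts = Counter()
    for line in gtf_lines:
        # Skip comment/header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:  # ignore malformed records
            counts[fields[2]] += 1
    return counts


# Toy annotation, not real Ensembl content.
toy_gtf = [
    "#!genome-build toy",
    '1\ttoy\tgene\t100\t900\t.\t+\t.\tgene_id "G1";',
    '1\ttoy\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    '1\ttoy\ttranscript\t200\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
    '1\ttoy\texon\t100\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
]
print(dict(feature_counts(toy_gtf)))

# On a real file (hypothetical path):
# with open("annotation.gtf") as fh:
#     print(feature_counts(fh))
```

Running it on both GTFs shows whether 60,000 and 17,000 look like gene totals, transcript totals, or neither.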


I am not sure whether GRCh38 would really contain only 17,000 genes. The numbers refer to the final counts that are then used for differential expression analysis. Would GRCh38 contain significantly fewer genes?

5.3 years ago
Emily 23k

Two possibilities:

  1. You're getting counts of genes in GRCh38 and counts of transcripts in GRCh37.

  2. You've used a protein-coding only reference file for GRCh38 and a complete genes reference file for GRCh37.
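Both possibilities can be checked quickly. The sketch below assumes the count table's feature IDs are human Ensembl stable IDs (`ENSG…` for genes, `ENST…` for transcripts) and that the GTF's gene records carry a `gene_biotype` attribute, as recent Ensembl GTFs do; older releases may be laid out differently. The function names, IDs and GTF lines are made up for illustration.

```python
import re
from collections import Counter

# Assumption: the biotype is stored as: gene_biotype "protein_coding";
BIOTYPE = re.compile(r'gene_biotype "([^"]+)"')


def id_kinds(feature_ids):
    """Possibility 1: are the counted features genes or transcripts?"""
    kinds = Counter()
    for fid in feature_ids:
        if fid.startswith("ENSG"):
            kinds["gene"] += 1
        elif fid.startswith("ENST"):
            kinds["transcript"] += 1
        else:
            kinds["other"] += 1
    return kinds


def gene_biotypes(gtf_lines):
    """Possibility 2: which gene biotypes does the GTF contain?"""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only look at gene records so each gene is counted once.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = BIOTYPE.search(fields[8])
        counts[match.group(1) if match else "unknown"] += 1
    return counts


# Toy inputs, not real Ensembl IDs or annotation.
toy_ids = ["ENSG00000000001", "ENST00000000001", "ENST00000000002"]
toy_gtf = [
    '1\ttoy\tgene\t1\t9\t.\t+\t.\tgene_id "ENSG00000000001"; gene_biotype "protein_coding";',
    '1\ttoy\tgene\t20\t90\t.\t-\t.\tgene_id "ENSG00000000002"; gene_biotype "lincRNA";',
    '1\ttoy\ttranscript\t1\t9\t.\t+\t.\tgene_id "ENSG00000000001"; gene_biotype "protein_coding";',
]
print(dict(id_kinds(toy_ids)))
print(dict(gene_biotypes(toy_gtf)))
```

If one count table is dominated by `ENST` IDs and the other by `ENSG` IDs, that points to the first possibility. If the GRCh38 GTF reports only `protein_coding` while the GRCh37 one lists many biotypes, that points to the second.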
