Biostar Beta. Not for public use.
Question: Generating count table
1
Entering edit mode

Hi all,

I've been trying to calculate TPM values for a count table, thus I've been using biomaRt package to obtain gene lengths. However, I'm using an NCBI annotation file and my count table has gene_name's as row names (I'm using gene_name instead of gene_id, because I receive an error otherwise). Thus, when I try to merge gene length table and my count table, I get an empty data frame (I assume that the gene names in my annotation file and biomaRt provides are not matching or maybe I'm doing wrong something else). Here is how my code looks:

rsubread.counts<-featureCounts(files=E08_bam,
                               annot.ext="/media/gokberk/DATA/Thesis/Data/Annotations/Macaca_fascicularis_5.0_genomic.gtf",
                               isGTFAnnotationFile = TRUE,
                               GTF.featureType="exon",
                               GTF.attrType="gene_name",
                               useMetaFeatures = TRUE,
                               allowMultiOverlap = FALSE)

cnt<-rsubread.counts$counts

ensembl<-useMart("ensembl",dataset="mfascicularis_gene_ensembl",host="www.ensembl.org")

geneLength<-getBM(attributes=c("ensembl_gene_id","ensembl_transcript_id","external_gene_name","transcript_length"),
                  filter = "external_gene_name",
                  values = rownames(cnt),
                  mart = ensembl)

merged<-geneLength %>%
        group_by(ensembl_gene_id,external_gene_name) %>%
        summarise(Length= round(mean(transcript_length))) %>%
        rename(GeneID=external_gene_name)

countdata<-data.frame(GeneID=rownames(cnt),cnt)

countdata<- merged %>%
            dplyr::select(GeneID, Length) %>%
            inner_join(countdata)

After running these, head(countdata) and dim(countdata) commands give following outputs:

# A tibble: 0 x 206
# Groups:   GeneID [0]
# … with 206 variables: GeneID <chr>, Length <dbl>, X.media.gokberk.DATA.Thesis.Data.BAM_files.E08.SRR2930102.bam <int>,
#   X.media.gokberk.DATA.Thesis.Data.BAM_files.E08.SRR2930103.bam <int>, X.media.gokberk.DATA.Thesis.Data.BAM_files.E08.SRR2930104.bam <int>,
#   X.media.gokberk.DATA.Thesis.Data.BAM_files.E08.SRR2930105.bam <int>, X.media.gokberk.DATA.Thesis.Data.BAM_files.E08.SRR2930106.bam <int>,
#   X.media.gokberk.DATA.Thesis.Data.BAM_files.E08.SRR2930107.bam <int>, X.media.gokberk.DATA.Thesis.Data.BAM_files.E08.SRR2930108.bam <int>,
#   X.media.gokberk.DATA.Thesis.Data.BAM_files.E08.SRR2930109.bam <int>, X.media.gokberk.DATA.Thesis.Data.BAM_files.E08.SRR2930110.bam <int>,
#   X.media.gokberk.DATA.Thesis.Data.BAM_files.E08.SRR2930111.bam <int>, X.media.gokberk.DATA.Thesis.Data.BAM_files.E08.SRR2930112.bam <int>,...

[1]   0 206

I was wondering if the issue here is actually the discrepancy between my annotation file and gene names provided by getBM and if so how could I fix it. Thanks a lot!

Cheers, Gökberk

ADD COMMENTlink 9 months ago gokberk • 30
4
Entering edit mode

The featureCounts function should give you the correct length of the transcript/gene. There should be a column named length.

ADD COMMENTlink 9 months ago Benn 6.9k
Entering edit mode
0

Yes, I found it and solved my problem, thanks a lot!

ADD REPLYlink 9 months ago
gokberk
• 30

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.0