GTF File Errors When Using easyRNASeq To Produce Count Tables
11.1 years ago
e.karasmani ▴ 140

Dear All,

I am not an expert, so I would appreciate your help with the following problem.

I used TopHat to align a FASTQ file from an RNA-seq experiment, and I now want to make a count table to use in DESeq. I tried to use easyRNASeq for that purpose, but I ran into a problem.

Here is my code:

library(easyRNASeq)
library(BSgenome.Mmusculus.UCSC.mm9)


dataDir <- "/data/lena/m"
file.exists(dataDir)
stopifnot(file.exists(dataDir))
list.files(dataDir)


gtf <- "/home/lena/mm9_IlluminaAnnotation_genes.gtf"
rna <- "accepted_hits.bam"


c_ers <- easyRNASeq(organism = "Mmusculus", 
               chr.sizes = as.list(seqlengths(Mmusculus)),
               annotationMethod = "gtf", annotationFile= gtf, 
               format = "bam", count = "genes", 
               summarization = "geneModels",outputFormat="DESeq", 
               filenames = rna, filesDirectory = dataDir
               )

And here is what I get:

Checking arguments...
Fetching annotations...
Read 566539 records
Error in .getGtfRange(organismName(obj), filename = filename, ignoreWarnings = ignoreWarnings,  :
  Your gtf file: /home/lena/mm9_IlluminaAnnotation_genes.gtf does not contain all the required fields: gene_id, transcript_id, exon_number, gene_name.
In addition: Warning messages:
1: The use of the list for providing chromosome sizes has been deprecated. Use a named numeric vector instead.
2: In .Method(..., deparse.level = deparse.level) :
  number of columns of result is not a multiple of vector length (arg 17)

What am I doing wrong, and what should I do? Could you please advise me?

This is what my GTF file looks like:

chr1    unknown    exon    3204563    3207049    .    -    .    gene_id "Xkr4"; gene_name "Xkr4"; p_id "P2671"; transcript_id "NM_001011874"; tss_id "TSS1758";
chr1    unknown    stop_codon    3206103    3206105    .    -    .    gene_id "Xkr4"; gene_name "Xkr4"; p_id "P2671"; transcript_id "NM_001011874"; tss_id "TSS1758";
chr1    unknown    CDS    3206106    3207049    .    -    2    gene_id "Xkr4"; gene_name "Xkr4"; p_id "P2671"; transcript_id "NM_001011874"; tss_id "TSS1758";

I also tried the biomaRt approach. This is what I did:

library(biomaRt)

# Connect to the May 2009 Ensembl archive for mouse
ensembl <- useMart(host = 'may2009.archive.ensembl.org', biomart = 'ENSEMBL_MART_ENSEMBL', dataset = 'mmusculus_gene_ensembl')
# Retrieve gene coordinates, identifiers and strand
ensembl.genes <- getBM(attributes = c('chromosome_name', 'start_position', 'end_position',
                                      'ensembl_gene_id', 'external_gene_id', 'strand'),
                       mart = ensembl)


c_ers <- easyRNASeq(organism = "Mmusculus", 
               chr.sizes = as.list(seqlengths(Mmusculus)),
               annotationMethod = "biomaRt", annotationFile= ensembl.genes, 
               format = "bam", count = "genes", 
               summarization = "geneModels",outputFormat="DESeq", 
               filenames = rna, filesDirectory = dataDir
               )


Checking arguments...
Fetching annotations...
Error in easyRNASeq(organism = "Mmusculus", chr.sizes = as.list(seqlengths(Mmusculus)),  :
  The number of conditions: 0 did not correspond to the number of samples: 1
In addition: Warning message:
The use of the list for providing chromosome sizes has been deprecated. Use a named numeric vector instead.

Could you please help me? I am stuck and don't know what to do.

Thank you in advance! I will appreciate your answers.

Best regards,
Lena

11.1 years ago

The error message appears to be self-explanatory:

Your gtf file: /home/lena/mm9_IlluminaAnnotation_genes.gtf does not contain all the required fields: gene_id, transcript_id, exon_number, gene_name

And indeed, the GTF lines you posted do not contain all of these fields: gene_id, gene_name and transcript_id are present, but exon_number is missing.
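
If exon_number really is the only missing attribute (as the lines in the question suggest), one possible workaround is to add it to the file yourself. The following is a minimal base-R sketch, not a tested easyRNASeq recipe: `missing_attributes` and `add_exon_number` are hypothetical helper names, the second exon line in the example is invented for illustration, and the numbering convention (exons counted from the 5' end of each transcript) is an assumption. Whether easyRNASeq accepts the resulting file has not been verified here.

```r
# Required attributes, copied from the easyRNASeq error message.
required <- c("gene_id", "transcript_id", "exon_number", "gene_name")

# Hypothetical helper: which required attributes are absent from
# column 9 (the attribute column) of a single GTF line?
missing_attributes <- function(gtf_line, required_fields = required) {
  attrs <- strsplit(gtf_line, "\t", fixed = TRUE)[[1]][9]
  present <- vapply(required_fields,
                    function(f) grepl(paste0("(^|;\\s*)", f, " "), attrs),
                    logical(1))
  required_fields[!present]
}

# Hypothetical helper: append exon_number to every "exon" line.
# Assumption: exons are numbered per transcript_id from the 5' end,
# i.e. ascending start on "+" and descending start on "-".
add_exon_number <- function(gtf_lines) {
  fields  <- strsplit(gtf_lines, "\t", fixed = TRUE)
  feature <- vapply(fields, `[`, character(1), 3)
  start   <- as.numeric(vapply(fields, `[`, character(1), 4))
  strand  <- vapply(fields, `[`, character(1), 7)
  attrs   <- vapply(fields, `[`, character(1), 9)
  tx      <- sub('.*transcript_id "([^"]+)".*', "\\1", attrs)

  is_exon <- !is.na(feature) & feature == "exon"
  key     <- ifelse(strand == "-", -start, start)

  # Rank exons within each transcript.
  exon_no <- rep(NA_real_, length(gtf_lines))
  exon_no[is_exon] <- ave(key[is_exon], tx[is_exon],
                          FUN = function(x) rank(x, ties.method = "first"))

  # Non-exon lines (CDS, stop_codon, comments) are returned unchanged.
  out <- gtf_lines
  out[is_exon] <- paste0(sub("\\s+$", "", gtf_lines[is_exon]),
                         ' exon_number "', exon_no[is_exon], '";')
  out
}

# Example: the first and third lines follow the question; the second exon is invented.
attr9 <- 'gene_id "Xkr4"; gene_name "Xkr4"; p_id "P2671"; transcript_id "NM_001011874"; tss_id "TSS1758";'
example <- c(
  paste("chr1", "unknown", "exon", "3204563", "3207049", ".", "-", ".", attr9, sep = "\t"),
  paste("chr1", "unknown", "exon", "3411783", "3411982", ".", "-", ".", attr9, sep = "\t"),
  paste("chr1", "unknown", "CDS",  "3206106", "3207049", ".", "-", "2", attr9, sep = "\t")
)

print(missing_attributes(example[1]))
fixed <- add_exon_number(example)
cat(fixed, sep = "\n")
```

To apply this to a real file, you could read it with `readLines()`, pass the lines through `add_exon_number()`, and write the result to a new file with `writeLines()`, leaving the original GTF untouched. If the same transcript_id occurs at more than one locus, the numbering would need to be grouped by chromosome as well.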
