htseq-count error: "does not contain a 'gene_id' attribute"
7.9 years ago

I'm attempting to use htseq-count to get read counts for some *Plasmodium* RNA-seq data. The command line looks as follows:

htseq-count --format=bam --order=pos 3D7_1-1.bam PlasmoDB-28_Pfalciparum3D7.gff

However, when I run the command I get the following error:

Error occured when processing GFF file (line 40 of file PlasmoDB-28_Pfalciparum3D7.gff):
  Feature exon_PF3D7_0100100-1 does not contain a 'gene_id' attribute
  [Exception type: ValueError, raised in count.py:53]

Is there a way to tell htseq-count to ignore any features that don't have a 'gene_id' attribute, or a way to convert the GFF I have to a GTF? Any ideas or help would be much appreciated.

Thanks,

RNA-Seq htseq-count • 6.3k views

I have similar issues and have tried the various guides provided.

My GFF file

##gff-version 3
scaffold_252    maker   gene    22873   47327   .   -   .   ID=CR513_001491;Name=CR513_001491;Alias=augustus_masked-scaffold_252-processed-gene-0.0;Dbxref=InterPro:IPR000719,Pfam:PF00069;Ontology_term=GO:0004672,GO:0005524,GO:0006468;
scaffold_252    maker   mRNA    22873   47327   .   -   .   ID=CR513_001491-RA;Parent=CR513_001491;Name=CR513_001491-RA;Alias=augustus_masked-scaffold_252-processed-gene-0.0-mRNA-1;Dbxref=InterPro:IPR000719,Pfam:PF00069;Ontology_term=GO:0004672,GO:0005524,GO:0006468;_AED=0.50;_QI=0|0.08|0.04|0.16|1|1|25|0|927;_eAED=0.50;_merge_warning=1;
scaffold_252    maker   exon    47325   47327   .   -   .   ID=CR513_001491-RA:25;Parent=CR513_001491-RA;
scaffold_252    maker   exon    46552   46666   .   -   .   ID=CR513_001491-RA:24;Parent=CR513_001491-RA;
scaffold_252    maker   exon    44591   44628   .   -   .   ID=CR513_001491-RA:23;Parent=CR513_001491-RA;
scaffold_252    maker   exon    44455   44507   .   -   .   ID=CR513_001491-RA:22;Parent=CR513_001491-RA;

I tried

htseq-count  ~_sorted.bam ~standard_functional_blast_interproscan.gff -s no  --idattr=exon -f bam

it returned

Error processing GFF file (line 4 of file ~standard_functional_blast_interproscan.gff):
  Feature CR513_001491-RA:25 does not contain a 'exon' attribute
  [Exception type: ValueError, raised in features.py:329]

However when I run

htseq-count  ~sorted.bam ~standard_functional_blast_interproscan.gff

it returns

Error processing GFF file (line 4 of file ~standard_functional_blast_interproscan.gff):
  Feature CR513_001491-RA:25 does not contain a 'gene_id' attribute
  [Exception type: ValueError, raised in features.py:329]

Any possible solution to this? Thanks


I can't see your GFF, but have you tried using Parent instead of gene_id for the -i parameter?
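One caveat with Parent: in a GFF3 file where exon lines carry only ID and Parent (as in the excerpt above), the exon's Parent is the mRNA, so counts would be grouped per transcript (e.g. CR513_001491-RA) rather than per gene. If gene-level counts are wanted, one option is to write a gene_id onto each exon by following the Parent links upward. The following is only a rough sketch: the function names and the sample lines are made up for illustration, it assumes tab-separated columns and a single Parent per feature, and the output stays in GFF3 `key=value` syntax, so it is worth testing on a few lines before running htseq-count on the whole file.

```python
def parse_attributes(column9):
    """Parse a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        key, sep, value = field.strip().partition("=")
        if sep and key:
            attrs[key] = value
    return attrs


def add_gene_id_to_exons(gff_lines):
    """Return exon lines with a gene_id attribute derived from Parent links."""
    rows = []
    parent_of = {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attributes(cols[8])
        if "ID" in attrs and "Parent" in attrs:
            # Assumption: one parent only; values like "a,b" are not handled.
            parent_of[attrs["ID"]] = attrs["Parent"]
        rows.append((cols, attrs))

    out = []
    for cols, attrs in rows:
        if cols[2] != "exon" or "Parent" not in attrs:
            continue
        # Walk upward (exon -> mRNA -> gene) until a feature without a Parent.
        top = attrs["Parent"]
        seen = set()
        while top in parent_of and top not in seen:
            seen.add(top)
            top = parent_of[top]
        new_attrs = cols[8].rstrip(";") + ";gene_id=" + top
        out.append("\t".join(cols[:8] + [new_attrs]))
    return out


# Illustrative lines modelled on the excerpt above (tabs written explicitly).
SAMPLE = [
    "##gff-version 3",
    "scaffold_252\tmaker\tgene\t22873\t47327\t.\t-\t.\tID=CR513_001491;Name=CR513_001491;",
    "scaffold_252\tmaker\tmRNA\t22873\t47327\t.\t-\t.\tID=CR513_001491-RA;Parent=CR513_001491;",
    "scaffold_252\tmaker\texon\t47325\t47327\t.\t-\t.\tID=CR513_001491-RA:25;Parent=CR513_001491-RA;",
]

for exon_line in add_gene_id_to_exons(SAMPLE):
    print(exon_line)
```

The output contains exon lines only, which matches htseq-count's default feature type (exon), so the default gene_id attribute could then be used without --idattr.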

7.9 years ago
igor 13k

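By default, htseq-count looks for a "gene_id" attribute whether the annotation is a GTF or a GFF. You can use a different attribute with the --idattr parameter, which the documentation describes as follows: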

GFF attribute to be used as feature ID. Several GFF lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identity the counts in the output table. The default, suitable for RNA-Seq analysis using an Ensembl GTF file, is gene_id.

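First, you need to scan your GFF and see whether it has an attribute that makes sense as the feature ID. Note that --idattr takes an attribute name from the last column (such as ID or Parent), not a feature type such as exon. It's also possible that you will have to modify the file (if different entries have different attributes, for example).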
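To scan the file, something like the following rough sketch can tally which attribute keys appear on each feature type. The function name and the sample lines are made up for illustration, and it assumes tab-separated GFF3 with `key=value` attributes.

```python
from collections import Counter, defaultdict


def attribute_keys_by_type(gff_lines):
    """Count how often each attribute key appears for each feature type."""
    keys = defaultdict(Counter)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        feature_type, attributes = cols[2], cols[8]
        for field in attributes.split(";"):
            key, sep, _value = field.strip().partition("=")
            if sep and key:
                keys[feature_type][key] += 1
    return keys


# Illustrative lines only; in practice pass an open file handle instead.
SAMPLE = [
    "##gff-version 3",
    "scaffold_252\tmaker\tgene\t22873\t47327\t.\t-\t.\tID=CR513_001491;Name=CR513_001491;",
    "scaffold_252\tmaker\tmRNA\t22873\t47327\t.\t-\t.\tID=CR513_001491-RA;Parent=CR513_001491;",
    "scaffold_252\tmaker\texon\t47325\t47327\t.\t-\t.\tID=CR513_001491-RA:25;Parent=CR513_001491-RA;",
]

for feature_type, counter in attribute_keys_by_type(SAMPLE).items():
    print(feature_type, dict(counter))
```

If the feature type you count on (exon by default) never carries gene_id, the tally shows which keys are available to pass to --idattr instead, and whether every line of that type has them.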
