How to obtain a GTF file for plant genomes?
10 months ago
Kumar ▴ 120

I would like to perform RNA-seq analysis for a plant genome, for which I need to download the genome and GTF files of the plant. However, the NCBI database has a GFF file instead of a GTF file, and the Ensembl Plants database also seems to have GFF files only. The gff file is not compatible to obtain assembly with annotation using HISAT pipeline. Kindly help me fix this issue.

Thanks in advance.

Ensembl Assembly RNA-seq Gene Genome • 1.4k views

Ensembl Plants does have GTF files. The landing pages of the individual organisms indeed only have a button for GFF3, but you can find the GTF manually:

Imagine you have this link: https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-56/gff3/hordeum_vulgare/

Simply change to https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-56/gtf/hordeum_vulgare/

...replacing gff3 with gtf in the path. That should work.
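The path swap described above can be sketched in the shell. This is only an illustration: it rewrites the example URL from this post and prints the result, and it assumes the release keeps parallel gff3/ and gtf/ directories, as release-56 does for hordeum_vulgare.

```shell
#!/usr/bin/env bash
# Example GFF3 directory URL taken from the post above.
gff3_url="https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-56/gff3/hordeum_vulgare/"

# Swap the /gff3/ path component for /gtf/ (assumes the same directory layout).
gtf_url=$(printf '%s\n' "$gff3_url" | sed 's#/gff3/#/gtf/#')

echo "$gtf_url"
```

Opening the printed URL in a browser lists the GTF files available for that species.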


Thank you ATpoint. However, I need a GTF file for the Punica granatum genome. It is not available in the Ensembl Plants database, only in the NCBI database, and NCBI has a GFF file for Punica granatum. In this case, how can I obtain a GTF file for Punica granatum?


The gff file is not compatible to obtain assembly with annotation using HISAT pipeline.

HISAT is not an assembly program. I assume you are referring to this application: Annotation (.gff) and .fasta files as index in Hisat2

Which specific genome are you referring to?


Sorry GenoMax, it is an indexing process; I mistakenly described it as assembly. I need to perform RNA-seq analysis for Punica granatum (pomegranate). It is not available in the Ensembl Plants database, only in NCBI. In this case, how can I obtain a GTF file for Punica granatum?


You can try the AGAT toolkit to convert your GFF to GTF: https://agat.readthedocs.io/en/latest/gff_to_gtf.html
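A minimal sketch of that conversion, assuming AGAT is installed and on your PATH. The input name genomic.gff is a hypothetical placeholder for the GFF file you downloaded from NCBI; check the linked AGAT page for the options that suit your annotation.

```shell
#!/usr/bin/env bash
in_gff="genomic.gff"              # hypothetical input file name
out_gtf="${in_gff%.gff}.gtf"      # same base name with a .gtf extension

# Run the conversion only when the AGAT script and the input file are both present.
if command -v agat_convert_sp_gff2gtf.pl >/dev/null 2>&1 && [ -f "$in_gff" ]; then
    agat_convert_sp_gff2gtf.pl --gff "$in_gff" -o "$out_gtf"
else
    echo "AGAT not found on PATH or $in_gff missing; skipping conversion" >&2
fi
```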


Thank you GenoMax. Let me try the AGAT toolkit as you suggested.


Thank you GenoMax, ATpoint, and MirianT_NCBI for your valuable guidance. I was able to get the GTF file.

10 months ago
MirianT_NCBI ▴ 720

Hi Kumar,

You can use NCBI Datasets for retrieving GTF files. Here's the command you can use:

datasets download genome taxon "Punica granatum" --include gtf

This command will download a data package with only the GTF files for the available genomes for this taxon. You can also include other data files, such as genome FASTA, GFF, etc.
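As a rough sketch of including other data files, the function below requests the GTF together with the genome FASTA for a single assembly instead of the whole taxon. The accession GCF_007655135.1 is one of the Punica granatum assemblies in this data package; the output file name punica_refseq.zip is a hypothetical choice.

```shell
#!/usr/bin/env bash
accession="GCF_007655135.1"     # one Punica granatum assembly from this data package
package="punica_refseq.zip"     # hypothetical output file name

download_package() {
    # --include takes a comma-separated list of file types.
    datasets download genome accession "$accession" \
        --include gtf,genome --filename "$package"
}

# Call download_package once the NCBI Datasets CLI is installed and you are online.
```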

After unzipping the data package, you will find the folder structure below:

unzip ncbi_dataset.zip -d punica
Archive:  ncbi_dataset.zip
  inflating: punica/README.md        
  inflating: punica/ncbi_dataset/data/assembly_data_report.jsonl  
  inflating: punica/ncbi_dataset/data/GCA_002201585.1/genomic.gtf  
  inflating: punica/ncbi_dataset/data/GCA_002837095.1/genomic.gtf  
  inflating: punica/ncbi_dataset/data/GCF_007655135.1/genomic.gtf  
  inflating: punica/ncbi_dataset/data/dataset_catalog.json
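The package above contains one GTF per assembly. Accessions starting with GCF_ are RefSeq assemblies and those starting with GCA_ are GenBank assemblies, so a small helper like the one below (refseq_gtfs is a hypothetical name) can list only the RefSeq annotation. It assumes the package was unzipped into punica as shown.

```shell
#!/usr/bin/env bash
# Print the path of every RefSeq (GCF_) GTF inside an unzipped data package.
refseq_gtfs() {
    find "$1/ncbi_dataset/data" -path '*/GCF_*/genomic.gtf'
}

# Only search when the unzipped package directory is present.
if [ -d punica/ncbi_dataset/data ]; then
    refseq_gtfs punica
fi
```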

I hope this helps. Feel free to reach out if you have any questions.


Another alternative is to go to: https://www.ncbi.nlm.nih.gov/datasets/taxonomy/22663/

Click the "Download" button, choose the RefSeq or GenBank version, and then choose the "GTF" option. Uncheck "fasta sequence".
