Error reading a .gtf file in R
5.9 years ago
modarzi

Hi, for my TCGA data set I need to load the GENCODE v22 annotation (gencode.v22.annotation.gtf.gz) into R. For this purpose, I installed the “refGenome” package and ran the code below in R on both Windows and Linux:

setwd("E:/GTF")
library(refGenome)
# create ensemblGenome object for storing Ensembl genomic annotation data
ens <- ensemblGenome()
# read GTF file into ensemblGenome object
read.gtf(ens, "gencode.v22.annotation.gtf")

When I try to read the .gtf file in RStudio on Windows, R crashes and I have to restart RStudio. On Linux I get this message:

The application R has closed unexpectedly.

Clicking the “Show details” button shows this message:

R crashed with SIGABRT in __gnu_cxx::__verbose_terminate_handler

In addition, I downloaded gencode.v22.annotation.gtf.gz from 2 sources:

1- https://api.gdc.cancer.gov/data/fe1750e4-fc2d-4a2c-ba21-5fc969a24f27

2- https://www.encodeproject.org/files/gencode.v22.annotation/@@download/gencode.v22.annotation.gtf.gz

I would appreciate any comments.

Best regards,

Mohammad

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar when you compose or edit a post.

I think this issue is the same as "read .gtf in R". Instead of RStudio, try running the code in the plain R console.

Thanks, but the "read .gtf in R" question was also mine. Anyway, can I use another version of the annotation, for example the latest one, or do I have to use exactly that version? I would appreciate your comments. Best regards, Mohammad

It seems refGenome has trouble reading GTF files from GENCODE (both the primary and the main files). I checked both v22 and v28 (the latest). One alternative is to use the GTF from Ensembl; if you don't want to use Ensembl annotations, I have described another way to use the GENCODE GTF below. To use the Ensembl annotation, download the GTF file from Ensembl (ftp://ftp.ensembl.org/pub/release-92/gtf/homo_sapiens), file Homo_sapiens.GRCh38.92.gtf.gz, and unzip it before loading it into the package. It worked on my machine:

> library(refGenome)
Loading required package: doBy
Loading required package: RSQLite

> ens <- ensemblGenome()      

> read.gtf(ens, "Homo_sapiens.GRCh38.92.gtf")
[read.gtf.refGenome] Reading file 'Homo_sapiens.GRCh38.92.gtf'.
[GTF]  2689571 lines processed.
[read.gtf.refGenome] Extracting genes table.
[read.gtf.refGenome] Found 58,395 gene records.
[read.gtf.refGenome] Finished.
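If you would rather not leave R to unzip the Ensembl file, a base-R sketch like the following should work. `gunzip_gtf()` is a hypothetical helper name, not part of refGenome; it streams the .gz file through `gzfile()` in chunks so the whole annotation never has to sit in memory at once. The demo uses a tiny made-up GTF in a temporary file rather than the real Ensembl download.

```r
# Hypothetical helper (not part of refGenome): decompress a .gtf.gz with base R only.
gunzip_gtf <- function(src, dest = sub("\\.gz$", "", src), chunk = 100000L) {
  in_con <- gzfile(src, open = "rt")      # gzfile() reads gzip-compressed text
  on.exit(close(in_con), add = TRUE)
  out_con <- file(dest, open = "wt")
  on.exit(close(out_con), add = TRUE)
  repeat {
    lines <- readLines(in_con, n = chunk) # read up to `chunk` lines at a time
    if (length(lines) == 0L) break        # end of file reached
    writeLines(lines, out_con)
  }
  invisible(dest)                         # return the path of the plain .gtf
}

# Demo on a made-up two-line GTF (stands in for Homo_sapiens.GRCh38.92.gtf.gz)
toy_lines <- c(
  "# toy header line",
  "1\ttoy\tgene\t100\t200\t.\t+\t.\tgene_id \"G1\"; gene_name \"GENE1\";"
)
gz_path <- tempfile(fileext = ".gtf.gz")
gz_con <- gzfile(gz_path, open = "wt")
writeLines(toy_lines, gz_con)
close(gz_con)

gtf_path <- gunzip_gtf(gz_path)
print(readLines(gtf_path))

# With the real file, the decompressed path could then be passed to refGenome:
# read.gtf(ens, gunzip_gtf("Homo_sapiens.GRCh38.92.gtf.gz"))
```

Streaming in chunks is a deliberate choice: the Ensembl GTF has millions of lines, and reading it all with a single `readLines()` call would hold the entire file in memory.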

If you want to use GENCODE annotations only, there is a roundabout way. R/refGenome crashes while loading the GTF file from GENCODE, so instead you can:

  1. Download the GFF3 file from GENCODE (https://www.gencodegenes.org/releases/current.html).
  2. Install gffread from Bioconda (current version 0.99).
  3. Run the following command: gffread gencode.v22.annotation.gff3 -T -o my.gtf (Note: for testing, I haven't used any parameters to select the features in the output GTF. Please use the correct parameters to get a GTF with the features you want; gffread -h prints the options. I named the output GTF my.gtf.)
  4. Now you can load this GTF into R without any issues. Here is the test output:
>  library(refGenome)
Loading required package: doBy
Loading required package: RSQLite
>  ens <- ensemblGenome()   
> read.gtf(ens, "my.gtf")
[read.gtf.refGenome] Reading file 'my.gtf'.
[GTF]  1872314 lines processed.
[read.gtf.refGenome] Extracting genes table.
[read.gtf.refGenome] Finished.
> q()
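Before handing my.gtf to refGenome, a quick sanity check that gffread produced a proper nine-column GTF may save another crash. The sketch below reads a GTF with base R only; `read_gtf_base()` is a hypothetical helper, not a refGenome or gffread function, and the toy records are made up. It assumes no attribute value contains a "#" character, because `comment.char = "#"` is used to skip header lines.

```r
# Hypothetical helper: read a GTF into a data.frame using base R only.
read_gtf_base <- function(path) {
  gtf <- read.delim(path, header = FALSE, comment.char = "#",
                    quote = "",              # attributes contain double quotes
                    stringsAsFactors = FALSE,
                    col.names = c("seqname", "source", "feature", "start",
                                  "end", "score", "strand", "frame",
                                  "attribute"))
  # Pull gene_id out of the free-text attribute column (NA when absent)
  has_id <- grepl('gene_id "', gtf$attribute, fixed = TRUE)
  gtf$gene_id <- ifelse(has_id,
                        sub('.*gene_id "([^"]+)".*', "\\1", gtf$attribute),
                        NA_character_)
  gtf
}

# Made-up records in the shape of gffread -T output (stands in for my.gtf)
toy_gtf <- tempfile(fileext = ".gtf")
writeLines(c(
  "# toy header line",
  "chr1\tgffread\ttranscript\t11869\t14409\t.\t+\t.\ttranscript_id \"T1\"; gene_id \"G1\";",
  "chr1\tgffread\texon\t11869\t12227\t.\t+\t.\ttranscript_id \"T1\"; gene_id \"G1\";"
), toy_gtf)

gtf <- read_gtf_base(toy_gtf)
print(table(gtf$feature))   # number of records per feature type
```

If the real my.gtf yields nine sensible columns and the feature types you asked gffread for, the file is at least structurally a GTF before refGenome touches it.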
  
Thanks. I downloaded the GFF3 file from GENCODE (https://www.gencodegenes.org/releases/current.html), but I can't install gffread in R:

> biocLite("gffread")
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.7 (BiocInstaller 1.30.0), R 3.5.0 (2018-04-23).
Installing package(s) ‘gffread’
Warning message:
package ‘gffread’ is not available (for R version 3.5.0)

Could you please tell me how I can install that package in R?

Best Regards, Mohammad

gffread is not an R package; it is a standalone command-line program. There are several ways to install it:

  1. From source: https://github.com/gpertea/gffread. Follow the installation instructions there.
  2. As a precompiled Linux binary: http://ccb.jhu.edu/software/stringtie/dl/gffread-0.9.12.Linux_x86_64.tar.gz. Unpack it, put gffread on your PATH, and test it.
  3. On Ubuntu, try sudo apt install cufflinks (this requires sudo permissions). This installs gffread along with Cufflinks.
  4. If you have conda/Miniconda installed, conda install -c bioconda gffread should install gffread.
5.9 years ago

Try rtracklayer::import()
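A minimal sketch of that suggestion, assuming rtracklayer has been installed from Bioconductor (for example with BiocManager::install("rtracklayer")). `import()` returns a GRanges object whose metadata columns hold the GTF attributes. The two toy records below are made up and stand in for gencode.v22.annotation.gtf.gz; as far as I know `import()` also accepts the gzipped file directly, but I have not verified it against the v22 file or the crash reported above.

```r
# Assumes: BiocManager::install("rtracklayer") has already been run.
library(rtracklayer)

# Made-up GENCODE-style records written to a temporary file
toy_gtf <- tempfile(fileext = ".gtf")
writeLines(c(
  "##description: toy annotation",
  "chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id \"G1\"; gene_name \"GENE1\";",
  "chr1\tHAVANA\texon\t11869\t12227\t.\t+\t.\tgene_id \"G1\"; transcript_id \"T1\"; gene_name \"GENE1\";"
), toy_gtf)

gr <- import(toy_gtf, format = "gtf")   # GRanges; attributes become metadata columns
genes <- gr[gr$type == "gene"]          # keep gene-level records only
gene_df <- as.data.frame(genes)
print(gene_df[, c("seqnames", "start", "end", "strand", "gene_id", "gene_name")])

# For the real annotation the call would be along the lines of:
# gr <- import("gencode.v22.annotation.gtf.gz")
```

Unlike refGenome, this keeps the annotation as a GRanges object, which plugs directly into other Bioconductor tools for overlap queries.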
