From GTF file to GFF file (searching for all 5'UTRs, multiple UTRs per transcript)
7.2 years ago

Hi, I downloaded the GTF file from the GENCODE website and now want to create a GFF file containing all 5'UTRs, which will then be used with HTSeq.

I have the following problems writing my code on the command line:

  • How do I obtain the 5'UTRs of each transcript? How do I deal with the + and - strands? I know that the 5'UTRs are those at the 5' end of the transcript.

  • In several cases there are more than 2 UTRs per transcript. What should I do with them?

This is my first scaffold for the final GFF file; it contains all UTRs, not yet only the 5'UTRs.

awk '{OFS="\t"; if($3=="UTR"){print $1,$2,$3,$4,$5,".",$7,$10,$12}}' Geneannotation_all.gtf | sed 's/";//g; s/"//g' > Geneannotation_all.UTR.gff

From the htseq-count FAQ:

"I have a GTF file? How do I convert it to GFF?" No need to do that, because GTF is a tightening of the GFF format. Hence, all GTF files are GFF files, too. By default, htseq-count expects a GTF file.

For getting the UTRs I would use grep.
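A note on the strand question: in the GENCODE file the feature column only says UTR, so a plain filter cannot tell 5' from 3'. One way to separate them, sketched here under the assumption that every coding transcript also has CDS lines, is to call a UTR segment 5' when it ends before the first CDS base on the + strand, or starts after the last CDS base on the - strand. The toy file, the row helper and the five_prime_UTR label are made up for illustration.

```shell
# Hypothetical helper that writes one toy GTF line:
# row FEATURE START END STRAND GENE_ID TRANSCRIPT_ID
row() {
  printf 'chr1\ttoy\t%s\t%s\t%s\t.\t%s\t.\tgene_id "%s"; transcript_id "%s";\n' "$@"
}

# Made-up annotation: one + strand and one - strand transcript.
{
  row UTR 100 150 + G1 T1   # upstream of the CDS on +, so five-prime
  row CDS 151 300 + G1 T1
  row UTR 301 400 + G1 T1   # downstream of the CDS on +, so three-prime
  row UTR 900 950 - G2 T2   # higher coordinate than the CDS on -, so five-prime
  row CDS 600 899 - G2 T2
  row UTR 500 599 - G2 T2   # lower coordinate than the CDS on -, so three-prime
} > toy.gtf

# The file is given twice: pass 1 records the CDS span per transcript,
# pass 2 keeps only the UTR segments on the five-prime side of that span.
awk -F'\t' -v OFS='\t' '
  # Return the transcript_id value from the attribute column.
  function tid(attr) {
    if (match(attr, /transcript_id "[^"]+"/))
      return substr(attr, RSTART + 15, RLENGTH - 16)
    return ""
  }
  /^#/ { next }                       # skip header lines
  NR == FNR {                         # pass 1: CDS bounds
    if ($3 == "CDS") {
      t = tid($9)
      if (!(t in lo) || $4 + 0 < lo[t]) lo[t] = $4 + 0
      if (!(t in hi) || $5 + 0 > hi[t]) hi[t] = $5 + 0
    }
    next
  }
  $3 == "UTR" {                       # pass 2: classify UTR segments
    t = tid($9)
    if (!(t in lo)) next              # no CDS seen (non-coding), cannot classify
    if (($7 == "+" && $5 + 0 < lo[t]) || ($7 == "-" && $4 + 0 > hi[t])) {
      $3 = "five_prime_UTR"           # label chosen for this sketch
      print
    }
  }
' toy.gtf toy.gtf > toy.5utr.gtf
```

On the real file you would replace toy.gtf with Geneannotation_all.gtf, again given twice. htseq-count can then be pointed at the new label with -t five_prime_UTR, plus -i transcript_id if you want counts per transcript.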

In several cases there are more than 2 UTRs per transcript. What should I do with them?

That probably depends on your biological research question; you could consider merging them.
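If "merging" is taken to mean collapsing overlapping or directly adjacent UTR segments of the same transcript into one interval, a minimal sketch could look like the following. The input table layout (chrom, start, end, transcript_id, strand) and the file names are assumptions for illustration, and segments separated by an intron are deliberately left as separate intervals.

```shell
# Made-up UTR segments, 1-based inclusive: chrom, start, end, transcript_id, strand.
printf '%s\t%s\t%s\t%s\t%s\n' \
  chr1 140 200 T1 + \
  chr1 100 150 T1 + \
  chr1 300 350 T1 + \
  chr1 900 950 T2 - > utr_segments.tsv

TAB="$(printf '\t')"

# Sort by transcript, chromosome and start, then sweep once and
# extend the current interval while the next segment overlaps or touches it.
sort -t "$TAB" -k4,4 -k1,1 -k2,2n utr_segments.tsv |
awk -F'\t' -v OFS='\t' '
  function emit() { if (n) print chrom, s, e, id, strand }
  # New transcript, new chromosome, or a gap: close the interval and start another.
  $4 != id || $1 != chrom || $2 + 0 > e + 1 {
    emit()
    chrom = $1; s = $2 + 0; e = $3 + 0; id = $4; strand = $5; n = 1
    next
  }
  # Otherwise the segment overlaps or abuts the current interval.
  $3 + 0 > e { e = $3 + 0 }
  END { emit() }
' > utr_merged.tsv
```

Merging per gene rather than per transcript works the same way if the fourth column holds the gene_id. Whether either is appropriate depends on what the counts are meant to represent.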


Is it possible to download a GTF file containing the 5' UTRs per transcript? How can I add that information? (I can only find information on the exons and on cdsStart and cdsEnd.)


Sorry, add which information to what?


Sorry. I want to download a gtf and bed file containing the location of the 5'UTR per transcript, so that I can use it in htseq and bedtools for further analysis. I already have an alignment.


GTF/GFF files (which can be converted to bed) are available from Ensembl and contain UTR information. You could filter the file using grep.
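As a rough sketch of that filtering step: the Ensembl GTF releases I am aware of label UTRs in the feature column as five_prime_utr and three_prime_utr, so matching column 3 exactly (here with awk, which avoids grep hitting the same string elsewhere on the line) is enough. The toy lines and file names below are made up, and the BED conversion assumes the usual shift from 1-based GTF starts to 0-based BED starts.

```shell
# Made-up Ensembl-style GTF lines (tab-separated).
printf '1\tensembl\tfive_prime_utr\t100\t150\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'   > toy_ensembl.gtf
printf '1\tensembl\tthree_prime_utr\t301\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n' >> toy_ensembl.gtf
printf '1\tensembl\tfive_prime_utr\t900\t950\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n'  >> toy_ensembl.gtf

# Keep only lines whose feature type (column 3) is exactly five_prime_utr.
awk -F'\t' '$3 == "five_prime_utr"' toy_ensembl.gtf > five_prime_utr.gtf

# Convert to 6-column BED: chrom, start-1, end, transcript_id, score, strand.
awk -F'\t' -v OFS='\t' '
  {
    name = "."
    if (match($9, /transcript_id "[^"]+"/))
      name = substr($9, RSTART + 15, RLENGTH - 16)
    print $1, $4 - 1, $5, name, 0, $7
  }
' five_prime_utr.gtf > five_prime_utr.bed
```

The GTF output can go to htseq-count with -t five_prime_utr, and the BED output to bedtools.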

7.2 years ago
Jeffin Rockey ★ 1.3k

If I understand correctly, you have a GTF file with exon and CDS information but without explicit UTR coordinates, which you would like to add.

A roundabout and not especially elegant approach is given below, but I suppose it would meet your requirement.

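i) Download gtfToGenePred and genePredToGtf (both are UCSC command-line utilities)

ii) gtfToGenePred yourGenemodel.gtf yourGenemodel.gtf.genepred

iii) genePredToGtf -utr file yourGenemodel.gtf.genepred yourGenemodel.WithUtr.gtf

In step iii, the literal word file tells genePredToGtf to read the genePred from a file instead of a database table, and -utr adds UTR features to the output.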

Please give this a try and check.

Addendum:

Also see the post below, which addresses a similar requirement.

UTR annotation on top of reference GTF
