Extracting 5'UTR and 3'UTR bed files from gtf file
6.4 years ago
c_u ▴ 520

Hi, I have a GTF file for mouse, and I want to create BED files for the 5' and 3' UTRs from it. (I know there is a method using the UCSC Table Browser website, but I want to generate them from the GTF file itself.)

As far as I know, information about the UTR regions is not explicitly stated in the gtf file. So, what would be a good strategy to extract these bed files from the gtf file?

Thanks!

rna RNA-Seq gtf bed • 9.7k views

GTFs downloaded from GENCODE have UTR annotations. Whether a given UTR is the 5' or the 3' one can be derived from its coordinates relative to the coding sequence and from the strand of the transcript.
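A minimal sketch of that strand logic in Python, assuming you have already collected, for each transcript, its UTR intervals and its overall CDS span (lowest CDS start, highest CDS end) from the GTF. The function name `classify_utr` and the labels it returns are made up for illustration and are not part of any tool mentioned here.

```python
def classify_utr(utr_start, utr_end, cds_start, cds_end, strand):
    """Label one UTR interval as "5UTR" or "3UTR".

    All coordinates are 1-based and inclusive, as in GTF.
    cds_start / cds_end are the lowest start and highest end of the
    CDS features of the same transcript (an assumption of this sketch).
    """
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {strand!r}")

    if utr_end < cds_start:
        left_of_cds = True       # lower genomic coordinates than the CDS
    elif utr_start > cds_end:
        left_of_cds = False      # higher genomic coordinates than the CDS
    else:
        raise ValueError("UTR interval overlaps the CDS span")

    # On the plus strand, transcription runs left to right, so a UTR left of
    # the CDS is the 5' UTR. On the minus strand it is the other way round.
    if strand == "+":
        return "5UTR" if left_of_cds else "3UTR"
    return "3UTR" if left_of_cds else "5UTR"


if __name__ == "__main__":
    print(classify_utr(100, 150, 200, 500, "+"))  # left of CDS, plus strand
    print(classify_utr(100, 150, 200, 500, "-"))  # left of CDS, minus strand
```

Writing the labelled intervals out as BED then only needs the usual shift from 1-based inclusive GTF coordinates to 0-based half-open BED coordinates (start minus 1, end unchanged).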

6.4 years ago
Robert Sicko ▴ 630

Another option.

python extract_transcript_regions.py -i genes.gtf -o genes --gtf

Now convert the resulting blocked BED (BED12) files to BED6:

bed12ToBed6 -i genes_3utr.bed -n > genes_3utr_bed6.bed
bed12ToBed6 -i genes_5utr.bed -n > genes_5utr_bed6.bed
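For anyone wondering what that conversion does, here is a rough Python sketch that splits one BED12 line into one BED6 line per block, using the standard BED12 columns (blockSizes in column 11, blockStarts in column 12, both relative to chromStart). It is not a reimplementation of bedtools: the function name `bed12_to_bed6` and the `number_blocks` flag are hypothetical, and numbering blocks in file order is an assumption that may not match how `-n` numbers blocks on the minus strand.

```python
def bed12_to_bed6(line, number_blocks=False):
    """Split one tab-separated BED12 line into a list of BED6 lines.

    If number_blocks is True, the score column is replaced by the 1-based
    block index in file order (an assumption of this sketch).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected at least 12 tab-separated columns")

    chrom, chrom_start = fields[0], int(fields[1])
    name, score, strand = fields[3], fields[4], fields[5]

    # Both lists may carry a trailing comma, hence the "if x" filter.
    sizes = [int(x) for x in fields[10].split(",") if x]
    offsets = [int(x) for x in fields[11].split(",") if x]

    bed6 = []
    for index, (size, offset) in enumerate(zip(sizes, offsets), start=1):
        block_start = chrom_start + offset      # offsets are relative to chromStart
        block_end = block_start + size          # BED ends are exclusive
        block_score = str(index) if number_blocks else score
        bed6.append("\t".join(
            [chrom, str(block_start), str(block_end), name, block_score, strand]
        ))
    return bed6


if __name__ == "__main__":
    example = "chr1\t100\t500\ttx1\t0\t+\t100\t500\t0\t2\t50,100,\t0,300,"
    for row in bed12_to_bed6(example, number_blocks=True):
        print(row)
```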

PYTHON! I love it !!

6.4 years ago
ATpoint 81k

The 5' UTR runs from the beginning of exon 1 up to the base immediately upstream of the start codon. The 3' UTR, in contrast, runs from the base immediately downstream of the stop codon to the end of the last exon. One would therefore need the GTF and a FASTA (or a BSgenome package, if you want to use R) of the genome to get the required information. There is a Git repository that apparently has a script to do exactly what you want, but I have never tested it for accuracy. Maybe give it a careful try.

In case you want to put code together yourself, I would do for the 5'UTR:

  • extract the first exon of every gene
  • get the nucleotide sequence of it
  • get coordinates of the start codon ATG
  • define the 5'UTR as start(exon1) to start(ATG)-1

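For the 3' UTR, do the same, but look in the last exon for the stop codon (TGA, TAA, or TAG) and define the 3' UTR as end(stopCodon)+1 to end(lastExon). Be sure to be strand-specific: on the minus strand the first exon is the one with the highest genomic coordinates, and the sequence has to be reverse-complemented before searching for codons.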
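The coordinate arithmetic from the steps above could be sketched in Python roughly as follows. This assumes the genomic positions of the start and stop codon are already known (however you obtained them) and only covers the strand-aware clipping of exons plus the conversion to BED6. The helper names `clip`, `utr_blocks` and `to_bed6` are made up for this example, and the sketch has not been checked against real annotation.

```python
def clip(exons, lo, hi):
    """Return the parts of the exons that fall inside [lo, hi] (1-based, inclusive)."""
    pieces = []
    for start, end in sorted(exons):
        s, e = max(start, lo), min(end, hi)
        if s <= e:
            pieces.append((s, e))
    return pieces


def utr_blocks(exons, start_codon, stop_codon, strand):
    """Return (five_prime_blocks, three_prime_blocks) for one transcript.

    exons: list of (start, end); start_codon / stop_codon: (start, end).
    All coordinates are genomic, 1-based and inclusive, as in GTF.
    """
    first = min(s for s, _ in exons)
    last = max(e for _, e in exons)
    if strand == "+":
        five = clip(exons, first, start_codon[0] - 1)    # up to start(ATG)-1
        three = clip(exons, stop_codon[1] + 1, last)     # from end(stop)+1
    elif strand == "-":
        # On the minus strand "upstream" means higher genomic coordinates.
        five = clip(exons, start_codon[1] + 1, last)
        three = clip(exons, first, stop_codon[0] - 1)
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    return five, three


def to_bed6(chrom, blocks, name, strand):
    """Format blocks as BED6 lines (0-based start, exclusive end)."""
    return [f"{chrom}\t{s - 1}\t{e}\t{name}\t0\t{strand}" for s, e in blocks]


if __name__ == "__main__":
    exons = [(100, 200), (300, 400), (500, 600)]
    five, three = utr_blocks(exons, (150, 152), (548, 550), "+")
    print("\n".join(to_bed6("chr1", five, "tx1_5utr", "+")))
    print("\n".join(to_bed6("chr1", three, "tx1_3utr", "+")))
```

Clipping every exon, rather than only the first and last one, means the sketch still returns sensible blocks when a UTR happens to span more than one exon.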
