How to count number of unique exons/introns (and their locations) in a GTF file?
7.8 years ago
scchess ▴ 640

I have a large GTF file and want the following information:

  • Number of unique exons
  • Number of unique introns
  • Locus of those unique exons
  • Locus of those unique introns

What would be the best way to do that?

gtf exons • 3.1k views

To somewhat reiterate what venu said, this depends on how one defines "unique". If you just want to get rid of exons/introns that are shared between transcripts, then make a list of all exons/introns, sort it, and use uniq. If, on the other hand, you want to merge overlapping exons, so that no intron overlaps an exon (possibly regardless of strand), then you'll want to use either bedtools or GenomicRanges in R.
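For the first definition (drop exons shared between transcripts), the "list, sort, uniq" idea can be sketched in Python roughly as below. This is only an illustration, assuming a standard 9-column tab-separated GTF with 1-based inclusive coordinates; the function name `unique_exons` and the example records are made up for the sketch.

```python
def unique_exons(lines):
    """Return sorted distinct exon loci as (chrom, start, end, strand)."""
    loci = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only well-formed exon records
        # GTF columns: 1 = seqname, 4 = start, 5 = end, 7 = strand
        loci.add((fields[0], int(fields[3]), int(fields[4]), fields[6]))
    return sorted(loci)


# Hypothetical records: two transcripts of one gene sharing their first exon.
example = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t2";',
    'chr1\tsrc\texon\t350\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t2";',
    'chr1\tsrc\texon\t500\t600\t.\t+\t.\tgene_id "g1"; transcript_id "t2";',
]

exons = unique_exons(example)
print(len(exons))  # 4 distinct loci out of 5 exon records
for locus in exons:
    print(*locus, sep="\t")
```

For a real file you would pass an open file handle instead of the list, e.g. `unique_exons(open("annotation.gtf"))`, where the path is a placeholder.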


This might work for exons, but what about introns?


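That's the point of merging exons within genes (or between them if that matters to you): once overlapping exons are merged, the introns are simply the gaps between the merged exons. In R that's reduce() from GenomicRanges; in bedtools the equivalent is the merge subcommand, which expects sorted input.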
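To make the "merge, then take the gaps" idea concrete, here is a small plain-Python sketch of the logic. It is not the bedtools or GenomicRanges implementation, just the same idea applied to the exons of a single gene on one chromosome. It assumes 1-based inclusive coordinates as in GTF and, like the defaults of those tools, also merges exons that are directly adjacent. The helper names `merge_intervals` and `gaps_between` are made up for the example.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        # With inclusive coordinates, start == previous end + 1 means adjacent.
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


def gaps_between(merged):
    """Return the gaps between consecutive merged intervals (the introns)."""
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(merged, merged[1:])
    ]


# Hypothetical exons of one gene, pooled over all of its transcripts.
gene_exons = [(100, 200), (300, 400), (100, 200), (350, 400), (500, 600)]

merged_exons = merge_intervals(gene_exons)
introns = gaps_between(merged_exons)
print(merged_exons)  # [(100, 200), (300, 400), (500, 600)]
print(introns)       # [(201, 299), (401, 499)]
```

Run per gene (and per strand, if strand matters to you), this guarantees that no reported intron overlaps any exon of that gene.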


What do you mean by a unique exon? Obviously each exon has a different locus. Do you mean you have duplicates in your GTF (exons with the same locations)?


I mean that different transcripts in the file share the same exons, and those duplicates must be filtered out.
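Under that definition (the same locus reported once, no merging), introns can be handled the same way as exons: derive them per transcript as the gaps between consecutive exons, then deduplicate by locus. The sketch below is one possible way to do it, assuming every exon line carries a `transcript_id "..."` attribute as in standard GTF; the function name `unique_introns` and the example records are illustrative only.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def unique_introns(lines):
    """Return sorted distinct intron loci as (chrom, start, end, strand)."""
    exons_by_transcript = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue  # cannot assign this exon to a transcript
        key = (fields[0], fields[6], match.group(1))
        exons_by_transcript[key].append((int(fields[3]), int(fields[4])))

    introns = set()
    for (chrom, strand, _transcript), exons in exons_by_transcript.items():
        exons.sort()
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
            if next_start > prev_end + 1:  # skip touching or overlapping exons
                introns.add((chrom, prev_end + 1, next_start - 1, strand))
    return sorted(introns)


# Hypothetical records: two transcripts of one gene sharing their first exon.
example = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t2";',
    'chr1\tsrc\texon\t350\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t2";',
    'chr1\tsrc\texon\t500\t600\t.\t+\t.\tgene_id "g1"; transcript_id "t2";',
]

for locus in unique_introns(example):
    print(*locus, sep="\t")
```

Note that introns found this way can overlap exons of other transcripts (here 201-349 overlaps the exon 300-400), which is exactly the difference from the merge-based approach.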
