How To Calculate Number Of Bases Found In Introns/CDSs/Intergenic Space
12.0 years ago
Panos ★ 1.8k

I have a GenBank (or even GFF) file and want to count the number of bases found in introns, CDSs, and intergenic space.

Does anyone know if there's any script already out there? I'll start writing mine (most probably using BioPerl). I'm just being lazy and also don't want to re-invent the wheel :D

intron cds • 3.2k views
12.0 years ago

One thing you could do is get (or generate) a BED or GFF file that lists each exon coordinate. Then use the mergeBed tool in bedtools to merge overlapping exons into single intervals, so that no base is counted twice. Finally, all you need to do is add up the lengths of the merged intervals.
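
For anyone who wants to see the merge-then-sum idea without bedtools, here is a minimal Python sketch. It assumes BED-style coordinates (0-based start, exclusive end) for a single chromosome, and the function name `merged_length` and the example exons are made up for illustration. Touching intervals are merged together with overlapping ones.

```python
def merged_length(intervals):
    """Sum the bases covered by (start, end) intervals, counting overlaps once.

    Assumes BED-style coordinates: 0-based start, exclusive end,
    all on the same chromosome.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Hypothetical exon coordinates, two of which overlap.
exons = [(100, 200), (150, 300), (400, 450)]
print(merged_length(exons))
```

With real data you would group the intervals by chromosome first and sum the per-chromosome results.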

12.0 years ago
Rm 8.3k

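Using a GFF or GTF file, for example "gencode.v7.annotation_goodContig.gtf":

To print the number of bases in gene regions:

awk -F'\t' '$3 == "gene" { len += $5 - $4 + 1 } END { print len }' gencode.v7.annotation_goodContig.gtf

The feature type is in column 3, and GFF/GTF coordinates are 1-based and inclusive, so each feature spans end - start + 1 bases. Overlapping genes are counted once per gene, so the total can overestimate the genomic space covered.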

Similarly, you can extract information for other regions by matching a different feature type (for example CDS or exon).
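
If it helps, the same tally can be done for every feature type in one pass. The following is only a sketch in Python: it assumes tab-separated GFF/GTF lines with the feature type in column 3 and 1-based inclusive start/end in columns 4 and 5. The function name `bases_per_feature` and the example lines are hypothetical. Like the awk one-liner, it does not merge overlapping features.

```python
from collections import defaultdict


def bases_per_feature(gtf_lines):
    """Return {feature_type: total_bases} for GFF/GTF-formatted lines.

    Overlapping features of the same type are counted more than once.
    """
    totals = defaultdict(int)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a feature line
        feature, start, end = fields[2], int(fields[3]), int(fields[4])
        totals[feature] += end - start + 1  # 1-based, inclusive coordinates
    return dict(totals)


# Hypothetical annotation lines, for illustration only.
example = [
    "##description: made-up example",
    'chr1\tsrc\tgene\t1000\t2000\t.\t+\t.\tgene_id "g1";',
    'chr1\tsrc\tCDS\t1100\t1200\t.\t+\t0\tgene_id "g1";',
    'chr1\tsrc\tCDS\t1500\t1700\t.\t+\t0\tgene_id "g1";',
]
print(bases_per_feature(example))
```

For a real file you could pass an open file handle instead of the list, e.g. `bases_per_feature(open("gencode.v7.annotation_goodContig.gtf"))`.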


My problem is that intron coordinates are only implied in my GFF file (they're the regions inside a gene that are not CDSs). The same is true for intergenic space. Finally, there might be multiple splice variants per gene, and I only want one of them...


You have to make your requirements more explicit. For example, which splice variant do you want? Overall, I would say that there is probably no tool that does exactly what you want.


Albert, any variant would be good because I only want to have a rough estimate of the portion of genome found in introns/CDSs/intergenic space. Anyway, I've started writing my own script... I just wanted to make sure that I'm not re-inventing the wheel. Thank you all guys for your time!
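
For a rough estimate along these lines, one possible approach is sketched below in Python. It is not an existing tool, and everything about the input is an assumption: the genome length, the gene spans, and the CDS coordinates are expected to be already parsed into a hypothetical `genes` dictionary (1-based inclusive coordinates, one chromosome). It picks an arbitrary splice variant per gene (the first transcript ID in sorted order) and, following the definition used in this thread, counts everything inside a gene that is not CDS as "intron", so UTRs end up in that bucket too.

```python
def covered(intervals):
    """Bases covered by 1-based, inclusive (start, end) intervals, overlaps counted once."""
    total = 0
    last_end = 0  # highest position already counted
    for start, end in sorted(intervals):
        start = max(start, last_end + 1)  # skip the part already counted
        if end >= start:
            total += end - start + 1
            last_end = end
    return total


def genome_partition(genome_length, genes):
    """Split genome_length into CDS, intron (non-CDS genic), and intergenic bases.

    genes: {gene_id: {"span": (start, end),
                      "transcripts": {transcript_id: [(cds_start, cds_end), ...]}}}
    """
    gene_spans, cds = [], []
    for gene in genes.values():
        gene_spans.append(gene["span"])
        if gene["transcripts"]:
            # Arbitrary choice of splice variant: first transcript ID in sorted order.
            chosen = sorted(gene["transcripts"])[0]
            cds.extend(gene["transcripts"][chosen])
    genic = covered(gene_spans)
    cds_bases = covered(cds)
    return {
        "CDS": cds_bases,
        "intron": genic - cds_bases,  # includes UTRs under this definition
        "intergenic": genome_length - genic,
    }


# Hypothetical annotation for a 10,000-base genome.
genes = {
    "g1": {"span": (1000, 2000),
           "transcripts": {"t1": [(1100, 1200), (1500, 1700)],
                           "t2": [(1100, 1300)]}},
    "g2": {"span": (5000, 5500),
           "transcripts": {"t3": [(5000, 5100), (5400, 5500)]}},
}
print(genome_partition(10000, genes))
```

Dividing each value by the genome length gives the portions. With several chromosomes, run it per chromosome and add up the counts.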
