Remove overlapping features from a gtf file?
8.5 years ago
RoryC ▴ 30

Hi, this seems like quite a straightforward question, so apologies if it has been asked before (I couldn't find anything similar). I have a .gtf file containing CDS coordinates for a chromosome, and I plan to extract codons containing 4d (four-fold degenerate) sites. I would therefore like to remove any CDS that overlaps another CDS, on the same or the opposite strand, so there is no ambiguity about what is a 4d site and what isn't. I've been trying to do this with bedtools without much luck: to compare the file with itself, intersect would need an option that ignores the 100% overlap of each feature with itself. Thanks
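
For readers unfamiliar with the term, a 4d site is the third position of a codon where any of the four bases encodes the same amino acid. A minimal sketch of that test, assuming the standard genetic code (the function name is hypothetical, not part of any tool mentioned here):

```shell
# Succeeds if the codon belongs to a four-fold degenerate family
# (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN, Ala GCN, Arg CGN, Gly GGN).
# Assumption: standard genetic code; lowercase input and U are normalised.
is_fourfold_codon() {
  codon=$(printf '%s' "$1" | tr 'acgtuU' 'ACGTTT')
  case "$codon" in
    CT[ACGT]|GT[ACGT]|TC[ACGT]|CC[ACGT]|AC[ACGT]|GC[ACGT]|CG[ACGT]|GG[ACGT])
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

For example, `is_fourfold_codon GCT` succeeds, while `is_fourfold_codon ATG` fails. If two CDS overlap in different frames, the same base can be a third position in one and not in the other, which is why overlapping CDS need to go first.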

overlap CDS gtf bedtools • 5.0k views

Does bedtools intersect with the -v option and -r 1 not work?


Hi, thanks for your answer. Do you mean -v -f 1 -r? That gives me no output at all: when the file is compared with itself, every CDS overlaps itself 100%, so -v excludes every feature.

8.4 years ago
RoryC ▴ 30

Coming back to this some time later, I think I've found a relatively simple way of doing it. First use bedtools merge with -c 1 -o count to merge overlapping elements and add a fourth column showing how many original elements contributed to each merged element (bedtools merge expects input sorted by chromosome and then start position). Then use awk to keep only the rows with a count of 1 in the fourth column, which removes anything that originally overlapped and was merged. For example, with a bed file (the same could be done with gtf):

bedtools merge -i file.bed -c 1 -o count | awk '$4 == 1' > newfile.bed
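
The merged output only carries chromosome, start, end and count, so the original GTF columns are lost. If the full GTF lines are needed downstream, the same cluster-and-count idea can be sketched with sort and awk alone. This is a rough sketch rather than a tested pipeline: the function name is hypothetical, it assumes a standard tab-separated 9-column GTF with 1-based inclusive coordinates, and it ignores strand, as the question asks.

```shell
# Keep only CDS lines that overlap no other CDS line (either strand).
# Reads GTF on stdin, writes the surviving lines unchanged to stdout.
# Assumption: tab-separated GTF, start in column 4, end in column 5.
keep_nonoverlapping_cds() {
  awk -F'\t' '$3 == "CDS"' |
  sort -t "$(printf '\t')" -k1,1 -k4,4n -k5,5n |
  awk -F'\t' '
    # Print the buffered line only if its cluster has a single member.
    function emit() { if (n == 1) print first }

    # New cluster: different chromosome, or start beyond the cluster end.
    $1 != chrom || $4 > cend {
      emit()
      chrom = $1; cend = $5; first = $0; n = 1
      next
    }

    # Otherwise the line overlaps the current cluster: extend it.
    { n++; if ($5 > cend) cend = $5 }

    END { emit() }
  '
}
```

Usage would be something like `keep_nonoverlapping_cds < file.gtf > newfile.gtf`. Two things to be aware of: identical CDS records from different transcripts count as overlapping and are both dropped, and directly adjacent features (end 100, next start 101) are kept here, whereas bedtools merge with its default -d 0 also merges book-ended features.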
8.5 years ago

You can use the script dexseq_prepare_annotation.py that ships with the DEXSeq package to collapse overlapping exons. See Figure 1 of this paper: http://genome.cshlp.org/content/22/10/2008.full

If this is not what you wanted, you may need to tweak the script a bit.
