Remove overlapping features from a gtf file?
4.0 years ago
RoryC • 30
Uppsala, Sweden

Hi, this seems like quite a straightforward question, so apologies if it has been asked before (I couldn't find anything similar). I have a .gtf file containing CDS coordinates for a chromosome, and I plan to extract codons containing fourfold degenerate (4d) sites. I would therefore like to remove any CDS that overlaps another CDS (on the same or the opposite strand), so there is no ambiguity about what is a 4d site and what isn't. I've been trying to do this with bedtools but I'm not having much luck: to compare the file to itself, intersect would need an option that ignores the 100% overlap of each feature with itself. Thanks


Does bedtools intersect with the -v option and -r 1 not work?


Hi, thanks for your answer. Do you mean -v -f 1 -r? This gives me zero output, because when the file is compared to itself every CDS is overlapped 100% by its own entry.

4.0 years ago
RoryC • 30
Uppsala, Sweden

So, coming back to this some time later, I think I've found a relatively simple way of doing this. First use bedtools merge with -c 1 -o count to merge overlapping elements and add a fourth column showing how many original elements contributed to each merged element. Then use awk to keep only the rows with 1 in the fourth column, which removes anything that originally overlapped and was merged. Note that bedtools merge expects its input sorted by chromosome and then start position, and by default it ignores strand, so overlaps on the opposite strand are caught as well. For example, with a bed file (the same could be done with gtf):

bedtools merge -i file.bed -c 1 -o count | awk ' { if($4==1) print $0} ' > newfile.bed
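
For anyone without bedtools to hand, the same merge-then-count idea can be sketched in plain Python. This is only an illustration under some assumptions: the function name non_overlapping and the (chrom, start, end) tuple layout are made up for this example, coordinates are taken to be BED-style (0-based, half-open), strand is ignored, and book-ended features are treated as overlapping, which is my understanding of what bedtools merge does with its default -d 0.

```python
from collections import defaultdict


def non_overlapping(features):
    """Return features that overlap no other feature on the same chromosome.

    `features` is an iterable of (chrom, start, end) tuples in BED-style
    0-based, half-open coordinates. Strand is ignored (hypothetical helper,
    not part of bedtools).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in features:
        by_chrom[chrom].append((start, end))

    kept = []
    for chrom in sorted(by_chrom):
        cluster = []        # intervals in the current merged block
        cluster_end = None  # right-most end seen in the current block
        for start, end in sorted(by_chrom[chrom]):
            if cluster and start <= cluster_end:
                # Overlapping or book-ended: extend the current block.
                cluster.append((start, end))
                cluster_end = max(cluster_end, end)
            else:
                # Block closed: keep it only if a single feature formed it.
                if len(cluster) == 1:
                    kept.append((chrom, *cluster[0]))
                cluster = [(start, end)]
                cluster_end = end
        if len(cluster) == 1:
            kept.append((chrom, *cluster[0]))
    return kept


example = [
    ("chr1", 100, 200),
    ("chr1", 150, 250),  # overlaps the first feature
    ("chr1", 300, 400),  # stands alone
    ("chr1", 500, 600),
    ("chr1", 500, 600),  # exact duplicate counts as an overlap
    ("chr2", 10, 20),    # stands alone
]
print(non_overlapping(example))
# [('chr1', 300, 400), ('chr2', 10, 20)]
```

As with the count column in the command above, a block built from a single feature is kept and anything that had to be merged is dropped, so exact duplicates are removed too.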

13 months ago
geek_y 9.7k
Barcelona/CRG/London/Imperial

You can use the script (dexseq_prepare_annotation.py) provided with the DEXSeq package to collapse overlapping exons. See Figure 1 of this paper: http://genome.cshlp.org/content/22/10/2008.full

If this is not what you wanted, you may need to tweak the script a bit.
