Identify reads overlapping a junction in gff file
2
0
Entering edit mode
9.3 years ago

Dear all,

I have a gff annotation file with coordinates for an overlapping genomic feature, e.g.

chr1 1-100
chr1 100-250
chr1 250-300
chr1 300-500

and an alignment file (bam/bed) with reads mapping to that chromosome. I need to extract reads that are specifically overlapping the junctions of that genomic feature. So, in a bed file like this,

chr1 50-70
chr1 90-110
chr1 150-170
chr1 245-255

Only extract

chr1 90-110
chr1 245-255

Any existing tools that can help me do that. I know bedtools intersect will give me all the common overlap but I am only looking for reads mapping the junctions of concurrent genomic features. I am not well versed in bioinformatics so I can't write my own scripts but am able to run existing tools.

Any ideas would be appreciated.

Thanks

junction gff overlap • 2.9k views
ADD COMMENT
0
Entering edit mode

Search this site/the web for bedtools intersect

ADD REPLY
0
Entering edit mode

Thanks for the idea, Istvan Albert. That actually makes a lot of sense. I can do that.

Thanks again

ADD REPLY
1
Entering edit mode
9.3 years ago

One way to do this is to make a pipeline that:

  1. Converts the annotations to BED via convert2bed -i gff
  2. Uses awk to get one base elements, if the previous stop and current start position values are equal
  3. Uses BEDOPS bedops --range to expand (for example) a 10-base window around the resulting one-base elements (adjusting this window to whatever you need)
  4. Maps the ranged elements to find overlapping reads via bedmap and convert2bed -i bam
$ convert2bed -i gff < annotations.gff \
    | awk ' \
        BEGIN { \
            previous_chr = "chr0"; \
            previous_stop = -1; \
        } \
        { \
            current_chr = $1; \
            current_start = $2; \
            current_stop = $3; \
            if ((current_start == previous_stop) && (current_chr == previous_chr)) { \
                print current_chr"\t"current_start"\t"(current_start + 1); \
            } \
           previous_chr = current_chr; \
           previous_stop = current_stop; \
        } \
    }' \
    | bedops --everything --range 10 - \
    | bedmap --echo --echo-map --skip-unmapped - <(convert2bed -i bam < reads.bam) \
    > answer.bed
ADD COMMENT
0
Entering edit mode
9.3 years ago

Often the way to solve these problems lies in building one ore more new dataset that when used with bedtools intersect produces what you need.

For example here create file that contains a marker in the form of a 1bp long interval for its start of each feature. Then in another file create markers for the 1bp interval of the end of each feature. Now intersect the two files and you'll end up with an file with intervals that are common. Use this file to intersect with the reads.

ADD COMMENT

Login before adding your answer.

Traffic: 2559 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6