Splice Junction file intersection with genome annotation
9.8 years ago
ruchiksy ▴ 50

Hello,

I have a tab-delimited splice junction file that looks something like this:

chr1    11212    12009    1    1    0    0    2    48
chr1    11672    12009    1    1    0    0    1    31
chr1    11845    12009    1    1    0    0    1    28
chr1    12228    12612    1    1    1    0    1    32
chr1    12722    13220    1    1    1    0    3    9
chr1    14830    14969    2    2    1    0    218    50
chr1    15039    15795    2    2    1    0    98    50
chr1    15948    16606    2    2    1    1    10    48
chr1    16766    16857    2    2    1    0    24    44
chr1    16766    16875    2    2    0    0    2    36

The task is to filter out lines in which Column 6 has value 1, Column 7 has value 1 and Column 8 has value 10 or greater.

I have been going through the bedtools documentation, but I am not quite sure how to get started, and I would appreciate a few pointers. My input file will be in tab-delimited format, and I also have the Gencode V.19 GTF file for annotation.

Thanks!

Edit

  • Column 1: chromosome
  • Column 2: first base of the intron (1-based)
  • Column 3: last base of the intron (1-based)
  • Column 4: strand
  • Column 5: intron motif: 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT
  • Column 6: 0: unannotated, 1: annotated (only if splice junctions database is used)
  • Column 7: number of uniquely mapping reads crossing the junction
  • Column 8: number of multi-mapping reads crossing the junction
  • Column 9: maximum spliced alignment overhang

Added the field names.
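
Given the column layout above, one way to make the file usable with bedtools is to convert it to BED6 first. The following is only a sketch under stated assumptions: it assumes the strand column uses the STAR-style coding (1 for +, 2 for -, anything else undefined), and the file names `SJ.sample.tab` and `SJ.sample.bed` and the `uniq=...;multi=...` name field are made up for illustration.

```shell
# Build a small tab-delimited sample in the layout described above
# (hypothetical file name; three lines taken from the example data).
printf 'chr1\t11212\t12009\t1\t1\t0\t0\t2\t48\n'   >  SJ.sample.tab
printf 'chr1\t14830\t14969\t2\t2\t1\t0\t218\t50\n' >> SJ.sample.tab
printf 'chr1\t15948\t16606\t2\t2\t1\t1\t10\t48\n'  >> SJ.sample.tab

# Convert to BED6. BED starts are 0-based, so subtract 1 from column 2;
# the 1-based inclusive end in column 3 is already a valid BED end.
# Assumed strand coding: 1 -> "+", 2 -> "-", anything else -> ".".
# The name field carries both read counts; the score is left at 0.
awk 'BEGIN { FS = OFS = "\t" }
     {
       strand = ($4 == 1) ? "+" : (($4 == 2) ? "-" : ".")
       print $1, $2 - 1, $3, "uniq=" $7 ";multi=" $8, 0, strand
     }' SJ.sample.tab > SJ.sample.bed

cat SJ.sample.bed
```

A file in this form could then be given to `bedtools intersect` with `-a` for the junctions and `-b` for the GTF annotation.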

RNA-Seq splice-junction bedtools • 4.8k views

Hello ruchiksy!

It appears that your post has been cross-posted to another site: SeqAnswers.

This is typically not recommended as it runs the risk of annoying people in both communities.


Hi

Can you please tell me where I can get the splice junction annotation file for human?


You can download a GTF file from Ensembl.

9.8 years ago

Since this isn't a BED file, it'd be extra work to get bedtools to deal with it. Just use awk:

awk '{if($6!=1 && $7!=1 && $8<10) print $0}' original.txt > filtered.txt

For your example, that would print:

chr1    11212    12009    1    1    0    0    2    48
chr1    11672    12009    1    1    0    0    1    31
chr1    11845    12009    1    1    0    0    1    28
chr1    16766    16875    2    2    0    0    2    36
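
Note that if "filter out" is read as removing only the lines where all three conditions hold at once, the test would be the negation of the combined condition rather than three separate negations joined with `&&`. A minimal sketch of that reading, using three of the example lines (the file names `sj_demo.txt` and `sj_removed.txt` are hypothetical):

```shell
# Three lines from the example, rebuilt with real tabs.
printf 'chr1\t11212\t12009\t1\t1\t0\t0\t2\t48\n'   >  sj_demo.txt
printf 'chr1\t14830\t14969\t2\t2\t1\t0\t218\t50\n' >> sj_demo.txt
printf 'chr1\t15948\t16606\t2\t2\t1\t1\t10\t48\n'  >> sj_demo.txt

# Drop a line only when $6==1 AND $7==1 AND $8>=10 all hold;
# every other line is printed unchanged.
awk -F'\t' '!($6 == 1 && $7 == 1 && $8 >= 10)' sj_demo.txt > sj_removed.txt

cat sj_removed.txt
```

Under this reading the chr1 14830 line is kept, because its column 7 is 0, and only the chr1 15948 line is removed.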

It worked, thanks Mr. Devon. One small correction, though: I wanted reads greater than 10, not less than, so I changed that when I ran it.
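
The exact command that was run is not shown in the thread, so the following is only a sketch of the "keep" reading: retain lines where column 6 is 1, column 7 is 1, and column 8 is at least a threshold. The variable `min_reads` and the file names are hypothetical. According to the field list in the question, column 8 is the multi-mapping read count, so swap in `$7` if unique reads are what matters.

```shell
# Three lines from the example, rebuilt with real tabs.
printf 'chr1\t14830\t14969\t2\t2\t1\t0\t218\t50\n' >  sj_keep_demo.txt
printf 'chr1\t15948\t16606\t2\t2\t1\t1\t10\t48\n'  >> sj_keep_demo.txt
printf 'chr1\t16766\t16875\t2\t2\t0\t0\t2\t36\n'   >> sj_keep_demo.txt

# Keep only lines that satisfy all three conditions; the threshold is
# passed in with -v so it can be changed without editing the program.
awk -F'\t' -v min_reads=10 '$6 == 1 && $7 == 1 && $8 >= min_reads' \
    sj_keep_demo.txt > sj_kept.txt

cat sj_kept.txt
```

With these three lines, only the chr1 15948 junction passes.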

9.8 years ago
Ann ★ 2.4k

It's hard to tell from your example what each field is meant to represent as there are many possible ways you could use BED format to indicate splicing patterns. If you can give more detail it will be easier to recommend your next step.


I have just added the field names; I should have done that in the first place. Thanks!

9.0 years ago
shirley0818 ▴ 110

Hi ruchiksy,

I found your input BED file quite useful. May I ask which tool you used to obtain your splice junction file?

Many thanks,
Shirley


I think it's STAR.


Definitely looks like my recent STAR output to me, in case another vote was needed.
