Extracting upstream and downstream regions of a gene at different distance intervals
5.9 years ago
always_learning ★ 1.1k

Hi All,

I want to extract the upstream and downstream regions of a few genes at certain distance intervals, for example 0-50, 50-5000, and 5000-50000 base pairs. What would be the best way to do that? One way I can think of is to make a separate BED file for each interval and then query each one with existing tools. Is there a smarter way to do this?

Thanks

intervals upstream • 3.8k views

You are on the right track. The general approach is described in this thread: "retrieving sequences of a upstream and downstream of a coordinate for hg19"


When I did something like this in the past, I began by retrieving the TSS locations for all genes from Ensembl BioMart. I then wrote a simple AWK command, like the one in the thread linked by @genomax, that used the TSS positions as the starting coordinates and printed exactly the windows I wanted.

EDIT: With sacha's answer below, you now have the two main ways to do this. Both begin with getting the coordinates of your gene and then selecting what you want from a reference file, either with a scripted command (e.g. AWK) or with a bedtools function.
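
As a rough illustration of the scripted route, here is a minimal Python sketch of the window arithmetic around a TSS. It is not the AWK command from the linked thread. The function name tss_windows, the 0-based TSS position, and the BED-style (0-based, half-open) output are all assumptions made for this example; adjust them to whatever your BioMart export uses.

```python
def tss_windows(chrom, tss, strand, bands):
    """Yield (chrom, start, end, label) windows around a TSS.

    Assumptions: tss is a 0-based position, output is BED-style
    (0-based start, half-open end), and bands is a list of
    (near, far) distances in bp, e.g. [(0, 50), (50, 5000)].
    """
    for near, far in bands:
        # Window on the lower-coordinate side of the TSS, clipped at 0.
        low = (max(0, tss - far), max(0, tss - near))
        # Window on the higher-coordinate side of the TSS.
        high = (tss + 1 + near, tss + 1 + far)
        # Upstream is toward lower coordinates on "+", higher on "-".
        up, down = (low, high) if strand == "+" else (high, low)
        for label, (start, end) in (("up", up), ("down", down)):
            if end > start:  # skip windows clipped away entirely
                yield chrom, start, end, f"{label}_{near}-{far}"


if __name__ == "__main__":
    # Hypothetical TSS on the minus strand, with the three bands from the question.
    bands = [(0, 50), (50, 5000), (5000, 50000)]
    for row in tss_windows("chr1", 100000, "-", bands):
        print(*row, sep="\t")
```

The printed rows can be saved as a BED file and passed to a sequence extraction tool. Note that this sketch does not clip windows at the chromosome end, so that check would still need to be added.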


Thanks !!

How would I extract something like the 50-5000 bp interval around a gene?

5.9 years ago
sacha ★ 2.4k
  • Create a BED file with your gene (using refSeq.txt, for instance): gene.bed
  • Use bedtools slop to extend the regions by as many base pairs as you want: gene_slop.bed
  • Use bedtools subtract to remove gene.bed from gene_slop.bed, keeping only the upstream and downstream regions (a plain intersect would return the gene itself; bedtools flank also produces the flanks directly)
  • Use bedtools getfasta to extract the sequence
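
The interval arithmetic behind the steps above, applied to distance bands such as 50-5000 bp, can be sketched in plain Python without calling bedtools. This is only an illustration under stated assumptions: flank_bands and its chrom_size parameter are hypothetical names, coordinates are BED-style (0-based, half-open), and the gene coordinates in the usage lines are made up. For each (near, far) band it extends the gene by far bp on each side and drops everything within near bp of the gene.

```python
def flank_bands(chrom, start, end, name, strand, bands, chrom_size=None):
    """Return upstream/downstream BED intervals for each (near, far) band.

    Equivalent in spirit to extending the gene by `far` bp and removing
    the gene extended by `near` bp. chrom_size (optional, hypothetical)
    clips intervals at the chromosome end.
    """
    out = []
    for near, far in bands:
        # Flank at lower coordinates than the gene, clipped at 0.
        left = (max(0, start - far), max(0, start - near))
        # Flank at higher coordinates than the gene.
        right_start, right_end = end + near, end + far
        if chrom_size is not None:
            right_start = min(right_start, chrom_size)
            right_end = min(right_end, chrom_size)
        right = (right_start, right_end)
        # On the minus strand, upstream lies at higher coordinates.
        up, down = (left, right) if strand == "+" else (right, left)
        for label, (s, e) in (("upstream", up), ("downstream", down)):
            if e > s:  # drop intervals that were clipped to nothing
                out.append((chrom, s, e, f"{name}_{label}_{near}-{far}", strand))
    return out


if __name__ == "__main__":
    # Made-up gene coordinates; the bands are the ones from the question.
    bands = [(0, 50), (50, 5000), (5000, 50000)]
    for row in flank_bands("chr1", 10000, 20000, "geneA", "+", bands):
        print(*row, sep="\t")
```

The output can be written to a BED file and given to bedtools getfasta for the sequences. Labelling each row with its band keeps all intervals in one file, so separate BED files per interval are not needed.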