Genes around open regions
6.6 years ago

Hello Everyone!

I am new here and new to the field of genomics and informatics.

Recently we generated ATAC-seq data from normal and cancer cells. The data show regions that are open in cancer cells compared to normal cells. I would like to know whether there is a program that can generate a list of the genes in the neighborhood of these open regions, which I would then feed to IPA or GSEA to see which functions those genes are enriched for. Ideally, I would like to come up with three lists: one for genes within 25 kb of the open regions, a second for genes within 75 kb, and a third for genes within 150 kb.

Please suggest how I can achieve that.

atac-seq gene • 1.6k views
6.6 years ago

Via BEDOPS bedmap:

$ bedmap --echo --echo-map-id-uniq --range 25000 open-regions.bed genes.bed > answer.25kb.bed
$ bedmap --echo --echo-map-id-uniq --range 75000 open-regions.bed genes.bed > answer.75kb.bed
$ bedmap --echo --echo-map-id-uniq --range 150000 open-regions.bed genes.bed > answer.150kb.bed

IDs could be fed into http://www.ebi.ac.uk/QuickGO/ for classification (depending on format).

To get a genes.bed file, e.g. via Gencode:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_26/gencode.v26.basic.annotation.gff3.gz \
  | gunzip -c - \
  | convert2bed --input=gff - \
  | awk '$8=="gene"' - \
  > genes.bed
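
With --echo and --echo-map-id-uniq, bedmap writes each open region, then a "|", then the mapped gene IDs separated by ";" (nothing follows the "|" when no gene maps). As a rough sketch, the IDs can be flattened into one unique list for an enrichment tool. The coordinates and the GENE_A-style IDs below are made-up toy data standing in for a real answer.25kb.bed:

```shell
# Toy stand-in for answer.25kb.bed (hypothetical coordinates and IDs).
tmpdir=$(mktemp -d)
printf 'chr1\t1000\t2000|GENE_A;GENE_B\n'  > "$tmpdir/answer.25kb.bed"
printf 'chr1\t9000\t9500|\n'              >> "$tmpdir/answer.25kb.bed"   # no gene in range
printf 'chr2\t300\t800|GENE_B;GENE_C\n'   >> "$tmpdir/answer.25kb.bed"

# Keep the text after "|", put one ID per line, drop empty lines, de-duplicate.
gene_ids=$(cut -d'|' -f2 "$tmpdir/answer.25kb.bed" | tr ';' '\n' | awk 'NF' | sort -u)
printf '%s\n' "$gene_ids"
```

On a real file, replace the three printf lines with the path to your own bedmap output. This assumes the open-region lines themselves contain no "|" character.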

Thank you Alex, I really appreciate your help.


I want to use release_19 of Gencode, so I changed the code above and was able to generate the genes.bed file successfully. However, I couldn't get the "bedmap --echo --echo-map-id-uniq --range 25000 open-regions.bed genes.bed > answer.25kb.bed" command to work: it creates the output file, but the contents are just a repeat of the open-regions file.


The --echo option reports the open-region element, and --echo-map-id-uniq reports the unique IDs of all genes from the Gencode v19 set that overlap the open region once it is padded by the --range distance.

If you want the genes themselves, you could do something like:

$ bedmap --echo-map --multidelim '\n' --range 25000 open-regions.bed genes.bed | sort-bed - > genes.25kb.bed

Etc.

Leaving out --echo and adding the --echo-map and --multidelim '\n' options gives you the genes that lie within 25 kb of an open region, one gene per line.

See the documentation for more information about the --echo-map-* options available to you. It might seem a little overwhelming but the docs try to walk through several examples of how they work.
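
If the end goal is a list of gene symbols for IPA or GSEA, one possible follow-up is to pull the gene_name value out of the attributes column of genes.25kb.bed. This is only a sketch: it assumes the 10-column layout that convert2bed produces from GFF input (attributes in column 10) and a Gencode-style "gene_name=" key, and the rows below are made-up toy data:

```shell
# Toy stand-in for genes.25kb.bed (hypothetical IDs and symbols).
tmpdir=$(mktemp -d)
printf 'chr1\t11868\t14409\tID_1\t.\t+\tHAVANA\tgene\t.\tID=ID_1;gene_id=ID_1;gene_name=GENE_A;level=2\n'  > "$tmpdir/genes.25kb.bed"
printf 'chr1\t11868\t14409\tID_1\t.\t+\tHAVANA\tgene\t.\tID=ID_1;gene_id=ID_1;gene_name=GENE_A;level=2\n' >> "$tmpdir/genes.25kb.bed"
printf 'chr2\t500\t9000\tID_2\t.\t-\tENSEMBL\tgene\t.\tID=ID_2;gene_id=ID_2;gene_name=GENE_B;level=3\n'   >> "$tmpdir/genes.25kb.bed"

# Split column 10 on ";" and print the value of the gene_name key;
# sort -u removes genes reported once per nearby open region.
gene_names=$(awk -F'\t' '{
    n = split($10, attrs, ";")
    for (i = 1; i <= n; i++)
        if (attrs[i] ~ /^gene_name=/) {
            sub(/^gene_name=/, "", attrs[i])
            print attrs[i]
        }
}' "$tmpdir/genes.25kb.bed" | sort -u)
printf '%s\n' "$gene_names"
```

A gene that sits near two open regions appears twice in the bedmap output, which is why the toy file repeats its first row and the pipeline ends in sort -u.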


Thanks Alex that really helped :)

6.6 years ago

Personally, I'd use GREAT: you can just feed it your differential regions and it will run all sorts of enrichment analyses. If you want the three lists of annotated regions you describe, you can run bedtools closest with the -d option on your regions and a gene list in BED or GTF format (which you can download from the UCSC Table Browser). That reports the closest gene to each region, with the distance to that gene in the last column, which lets you create your three lists via awk, perl, python, excel, whatever.
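
The awk step could look roughly like the sketch below. It assumes a 3-column regions file and a 6-column gene BED, so the gene name lands in column 7 and the distance in the last column; the rows are made-up toy data, and closest.txt and the col variable are hypothetical names. Regions with no gene on their chromosome get a distance of -1 and are skipped:

```shell
# Toy stand-in for the output of bedtools closest -d (hypothetical values).
tmpdir=$(mktemp -d)
{
    printf 'chr1\t1000\t2000\tchr1\t500\t1500\tGENE_A\t.\t+\t0\n'
    printf 'chr1\t50000\t51000\tchr1\t80000\t90000\tGENE_B\t.\t+\t29001\n'
    printf 'chr2\t10000\t11000\tchr2\t130000\t140000\tGENE_C\t.\t-\t119001\n'
    printf 'chr3\t100\t200\tchr3\t900000\t910000\tGENE_D\t.\t+\t899801\n'
    printf 'chr4\t100\t200\t.\t-1\t-1\t.\t-1\t.\t-1\n'
} > "$tmpdir/closest.txt"

# One cumulative gene list per cutoff: 25 kb, 75 kb and 150 kb.
for kb in 25 75 150; do
    awk -F'\t' -v max=$((kb * 1000)) -v col=7 \
        '$NF >= 0 && $NF <= max { print $col }' "$tmpdir/closest.txt" \
        | sort -u > "$tmpdir/genes.${kb}kb.txt"
done
cat "$tmpdir/genes.150kb.txt"
```

Note that each list only holds the nearest gene of every region that falls under the cutoff, not every gene inside the window.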


Thanks Jared, I appreciate it.
