Find all genes in multiple regions
2.9 years ago

Hi!

I've read a lot of similar posts, but I haven't seen (or may have missed) a good way to find all the genes in each region when you have many regions (200+).

For example, I have a list of regions like this:

chr3:56953782-58953782
chr3:229305477-231305477
chr13:48536705-50536705
chr10:62754024-64754024
chr6:31633697-33633697
chr6:31608932-33608932
chr6:31587588-33587588

UCSC's interface is great, but it gives me all the results in one file, with no indication of which rows belong to which region:

#hg19.knownCanonical.chrom  hg19.knownCanonical.chromStart  hg19.knownCanonical.chromEnd    hg19.kgXref.spDisplayID hg19.kgXref.geneSymbol
chr3    56761445    57113336    Q9NR81-2    ARHGEF3
chr3    56974067    56994880        ARHGEF3-AS1
chr3    57094468    57109460    SPT12_HUMAN SPATA12
chr3    57124009    57199403    I17RD_HUMAN IL17RD
chr3    57231943    57234280    HESX1_HUMAN HESX1
chr3    57261764    57307498    DP13A_HUMAN APPL1
chr3    57310555    57326132    C9JX97_HUMAN    ASB14
chr3    57327726    57530071    DYH12_HUMAN DNAH12
chr3    57541980    57547768    PDE12_HUMAN PDE12

Ensembl's BioMart is great too, but it has a similar problem.

Is there a way I can get something along the lines of:

region  Genes
chr3:56953782-58953782  GENE1
chr3:56953782-58953782  GENE2
chr3:56953782-58953782  GENE3
chr3:229305477-231305477    GENE4
chr3:229305477-231305477    GENE5

Thank you!

ucsc ensembl biomart • 970 views
2.9 years ago
Emily 23k

You can get the genomic locus of each gene with BioMart and then match it back to your input list, but that's not ideal.

An alternative is to use the Ensembl REST API overlap endpoint. You can then write a script in any language you like that runs through your list and prints each locus with its genes.
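
As a rough illustration of that approach, here is a minimal Python sketch. It assumes the regions are hg19/GRCh37 coordinates (hence the GRCh37 REST server), that they sit one per line in a file called `regions.txt`, and that the gene symbol is read from the `external_name` field of each returned feature. The function names are made up for this example.

```python
import json
import urllib.request

# Assumption: the regions are hg19/GRCh37 coordinates, so the GRCh37 server is used.
# For GRCh38 coordinates the server would be https://rest.ensembl.org.
SERVER = "https://grch37.rest.ensembl.org"


def overlap_url(region, server=SERVER):
    """Build the overlap/region URL for a region such as 'chr3:56953782-58953782'."""
    # Ensembl names chromosomes '3', not 'chr3'.
    ensembl_region = region[3:] if region.startswith("chr") else region
    return f"{server}/overlap/region/human/{ensembl_region}?feature=gene"


def gene_rows(region, features):
    """Turn the decoded JSON list for one region into (region, gene) pairs."""
    # 'external_name' holds the gene symbol; fall back to the stable ID if absent.
    return [(region, f.get("external_name") or f["id"]) for f in features]


def genes_in_region(region):
    """Query the REST API for one region (needs network access)."""
    request = urllib.request.Request(
        overlap_url(region), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return gene_rows(region, json.load(response))


def main(path="regions.txt"):
    """Print a region<TAB>gene table for every region listed in the file."""
    print("region\tGenes")
    with open(path) as handle:
        for line in handle:
            region = line.strip()
            if region:
                for row in genes_in_region(region):
                    print("\t".join(row))
```

Calling `main("regions.txt")` should print one row per region/gene pair, in the layout asked for in the question. The endpoint caps the size of a single region and the server rate-limits requests, so check the current limits in the REST documentation before running 200+ queries in a loop.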

Looking at your list, these are all single bases. Are they variants? In which case, have you considered using the VEP instead?

2.9 years ago

If you convert your list of regions to BED format (with sed + awk, for instance) and download the gene definitions you want in BED format too (from UCSC's Table Browser, for instance), then a simple bedtools command will do:

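# chr:start-end -> tab-separated, 0-based BED start; then list the genes overlapping each region
sed 's/[:-]/\t/g' regions.txt \
| awk 'BEGIN{OFS="\t"} {$2--; print}' \
| bedtools intersect -wao -a - -b genes.bed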
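
The pipeline above prints the region and gene columns side by side rather than the two-column `region  Genes` table from the question. A small post-processing step can produce that layout. The sketch below is one possible way to do it. It assumes `genes.bed` carries the gene name in its 4th column, which puts the name at 0-based index 6 of each `-wao` output line (3 region columns come first). The function names are invented for this example.

```python
import sys


def wao_to_pairs(lines, name_column=6):
    """Yield (region, gene) pairs from `bedtools intersect -wao` output lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # The last column is the overlap in bp; 0 means the region hit no gene.
        if len(fields) <= name_column or int(fields[-1]) == 0:
            continue
        chrom, start, end = fields[0], int(fields[1]), fields[2]
        # Undo the 0-based BED start so the region matches the original input.
        yield f"{chrom}:{start + 1}-{end}", fields[name_column]


def main(stream=sys.stdin):
    """Read -wao output from a stream and print a region<TAB>gene table."""
    print("region\tGenes")
    for region, gene in wao_to_pairs(stream):
        print(f"{region}\t{gene}")
```

To use it, save the block as a script (for example a hypothetical `wao_to_table.py`), add a call to `main()` at the end, and pipe the bedtools output into it. Regions with no overlapping gene are dropped here. Remove the zero-overlap check if you would rather keep them with a `.` placeholder.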