Question: From a list of gene symbols to a BED file with chromosome name and start/end positions
6.3 years ago
11yj3312 ▴ 20

Hi y'all, I have a list of gene symbols. How can I convert them to a .bed file with the chromosome name and start/end positions?


You should add more information, such as the genome build in which you want co-ordinates (hg19?; hg38?; mm9?; mm10?). Also, are these HGNC gene symbols? Are you only interested in the co-ordinates of the canonical isoform?

You could quite easily download the GENCODE GTF annotation files and then extract the information from them using grep.
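As a rough sketch of that GTF route, the Python function below keeps only the "gene" records whose gene_name attribute is in a list of symbols and converts them to BED coordinates. It assumes the GENCODE layout (tab-separated, feature type in column 3, a gene_name "..." attribute in column 9). The function name gtf_genes_to_bed is made up for illustration.

```python
import re

# Matches the gene_name attribute in column 9 of a GTF record.
GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def gtf_genes_to_bed(gtf_lines, symbols):
    """Yield BED6 tuples for GTF 'gene' records whose gene_name is in symbols."""
    wanted = set(symbols)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep one record per gene, not per transcript or exon
        match = GENE_NAME.search(fields[8])
        if match is None or match.group(1) not in wanted:
            continue
        # GTF coordinates are 1-based and inclusive; BED is 0-based, half-open.
        yield (fields[0], int(fields[3]) - 1, int(fields[4]),
               match.group(1), "0", fields[6])
```

To use it, pass an open GTF file handle and your list of symbols, then join each tuple with tabs to write one BED line per gene. Unlike a plain grep, the lookup is an exact match on the gene name, so a symbol that is a prefix of another symbol does not pull in extra rows.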

There is most likely a more automated solution.

6.3 years ago

A simple method would be to go to Ensembl BioMart, select the relevant organism, choose the attributes you want (chromosome name and start/end positions), and then upload your list of gene symbols as a filter.
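BioMart exports a table rather than a BED file, so a small conversion step is still needed. The sketch below assumes a tab-separated export whose header row uses the column names "Gene name", "Chromosome/scaffold name", "Gene start (bp)", "Gene end (bp)" and "Strand"; adjust these if your export is labelled differently. The function name biomart_tsv_to_bed is hypothetical.

```python
import csv


def biomart_tsv_to_bed(lines, add_chr_prefix=True):
    """Yield BED6 tuples from the lines of a tab-separated BioMart export."""
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        chrom = row["Chromosome/scaffold name"]
        if add_chr_prefix:
            # Ensembl names chromosomes "1", "X", ...; this naive prefix gives
            # "chr1", "chrX". Scaffolds and the mitochondrion may need care.
            chrom = "chr" + chrom
        # Ensembl coordinates are 1-based and inclusive; BED is 0-based, half-open.
        start = int(row["Gene start (bp)"]) - 1
        end = int(row["Gene end (bp)"])
        strand = "+" if row["Strand"] == "1" else "-"
        yield (chrom, start, end, row["Gene name"], "0", strand)
```

Passing an open file handle for the downloaded export works, since csv.DictReader accepts any iterable of lines.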

6.3 years ago

If you want to do things in a more automated fashion, you could install the Ensembl Perl API and then run a Perl script (like the one posted below) to grab exons.

#!/usr/bin/env perl

use strict;
use warnings;
use Bio::EnsEMBL::DBSQL::DBAdaptor;

# Connection settings for the public Ensembl MySQL server.
my $host    = 'ensembldb.ensembl.org';
my $user    = 'anonymous';
my $dbname  = 'homo_sapiens_core_89_38';
my $port    = 3306;
my $species = 'homo_sapiens';
my $group   = 'core';
my $db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(-host    => $host,
                                             -user    => $user,
                                             -dbname  => $dbname,
                                             -port    => $port,
                                             -species => $species,
                                             -group   => $group);

my $slice_adaptor = $db->get_SliceAdaptor();

my $slices = $slice_adaptor->fetch_all('chromosome');
foreach my $slice (@{$slices}) {
    my $chr = "chr".$slice->seq_region_name();
    my $genes = $slice->get_all_Genes();
    foreach my $gene (@{$genes}) {
        my $exons = $gene->get_all_Exons();
        my $id = $gene->external_name();
        my $exon_index = 1;
        my $exon_number = $exon_index;
        my $exon_count = scalar(@{$exons});        
        foreach my $exon (@{$exons}) {
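            # Ensembl coordinates are 1-based and inclusive; BED is 0-based and
            # half-open, so subtract 1 from the start and keep the end as-is.
            my $start = $exon->start() - 1;
            my $end = $exon->end();
            if ($start < $end) {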
                my $stable_id = $exon->stable_id();
                my $strand = $exon->strand();
                if ($strand == 1) { 
                    $strand = "+";
                    $exon_number = $exon_index;
                } 
                elsif ($strand == -1) { 
                    $strand = "-";
                    $exon_number = $exon_count - $exon_index + 1;
                } 
                else { 
                    die "unknown value for strand\n"; 
                }
                print STDOUT join("\t", ($chr, $start, $end, $id, $exon_number, $strand))."\n";
                $exon_index++;
            }
        }
    }
}

Be sure to change the dbname and species variables depending on your needs.

Once you have exons with Ensembl names, you can use a Python script like the following to make a translation table that maps those names to HGNC symbols.

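#!/usr/bin/env python

import sys
from mygene import MyGeneInfo

# Read one gene name per line from standard input, skipping blank lines.
hgnc_names = [line.strip() for line in sys.stdin if line.strip()]

mg = MyGeneInfo()
results = mg.querymany(hgnc_names, scopes='symbol', fields='symbol,name',
                       species='human', verbose=False)

for result in results:
    # Queries with no match are flagged 'notfound' and carry no 'symbol' or
    # 'name' key, so skip them instead of raising a KeyError.
    if result.get('notfound'):
        continue
    # Columns: input name, matched symbol, full gene name.
    sys.stdout.write("%s\t%s\t%s\n" % (result['query'],
                                       result.get('symbol', ''),
                                       result.get('name', '')))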

From here, if you're working with HGNC names, you can process the Perl script output to include HGNC symbols for all exons, and then use grep to find matches for your genes of interest.
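One caveat with a plain grep is that it matches substrings, so a short symbol can also match longer symbols that contain it. A minimal sketch of an exact-match filter is shown below. It assumes the six-column, tab-separated layout printed by the Perl script, with the gene name in the fourth column. The function name filter_bed_by_symbols is hypothetical.

```python
def filter_bed_by_symbols(bed_lines, symbols, name_column=3):
    """Yield BED lines whose name column exactly matches one of the symbols."""
    wanted = {symbol.strip() for symbol in symbols if symbol.strip()}
    for line in bed_lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        # Compare the whole field, so "GENE1" does not also match "GENE10".
        if len(fields) > name_column and fields[name_column] in wanted:
            yield line
```

Both arguments can be open file handles: one for the exon output and one for a text file with a gene symbol per line.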
