Question: Protein coding mm10 refseq bed
1
12 months ago
rbronste • 240

I'm trying to export a BED file from the Table Browser with gene body locations for protein-coding genes in mm10, using the following columns:

chr start end NA genename NMname strand

I'm not sure whether there is a more straightforward way to get this arrangement. Thanks!

1
12 months ago
arup ♦ 1.3k
India

In the Table Browser, choose the "selected fields" option under "output format", click "get output", and then pick the required columns on the selection page.

Link to Table Browser: [screenshot: Table Browser]

Select columns: [screenshot: Selection Page]

0
12 months ago
vkkodali ♦ 1.1k
United States

If you are interested in RefSeq data, why not download the GFF3 annotation from NCBI and parse that file? The GFF3 file is available from the RefSeq FTP site:

ftp://ftp.ncbi.nlm.nih.gov/genomes/Mus_musculus/GFF_interim/interim_GRCm38.p6_top_level_2017-09-26.gff3.gz

A gene can be protein-coding and still have one or more non-coding transcript variants. So you first need the list of GeneIDs that code for at least one protein. You can get it by parsing the GFF3 file as follows:

zgrep -v '^#' interim_GRCm38.p6_top_level_2017-09-26.gff3.gz | awk 'BEGIN{FS="\t";OFS="\t"}($3=="CDS"){print $9}' | grep -o 'GeneID:[0-9]*' | sort -u > ~/GRCm38.p6_protein_coding_genes.txt

Then you can grep for those GeneIDs in the GFF3 lines whose third column is gene to get the full range and strand of each gene. It is unclear to me whether you want the range for each gene or for each transcript variant (one of your columns is an NM accession). Depending on exactly what you want, it is fairly easy to write a Unix command that parses the GFF3 file and returns a BED-style file.
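As a rough sketch of the gene-level case, both steps can be folded into one two-pass awk call. Everything named here is an assumption for illustration: the tiny sample.gff3 with made-up GeneIDs and accessions, the output file name, and the choice of a BED6 layout (name taken from the Name attribute, score set to 0). Note that NCBI GFF3 files use RefSeq accessions such as NC_000067.6 in column 1 rather than chr1, so a separate renaming step would be needed for UCSC-style names.

```shell
#!/bin/sh
# Sketch only: BED6 intervals for genes that have at least one CDS feature.
# sample.gff3 is a made-up example, not real annotation.
workdir=$(mktemp -d)
gff="$workdir/sample.gff3"
bed="$workdir/protein_coding_genes.bed"

# Nine tab-separated GFF3 columns per record. GeneA has a CDS, GeneB does not.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  NC_000067.6 RefSeq gene 101 500 . + . 'ID=gene0;Dbxref=GeneID:111;Name=GeneA;gbkey=Gene' \
  NC_000067.6 RefSeq mRNA 101 500 . + . 'ID=rna0;Parent=gene0;Dbxref=GeneID:111,Genbank:NM_000001.1;Name=NM_000001.1' \
  NC_000067.6 RefSeq CDS 151 450 . + 0 'ID=cds0;Parent=rna0;Dbxref=GeneID:111,Genbank:NP_000001.1;Name=NP_000001.1' \
  NC_000067.6 RefSeq gene 1001 2000 . - . 'ID=gene1;Dbxref=GeneID:222;Name=GeneB;gbkey=Gene' \
  NC_000067.6 RefSeq lnc_RNA 1001 2000 . - . 'ID=rna1;Parent=gene1;Dbxref=GeneID:222,Genbank:NR_000002.1;Name=NR_000002.1' \
  > "$gff"

# The same file is read twice: NR == FNR is true only while reading the first copy.
awk 'BEGIN { FS = OFS = "\t" }
  /^#/ { next }
  # Pass 1: remember every GeneID that has at least one CDS feature.
  NR == FNR {
    if ($3 == "CDS" && match($9, /GeneID:[0-9]+/))
      coding[substr($9, RSTART, RLENGTH)] = 1
    next
  }
  # Pass 2: print gene features whose GeneID was recorded in pass 1.
  $3 == "gene" && match($9, /GeneID:[0-9]+/) {
    id = substr($9, RSTART, RLENGTH)
    if (!(id in coding)) next
    name = id
    if (match($9, /Name=[^;]+/)) name = substr($9, RSTART + 5, RLENGTH - 5)
    # GFF3 starts are 1-based, BED starts are 0-based, so subtract 1.
    print $1, $4 - 1, $5, name, 0, $7
  }' "$gff" "$gff" > "$bed"

cat "$bed"
```

Looking up the exact GeneID string in an awk array avoids the partial matches a plain grep can produce (GeneID:111 would also hit GeneID:1112). For the real annotation, the file would need to be decompressed first so awk can read it twice. If per-transcript rows with NM accessions are wanted instead, the second pass would have to filter on the mRNA features rather than gene, which this sketch does not do.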

