Bulk downloading IMG gene ID information
9.2 years ago
frcamacho ▴ 210

Hi there,

I have a file of hundreds of IMG gene IDs and want to find the gene that corresponds to each ID. With that many IDs, I don't want to enter each one into the IMG database by hand and look up its gene information. Is there a way to access the IMG database from the terminal, or to download a file listing each ID with its gene information?

Any suggestions are much appreciated.

Thanks!

9.2 years ago
Josh Herr 5.8k

Unfortunately, I'm not aware of an easy answer to this question.

There have been many changes to how JGI/IMG handle data, largely because of the difficulty of storing so much of it. What began as an effort to store many analyses and genome sequencing projects (including ones not done at JGI) has recently shifted to storing only raw data sequenced by JGI. I'm unclear on what is and isn't available now; maybe someone can clear this up for me. There are a few different FTP servers (for JGI, IMG, IMG/M, etc.) and, from my understanding, they partly overlap but each also holds some unique data. I haven't tried the API on all platforms. I have recently been using Globus to query and transfer data from the JGI FTP servers with pretty good success.

The link provided by marina.v.yurieva doesn't really give you any context on what is happening at IMG, and there have been a lot of changes in the two years since that comment was made.

I'm not currently clear whether the API tool provided by JGI works for JGI's IMG; also see here for more information.

This biostars question (Get Ncrna Genes In Fasta Format From Img Database) is a little old, but it may help you with how to proceed. Let us know if you have any luck.

8.0 years ago
saurabh.mk • 0

Hello,

I hope your problem was solved long ago!

I recently found a way that works for me, so maybe it's useful to someone.

1. From IMG, you can download all the information as an Excel file, and you can ask for the NCBI taxon ID to be included. I found the NCBI taxon ID to be the only link consistently reported for IMG data.
2. From the NCBI FTP site, you can get a file (something like assembly_summary.csv) with similar details, which also includes the NCBI taxon ID and a link to the folder for that particular genome on the FTP server.
3. From there, you can download all the information that NCBI has about the genome.

In summary, IMG id -> NCBI taxon id -> NCBI ftp server link -> NCBI information.
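The chain above can be sketched in a few lines of Python. This is only a rough illustration, not a tested pipeline: it assumes the IMG export has been saved as tab-separated text, and the column names used here (`taxon_oid`, `NCBI Taxon ID`, `taxid`, `ftp_path`) are assumptions that should be checked against the headers of your own files. The function names are made up for this example.

```python
import csv
from collections import defaultdict


def read_rows(text, key, delimiter="\t"):
    """Parse delimited text into dicts, using the first line that names `key` as the header.

    Leading '#' characters are stripped from candidate header lines, so
    comment lines that precede the real header are skipped.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        header = [h.strip() for h in line.lstrip("# ").split(delimiter)]
        if key in header:
            reader = csv.DictReader(lines[i + 1:], fieldnames=header, delimiter=delimiter)
            return list(reader)
    raise ValueError(f"no header line containing column {key!r}")


def link_img_to_ncbi(img_text, ncbi_text,
                     img_id_col="taxon_oid",        # assumed IMG column name
                     img_taxid_col="NCBI Taxon ID",  # assumed IMG column name
                     ncbi_taxid_col="taxid",         # assumed NCBI column name
                     ncbi_path_col="ftp_path"):      # assumed NCBI column name
    """Map each IMG ID to the list of NCBI folder links that share its taxon ID."""
    # One taxon ID can have several assemblies, so collect a list per taxon ID.
    paths_by_taxid = defaultdict(list)
    for row in read_rows(ncbi_text, ncbi_taxid_col):
        paths_by_taxid[row[ncbi_taxid_col]].append(row[ncbi_path_col])

    result = {}
    for row in read_rows(img_text, img_taxid_col):
        # IMG entries with no matching taxon ID get an empty list.
        result[row[img_id_col]] = paths_by_taxid.get(row[img_taxid_col], [])
    return result
```

To use it, read both files into strings (for example with `open(path).read()`) and pass them to `link_img_to_ncbi`. Because the join is on taxon ID rather than on a genome-specific accession, one IMG ID may map to several NCBI folders, so the result keeps a list per ID and leaves the choice of assembly to you.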

This may be useful for the NCBI side: http://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/#howtofind

Saurabh


Hi Saurabh,

I'm trying to convert NCBI taxids to IMG IDs, but I couldn't find where to download this mapping (step 1 in your answer above). I tried selecting all genomes in the genome browser and exporting their related information, but there was no NCBI taxid. Would you mind directing me to the correct link?

Thanks a lot.


I have managed to find the box for selecting NCBI ID at the bottom of the page. Please ignore my question. Thanks!

7.5 years ago
hanchen1996 ▴ 10

This post is quite old, but in case anyone is looking for a way to download JGI data, see http://genome.jgi.doe.gov/help/download.jsf and go to the section on downloading with Globus.
