How to download all available sequences of a gene from all bacteria using R
10 months ago
mschmidt • 0

I need to download all (or many) sequences of a specific bacterial gene from the GenBank nuccore database, restricted to entries that are complete genome sequences. I would prefer to use R. Querying 'Bacteria[ORGN] AND gyrB[GENE] AND complete genome[TI]' in the web interface returns >10k hits. I do not want to download whole genome sequences, only the extracted gyrB sequences, to build a local database. I tried

library(rentrez)
db = "nuccore"
query = "Bacteria[ORGN] AND gyrB[GENE] AND complete[TI]" 
found = entrez_search(db, query, config = NULL, retmode = "xml", use_history = FALSE, retmax = 90000)

but this fetches IDs for whole genome sequences. Is it possible to get FASTA sequences for the gyrB genes, or at least the gyrB coordinates? I would rather not download the whole genome sequences of thousands of genomes.
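One possible route, sketched here rather than tested against the full result set: E-utilities can return the coding sequences of a nuccore record as nucleotide FASTA (rettype "fasta_cds_na"), and the records can then be filtered by the gene tag in the header. The helper names filter_cds_by_gene and fetch_gene_cds and the batch size are made up for illustration, and the "[gene=...]" header tag is an assumption about how NCBI labels these records. Note that this still transfers every CDS of each genome and only keeps the gyrB records locally.

```r
# Keep only the FASTA records whose header carries a "[gene=<name>]" tag.
# `fasta_text` is a single string or a character vector of FASTA text.
# ASSUMPTION: headers contain a literal "[gene=...]" tag.
filter_cds_by_gene <- function(fasta_text, gene = "gyrB") {
  lines <- unlist(strsplit(paste(fasta_text, collapse = "\n"), "\n", fixed = TRUE))
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  # Record index for every line: increases by one at each header line.
  record <- cumsum(is_header)
  tag <- paste0("[gene=", gene, "]")
  wanted <- record[is_header & grepl(tag, lines, fixed = TRUE)]
  lines[record %in% wanted]
}

# Hypothetical wrapper: fetch CDS FASTA for the given nuccore IDs in small
# batches and keep only the requested gene. Requires the rentrez package
# and network access; nothing is fetched until the function is called.
fetch_gene_cds <- function(ids, gene = "gyrB", batch_size = 20) {
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  unlist(lapply(batches, function(batch) {
    txt <- rentrez::entrez_fetch(db = "nuccore", id = batch,
                                 rettype = "fasta_cds_na")
    filter_cds_by_gene(txt, gene)
  }), use.names = FALSE)
}

# Example use (not run here):
# found <- rentrez::entrez_search("nuccore",
#   "Bacteria[ORGN] AND gyrB[GENE] AND complete[TI]", retmax = 100)
# writeLines(fetch_gene_cds(found$ids), "gyrB.fasta")
```

Trying this on a handful of IDs first seems sensible, since each record returns all of its coding sequences before filtering.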

You can get this data from Ensembl Bacteria using the Ensembl Genomes Perl API, or possibly with the R package biomartr.

That would be a great option, but I found that BioMart is currently not available for Ensembl Bacteria: https://support.bioconductor.org/p/82585/
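Staying with NCBI E-utilities instead, a rough sketch of the coordinate route: if the gyrB location on a genome record is already known (for example from its feature annotation), efetch can be asked for just that slice through its seq_start, seq_stop and strand parameters, which rentrez should pass through via the extra arguments of entrez_fetch. The helper names parse_location and fetch_slice are hypothetical, and only simple locations of the form "a..b" or "complement(a..b)" are handled.

```r
# Parse a simple GenBank-style location into start, stop and strand.
# strand follows the efetch convention: 1 = plus, 2 = minus.
# Joined or otherwise complex locations are rejected.
parse_location <- function(loc) {
  pattern <- "^(complement\\()?<?([0-9]+)\\.\\.>?([0-9]+)\\)?$"
  m <- regmatches(loc, regexec(pattern, loc))[[1]]
  if (length(m) != 4) stop("unsupported location: ", loc)
  list(start  = as.integer(m[3]),
       stop   = as.integer(m[4]),
       strand = if (nzchar(m[2])) 2L else 1L)
}

# Hypothetical wrapper: fetch only the given slice of one nuccore record
# as FASTA. Requires the rentrez package and network access.
# ASSUMPTION: seq_start, seq_stop and strand are forwarded to efetch
# unchanged through the `...` of entrez_fetch.
fetch_slice <- function(accession, loc) {
  p <- parse_location(loc)
  rentrez::entrez_fetch(db = "nuccore", id = accession, rettype = "fasta",
                        seq_start = p$start, seq_stop = p$stop,
                        strand = p$strand)
}

# Example use (not run here; accession and coordinates are placeholders):
# cat(fetch_slice("<accession>", "complement(20..40)"))
```

This keeps the download down to the gene itself, at the cost of one request per genome, so it would need to be throttled for thousands of records.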
