Hi all, I am trying to use the ape package in R to pull protein sequences from GenBank. The sequence I'd like to retrieve is the following: https://www.ncbi.nlm.nih.gov/protein/AAG26087.1
To do this I used the following code:
library(ape)
kl_seq = read.GenBank("AAG26087.1", as.character = TRUE)
And the sequence I obtained was:
$AAG26087.1
[1] "m" "s" "r" "s" "s" "k" "r" "n" "r" "d" "g" "r" "g" "w" "v" "n" "g" "r" "k" "k" "d" "r" "d" "r" "k" "k" "k" "k" "k" "k" "d" "n" "w" "g" "n" "k" "g" "g" "k" "k" "d"
[42] "k" "d" "g" "g" "a" "a" "k" "r" "a" "r" "t" "d" "m" "d" "s" "g" "g" "k" "r" "r" "g" "g" "s" "d" "k" "r" "d" "h" "r" "r" "r" "k" "a" "n" "k" "r" "k" "a" "a" "g" "g"
[83] "k" "h" "s" "k" "k" "r" "t" "d" "r" "r" "r" "r" "t" "a" "g" "s" "v" "g" "g" "v" "n" "g" "g" "s" "r" "g" "a" "g" "g" "g" "v" "n" "m" "s" "v" "s" "s" "r" "t" "g" "g"
[124] "d" "v" "r" "g" "n" "g" "w" "d" "a" "d" "s" "s" "c" "r"
attr(,"species")
[1] "Hepatitis_delta_virus"
However, this sequence clearly does not match the 214 aa sequence at the above link. I'm wondering if there's something I'm missing here. Perhaps there is another argument, function, or package I should use to pull the correct sequence? Is there any way to bulk download from GenBank directly, given a list of accession IDs? Any tips would be much appreciated. Thanks!
Use NCBI's Unix utilities (Entrez Direct).
If you have multiple IDs to download, run the command in a loop and replace the ID each time.
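For illustration, here is a minimal sketch of such a loop. It assumes NCBI's Entrez Direct (EDirect) tools are installed and `efetch` is on your PATH. The function name `fetch_proteins` and the file names `ids.txt` and `proteins.fasta` are placeholders made up for this example, not something from the thread.

```shell
#!/bin/sh
# Sketch: fetch protein FASTA records for a list of accessions with efetch.
# Assumption: EDirect is installed and `efetch` is on PATH.
# `fetch_proteins` is a hypothetical helper name.

fetch_proteins() {
    # Read one accession per line from standard input.
    while IFS= read -r acc || [ -n "$acc" ]; do
        # Skip blank lines.
        [ -z "$acc" ] && continue
        # Redirect stdin from /dev/null so efetch cannot consume
        # the remaining accessions that the loop is reading.
        efetch -db protein -id "$acc" -format fasta < /dev/null
    done
}

# Usage (ids.txt holds one accession per line, e.g. AAG26087.1):
#   fetch_proteins < ids.txt > proteins.fasta
```

For a single record, the call inside the loop can be run on its own with the accession in place of `$acc`.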
genomax, thanks very much for your help. I will try this. Is there any way to use efetch to get the nucleotide sequence as well?
Just use the right (nucleotide) accession number and it will return the nucleotide sequence.
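As a rough sketch of what that could look like, again assuming EDirect is installed: `fetch_nucleotide` is a hypothetical helper name, and the accession is left as an argument because the thread does not give the nucleotide accession for this record.

```shell
#!/bin/sh
# Sketch: fetch one nucleotide record as FASTA.
# Assumption: EDirect is installed and `efetch` is on PATH.
# `fetch_nucleotide` is a hypothetical helper name.

fetch_nucleotide() {
    # $1: a nucleotide accession (not the protein accession).
    efetch -db nuccore -id "$1" -format fasta
}

# Usage, with NUC_ACC set to your nucleotide accession:
#   fetch_nucleotide "$NUC_ACC" > nucleotide.fasta
```

The only changes from the protein case are the database name and the accession passed in.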