download protein sequences from NCBI
15 months ago
guillaume.rbt • 590
France

Hi all,

I would like to download all protein sequences from one species on NCBI:

https://www.ncbi.nlm.nih.gov/protein?linkname=bioproject_protein&from_uid=261773

This may be trivial, but is there a way to download all of these sequences together in a single FASTA file?

Thanks a lot,

Guillaume


There is a "Send to" option through which you can download all the sequences. Afterwards, to merge everything into a single FASTA record, remove every header except the first:

awk 'BEGIN{a=0}{if($0~/^>/){if(a==0){print}a++;}else{print}}' input.fasta >out.fasta
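
To see what the awk command above does, here is a small sketch run on two invented records (the headers and sequences are made up for illustration, and `keep_first_header` is a hypothetical name). It keeps the first header and drops every later one, so all residues end up under a single record.

```shell
# Keep only the first FASTA header; print every other line unchanged.
# Reads the named file, or standard input when no file is given.
keep_first_header() {
    awk '/^>/ { if (seen++) next } { print }' "$@"
}

# Offline demo on two invented records.
printf '>seq1 example\nMFVH\n>seq2 example\nMPQY\n' | keep_first_header
# prints:
# >seq1 example
# MFVH
# MPQY
```

Note that this merges the proteins into one long sequence. If the goal is one file that still holds the proteins as separate records, the file downloaded through "Send to" can probably be used as it is.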

Thanks for the response. How do you use the "Send to" option? Is it on the console or on the website?

16 months ago
France/Nantes/Institut du Thorax - INSE…

Using the NCBI web interface, you can just click "Send to" > "File", choose FASTA as the format, and create the file.

or using eutils:

curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=bioproject&id=261773&linkname=bioproject_protein" | xmllint --xpath '//LinkSetDb' - | xmllint --format - | grep "<Id>" | cut -d '>' -f 2 | cut -d '<' -f 1 | while read -r L; do curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=${L}&rettype=fasta" ; done
>gi|821074095|gb|KKY28990.1| putative uracil phosphoribosyltransferase [Diplodia seriata]
MFVHASGPESIKFKHLQGQVQVLLVDSVINSGATILDFVEAIREINPGIRIVVVAGTVQAQCISPNNPFY
KTLAQHGDISLVALRSSETKFTGSGGTDTGNRLFNTTHLL

>gi|821074094|gb|KKY28989.1| putative integral membrane protein [Diplodia seriata]
MPQYFPWPYSVDPLPEDLRRGLWPVGIFALMSTVATLALLCWITYRLVSWRKHYRSYVGYNQYVLLIYNL
LLADLQQSISFLISFHWIHTDSMLAPSPACFGQAWLVQIGDISSGMFVLAIALHTFFSVVKGRQIPFRAF
LIGTIVIWALALLLTVLGPALHGSDYFTAAGAWCWASDKYETERLWLHYLWIFIIEFGTVIIYALIFIYL
RKQLVSIASAHQHSTQNKVSQAARYMVLYPLTYVLLTLPLAAGRMATMTGQTLPIAYYCAAGSMMTSCGW
VDAALYALTRRVLVSNEIDQPQGGAGKGASSSGGRTGYGGHGSSHTATGWDIASFSDRKGGMGADHSVTI
TGGLDARGSNFIDMDELSKGGVHHHATERVGRPKHKGSSTPSTQGLTRARSSSTSARESTPRGSTDSILA
GLGGVRAETKVEIRVEPANGFMLPGEGSGSNGSSGMSTPNGRTVEVVGNSHAMRPRSGSPY
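
The loop above sends one efetch request per protein. A possible variation, sketched here under assumptions, collects the linked IDs first and asks efetch for all of them in one request, since the `id` parameter accepts a comma-separated list. The helper names `extract_linked_ids` and `build_efetch_url` are hypothetical, and the sketch assumes the elink XML arrives with one `<Id>` element per line. The sample XML is a hand-written fragment using the two protein IDs shown above, so the demo runs offline.

```shell
# Print the IDs found inside <LinkSetDb>...</LinkSetDb>, one per line.
# The source BioProject ID sits in <IdList>, outside that block, and is skipped.
# Assumption: each <Id> element is on its own line.
extract_linked_ids() {
    awk '/<LinkSetDb>/ { inside = 1 }
         /<\/LinkSetDb>/ { inside = 0 }
         inside && /<Id>/ { gsub(/[^0-9]/, ""); print }'
}

# Read IDs (one per line) from standard input and print a single efetch URL.
build_efetch_url() {
    ids=$(paste -sd, -)
    printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=%s&rettype=fasta\n' "$ids"
}

# Offline demo on a hand-written XML fragment.
sample_xml='<LinkSet>
  <IdList>
    <Id>261773</Id>
  </IdList>
  <LinkSetDb>
    <Link>
      <Id>821074095</Id>
    </Link>
    <Link>
      <Id>821074094</Id>
    </Link>
  </LinkSetDb>
</LinkSet>'
printf '%s\n' "$sample_xml" | extract_linked_ids | build_efetch_url
# prints:
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=821074095,821074094&rettype=fasta

# Live use (needs network access; proteins.fasta is an assumed filename):
#   url=$(curl -s "<elink URL from the command above>" | extract_linked_ids | build_efetch_url)
#   curl -s "$url" > proteins.fasta
```

For a project with many proteins the URL can get very long. As far as I know, NCBI suggests sending large ID lists by HTTP POST and keeping the request rate low, so treat this as a starting point rather than a finished downloader.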

6 months ago
Sej Modha 4.2k
Glasgow, UK

The same as @pierre's solution, using NCBI's Entrez Direct (EDirect) command-line utilities:

esearch -db bioproject -query 261773 | elink -target protein | efetch -format fasta
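
After saving the output to a file, it can be worth checking how many records arrived. A minimal sketch, where `count_fasta_records` is a hypothetical helper name and the two records in the demo are invented so that it runs offline:

```shell
# Count FASTA records by counting header lines (those starting with ">").
# Reads the named file, or standard input when no file is given.
count_fasta_records() {
    grep -c '^>' "$@"
}

# Offline demo: two invented records separated by a blank line.
printf '>seq1 example\nMFVH\n\n>seq2 example\nMPQY\n' | count_fasta_records
# prints: 2
```

With EDirect installed, the pipeline above could be redirected to a file, for example by appending `> proteins.fasta` (an assumed filename), and then checked with `count_fasta_records proteins.fasta`. The count should match the number of proteins listed on the NCBI page.
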
2.3 years ago
tlorin • 250
Switzerland

Here is a well-explained tutorial for your problem :)


The link you provided doesn't seem to work.


This should be better now, thanks!


Thank you all for your help! It works fine.

