Retrieve CDS sequence from XP accession number
5.1 years ago

Hi!!

I am looking for a faster way to retrieve CDS sequences for a list of NCBI protein accession numbers. I know how to do it manually, one accession at a time, but I need something quicker for the whole list.

Thank you CS

ncbi • 1.3k views

You can use the NCBI E-utilities (Entrez Direct) for batch retrieval. More information is available at https://www.ncbi.nlm.nih.gov/books/NBK179288/ and http://bioinformatics.cvr.ac.uk/blog/ncbi-entrez-direct-unix-e-utilities/

5.1 years ago
GenoMax 141k

Actual examples for @Sej's suggestion (output truncated to save space).

$ efetch -db nuccore -id "XP_001563456" -format fasta_cds_na
>lcl|XM_001563406.1_cds_XP_001563456.1_1 [locus_tag=LBRM_15_0070] [db_xref=UniProtKB/TrEMBL:A4H7X7,GeneID:5413973] [protein=conserved hypothetical protein] [protein_id=XP_001563456.1] [location=1..1404] [gbkey=CDS]
ATGCCCTTGTCCTGCGTCGCCAAAGCTGAGGATGTCTTGCAGAAGACTGTGCATCTCTCCAGAGGCGGCC
TCTGCGCAGAGTTCACAGCGGAGGACATCCAGCGCATCACGGACGCCGACGTGCTCCGCTACCTCTCCAC
CCACTCTAATGCACGCACCGAATTGGACGGCGGTATCAACACCGCACCTGTTGAAAAGTCGCTCGCTCCT
GTGACGGGGGCGGCAGACATGGAGGTGCACATGGAGGCCTTGCAGGAGGCGATCAGCACATTTATTACAG

$ efetch -db nuccore -id "XP_001563456" -format fasta_cds_aa
>lcl|XM_001563406.1_prot_XP_001563456.1_1 [locus_tag=LBRM_15_0070] [db_xref=UniProtKB/TrEMBL:A4H7X7,GeneID:5413973] [protein=conserved hypothetical protein] [protein_id=XP_001563456.1] [location=1..1404] [gbkey=CDS]
MPLSCVAKAEDVLQKTVHLSRGGLCAEFTAEDIQRITDADVLRYLSTHSNARTELDGGINTAPVEKSLAP
VTGAADMEVHMEALQEAISTFITVVDNEGCRYEIRVGALGHVQVPIDDDSYADGASLHEDEGDIEVAPAS
DAVHVGMSGEKSAVTEEATSAAVSRPSSEVTPAASHQKGWPVRRPQPSKPVRPARAAAHLSARVRQQNRF
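
Since the question is about a whole list of accessions, the single-accession command above could be wrapped in a loop. The following is only a minimal sketch, not a tested pipeline: it assumes Entrez Direct is installed, that the accessions sit one per line in a plain-text file, and it reuses the same efetch options shown above. The function name fetch_cds and the DELAY variable are made up for illustration; the pause between requests is a courtesy to the NCBI servers, and the redirect from /dev/null is a precaution so that efetch does not read from the loop's input.

```shell
# Hypothetical helper: print the CDS nucleotide FASTA for every protein
# accession listed in a file (one accession per line).
# Assumes Entrez Direct (efetch) is on the PATH.
fetch_cds() {
    id_file="$1"
    while IFS= read -r acc || [ -n "$acc" ]; do
        # Remove spaces, tabs and carriage returns around the accession
        acc="$(printf '%s' "$acc" | tr -d '[:space:]')"
        # Skip blank lines
        [ -z "$acc" ] && continue
        # Same options as the single-accession example; stdin is redirected
        # so efetch cannot consume the remaining lines of the list
        efetch -db nuccore -id "$acc" -format fasta_cds_na < /dev/null
        # Pause between requests (seconds); DELAY is an illustrative knob
        sleep "${DELAY:-1}"
    done < "$id_file"
}
```

With a list saved as, say, accessions.txt (a placeholder name), running fetch_cds accessions.txt > cds.fasta would collect all records in one FASTA file. Replacing fasta_cds_na with fasta_cds_aa should give the protein translations instead.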