How to get Pfam seed sequence taxonomy info?
8.9 years ago
beegrackle ▴ 90

Is there an easy(ish) way to get the taxonomy information for Pfam seed sequences? For the life of me I cannot figure out any way to do it other than BLASTing each seed sequence and taking the top hit from NCBI, which is ridiculous because Pfam links to NCBI and vice versa. Since Pfam has the option of downloading sequences by taxon, surely this information is available.

For example: BLASTing a seed sequence for pfam00078 gets me to a GenBank entry, which leads me to this page: http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=249567. And oh look, it links to pfam00078.... However, the actual IDs for the seed sequences (e.g. RTJK_DROME/502-757) seem perfectly useless.

pfam blast taxonomy ncbi • 2.6k views
3.7 years ago
jubillante • 0

Here is how I got taxonomy strings, starting from a FASTA seed alignment:

  1. Get the list of sequence IDs (the part of each header before the "/", e.g. RTJK_DROME from RTJK_DROME/502-757)
    grep '^>' seed.fasta | sed 's/^>//' | cut -d'/' -f1 > pfamaccessioncodes.txt
    
  2. Upload your list (or copy and paste it) into the UniProt Retrieve/ID mapping tool: https://www.uniprot.org/uploadlists/

  3. Download a tab-separated version of your information. The data you want should be in columns 1 and 7.
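
If you do this for more than one family, steps 1 and 3 can be wrapped in two small shell functions. This is only a rough sketch: the function names are made up for illustration, the file names are placeholders, and the column numbers (1 and 7) are simply the ones mentioned in step 3, so check them against your own download. The `sort -u` is an addition, on the assumption that the same protein can show up more than once in a seed alignment with different residue ranges.

```shell
# Hypothetical helper: print one unique sequence ID per line from a FASTA
# seed alignment whose headers look like ">RTJK_DROME/502-757".
extract_seed_ids() {
    grep '^>' "$1" | sed 's/^>//' | cut -d'/' -f1 | LC_ALL=C sort -u
}

# Hypothetical helper: keep only columns 1 and 7 of the tab-separated file
# downloaded in step 3 (assumed column positions; adjust if yours differ).
pick_taxonomy_columns() {
    cut -f1,7 "$1"
}
```

For example, `extract_seed_ids seed.fasta > pfamaccessioncodes.txt` gives the list to upload in step 2, and `pick_taxonomy_columns` run on the downloaded table prints just the two columns of interest.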
