Searching Fasta File For Specific Ids
10.8 years ago
andysw90 ▴ 20

Hi all,

Apologies if this has been answered elsewhere; I've searched but couldn't find it!

I have a list of about 450 UniProt IDs (e.g., Q95T64, Q8INK6, Q9GNK5) and want to check which of them appear in the FASTA file of a species. I'd like two lists as output: one of the IDs that do appear in the species FASTA file, and one of those that do not.

Grep perhaps?

Many thanks

Andy

EDIT: The IDs are in a .txt file, one per line.

10.8 years ago

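On Linux (in bash), run:

sort list.txt <(grep '>' uniprot.fa | cut -c5-10) | uniq -d

to get the UniProt IDs from list.txt that are present in your FASTA file. This assumes UniProt headers such as >sp|Q95T64|..., where characters 5-10 hold a six-character accession, and that list.txt contains no duplicate lines.

To list the IDs from list.txt that don't appear among the FASTA records, changing uniq -d to uniq -u is not quite enough: uniq -u also prints FASTA IDs that are missing from list.txt. Feeding the FASTA IDs in twice makes each of them appear at least twice, so only the IDs unique to list.txt survive:

sort list.txt <(grep '>' uniprot.fa | cut -c5-10) <(grep '>' uniprot.fa | cut -c5-10) | uniq -u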
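To see what uniq -d and uniq -u actually select, here is a minimal Python sketch that models sort | uniq as counting identical lines. The IDs are the question's examples plus a made-up P12345 standing in for an ID that exists only in the FASTA file. It shows that uniq -u alone also reports FASTA-only IDs, and that counting the FASTA IDs twice leaves only the list IDs that are absent from the FASTA file (assuming list.txt has no duplicates).

```python
from collections import Counter

# IDs from list.txt (the question's examples) and hypothetical IDs cut from
# the FASTA headers; P12345 is a made-up FASTA-only entry.
list_ids = ["Q95T64", "Q8INK6", "Q9GNK5"]
fasta_ids = ["Q95T64", "Q9GNK5", "P12345"]

# `sort ... | uniq` groups identical lines, which is the same as counting them
counts = Counter(list_ids) + Counter(fasta_ids)

present = sorted(i for i, n in counts.items() if n >= 2)  # what uniq -d prints
unique = sorted(i for i, n in counts.items() if n == 1)   # what uniq -u prints

# Counting the FASTA IDs twice gives every FASTA ID a count of at least 2,
# so a count of exactly 1 now means "in list.txt but not in the FASTA file".
counts_twice = Counter(list_ids) + Counter(fasta_ids) + Counter(fasta_ids)
missing = sorted(i for i, n in counts_twice.items() if n == 1)

print("uniq -d:", present)
print("uniq -u:", unique)
print("missing from FASTA:", missing)
```

Here uniq -u reports both Q8INK6 and P12345, while the doubled version reports only Q8INK6.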


How should I pull out multiple sequences from one FASTA file using the gene IDs listed in another text file? What command would I use on Linux?
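One way to do this is sketched below in Python, in the same spirit as the script further down. It assumes UniProt-style headers (>sp|Q95T64|NAME ...), where the ID is the second "|"-separated field, and falls back to the first word of the header otherwise; the function names, the demo records, and the script name in the usage comment are hypothetical. Each header decides whether the lines of its record are kept, so multi-line sequences are copied whole.

```python
import sys


def header_id(header):
    """Return the ID from a FASTA header line (assumed UniProt-style).

    ">sp|Q95T64|NAME_SPECIES ..." gives "Q95T64"; a header without "|"
    falls back to its first word after ">".
    """
    words = header[1:].split()
    if not words:
        return ""
    parts = words[0].split("|")
    return parts[1] if len(parts) >= 2 else words[0]


def extract_records(fasta_lines, wanted):
    """Yield every line of the FASTA records whose header ID is in `wanted`."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # A new record starts here: decide whether to keep all of it
            keep = header_id(line) in wanted
        if keep:
            yield line


# Small in-memory demo with made-up records (nothing is printed)
example_fasta = [
    ">sp|Q95T64|EXAMPLE_A first record\n",
    "MKVLA\n",
    "GGTWE\n",
    ">sp|P12345|EXAMPLE_B second record\n",
    "MAAQR\n",
]
selected = list(extract_records(example_fasta, {"Q95T64"}))

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python extract_by_id.py ids.txt sequences.fa > subset.fa
    with open(sys.argv[1]) as id_file:
        wanted = {line.strip() for line in id_file if line.strip()}
    with open(sys.argv[2]) as fasta_file:
        sys.stdout.writelines(extract_records(fasta_file, wanted))
```

With the demo records, only the Q95T64 header and its two sequence lines are selected.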

10.8 years ago

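Extract the IDs from the FASTA headers, sort both files, and use comm: comm -12 prints the IDs common to both files, and comm -23 prints the IDs found only in the first file (your list).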
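What comm does with two sorted files is a single merge-style pass, sketched here in Python as an illustration only (the function and variable names are hypothetical, and the IDs are the question's examples plus a made-up FASTA-only P12345). Python compares strings by plain character order, which corresponds to sorting with LC_ALL=C before calling comm.

```python
def comm(sorted_a, sorted_b):
    """Walk two sorted lists like `comm`: return (only_a, only_b, both)."""
    only_a, only_b, both = [], [], []
    i = j = 0
    while i < len(sorted_a) and j < len(sorted_b):
        a, b = sorted_a[i], sorted_b[j]
        if a < b:        # a cannot appear later in sorted_b
            only_a.append(a)
            i += 1
        elif b < a:      # b cannot appear later in sorted_a
            only_b.append(b)
            j += 1
        else:            # same ID in both inputs
            both.append(a)
            i += 1
            j += 1
    # Whatever is left in either list has no partner in the other one
    only_a.extend(sorted_a[i:])
    only_b.extend(sorted_b[j:])
    return only_a, only_b, both


list_ids = sorted(["Q95T64", "Q8INK6", "Q9GNK5"])
fasta_ids = sorted(["Q95T64", "Q9GNK5", "P12345"])
only_list, only_fasta, in_both = comm(list_ids, fasta_ids)
print("column 1 (only in list):", only_list)
print("column 2 (only in FASTA):", only_fasta)
print("column 3 (in both):", in_both)
```

Sorting first is what lets both files be compared in one linear pass without holding a lookup table in memory.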

10.8 years ago
KCC ★ 4.1k

Here is a Python script; save it in a file such as myscript.py:

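import sys

# Python 3. Usage: python myscript.py uniprotfile fastafile


def header_to_id(header):
    """Return the accession from a FASTA header line.

    For UniProt-style headers (">sp|Q95T64|NAME_SPECIES ...") this is the
    second '|'-separated field; otherwise the first word after '>' is used.
    """
    words = header[1:].split()
    if not words:
        return ""
    parts = words[0].split("|")
    return parts[1] if len(parts) >= 2 else words[0]


# IDs to look for, one per line (blank lines skipped)
with open(sys.argv[1]) as uniprotfile:
    A = {line.strip() for line in uniprotfile if line.strip()}

# IDs found in the FASTA headers
with open(sys.argv[2]) as fastafile:
    B = {header_to_id(line) for line in fastafile if line.startswith(">")}

C = A & B  # IDs that are in both the FASTA file and the UniProt list
print("Present:")
for el in sorted(C):
    print(el)

print()
print("Not present:")
D = A - B  # IDs not found in the FASTA file
for el in sorted(D):
    print(el)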

Then run the script like this: python myscript.py uniprot.txt fasta.fa
