What Is The Minimum Coverage And Identity Percentage For Protein Domains?
10.5 years ago
SRKR ▴ 180

I am trying to detect protein domains in a set of sequences using BLASTX. I ran BLASTX with my sequences as queries against the ProDom sequences, using an e-value cutoff of 1e-3. What identity percentage and domain coverage percentage should I require before I can say that a protein domain, or its signature, is present in a particular region?

E.g.: an identity of 80% (80% of the amino acids in the aligned region are identical) and a coverage of 60% (the aligned length is 60% of the total length of the domain being compared to).
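To make the two quantities concrete, here is a minimal Python sketch that computes them for a single alignment. The function name and its arguments are hypothetical; the values are assumed to come from a tabular BLAST report that includes the number of identical positions, the alignment length, the subject (domain) start and end, and the subject length. The numbers in the usage line are invented to match the 80% / 60% example above.

```python
def identity_and_coverage(n_identical, aln_length, s_start, s_end, domain_length):
    """Return (percent identity, percent domain coverage) for one alignment.

    n_identical   -- number of identical residues in the alignment
    aln_length    -- total alignment length (including gaps)
    s_start/s_end -- alignment coordinates on the domain sequence (1-based)
    domain_length -- full length of the domain sequence
    """
    identity = 100.0 * n_identical / aln_length
    # Number of domain residues spanned by the alignment.
    covered = abs(s_end - s_start) + 1
    coverage = 100.0 * covered / domain_length
    return identity, coverage


# Illustrative values only: 48 identities over a 60-column alignment
# that spans residues 21..80 of a 100-residue domain.
pid, cov = identity_and_coverage(48, 60, 21, 80, 100)
print(f"identity={pid:.1f}% coverage={cov:.1f}%")
```

Note that this treats coverage per alignment; if one domain is hit by several separate alignments from the same query, the spans would need to be merged before computing coverage.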

What other precautions should I take, and what else should I look at, when making such comparisons?

10.5 years ago

Sequence conservation varies with many parameters at both the sequence and the organism level, so there is no single accurate answer to your specific question. Many people rely on profile-based searches instead, because multiple alignments reveal conservation patterns that you cannot get from single sequences. Pfam is a great database and additionally provides an HMM version of each family. On top of that, the Pfam team worked closely with the HMMER developers, so their HMM profiles are of high quality. Each profile comes with, e.g., a so-called "trusted cutoff" score (TC in the HMM headers), computed by scanning the HMM against all the sequences used to build it and taking the minimum score that results from this comparison. HMMER outputs many kinds of results, including a per-domain score and e-value.

Since you are using BLASTX, I guess you are starting from nucleotide queries, which HMMER cannot compare directly against a protein database. So you could add either a six-frame translation step or an ORF-calling step before running hmmscan or hmmsearch.
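The six-frame translation step can be sketched in plain Python as follows. This is only an illustration, assuming the standard genetic code and unambiguous DNA input; the function names are hypothetical, and in practice an established tool or library would be a safer choice. Codons containing anything other than A, C, G or T are translated as "X".

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate DNA from its first base; a trailing partial codon is dropped."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # "X" for codons with ambiguous bases
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```

The translated frames could then be written out as protein FASTA records and searched with hmmscan or hmmsearch. Stop codons appear as "*" in the output, so the frames may need to be split at stops first, which is roughly what an ORF-calling step does.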
