Biostars beta testing.
Question: How to use awk to look for the lowest e-value field?

Hello! I am trying to parse some results produced by HMMER, and in the tblout file I was able to isolate the matches I want.

However, the same read is listed several times when it matches more than one profile.

For example, the second read below is repeated 3 times:

SRR6033660.161030 FAM007172 4e-15 4.2e-15 63.4 63.4
SRR6033660.1458607 FAM019859 2.5e-12 2.7e-12 55.0 54.9
SRR6033660.1458607 FAM015326 4e-14 4.2e-14 58.8 58.7
SRR6033660.1458607 FAM000764 7.5e-25 8.1e-25 94.6 94.5

It matches 3 families, but I only want to keep the row with the lowest e-values (3rd and 4th columns).

How can I write an awk command that gives me this output?

SRR6033660.161030 FAM007172 4e-15 4.2e-15 63.4 63.4
SRR6033660.1458607 FAM000764 7.5e-25 8.1e-25 94.6 94.5
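
One way this output could be produced with awk alone is sketched below. It assumes whitespace-separated columns, the read ID in column 1, and the e-value to compare in column 3; the helper name best_hit_per_read and the temporary sample file are illustrative only and are not part of HMMER. Rows are printed in the order each read first appears.

```shell
# Hypothetical sample input, copied from the rows in the question.
input=$(mktemp)
cat > "$input" <<'EOF'
SRR6033660.161030 FAM007172 4e-15 4.2e-15 63.4 63.4
SRR6033660.1458607 FAM019859 2.5e-12 2.7e-12 55.0 54.9
SRR6033660.1458607 FAM015326 4e-14 4.2e-14 58.8 58.7
SRR6033660.1458607 FAM000764 7.5e-25 8.1e-25 94.6 94.5
EOF

# best_hit_per_read is an illustrative helper name (an assumption).
best_hit_per_read() {
    awk '
    {
        id = $1
        evalue = $3 + 0          # "+ 0" forces a numeric comparison
        if (!(id in best)) {
            order[++n] = id      # remember first-appearance order
            best[id] = evalue
            line[id] = $0
        } else if (evalue < best[id]) {
            best[id] = evalue    # lower e-value found: keep this row instead
            line[id] = $0
        }
    }
    END { for (i = 1; i <= n; i++) print line[order[i]] }
    ' "$1"
}

best_hit_per_read "$input"
```

Ties on column 3 keep the first row seen; comparing column 4 as a tie-breaker would need an extra condition.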

Thanks!

Asked by jxi21, 12 months ago (updated 12 months ago by RamRS)

Since you're here, sort can do this with the -g option.

$ sort -g -k3,4 inputfile | head -n2 
SRR6033660.1458607 FAM000764 7.5e-25 8.1e-25 94.6 94.5
SRR6033660.161030 FAM007172 4e-15 4.2e-15 63.4 63.4
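
Note that the command above sorts all rows together, so taking the first two lines happens to work for this sample but would not generally return one row per read. A possible variation that keeps the best row for each read is sketched here, assuming the read ID is in column 1, the e-value is in column 3, and a sort that supports -g (GNU and BSD sort do; it is not required by POSIX). The helper name best_by_sort and the temporary sample file are illustrative only.

```shell
# Hypothetical sample input, copied from the rows in the question.
input=$(mktemp)
cat > "$input" <<'EOF'
SRR6033660.161030 FAM007172 4e-15 4.2e-15 63.4 63.4
SRR6033660.1458607 FAM019859 2.5e-12 2.7e-12 55.0 54.9
SRR6033660.1458607 FAM015326 4e-14 4.2e-14 58.8 58.7
SRR6033660.1458607 FAM000764 7.5e-25 8.1e-25 94.6 94.5
EOF

# best_by_sort is an illustrative helper name (an assumption).
# Sort by read ID (column 1), then by e-value (column 3) as a general number;
# awk then keeps only the first row seen for each read ID.
best_by_sort() {
    sort -k1,1 -k3,3g "$1" | awk '!seen[$1]++'
}

best_by_sort "$input"
```

Unlike a single-pass awk script, the output here is ordered by read ID rather than by order of appearance in the input.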
Reply by manuel.belmadani, 12 months ago

Hello jxi21!

We believe that this post does not fit the main topic of this site.

This is an awk question, right? Please search Stack Overflow. On the other hand, if your aim is to pick the entries with the lowest e-value, I'll reopen the question and not restrict the tool to awk unless there's a good reason to.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

Reply by RamRS, 12 months ago
This thread is not open. No new answers may be added