Adding Domain Annotation Identifiers To Fasta Headers
10.3 years ago
elfeu ▴ 40

Hello guys!

I need to add Pfam annotation information to the headers of my sequences in FASTA format. I have an annotation file with two columns: the first column holds the sequence name and the second holds the annotation (e.g., contig01 domain01.hmm). I would like the multi-FASTA file to look like this:

contig01 domain01.hmm [sequence]

contig02 domain0434.hmm [sequence]

Does anyone have a script to do this?

thank you

perl fasta annotation • 4.6k views
10.3 years ago
5heikki 11k

Assuming your contig names are as described without additional fields:

cat test
contig01 domain01.hmm
contig02 domain0434.hmm

cat test.fasta
>contig01
abcd
>contig02
efgh

# Requires bash (process substitution); assumes each sequence is on a single line.
join -1 1 -2 1 -o 2.1,1.2,2.2 \
    <(sort -k1,1 test) \
    <(tr "\n" " " < test.fasta | tr ">" "\n" | grep . | sort -k1,1) \
    | awk '{print ">"$1"_"$2"\n"$3}' > test.annotated.fasta

cat test.annotated.fasta
>contig01_domain01.hmm
abcd
>contig02_domain0434.hmm
efgh
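
To make the one-liner easier to follow, here is a sketch that runs the same steps one stage at a time on the example inputs. It writes intermediate files instead of using process substitution so that it also runs under plain sh; the file names flat.txt, annot.sorted and joined.txt are made up for illustration and are not part of the original command.

```shell
# Recreate the two example inputs from above in a scratch directory.
set -eu
workdir=$(mktemp -d)
cd "$workdir"
printf 'contig01 domain01.hmm\ncontig02 domain0434.hmm\n' > test
printf '>contig01\nabcd\n>contig02\nefgh\n' > test.fasta

# Stage 1: turn newlines into spaces, then turn every ">" into a newline,
# so each record collapses to one "name sequence" line. grep . drops the
# empty first line; sort prepares the file for join.
tr '\n' ' ' < test.fasta | tr '>' '\n' | grep . | sort -k1,1 > flat.txt
cat flat.txt

# Stage 2: join matches column 1 of both sorted files and prints
# name (2.1), annotation (1.2) and sequence (2.2).
sort -k1,1 test > annot.sorted
join -1 1 -2 1 -o 2.1,1.2,2.2 annot.sorted flat.txt > joined.txt
cat joined.txt

# Stage 3: awk rebuilds each FASTA record with the annotated header.
awk '{print ">"$1"_"$2"\n"$3}' joined.txt > test.annotated.fasta
cat test.annotated.fasta
```

Because stage 1 puts every sequence line into its own whitespace-separated field and stage 2 only keeps field 2.2, a sequence wrapped over several lines would be cut down to its first line.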

Hi,

Thank you very much. Could you please modify this so that it works on multi-line FASTA files? If possible, a detailed explanation of those commands would be great. Thanks.

10.3 years ago
Kenosis ★ 1.3k

Given your datasets (and a more detailed posting of your question here), the following produces your requested output:

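use strict;
use warnings;

# The first argument is the FASTA file; set it aside so <> reads the annotations first.
my $file1 = shift;
my ( %hash, @F );

# Map each identifier (column 1) to its annotation (column 2).
while (<>) {
    $hash{ $F[0] } = $F[1] if @F = split;
}

# Read the FASTA file one record at a time, using '>' as the record separator.
local $/ = '>';
push @ARGV, $file1;

while (<>) {
    # $F[0] is the header line (the identifier), $F[1] the first sequence line.
    print ">$F[0]_$hash{ $F[0] }\n$F[1]\n" if @F = split />|\n/ and $hash{ $F[0] };
}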

Usage: perl script.pl foo.fa annotations.txt [>outFile.fa]

The optional redirection at the end sends the output to a file.

The script first creates a hash from the annotation file, where the key is the identifier and its associated value is the annotation. It then reads the FASTA file one record at a time; whenever a record's identifier is found in the hash, it appends the annotation to the header and prints the record.
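
As a rough sketch, the same lookup idea can also be written in awk so that sequences wrapped over several lines are kept whole. This is an alternative, not the Perl script above: the annotation file comes first on the command line, the sample data (including the unannotated contig03) is invented for illustration, and it assumes the annotation file is not empty.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"
# Hypothetical sample inputs: contig01 spans two lines, contig03 has no annotation.
printf 'contig01 domain01.hmm\ncontig02 domain0434.hmm\n' > annotations.txt
printf '>contig01\nabcd\nijkl\n>contig02\nefgh\n>contig03\nmnop\n' > foo.fa

awk '
    # First file: remember the annotation (column 2) for each identifier (column 1).
    NR == FNR { annot[$1] = $2; next }
    # Header line: strip ">" and use the first word as the identifier.
    /^>/ {
        id = substr($1, 2)
        keep = (id in annot)
        if (keep) print ">" id "_" annot[id]
        next
    }
    # Sequence line: print it unchanged if the header of this record matched.
    keep { print }
' annotations.txt foo.fa > outFile.fa
cat outFile.fa
```

As in the Perl script, records without a matching annotation (contig03 here) are left out of the output.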

Hope this helps!
