How to remove asterisk characters from translated sequences (FASTA format)?
8.6 years ago
seta ★ 1.9k

Hi everybody,

I used TransDecoder to translate an assembled transcriptome, and the translated sequences contain asterisk characters (*) that mark stop codons. I plan to run InterProScan on this assembly, and the * characters cause an error. Could you please let me know how I can remove these characters from the FASTA file? Is removing them the right approach, or do they have to be replaced with a stop codon, and if so, which one? Thanks for any help.

sequencing Assembly alignment • 7.3k views
sed -i 's/*//g' filename.fasta

At first I thought you were trolling the question-poster, but it turns out that sed (at least as implemented in Cygwin) interprets a '*' at the start of the pattern as a literal asterisk. However, it might be safer to do

sed -i 's/\*//g' filename.fasta

just to make it crystal clear to the interpreter to treat '*' as a literal '*'.


Indeed, sed can be confusing if one doesn't escape things. Compare echo "fooo*{1}" | sed "s/o*//g", echo "fooo*{1}" | sed "s/o*{1}//g" and echo "fooo*{1}" | sed "s/*{1}//g".
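
For reference, here is a sketch of what those three commands should print, assuming GNU sed in its default basic-regex (BRE) mode; other sed implementations may differ. The output comments reflect my reading of the BRE rules rather than a guarantee for every platform.

```shell
# Assumes GNU sed, basic regular expressions (no -E / -r flag).

# o* matches any run of o's (including empty runs), so all o's are removed.
echo "fooo*{1}" | sed "s/o*//g"      # prints: f*{1}

# In BRE an unescaped { is a literal brace, so only the text "{1}" is removed.
echo "fooo*{1}" | sed "s/o*{1}//g"   # prints: fooo*

# A * at the very start of a BRE is a literal asterisk, so "*{1}" is removed.
echo "fooo*{1}" | sed "s/*{1}//g"    # prints: fooo
```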


This is an old post, but for the benefit of future googlers: the sed 's/*//g' solution is perfectly safe as far as escaping goes (though I would not use the -i option), so there is no reason to panic. tr -d '*' would be more elegant. BUT nothing can replace a format-aware utility, because in the general case stops can appear not only at the end of a protein sequence but also in the middle (which is not expected for TransDecoder output, though), and asterisks are allowed in headers. The gold standard is EMBOSS's transeq, which has the -trim and -clean options: -trim strips trailing stops (and X's) from the end of the translation, and -clean replaces every stop with X.
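
If a dedicated tool is not at hand, a header-aware filter can be sketched in awk. This is only a minimal illustration of the point about headers, not a replacement for a real FASTA parser: the function name strip_stops and the output file name are made up for the example, and it removes every asterisk on sequence lines, including any internal ones.

```shell
# Hypothetical helper: delete '*' from sequence lines only.
# Header lines (starting with '>') are passed through unchanged.
strip_stops() {
    awk '
        /^>/ { print; next }         # header line: print as-is
        { gsub(/[*]/, ""); print }   # sequence line: remove every asterisk
    ' "$@"
}

# Example usage (file names are placeholders); writes to a new file
# instead of editing in place:
# strip_stops filename.fasta > filename.nostop.fasta
```

To replace stops with X instead of deleting them, the empty string in gsub can be changed to "X".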
