5.8 years ago
lzhou
I have a FASTA alignment of 13 genes from 200 different species. How would I go about changing every third nucleotide to either R or Y using the command line?
Thanks!
Does that have to follow any rule or is it supposed to randomly set each third base to R or Y? And most importantly, what are you trying to achieve?
I am looking to change all purines to R and pyrimidines to Y.
If you want to change only every third base:
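A minimal sketch of one way to do this with plain awk, assuming the FASTA is flattened (one line per sequence) and every sequence starts in frame at column 1. The function name `ry_third` is made up for illustration. It sticks to POSIX awk features, so it should not need GNU awk.

```shell
#!/bin/sh
# ry_third (hypothetical name): read FASTA on stdin, write FASTA on stdout.
# Every third base of each sequence line becomes R (purine: A/G)
# or Y (pyrimidine: C/T/U). Assumes one line per sequence, codons from column 1.
ry_third() {
  awk '
    /^>/ { print; next }                  # header lines pass through untouched
    {
      out = ""
      n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        if (i % 3 == 0) {                 # third codon position
          if (c ~ /[AGag]/)        c = "R"
          else if (c ~ /[CTUctu]/) c = "Y"
          # gaps, N and other symbols are left as they are
        }
        out = out c
      }
      print out
    }
  '
}
```

Usage would look like `ry_third < aligned.fasta > recoded.fasta`, where both file names are placeholders.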
If you want to change all purines/pyrimidines to R/Y:
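A sketch of the all-positions case, again in plain awk and assuming only that header lines start with ">". The function name `ry_all` is hypothetical. Header lines are skipped so that letters in species names are not touched.

```shell
#!/bin/sh
# ry_all (hypothetical name): recode every base on sequence lines.
# Purines (A/G) become R, pyrimidines (C/T/U) become Y; headers are unchanged.
ry_all() {
  awk '
    /^>/ { print; next }        # leave species names alone
    {
      gsub(/[AGag]/, "R")       # purines
      gsub(/[CTUctu]/, "Y")     # pyrimidines
      print
    }
  '
}
```

Since this version does not depend on position, it should work on wrapped as well as flattened FASTA, for example `ry_all < aligned.fasta > recoded.fasta` (placeholder file names).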
Hi, I tried running your command and I got the error message "awk: line 2: function gensub never defined". Is there something else I need to do to the command or to my FASTA file?
Maybe you need GNU awk. Are you on a Mac by chance? Then you'd need to install the GNU tools.
Try the following with GNU sed. To replace every third base with the appropriate symbol:
To replace all the bases with the appropriate symbol:
It appears that it is replacing nucleotides, but not every 3rd base (or it appears random to me). For example, ATG CTT AAC became ATR YTT RAY. I think the species names in the fasta sequences might be causing this problem, I see Y's and R's in the names. Can I get it to start and stop searching and replacing after each line break (there is a line break after the species name, and one at the end of the sequence)? Thank you for your help!
It is not random; it is working as per the pattern. But the pattern is incorrect, or it must be constrained. I will post once I am done with it. Thanks, @OP.
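For what it is worth, the reported result (ATG CTT AAC becoming ATR YTT RAY) is what an unanchored pattern of the form "any two characters, then a purine" would give: whenever a codon's third base does not match, the regex search slides forward one base and the reading frame is lost. The sketch below assumes that is what happened. `drifting_ry` is a hypothetical reconstruction, not the command posted in this thread, and `inframe_ry` is one possible constrained version that first tags every third character and only then recodes the tagged ones. Both assume flattened FASTA and a sed that accepts `-E` (GNU and BSD sed do).

```shell
#!/bin/sh
# drifting_ry: HYPOTHETICAL reconstruction of a frame-slipping pattern.
# "(..)[AG]" may start matching at any offset, so the codon frame drifts.
drifting_ry() {
  sed -E -e '/^>/!s/(..)[AG]/\1R/g' \
         -e '/^>/!s/(..)[CT]/\1Y/g'
}

# inframe_ry: one possible constrained version (an assumption, not the
# thread's answer). Step 1 consumes exactly three characters per match and
# wraps the third in <...>, so the frame cannot slip. Steps 2-3 recode the
# tagged bases; step 4 unwraps anything else (gaps, N, ...).
inframe_ry() {
  sed -E -e '/^>/!s/(..)(.)/\1<\2>/g' \
         -e '/^>/!s/<[AGag]>/R/g' \
         -e '/^>/!s/<[CTUctu]>/Y/g' \
         -e '/^>/!s/<(.)>/\1/g'
}
```

Comparing `printf 'ATGCTTAAC\n' | drifting_ry` with `printf 'ATGCTTAAC\n' | inframe_ry` shows the difference: the first reproduces the drifted output, the second changes only positions 3, 6 and 9. Header lines are skipped in both, so R's and Y's in species names are left alone.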
Thank you for your help, much appreciated!
FASTA files must be flattened first (seqkit seq -w 0 input.fasta will flatten a FASTA file).
How will replacing purines with R and pyrimidines with Y help find the best partitioning scheme? And what do you mean by "best partitioning scheme"?
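On the flattening step: if seqkit is not installed, a plain awk sketch along these lines should give the same one-line-per-sequence layout. The function name `flatten_fasta` is hypothetical, and the sketch assumes Unix line endings and headers that start with ">".

```shell
#!/bin/sh
# flatten_fasta (hypothetical name): join wrapped sequence lines so that
# each record is one header line followed by one sequence line.
flatten_fasta() {
  awk '
    /^>/ {                             # new record starts
      if (seq != "") print seq         # flush the previous sequence
      print                            # print the header as is
      seq = ""
      next
    }
    { seq = seq $0 }                   # append wrapped sequence lines
    END { if (seq != "") print seq }   # flush the last sequence
  '
}
```

For example, `flatten_fasta < wrapped.fasta > flat.fasta` (placeholder file names) before running any position-dependent recoding.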