Question: Reformatting Fasta Files

Hi, I would like to know the awk command for changing

>AY156743.1 HIV-1 clone P.ENV from USA envelope glycoprotein (env) gene, partial cds
TCAATTACTGGTAAATGGCAGTCTAGCAGAAGAAGARGTAGTAATTAGATCTGAAAATTTCACGAACAAT
GCTAAAAYCATAATAGTACAGCTGAAAGAMCCTGTAGAAATTAATTGTACAAGACCCAACAACWATACAR
G.....

to

>AY156743_P
TCAATTACTGGTAAATGGCAGTCTAGCAGAAGAAGARGTAGTAATTAGATCTGAAAATTTCACGAACAAT
GCTAAAAYCATAATAGTACAGCTGAAAGAMCCTGTAGAAATTAATTGTACAAGACCCAACAACWATACAR
G.....

or simply deleting:

ENV from USA envelope glycoprotein (env) gene, partial cds

Thanks!
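
For the first target format, here is a minimal awk sketch. It assumes every header is shaped like the example: the accession with a version in the first field (>AY156743.1) and the clone name followed by ".ENV" in the fourth field (P.ENV). rename_headers is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: turn headers like
#   >AY156743.1 HIV-1 clone P.ENV from USA envelope glycoprotein (env) gene, partial cds
# into
#   >AY156743_P
# ASSUMPTION: field 1 is the accession with a version, field 4 is "NAME.ENV...".
rename_headers() {
  awk '
    /^>/ {
      acc = $1
      sub(/\.[0-9]+$/, "", acc)   # drop the version: >AY156743.1 -> >AY156743
      clone = $4
      sub(/\..*$/, "", clone)     # keep the text before the first dot: P.ENV -> P
      print acc "_" clone
      next
    }
    { print }                     # sequence lines pass through unchanged
  ' "$@"
}

# Example with the header from the question; prints:
# >AY156743_P
# TCAATTACTGG
printf '>AY156743.1 HIV-1 clone P.ENV from USA envelope glycoprotein (env) gene, partial cds\nTCAATTACTGG\n' | rename_headers
```

On a real file this would be rename_headers in.fasta > out.fasta. Headers that do not follow the assumed layout will be renamed incorrectly, so check a few records afterwards.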

Asked 15 months ago by Gritz122 (0); updated 15 months ago by h.mon (25k)

Hello Gritz122!

Questions similar to yours can already be found at:

We have closed your question to allow us to keep similar content in the same thread.

If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.

Cheers!

Reply by h.mon (25k), 15 months ago

Setting aside that neither of the formats above is FASTA, why not just use a text editor? Or are there other cases that need to be covered? What you show is not simply deleting

ENV from USA envelope glycoprotein (env) gene, partial cds

but deleting everything after the first whitespace, reformatting the version, and putting it all on one line...

Also note that you can use the 101010 button to format text as code.
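
To illustrate just the first of those operations (dropping everything after the first whitespace in a header), here is a minimal awk sketch. first_word_headers is a hypothetical helper name; it does not reformat the version or change how the sequence is wrapped.

```shell
# Hypothetical helper: keep only the first word (the ID) of each header line;
# sequence lines are printed unchanged.
first_word_headers() {
  awk '/^>/ { print $1; next } { print }' "$@"
}

# Prints:
# >AY156743.1
# TCAATTACTGG
printf '>AY156743.1 HIV-1 clone P.ENV from USA envelope glycoprotein (env) gene, partial cds\nTCAATTACTGG\n' | first_word_headers
```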

Reply by Carambakaracho (♦ 1.2k), 15 months ago

My mistake. What I meant is based on this FASTA file:

>AY156743.1 HIV-1 clone P.ENV from USA envelope glycoprotein (env) gene, partial cds
TCAATTACTGGTAAATGGCAGTCTAGCAGAAGAAGARGTAGTAATTAGATCTGAAAATTTCACGAACAAT
GCTAAAAYCATAATAGTACAGCTGAAAGAMCCTGTAGAAATTAATTGTACAAGACCCAACAACWATACAR
GAAAAAGGATAASTMYAGGACCAGGGAGAGTACTTTAYACAACAGGAGAAATAATAGGAAATATAAGAAA
AGCATATTGTAACATTAGTAGAGCAAAATGGAATAACACTCTAGGACAGATAGCTGAAAAATTAAGAGAA
CAATTTAATAAAACAATARTCTTTAAKCAATCCTCAGGAGGGGACCCAGAAATTGYAATGCACAGTTTTA
ACTGTGGAGGGGAATTTTTCTACTGTAATACATCACAACTGTTTAATAGTACCTGGAATAGTACTAAAAA
TGACACKACCRSGRCASGMGWTACCATAMTCACAYKCCCATKCAAACTAATTCTAATYATRWRCMTGTRG
CWGGRAGTAGSAAWRKMWRYGYMKGCCMTTCMCRKCCWARGWAKMAKTAGATSMWCWTYWAMTGKKMYAG
KGCTACTACKARCWRGMGATRGTGGTRAKRACAMCRSTRCTAATGAGACCTTCAGACCTGGAGGAGGARA
TATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCA
CCCACCAAGGCAAAGAGAAGGGTGGTGCAAAGAGAAAAAAGA

Using awk, I want to reformat the header to:

>AY156743.1 HIV-1 clone P
TCAATTACTGGTAAATGGCAGTCTAGCAGAAGAAGARGTAGTAATTAGATCTGAAAATTTCACGAACAAT
GCTAAAAYCATAATAGTACAGCTGAAAGAMCCTGTAGAAATTAATTGTACAAGACCCAACAACWATACAR
GAAAAAGGATAASTMYAGGACCAGGGAGAGTACTTTAYACAACAGGAGAAATAATAGGAAATATAAGAAA
AGCATATTGTAACATTAGTAGAGCAAAATGGAATAACACTCTAGGACAGATAGCTGAAAAATTAAGAGAA
CAATTTAATAAAACAATARTCTTTAAKCAATCCTCAGGAGGGGACCCAGAAATTGYAATGCACAGTTTTA
ACTGTGGAGGGGAATTTTTCTACTGTAATACATCACAACTGTTTAATAGTACCTGGAATAGTACTAAAAA
TGACACKACCRSGRCASGMGWTACCATAMTCACAYKCCCATKCAAACTAATTCTAATYATRWRCMTGTRG
CWGGRAGTAGSAAWRKMWRYGYMKGCCMTTCMCRKCCWARGWAKMAKTAGATSMWCWTYWAMTGKKMYAG
KGCTACTACKARCWRGMGATRGTGGTRAKRACAMCRSTRCTAATGAGACCTTCAGACCTGGAGGAGGARA
TATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCA
CCCACCAAGGCAAAGAGAAGGGTGGTGCAAAGAGAAAAAAGA
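
One way to get that header with awk is to cut each header line at ".ENV". This sketch assumes the unwanted text always begins with ".ENV" in the header; trim_headers is a hypothetical helper name. Sequence lines are printed unchanged because the substitution only runs on lines starting with ">".

```shell
# Hypothetical helper: cut each header at ".ENV" and drop the rest.
# ASSUMPTION: the unwanted description always starts with ".ENV".
trim_headers() {
  awk '/^>/ { sub(/\.ENV.*$/, "") } { print }' "$@"
}

# Prints:
# >AY156743.1 HIV-1 clone P
# TCAATTACTGG
printf '>AY156743.1 HIV-1 clone P.ENV from USA envelope glycoprotein (env) gene, partial cds\nTCAATTACTGG\n' | trim_headers
```

On a real file: trim_headers in.fasta > out.fasta. awk reads the input line by line, so a large FASTA file is not a problem.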
Reply by Gritz122 (0), 15 months ago; updated 15 months ago by h.mon (25k)

I have a large FASTA file and want to shorten the headers. All of them contain "ENV from USA envelope glycoprotein (env) gene, partial cds", and I want to know how to shorten them.

Reply by Gritz122 (0), 15 months ago

Keep it simple, then. Use a text editor like Notepad++ and its Find and Replace function, or sed 's/\.ENV from USA envelope glycoprotein (env) gene, partial cds//' <in.fasta >out.fasta (the \. also removes the dot before ENV, so the header ends in "P" as in your example).
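
If you would rather not think about which characters are special in a regex (the ".", "(" and ")" in this description), a minimal awk sketch can remove the text as a literal string, and only from header lines. remove_from_headers is a hypothetical helper name; since awk -v interprets backslash escapes in the value, this assumes the text contains no backslashes.

```shell
# Hypothetical helper: remove a fixed string from header lines only.
# index() does a literal match, so ".", "(" and ")" need no escaping.
# ASSUMPTION: the text has no backslashes (awk -v would interpret them).
remove_from_headers() {
  text=$1
  shift
  awk -v drop="$text" '
    /^>/ {
      i = index($0, drop)         # position of the text, or 0 if absent
      if (i > 0)
        $0 = substr($0, 1, i - 1) substr($0, i + length(drop))
    }
    { print }
  ' "$@"
}

# Prints:
# >AY156743.1 HIV-1 clone P
# TCAATTACTGG
printf '>AY156743.1 HIV-1 clone P.ENV from USA envelope glycoprotein (env) gene, partial cds\nTCAATTACTGG\n' |
  remove_from_headers '.ENV from USA envelope glycoprotein (env) gene, partial cds'
```

On a real file: remove_from_headers '.ENV from USA envelope glycoprotein (env) gene, partial cds' in.fasta > out.fasta. Headers that do not contain the text are left as they are.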

Reply by Carambakaracho (♦ 1.2k), 15 months ago

Use the code button (the one labeled 101010) to format FASTA snippets, because the Markdown parser interprets > as a quote marker. I've just done this for your posts above.

I closed your question because it has been answered before. Please search the forum if the post linked above does not solve your problem.

Reply by h.mon (25k), 15 months ago