8.9 years ago
thom_otis
Hello!
I know my problem is simple, but I can't solve it. Can anyone help me write a script that splits one EMBL file into several, so that each sequence is kept in a separate file? (Each sequence record ends with the terminator line //.)
Part of embl file:
ID comp0_c0_seq1; SV 1; linear; unassigned DNA; STD; UNC; 205 BP.
XX
DE len=205 path=[1:0-135 1445:136-204]
XX
SQ Sequence 205 BP; 64 A; 54 C; 31 G; 56 T; 0 other;
GTATTGAACT GCAGAGCATT AAATGCTGCA ACTCAGTGCT TAGAATTCAT TAGATTCAGA 60
GCAACGAACC CTAAATACTG AGCTGTCCCA TTAAATACTC TGCAGTTCAA TACTTAGCAT 120
TCACCATTAA ACATAACACT TCCCGAGTTT CCACCATCCA TAAACAGCAG GCATTGTAAC 180
CTGTAGGCTC TCTCCACGGT TACCT 205
//
ID comp0_c0_seq2; SV 1; linear; unassigned DNA; STD; UNC; 205 BP.
XX
DE len=205 path=[4094:0-135 1445:136-204]
XX
SQ Sequence 205 BP; 59 A; 50 C; 35 G; 61 T; 0 other;
AGAGTATTAA ATGTTGCAGT TCAGTGCTTA AAATTTATTG GATTCAGAGA ATCTTCAAAT 60
TCAACGGACC CTAAACACTG AGCTGTCGCA TTAAATGCTC TGCAGTTCAA TGCTTAGCTT 120
TCACCATTAA GCATAGCACT TCCCGAGTTT CCACCATCCA TAAACAGCAG GCATTGTAAC 180
CTGTAGGCTC TCTCCACGGT TACCT 205
//
ID comp1_c0_seq1; SV 1; linear; unassigned DNA; STD; UNC; 244 BP.
XX
DE len=244 path=[3:0-88 875:89-243]
XX
SQ Sequence 244 BP; 71 A; 51 C; 63 G; 59 T; 0 other;
GCAGAATTTA AGGCTATGAA TCAGGAGGTT CATAATTCCT TAAGGAGGGG AGTATGATGC 60
GGAGCATCCA CGCTCACCTC CACTCCACCG CATTGTCTTC GAGCTGTGAC AGCCAGCGCA 120
TAATATTCAA GAGCTATTGA CAGGTGTTGA AACGCGGCAG CCTTGCATAC TATTGAAGGA 180
CCACGTTTCA TTATTGTGAT CTATAAGAAG ACAGCTGATG CGATCATGAG GAAGGAAGAA 240
GGCT 244
//
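One possible approach, as a minimal sketch rather than a definitive solution: read the file line by line, collect lines until a line containing only // is reached, then write the collected record to its own file. The function names (split_embl, write_records), the .embl output extension, and the choice to name each file after the first field of the ID line are assumptions made for illustration. The sketch also assumes the identifiers are safe to use as file names, which holds for the records shown above.

```python
from pathlib import Path


def _record_id(record):
    """Return the first field of the ID line, e.g. 'comp0_c0_seq1'."""
    for line in record:
        if line.startswith("ID"):
            return line[2:].split(";")[0].strip()
    return "unknown"  # fallback name (assumption) for records without an ID line


def split_embl(lines):
    """Yield (identifier, record_text) for each record terminated by '//'."""
    record = []
    for line in lines:
        # Normalise line endings so every record is written consistently.
        record.append(line if line.endswith("\n") else line + "\n")
        if line.strip() == "//":
            yield _record_id(record), "".join(record)
            record = []
    # Keep a trailing record even if its final '//' is missing.
    if any(l.strip() for l in record):
        yield _record_id(record), "".join(record)


def write_records(embl_path, out_dir):
    """Write each record of embl_path to out_dir/<identifier>.embl."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with open(embl_path) as handle:
        for ident, text in split_embl(handle):
            target = out / f"{ident}.embl"
            target.write_text(text)
            written.append(target)
    return written
```

Usage would look like write_records("all_sequences.embl", "split_out"), where both the input file name and the output directory are placeholders. If two records share the same identifier, the later one overwrites the earlier file, so add a counter to the file name if that can happen in your data.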
Please do not close a post unless it was posted by mistake (and is irrelevant to the site). To mark a question solved, accept relevant answer(s).
Accepting as answer to stop question being bumped by Biostars bot.