Embl File Corrections
2.5 years ago
CS • 10
United Kingdom

Dear All,

I am trying to convert a batch of old EMBL files to the newly agreed format.

The currently agreed format for Pfam domains is:

/inference="protein motif:PFAM:PF03466" (as an example). An entry can have multiple inferences, but no repeated domains: if a domain occurs more than once, there should be just one /inference="..." for it per entry. The old files contain notes like this:

/note=*domain "HMMPfam:PF09339;HTH_lclR;2e-05;codon 269-306"

This would become /Inference="protein motif:Pfam:PF09339", and duplicates within an entry should be removed.

I did the conversion to /inference with a Perl regex. However, the original qualifier sometimes continued onto a second line, which the script did not convert, so leftover lines like this remain:

FT “495-678”

There are several other leftover comment fragments as well. Their content varies so much that I think they are very difficult to pick out with a regex, and they prevent the EMBL file from being valid.

Any help would be appreciated. I am also attaching a small file below:

ID   Lsalivarius_cp400_4_358425-448092; SV 1; linear; unassigned DNA; STD; UNC; 89668 BP.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..89668
FT                   /note="scaffold4|size89668"
FT   CDS             complement(671..1594)
FT                   /note="*GO: aspect=; GOid=GO:; term=; evidence=IEA;
FT                   date=20121112"
FT                   /note="*GO: aspect=Component; GOid=GO:0016020;
FT                   term=membrane; evidence=IEA; date=20121112"
FT                   /note="*db_xref: 07-11-2012"
FT                   /note="*db_xref: Membrane insertion protein, OxaA/YidC"
FT                   /note="*domain: PANTHER:PTHR12428;IPR001708;6.4E-35;codon
FT                   35-247"
FT                   /note="*domain: PANTHER:PTHR12428:SF11;T;6.4E-35;codon
FT                   35-247"
FT                   /note="*domain: Pfam:PF02096;60Kd inner membrane
FT                   protein;9.8E-45;codon 57-247"
FT                   /note="*domain: PRINTS:PR00701;60kDa inner membrane protein
FT                   signature;8.6E-6;codon 131-154"
FT                   /note="*domain: PRINTS:PR00701;60kDa inner membrane protein
FT                   signature;8.6E-6;codon 214-237"
FT                   /note="*domain: Phobius:TRANSMEMBRANE;Region of a
FT                   membrane-bound protein predicted to be embedded in the
FT                   membrane.;-;codon 230-249"
FT                   /note="*domain: Phobius:TRANSMEMBRANE;Region of a
FT                   membrane-bound protein predicted to be embedded in the
FT                   membrane.;-;codon 208-224"
FT                   /note="*domain: Phobius:NON_CYTOPLASMIC_DOMAIN;Region of a
FT                   membrane-bound protein predicted to be outside the
FT                   membrane, in the extracellular region.;-;codon 157-175"
FT                   /note="*domain: Phobius:NON_CYTOPLASMIC_DOMAIN;Region of a
FT                   membrane-bound protein predicted to be outside the
FT                   membrane, in the extracellular region.;-;codon 27-49"
FT                   /note="*domain: Phobius:TRANSMEMBRANE;Region of a
FT                   membrane-bound protein predicted to be embedded in the
FT                   membrane.;-;codon 131-156"
FT                   /note="*domain: Phobius:CYTOPLASMIC_DOMAIN;Region of a
FT                   membrane-bound protein predicted to be outside the
FT                   membrane, in the cytoplasm.;-;codon 197-207"
FT                   /note="*domain: Phobius:SIGNAL_PEPTIDE_H_REGION;Hydrophobic
FT                   region of a signal peptide.;-;codon 8-20"
FT                   /note="*domain: Phobius:NON_CYTOPLASMIC_DOMAIN;Region of a
FT                   membrane-bound protein predicted to be outside the
FT                   membrane, in the extracellular region.;-;codon 225-229"
FT                   /note="*domain: Phobius:SIGNAL_PEPTIDE_C_REGION;C-terminal
FT                   region of a signal peptide.;-;codon 21-26"
FT                   /note="*domain: Phobius:SIGNAL_PEPTIDE;Signal peptide
FT                   region;-;codon 1-26"
FT                   /note="*domain: Phobius:TRANSMEMBRANE;Region of a
FT                   membrane-bound protein predicted to be embedded in the
FT                   membrane.;-;codon 50-74"
FT                   /note="*domain: Phobius:CYTOPLASMIC_DOMAIN;Region of a
FT                   membrane-bound protein predicted to be outside the
FT                   membrane, in the cytoplasm.;-;codon 250-307"
FT                   /note="*domain: Phobius:TRANSMEMBRANE;Region of a
FT                   membrane-bound protein predicted to be embedded in the
FT                   membrane.;-;codon 176-196"
FT                   /note="*domain: Phobius:CYTOPLASMIC_DOMAIN;Region of a
FT

I can send you the entire file.

Thanks CS
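
The leftover second lines described in the question can probably be avoided by joining each qualifier's wrapped lines into one string before any regex is applied. Below is a minimal Python sketch, not a tested converter. It assumes that a feature key starts in column 6 of an FT line, that every new qualifier starts with "/", and that any other indented FT line continues the previous qualifier and can be rejoined with a single space. The function name join_qualifiers is made up for illustration.

```python
def join_qualifiers(ft_lines):
    """Group FT lines into features and join wrapped qualifier lines.

    Returns a list of (header_line, [qualifier_string, ...]) tuples.
    Assumption: a new qualifier always starts with "/"; any other
    indented FT line continues the previous qualifier (or location).
    """
    features = []
    for line in ft_lines:
        if not line.startswith("FT"):
            continue  # ID, XX, FH and other non-feature lines
        body = line[2:].strip()
        if not body:
            continue  # bare "FT" line, e.g. a truncated record
        if len(line) > 5 and line[5] != " ":
            # Feature key in column 6: start a new feature.
            features.append([line.rstrip(), []])
        elif not features:
            continue  # qualifier text before any feature key
        elif body.startswith("/"):
            features[-1][1].append(body)
        elif features[-1][1]:
            # Wrapped text: glue it onto the previous qualifier.
            features[-1][1][-1] += " " + body
        else:
            # Wrapped location, before the first qualifier.
            features[-1][0] += body
    return [(header, quals) for header, quals in features]
```

With the sample above, the two lines of the Pfam note come back as a single string, so a later regex sees the whole qualifier and nothing like a stray "codon 57-247" line is left behind.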


Wrap the file in the code format; it would be easier to read. There is an icon with 101010 on it.


Hi Bharat, I finally found a way to display it clearly. Can you please help me from here?

Thanks CS


Be sure to specify exactly what you have now and what you want the change to be. You say "/inference" and "/Inference" above but I'm guessing it should be the former. Also, there is no "/domain" tag in the example you show, but there is a "note" tag specifying the domain (i.e., "/note="*domain:..."), is that what you want to change?


Hi SES,

Thanks for the heads up. I have: /note=*domain "HMMPfam:PF09339;HTH_lclR;2e-05;codon 269-306"

This would become /Inference="protein motif:Pfam:PF09339". I only want Pfam domains to be part of the final files.

Thanks
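
The conversion described here could be sketched in Python roughly as follows, applied to the qualifiers of one feature after any wrapped lines have been joined. This is only an illustration under several assumptions: the pattern accepts both the "Pfam:" prefix seen in the sample file and the "HMMPfam:" prefix quoted above, the output uses lowercase /inference and the "Pfam" spelling (change the template if the agreed format is "PFAM"), and the names PFAM_RE and pfam_inferences are invented for this example. It does not decide what should happen to the other notes (GO, db_xref and so on).

```python
import re

# Matches '*domain: Pfam:PF02096' as well as '*domain "HMMPfam:PF09339'.
# Other sources (PANTHER, PRINTS, Phobius) do not match and are skipped.
PFAM_RE = re.compile(r"\*domain\W*(?:HMM)?Pfam:(PF\d+)")


def pfam_inferences(qualifiers):
    """Return one /inference string per distinct Pfam accession.

    qualifiers: the qualifier strings of a single feature.
    Accessions are kept in first-seen order; repeats are dropped.
    """
    seen = []
    for qualifier in qualifiers:
        match = PFAM_RE.search(qualifier)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return ['/inference="protein motif:Pfam:%s"' % acc for acc in seen]
```

Because the hit positions and e-values are not part of the /inference value, two hits to the same Pfam accession in one feature collapse into a single line, which matches the "no repeat domains" rule in the question.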


Have you tried contacting ENA about this directly (via datasubs@ebi.ac.uk)? (To be honest, I think removing the ability to store positions of the matches and the number of matches seems a bit strange/silly!)


Thanks Sarah,

I have written to ENA about this issue. Let's see what I hear back from them. CS
