I have a text file (CA.txt) full of amino acid records along with some other data. Here is a snippet of the file:
ATOM 109 CA ASER A 48 10.832 19.066 -2.324 0.50 61.96 C
ATOM 121 CA AALA A 49 12.327 22.569 -2.163 0.50 60.22 C
ATOM 131 CA AGLN A 50 8.976 24.342 -1.742 0.50 56.71 C
ATOM 145 CA APRO A 51 7.689 25.565 1.689 0.50 51.89 C
ATOM 158 CA GLN A 52 5.174 23.336 3.467 1.00 43.45 C
ATOM 167 CA HIS A 53 2.339 24.135 5.889 1.00 38.39 C
ATOM 177 CA PHE A 54 0.900 22.203 8.827 1.00 33.79 C
ATOM 188 CA TYR A 55 -1.217 22.065 11.975 1.00 34.89 C
ATOM 200 CA ALA A 56 0.334 20.465 15.090 1.00 31.84 C
ATOM 205 CA VAL A 57 0.000 20.066 18.885 1.00 30.46 C
ATOM 212 CA VAL A 58 2.738 21.762 20.915 1.00 27.28 C
The file is only 36 KB. My problem is that a few of the amino acids have an extra letter A in front of them, even though amino acid abbreviations are supposed to be 3 letters long. I have tried to use a regular expression to remove the A wherever it precedes an amino acid abbreviation and to write the new text to the file "CA-Finale.txt". Here is my code so far:
def Trimmer(txtFileName):
    i = open('CA-Finale.txt', 'w')
    j = open(txtFileName, 'r')
    for record in j:
        with open(txtFileName, 'r') as j:
            content = j.read()
            content_new = re.sub(r'(^ATOM\s+\d+\s+CA\s+)A(\w\w\w)', r'\1\2', content, flags=re.M)
            i.write(content_new)

Trimmer('CA.txt')
When I run this, the text file that is generated is 16.7 MB in size, so something has clearly gone wrong. What could it be?
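At least part of your problem is that you open the file twice, and the first handle never gets closed. On top of that, the for loop appears to run once per line of the input, and each pass rereads the whole file and writes the whole substituted text to the output again. That would explain why the result is hundreds of times larger than the 36 KB input.
Did you try removing this line, together with the for loop that iterates over it?
j = open(txtFileName, 'r')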
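For reference, here is a minimal sketch of what the function might look like with the loop and the extra open() removed. It keeps the regular expression from the question and reads the input exactly once. The names strip_leading_a, trimmer, src_path and dst_path are my own and only for illustration, and I am assuming the stray A only ever shows up as a fourth leading character on CA records, as in the snippet.

```python
import re

# Same pattern as in the question: a CA ATOM record whose residue name
# has one extra leading "A" followed by three word characters.
PATTERN = re.compile(r'(^ATOM\s+\d+\s+CA\s+)A(\w\w\w)', flags=re.M)


def strip_leading_a(text):
    """Return text with the extra 'A' dropped from four-letter residue names."""
    return PATTERN.sub(r'\1\2', text)


def trimmer(src_path, dst_path='CA-Finale.txt'):
    """Read src_path once, fix the residue names, and write dst_path once."""
    # 'with' closes each file when its block ends, even if an error occurs.
    with open(src_path, 'r') as src:
        content = src.read()
    with open(dst_path, 'w') as dst:
        dst.write(strip_leading_a(content))
```

Calling trimmer('CA.txt') should then produce an output file slightly smaller than the input, since the only change is one removed character on each affected line. A normal three-letter name that happens to start with A, such as ALA, is left alone because the pattern needs three word characters after the leading A.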
westin.kosater