Fasta file substitution
8.4 years ago
max1415r ▴ 10

I have a protein FASTA file containing many sequences. I want to create a tab-delimited output file that lists, for each amino acid position, every possible mutation (the remaining 19 amino acids).

For example, given these sequences:

>sp|Q6NUK1|SCMC1_HUMAN Calcium-binding mitochondrial carrier protein
MLRWLRDFVLPTAACQDAEQPTRYETLFQALDRNGDGVVDIGELQEGLRNLGIPLGQDAE
>sp|Q6KCM7|SCMC2_HUMAN Calcium-binding mitochondrial carrier protein
MLCLCLYVPVIGEAQTEFQYFESKGLPAELKSIFKLSVFIPSQEFSTYRQWKQKIVQAGD

Expected output (columns: protein ID, position, amino acid, substitution; 19 rows per position). It should be generated for all proteins (around 4,000):

Q6NUK1    1    M    A
Q6NUK1    1    M    R
Q6NUK1    1    M    N
Q6NUK1    1    M    D
Q6NUK1    1    M    C
Q6NUK1    1    M    Q
Q6NUK1    1    M    E
Q6NUK1    1    M    G
Q6NUK1    1    M    H
Q6NUK1    1    M    I
Q6NUK1    1    M    L
Q6NUK1    1    M    K
Q6NUK1    1    M    F
Q6NUK1    1    M    P
Q6NUK1    1    M    S
Q6NUK1    1    M    T
Q6NUK1    1    M    W
Q6NUK1    1    M    Y
Q6NUK1    1    M    V
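For anyone without Biopython, or who wants the rows in the same order as the example output above, here is a minimal, dependency-free sketch. It is one possible approach under stated assumptions, not the only way: headers are assumed to follow the UniProt `sp|ACCESSION|NAME` pattern (other headers fall back to their first word), non-standard letters such as X, U or `*` are skipped, and `read_fasta`, `accession`, `substitutions` and `write_tsv` are hypothetical helper names, not part of any library.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple

# 20 standard amino acids in the order used by the example output
# (A R N D C Q E G H I L K M F P S T W Y V); the original residue is skipped.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def accession(header: str) -> str:
    """'sp|Q6NUK1|SCMC1_HUMAN ...' -> 'Q6NUK1'; other headers -> first word."""
    first_word = header.split()[0]
    parts = first_word.split("|")
    return parts[1] if len(parts) > 1 else first_word


def substitutions(handle: TextIO) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (accession, 1-based position, original, substitute) tuples."""
    for header, seq in read_fasta(handle):
        acc = accession(header)
        for pos, residue in enumerate(seq.upper(), start=1):
            if residue not in AMINO_ACIDS:
                continue  # assumption: skip X, U, * and other non-standard letters
            for sub in AMINO_ACIDS:
                if sub != residue:
                    yield acc, pos, residue, sub


def write_tsv(in_handle: TextIO, out_handle: TextIO) -> None:
    """Write one tab-separated line per possible substitution."""
    for acc, pos, residue, sub in substitutions(in_handle):
        out_handle.write(f"{acc}\t{pos}\t{residue}\t{sub}\n")


if __name__ == "__main__":
    demo = ">sp|Q6NUK1|SCMC1_HUMAN Calcium-binding mitochondrial carrier protein\nMLRW\n"
    out = io.StringIO()
    write_tsv(io.StringIO(demo), out)
    print(out.getvalue().splitlines()[0])  # prints Q6NUK1<TAB>1<TAB>M<TAB>A
```

To run it on a real file, open both handles and pass them in, for example `with open("example.fasta") as fin, open("mutations.tsv", "w") as fout: write_tsv(fin, fout)` (the file names are placeholders). Because `substitutions` is a generator, memory use stays small even for thousands of proteins.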
8.4 years ago
mkulecka ▴ 360

It's very easy using Biopython's SeqIO:

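from Bio import SeqIO

# The 20 standard amino acids. Bio.Alphabet (which provided
# IUPAC.protein.letters, "ACDEFGHIKLMNPQRSTVWY") was removed in
# Biopython 1.78, so the letters are listed explicitly here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

for seq_record in SeqIO.parse("example.fasta", "fasta"):
    # UniProt IDs look like "sp|Q6NUK1|SCMC1_HUMAN"; keep the accession
    record_name = seq_record.id.split("|")[1]
    for position, letter in enumerate(str(seq_record.seq), start=1):
        # Every standard residue except the one already at this position
        for item in AMINO_ACIDS:
            if item != letter:
                print("\t".join([record_name, str(position), letter, item]))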
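Each standard residue yields exactly 19 rows, so the output size is easy to predict: a 60-residue sequence such as either example in the question gives 60 × 19 = 1,140 rows. With around 4,000 proteins the file gets large, so a quick consistency check can be useful. The following is a hedged, standard-library-only sketch; `check_tsv` is a hypothetical helper, and it assumes the four tab-separated columns shown in the question (protein ID, position, original residue, substitution) with no header row.

```python
import csv
import io
from collections import defaultdict
from typing import DefaultDict, Dict, Set, TextIO, Tuple


def check_tsv(handle: TextIO) -> Dict[Tuple[str, int], int]:
    """Return {(protein, position): n} for every position whose number of
    distinct substitutions is not 19. An empty dict means the file looks complete."""
    subs: DefaultDict[Tuple[str, int], Set[str]] = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # tolerate blank lines
        protein, pos, original, sub = row
        if sub == original:
            raise ValueError(f"{protein} {pos}: substitution equals the original residue")
        subs[(protein, int(pos))].add(sub)
    return {key: len(found) for key, found in subs.items() if len(found) != 19}


if __name__ == "__main__":
    # 19 rows for M at position 1: every standard residue except M
    demo = "".join(f"Q6NUK1\t1\tM\t{aa}\n" for aa in "ACDEFGHIKLNPQRSTVWY")
    print(check_tsv(io.StringIO(demo)))  # {}
```

When checking a real file, open it with `newline=""` as the `csv` module recommends. Positions holding a non-standard letter such as X will be reported with 20 substitutions if the generating script does not skip them, which is a handy way to spot such residues.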
Thank you. It was very helpful.
