working with promoter sequence of tomato using biopython
0
0
Entering edit mode
8.2 years ago
kws15 ▴ 40

Hi everyone, I have been working with tomato (s.lycopersicum) with its wild relative(s.pennellii). So below is a pair of ortholog protein I got previously using some alignment method.

XP_004243382.1--> protein LOC107025503 [Solanum pennellii]

XP_015081778.1-->protein LOC101253299 [Solanum lycopersicum]

you can easily search them on ncbi, so what I have to do is to analyse th promoter sequences of the proteins of all the orthologous proteins. I used the biopython script below to extract the promoter sequnce of all the gene in every chromosome genbank file for both species. genbank Chromosome files can be downloaded through the ftp site of ncbi

ftp://ftp.ncbi.nlm.nih.gov/genomes/Solanum_pennellii/

below is the biopython script

def prom_extract_multi(in_gbk = "in.gbk", prom_len = 1000, file_out = "prom_out.fna"):

from Bio import Seq
from Bio.Seq import Seq
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

prom_out = ""   

GBrecord = next(SeqIO.parse(in_gbk, "genbank"))
for feature in GBrecord.features:
    if feature.type =="gene":
        my_start = feature.location._start.position # Identifies the start position of the gene on the sense strand (5' to 3' irrespective of actual coding strand).
        my_end = feature.location._end.position # Identifies the end position of the gene on the sense strand (5' to 3' irrespective of actual coding strand).
        start_1000 = my_start - prom_len
        end_1000 = my_end + prom_len
        if feature.strand == -1:
            feat_loc = str(feature.location)
            my_prom = GBrecord[my_end:end_1000].reverse_complement()
            prom_out += "> Promoter rev_comp" + "___" + feature.qualifiers['db_xref'][0] + "___" + feat_loc + "\n"
            prom_out += my_prom.seq.__str__() + "\n\n"

        elif feature.strand == 1:
            feat_loc = str(feature.location)
            my_prom = GBrecord[start_1000:my_start]
            prom_out += "> Promoter" + "___" + feature.qualifiers['db_xref'][0] + "___" + feat_loc + "\n"
            prom_out += my_prom.seq.__str__()+"\n\n"

file=open(file_out, 'w')
file.write(prom_out)
file.close()

so the output file of the promoter fasta file looks like something like this

Promoter___GeneID:101264245___209673:210580 TCCTAAACGTCGTTTTTAATTGTTTGATTTTTTGTTAAAAAATTAATTTGTGT........

So i tried to get my promoter sequence for the example pair shown above first, I relate the geneID from the output to the protein id(XP_ 004243382.1) of my example pair to get the promoter sequence for both species, however I was only able to get the one s.pennellii, did I miss something here?? I searched the chromosome genbank file and found something could be causing this,

http://www.ncbi.nlm.nih.gov/gene/?term=NW_004194338%20

the link shows “See also 69 discontinued or replaced items.” could this missed/replaced items are the part of my promoter sequence??

Also, can anyone please suggest what I can use to analyse promoter sequences, I tried promoterwise by ebi by failed to install on my laptop, I am thinking just to use blastn , I hope I have explained my question clear enough

thank you very much

biopython promoter tomato • 3.3k views
ADD COMMENT
0
Entering edit mode

When you create the output file handler with the open function, use the option "a" instead of "w" to append the result to the output instead of overwritten it.

ADD REPLY
0
Entering edit mode

There's a bug in your FASTA output: you have a space between ">" and "Promoter_..." which can break many tools which will interpret this as a zero length identifier.

ADD REPLY
0
Entering edit mode

In this line of code prom_out += "> Promoter" + "___" + feature.qualifiers['db_xref'][0] + "___" + feat_loc + "\n"

I am getting KeyError: 'db_xref'

Any suggestions please

ADD REPLY
0
Entering edit mode

Me too premraj.preeti. If you found a solution for this weird error, can you please share? i have Biopython version 1.71 Thanks.

ADD REPLY

Login before adding your answer.

Traffic: 2710 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6