Question

Mapping peptide to the source genomic region

9.1 years ago

genie66 ▴ 30

I have a list of peptide sequences, their respective protein names, and their start and end coordinates within the protein sequences. I want to map them back to their genomic source and get the genomic start and end coordinates (preferably exons). I have tried several proteogenomic mapping tools, but with no luck. PeptideAtlas can provide the exonic coordinates, but only for one peptide at a time, and I have hundreds of peptides. Is there any other way to do this? Please help me out. Thanks!

peptide mapping • 3.7k views

updated 7.4 years ago by microbe77 ▴ 30 • written 9.1 years ago by genie66 ▴ 30

score 1 · Answer 1 · 2016-11-11

Might be too late, but this is how to do it:

1. Make a six-frame peptide library from your genome (all possible peptides). I use peptides of 10 aa or longer; for a 4.5 Mbp bacterium that gives about 0.25M peptides.
2. Use this library as a reference to find which of your peptides map to the possible peptides.
3. Get a FASTA file that contains the whole genome nucleotide sequence (this should be a single-entry FASTA file that contains ALL nucleotides).
4. Make a nucleotide BLAST database using the makeblastdb command from a local BLAST installation.
5. Align your peptides to the genome database using tblastn:

   tblastn -query <your peptide fasta file> -db <your genome database> -out <name of the out file you want> -outfmt 6 -max_target_seqs <1 or more, use 1> -evalue 0.001

   The genome database consists of three files; just use the name without the extension. -outfmt 6 gives tabular results, and -evalue 0.001 eliminates partial alignments. I am not sure about the -max_target_seqs option, so double-check it.
6. Open the output file in Excel and keep only the genome name (usually NC_xxxx), start, and stop. Save this file as .bed, which is readable in almost all genome browsers (I use IGB).
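If you would rather script the last step than do it in Excel, here is a minimal Python sketch. It assumes the default -outfmt 6 column order (subject name in column 2, sstart and send in columns 9 and 10). The function name blast6_to_bed and the example accession and rows are made up for illustration. Note that tblastn reports minus-strand hits with sstart greater than send, while BED expects a 0-based start smaller than the end, so the sketch swaps the values and records the strand.

```python
import csv
import io


def blast6_to_bed(lines):
    """Convert default tblastn -outfmt 6 rows into BED6 tuples.

    Assumes the default 12-column layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    bed = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid = row[0], row[1]
        sstart, send = int(row[8]), int(row[9])
        # Minus-strand hits are reported with sstart > send.
        strand = "+" if sstart <= send else "-"
        start = min(sstart, send) - 1  # BLAST is 1-based, BED start is 0-based
        end = max(sstart, send)        # BED end is exclusive
        bed.append((sseqid, start, end, qseqid, 0, strand))
    return bed


# Made-up example rows (hypothetical accession and coordinates).
example = (
    "pep1\tNC_000001\t100.0\t10\t0\t0\t1\t10\t301\t330\t1e-05\t25.0\n"
    "pep2\tNC_000001\t100.0\t10\t0\t0\t1\t10\t930\t901\t1e-05\t25.0\n"
)

for record in blast6_to_bed(io.StringIO(example)):
    print("\t".join(str(field) for field in record))
```

To use it on a real result, pass an open file handle instead of the io.StringIO object and write the printed lines to a .bed file.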
The code that makes the six frames is written in Python. Rather than pasting it here, it is better to get it from: https://github.com/microbe777/fasta2six_frames
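For readers who only want to see the idea behind step 1, here is a rough sketch of a six-frame translation. This is not the code from the repository above; it is an independent illustration. It assumes the standard genetic code (the bacterial table uses the same amino acid assignments), an unambiguous ACGT genome held in memory as one string, and it splits each frame at stop codons. The function names, the frame labels (+1 to -3), and the toy sequence are all made up for the example.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_peptides(genome, min_len=10):
    """Yield (frame, start, end, peptide) for stop-free stretches.

    start and end are 0-based, half-open, forward-strand coordinates.
    """
    genome = genome.upper()
    n = len(genome)
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            protein = translate(seq[offset:])
            pos = 0  # amino acid index where the current segment starts
            for segment in protein.split("*"):
                if len(segment) >= min_len:
                    s = offset + 3 * pos
                    e = s + 3 * len(segment)
                    if strand == "-":
                        # convert back to forward-strand coordinates
                        s, e = n - e, n - s
                    yield (f"{strand}{offset + 1}", s, e, segment)
                pos += len(segment) + 1  # +1 skips the stop codon


# Toy example with a short made-up sequence and a low length cutoff.
for hit in six_frame_peptides("ATGGCCAAATAAGGG", min_len=3):
    print(hit)
```

The default min_len=10 mirrors the 10 aa cutoff mentioned in step 1. For a real genome you would read the sequence from the FASTA file and write each peptide out as a FASTA record.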

score 0 · Answer 2 · 2015-02-25

9.1 years ago

raunakms ★ 1.1k

Tools like tBLASTn could be a good starting point: tBLASTn compares a protein query sequence against a nucleotide sequence database that is dynamically translated in all six reading frames (both strands).

score 0 · Answer 3 · 2015-02-25

9.1 years ago

Siva ★ 1.9k

You could try Scipio, which uses BLAT to search a query protein sequence against a genome. It outputs the intron/exon boundaries and splice sites.
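Once the exon boundaries of the coding sequence are known (from a tool's output or from an annotation file), projecting a peptide's protein coordinates onto the genome is mostly arithmetic. Below is a rough Python sketch of that step. It does not parse Scipio output. The function name peptide_to_genomic and the example coordinates are hypothetical, and it assumes 1-based inclusive coordinates, coding exons only (no UTRs), and that the first coding base is the first base of codon 1.

```python
def peptide_to_genomic(pep_start, pep_end, cds_exons, strand="+"):
    """Map 1-based inclusive protein positions to genomic blocks.

    cds_exons: list of (start, end) coding segments, 1-based inclusive,
    in genomic coordinates. Returns a sorted list of (start, end) blocks.
    """
    # Range of coding-sequence offsets (0-based, half-open) for the peptide.
    want_start = 3 * (pep_start - 1)
    want_end = 3 * pep_end
    # Walk the exons in transcription order.
    ordered = sorted(cds_exons, reverse=(strand == "-"))
    blocks = []
    consumed = 0  # coding bases covered by the exons seen so far
    for ex_start, ex_end in ordered:
        length = ex_end - ex_start + 1
        lo = max(want_start, consumed)
        hi = min(want_end, consumed + length)
        if lo < hi:  # the peptide overlaps this exon
            if strand == "+":
                blocks.append((ex_start + lo - consumed,
                               ex_start + hi - consumed - 1))
            else:
                blocks.append((ex_end - (hi - consumed) + 1,
                               ex_end - (lo - consumed)))
        consumed += length
    return sorted(blocks)


# Made-up two-exon gene encoding 7 amino acids; the peptide spans residues 3-5.
exons = [(101, 109), (201, 212)]
print(peptide_to_genomic(3, 5, exons, strand="+"))
print(peptide_to_genomic(1, 2, exons, strand="-"))
```

A peptide that crosses a splice junction comes back as two blocks, which is what you need for exon-level coordinates. Looping this over a table of hundreds of peptides is straightforward.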
