Question

How to find start and stop codons for sequences in a FASTA file?

9.0 years ago

grayapply2009 ▴ 280

I ran blastn and blastx on my sequences (~400,000 sequences). How do I find and label the start and stop codons for each sequence in a FASTA file?

next-gen • 7.2k views

ADD COMMENT • link updated 21 months ago by Ram 43k • written 9.0 years ago by grayapply2009 ▴ 280

9.0 years ago

Antonio R. Franco ★ 5.1k

Depending on how you obtained these sequences, it is likely that the start and/or the stop codon is missing.

BlastX can find a homologous protein sequence from the translation of an internal part of your sequence, even if that sequence lacks the start and stop codons.

ADD COMMENT • link 9.0 years ago by Antonio R. Franco ★ 5.1k

How do I find the start and stop codons in the FASTA file if the sequences have at least one of them?

ADD REPLY • link 9.0 years ago by grayapply2009 ▴ 280

For an individual sequence, you can try services like NCBI ORFfinder.

For many sequences, try EMBOSS. It includes several programs, in both graphical and text mode, and it accepts a FASTA file with many sequences at once.

ADD REPLY • link updated 21 months ago by Ram 43k • written 9.0 years ago by Antonio R. Franco ★ 5.1k

Ram · Accepted Answer · 2015-04-24

9.0 years ago

Kamil ★ 2.3k

I suggest that you read about the genetic code to find the codons relevant to your organism.

~~You'll want to search for codons, perhaps with a tool like fasgrep. You might write your own script if you have a particular output format in mind.~~

On second glance, it seems that fasgrep is only useful for searching for sequence identifiers, not the sequences themselves.
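
To make the limits of a plain codon search concrete, here is a minimal Python sketch. It assumes the standard genetic code (ATG as the start codon; TAA, TAG and TGA as stop codons), which may not apply to your organism, and the name `codon_hits` is made up for this illustration. It reports every start or stop triplet on the forward strand together with its reading frame.

```python
# Assumption: standard genetic code. Other translation tables use different codons.
START_CODONS = {"ATG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}

def codon_hits(seq):
    """Return (position, frame, codon, kind) for each start/stop triplet.

    Positions are 0-based on the forward strand; frame is position % 3.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2):
        codon = seq[i:i + 3]
        if codon in START_CODONS:
            hits.append((i, i % 3, codon, "start"))
        elif codon in STOP_CODONS:
            hits.append((i, i % 3, codon, "stop"))
    return hits

for hit in codon_hits("CCATGAAATAGG"):
    print(hit)
# (2, 2, 'ATG', 'start')
# (3, 0, 'TGA', 'stop')
# (8, 2, 'TAG', 'stop')
```

In this toy sequence the TGA at position 3 overlaps the ATG and sits in a different frame, so it cannot terminate a reading frame that begins at that ATG. A plain text search reports it anyway, which is why the frame has to be tracked.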

ADD COMMENT • link updated 21 months ago by Ram 43k • written 9.0 years ago by Kamil ★ 2.3k

Thank you for the information, Kamil. So fasgrep works like ExPASy? Does it pick the longest possible translated sequence?

ADD REPLY • link 9.0 years ago by grayapply2009 ▴ 280

~~fasgrep is like grep. It searches for a string in a body of text. In your question, you ask about finding codons. I'd recommend using a search tool like grep to find codons.~~

If you have a different goal, you should edit your question. For example, if you wish to find possible coding sequences within a nucleotide sequence, you might consider other tools designed for this purpose:

As you mentioned, ExPASy is a nice portal to find other tools that might meet your needs.
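
For illustration only, finding possible coding sequences can be sketched in a few lines of Python. This is not how any of the tools named in this thread work internally. It assumes the standard genetic code and ATG as the only start codon, and the names `find_orfs` and `reverse_complement` are made up for this example. It scans all six reading frames and reports each stretch that runs from the first ATG in a frame to the next in-frame stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=1):
    """Yield (strand, start, end) for each ATG-to-stop ORF in six frames.

    Coordinates are 0-based and end-exclusive, measured on the strand that
    was scanned (for "-" that is the reverse complement). The stop codon is
    included. An ATG with no in-frame stop after it is not reported.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None and codon == "ATG":
                    orf_start = i  # first ATG opens the ORF
                elif orf_start is not None and codon in STOP_CODONS:
                    if (i - orf_start) // 3 >= min_codons:
                        yield strand, orf_start, i + 3
                    orf_start = None  # look for the next ORF in this frame

print(list(find_orfs("CCATGAAATAGG")))
# [('+', 2, 11)]
```

Because each frame is walked in steps of three, a stop codon is only paired with a start codon in the same frame. Sequences that lack a start or a stop produce no hit in this sketch, so partial ORFs would need separate handling.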

ADD REPLY • link updated 21 months ago by Ram 43k • written 9.0 years ago by Kamil ★ 2.3k

Yes, I want to identify the start and stop codons for each sequence, but how do I know that the codons found by fasgrep are the correct ones for the coding sequence? I mean, there are multiple "ATG"s and "TAG"s. Does this program take frame shifts into consideration?

Also, how do I label those codons in the FASTA file once I have found them?

ADD REPLY • link updated 21 months ago by Ram 43k • written 9.0 years ago by grayapply2009 ▴ 280

If existing programs do not meet your needs, then you should write your own scripts to achieve your goals. If you're familiar with Python, this looks like a good starting point: Identifying open reading frames

Consider providing an example of your input and an example of your desired output. That might increase the clarity of your question.
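
As one possible shape for such a script, and for such an input/output example, the Python sketch below reads FASTA records one at a time and appends codon positions to each header. It is not the code from the link above. The header keys `start_codon` and `stop_codon`, the function names, and the simple forward-strand rule in `first_forward_orf` are all assumptions chosen for illustration, and the standard genetic code is assumed.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def first_forward_orf(seq):
    """Hypothetical locator: first ATG with an in-frame stop, forward strand.

    Returns (start, end), 0-based and end-exclusive, or None if not found.
    """
    seq = seq.upper()
    pos = seq.find("ATG")
    while pos != -1:
        for i in range(pos + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in ("TAA", "TAG", "TGA"):
                return pos, i + 3
        pos = seq.find("ATG", pos + 1)
    return None

def label_fasta(handle, out, locate=first_forward_orf):
    """Write each record to `out` with codon positions added to the header."""
    for header, seq in read_fasta(handle):
        region = locate(seq)
        if region is None:
            out.write(f">{header} start_codon=NA stop_codon=NA\n{seq}\n")
        else:
            start, end = region
            # Report the 1-based position of the first base of each codon.
            out.write(f">{header} start_codon={start + 1} "
                      f"stop_codon={end - 2}\n{seq}\n")

example = io.StringIO(">seq1\nCCATGAAA\nTAGG\n>seq2\nCCCCCC\n")
result = io.StringIO()
label_fasta(example, result)
print(result.getvalue(), end="")
# >seq1 start_codon=3 stop_codon=9
# CCATGAAATAGG
# >seq2 start_codon=NA stop_codon=NA
# CCCCCC
```

Records are streamed rather than loaded all at once, which should suit a file with several hundred thousand sequences. The `locate` argument can be replaced with any function that returns a `(start, end)` pair, for example one that also checks the reverse strand.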

ADD REPLY • link updated 21 months ago by Ram 43k • written 9.0 years ago by Kamil ★ 2.3k

Great! I'll take a look at the code. Thank you, Kamil!

ADD REPLY • link 9.0 years ago by grayapply2009 ▴ 280