It's been too long since we've had a Code golf! Following Pierre's suggestion in this question, here is a code golf about finding ORFs :)
So, here's the problem:
Write a program that finds the longest ORF (closed or open) in a DNA sequence and returns the beginning and ending positions of the ORF, as well as the frame shift (-1, -2, -3, +1, +2, +3). Use the Standard Genetic Code. Note: the start/end positions are for the DNA sequence, not the corresponding amino acid sequence. In case there are no stop codons, return a +1 frame shift and the beginning and end positions of the sequence.
As previously, you can use anything that can be run on a Linux terminal (your favorite language, emboss, awk...). The main points of a code golf question are to:
- See diversity of approaches
- Take on a small challenge
- Show off :)
- Most importantly: have fun!
Here is an example run:
find_orf("ACGTACGTACGTACGT") should return:
frame: +1 ORF_start: 1 ORF_end: 16
Or, in another format:
"+1 1 16"
You can use this test file to test the output of your program.
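For anyone who wants a starting point, here is a rough, non-golfed Python sketch of one possible reading of the problem. It makes several assumptions that the statement above leaves open, so treat it as an illustration rather than a reference solution: an ORF is taken to be a run of complete codons containing no stop codon (no start codon is required), the stop codon itself is excluded from the reported span, positions are 1-based on the forward strand even for negative frames, and ties go to the first frame examined (+1, +2, +3, -1, -2, -3). The helper names `reverse_complement` and `stop_free_runs` are made up for this sketch.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the Standard Genetic Code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def stop_free_runs(strand, offset):
    """Yield 0-based half-open (start, end) spans of consecutive complete
    codons, read from `offset`, that contain no stop codon."""
    run_start = offset
    pos = offset
    while pos + 3 <= len(strand):
        if strand[pos:pos + 3] in STOP_CODONS:
            if pos > run_start:
                yield run_start, pos
            run_start = pos + 3  # assumption: the stop codon is not part of the ORF
        pos += 3
    if pos > run_start:
        yield run_start, pos  # open-ended run reaching the last complete codon


def find_orf(seq):
    """Return 'frame start end' for the longest stop-free run in six frames."""
    seq = seq.upper()
    n = len(seq)
    rc = reverse_complement(seq)

    # Special case from the problem statement: no stop codon in any frame.
    if not any(stop in seq or stop in rc for stop in STOP_CODONS):
        return f"+1 1 {n}"

    best = None  # (length, frame, start, end), positions 1-based on the forward strand
    for sign, strand in ((1, seq), (-1, rc)):
        for offset in range(3):
            for start, end in stop_free_runs(strand, offset):
                if best is None or end - start > best[0]:  # strict: first frame wins ties
                    if sign == 1:
                        lo, hi = start + 1, end
                    else:
                        # assumption: map reverse-strand spans back to forward coordinates
                        lo, hi = n - end + 1, n - start
                    best = (end - start, sign * (offset + 1), lo, hi)

    _, frame, lo, hi = best
    return f"{frame:+d} {lo} {hi}"


if __name__ == "__main__":
    print(find_orf("ACGTACGTACGTACGT"))  # the example run above
```

Under these assumptions the example run gives "+1 1 16" through the no-stop-codon special case. Other conventions (including the stop codon, or reporting reverse-strand positions relative to the reverse complement) would change the numbers, so check against the test file before trusting the output.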
EDIT: It appears that the challenge is a bit less simple than I thought it would be... I'll add a one-week extension before awarding the correct answer. Please don't hesitate to post anything you would use from the command line, including things like a command-line version of ORFinder or the like.
The correct answer will be awarded to the most popular answer on Thursday, March 3rd at 15:00 (US Eastern Time).
EDIT: The answer goes to Brad and the Clojure solution :) Nice to see some real-life examples in Lisp! Could you plug a recursive function in there? :P It seems that this problem was not that small after all, so many thanks to all who contributed answers!