Good evening,
I am relatively new to bioinformatics and I have been tasked with aligning roughly 50 to 60 similar species, with different formae speciales, in order to find a unique genetic sequence that can be used as a probe. The species are all fungi, in case that matters. The complete assemblies are available from NCBI in FASTA format. Given this,
1) What kind of program can I use to do this? 2) What kind of computer do I need to do this?
I have tried the MUSCLE tool in AliView, UGENE, and MAFFT on a test run of about 3 of these formae speciales, but I always get an "out of memory" error.
Thank you.
Even extremely powerful computers will struggle to align that many fungal genomes. Aligning whole genomes isn't trivial (or particularly accurate).
Do you already have a region you want to probe or are you just looking for any conserved site in the genomes?
I've been given a hint that I should focus on noncoding regions, since genes in coding regions will most likely be 99% similar between closely related f. sp.
1) Maybe a k-mer based approach is an option.
2) With k-mers, any computer should do.
A k-mer tool: http://www.genome.umd.edu/jellyfish.html
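To make the k-mer idea a bit more concrete, here is a minimal Python sketch, not a finished pipeline. It assumes the sequences are already loaded as plain strings, and it looks for k-mers that occur in every target genome and in none of the others. The function names, the toy sequences, and the choice of k = 5 are all made up for illustration. For real fungal assemblies you would want a much larger k and a dedicated counter such as Jellyfish, since plain Python sets will use a lot of memory.

```python
# Hypothetical helper names; toy data only.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_set(sequences, k):
    """Collect the canonical k-mers of all sequences (contigs) of one genome."""
    kmers = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing N or other ambiguity codes.
            if set(kmer) <= set("ACGT"):
                kmers.add(canonical(kmer))
    return kmers


def candidate_probe_kmers(target_genomes, other_genomes, k):
    """k-mers present in every target genome and absent from every other genome."""
    shared = None
    for sequences in target_genomes:
        kmers = kmer_set(sequences, k)
        shared = kmers if shared is None else shared & kmers
    shared = shared or set()
    for sequences in other_genomes:
        shared -= kmer_set(sequences, k)
    return shared


if __name__ == "__main__":
    # Each genome is a list of contig strings (toy examples, not real data).
    targets = [["GGATCCAAGTC"], ["TTGGATCCAAG"]]
    others = [["GGATCGAAGTC"]]
    print(sorted(candidate_probe_kmers(targets, others, k=5)))
```

The surviving k-mers are only candidates. Overlapping ones would still need to be merged into longer regions and checked (for example with BLAST against NCBI) before being used to design a probe.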
Thank you, I will try.
Probe for what? What is the purpose of the probe?
You may want to look at databases of orthologous genes - like OrthoDB or OMA - then the alignment has already been done for you.
To detect a specific organism while yielding as few false positives as possible. In other words, for agricultural use.
Fungal amplicon taxonomy focuses heavily on the ITS region. Isn't that an option for you?
Unfortunately, I am barred from using the ITS region for this specific paper.