7.4 years ago
Zar
I'm an undergraduate researcher with very limited knowledge of bioinformatics. I've been given several short sequences (300 bp) in FASTA format that have been aligned to a reference sequence. Is there software that can analyze the different sequences, identify SNVs, and report the type of each mutation?
Are you sure your data is in FASTA format and not FASTQ? What does the file look like? Which technology was used to generate the data? Please be as informative as possible.
It was originally given to me as a simple text file, and I converted it to FASTA format so I could open it in BioEdit. The sequence itself was generated using RNA-seq, I believe, and the alignment was done using ClustalW.
That's... uncommon, to say the least. Which organism are you working on? Is there a reference genome available? Could you get access to the original data?
I'm working with mummichog samples and there is a reference genome available, but according to my PI the sequencing wasn't as good. As for the original data, I'll have to ask my PI if I can get it. In the meantime I'm just manually highlighting the SNVs and looking at the codon table. Total PITA.
That's a ridiculous method of SNV calling. So your FASTA file contains the pairwise alignment obtained with ClustalW, and you want to identify the differences between the reference and your data?
Pretty much. Identifying the SNVs isn't as bad, it's determining the mutation type using a codon table that's tedious.
It's not only annoying and slow, but you also have a fairly high chance of making mistakes (because you are only human). If you can isolate the original reads (FASTA is okay), you can align them to the genome (using STAR). If you could get access to the original FASTQ data, that would be even better.
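Until the original reads are available, the manual codon-table lookup described above could be scripted. Below is a minimal Python sketch, not a replacement for a proper variant-calling pipeline. It assumes several things the thread does not confirm: that the reference and the sample are two gapped rows of equal length from the pairwise alignment, that the reference is the coding strand, that the reading frame starts at `frame_offset` bases into the ungapped reference, and that the standard genetic code applies. The function name `classify_snvs` and its parameters are made up for illustration. Gap columns (indels) and ambiguous bases are skipped, and each SNV is evaluated on its own against the reference codon.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snvs(ref_aln, sample_aln, frame_offset=0):
    """Return one tuple per single-base substitution:
    (1-based ref position, ref base, alt base, ref codon, alt codon,
     ref amino acid, alt amino acid, effect).

    frame_offset is an assumption supplied by the caller: the 0-based index
    in the ungapped reference where the first complete codon starts.
    """
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned sequences must have the same length")
    ref_aln, sample_aln = ref_aln.upper(), sample_aln.upper()
    ref_seq = ref_aln.replace("-", "")  # ungapped reference for codon lookup

    results = []
    ref_pos = -1  # 0-based index into the ungapped reference
    for r, s in zip(ref_aln, sample_aln):
        if r != "-":
            ref_pos += 1
        # Skip matches, gap columns (indels), and ambiguous bases such as N.
        if r == s or r not in BASES or s not in BASES:
            continue

        codon_start = ref_pos - (ref_pos - frame_offset) % 3
        if codon_start < 0 or codon_start + 3 > len(ref_seq):
            # SNV lies outside any complete codon in the assumed frame.
            results.append((ref_pos + 1, r, s, None, None, None, None, "unknown"))
            continue

        ref_codon = ref_seq[codon_start:codon_start + 3]
        if ref_codon not in CODON_TABLE:  # codon contains an ambiguous base
            results.append((ref_pos + 1, r, s, ref_codon, None, None, None, "unknown"))
            continue

        i = ref_pos - codon_start  # position of the SNV within the codon
        alt_codon = ref_codon[:i] + s + ref_codon[i + 1:]
        ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]

        if ref_aa == alt_aa:
            effect = "synonymous"
        elif alt_aa == "*":
            effect = "nonsense"
        elif ref_aa == "*":
            effect = "stop-lost"
        else:
            effect = "missense"
        results.append((ref_pos + 1, r, s, ref_codon, alt_codon, ref_aa, alt_aa, effect))
    return results
```

For example, `classify_snvs("ATGGCCAAATGA", "ATGGCTAGATGA")` reports a synonymous change at position 6 (GCC to GCT, Ala to Ala) and a missense change at position 8 (AAA to AGA, Lys to Arg). The reading frame is the weak point: RNA-seq fragments rarely start at the first base of a codon, so `frame_offset` has to come from the gene annotation. Two SNVs in the same codon are also reported separately here, which can misstate their combined effect.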