Hello, I am using blat to match a set of sequences against the human genome. The problem is that, for my particular research, I don't care WHERE or HOW MANY TIMES the individual sequences match the human genome (information I can get from the output .psl file). All I care about is IF a particular sequence matches anywhere (even just once).
If it matches at least once, I will deem that sequence "human" and then separate the human sequences from the non-human ones (matched vs. non-matched).
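Whichever aligner ends up doing the search, the human/non-human split can be done after the fact by collecting every query name that has at least one hit. A minimal Python sketch, assuming the standard 21-column tab-separated PSL layout (qName is column 10) and that blat uses the FASTA header up to the first whitespace as qName. The function names are hypothetical, and the header check also works for files written with blat's `-noHead` option:

```python
from typing import Dict, Iterable, Iterator, Optional, Set, Tuple


def matched_query_names(psl_lines: Iterable[str]) -> Set[str]:
    """Collect every query name (qName) with at least one PSL hit."""
    names: Set[str] = set()
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Data lines have 21 columns and start with the numeric "match" count;
        # the psLayout banner, column titles and dashed rule are skipped.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        names.add(fields[9])  # column 10 = qName
    return names


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence); name is the header text up to the first whitespace."""
    name: Optional[str] = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_by_hits(fasta_lines: Iterable[str], psl_lines: Iterable[str]):
    """Return (human, non_human) dicts of name -> sequence."""
    hits = matched_query_names(psl_lines)
    human: Dict[str, str] = {}
    non_human: Dict[str, str] = {}
    for name, seq in read_fasta(fasta_lines):
        (human if name in hits else non_human)[name] = seq
    return human, non_human
```

Usage with hypothetical file names: `with open("queries.fa") as fa, open("hits.psl") as psl: human, non_human = split_by_hits(fa, psl)`. Because only set membership matters, it makes no difference how many times a sequence hit the genome.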
Since I am inputting close to a million sequences, it would probably save hours of computation to modify blat so that it STOPS searching for other matches once it finds the first one, writes just that match to the output .psl file, and moves on to the next sequence.
I've searched the blat website for this capability and found nothing. Once I'm confident it doesn't exist, I will try to modify the blat source code myself, but before spending time on it I wanted to ask whether anyone on this forum has heard of this being done already.
Please respond if you know more about this.
Thanks!
How long are the sequences? You might be able to just use bowtie2 or bwa instead, both of which would require much less time.
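If bowtie2 or bwa is used instead, the same split can be read from the SAM output: a read counts as "human" if any of its records lacks the 0x4 (segment unmapped) FLAG bit. A minimal Python sketch, assuming single-end reads and plain SAM text (the function name is hypothetical). Secondary or supplementary records of a mapped read are harmless because only set membership is tracked:

```python
from typing import Iterable, Set, Tuple

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def partition_sam(sam_lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (mapped, unmapped) read names from single-end SAM text."""
    seen: Set[str] = set()
    mapped: Set[str] = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip @HD/@SQ/@PG header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # SAM alignment lines have 11 mandatory fields
            continue
        qname, flag = fields[0], int(fields[1])
        seen.add(qname)
        if not flag & UNMAPPED:
            mapped.add(qname)
    return mapped, seen - mapped
```

bowtie2 also documents `--un` and `--al` options that write unaligned and aligned unpaired reads straight to files, which may make a post-processing step like this unnecessary; check the manual for your installed version.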
Hello blakeln!
It appears that your post has been cross-posted to another site: SEQanswers
This is typically not recommended as it runs the risk of annoying people in both communities.
Hi Devon,
Thank you very much for your recommendations. I will look into bowtie2 and bwa. I have never used bwa, so I am interested to see its capabilities.