Biostar Beta. Not for public use.
Question: Question regarding Bioinformatics with R and large comparisons

Hi, for context, I am a third-year undergraduate student majoring in Math and minoring in Applied Stats and CS. I am about to start a research internship using R and bioinformatics tools. In a brief preliminary meeting I learned that my first task is to write a program that builds a table of the 500 * 10^6 snippets of genetic code in a given dataset, showing which of the 50k gene sequences each snippet originated from.

Having no knowledge of bioinformatics toolkits and libraries, would I be developing an algorithm myself to do all of these comparisons (brute-forcing it seems like it would have an awful running time), or are there existing tools for tasks like this?

Thank you!

21 months ago jmacrae04 • 0 • updated 21 months ago andrew.j.skelton73 5.7k

Chances are there's already a tool that does all or part of what you want. I'd suggest browsing Omicstools to see what's available and how it could fit your problem. It sounds like your problem is sequence classification, so it might be worth looking in the BLAST direction. That doesn't necessarily mean you need to download masses of reference databases, as there are web / REST APIs available: see the BLAST services from Ensembl (web service) or NCBI (RESTful docs).
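
To give a feel for why brute force isn't needed, here is a toy sketch of the indexing idea that seed-based search tools such as BLAST loosely build on: index the gene sequences once by their k-mers, then look each snippet up instead of comparing it against every gene. This is not BLAST's actual algorithm. It only handles exact k-mer matches on one strand, and the function names, gene IDs, sequences and the choice of k = 5 are all made up for illustration. It is written in Python for brevity; the same idea carries over to R.

```python
from collections import Counter, defaultdict


def build_kmer_index(genes, k):
    """Map every k-mer to the set of gene IDs whose sequence contains it."""
    index = defaultdict(set)
    for gene_id, seq in genes.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(gene_id)
    return index


def assign_snippet(snippet, index, k):
    """Return the gene ID sharing the most k-mers with the snippet, or None."""
    votes = Counter()
    for i in range(len(snippet) - k + 1):
        # Each k-mer of the snippet votes for every gene that contains it.
        for gene_id in index.get(snippet[i:i + k], ()):
            votes[gene_id] += 1
    if not votes:
        return None  # no k-mer of the snippet occurs in any gene
    return votes.most_common(1)[0][0]


# Hypothetical toy data standing in for the 50k genes and the snippets.
genes = {
    "geneA": "ATGGCGTACGTTAGCCTAGGA",
    "geneB": "TTGACCGATCGGATTACGCAA",
    "geneC": "CCGTTAGGCATCGATCGTTAA",
}
snippets = ["GTACGTTAGC", "CGGATTACGC", "GGGGGGGGGG"]

K = 5  # assumed seed length, chosen only to suit the toy sequences
index = build_kmer_index(genes, K)
table = [(s, assign_snippet(s, index, K)) for s in snippets]

for snippet, gene_id in table:
    print(snippet, gene_id)
```

The index is built once over the genes, so the work per snippet depends on the snippet's length rather than on the number of genes. Real tools add what this sketch leaves out (mismatches and gaps, reverse-complement strands, scoring, and memory-efficient indexes), which is the reason to reach for an existing aligner rather than rolling your own.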

21 months ago andrew.j.skelton73 5.7k
