Question

Question regarding Bioinformatics with R and large comparisons

5.9 years ago

jmacrae04 • 0

Hi, for context I am a 3rd-year undergraduate student majoring in Math and minoring in Applied Stats and CS. I am about to start a research internship using R and bioinformatics tools. In a brief preliminary meeting I learned that my first task will be to write a program that builds a table of the 500 * 10^6 snippets of genetic code in a given dataset, showing which of the 50k gene codes each snippet originated from.

Having no knowledge of bioinformatics toolkits and libraries, I am wondering: would I need to develop an algorithm myself to do all of these comparisons (brute-forcing it seems like it would have an awful running time), or are there existing tools for tasks like this?

Thank you!

R • 1.0k views

updated 5.9 years ago by andrew.j.skelton73 6.5k • written 5.9 years ago by jmacrae04 • 0

score 0 · Answer 1 · 2018-05-21

Chances are there's already a tool that will do, or partly do, what you want. I'd suggest browsing Omicstools to see what's available and how it could fit your problem. It sounds like your problem is one of sequence classification, so it might be worth looking in the BLAST direction. That doesn't necessarily mean you need to download masses of reference databases, as web / REST APIs are available: see the BLAST services from Ensembl (web service) or NCBI (RESTful API docs).
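
To give a feel for why an indexed approach avoids the brute-force running time, here is a minimal sketch in Python (chosen only for illustration; the same idea carries over to R). It indexes every length-k substring (k-mer) of the genes once, then assigns each snippet to the gene sharing the most k-mers with it. This is loosely the seed-lookup idea that tools like BLAST build on, not how any particular tool is implemented. The gene names, sequences, snippets, and the choice K = 5 are all made up for the example, and the sketch only handles exact k-mer matches with no mismatches or reverse strands.

```python
from collections import Counter, defaultdict


def build_kmer_index(genes, k):
    """Map each k-mer to the set of gene IDs whose sequence contains it."""
    index = defaultdict(set)
    for gene_id, seq in genes.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(gene_id)
    return index


def assign_snippet(snippet, index, k):
    """Return the gene ID sharing the most k-mers with the snippet, or None."""
    votes = Counter()
    for i in range(len(snippet) - k + 1):
        # .get avoids adding empty entries to the defaultdict on a miss
        for gene_id in index.get(snippet[i:i + k], ()):
            votes[gene_id] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical toy data standing in for the 50k genes and the snippet dataset
genes = {
    "geneA": "ATGGCGTACGTTAGCCTAGGA",
    "geneB": "ATGTTTCCCGGGAAATTTCCC",
}
snippets = ["GTACGTTAGC", "CCCGGGAAAT", "TTTTTTTTTT"]

K = 5  # assumed k-mer length; real tools tune this to the data
index = build_kmer_index(genes, K)

# One row per snippet: (snippet, gene it most likely came from)
table = [(s, assign_snippet(s, index, K)) for s in snippets]
for row in table:
    print(row)
```

The index is built once over the genes, so each snippet costs only a handful of dictionary lookups instead of a scan over all 50k genes. At the scale of 500 * 10^6 snippets an established aligner or BLAST-like tool is still likely the better choice, since those also handle mismatches and memory limits.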