Question regarding Bioinformatics with R and large comparisons
5.9 years ago
jmacrae04

Hi, for context, I am a 3rd-year undergraduate majoring in Math and minoring in Applied Stats and CS. I am about to start a research internship using R and bioinformatics tools, and a brief preliminary meeting revealed that my first task will be to write a program that builds a table of the 500 × 10^6 snippets of genetic code in a dataset, showing which of the 50k gene sequences each snippet originated from.

I have no knowledge of bioinformatics toolkits and libraries. Would I be developing an algorithm myself to do all of these comparisons (brute-forcing it seems like it would have an awful running time), or are there existing tools for tasks like this?
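To make the running-time concern concrete: brute force compares every snippet against every gene, so the work grows roughly with (number of snippets) × (total length of all genes). The usual alternative is to index the reference once and then look each snippet up, which is the word-seeding idea BLAST also builds on. This is only a toy Python illustration of that idea, not how any particular tool works: `build_kmer_index` and `assign_snippet` are hypothetical helpers, the sequences are made up, and it assumes exact k-mer matches while ignoring reverse complements, sequencing errors and memory limits, all of which real tools have to handle.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def build_kmer_index(genes: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every length-k substring (k-mer) to the IDs of the genes containing it.

    Built once; afterwards a snippet lookup no longer scans every gene.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for gene_id, seq in genes.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(gene_id)
    return index


def assign_snippet(snippet: str, index: dict[str, set[str]], k: int) -> str | None:
    """Return the gene sharing the most k-mers with the snippet, or None if none match.

    Toy rule (an assumption): exact k-mer matches only, ties broken arbitrarily.
    """
    votes = Counter()
    for i in range(len(snippet) - k + 1):
        # .get() avoids inserting empty entries into the defaultdict
        for gene_id in index.get(snippet[i:i + k], ()):
            votes[gene_id] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Two made-up "genes" standing in for the 50k reference sequences.
    genes = {"geneA": "ACGTACGTTTGCA", "geneB": "TTTTGGGGCCCCAAAA"}
    index = build_kmer_index(genes, k=4)
    for snippet in ["ACGTTTG", "GGGGCCCC", "NNNNNN"]:
        print(snippet, assign_snippet(snippet, index, k=4))
```

With k = 4, the snippet ACGTTTG shares four k-mers with geneA and one (TTTG) with geneB, so it is assigned to geneA, while NNNNNN matches nothing and gets None. Each lookup costs time roughly proportional to the snippet's length (plus the genes sharing each k-mer) rather than to the size of the whole reference, which is the main reason index-based approaches scale far better than brute force.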

Thank you!

5.9 years ago

Chances are, there's already a tool that will do, or partly do, what you want. I'd suggest looking at Omicstools for an overview of what's around and how those tools could fit your problem. It sounds like your problem is sequence classification, so it might be worth looking in the BLAST direction. That doesn't necessarily mean you need to download masses of reference databases, as web and REST APIs are available: see the Ensembl BLAST web service or NCBI's BLAST service (which has RESTful documentation).
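As a rough idea of what using NCBI's web BLAST programmatically involves, here is a minimal Python sketch that only builds the request URLs for a submit-then-retrieve round trip; nothing is actually sent. The endpoint, the parameter names (`CMD=Put`/`CMD=Get`, `PROGRAM`, `DATABASE`, `QUERY`, `RID`, `FORMAT_TYPE`) and the `RID = ...` line in the submission response are assumptions based on NCBI's Common URL API documentation and should be checked against the current docs; the helper names are hypothetical. A remote service like this suits spot-checking a handful of snippets, since NCBI asks users to throttle requests; a dataset of this size would need a local approach.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Base endpoint of NCBI's BLAST URL API (assumed; verify in NCBI's documentation).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_url(sequence: str, program: str = "blastn", database: str = "nt") -> str:
    """URL that submits one query sequence (CMD=Put); the response carries a request ID."""
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": sequence}
    return f"{BLAST_URL}?{urlencode(params)}"


def parse_rid(response_text: str) -> Optional[str]:
    """Extract the request ID from a line such as 'RID = ABC123' (assumed response format)."""
    match = re.search(r"RID = (\S+)", response_text)
    return match.group(1) if match else None


def build_get_url(rid: str, format_type: str = "Text") -> str:
    """URL that fetches the results of a finished search (CMD=Get)."""
    params = {"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type}
    return f"{BLAST_URL}?{urlencode(params)}"


# Sending is deliberately left out. With the standard library it would be roughly
# urllib.request.urlopen(build_put_url(seq)).read().decode(), then waiting until
# the search has finished (polling politely) before requesting build_get_url(rid).
```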
