How To Become A (Protein) Bioinformatician? - Skills
10.2 years ago
K ▴ 180

Hi all.

I am a biotechnologist who, in the final part of my Ph.D., became involved in (and quite interested in!) the use of bioinformatics tools for protein study. I feel I should go deeper into this discipline, so I have resolved to learn more about it and about bioinformatics in general. That said, I gather that most bioinformaticians need a good knowledge of Linux and at least some idea of programming.

What is the preferred programming language in the field of bioinformatics? What are the most common Linux "flavours" used in bioinformatics, if that is relevant? Are there any recent courses which could be recommended for learning both bioinformatics-related Linux and programming? Finally, are there any other skills desirable for the successful development of a protein bioinformatician-to-be?

Thanks a lot in advance.

career linux programming • 3.7k views
10.2 years ago
Sanjiv Kumar ▴ 20

I too am new to bioinformatics and did my Ph.D. in biotechnology. Like you, I was attracted by some of the bioinformatics work going on in my lab. My suggestions are below, though I hope hard-core bioinformaticians can shed more light on the questions you have asked. The opinions expressed are solely mine.

  1. Learning Linux is a must. Ubuntu is good for starters, CentOS is free and a good one, Red Hat Enterprise Linux is the commercial one (Fedora is its free community counterpart), and Bio-Linux is specially designed for this use. I have little experience with the last three. The flavor doesn't matter much unless you stray so far that you are using a distribution intended for something else; for example, you should not use BackTrack for bioinformatics work. That said, Linux comes with shell scripting and heavy use of vi/vim, and both are going to be handy.

  2. Programming language - A lot of work can be done using shell scripting, but I think it is also a must to learn a programming language and have command over it. Start small, and your skill grows as you go. Perl or Python are good to start with. I would also suggest learning R, since there is going to be a lot of stats later on. Additionally, if it's not too much to ask, MATLAB is good to know, though it is commercial.

  3. Protein bioinformatician - since you are coining it as a discipline (there is no such thing as such), start with MSA (ClustalW, MUSCLE, etc.), phylogenetic analysis, annotation of hypothetical proteins, domain database searches (CDD, Pfam, etc.), homology modeling, docking, simulation and drug databases. Here I am presuming you are already familiar with BLAST, BLAT, etc. Know about GenBank, Protein Data Bank, UniProt, NCBI, KEGG, KOG and other similar resources. Once you have learned about these bit by bit, I am sure you will develop an inclination towards something specific and can pursue it further.

  4. Whole genome/proteome/transcriptome/metabolome data analysis - Another group of tools and methods that inspires me is NGS, which is my current interest and what I am learning. This includes, but is not restricted to, genome (genomics) and transcriptome (transcriptomics) assembly and analysis, metabolomics, pathway and network analysis, and co-expression network analysis.

  5. Machine learning - I have no experience with this; maybe someone working on it can elaborate. HMM, SVM, random forest, artificial neural networks. They are awfully interesting and complicated.
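As a concrete "start small" exercise for points 2 and 3, here is a minimal sketch in plain Python (standard library only) that reads FASTA records and computes residue fractions. The function names `read_fasta` and `composition` and the toy sequence are made up for illustration; in real work you would more likely use an existing library such as Biopython.

```python
from collections import Counter
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def composition(seq):
    """Return the fraction of each residue letter in a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


# Made-up example record (not a real protein), wrapped over two lines.
EXAMPLE = """>toy_protein hypothetical example
MKKLLA
AAGK
"""

records = list(read_fasta(StringIO(EXAMPLE)))
for header, seq in records:
    fractions = composition(seq)
    print(header, len(seq), sorted(fractions.items()))
```

To read a real file instead of the inline example, pass an open file object, for example `read_fasta(open("proteins.fasta"))`, where the file name is whatever you have at hand.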

Everything I learn I put on my blog, for my own future reference and in case it helps someone: http://bioinformatictools.blogspot.in/

But by taking it one bite at a time, I think we can even eat a dinosaur.

Hope I helped.


Personally, I'd recommend Bio-Linux for a range of reasons, but the main one is that when you sit down on your first day of learning bioinformatics on Linux, you already have a whole load of bioinformatics programs installed. You want to run a BLAST search? Google "command line blast example", type in what you find, hit enter, and it's working. On other OSs you'd usually have to read up on installing and configuring before jumping in. It's good for getting a feel for things, and there's a load of introductory material from the basics of Linux upwards. (Also, it's basically Ubuntu anyway.)
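For what such a "command line blast example" typically looks like, here is a hedged sketch that builds a `blastp` call (from the NCBI BLAST+ suite) in Python. The file and database names are placeholders, and the database is assumed to have been created beforehand with something like `makeblastdb -in proteins.fasta -dbtype prot -out my_protein_db`. The sketch only prints the command; `run_if_available` is a hypothetical helper you would call yourself once the files exist.

```python
import shutil
import subprocess


def build_blastp_command(query, db, out, evalue=1e-5):
    """Return a blastp (BLAST+) command line as a list of arguments."""
    return [
        "blastp",
        "-query", query,          # FASTA file with the query protein(s)
        "-db", db,                # name of a protein database made with makeblastdb
        "-out", out,              # where to write the hits
        "-evalue", str(evalue),   # E-value cutoff
        "-outfmt", "6",           # tabular output, easy to parse later
    ]


def run_if_available(cmd):
    """Run the command only if its executable is on PATH (hypothetical helper)."""
    if shutil.which(cmd[0]) is None:
        print(f"{cmd[0]} not found on PATH; nothing was run")
        return None
    return subprocess.run(cmd, check=True)


# Placeholder names: replace them with your own files.
cmd = build_blastp_command("my_protein.fasta", "my_protein_db", "hits.tsv")
print(" ".join(cmd))
```

Passing the arguments as a list rather than one long string avoids shell-quoting problems with unusual file names.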


Thanks a lot, Mabeuf, 5heikki and Sanjiv!

I think that with your three answers (one very thorough) I have pretty much completed the picture I was looking for. Thanks a lot again, and greetings from Barcelona, Spain,

K

10.2 years ago
5heikki 11k

IMO you should be fluent in at least bash (and the usage of GNU coreutils), and somewhat able to debug Python and Perl (well, code in general). You should be able to understand why some algorithms/ways of processing data are slower than others, and yes, you should be familiar with *nix. Learning Debian wouldn't be a bad start since half the world is based on it. Also, understand that there are many kinds of bioinformaticians. Some have almost nothing to do with biology and basically just design algorithms. Others are biologists who just happen to do their work in silico. The requirements listed are for the latter kind.
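One small illustration of why some ways of processing data are slower than others: filtering sequence IDs against a list of wanted IDs. This is a sketch with made-up IDs and function names. Looking each ID up in a Python list scans the list every time, while a set lookup takes roughly constant time on average, so the second version scales much better as the inputs grow.

```python
import time


def filter_with_list(ids, wanted):
    """Keep ids found in `wanted`, using a list: every lookup scans the list."""
    wanted_list = list(wanted)
    return [i for i in ids if i in wanted_list]


def filter_with_set(ids, wanted):
    """Keep ids found in `wanted`, using a set: lookups are hash-based."""
    wanted_set = set(wanted)
    return [i for i in ids if i in wanted_set]


# Made-up identifiers: 5000 sequences, of which every tenth one is wanted.
all_ids = [f"prot_{n}" for n in range(5000)]
wanted_ids = [f"prot_{n}" for n in range(0, 5000, 10)]

for func in (filter_with_list, filter_with_set):
    start = time.perf_counter()
    kept = func(all_ids, wanted_ids)
    elapsed = time.perf_counter() - start
    print(f"{func.__name__}: kept {len(kept)} ids in {elapsed:.4f} s")
```

Both functions return the same result; only the running time differs, and the gap widens as the number of IDs increases.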

6.4 years ago
anicet.ebou ▴ 170

  1. First, you should learn Python, Perl, BioPerl, Biopython and R/Bioconductor.

  2. Bioinformaticians don't like to work on Windows, even with Cygwin installed. So you absolutely need a Linux distro, and I strongly recommend Ubuntu because of the community: when a bug comes up, you can quickly find an answer or a tip on Stack Overflow, Biostars or another resource to help you solve your problem.

  3. I think you should have at least these installed on your Linux machine: RStudio, the IDE for R; Rodeo, for me the best IDE for Python; samtools, to process your files easily; and Jalview, to visualize your alignments with protein colour schemes.

  4. Learn how protein databases are built, what their functions are and how you can use them in your analysis. Commonly used databases are UniProt, Pfam and RefSeq protein.

  5. An essential tool for classifying proteins nowadays is the hidden Markov model (HMM). You should use HMMER for this.

  6. For courses or articles on bioinformatics and protein you should just google it.
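To get a feel for what a hidden Markov model does (point 5), here is a toy sketch of the Viterbi algorithm in plain Python. It is not how HMMER works internally: HMMER uses profile HMMs built from alignments, which are much richer. The two states, the reduced H/P alphabet (hydrophobic vs polar) and all probabilities below are invented purely for illustration.

```python
import math

# Toy two-state model; every number here is made up.
STATES = ("membrane", "soluble")
START = {"membrane": 0.5, "soluble": 0.5}
TRANS = {
    "membrane": {"membrane": 0.9, "soluble": 0.1},
    "soluble": {"membrane": 0.1, "soluble": 0.9},
}
# Emission probabilities over a reduced alphabet: H = hydrophobic, P = polar.
EMIT = {
    "membrane": {"H": 0.8, "P": 0.2},
    "soluble": {"H": 0.2, "P": 0.8},
}


def viterbi(observations):
    """Return the most probable state path for a non-empty string of H/P symbols."""
    # best[s] = log-probability of the best path so far that ends in state s
    best = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
        for s in STATES
    }
    backpointers = []
    for symbol in observations[1:]:
        new_best, pointers = {}, {}
        for s in STATES:
            # Pick the previous state that gives the best score for reaching s.
            prev = max(STATES, key=lambda p: best[p] + math.log(TRANS[p][s]))
            new_best[s] = (
                best[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][symbol])
            )
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final state to recover the whole path.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


print(viterbi("HHHHPPPP"))
```

Working in log space avoids numerical underflow when sequences get long, which is why the probabilities are summed as logarithms rather than multiplied.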
