Question

How To Become A (Protein) Bioinformatician? - Skills

10.2 years ago

K ▴ 180

Hi all.

I am a biotechnologist who, in the final part of my Ph.D., became involved (and quite interested!) in the use of bioinformatics tools for studying proteins. I feel I should go deeper into this discipline, and I have resolved to learn more about it and about bioinformatics in general. That said, I gather that most bioinformaticians need a good knowledge of Linux and at least some idea of programming.

What is the preferred programming language in the field of bioinformatics? What are the most common Linux "flavours" used in bioinformatics, if that is relevant? Are there any recent courses which could be recommended for learning both bioinformatics-related Linux and programming? Finally, are there any other skills desirable for the successful development of a protein bioinformatician-to-be?

Thanks a lot in advance.

career linux programming • 3.7k views

updated 6.4 years ago by anicet.ebou ▴ 170 • written 10.2 years ago by K ▴ 180

score 2 · Answer 1 · 2014-02-13

I too am new to bioinformatics, and I did my Ph.D. in biotechnology. Like you, I was attracted by some of the bioinformatics work going on in my lab. My suggestions are below, though I hope hard-core bioinformaticians can shed more light on the questions you have asked. The opinions expressed are solely mine.

Learning Linux is a must. Ubuntu is good for starters, and CentOS is free and solid. Fedora is also free, while Red Hat Enterprise Linux is the commercial one, I believe. Bio-Linux is designed specifically for bioinformatics. I have little experience with the last three. The flavor doesn't matter much unless you pick one that is intended for something else entirely; for example, you should not use BackTrack (a penetration-testing distribution) for bioinformatics work. That said, Linux comes with shell scripting and heavy use of vi/vim, and both are going to be handy.
Programming language - A lot of work can be done with shell scripting, but I think you must also learn a programming language and gain command of it. Start small, and your skill will grow as you go. Perl or Python are good to start with. I would also suggest learning R, since there is going to be a lot of statistics later on. Additionally, if it's not too much to ask, learn MATLAB; although it is commercial, it is good to know.
Protein bioinformatician - Since you are coining this as a discipline (strictly speaking, there is no such thing), start with MSA (ClustalW, MUSCLE, etc.), phylogenetic analysis, annotation of hypothetical proteins, domain database searches (CDD, Pfam, etc.), homology modeling, docking, simulation, and drug databases. Here I am presuming you are already familiar with BLAST, BLAT, etc. Get to know GenBank, the Protein Data Bank, UniProt, NCBI, KEGG, KOG, and other similar resources. Once you have learned about these bit by bit, I am sure you will lean towards something specific that you can pursue further.
Whole genome/proteome/transcriptome/metabolome data analysis - Another group of tools and methods that inspires me is NGS, which is my current interest and what I am learning now. It includes, but is not restricted to, genome (genomics) and transcriptome (transcriptomics) assembly and analysis, metabolomics, pathway and network analysis, and co-expression network analysis.
Machine learning - I have no experience with this; maybe someone working on it can elaborate. HMMs, SVMs, random forests, artificial neural networks: they are awfully interesting and complicated.
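
As a small illustration of the kind of first script the programming and protein points above have in mind, here is a minimal sketch in plain Python that reads FASTA-formatted text and reports the length and residue composition of each sequence. The function names (`parse_fasta`, `composition`) and the two toy records are made up for this example; in real work a library such as Biopython would normally do the parsing.

```python
from collections import Counter


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def composition(sequence):
    """Return the fraction of each residue in a sequence (empty dict if empty)."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {residue: n / total for residue, n in counts.items()}


# Made-up example records, not real database entries.
EXAMPLE = """>toy_protein_1 hypothetical example
MKTAYIAK
QRQISFVK
>toy_protein_2 hypothetical example
MKKLLA
"""

records = list(parse_fasta(EXAMPLE))
for header, seq in records:
    lysine_fraction = composition(seq).get("K", 0.0)
    print(header.split()[0], len(seq), round(lysine_fraction, 2))
```

Writing a parser like this once is a useful exercise in loops, string handling, and dictionaries, even if you switch to an established library afterwards.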

I put everything I learn on my blog, for my own future reference and in case it helps someone: http://bioinformatictools.blogspot.in/

But by taking it one bit at a time, I think we can even eat a dinosaur.

Hope I helped.

score 1 · Answer 2 · 2014-02-13

IMO you should be fluent in at least bash (and the use of GNU coreutils), and reasonably able to debug Python and Perl (well, code in general). You should be able to understand why some algorithms or ways of processing data are slower than others, and yes, you should be familiar with *nix. Learning Debian wouldn't be a bad start, since many popular distributions (Ubuntu included) are based on it. Also, understand that there are many kinds of bioinformaticians. Some have almost nothing to do with biology and basically just design algorithms. Others are biologists who happen to do their work in silico. The requirements listed here are for the latter kind.
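
To illustrate the point about why some ways of processing data are slower than others, here is a minimal sketch comparing two ways of finding identifiers shared between two lists. The identifiers are generated for the demo and are not real accessions, and the function names are hypothetical. Both functions return the same result, but the first scans a list for every lookup while the second uses a set, where membership tests are hash lookups on average.

```python
import timeit


def shared_ids_list(ids_a, ids_b):
    """Quadratic in the worst case: each 'in' scans the list ids_b."""
    return [x for x in ids_a if x in ids_b]


def shared_ids_set(ids_a, ids_b):
    """Roughly linear: set membership is a hash lookup on average."""
    lookup = set(ids_b)
    return [x for x in ids_a if x in lookup]


# Hypothetical accession-like identifiers, generated only for the demo.
ids_a = [f"P{i:05d}" for i in range(0, 4000, 2)]
ids_b = [f"P{i:05d}" for i in range(0, 4000, 3)]

# Same answer either way; only the cost differs.
assert shared_ids_list(ids_a, ids_b) == shared_ids_set(ids_a, ids_b)

slow = timeit.timeit(lambda: shared_ids_list(ids_a, ids_b), number=3)
fast = timeit.timeit(lambda: shared_ids_set(ids_a, ids_b), number=3)
print(f"list lookup: {slow:.4f}s  set lookup: {fast:.4f}s")
```

The exact timings depend on the machine, but the gap widens quickly as the lists grow, which is the kind of behaviour worth recognising before running a script on a whole proteome.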

score 0 · Answer 3 · 2017-12-06

First, you should learn Python, Perl, BioPerl and Biopython, and R/Bioconductor.
Bioinformaticians don't like to work on Windows, even with Cygwin installed. So you absolutely need a Linux distro, and I strongly recommend Ubuntu because of its community: when you hit a bug, you can quickly find an answer or a tip on Stack Overflow, Biostars, or another resource to help you solve your problem.
I think you should at least install RStudio (the IDE for R), Rodeo (the best Python IDE, for me), samtools to process your files easily, and Jalview to visualize your alignments with protein color schemes.
Learn how protein databases are built, what they are for, and how you can use them in your analysis. Commonly used databases are UniProt, Pfam, and RefSeq protein.
An essential tool for protein classification nowadays is the Hidden Markov Model. You should use HMMER for this.
For courses or articles on bioinformatics and proteins, you should just google it.
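
To make the Hidden Markov Model suggestion above a little more concrete, here is a toy sketch of the Viterbi algorithm on a two-state HMM. It is not how HMMER works internally (HMMER uses profile HMMs built from alignments); the two states, the hydrophobic residue set, and all probabilities below are invented purely for illustration.

```python
import math

# Assumption for the demo: a crude hydrophobic/polar split of residues.
HYDROPHOBIC = set("AVLIMFWC")

# Made-up model parameters, not fitted to any data.
STATES = ("membrane", "soluble")
START = {"membrane": 0.5, "soluble": 0.5}
TRANS = {
    "membrane": {"membrane": 0.9, "soluble": 0.1},
    "soluble": {"membrane": 0.1, "soluble": 0.9},
}
EMIT = {
    "membrane": {"H": 0.8, "P": 0.2},
    "soluble": {"H": 0.3, "P": 0.7},
}


def encode(sequence):
    """Map each residue to 'H' (hydrophobic) or 'P' (everything else)."""
    return ["H" if aa in HYDROPHOBIC else "P" for aa in sequence.upper()]


def viterbi(symbols):
    """Return the most probable state path for the symbols (log space)."""
    if not symbols:
        return []
    scores = {s: math.log(START[s]) + math.log(EMIT[s][symbols[0]]) for s in STATES}
    backpointers = []
    for sym in symbols[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best previous state for arriving in state s at this position.
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][sym])
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


path = viterbi(encode("LLLLLLLLDKDKDKDK"))
print("".join("M" if s == "membrane" else "s" for s in path))
```

Working in log space avoids numerical underflow on long sequences, which is one reason real tools do the same.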