9 months ago

St. Louis, MO

Well, years after the original question, I think this is worth a revisit. While I agree with Kevin's answer fully, it's interesting to see how python has progressed in recent years to catch up with R on the statistical and visualization side of things. As others have mentioned, numpy, pandas, and scipy give python a huge amount of flexibility in terms of data manipulation and statistical analysis. It's true that it doesn't have many of the purely 'omics-based packages that R does, but more and more are being ported to python.

Really, the languages complement each other very nicely, in my opinion. Data munging and handling common file formats are easy in python, particularly with pysam, pybedtools, pyvcf, and pandas, and rpy2 allows you to access those powerful R stats/modeling packages from within python.

I feel confident in saying python matches R's visualization capabilities at this point in time. I've never had a moment where I felt I **had** to go to R to create the figure I want. With seaborn and plot.ly, it's very easy to create high-quality, interactive figures without having to fiddle with matplotlib parameters much (if at all). Couple these with ipython and you've got some really interesting ways to explain and interactively wade through your data.

Python package installation has also come a long way with the advent of pip, anaconda, and bioconda. Similar to R, nearly every python package is a one-line install. This seemed to be a big complaint of many people 4 years ago, but it's been largely resolved now.

I don't see R going anywhere due to all of the packages made specifically to handle analysis of sequencing experiments, network interactions, etc. Python *can* do those things perfectly well, but why reinvent the wheel when you can pull it out and stick it on your own car whenever you need?

Overall, I feel python has established itself as an important player in bioinformatics for years to come. Part of this is due to its incredibly easy-to-pick-up syntax, general flexibility, and extremely active developer base. I personally hate R as a language, but there's no denying its status as the backbone of statistical analysis in the bioinformatics community. Of course, that doesn't mean we have to interact with it any more than is necessary or that python won't continue to make advances to help bridge the gap.

[R] as a language has its own syntax. As many people like R syntax, I doubt that python will replace [R].

OK, but the new generation who did not learn R will not need to learn both R and Python. I think they will just need to learn Python.

Python is great, but its functionality (and the functionality of other scripting languages) does not completely overlap [R]. Learn both. If you hate R, learn Octave, Matlab, or Maple. Or you could implement and validate complicated statistical algorithms yourself.

I think the last part is bad advice. Do *not* implement statistical algorithms yourself, not even the ones you think are reasonably straightforward. The libraries that are there have been tested more extensively than anything you write most likely ever will be, i.e. they will have no or very few bugs.

Some statistical algorithms are simple. It is sometimes a good idea to reimplement them, to better understand the pitfalls and to drop unnecessary dependencies. In addition, most 3rd-party libraries are written by fellow programmers no better than you and me. They make mistakes, mistakes that can exist for years because we all take the libraries for granted. My favorite example is matrix multiplication in Ruby and a few other programming languages (e.g. Clay). It is widely known that the second matrix needs to be transposed for better cache efficiency, but in the Ruby library, the developer uses the much slower method. I would think such an obvious flaw should have been identified much earlier, but it is still there because no one bothers to read the source code (EDIT: or because no one implements matrix multiplication these days to learn the transpose trick). Except for a small fraction of really high-quality or widely used libraries, most others are not trustworthy.
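For readers who have not seen the transpose trick mentioned above, here is a minimal sketch in plain Python. The function names `matmul_naive` and `matmul_transposed` are made up for illustration and are not taken from Ruby or any other library. Note that with Python lists the example only shows the access pattern (row against row instead of row against column); the actual cache benefit shows up in languages with contiguous arrays, so treat this as an illustration rather than a benchmark.

```python
def matmul_naive(a, b):
    """Textbook triple loop: walks down the columns of b (strided access)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def matmul_transposed(a, b):
    """Transpose b once, then take row-by-row dot products (sequential access)."""
    b_t = list(zip(*b))  # each row of b_t is a column of b
    return [
        [sum(x * y for x, y in zip(row, col)) for col in b_t]
        for row in a
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
product_naive = matmul_naive(a, b)
product_transposed = matmul_transposed(a, b)
```

Both functions return the same product; the second one just pays for a single transpose up front so that the inner loop reads both operands sequentially.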

My personal confession here is that every time I try to implement a statistical method and I read more about it, I realize that I do not know how to do it correctly. Or what correct actually means, or whether I should even worry about it. What I mean here is that even the simplest task can become a little scary if one wanted to generalize. An example: what is the right way to compute something as simple as a sum?

The more we implement, the clearer we are about correctness and the more likely we can apply similar techniques to complex practical problems. For those who work on method development, reimplementing simple methods is very important. In addition, if you do not know how to implement sum correctly, many others will not know, either; some of them may even be unaware of numerical stability altogether.
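To make the sum example concrete, here is a small sketch comparing a naive floating-point loop with Kahan (compensated) summation and the standard library's `math.fsum`. Kahan summation is just one common technique chosen here for illustration; nobody in this thread named a specific method, and the helper names `naive_sum` and `kahan_sum` are made up.

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation; rounding error grows with length."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan summation: carry the lost low-order bits in a compensation term."""
    total = 0.0
    compensation = 0.0
    for v in values:
        y = v - compensation            # apply the correction from the last step
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total


values = [0.1] * 1_000_000
reference = math.fsum(values)  # correctly rounded sum from the standard library
naive_error = abs(naive_sum(values) - reference)
kahan_error = abs(kahan_sum(values) - reference)
```

On this input the naive loop drifts visibly from the reference while the compensated version stays very close to it, which is the kind of pitfall you only notice once you try to write the "simple" thing yourself.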

I was being cynical :-). Michael Schubert is correct.

related: http://stats.stackexchange.com/questions/1595/python-as-a-statistics-workbench

I really hate R's syntax, but as long as Python doesn't have an equivalent of ggplot2 to create nice plots, I will stick with R for most of the data visualization tasks.

The KDNuggets site is so ugly it is surprising one can read an entire article there - anyway, its poll is biased and not of much value, I would think.

And are those charts from the linked article Excel charts??

:D for sure it is Excel

Excel is the future of bioinformatics