by Micheleen Harris
Microsoft Data Scientist
As a Data Scientist, I refuse to choose between R and Python, the top contenders currently fighting for the title of top Data Science programming language. I am not going to argue about which is better or pit Python and R against each other. Rather, I’m simply going to suggest playing to the strengths of each language and using them together in the same pipeline if you don’t want to give up the advantages of one over the other. This is not a novel concept. Each language has a package that allows the other to be used within it (rpy2 in Python and rPython in R). Even in Jupyter notebooks running the Python kernel, one can use “R magics” to execute native R code (these actually rely on rpy2).
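To make the rpy2 route concrete, here is a minimal sketch of calling an R function from Python. It assumes that both R and the rpy2 package are installed; the helper names `rpy2_available` and `r_mean` are hypothetical and exist only for this illustration, not as part of any library.

```python
def rpy2_available():
    """Return True if rpy2 (and the R installation it needs) can be loaded."""
    try:
        import rpy2.robjects  # noqa: F401  (importing starts the embedded R session)
        return True
    except Exception:
        # rpy2 is missing, or it could not locate a working R installation.
        return False


def r_mean(values):
    """Compute the mean of a list of numbers with R's own mean() function."""
    import rpy2.robjects as ro

    r_vector = ro.FloatVector(values)  # copy the Python list into an R numeric vector
    r_mean_fn = ro.r["mean"]           # look up R's mean() by name
    result = r_mean_fn(r_vector)       # R returns a length-1 numeric vector
    return result[0]                   # unwrap it into a plain Python float


if __name__ == "__main__":
    if rpy2_available():
        print(r_mean([1.0, 2.0, 3.0]))
    else:
        print("rpy2 or R is not available; skipping the R call")
```

In a Jupyter notebook on the Python kernel, the same bridge is what the R magics use: running `%load_ext rpy2.ipython` in one cell makes `%%R` available, and any cell that starts with `%%R` is then executed as native R code.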
I learned R and Python at about the same time. Having roughly equal footing in both languages makes pipelining them together, when need be, an attractive option, as I have my favorite aspects of each. R is widely regarded as having crisp, clean, journal-quality graphics as well as an incredible arsenal of statistical packages. Python is a general-purpose language and is considered in some places to be a truly production-ready one. But who says you can’t do the heavy statistics, machine learning and/or graphics in R from within Python? This blog is not about comparing the two languages, however; it is simply about options for pipelining them and maybe a bit about why you would want to do so.
First things first. We need to decide on a platform, and here I’m focusing on notebooks. We could actually do all of this outside a notebook environment, but in general notebook systems are more shareable and interactive, and they are well suited to demonstrations. If I were well-funded and wanted much …
Source: r-bloggers.com