Quantcast
Channel: r software hub
Viewing all articles
Browse latest Browse all 1015

Working with databases in R

$
0
0

By Fisseha Berhane

table1

The dplyr package, which is one of my favorite R packages, works with in-memory data and with data stored in databases. In this extensive and comprehensive post, I will share my experience on using dplyr to work with databases. The basic functions of dplyr package are covered by Teja in another post at DataScience+

Using dplyr with databases has huge advantage when our data is big where loading it to R is impossible or slows down our analytics. If our data resides in a database, rather than loading the whole data to R, we can work with subsets or aggregates of the data that we are interested in. Further, if we have many data files, putting our data in a database, rather than in csv or other format, has better security and is easier to manage.

dplyr is a really powerful package for data manipulation, data exploration and feature engineering in R and if you do not know SQL, it provides the ability to work with databases just within R. Further, dplyr functions are easy to write and read. dplyr considers database tables as data frames and it uses lazy evaluation (it delays the actual operation until necessary and loads data onto R from the database only when we need it) and for someone who knows Spark, the processes and even the functions have similarities.

dplyr supports a couple of databases such as sqlite, mysql and postgresql. In this post, we will see how to work with sqlite database. You can get more information from the dplyr database vignette here.

When people take drugs, if they experience any adverse events, they can report it to the FDA. These data are in public domain and anyone can download them and analyze them. In this post, we will download demography information of the patients, drug they …read more

Source:: r-bloggers.com


Viewing all articles
Browse latest Browse all 1015