For anyone who has SQL background and who wants to learn R, I guess the sqldf
package is very useful because it enables us to use SQL commands in R. One who has basic SQL skills can manipulate data frames in R using their SQL skills. You can read more about sqldf
package from cran.
In this post, we will see how to perform joins and other queries to retrieve, sort and filter data in R using SQL. We will also see how we can achieve the same tasks using non-SQL R commands. Currently, I am working with the FDA adverse events data. These data sets are ideal for a data scientist or for any data enthusiast to work with because they contain almost any kind of data messiness: missing values, duplicates, compatibility problems of data sets created during different time periods, variable names and number of variables changing over time (e.g., sex in one dataset, gender in other dataset, if_cod in one dataset and if_code in other dataset), missing errors, etc.
In this post, we will use FDA adverse events data, which are publicly available.The FDA adverse events datasets can also be downloaded in .csv
format from the National Bureau of Economic Research. Since, it is easier to automate the data download in R from the National Bureau of Economic Research, we will download various datasets from this website. I encourage you to try the R codes and download data from the website and explore the adverse events datasets.
The adverse events datasets are created in quarterly temporal resolution and each quarter data includes demography information, drug/biologic information, adverse event, outcome and diagnosis, etc.
Let’s download some datasets and use SQL queries to join, sort and filter the various data frames.
Load R Packages
require(downloader) library(dplyr) library(sqldf) library(data.table) library(ggplot2) library(compare) library(plotrix)
Basic error Handing with tryCatch()
We will use the function below to …read more
Source:: r-bloggers.com