In part 1 I showed how to grab data from forecast.io. Now that we have all of that, I want to use it to investigate the effects of weather on accidents. After playing around a little, I realised that one possible way of doing this is as follows. In part 1 we grabbed weather data associated with each accident (time and location). To compare, we also want weather data from a sample of days when there were no accidents. To do this, I went back to the forecast.io API and, for each incident location, downloaded data from a randomly chosen second day, at least 3 days away from the day of the incident at that location. In this way I am trying to compare weather information from the exact location and time of an accident to some random baseline. In other words: is there anything special about the weather at the time and location of road traffic accidents (RTAs)? So I went back to forecast.io and got some baseline weather data as described (see scripts/getBaselineWeather.R in the GH repo). I won't recount it here as it's just another use of the same API, much the same as part 1.
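Just to illustrate the idea (this is a sketch of my own, not the code in getBaselineWeather.R; the function name sampleBaselineDate and the +/- 30 day window are assumptions), picking such a baseline date might look something like this:

# Sketch: pick a comparison date at least min_gap days away from the incident date.
# The 30-day window and the helper name are assumptions, not the actual script.
sampleBaselineDate <- function(incident_date, min_gap = 3, window = 30) {
  offsets <- c(-window:-min_gap, min_gap:window)  # candidate day offsets, excluding +/- 2 days
  incident_date + sample(offsets, 1)              # randomly chosen second day
}

# e.g. sampleBaselineDate(as.Date("2014-06-15"))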
Here I will present the analysis that asks: is there anything special about the weather at the locations and times where there has been an incident?
Let's set it up.
library(dplyr)
library(tidyr)
library(ggplot2)
d read.csv("https://raw.githubusercontent.com/rmnppt/Road_Traffic_Accidents/master/data/sample_weather_control.csv")
This next bit is not pretty, but I wanted a way to select the numeric columns from a data.frame. I tried stuff with apply but had no luck.
isNum <- function(dd) {
  n <- ncol(dd)
  num_ind <- logical(n)
  for(i in 1:n) {
    num_ind[i] <- is.numeric(dd[, i])
  }
  return(num_ind)
}
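As a quick usage example (my own, assuming d has been loaded as above), isNum() gives a logical vector that can be used to subset just the numeric columns:

num_cols <- d[, isNum(d)]   # keep only the numeric columns
str(num_cols)

For what it's worth, sapply(d, is.numeric) returns the same logical vector, but the loop above does the job.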
I also want to be able to calculate the standard error of the mean.
se <- function(x) sd(x) / sqrt(length(x))  # standard error of the mean
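As a small usage sketch (again my own example, assuming the objects defined above), the two helpers can be combined to get a standard error for every numeric column at once:

sapply(d[, isNum(d)], se)   # SE of the mean for each numeric column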