Quantcast
Viewing all articles
Browse latest Browse all 1015

An Analysis of Traffic Violation Data with SQL Server and R

By Joseph Rickert

By Srini Kumar, Director of Data Science at Microsoft

Who does not hate being stopped and given a traffic ticket? Invariably, we think that something is not fair that we got it and everyone else did not. I am no different, and living in the SF Bay Area, I have often wondered if I could get the data about traffic tickets, particularly since there may be some unusual patterns.

While I could not find one for any data for the SF Bay Area, I supposed I could get some vicarious pleasure out of analyzing it for some place for which I could get data. As luck would have it, I did find something. It turns out that Montgomery County in Maryland does put all the traffic violation information online. Of course, they take out personally identifiable information, but do put out a lot of other information. If any official from that county is reading this, thanks for putting it online! Hopefully, other counties follow suit.

By the standards of the data sizes I have worked with, this is rather small; it is a little over 800K records, though it has about 35 columns, including latitude and longitude. I find it convenient to park the data in a relational database, analyze it using SQL first, and then get slices of it into R.

In this post, I’ll be using Microsoft SQL Server to import and prepare the traffic violations data, and then use R to visualize the data.

The SQL language

If you are unfamiliar with SQL, this SQL tutorial has been around for a long time, and this is where I got started with SQL. Another thing I find convenient is to use an ETL tool. ETL, if you are not already familiar with it, stands for Extract-Transform-Load, and is as pervasive a utility in IT …read more

Source:: r-bloggers.com


Viewing all articles
Browse latest Browse all 1015

Trending Articles