Channel: r software hub
Viewing all articles
Browse latest Browse all 1015

Using Decision Trees to Predict Infant Birth Weights

By Teja Kodali

[Figure: CART tree of Titanic survivors]

In this article, I will show you how to use decision trees to predict whether an infant's birth weight will be low or not. We will use the birthwt data from the MASS package.

What is a decision tree?

A decision tree is an algorithm that builds a flowchart-like graph to illustrate the possible outcomes of a decision. To build the tree, the algorithm first finds the variable, and the cut-off on that variable, that does the best job of separating the data into two groups. It then repeats this step within each of the resulting groups, and keeps going until further splits no longer help. This results in a tree graph, where each split represents a decision. The algorithm chooses each split so that the resulting groups are as homogeneous as possible with respect to the outcome, which in practice means that as many observations as possible are classified correctly. The biggest advantage of a decision tree is that it is intuitive: it can be read and understood even by people with no experience in the field.
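To make the idea of "the best split" concrete, here is a minimal sketch in base R that scores candidate splits of the birthwt data by Gini impurity, a common purity measure for classification trees. The helper names gini, split_impurity and best_split are made up for this illustration, and the sketch only handles numeric predictors; a real tree-building routine does considerably more.

```r
library(MASS)  # provides the birthwt data used in this article

# Gini impurity of a vector of class labels: 1 - sum(p_k^2)
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Hypothetical helper: weighted impurity of the two groups x < cut and x >= cut
split_impurity <- function(x, y, cut) {
  left <- y[x < cut]
  right <- y[x >= cut]
  (length(left) * gini(left) + length(right) * gini(right)) / length(y)
}

# Hypothetical helper: try every midpoint between consecutive unique values
best_split <- function(x, y) {
  v <- sort(unique(x))
  cuts <- (head(v, -1) + tail(v, -1)) / 2
  imp <- sapply(cuts, function(cut) split_impurity(x, y, cut))
  list(cut = cuts[which.min(imp)], impurity = min(imp))
}

# Score a few numeric predictors of low; the smallest impurity would win
# the first split. bwt is left out because low is defined from it.
predictors <- c("age", "lwt", "ptl", "ftv")
results <- sapply(predictors, function(v) {
  best_split(birthwt[[v]], birthwt$low)$impurity
})
print(sort(results))
```

The variable at the top of the sorted output is the one this simplified search would split on first. Repeating the same search inside each of the two resulting groups gives the recursive structure described above.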

For example, a classification tree showing the survival of passengers of the Titanic is as follows (source: Wikipedia):

The numbers under each node are the probability of survival and the percentage of observations that fall into that node. For example, the first node on the right shows that 73% of the females survived, and that females made up 36% of the observations in the dataset.
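The same two numbers can be computed by hand for any group of observations. As a small sketch, here they are for the smokers in the birthwt data used later in this article; the choice of smoke as the grouping variable is only for illustration and is not taken from any fitted tree.

```r
library(MASS)  # provides the birthwt data

# Logical vector marking the observations that fall into the group
smokers <- birthwt$smoke == 1

# First number: proportion of the group with the outcome (low birth weight)
p_low <- mean(birthwt$low[smokers])

# Second number: share of all observations that fall into the group
share <- mean(smokers)

cat(sprintf("P(low | smoker) = %.2f, share of observations = %.0f%%\n",
            p_low, 100 * share))
```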

Exploring the data

We will need the MASS and rpart packages for this. Let’s load the data and look at the first few rows.

library(MASS)
library(rpart)
head(birthwt)
   low age lwt race smoke ptl ht ui ftv  bwt
85   0  19 182    2     0   0  0  1   0 2523
86   0  33 155    3     0   0  0  0   3 2551
87   0  20 105    1 ...read more
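As a rough sketch of how a classification tree could be fit to this data with rpart, and not necessarily the model the author builds, the code below treats low as a factor outcome and uses the remaining columns as predictors. Dropping bwt from the formula and converting race to a factor are assumptions made for this illustration, as are the labels "normal" and "low".

```r
library(MASS)
library(rpart)

# Work on a copy so the original data frame is left untouched
bw <- birthwt
bw$low <- factor(bw$low, levels = c(0, 1), labels = c("normal", "low"))
bw$race <- factor(bw$race)

# Assumption: bwt is excluded because low is derived directly from it
fit <- rpart(low ~ age + lwt + race + smoke + ptl + ht + ui + ftv,
             data = bw, method = "class")
print(fit)

# In-sample predictions, cross-tabulated against the actual classes
pred <- predict(fit, bw, type = "class")
print(table(predicted = pred, actual = bw$low))
```

The table here is computed on the same rows the tree was trained on, so it overstates how well the tree would do on new data; holding out a test set gives a fairer estimate.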

Source: r-bloggers.com

