by Said Bleik, Shaheen Gauher, Data Scientists at Microsoft
Evaluation metrics are the key to understanding how your classification model performs when applied to a test dataset. In what follows, we present a tutorial on computing the common metrics used in evaluation, as well as baseline metrics generated from random classifiers, which help justify the value added by your predictive model, especially in cases where the common metrics suggest otherwise.
Creating the Confusion Matrix
We will start by creating a confusion matrix from simulated classification results. The confusion matrix provides a tabular summary of the actual class labels vs. the predicted ones. The test set we are evaluating on contains 100 instances, each assigned to one of three classes: (a), (b), or (c).
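The post's original simulation code is not reproduced here, so the snippet below is a minimal sketch of one way to build such a matrix in R. The names classes, actual, predicted, and wrong, as well as the class distribution, are assumptions; the per-class counts will differ from the original post's matrix, but the sketch deliberately misclassifies 15 of the 100 instances so that the overall accuracy computed later comes out to 0.85.
set.seed(0)
classes = c("a", "b", "c") # the three class labels
actual = sample(classes, 100, replace = TRUE, prob = c(0.45, 0.35, 0.20)) # simulated actual labels
predicted = actual # start from perfect predictions
wrong = sample(1:100, 15) # misclassify 15 instances so that overall accuracy is 0.85
predicted[wrong] = sapply(actual[wrong], function(x) sample(setdiff(classes, x), 1)) # assign a different class
cm = as.matrix(table(Actual = factor(actual, levels = classes), Predicted = factor(predicted, levels = classes))) # confusion matrix: rows are actual, columns are predicted
cm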
Next we will define some basic variables that will be needed to compute the evaluation metrics.
n = sum(cm) # number of instances
nc = nrow(cm) # number of classes
diag = diag(cm) # number of correctly classified instances per class
rowsums = apply(cm, 1, sum) # number of instances per class
colsums = apply(cm, 2, sum) # number of predictions per class
p = rowsums / n # distribution of instances over the actual classes
q = colsums / n # distribution of instances over the predicted classes
Accuracy
A key metric to start with is the overall classification accuracy. It is defined as the fraction of instances that are correctly classified.
accuracy = sum(diag) / n
accuracy
## [1] 0.85
Per-class Precision, Recall, and F-1
In order to assess the performance with respect to every class in the dataset, we will compute common per-class metrics such as precision, recall, and the F-1 score. These metrics are particularly useful when the class labels are not uniformly distributed (most instances belong to one class, for example), since the overall accuracy alone can then be misleading.
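Continuing from the variables defined above, a minimal sketch of these per-class computations using the standard definitions (the names precision, recall, and f1 are introduced here): precision divides the correctly classified instances by the column sums (the predictions per class), and recall divides them by the row sums (the actual instances per class).
precision = diag / colsums # fraction of predictions for each class that are correct
recall = diag / rowsums # fraction of instances of each actual class that are correctly classified
f1 = 2 * precision * recall / (precision + recall) # harmonic mean of precision and recall
data.frame(precision, recall, f1)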
Source: r-bloggers.com