
Is your Classification Model making lucky guesses?

by Shaheen Gauher, PhD, Data Scientist at Microsoft

At the heart of a classification model is the ability to assign a class to an object based on its description or features. When we build a classification model, we often have to show that it is significantly better than random guessing. How do we know whether our machine learning model performs better than a classifier that assigns labels arbitrarily (by a random guess, a weighted guess, etc.)? I will call the latter non-machine learning classifiers, since they do not learn from the data. A machine learning classifier should be smarter than that and should not simply be making lucky guesses! At the very least, it should do a better job of distinguishing between the classes and should achieve higher accuracy. In the sections below, I show three different ways to build a non-machine learning classifier and compute the accuracy of each. The purpose is to establish baseline metrics against which we can evaluate our classification model.
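To make this concrete, here is a minimal sketch in R (the variable names and the 30/70 class split are my own illustrative choices, not from the article) that builds three such non-machine learning classifiers: a uniform random guess, a weighted guess that draws labels in proportion to the class frequencies, and a majority-class guess, and computes the accuracy of each against the ground truth:

set.seed(42)

# Hypothetical ground truth: 1000 instances, 30% positive (P), 70% negative (N)
n <- 1000
truth <- sample(c("P", "N"), size = n, replace = TRUE, prob = c(0.3, 0.7))

# 1. Uniform random guess: assign P or N with equal probability
pred_random <- sample(c("P", "N"), size = n, replace = TRUE)

# 2. Weighted guess: draw labels in proportion to the observed class frequencies
class_freq <- prop.table(table(truth))
pred_weighted <- sample(names(class_freq), size = n, replace = TRUE, prob = class_freq)

# 3. Majority-class guess: always predict the most common class
pred_majority <- rep(names(which.max(table(truth))), n)

# Accuracy: fraction of predictions that match the ground truth
accuracy <- function(pred, truth) mean(pred == truth)
accuracy(pred_random, truth)
accuracy(pred_weighted, truth)
accuracy(pred_majority, truth)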

In the examples below, I will assume we are working with data of population size \(n\). The data is divided into two groups: \(x\%\) of the rows or instances belong to one class (labeled positive, or \(P\)) and \((1-x)\%\) belong to the other class (labeled negative, or \(N\)). We will also assume that the majority of the data is labeled \(N\). (This is easily extended to data with more than two classes, as I show in the paper here.) This is the ground truth.
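Under this setup, the expected accuracy of each baseline follows directly from the class proportions (a standard calculation, with \(x\) as the positive-class fraction): a uniform random guess is correct half the time; a weighted guess that predicts \(P\) with probability \(x\) is correct with probability \(x^2 + (1-x)^2\); and always predicting the majority class \(N\) is correct with probability \(1-x\). A quick check in R, with the value of \(x\) chosen purely for illustration:

x <- 0.3  # illustrative fraction of positive (P) instances

acc_random   <- 0.5                  # uniform guess: right half the time
acc_weighted <- x^2 + (1 - x)^2      # P(guess P)*P(is P) + P(guess N)*P(is N)
acc_majority <- 1 - x                # always predict the majority class N

c(random = acc_random, weighted = acc_weighted, majority = acc_majority)
#   random weighted majority
#     0.50     0.58     0.70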

Population Size: \(n\)
Fraction …read more

Source: r-bloggers.com

