This is a follow-up to my earlier post on learning curves. A learning curve is a plot of predictive error for training and validation sets over a range of training set sizes. Here we’re using simulated data to explore some fundamental relationships between training set size, model complexity, and prediction error.
Start by simulating a dataset:
sim_data
The input columns are named X1, X2, etc.; these are all categorical variables, with single capital letters representing the different categories. Cardinality is the number of possible values in the column; our default cardinality of 10 means we sample from the capital letters A through J.
Next we’ll add an outcome variable (y); it has a base level of 100, but if the values in the first two X variables are equal, this is increased by 10. On top of this we add some normally distributed noise.
set.seed(123)
data
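The line above is truncated; under the assumptions already described (and with a row count of 10,000 and a noise standard deviation of 1, both of which are guesses rather than the original values), this step might look roughly like:

# Build the inputs, then add y: base level 100, +10 when X1 equals X2,
# plus normally distributed noise (sd assumed to be 1 here).
data <- sim_data(n = 10000)
data$y <- 100 + ifelse(data$X1 == data$X2, 10, 0) + rnorm(nrow(data), sd = 1)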
With linear models, we handle an interaction between two categorical variables by adding an interaction term; the number of levels in this interaction term is essentially the product of the two cardinalities. In this simulated data set, only the first two columns affect the outcome; the other input columns don’t contain any useful information. We’ll use this data set to demonstrate how adding non-informative variables affects overfitting and training set size requirements.
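To make the size of that interaction term concrete, here is a quick illustration using the hypothetical data frame sketched above; with two 10-level factors, the X1:X2 term alone contributes on the order of 10 × 10 columns to the model matrix:

# Main effects plus interaction between the two informative inputs.
fit <- lm(y ~ X1 * X2, data = data)
length(coef(fit))  # 1 intercept + 9 + 9 main-effect terms + 81 interaction terms = 100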
As in the earlier post, I’ll use the root mean squared error of the predictions as the error function, because RMSE is on the same scale as the outcome and can be compared directly to a standard deviation. No model should be able to make predictions with a root mean squared error lower than the standard deviation of the random noise we added.
rmse
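The rmse helper is also truncated above; the usual form of such a helper is something like:

# Root mean squared error of predictions against actual values.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}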
The cross-validation function trains a model using the supplied formula and modeling function, then tests its performance on a held-out test set. The training set will be sampled from the data available for training; to use approximately …
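The rest of the post is cut off here, but a rough sketch of a cross-validation helper matching that description (every name and argument below is a guess, not the author's code) might look like:

# Fit the supplied modeling function on a random sample of train_size rows,
# then report RMSE on both the training sample and a held-out test set.
# (For very small training samples, unseen factor levels in the test set
# would need extra handling before predict() is called.)
cross_validate <- function(formula, model_fn, train_data, test_data, train_size) {
  train_idx <- sample(nrow(train_data), train_size)
  fit <- model_fn(formula, data = train_data[train_idx, ])
  outcome <- all.vars(formula)[1]
  c(
    train_rmse = rmse(train_data[[outcome]][train_idx], predict(fit)),
    test_rmse  = rmse(test_data[[outcome]], predict(fit, newdata = test_data))
  )
}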
Source: r-bloggers.com