Free SOA Exam SRM (Statistics for Risk Modeling) Basics of Statistical Learning Practice Questions
Statistical learning concepts on SOA Exam SRM cover the bias-variance tradeoff, model complexity, cross-validation, and the distinction between supervised and unsupervised learning methods. This topic provides the theoretical framework for all predictive modeling on the exam.
Sample Questions
Bias measures the systematic error introduced by the modeling assumptions. It is the difference between the expected prediction $E[\hat{f}(x_0)]$ (averaged over many training sets) and the true function value $f(x_0)$: $\operatorname{Bias}[\hat{f}(x_0)] = E[\hat{f}(x_0)] - f(x_0)$. A high-bias model makes strong assumptions that may not match the true relationship (underfitting).
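The definition of bias can be made concrete with a small Monte Carlo sketch: draw many training sets, refit the model on each one, average the predictions at a single point $x_0$, and subtract $f(x_0)$. Everything in the setup is an illustrative assumption (the true function $f(x) = x^2$, uniform inputs, noise SD 0.1, training sets of 50 observations, and the hypothetical helpers `fit_constant` and `fit_linear`). Only the Python standard library is used.

```python
import random
import statistics

# Illustrative assumptions: true f(x) = x**2, X ~ Uniform(0, 1),
# Gaussian noise with SD 0.1, training sets of n = 50 observations.


def true_f(x):
    return x ** 2


def draw_training_set(rng, n=50, noise_sd=0.1):
    """Simulate one training set (xs, ys) from the assumed model."""
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_constant(xs, ys):
    """High-bias model: ignores x and always predicts the training mean of y."""
    c = statistics.fmean(ys)
    return lambda x: c


def fit_linear(xs, ys):
    """Least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    return lambda x: b0 + b1 * x


def estimate_bias(fit, x0, n_sets=2000, seed=1):
    """Monte Carlo estimate of Bias[f_hat(x0)] = E[f_hat(x0)] - f(x0)."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_sets):
        xs, ys = draw_training_set(rng)   # a fresh training set each time
        preds.append(fit(xs, ys)(x0))     # refit, then predict at x0
    return statistics.fmean(preds) - true_f(x0)


x0 = 0.9
bias_const = estimate_bias(fit_constant, x0)
bias_linear = estimate_bias(fit_linear, x0)
print(f"constant model: bias at x0 = {x0} is {bias_const:.3f}")
print(f"linear model:   bias at x0 = {x0} is {bias_linear:.3f}")
```

Because $E[X^2] = 1/3$ for $X \sim \text{Uniform}(0,1)$, the constant model's expected prediction is $1/3$ everywhere, so its estimated bias at $x_0 = 0.9$ should land near $1/3 - 0.81 \approx -0.48$. The straight line follows the curve more closely, so its bias is much smaller in magnitude, though not zero, because a line still cannot represent $x^2$ exactly.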
The Bayes classifier assigns each observation to the most probable class given the observed features: $\hat{y} = \arg\max_j \Pr(Y = j \mid X = x_0)$. This is the theoretically optimal classifier that minimizes the overall misclassification rate.
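In practice the Bayes classifier cannot be computed, because $\Pr(Y = j \mid X = x)$ is unknown. In a toy problem where the class distributions are specified exactly, however, it reduces to an argmax over posterior probabilities. The sketch below assumes two classes with equal priors, $X \mid Y=0 \sim N(0,1)$ and $X \mid Y=1 \sim N(2,1)$. These choices and the helper names are illustrative.

```python
import math

# Illustrative assumptions: two classes with equal priors,
# X | Y=0 ~ N(0, 1) and X | Y=1 ~ N(2, 1).
PRIORS = {0: 0.5, 1: 0.5}
MEANS = {0: 0.0, 1: 2.0}
SD = 1.0


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def posterior(x):
    """Pr(Y = j | X = x) for each class j, via Bayes' theorem."""
    joint = {j: PRIORS[j] * normal_pdf(x, MEANS[j], SD) for j in PRIORS}
    total = sum(joint.values())
    return {j: p / total for j, p in joint.items()}


def bayes_classify(x):
    """Assign x to the class with the largest posterior probability."""
    post = posterior(x)
    return max(post, key=post.get)


# Equal priors and variances put the boundary at the midpoint x = 1, so each
# class is misclassified with probability Pr(Z > 1) = Phi(-1).
bayes_error = normal_cdf(-1.0)

for x in (0.5, 1.5):
    print(x, bayes_classify(x), posterior(x))
print("Bayes error rate:", round(bayes_error, 4))
```

With equal priors and equal variances the posteriors cross at $x = 1$, so the rule reduces to "predict class 1 when $x > 1$." The resulting Bayes error rate, $\Phi(-1) \approx 0.159$, is the lowest misclassification rate any classifier can achieve in this setup.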
I. Validation set approach (50/50 split)
II. 10-fold cross-validation
III. LOOCV
Rank these approaches from HIGHEST to LOWEST variance of the test error estimate.
The ranking from highest to lowest variance is: I (validation set) > III (LOOCV) > II (10-fold CV).
Validation set approach (I): Uses only 50% of the data for training, and the estimate depends entirely on one random split. This produces the highest variance because a single split can be very unrepresentative.
LOOCV (III): Uses $n-1$ observations for training in each fold. Although it averages over $n$ folds, the training sets are nearly identical (they differ by only one observation), so the fold estimates are highly correlated. Averaging correlated estimates reduces variance less effectively, so LOOCV has moderate-to-high variance.
10-fold CV (II): Uses 90% of the data for training and averages over 10 less-correlated fold estimates. The lower correlation between folds makes averaging more effective at reducing variance.
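The ranking can be explored with a small simulation. It repeats each resampling scheme on many simulated data sets and measures how much the resulting test-MSE estimates vary. The data-generating model ($y = 1 + 2x + \varepsilon$ with $\varepsilon \sim N(0,1)$), the sample size $n = 40$, the use of simple linear regression, and all function names are assumptions made for illustration.

```python
import random
import statistics

# Illustrative assumptions: y = 1 + 2x + noise, x ~ Uniform(0, 1),
# noise ~ N(0, 1), n = 40 observations, simple linear regression as the model.


def simulate(rng, n=40):
    xs = [rng.random() for _ in range(n)]
    ys = [1 + 2 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Return (b0, b1) of the least-squares line."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b1 * x_bar, b1


def mse_on(train, test, xs, ys):
    """Fit on the `train` indices, return mean squared error on `test` indices."""
    b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return statistics.fmean((ys[i] - (b0 + b1 * xs[i])) ** 2 for i in test)


def validation_set(xs, ys, rng):
    """One random 50/50 split: train on one half, score on the other."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    half = len(idx) // 2
    return mse_on(idx[:half], idx[half:], xs, ys)


def k_fold_cv(xs, ys, k, rng):
    """Average of the k held-out-fold MSEs; k = n gives LOOCV."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        scores.append(mse_on(train, fold, xs, ys))
    return statistics.fmean(scores)


rng = random.Random(2024)
estimates = {"validation set": [], "10-fold CV": [], "LOOCV": []}
for _ in range(400):                     # 400 independent simulated data sets
    xs, ys = simulate(rng)
    estimates["validation set"].append(validation_set(xs, ys, rng))
    estimates["10-fold CV"].append(k_fold_cv(xs, ys, 10, rng))
    estimates["LOOCV"].append(k_fold_cv(xs, ys, len(xs), rng))

spread = {name: statistics.variance(vals) for name, vals in estimates.items()}
for name, var in spread.items():
    print(f"{name:>14}: variance of test-MSE estimate = {var:.4f}")
```

The validation-set estimate should show clearly the largest spread, since it scores the model on only half the observations from a single split. The gap between LOOCV and 10-fold CV is typically much smaller and depends on the model and sample size, so a toy run like this one may not separate those two clearly.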