Math 253 Notes
Preface
Math 253 and the Macalester statistics curriculum
These notes are written in Bookdown
1 Introduction
1.1 Statistical and Machine Learning
1.1.1 Example 1: Machine translation of natural languages
1.1.2 Example 2: From library catalogs to latent semantic indexing
1.1.3 Computing technique
1.2 Review of Day 1
1.3 Theoretical concepts (ISL §2.1)
1.3.1 Statistics concepts
1.3.2 Computing concepts
1.3.3 Cross-fertilization
1.4 Many techniques
1.4.1 Unsupervised learning
1.4.2 Supervised learning
1.5 Basic dichotomies in machine learning
1.5.1 Purposes for learning
1.5.2 Dichotomies
1.5.3 Prediction versus mechanism
1.5.4 Flexibility versus variance
1.5.5 Black box versus interpretable models
1.5.6 Reducible versus irreducible error
1.5.7 Regression versus classification
1.5.8 Supervised versus unsupervised
1.6 Programming Activity 1
1.7 Review of Day 2
1.7.1 Trade-offs/Dichotomies
1.8 A Classifier example
1.9 Programming Activity 2
1.10 Day 3 theory: accuracy, precision, and bias
1.10.1 Figure 2.10
1.10.2 Another example: A smoother simulated \(f(x)\)
1.10.3 What’s the “best” of these models?
1.10.4 Why is testing MSE U-shaped?
1.10.5 Measuring the variance of independent sources of variation
1.10.6 Equation 2.7
1.11 Programming Activity 3
1.12 Review of Day 3
1.13 Start Thursday 15 Sept.
Topic I: Linear Regression
1.14 Day 4 Preview
1.15 Small data
2 Notes
2.1 Review of Day 4, Sept 15, 2017
2.2 Regression and Interpretability
2.3 Toward an automated regression process
2.4 Selecting model terms
2.5 Programming basics: Graphics
2.6 In-class programming activity
2.7 Day 5 Summary
2.7.1 Linear regression
2.7.2 Coefficients as quantities
2.8 In-class programming activity
2.9 Day 6 Summary
2.10 Measuring Accuracy of the Model
2.11 Bias of the model
2.11.1 Theory of whole-model ANOVA
2.12 Forward, backward and mixed selection
2.13 Programming Basics: Functions
2.14 In-class programming activity
2.15 Review of Day 7
2.16 Using predict() to calculate precision
2.17 Conclusion
3 Foundations: linear algebra, likelihood and Bayes’ rule
3.1 Linear Algebra
3.2 Arithmetic of linear algebra operations
3.3 The geometry of fitting
3.4 Precision of the coefficients
3.5 Likelihood and Bayes
3.6 Summary of Day 8
3.7 Day 9 Announcements
3.7.1 What’s a probability?
3.8 Conditional probability
3.9 Inverting conditional probabilities
3.10 Summary of Day 9
3.11 Likelihood example
3.12 Exponential probability density
3.12.1 Meanwhile, further north …
3.13 California earthquake warning, reprise
3.14 The Price is Right!
3.15 From likelihood to Bayes
3.16 Choosing models using maximum likelihood
3.17 Day 9 Review
3.18 Reading: What is Bayesian Statistics
3.19 Programming Basics: Conditionals
3.20 ifelse() examples
3.21 if … else … examples
3.22 Simple
3.23 Blood testing
3.24 The (hyper)-volume of the hypersphere
3.25 Find the surface area, \(D_n r^{n-1}\)
3.26 In-class programming activity
4 Classifiers
4.1 Classification overview
4.2 Day 10 preview
4.3 Probability and odds
4.4 Log Odds
4.5 Why use odds?
4.6 Use of glm()
4.7 Interpretation of coefficients
4.8 Example: Logistic regression of default
5 Linear and Quadratic Discriminant Analysis
5.1 Example: Default on student loans
5.2 A Bayes’ Rule approach
5.3 Univariate Gaussian
5.4 Uncorrelated bivariate Gaussian
5.5 Bivariate normal distribution with correlations
5.6 Shape of the multivariate Gaussian
5.7 Generating bivariate normal from independent
5.8 Independent variables \(x_i\)
5.9 Re-explaining \(\boldsymbol\Sigma\)
5.10 LDA
5.11 QDA
5.12 Test error rates on various classifiers
5.13 Error rates
5.14 Receiver operating curves
6 Cross-Validation and Bootstrapping
6.1 Philosophical approaches
6.1.1 Occam’s Razor: A heuristic
6.1.2 Einstein’s proverb
6.2 Operationalizing model choice
6.3 Some definitions of “better”
6.4 Training and Testing
6.5 Trade-off
6.6 Classical theory interlude
6.7 Bootstrapping
7 Regularization, shrinkage and dimension reduction
7.1 Best subset selection
7.2 Approximation to best subset selection
7.3 Classical theory of best model choice
7.4 Optimization
7.4.1 What are we optimizing over?
7.5 Shrinkage methods
7.5.1 Ridge regression
7.6 LASSO
7.7 Review
7.8 Multi-collinearity
7.9 Creating correlations
7.10 Rank 1 Matrices
7.11 Idea of singular values
7.12 Dimension reduction
8 Nonlinearity in linear models
8.1 Smoothers
8.1.1 Ideas of smoothness
8.1.2 Polynomials
8.1.3 The model matrix
8.1.4 Sigmoidal Functions
8.1.5 Hat functions
8.1.6 Fourier analysis
8.2 Steps
8.3 Other functions
8.4 Holes in the data
8.5 Bootstrapping
8.6 Normal theory confidence bands
8.7 Splines
8.7.1 B-splines
8.7.2 Natural splines
8.7.3 Smoothing splines
8.7.4 Smoothers in k dimensions
8.8 GAMs
9 Programming Activity
10 Where to place knots?
11 Trees for Regression and Classification
11.1 Splitting Criteria for Classification Trees
11.2 Variable importance
11.3 Avoiding overfitting
11.4 Pruning
11.5 Averaging
11.6 Shrinking (“Boosting”)
12 Support Vector Classifiers
12.1 Lines, planes, and hyperplanes
12.1.1 Rescaling X
12.1.2 Impose an absolute constraint
12.2 Optimizing within the constraint
12.3 Allowing violations of the boundary
12.4 Nonlinear Boundaries
12.5 Support Vector Machine
12.6 Kernels
12.7 SVM versus logistic regression
13 Programming Basics
13.1 Programming Basics I: Names, classes, and objects
13.1.1 Names
13.1.2 Objects
13.1.3 Vectors
13.1.4 Matrices
13.1.5 Lists
13.1.6 Functions
13.2 Programming basics: Linear Models
13.2.1 Graphics basics
13.3 K-nearest neighbors
13.4 Loops/Iteration
13.5 Parts of a loop
13.6 Trivial examples
13.7 Bootstrapping
13.8 Leave-one-out cross-validation
13.9 Building a package
Appendices
Connecting RStudio to your GitHub repository
13.10 Setting up RStudio
13.11 Setting up your Math 253 repository
13.12 Using your repository
13.13 Why are we doing this?
Instructions for the publishing system: Bookdown