5001: Statistical Machine Learning I, 3rd class (20-9-15) notes


Overview: Statistical Machine Learning

Statistics and machine learning
    models
    prediction and inference
    Classification
Model assessment for regression
    MSE (mean square error)
    training error
    test error
Model assessment for classification
    MCE (misclassification error)
    training error
    test error
Validation set approach
    LOOCV
    K-fold CV

Abbreviations of technical terms

MSE (mean square error): $\mathrm{MSE}(f) = E[L(Y, f(X))] = E[(Y - f(X))^2]$

MCE (misclassification error): $\mathrm{MCE}(f) = E[L(Y, f(X))] = E[I(Y \neq f(X))]$

$\mathrm{Bias}(\hat{f}(X)) = E[\hat{f}(X)] - f(X)$

$\mathrm{var}(\hat{f}(X)) = E[(\hat{f}(X) - E[\hat{f}(X)])^2]$

Statistics and machine learning

“Different” terminologies:

| Machine Learning | Statistics |
| --- | --- |
| Supervised learning | Classification/regression |
| Unsupervised learning | Clustering |
| Semisupervised learning | Classification/regression with missing responses |
| Manifold learning | (Nonlinear) dimension reduction |

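Supervised learning: the data are pairs (x, y) with $x \in \mathbb{R}^p$ (x has dimension p) and $y \in \mathbb{R}$; training on these pairs lets us do classification/regression.

Unsupervised learning: the data are only x, with $x \in \mathbb{R}^p$ (x has dimension p); training on them lets us do clustering-type tasks.

Semisupervised learning: some part of the dataset contains the value y, but most of the data are just x without y. For example, we collect a lot of data with a Python crawler and have people label only some of it.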

Manifold learning: ??? (the terminology table pairs it with (nonlinear) dimension reduction)

| Parametric models | Nonparametric models |
| --- | --- |
| Linear/polynomial regression model | Local smoothing |
| Generalized linear regression model | Smoothing splines |
| Fisher's discriminant analysis | Classification and regression trees; random forest; boosting |
| Logistic regression | Support vector machines |
| Deep learning | |

models

prediction and inference

Classification

Approaches to classifying the example:

1. Linear regression

2. Nearest neighbors

The left panel shows the result of the 15-NN classifier; a few training points are misclassified, and the decision boundary adapts to the local density of the classes.

The right panel shows the result of the 1-NN classifier; none of the training data is misclassified.
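To see why 1-NN never misclassifies a training point while 15-NN does, here is a minimal sketch in NumPy. The data are simulated (two overlapping Gaussian classes), not the example from the lecture, and `knn_predict` is a hypothetical helper written for this note.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Majority vote among the k nearest training points (labels must be 0/1)."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train)
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest training points for each query point
    nearest = np.argsort(d2, axis=1)[:, :k]
    # With 0/1 labels, the mean over the neighbours is the vote share of class 1
    votes = y_train[nearest].mean(axis=1)
    return (votes > 0.5).astype(int)

# Simulated two-class data (an assumption, not the lecture's data set)
rng = np.random.default_rng(0)
n = 100
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n, 2))
X1 = rng.normal(loc=[1.5, 1.5], scale=1.0, size=(n, 2))
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

# Training misclassification rate for k = 1 and k = 15
train_err = {}
for k in (1, 15):
    y_hat = knn_predict(X, y, X, k)
    train_err[k] = np.mean(y_hat != y)
    print(f"{k}-NN training misclassification rate: {train_err[k]:.3f}")
```

With k = 1, each training point is its own nearest neighbour (distance 0), so the training error is zero by construction. This says nothing about how well the classifier does on new data.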

Model assessment for regression

MSE (mean square error)

$\mathrm{MSE}(f) = E[L(Y, f(X))] = E[(Y - f(X))^2]$

training error

test error
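A rough sketch of the difference between the two: the training error is the average squared error on the data used for fitting, and the test error is the same average on fresh data. The true function, noise level, sample sizes and polynomial degrees below are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def f_true(x):
    # Assumed "true" regression function, for illustration only
    return np.sin(2 * np.pi * x)

def simulate(n):
    x = rng.uniform(0, 1, size=n)
    y = f_true(x) + rng.normal(scale=0.3, size=n)
    return x, y

def mse(y, y_hat):
    # Empirical mean square error: average of (y - y_hat)^2
    return np.mean((y - y_hat) ** 2)

x_train, y_train = simulate(30)    # data used to fit the model
x_test, y_test = simulate(1000)    # fresh data, never used for fitting

results = {}
for degree in (1, 3, 9):
    coefs = np.polyfit(x_train, y_train, deg=degree)  # least-squares polynomial fit
    train_mse = mse(y_train, np.polyval(coefs, x_train))
    test_mse = mse(y_test, np.polyval(coefs, x_test))
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree}: training MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

Because the polynomial models are nested and fitted by least squares, the training MSE cannot increase with the degree. The test MSE has no such guarantee, which is why it is the quantity we actually care about.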

Model assessment for classification

MCE (misclassification error)

$\mathrm{MCE}(f) = E[L(Y, f(X))] = E[I(Y \neq f(X))]$

training error

test error

Validation set approach

If we have a large training set, we can estimate the test error by randomly splitting the data into training and validation parts. Use the training part to build the model, and then assess the model by applying it to the validation part.
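The split-then-assess procedure could look roughly like this. The simulated data, the 50/50 split and the polynomial models being compared are assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data set (assumed quadratic signal plus noise)
n = 200
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=1.0, size=n)

# Randomly split the indices into a training part and a validation part
perm = rng.permutation(n)
n_train = n // 2
train_idx, val_idx = perm[:n_train], perm[n_train:]

def validation_mse(degree):
    """Fit on the training part only, then measure MSE on the validation part."""
    coefs = np.polyfit(x[train_idx], y[train_idx], deg=degree)
    resid = y[val_idx] - np.polyval(coefs, x[val_idx])
    return np.mean(resid ** 2)

# Compare candidate models by their validation error
val_errors = {d: validation_mse(d) for d in range(1, 6)}
best_degree = min(val_errors, key=val_errors.get)
print(val_errors)
print("degree with the smallest validation MSE:", best_degree)
```

The estimate depends on which observations happen to land in each part, so a different random split can give a different answer.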

LOOCV

Split the data set of size n into a training set of size n − 1 and a validation set of size 1. Repeat this process n times, so that each observation is held out exactly once, and average the n validation errors.
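A minimal sketch of leave-one-out cross-validation for a polynomial regression, using squared error as the loss. The function name `loocv_mse` and the simulated data are assumptions made up for this note.

```python
import numpy as np

def loocv_mse(x, y, degree):
    """Leave-one-out CV estimate of the MSE for a polynomial fit of the given degree."""
    n = len(x)
    sq_errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                                # training set: all points except i
        coefs = np.polyfit(x[keep], y[keep], deg=degree)        # fit on n - 1 points
        sq_errors[i] = (y[i] - np.polyval(coefs, x[i])) ** 2    # error on the held-out point
    return sq_errors.mean()                                     # average over the n repetitions

# Simulated linear data (an assumption for illustration)
rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, size=40)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=40)

cv_linear = loocv_mse(x, y, degree=1)
print(f"LOOCV estimate of the MSE (degree 1): {cv_linear:.3f}")
```

There is no randomness in the split here, but the model has to be fitted n times, which can be slow for large n or expensive models.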

K-fold CV
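Nothing is written under this heading, so what follows is only a sketch of the usual K-fold procedure as I understand it: shuffle the data, cut it into K roughly equal folds, hold out each fold once as the validation set, and average the K validation errors. The function name `kfold_mse`, the polynomial model and the simulated data are assumptions.

```python
import numpy as np

def kfold_mse(x, y, degree, k, rng):
    """K-fold CV estimate of the MSE for a polynomial fit of the given degree."""
    n = len(x)
    folds = np.array_split(rng.permutation(n), k)   # K disjoint folds of near-equal size
    fold_mse = []
    for val_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[val_idx] = False                  # train on the other K - 1 folds
        coefs = np.polyfit(x[train_mask], y[train_mask], deg=degree)
        resid = y[val_idx] - np.polyval(coefs, x[val_idx])
        fold_mse.append(np.mean(resid ** 2))         # validation MSE on the held-out fold
    return float(np.mean(fold_mse))                  # average over the K folds

# Simulated linear data (an assumption for illustration)
rng = np.random.default_rng(4)
x = rng.uniform(-2, 2, size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)

cv5 = kfold_mse(x, y, degree=1, k=5, rng=rng)
print(f"5-fold CV estimate of the MSE (degree 1): {cv5:.3f}")
```

Setting k equal to n gives folds of size 1, which is the LOOCV split described in the previous section.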
