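Abbreviations of technical terms
MSE (mean squared error): $\mathrm{MSE}(f) = E[L(Y, f(X))] = E[(Y - f(X))^2]$
MCE (misclassification error): $\mathrm{MCE}(f) = E[L(Y, f(X))] = E[I(Y \neq f(X))]$
Bias: $\mathrm{Bias}(\hat{f}(X)) = E[\hat{f}(X)] - f(X)$
Variance: $\mathrm{var}(\hat{f}(X)) = E[(\hat{f}(X) - E[\hat{f}(X)])^2]$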
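The bias and variance definitions can be checked numerically by simulating many training sets and looking at how the estimate at a fixed point fluctuates. The sketch below is only an illustration: the true value `F_X0`, the noise level `SIGMA`, and the shrunken-mean estimator with factor `SHRINK` are all assumptions made up for this example, not something taken from the notes.

```python
import random

random.seed(0)

F_X0 = 2.0     # assumed true value f(x0) at a fixed point x0
SIGMA = 1.0    # assumed standard deviation of the noise
N_OBS = 10     # observations in each simulated training set
SHRINK = 0.8   # hypothetical shrinkage factor, chosen to make the estimator biased
N_REPS = 5000  # number of simulated training sets


def fit_and_predict():
    """Draw one training set and return the estimate f_hat(x0)."""
    ys = [F_X0 + random.gauss(0.0, SIGMA) for _ in range(N_OBS)]
    return SHRINK * sum(ys) / N_OBS


estimates = [fit_and_predict() for _ in range(N_REPS)]
mean_estimate = sum(estimates) / N_REPS

# Bias: E[f_hat(x0)] - f(x0), with the expectation replaced by a simulation average.
bias_hat = mean_estimate - F_X0
# Variance: E[(f_hat(x0) - E[f_hat(x0)])^2], again as a simulation average.
var_hat = sum((e - mean_estimate) ** 2 for e in estimates) / N_REPS

print(f"estimated bias     = {bias_hat:.3f}")
print(f"estimated variance = {var_hat:.3f}")
```

For this particular estimator the exact values are (SHRINK - 1) * F_X0 = -0.4 for the bias and SHRINK^2 * SIGMA^2 / N_OBS = 0.064 for the variance, so the simulated numbers should land close to those.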
“Different” terminologies:
Machine Learning | Statistics
Supervised learning | Classification/regression
Unsupervised learning | Clustering
Semisupervised learning | Classification/regression with missing responses
Manifold learning | (Nonlinear) dimension reduction

Supervised learning: given pairs (x, y) with $x \in \mathbb{R}^p$ and $y \in \mathbb{R}$ (x has dimension p), a model is trained to perform classification or regression.
Unsupervised learning: given only $x \in \mathbb{R}^p$ (x has dimension p), a model is trained to perform tasks such as clustering.
Semisupervised learning: only part of the data set contains the response y; most of the data are x without y. For example, a large amount of data is collected with a Python crawler and only some of it is labeled by hand.
Manifold learning: ???
Parametric models | Nonparametric models
Linear/polynomial regression model | Local smoothing
Generalized linear regression model | Smoothing splines
Fisher's discriminant analysis | Classification and regression trees; random forest; boosting
Logistic regression | Support vector machines
Deep learning |

Approaches to classification for the example:
1 Linear regression
2 Nearest neighbors
Left panel shows the result of the 15-NN classifier; a few training data are misclassified, and the decision boundary adapts to the local density of the classes.
Right panel shows the result of the 1-NN classifier; none of the training data is misclassified.
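The contrast between the two panels can be reproduced with a small k-nearest-neighbors sketch. The data here are an assumption (two hypothetical Gaussian classes, not the data behind the figure), and `make_class`, `knn_predict`, and `training_error` are illustrative names. With 1-NN every training point is its own nearest neighbor, so the training error is zero as long as no two points coincide.

```python
import random

random.seed(1)


def make_class(center, label, n=50):
    """Sample n two-dimensional Gaussian points around `center` (hypothetical data)."""
    return [
        ((random.gauss(center[0], 1.0), random.gauss(center[1], 1.0)), label)
        for _ in range(n)
    ]


train = make_class((0.0, 0.0), 0) + make_class((1.5, 1.5), 1)


def squared_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def knn_predict(x, data, k):
    """Majority vote among the k training points closest to x (labels are 0 or 1)."""
    nearest = sorted(data, key=lambda item: squared_distance(item[0], x))[:k]
    votes_for_one = sum(label for _, label in nearest)
    return 1 if 2 * votes_for_one > k else 0


def training_error(k):
    """Fraction of training points misclassified by the k-NN rule."""
    wrong = sum(knn_predict(x, train, k) != y for x, y in train)
    return wrong / len(train)


err_1nn = training_error(1)
err_15nn = training_error(15)
print(f"1-NN training error:  {err_1nn:.2f}")
print(f"15-NN training error: {err_15nn:.2f}")
```

A zero training error for 1-NN says nothing about the error on new data, which is why the test error has to be estimated separately.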
$\mathrm{MSE}(f) = E[L(Y, f(X))] = E[(Y - f(X))^2]$
$\mathrm{MCE}(f) = E[L(Y, f(X))] = E[I(Y \neq f(X))]$
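In practice the expectations in MSE and MCE are replaced by averages over a sample. A minimal sketch, with toy numbers invented for illustration and hypothetical helper names `mse` and `mce`:

```python
def mse(y_true, y_pred):
    """Empirical mean squared error: average of (y - f(x))^2."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def mce(y_true, y_pred):
    """Empirical misclassification error: fraction of cases with y != f(x)."""
    return sum(y != p for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data, for illustration only.
reg_true = [1.0, 2.0, 3.0, 4.0]
reg_pred = [1.0, 2.5, 2.5, 4.0]
clf_true = [0, 1, 1, 0]
clf_pred = [0, 1, 0, 0]

mse_value = mse(reg_true, reg_pred)  # (0 + 0.25 + 0.25 + 0) / 4 = 0.125
mce_value = mce(clf_true, clf_pred)  # 1 mismatch out of 4 = 0.25
print(mse_value, mce_value)
```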
If we have a large training set, we can estimate the test error by randomly splitting the data into a training part and a validation part. Use the training part to build the model, and then assess the model by applying it to the validation part.
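The validation-set approach can be sketched as follows. Everything specific here is an assumption for the example: the simulated data (y = 3x + 1 plus noise), the 70/30 split ratio, and the choice of a least-squares line as the model.

```python
import random

random.seed(2)

# Hypothetical data: y = 3x + 1 plus Gaussian noise with standard deviation 1.
xs = [random.uniform(0.0, 10.0) for _ in range(200)]
data = [(x, 3.0 * x + 1.0 + random.gauss(0.0, 1.0)) for x in xs]

# Random split: shuffle, then cut into training and validation parts.
random.shuffle(data)
n_train = int(0.7 * len(data))  # 70/30 split is an arbitrary choice
train, valid = data[:n_train], data[n_train:]


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Build the model on the training part only.
slope, intercept = fit_line(train)

# Assess it on the validation part, which the fit never saw.
valid_mse = sum((y - (slope * x + intercept)) ** 2 for x, y in valid) / len(valid)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, validation MSE={valid_mse:.2f}")
```

Because the noise variance in this simulation is 1, the validation MSE should come out near 1.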
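Leave-one-out cross-validation: split the data set of size n into a training set of size n - 1 and a validation set of size 1. Repeat this process n times, so that each observation serves as the validation point exactly once, and average the n validation errors.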
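The procedure above can be sketched in a few lines. To keep the example small, the "model" is simply the mean of the training responses, which is an assumption made for illustration; any fitting routine could take its place. The function name `loocv_mse` and the toy data are likewise hypothetical.

```python
def loocv_mse(ys):
    """Leave-one-out estimate of the MSE of the sample-mean predictor."""
    n = len(ys)
    total = 0.0
    for i in range(n):
        train = ys[:i] + ys[i + 1:]           # training set of size n - 1
        prediction = sum(train) / len(train)  # "model": mean of the training responses
        total += (ys[i] - prediction) ** 2    # squared error on the single held-out point
    return total / n                          # average over the n repetitions


ys = [1.0, 2.0, 3.0, 4.0]  # hypothetical toy responses
loocv_value = loocv_mse(ys)
print(loocv_value)  # 20/9, about 2.22
```

For these four points the held-out squared errors are 4, 4/9, 4/9, and 4, which average to 20/9.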
