Machine Learning in 10 Pictures
Views: 4,594
Published: 2019-06-09

This article is about 4,107 words; reading it takes roughly 13 minutes.

I find myself coming back to the same few pictures when explaining basic machine learning concepts. Below is a list I find most illuminating.

 

1. Test and training error: Why lower training error is not always a good thing:  Figure 2.11. Test and training error as a function of model complexity.
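A minimal sketch of this effect (assuming scikit-learn and NumPy; the data, noise level, and polynomial degrees are made up for illustration): training error keeps falling as model complexity grows, while test error eventually turns back up.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
x = np.sort(rng.uniform(0, 1, 30))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)          # noisy training sample
x_test = np.sort(rng.uniform(0, 1, 200))
y_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.2, size=x_test.shape)

for degree in (1, 3, 9, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x[:, None], y)
    train_err = mean_squared_error(y, model.predict(x[:, None]))
    test_err = mean_squared_error(y_test, model.predict(x_test[:, None]))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```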

 

2. Under and overfitting:  Figure 1.4. Plots of polynomials having various orders M, shown as red curves, fitted to the data set generated by the green curve.
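A rough numerical version of that experiment (a sketch; the sin(2πx) generating curve and the ten noisy points follow the figure's setup, the rest is assumed): low orders underfit, and the order-9 fit chases the noise, which typically shows up as wildly large coefficients.

```python
import numpy as np

rng = np.random.RandomState(1)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)   # noisy samples of the green curve

x_dense = np.linspace(0, 1, 500)
true_dense = np.sin(2 * np.pi * x_dense)

for order in (0, 1, 3, 9):
    coeffs = np.polyfit(x, t, order)                    # polynomial of order M (the red curve)
    fit_dense = np.polyval(coeffs, x_dense)
    rms_vs_truth = np.sqrt(np.mean((fit_dense - true_dense) ** 2))
    print(f"M={order}  RMS distance to generating curve={rms_vs_truth:.2f}  "
          f"max |coefficient|={np.abs(coeffs).max():.1f}")
```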

 

3. Occam's razor:  Figure 28.3. Why Bayesian inference embodies Occam’s razor. This figure gives the basic intuition for why complex models can turn out to be less probable. The horizontal axis represents the space of possible data sets D. Bayes’ theorem rewards models in proportion to how much they predicted the data that occurred. These predictions are quantified by a normalized probability distribution on D. This probability of the data given model Hi, P(D|Hi), is called the evidence for Hi. A simple model H1 makes only a limited range of predictions, shown by P(D|H1); a more powerful model H2, that has, for example, more free parameters than H1, is able to predict a greater variety of data sets. This means, however, that H2 does not predict the data sets in region C1 as strongly as H1. Suppose that equal prior probabilities have been assigned to the two models. Then, if the data set falls in region C1, the less powerful model H1 will be the more probable model.
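A toy numerical version of this argument, with made-up numbers that only mirror the figure: H1 spreads its predictive probability over few possible data sets, H2 over many, so when the observed data set lies in the region both can predict, the simpler model ends up more probable.

```python
# Space of possible data sets D, indexed 0..99 (a made-up discretization).
# H1 predicts only data sets 0..9; H2 can predict any of 0..99.
n_datasets = 100
p_d_given_h1 = [0.1 if d < 10 else 0.0 for d in range(n_datasets)]   # narrow, concentrated evidence
p_d_given_h2 = [1.0 / n_datasets] * n_datasets                       # broad, diluted evidence

observed = 4          # the data set that actually occurred falls in region C1
prior = 0.5           # equal prior probabilities for the two models

post_h1 = prior * p_d_given_h1[observed]
post_h2 = prior * p_d_given_h2[observed]
norm = post_h1 + post_h2
print(f"P(H1|D) = {post_h1 / norm:.2f}, P(H2|D) = {post_h2 / norm:.2f}")  # the simpler model wins
```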

 

4. Feature combinations: (1) Why collectively relevant features may look individually irrelevant, and also (2) Why linear methods may fail. From Isabelle Guyon.
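The classic XOR-style construction makes both points concrete. Here is a small sketch (assuming scikit-learn; the data are synthetic): each feature on its own is uncorrelated with the label, the pair determines it exactly, and a linear classifier does barely better than chance while a non-linear one succeeds.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)        # XOR-style label: do the two features agree in sign?

# Each feature is individually uninformative (near-zero correlation with the label)...
print("corr(x1, y) =", round(np.corrcoef(X[:, 0], y)[0, 1], 2))
print("corr(x2, y) =", round(np.corrcoef(X[:, 1], y)[0, 1], 2))

# ...so a linear classifier fails, while a non-linear one picks up the combination.
print("linear accuracy: ", LogisticRegression().fit(X, y).score(X, y))
print("RBF SVM accuracy:", SVC(kernel="rbf").fit(X, y).score(X, y))
```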

 

5. Irrelevant features: Why irrelevant features hurt kNN, clustering, and other similarity based methods. The figure on the left shows two classes well separated on the vertical axis. The figure on the right adds an irrelevant horizontal axis which destroys the grouping and makes many points nearest neighbors of the opposite class.
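A quick way to see the same effect numerically (a sketch with scikit-learn; the class separation and noise scale are invented): the same 1-NN classifier is nearly perfect on the informative axis alone and degrades once a high-variance irrelevant axis is appended.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
n = 200
y = rng.randint(0, 2, n)
informative = y + rng.normal(scale=0.2, size=n)        # classes well separated on this axis
irrelevant = rng.normal(scale=5.0, size=n)             # pure noise, large scale

knn = KNeighborsClassifier(n_neighbors=1)
acc_1d = cross_val_score(knn, informative[:, None], y, cv=5).mean()
acc_2d = cross_val_score(knn, np.column_stack([informative, irrelevant]), y, cv=5).mean()
print(f"1-NN accuracy, informative axis only:  {acc_1d:.2f}")
print(f"1-NN accuracy, irrelevant axis added:  {acc_2d:.2f}")
```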

 

6. Basis functions: How non-linear basis functions turn a low dimensional classification problem without a linear boundary into a high dimensional problem with a linear boundary. From Andrew Moore: a one-dimensional non-linear classification problem with input x is turned into a 2-D problem z = (x, x^2) that is linearly separable.
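The same construction in a few lines (a sketch with scikit-learn; the thresholds and data are arbitrary): the class "middle of the line vs. the ends" has no linear boundary in x, but becomes linearly separable once x^2 is added as a second coordinate.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
x = rng.uniform(-2, 2, 300)
y = (np.abs(x) < 1).astype(int)                       # positives sit in the middle of the line

linear_1d = LogisticRegression().fit(x[:, None], y).score(x[:, None], y)

z = np.column_stack([x, x ** 2])                      # lift x to z = (x, x^2)
linear_2d = LogisticRegression().fit(z, y).score(z, y)

print(f"linear boundary on x alone:   accuracy = {linear_1d:.2f}")
print(f"linear boundary on (x, x^2):  accuracy = {linear_2d:.2f}")
```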

 

7. Discriminative vs. Generative: Why discriminative learning may be easier than generative:  Figure 1.27. Example of the class-conditional densities for two classes having a single input variable x (left plot) together with the corresponding posterior probabilities (right plot). Note that the left-hand mode of the class-conditional density p(x|C1), shown in blue on the left plot, has no effect on the posterior probabilities. The vertical green line in the right plot shows the decision boundary in x that gives the minimum misclassification rate.
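A small numeric illustration of the point in that figure (a sketch; the particular Gaussian mixtures are assumptions chosen to mimic the plot): moving the extra left-hand mode of p(x|C1) around leaves the posterior-0.5 decision boundary essentially untouched, which is exactly the structure a discriminative model never has to learn.

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-3, 6, 9001)

def boundary(p_x_c1, p_x_c2):
    """x where the posterior p(C1|x) crosses 0.5 under equal priors."""
    posterior = p_x_c1 / (p_x_c1 + p_x_c2)
    right = x > 0                                      # search away from the extra left-hand mode
    return x[right][np.argmin(np.abs(posterior[right] - 0.5))]

p_x_c2 = norm.pdf(x, loc=3.5, scale=0.6)               # single-mode class-conditional for C2

# p(x|C1): same mixing weights, two different left-hand modes
version_a = 0.3 * norm.pdf(x, loc=-1.5, scale=0.4) + 0.7 * norm.pdf(x, loc=2.0, scale=0.6)
version_b = 0.3 * norm.pdf(x, loc=-2.5, scale=0.8) + 0.7 * norm.pdf(x, loc=2.0, scale=0.6)

print("boundary with left mode at -1.5:", round(boundary(version_a, p_x_c2), 2))
print("boundary with left mode at -2.5:", round(boundary(version_b, p_x_c2), 2))
```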

 

8. Loss functions: Learning algorithms can be viewed as optimizing different loss functions:  Figure 7.5. Plot of the ‘hinge’ error function used in support vector machines, shown in blue, along with the error function for logistic regression, rescaled by a factor of 1/ln(2) so that it passes through the point (0, 1), shown in red. Also shown are the misclassification error in black and the squared error in green.
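These four error functions are easy to tabulate directly as functions of the margin z = y·f(x), with targets y ∈ {−1, +1} as in the figure (the sample margins below are made up):

```python
import numpy as np

z = np.array([-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0])   # margin z = y * f(x)

hinge = np.maximum(0.0, 1.0 - z)                        # SVM hinge loss
logistic = np.log(1.0 + np.exp(-z)) / np.log(2.0)       # logistic loss, rescaled by 1/ln(2) to pass through (0, 1)
misclass = (z <= 0).astype(float)                       # 0/1 misclassification error
squared = (1.0 - z) ** 2                                # squared error against the target

for row in zip(z, hinge, logistic, misclass, squared):
    print("z={:5.1f}  hinge={:4.2f}  logistic={:4.2f}  0/1={:3.1f}  squared={:4.2f}".format(*row))
```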

 

9. Geometry of least squares:  Figure 3.2. The N-dimensional geometry of least squares regression with two predictors. The outcome vector y is orthogonally projected onto the hyperplane spanned by the input vectors x1 and x2. The projection ŷ represents the vector of the least squares predictions.
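The projection picture can be checked directly (a sketch on random data, NumPy only): the least squares fit ŷ = X(XᵀX)⁻¹Xᵀy lies in the plane spanned by the two input vectors, and the residual y − ŷ is orthogonal to both of them.

```python
import numpy as np

rng = np.random.RandomState(0)
N = 50
X = np.column_stack([rng.normal(size=N), rng.normal(size=N)])    # two predictors x1, x2
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=N)           # outcome with noise

beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # least squares coefficients
y_hat = X @ beta                                 # orthogonal projection of y onto span{x1, x2}
residual = y - y_hat

print("residual . x1 =", round(float(residual @ X[:, 0]), 10))   # ~0: residual is perpendicular to x1
print("residual . x2 =", round(float(residual @ X[:, 1]), 10))   # ~0: residual is perpendicular to x2
```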

 

10. Sparsity: Why Lasso (L1 regularization or Laplacian prior) gives sparse solutions (i.e. weight vectors with more zeros):  Figure 3.11. Estimation picture for the lasso (left) and ridge regression (right). Shown are contours of the error and constraint functions. The solid blue areas are the constraint regions |β1| + |β2| ≤ t and β1² + β2² ≤ t², respectively, while the red ellipses are the contours of the least squares error function.
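The corner of the L1 constraint region is what produces exact zeros. A quick check (a sketch with scikit-learn; the data set and regularization strengths are arbitrary): on the same data, the lasso zeroes out most coefficients while ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.datasets import make_regression

# 50 features, only 5 of which actually matter
X, y = make_regression(n_samples=100, n_features=50, n_informative=5,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("lasso: zero coefficients =", int(np.sum(lasso.coef_ == 0)), "of 50")
print("ridge: zero coefficients =", int(np.sum(ridge.coef_ == 0)), "of 50")
```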

 

from: http://www.denizyuret.com/2014/02/machine-learning-in-5-pictures.html

Reposted from: https://www.cnblogs.com/GarfieldEr007/p/5328593.html
