Abstract: This is the transcript of Lecture 110, "Optimization Objective", from Chapter 14, "Unsupervised Learning", of Andrew Ng's Machine Learning course. I recorded the subtitles while watching the videos and lightly edited them to make them more concise and easier to read, for my own later reference. I'm sharing them here; if you find any errors, corrections are welcome and sincerely appreciated. I hope they are helpful for your studies as well.
Most of the supervised learning algorithms we've seen, such as linear regression and logistic regression, have an optimization objective: some cost function that the algorithm tries to minimize. It turns out that K-means also has an optimization objective, a cost function that it is trying to minimize, and in this video I'd like to tell you what that optimization objective is. The reason I want to do so is that it will be useful for two purposes. First, knowing the optimization objective of K-means will help us debug the learning algorithm and make sure that K-means is running correctly. Second, and perhaps even more importantly, in a later video we'll talk about how we can use this to help K-means find better clusters and avoid local optima; but we'll do that in the video that follows this one.
Just as a quick reminder, while K-means is running we keep track of two sets of variables. First, the $c^{(i)}$'s, which keep track of the index, or the number, of the cluster to which example $x^{(i)}$ is currently assigned. Second, the $\mu_k$'s, which are the locations of the cluster centroids. As before, we use capital $K$ to denote the total number of clusters, and lower case $k$ as an index into the cluster centroids, so $k$ is a number between $1$ and $K$. Now, here's one more bit of notation: we use $\mu_{c^{(i)}}$ to denote the cluster centroid of the cluster to which example $x^{(i)}$ has been assigned. To explain that notation a little more, let's say $x^{(i)}$ has been assigned to cluster number 5. Then $c^{(i)} = 5$, and so $\mu_{c^{(i)}} = \mu_5$. This is the centroid of cluster number 5, which is the cluster to which my example $x^{(i)}$ has been assigned. With this notation, we're now ready to write out the optimization objective of the K-means clustering algorithm. The cost function that K-means is minimizing is a function of all of these parameters, $c^{(1)}, \dots, c^{(m)}$ and $\mu_1, \dots, \mu_K$, that K-means varies as the algorithm runs:

$$J\left(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K\right) = \frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)} - \mu_{c^{(i)}}\right\|^2$$

Here $\left\|x^{(i)} - \mu_{c^{(i)}}\right\|^2$ is the squared distance between each example $x^{(i)}$ and the location of the cluster centroid to which $x^{(i)}$ has been assigned. To explain this in pictures: a point here is my example $x^{(i)}$, and it has been assigned to some cluster centroid, denoted as a cross here. So if $x^{(i)}$ has been assigned to cluster centroid 5, as in my example up there, then $\mu_{c^{(i)}} = \mu_5$ is the location of that cross, and the squared distance is the square of the distance between the point and the cluster centroid to which $x^{(i)}$ has been assigned. What K-means is doing is trying to find parameters $c^{(i)}$ and $\mu_k$ that minimize this cost function $J$. This cost function is sometimes also called the distortion cost function, or the distortion of the K-means algorithm.
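The distortion cost function above can be computed directly. Here is a minimal sketch in NumPy (the lecture itself uses no code; the data below is made up purely for illustration):

```python
import numpy as np

def distortion(X, c, mu):
    """Distortion cost J: the mean squared distance from each example x(i)
    to the centroid mu_{c(i)} of the cluster it is currently assigned to."""
    # X: (m, n) data matrix, c: (m,) assigned cluster indices, mu: (K, n) centroids
    return np.mean(np.sum((X - mu[c]) ** 2, axis=1))

# Tiny illustrative example: two clusters, each point 0.5 from its centroid
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
mu = np.array([[0.0, 0.5], [10.0, 10.5]])
c = np.array([0, 0, 1, 1])
print(distortion(X, c, mu))  # each squared distance is 0.25, so J = 0.25
```

Note how the notation $\mu_{c^{(i)}}$ maps directly onto the fancy indexing `mu[c]`: row $i$ of `mu[c]` is the centroid of the cluster example $i$ belongs to.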
And just to provide a little more detail, here's the K-means algorithm again. The first step is the cluster assignment step, where we assign each point to the closest cluster centroid. It's possible to show mathematically that what the cluster assignment step is doing is exactly minimizing $J$ with respect to the variables $c^{(1)}, \dots, c^{(m)}$, while holding the cluster centroids $\mu_1, \dots, \mu_K$ fixed. So the cluster assignment step doesn't change the cluster centroids; what it's doing is picking the values of $c^{(1)}, \dots, c^{(m)}$ that minimize the cost function, or distortion function, $J$. The second step of K-means was the move centroid step. It can be shown mathematically that what the move centroid step does is choose the values of $\mu$ that minimize $J$: it minimizes $J$ with respect to the locations of the cluster centroids $\mu_1, \dots, \mu_K$. So what K-means is really doing is taking the two sets of variables and partitioning them into two halves: the set of $c^{(i)}$ variables and the set of $\mu_k$ variables. It first minimizes $J$ with respect to the variables $c^{(i)}$, then minimizes $J$ with respect to the variables $\mu_k$, and then keeps on iterating. So we now understand the K-means algorithm as trying to minimize this cost function $J$, which is also called the distortion function. We can use that to debug our implementation of K-means: to check that it is converging and running properly. In the next video, we'll also see how we can use this to help K-means find better clusters and avoid local optima.
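The alternating minimization described above can be sketched directly, and doing so gives us the debugging tool the lecture mentions: if either half-step ever makes the distortion go up, the implementation has a bug. The following is a minimal sketch, not the course's own code; `kmeans_debug` and its illustrative data are hypothetical names chosen here:

```python
import numpy as np

def kmeans_debug(X, K, iters=10, seed=0):
    """Minimal K-means sketch that records the distortion J after each
    half-step (cluster assignment, then move centroid) for debugging."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), K, replace=False)]  # init centroids at random examples
    history = []
    for _ in range(iters):
        # Cluster assignment step: minimize J over c(1)..c(m) with mu fixed
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (m, K) squared distances
        c = d.argmin(axis=1)
        history.append(np.mean(d[np.arange(len(X)), c]))
        # Move centroid step: minimize J over mu_1..mu_K with c fixed
        for k in range(K):
            if np.any(c == k):  # skip empty clusters
                mu[k] = X[c == k].mean(axis=0)
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        history.append(np.mean(d[np.arange(len(X)), c]))
    return c, mu, history

# Illustrative data: two well-separated pairs of points
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
c, mu, J = kmeans_debug(X, K=2)
# J never increases across half-steps; a rise would signal an implementation bug
print(all(b <= a + 1e-12 for a, b in zip(J, J[1:])))
```

Each entry of `history` is the distortion immediately after one of the two half-steps, so the sequence should be monotonically non-increasing; this is exactly the convergence check the lecture suggests.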