
Wednesday, September 26, 2012

DATA WAREHOUSING AND MINING ENGINEERING LECTURE NOTES--Support Vector Machines


Support Vector Machines

 

The Case When the Data Are Linearly Separable

To explain the mystery of SVMs, let's first look at the simplest case: a two-class problem where the classes are linearly separable. Let the data set D be given as (X1, y1), (X2, y2), ..., (X|D|, y|D|), where Xi is a training tuple with associated class label yi. Each yi can take one of two values, either +1 or −1 (i.e., yi ∈ {+1, −1}), corresponding to the classes buys_computer = yes and buys_computer = no, respectively. To aid in visualization, let's consider an example based on two input attributes, A1 and A2. Such 2-D data are linearly separable (or "linear," for short) when a straight line can be drawn to separate all of the tuples of class +1 from all of the tuples of class −1. There are an infinite number of separating lines that could be drawn. We want to find the "best" one, that is, one that (we hope) will have the minimum classification error on previously unseen tuples. How can we find this best line? Note that if our data were 3-D (i.e., with three attributes), we would want to find the best separating plane. Generalizing to n dimensions, we want to find the best hyperplane. We will use the term "hyperplane" to refer to the decision boundary that we are seeking, regardless of the number of input attributes.
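As a concrete illustration (not part of the original notes), here is a minimal sketch in Python of what such a data set might look like: a handful of made-up 2-D tuples over A1 and A2, with the two classes encoded as +1 and −1 and the values chosen so that a straight line separates them.

import numpy as np

# Hypothetical training tuples (values of A1, A2), chosen so that the
# line x1 + x2 = 5 separates the two classes.
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],   # class -1 (buys_computer = no)
              [4.0, 4.0], [5.0, 3.5], [4.5, 5.0]])  # class +1 (buys_computer = yes)
y = np.array([-1, -1, -1, +1, +1, +1])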


 

A separating hyperplane can be written as W · X + b = 0,

 

where W is a weight vector, namely, W = {w1, w2, ..., wn}; n is the number of attributes; and b is a scalar, often referred to as a bias. To aid in visualization, let's consider two input attributes, A1 and A2. Training tuples are then 2-D, e.g., X = (x1, x2), where x1 and x2 are the values of attributes A1 and A2, respectively, for X. If we think of b as an additional weight, w0, we can rewrite the above separating hyperplane as w0 + w1x1 + w2x2 = 0.



Thus, any point that lies above the separating hyperplane satisfies w0 + w1x1 + w2x2 > 0.

Similarly, any point that lies below the separating hyperplane satisfies w0 + w1x1 + w2x2 < 0.
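As a small sketch (reusing the hypothetical toy data above and a hand-picked hyperplane, neither of which appears in the original notes), the sign of w0 + w1x1 + w2x2 tells us on which side of the hyperplane a point lies:

import numpy as np

# Hypothetical hyperplane x1 + x2 - 5 = 0, i.e. w0 = -5, w1 = w2 = 1.
w0 = -5.0
w = np.array([1.0, 1.0])

def side(x):
    # Returns +1 if x lies above the hyperplane, -1 if below, 0 if on it.
    return np.sign(w0 + np.dot(w, x))

print(side(np.array([4.0, 4.0])))   #  1.0 -> above the hyperplane
print(side(np.array([1.0, 1.0])))   # -1.0 -> below the hyperplane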

 



 

The weights can be adjusted so that the hyperplanes defining the “sides” of the margin can be written as

H1: w0 + w1x1 + w2x2 ≥ 1 for yi = +1, and

H2: w0 + w1x1 + w2x2 ≤ −1 for yi = −1.

That is, any tuple that falls on or above H1 belongs to class +1, and any tuple that falls on or below H2 belongs to class −1.

 

Combining these two inequalities, we get yi(w0 + w1x1 + w2x2) ≥ 1 for all i (i.e., ∀i).
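This combined condition is easy to check numerically. The following sketch verifies it for the hypothetical data and candidate weights used above (again, these numbers are illustrative assumptions, not values from the notes):

import numpy as np

X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [4.0, 4.0], [5.0, 3.5], [4.5, 5.0]])
y = np.array([-1, -1, -1, +1, +1, +1])

# Candidate hyperplane w0 + w1*x1 + w2*x2 = 0 with hypothetical weights.
w0 = -5.0
w = np.array([1.0, 1.0])

# y_i * (w0 + W . X_i) must be >= 1 for every training tuple.
values = y * (w0 + X @ w)
print(values)               # per-tuple values of y_i (w0 + w1 x1 + w2 x2)
print(np.all(values >= 1))  # True -> every tuple lies on or outside H1/H2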

 

Any training tuples that fall on hyperplanes H1 or H2 (i.e., the "sides" defining the margin) satisfy this condition with equality and are called support vectors. That is, they are equally close to the (separating) maximum marginal hyperplane (MMH). Essentially, the support vectors are the most difficult tuples to classify and give the most information regarding classification. From the above, we can obtain a formula for the size of the maximal margin. The distance from the separating hyperplane to any point on H1 is 1/||W||, where ||W|| is the Euclidean norm of W, that is, √(W · W). By definition, this is equal to the distance from any point on H2 to the separating hyperplane. Therefore, the maximal margin is 2/||W||.
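To make the formula concrete, here is a one-line computation (using the same hypothetical weight vector as above); note that the smaller ||W|| is, the wider the margin becomes:

import numpy as np

w = np.array([1.0, 1.0])          # hypothetical weight vector W
margin = 2.0 / np.linalg.norm(w)  # maximal margin 2 / ||W||, with ||W|| = sqrt(W . W)
print(margin)                     # approximately 1.414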

"So, how does an SVM find the MMH and the support vectors?" Using some "fancy math tricks," the problem above can be rewritten as a constrained (convex) quadratic optimization problem, which can then be solved with standard quadratic programming techniques.
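In practice, this optimization is handed off to a library. The sketch below (an illustration, not part of the original notes) uses scikit-learn's SVC with a linear kernel on the toy data from above; a very large C approximates the hard-margin, linearly separable case, and the fitted model exposes the support vectors, the weight vector W, and hence the maximal margin:

import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [4.0, 4.0], [5.0, 3.5], [4.5, 5.0]])
y = np.array([-1, -1, -1, +1, +1, +1])

clf = SVC(kernel='linear', C=1e6).fit(X, y)  # large C ~ hard margin

W = clf.coef_[0]                   # learned weight vector W
b = clf.intercept_[0]              # learned bias w0
print(clf.support_vectors_)        # the tuples lying on H1 or H2
print(2.0 / np.linalg.norm(W))     # maximal margin 2 / ||W||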
