Translate

Wednesday, September 26, 2012

DATA WAREHOUSING AND MINIG ENGINEERING LECTURE NOTES--Accuracy and Error Measures:


Accuracy and Error Measures:

All models must be assessed somehow. Despite the existence of a bewildering array of performance measures, much commercial modeling software provides a surprisingly limited range of options.

Mean Squared Error (MSE) is by far the most common measure of numerical model performance. It is simply the average of the squares of the differences between the predicted and actual values. It is a reasonably good measure of performance, though it could be argued that it overemphasizes the importance of larger errors. Many modeling procedures directly minimize it.


Mean Absolute Error (MAE) is similar to the Mean Squared Error, but it uses absolute values instead of squaring. This measure is not as popular as MSE, though its meaning is more intuitive .

Bias is the average of the differences between the predicted and actual values. With this measure, positive errors cancel out negative ones. Bias is intended to assess how much higher or lower predictions are, on average, than actual values.

Mean Absolute Percent Error (MAPE) is the average of the absolute errors, as a percentage of the actual values. This is a relative measure of error, which is useful when larger errors are more acceptable on larger actual values.


Accuracy is the proportion of the time that the predicted class equals the actual class, usually expressed as a percentage. It's meaning is straightforward, but may obscure important differences in costs associated with different errors. The classic example of such costs is the medical diagnostic situation, in which one can err be either: 1. keeping a healthy patient in the hospital (low cost), or 2. sending home a sick patient (very high cost).

These classifiers need to be checked for both the accuracy of their probabilities (Do cases predicted to have a 5% (30%, 80%, etc.) probability really belong to the target class 5% (30%, 80%, etc.) of the time?) and their ability to separate the classes in question.

Accuracy can be measured using many of the same metrics used to evaluate numerical models (MSE, MAE, etc.).
Some applications (as in marketing) are focused on how many items from the target class can be identified in the best so-many percent of the population. If for example, one only has the resources to mail marketing literature to 10% of the customer file, the ideal would be to pack as many actual respondents as possible into that best 10%. The mirror situation is typified by lenders who wish to cram as many bad loans as possible into the worst 10% of their file. Probably the most popular measure of class separation at present in the literature is the Area Under the ROC Curve (AUC or AUROC), which is like measuring separation across the whole spectrum.

The intrepid data miner is invited to explore these performance measures and related topics on his or her own:

confusion matrix
F-measure
sensitivity and specificity

No comments:

Post a Comment