Tuesday, September 25, 2012

DATA WAREHOUSING AND MINING ENGINEERING LECTURE NOTES--Data reduction:


Data reduction:

  1. Reducing the number of attributes
    • Data cube aggregation: applying roll-up, slice or dice operations.
    • Removing irrelevant attributes: attribute selection (filtering and wrapper methods), searching the attribute space
    • Principal component analysis (numeric attributes only): searching for a lower dimensional space that can best represent the data.
  2. Reducing the number of attribute values
    • Binning (histograms): reducing the number of attribute values by grouping them into intervals (bins).
    • Clustering: grouping values in clusters.
    • Aggregation or generalization
  3. Reducing the number of tuples
    • Sampling
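
Two of the techniques listed above, binning and sampling, can be illustrated with a short sketch. This is only a minimal example using the Python standard library: the function names, the choice of equal-width bins, and the ages list are all assumptions made for illustration, not part of the original notes.

```python
import random

def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index 0..n_bins-1 (equal-width intervals)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so a single bin is enough.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def simple_random_sample(rows, k, seed=0):
    """Draw k tuples without replacement; the fixed seed is only for repeatability."""
    return random.Random(seed).sample(rows, k)

# Made-up attribute values (hypothetical ages).
ages = [21, 22, 25, 31, 38, 40, 47, 52, 58, 60]

bins = equal_width_bins(ages, 3)
print(bins)  # ten distinct ages reduced to three bin labels: [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]

sample = simple_random_sample(ages, 4)
print(sample)  # four of the ten tuples
```

Binning reduces the number of distinct attribute values (ten ages become three intervals), while sampling reduces the number of tuples and leaves the values themselves unchanged.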

 

Data reduction is the transformation of numerical or alphabetical digital information, derived empirically or experimentally, into a corrected, ordered, and simplified form.


In a table, columns and rows are moved around until a diagonal pattern appears, thereby making it easy to see patterns in the data.

When information is derived from instrument readings there may also be a transformation from analog to digital form. When the data are already in digital form the 'reduction' of the data typically involves some editing, scaling, coding, sorting, collating, and producing tabular summaries. When the observations are discrete but the underlying phenomenon is continuous then smoothing and interpolation are often needed. Often the data reduction is undertaken in the presence of reading or measurement errors. Some idea of the nature of these errors is needed before the most likely value may be determined.
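
The smoothing and interpolation mentioned above can be sketched as follows. This is a minimal illustration, assuming a centered moving average for smoothing and straight-line interpolation between neighbouring observations; the notes do not name a specific method, and the function names are hypothetical.

```python
def moving_average(values, window=3):
    """Smooth a series with a centered moving average.

    Near the edges, only the neighbours that exist are averaged.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def linear_interpolate(xs, ys, x):
    """Estimate y at x from known points; xs must be strictly increasing."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the observed range")
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            y0, y1 = ys[i], ys[i + 1]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

print(moving_average([1, 2, 3, 4, 5]))             # [1.5, 2.0, 3.0, 4.0, 4.5]
print(linear_interpolate([0, 10], [0.0, 5.0], 4))  # 2.0
```

Averaging over neighbours damps independent reading errors, and interpolation fills in values of the continuous phenomenon between the discrete observations.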

Coding in data reduction:

Coding involves three stages:

Open coding

Data is broken down and examined. The aim is to identify all the key statements in the interviews that relate to the aims of your research and your research problem. After identifying the key statements, you can put the key points that relate to each other into categories, giving each category a suitable heading.

Axial coding

After open coding, this stage puts the data back together. Part of the process is re-reading the data you’ve collected so that you can make precise explanations about the area of interest. New categories may be developed and used during this stage. A typical question at the axial stage is: can certain codes be placed together under a more general code rather than kept separate?

Selective coding

This is the final stage of coding. The aim is to put the finishing touches to your categories so that you can group them together. Once they are grouped, you produce diagrams to show how your categories link together. The key part of this is to select a main category, which will form the focal point of your diagram. You will also need to look for data that contradict previous research, rather than data that support it.

These are common techniques used in data reduction.

  • Order by some aspect of size.
  • Diagonalize tables, so that unordered categories are re-arranged to make patterns easier to see.
  • Use averages to provide a visual focus as well as a summary.
  • Use layout and labeling to guide the eye.
  • Remove chart junk such as pictures and lines.
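
The first and third techniques above (ordering by size and using averages as a summary) can be sketched on a small table. This is only an illustrative example: the sales figures, labels, and function names are made up, and ordering by row and column totals is one possible reading of "some aspect of size".

```python
def order_by_size(table, row_labels, col_labels):
    """Reorder rows and columns so the largest totals come first."""
    row_order = sorted(range(len(table)),
                       key=lambda r: sum(table[r]), reverse=True)
    col_order = sorted(range(len(col_labels)),
                       key=lambda c: sum(row[c] for row in table), reverse=True)
    ordered = [[table[r][c] for c in col_order] for r in row_order]
    return (ordered,
            [row_labels[r] for r in row_order],
            [col_labels[c] for c in col_order])

def with_row_averages(table):
    """Append each row's average (rounded to one decimal) as a summary column."""
    return [row + [round(sum(row) / len(row), 1)] for row in table]

# Hypothetical sales counts: rows are regions, columns are products.
sales = [[3, 9, 1],
         [8, 20, 5],
         [2, 4, 0]]
regions = ["North", "South", "East"]
products = ["A", "B", "C"]

ordered, row_names, col_names = order_by_size(sales, regions, products)
summary = with_row_averages(ordered)

print(col_names + ["Avg"])
for name, row in zip(row_names, summary):
    print(name, row)
```

After reordering, the largest values gather in the top-left corner of the table, and the average column gives the eye a single figure to compare each row against.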

 

 
