
Monday, September 24, 2012

DATA WAREHOUSING AND MINING ENGINEERING LECTURE NOTES--Definition of Data warehouse:


Definition of Data warehouse:

1.     A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.

2.     It is a decision support database that is maintained separately from the organization’s operational database.

3.     It supports information processing by providing a solid platform of consolidated, historical data for analysis.

Why is a data warehouse subject-oriented?

·        It focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.

·        It provides a simple and concise view of particular subject issues by excluding data that are not useful in the decision support process.
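
As a rough illustration of excluding data that does not serve decision support, the sketch below keeps only the fields relevant to one subject area. The record layout, the field names, and the "sales" subject are all hypothetical and chosen only for this example.

```python
# Hypothetical operational order record; all field names are illustrative.
order = {
    "order_id": 501,
    "customer_id": 17,
    "product_id": "P1",
    "amount": 25.0,
    "order_date": "2012-09-24",
    "session_token": "abc123",   # needed only by the live application
    "printer_queue": "dock-3",   # needed only for daily operations
}

# Fields kept for the hypothetical "sales" subject area.
SALES_SUBJECT_FIELDS = ("customer_id", "product_id", "amount", "order_date")


def to_sales_fact(record):
    """Keep only the fields useful for sales analysis."""
    return {name: record[name] for name in SALES_SUBJECT_FIELDS}


sales_fact = to_sales_fact(order)
```

The operational fields are simply never copied, so the analyst sees a view organized around the subject and not around the application that produced the data.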

Why is a data warehouse integrated?

·        A data warehouse is constructed by integrating multiple heterogeneous data sources, such as relational databases, flat files, and online transaction records.

·        Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on across the different data sources.

·        When data is moved to the warehouse, it is converted to these common conventions.
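
The conversion step can be sketched as follows. This is a minimal example, assuming three hypothetical source systems ("crm", "billing", "web") that each encode the same attribute differently; the source names, field names, and target codes are assumptions made for illustration, not part of the notes.

```python
# Hypothetical per-source encodings of one attribute (sex).
SEX_ENCODINGS = {
    "crm": {"m": "M", "f": "F"},
    "billing": {"1": "M", "0": "F"},
    "web": {"male": "M", "female": "F"},
}


def to_warehouse_row(source, record):
    """Convert one source record to a common naming and encoding scheme."""
    mapping = SEX_ENCODINGS[source]
    raw = str(record["sex"]).strip().lower()
    return {
        "customer_id": int(record["id"]),  # one consistent type for the key
        "sex": mapping[raw],               # one consistent encoding
        "source_system": source,           # remember where the row came from
    }


rows = [
    to_warehouse_row("crm", {"id": "17", "sex": "M"}),
    to_warehouse_row("billing", {"id": 18, "sex": 0}),
    to_warehouse_row("web", {"id": "19", "sex": "Female"}),
]
```

After conversion, every row uses the same field names and the same codes, whichever system it came from.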

 

Why is a data warehouse time-variant?

·        The time horizon for the data warehouse is significantly longer than that of operational systems:

1.     Operational database: current-value data.

2.     Data warehouse: data that provides information from a historical perspective (e.g., the past 5-10 years).

·        Every key structure in the data warehouse contains an element of time, explicitly or implicitly, whereas the key of operational data may or may not contain a time element.
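
The difference in key structure can be sketched as below. This is only an illustration: the two dictionaries, the function names, and the price data are hypothetical stand-ins for an operational table and a warehouse table.

```python
from datetime import date

# Operational store: one current value per key, overwritten in place.
operational_price = {}

# Warehouse store: the key includes a time element, so history is kept.
warehouse_price = {}


def record_price(product_id, price, as_of):
    """Record a price in both hypothetical stores."""
    operational_price[product_id] = price           # old value is lost
    warehouse_price[(product_id, as_of)] = price    # old value is retained


def price_history(product_id):
    """Return (date, price) pairs for one product, oldest first."""
    return sorted(
        (as_of, price)
        for (pid, as_of), price in warehouse_price.items()
        if pid == product_id
    )


record_price("P1", 10.0, date(2011, 1, 1))
record_price("P1", 12.5, date(2012, 1, 1))
```

Here the operational store holds only the latest price for "P1", while the warehouse store holds both dated entries and can answer questions about the past.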

Why is a data warehouse non-volatile?

·        A data warehouse is a physically separate store of data transformed from the operational environment.

·        Operational updates of data do not occur in the data warehouse environment, so the warehouse:

1.     Does not require transaction processing, recovery, and concurrency control mechanisms.

2.     Requires only two operations in data accessing: initial loading of data and access of data.
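
A minimal sketch of a store that exposes only loading and read access, with no update or delete, might look like this. The class name, method names, and sample rows are hypothetical; a real warehouse enforces this through its load process and access permissions, not through a Python class.

```python
class NonVolatileStore:
    """Hypothetical store exposing only load and read operations."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Loading appends copies of the rows; existing rows are never modified.
        self._rows.extend(dict(r) for r in rows)

    def query(self, predicate):
        # Access returns copies, so callers cannot change the stored data.
        return [dict(r) for r in self._rows if predicate(r)]


store = NonVolatileStore()
store.load([{"year": 2011, "sales": 100}, {"year": 2012, "sales": 140}])

result = store.query(lambda r: r["year"] == 2012)
result[0]["sales"] = 0  # changes only the returned copy, not the store
```

Because nothing is updated in place, the sketch needs no locking or rollback logic, which mirrors why the warehouse can do without transaction processing, recovery, and concurrency control.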

 

 

 

 

 

 
