Data Warehouse Components:
The data warehouse
architecture is based on a relational database management system server that
functions as the central repository for informational data. Operational data
and processing is completely separated from data warehouse processing. This
central information repository is surrounded by a number of key components
designed to make the entire environment functional, manageable and accessible
by both the operational systems that source data into the warehouse and by
end-user query and analysis tools.
Typically, the source
data for the warehouse is coming from the operational applications. As the data
enters the warehouse, it is cleaned up and transformed into an integrated
structure and format. The transformation process may involve conversion, summarization,
filtering and condensation of data. Because the data contains a historical
component, the warehouse must be capable of holding and managing large volumes
of data as well as different data structures for the same database over time.
Seven
Components of a Data warehouse:
Data Warehouse Database: The central data warehouse database is the cornerstone of the data
warehousing environment. This database is almost always implemented on the
relational database management system (RDBMS) technology. However, this kind of
implementation is often constrained by the fact that traditional RDBMS products
are optimized for transactional database processing.
Sourcing,
Acquisition, Cleanup and Transformation Tools: A significant portion of the implementation effort is
spent extracting data from operational systems and putting it in a format
suitable for informational applications that run off the data warehouse.
The data sourcing, cleanup, transformation and migration tools
perform all of the conversions, summarizations, key changes, structural changes
and condensations needed to transform disparate data into information that can
be used by the decision support tool. They produce the programs and control
statements, including the COBOL programs, MVS job-control language (JCL), UNIX
scripts, and SQL data definition language (DDL) needed to move data into the
data warehouse for multiple operational systems. These tools also maintain the
meta data. The functionality includes:
·
Removing unwanted data
from operational databases
·
Converting to common
data names and definitions
·
Establishing defaults
for missing data
·
Accommodating source
data definition changes
The data sourcing,
cleanup, extract, transformation and migration tools have to deal with some
significant issues including:
·
Database heterogeneity.
DBMSs are very different in data models, data access language, data navigation,
operations, concurrency, integrity, recovery etc.
·
Data heterogeneity. This
is the difference in the way data is defined and used in different models -
homonyms, synonyms, unit compatibility (U.S. vs. metric), different attributes
for the same entity and different ways of modeling the same fact.
These tools can save a considerable amount of time and effort.
However, significant shortcomings do exist. For example, many available tools
are generally useful for simpler data extracts. Frequently, customized extract
routines need to be developed for the more complicated data extraction
procedures.
Meta data: Meta data is data about data that describes the data
warehouse. It is used for building, maintaining, managing and using the data
warehouse. Meta data can be classified into:
·
Technical meta data,
which contains information about warehouse data for use by warehouse designers
and administrators when carrying out warehouse development and management
tasks.
·
Business meta data,
which contains information that gives users an easy-to-understand perspective
of the information stored in the data warehouse.
Equally important, meta data provides interactive access to users
to help understand content and find data. One of the issues dealing with meta
data relates to the fact that many data extraction tool capabilities to gather
meta data remain fairly immature. Therefore, there is often the need to create
a meta data interface for users, which may involve some duplication of effort.
Access Tools: The principal purpose of data warehousing is to provide
information to business users for strategic decision-making. These users
interact with the data warehouse using front-end tools. Many of these tools
require an information specialist, although many end users develop expertise in
the tools. Tools fall into four main categories: query and reporting tools,
application development tools, online analytical processing tools, and data
mining tools.
Query and Reporting
tools can be divided into two groups: reporting tools and managed query tools.
Reporting tools can be further divided into production reporting tools and
report writers. Production reporting tools let companies generate regular
operational reports or support high-volume batch jobs such as calculating and
printing paychecks. Report writers, on the other hand, are inexpensive desktop
tools designed for end-users.
OLAP tools
are based on the concepts of dimensional data models and corresponding
databases, and allow users to analyze the data using elaborate,
multidimensional views. Typical business applications include product
performance and profitability, effectiveness of a sales program or marketing
campaign, sales forecasting and capacity planning. These tools assume that the
data is organized in a multidimensional model.
Data Marts: The concept of a data mart is causing a lot of excitement and
attracts much attention in the data warehouse industry. Mostly, data marts are
presented as an alternative to a data warehouse that takes significantly less
time and money to build. However, the term data mart means different things to
different people. A rigorous definition of this term is a data store that is
subsidiary to a data warehouse of integrated data. The data mart is directed at
a partition of data (often called a subject area) that is created for the use
of a dedicated group of users. A data mart might, in fact, be a set of
denormalized, summarized, or aggregated data. Sometimes, such a set could be
placed on the data warehouse rather than a physically separate store of data.
In most instances, however, the data mart is a physically separate store of
data and is resident on separate database server, often a local area network
serving a dedicated user group. Sometimes the data mart simply comprises
relational OLAP technology which creates highly denormalized dimensional model
(e.g., star schema) implemented on a relational database. The resulting hypercube
of data are used for analysis by groups of users with a common interest in a
limited portion of the database.
Data Warehouse Administration
and Management: Data warehouses tend
to be as much as 4 times as large as related operational databases, reaching
terabytes in size depending on how much history needs to be saved. They are not
synchronized in real time to the associated operational data but are updated as
often as once a day if the application requires it.
In addition, almost all
data warehouse products include gateways to transparently access multiple
enterprise data sources without having to rewrite applications to interpret and
utilize the data. Furthermore, in a heterogeneous data warehouse environment,
the various databases reside on disparate systems, thus requiring
inter-networking tools. The need to manage this environment is obvious.
Managing data
warehouses includes security and priority management; monitoring updates from
the multiple sources; data quality checks; managing and updating meta data;
auditing and reporting data warehouse usage and status; purging data;
replicating, sub setting and distributing data; backup and recovery and data
warehouse storage management.
Information Delivery System: The information delivery component is used to enable the process
of subscribing for data warehouse information and having it delivered to one or
more destinations according to some user-specified scheduling algorithm. In
other words, the information delivery system distributes warehouse-stored data
and other information objects to other data warehouses and end-user products
such as spreadsheets and local databases. Delivery of information may be based
on time of day or on the completion of an external event. The rationale for the
delivery systems component is based on the fact that once the data warehouse is
installed and operational, its users don't have to be aware of its location and
maintenance. All they need is the report or an analytical view of data at a
specific point in time. With the proliferation of the Internet and the World
Wide Web such a delivery system may leverage the convenience of the Internet by
delivering warehouse-enabled information to thousands of end-users via the ubiquitous world
wide network.
Data warehousing is being used by various other IT Companies like Oracle, Microsoft, etc. for developing their own
ReplyDeleteData Warehousing Solutions.
Find data warehousing Jobs at Naukri.com. Post your resume now to view & Apply to jobs in Teaching.
ReplyDeleteIt was really a nice article and I was really impressed by reading this article We are also giving all software Course Online Training. The Data Warehousing Courses Online Training is one of the leading Online Training institute in the world.
ReplyDeleteThanks for the informative article. This is one of the best resources I have found in quite some time. Nicely written and great info. I really cannot thank you enough for sharing.
ReplyDeleteDataware Housing Training in Chennai
Such a great articles in my carrier, It's wonderful commands like easiest understand words of knowledge in information's.
ReplyDeleteData Warehousing Training in Chennai
I feel happy to have seen your webpage and look forward to so many more entertaining times reading here. Thanks once more for all the details.
ReplyDeleteData science Course Training in Chennai |Best Data Science Training Institute in Chennai
RPA Course Training in Chennai |Best RPA Training Institute in Chennai
AWS Course Training in Chennai |Best AWS Training Institute in Chennai
Good Post. I like your blog. Thanks for Sharing.
ReplyDeleteData Warehousing Training in Noida