shareengineer: January 2013

Sunday, January 6, 2013

MODELING AND DESIGN IN OODBMS

Basically, an OODBMS is an object database that provides DBMS capabilities to objects that have been created using an object-oriented programming language (OOPL). The basic principle is to add persistence to objects, so that they survive beyond the execution of the program that created them.

Consequently, application programmers who use OODBMSs typically write programs in a native OOPL such as Java, C++ or Smalltalk. The language has some kind of Persistent class, Database class, Database Interface, or Database API that provides DBMS functionality as, effectively, an extension of the OOPL.

Object-oriented DBMSs, however, go well beyond simply adding persistence to any one object-oriented programming language. This is because, historically, many object-oriented DBMSs were built to serve the market for computer-aided design/computer-aided manufacturing (CAD/CAM) applications, in which features like fast navigational access, versions, and long transactions are extremely important.

Object-oriented DBMSs, therefore, support advanced object-oriented database applications with features like support for persistent objects from more than one programming language, distribution of data, advanced transaction models, versions, schema evolution, and dynamic generation of new types.

Object data modeling

An object consists of three parts:

• structure: attributes, and relationships to other objects such as aggregation and association
• behavior: a set of operations
• characteristics of types: generalization/specialization

An object is similar to an entity in the ER model, so we use a Book example to demonstrate structure and relationships.

Attributes are like the fields in a relational model. However, in the Book example the attributes publishedBy and writtenBy have complex types, Publisher and Author, which are themselves objects. In an RDBMS, attributes with complex types are usually stored in other tables linked by keys to the book table.

Relationships: publish and writtenBy are associations with 1:N and 1:1 relationships respectively. composed_of is an aggregation (a Book is composed of chapters). A 1:N relationship is usually realized through attributes of complex types and at the behavioral level.
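Below is a minimal Python sketch of the Book example. It uses dataclasses as a stand-in for the OOPL classes an OODBMS would persist. The class and attribute names come from the text (publishedBy, writtenBy, composed_of). The Chapter class, the back-reference list Publisher.books and all sample values are hypothetical.

The sketch shows associations held as attributes of complex types. It also shows the 1:N side of "publish" being maintained at the behavioral level.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Author:
    name: str


@dataclass
class Chapter:
    title: str


@dataclass
class Publisher:
    name: str
    # 1:N "publish" association; excluded from repr/eq to avoid infinite
    # recursion through the Book -> Publisher -> Book back-reference
    books: List["Book"] = field(default_factory=list, repr=False, compare=False)


@dataclass
class Book:
    title: str
    publishedBy: Publisher            # association: each book has one publisher
    writtenBy: Author                 # association: 1:1 in this example
    composed_of: List[Chapter] = field(default_factory=list)  # aggregation

    def __post_init__(self):
        # Behavioral level: keep the 1:N association navigable from both ends
        self.publishedBy.books.append(self)


rose = Publisher("Rose")
book = Book("Databases", rose, Author("Smith"),
            [Chapter("Introduction"), Chapter("Object Models")])
print(len(rose.books), rose.books[0] is book)   # 1 True
print([c.title for c in book.composed_of])      # ['Introduction', 'Object Models']
```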

Generalization/Specialization is the is_a relationship, which is supported in OODBs through the class hierarchy. An ArtBook is a Book, so the ArtBook class is a subclass of the Book class. A subclass inherits all the attributes and methods of its superclass.
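The is_a relationship maps directly onto class inheritance. In this sketch, the attributes price and plates and the describe method are hypothetical. The point is only that ArtBook reuses what it inherits from Book and adds an attribute of its own.

```python
class Book:
    def __init__(self, title: str, price: float):
        self.title = title
        self.price = price

    def describe(self) -> str:
        return f"{self.title} ({self.price:.2f})"


class ArtBook(Book):                       # ArtBook is_a Book
    def __init__(self, title: str, price: float, plates: int):
        super().__init__(title, price)     # inherited attributes
        self.plates = plates               # attribute added by the subclass


art = ArtBook("Impressionism", 45.0, 120)
print(art.describe())          # Impressionism (45.00), method inherited from Book
print(isinstance(art, Book))   # True
```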

Message: the means by which objects communicate. A message is a request from one object to another to execute one of its methods. For example:

Publisher_object.insert("Rose", 123, …), i.e., a request to execute the insert method on a Publisher object.

Method: defines the behavior of an object. Methods can be used

• to change the object's state by modifying its attribute values
• to query the values of selected attributes

The method that responds to the message in the example above is the insert method defined in the Publisher class.
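In most OOPLs, sending a message corresponds to a method call. The sketch below is hypothetical. The text elides the remaining arguments of insert, so this version takes just two parameters, name and pub_id. A get_name method is added to show a query method alongside a state-changing one.

```python
class Publisher:
    def __init__(self):
        self.name = None
        self.pub_id = None

    def insert(self, name, pub_id):
        """Changes the object's state by modifying its attribute values."""
        self.name = name
        self.pub_id = pub_id

    def get_name(self):
        """Queries the value of a selected attribute."""
        return self.name


publisher_object = Publisher()
publisher_object.insert("Rose", 123)    # the message: run insert on this object
print(publisher_object.get_name())      # Rose
```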

The main differences between relational database design and object oriented database design include:

• Many-to-many relationships must be removed before entities can be translated into relations. Each one is resolved into two one-to-many relationships through an intersection relation. Many-to-many relationships can be implemented directly in an object-oriented database.

• Operations are not represented in the relational data model. Operations are one of the main components in an object-oriented database.

• In the relational data model, relationships are implemented by primary and foreign keys. In the object model, objects communicate through their interfaces. The interface describes the data (attributes) and operations (methods) that are visible to other objects.
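The sketch below illustrates the first difference by representing the same many-to-many enrollment twice. The first version is relational, using a junction relation of foreign-key pairs. The second uses objects that hold direct references to each other. Student, Course and the sample data are hypothetical.

```python
# Relational style: the M:N relationship becomes a junction relation
students = {1: "Ann", 2: "Bob"}
courses = {10: "Databases", 20: "Networks"}
enrolled = [(1, 10), (1, 20), (2, 10)]      # (student_id, course_id) key pairs


def courses_of(student_id):
    """Follow foreign keys through the junction relation."""
    return sorted(courses[c] for s, c in enrolled if s == student_id)


def students_in(course_id):
    return sorted(students[s] for s, c in enrolled if c == course_id)


# Object style: each object holds direct references to related objects
class Student:
    def __init__(self, name):
        self.name = name
        self.courses = []


class Course:
    def __init__(self, title):
        self.title = title
        self.students = []


def enroll(student, course):
    """Maintain both directions of the M:N relationship."""
    student.courses.append(course)
    course.students.append(student)


ann, bob = Student("Ann"), Student("Bob")
db, net = Course("Databases"), Course("Networks")
enroll(ann, db)
enroll(ann, net)
enroll(bob, db)

print(courses_of(1), students_in(10))   # ['Databases', 'Networks'] ['Ann', 'Bob']
print([c.title for c in ann.courses])   # ['Databases', 'Networks']
print([s.name for s in db.students])    # ['Ann', 'Bob']
```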

OBJECT-ORIENTED DATABASES (OODB) & RELATIONAL DATABASES

TRANSACTION PROCESSING

Transaction

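A transaction is a collection of actions that make consistent transformations of system states while preserving system consistency. Transactions provide:

→ concurrency transparency: concurrent transactions behave as if they ran one at a time
→ failure transparency: a transaction's effects either survive failures completely or are undone completely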
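A small funds transfer shows failure transparency in action. The sketch uses Python's built-in sqlite3 module purely as a convenient transactional engine. SQLite is relational, but the transaction concept is the same. The account table and amounts are hypothetical.

The connection's context manager commits if the block succeeds and rolls back if an exception escapes. A failed transfer therefore leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst and debit src atomically: both updates happen, or neither."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:   # CHECK (balance >= 0) violated
        return False


ok1 = transfer(conn, 1, 2, 30)    # succeeds
ok2 = transfer(conn, 2, 1, 500)   # debit fails, so the credit is rolled back too
print(ok1, ok2)                   # True False
print(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())
# [(1, 70), (2, 80)]
```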

INTRODUCTION TO OBJECT ORIENTED DATA BASES

Object Databases

• Became commercially popular in the mid-1990s
• You can store the data in the same format as you use it: no paradigm shift.
• Did not reach their full potential until the classes they store were decoupled from the database schema.
• Open-source implementations are available, so a low-cost solution now exists.

What is an Object-Oriented Database (OODB)?

• A database system that incorporates all the important object-oriented concepts
• Some additional features:
  o unique object identifiers
  o persistent object handling
• Couples object-oriented programming (OOP) principles with database management system (DBMS) principles
  o provides access to persisted objects using the same OO programming language
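Unique object identifiers and persistent object handling can be sketched with Python's standard pickle module. ObjectStore below is a hypothetical toy, not the API of any OODBMS. It assigns OIDs from a counter and writes the object graph to a file. A later run can then reload objects by OID with shared references intact.

```python
import itertools
import os
import pickle
import tempfile


class ObjectStore:
    """Hypothetical minimal object store: gives every stored object a unique
    OID and persists the whole object graph to a file with pickle."""

    def __init__(self, path):
        self.path = path
        self.objects = {}                    # OID -> object
        self._oids = itertools.count(1)

    def add(self, obj):
        oid = next(self._oids)               # unique within this store
        self.objects[oid] = obj
        return oid

    def get(self, oid):
        return self.objects[oid]

    def commit(self):
        # A single pickle.dump call keeps shared references shared on reload
        with open(self.path, "wb") as f:
            pickle.dump(self.objects, f)

    @classmethod
    def load(cls, path):
        store = cls(path)
        with open(path, "rb") as f:
            store.objects = pickle.load(f)
        # Continue numbering after the highest OID already in use
        store._oids = itertools.count(max(store.objects, default=0) + 1)
        return store


class Author:
    def __init__(self, name):
        self.name = name


class Book:
    def __init__(self, title, author):
        self.title = title
        self.author = author


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "books.db")
    store = ObjectStore(path)
    smith = Author("Smith")
    b1 = store.add(Book("Databases", smith))
    b2 = store.add(Book("Networks", smith))
    store.commit()

    reopened = ObjectStore.load(path)        # e.g. in a later program run
    x, y = reopened.get(b1), reopened.get(b2)
    print(x.title, x.author.name)            # Databases Smith
    print(x.author is y.author)              # True: one shared Author object
```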

Advantages of OODBS

• The designer can specify the structure of objects and their behavior (methods)
• Better interaction with object-oriented languages such as Java and C++
• Definition of complex and user-defined types
• Encapsulation of operations and user-defined methods

Object Database Vendors

• Matisse Software Inc.
• Objectivity Inc.
• Poet's FastObjects
• Computer Associates
• eXcelon Corporation
• db4o

1.4 QUERY PROCESSING

• using a client-server architecture
• the user creates a query
• the client parses it and sends it to the server(s) (SQL?)
• the servers return the appropriate tables
• the client combines them into one table
• issue of data transfer cost over a network
  o optimize the query to transfer the least amount of data
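The sketch below follows the client-server flow above and shows why optimization targets transfer volume. The EMP fragments, the two servers and the predicate are hypothetical. The sketch counts the rows shipped with and without pushing the selection down to the servers.

```python
# Hypothetical horizontal fragments of an EMP(eno, name, city, salary) table
SERVERS = {
    "server_a": [("E1", "Ann", "Paris", 4000), ("E2", "Bob", "Paris", 3000)],
    "server_b": [("E3", "Cid", "Rome", 5000), ("E4", "Dee", "Rome", 2500)],
}


def server_execute(rows, predicate):
    """Runs at a server: apply the selection locally, return matching rows only."""
    return [r for r in rows if predicate(r)]


def client_query(predicate, push_down=True):
    """Client sends the query to every server and combines the partial results.
    Returns (result rows, number of rows shipped over the network)."""
    result, shipped = [], 0
    for rows in SERVERS.values():
        part = server_execute(rows, predicate) if push_down else list(rows)
        shipped += len(part)
        result.extend(r for r in part if predicate(r))   # client-side combine
    return result, shipped


def high_salary(row):
    return row[3] > 3500


rows, shipped = client_query(high_salary)
_, shipped_all = client_query(high_salary, push_down=False)
print([r[0] for r in rows], shipped, shipped_all)   # ['E1', 'E3'] 2 4
```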

Query Processing Components

• Query language that is used
  o SQL: “intergalactic dataspeak”
• Query execution methodology
  o the steps that one goes through in executing high-level (declarative) user queries
• Query optimization
  o how do we determine the “best” execution plan?

Query Optimization Objectives

• Minimize a cost function
  o I/O cost + CPU cost + communication cost
• These might have different weights in different distributed environments
• Wide area networks
  o communication cost will dominate
    ▪ low bandwidth
    ▪ low speed
    ▪ high protocol overhead
  o most algorithms ignore all other cost components
• Local area networks
  o communication cost not that dominant
  o total cost function should be considered
• Can also maximize throughput
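The weighted cost function can be sketched as below. Splitting communication into a per-message term and a per-byte term is a common textbook refinement. Every number here is hypothetical: the plan profiles and the unit costs for LAN and WAN. The example only shows how the same two plans can swap ranks when communication dominates.

```python
from dataclasses import dataclass


@dataclass
class PlanCost:
    ios: float     # page I/Os
    insts: float   # CPU instructions
    msgs: float    # messages sent
    nbytes: float  # bytes transferred


def total_cost(p, w):
    """Weighted sum: I/O cost + CPU cost + communication cost,
    with communication split into per-message and per-byte terms."""
    io = w["io"] * p.ios
    cpu = w["cpu"] * p.insts
    communication = w["msg"] * p.msgs + w["byte"] * p.nbytes
    return io + cpu + communication


# Hypothetical plans: A ships more data, B does more local work
plans = {
    "A": PlanCost(ios=100, insts=1_000, msgs=2, nbytes=100_000),
    "B": PlanCost(ios=400, insts=5_000, msgs=4, nbytes=10_000),
}
# Hypothetical unit costs: on a WAN, messages and bytes are far more expensive
weights = {
    "LAN": {"io": 1.0, "cpu": 0.001, "msg": 1.0, "byte": 0.0001},
    "WAN": {"io": 1.0, "cpu": 0.001, "msg": 50.0, "byte": 0.01},
}

best = {net: min(plans, key=lambda name: total_cost(plans[name], w))
        for net, w in weights.items()}
print(best)   # {'LAN': 'A', 'WAN': 'B'}
```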

Query Optimization Issues – Types of Optimizers

• Exhaustive search
  o cost-based
  o optimal
  o combinatorial complexity in the number of relations
• Heuristics
  o not optimal
  o regroup common sub-expressions
  o perform selection and projection first
  o replace a join by a series of semijoins
  o reorder operations to reduce intermediate relation size
  o optimize individual operations
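The sketch below contrasts the two optimizer types. It searches all left-deep join orders exhaustively and compares the result with a simple greedy heuristic. The relations, cardinalities and selectivities are hypothetical. The cost model, the sum of intermediate result sizes, is also a hypothetical simplification.

```python
from itertools import permutations

# Hypothetical statistics: cardinalities and join selectivities between relations
CARD = {"EMP": 1000, "ASG": 5000, "PROJ": 200, "PAY": 10}
SEL = {
    frozenset(("EMP", "ASG")): 0.001,
    frozenset(("ASG", "PROJ")): 0.005,
    frozenset(("EMP", "PAY")): 0.1,
}


def join_size(joined, size, rel):
    """Estimated size after joining `rel` to an intermediate result of `size`
    tuples built from the relations in `joined`."""
    size *= CARD[rel]
    for r in joined:
        size *= SEL.get(frozenset((r, rel)), 1.0)   # no predicate: Cartesian product
    return size


def plan_cost(order):
    """Cost of a left-deep join order = sum of intermediate result sizes."""
    joined, size, cost = [order[0]], CARD[order[0]], 0.0
    for rel in order[1:]:
        size = join_size(joined, size, rel)
        joined.append(rel)
        cost += size
    return cost


def exhaustive(relations):
    """Cost-based exhaustive search: optimal for this model, but tries n! orders."""
    return min(permutations(relations), key=plan_cost)


def greedy(relations):
    """Heuristic: start with the smallest relation, then repeatedly add the
    relation that yields the smallest next intermediate result."""
    remaining = sorted(relations)
    first = min(remaining, key=CARD.get)
    order, size = [first], CARD[first]
    remaining.remove(first)
    while remaining:
        nxt = min(remaining, key=lambda r: join_size(order, size, r))
        size = join_size(order, size, nxt)
        order.append(nxt)
        remaining.remove(nxt)
    return tuple(order)


best = exhaustive(CARD)
heuristic = greedy(CARD)
print(len(list(permutations(CARD))))   # 24 orders examined
print(best, plan_cost(best))
print(heuristic, plan_cost(heuristic))
```

With these particular statistics, the greedy order happens to reach the same cost as the exhaustive search. With other statistics it need not, which is exactly the trade-off listed above.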

Optimization Granularity

• Single query at a time
  o cannot use common intermediate results
• Multiple queries at a time
  o efficient if many similar queries
  o decision space is much larger

Optimization Timing

• Static
  o compilation ⇒ optimize prior to execution
  o difficult to estimate the size of the intermediate results ⇒ error propagation
  o can amortize over many executions
  o R*
• Dynamic
  o run-time optimization
  o exact information on the intermediate relation sizes
  o have to reoptimize for multiple executions
  o Distributed INGRES
• Hybrid
  o compile using a static algorithm
  o if the error in estimated sizes > threshold, reoptimize at run time
  o MERMAID
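The hybrid strategy can be sketched as a static choice that is revisited at run time when the size estimate proves badly wrong. The 50% threshold and the toy choose_join_method rule, with its cut-off of 1000 tuples, are hypothetical.

```python
THRESHOLD = 0.5  # hypothetical: reoptimize if the estimate is off by more than 50%


def choose_join_method(outer_size):
    """Toy optimizer: pick a join method from the outer input size."""
    return "nested loop" if outer_size < 1000 else "hash join"


def run_hybrid(estimated_size, actual_rows):
    plan = choose_join_method(estimated_size)      # static: chosen at compile time
    actual_size = len(actual_rows)                 # known only at run time
    error = abs(actual_size - estimated_size) / max(estimated_size, 1)
    if error > THRESHOLD:
        return choose_join_method(actual_size), True   # reoptimized at run time
    return plan, False


print(run_hybrid(500, range(520)))    # ('nested loop', False)
print(run_hybrid(500, range(5000)))   # ('hash join', True)
```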

Statistics

• Relation
  o cardinality
  o size of a tuple
  o fraction of tuples participating in a join with another relation
• Attribute
  o cardinality of domain
  o actual number of distinct values
• Common assumptions
  o independence between different attribute values
  o uniform distribution of attribute values within their domain
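Under the two common assumptions, selectivities can be estimated from the statistics listed above. The sketch makes three estimates:

• equality predicates use the standard uniform-distribution estimate
• conjunctions multiply their selectivities (independence)
• join size uses a join selectivity factor

Estimating that factor as 1/max(V(R,A), V(S,A)) is a common textbook choice. All the statistics here are hypothetical.

```python
import math


def eq_selectivity(distinct_values):
    """sel(A = value) = 1 / V(A), assuming a uniform distribution of values."""
    return 1.0 / distinct_values


def and_selectivity(*selectivities):
    """sel(p1 AND p2 AND ...) = product of selectivities, assuming independence."""
    return math.prod(selectivities)


def join_cardinality(card_r, card_s, distinct_r, distinct_s):
    """card(R join S) = SF_J * card(R) * card(S), using the common estimate
    SF_J = 1 / max(V(R, A), V(S, A)) for an equijoin on attribute A."""
    sf_j = 1.0 / max(distinct_r, distinct_s)
    return sf_j * card_r * card_s


# Hypothetical statistics for EMP(ENO, TITLE, CITY) and ASG(ENO, PNO)
card_emp, card_asg = 10_000, 20_000
sel = and_selectivity(eq_selectivity(50), eq_selectivity(20))  # TITLE = .. AND CITY = ..
expected_rows = sel * card_emp
join_est = join_cardinality(card_emp, card_asg, 10_000, 8_000)
print(round(sel, 6), round(expected_rows), round(join_est))   # 0.001 10 20000
```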

Decision Sites

• Centralized
  o single site determines the “best” schedule
  o simple
  o needs knowledge about the entire distributed database
• Distributed
  o cooperation among sites to determine the schedule
  o needs only local information
  o cost of cooperation
• Hybrid
  o one site determines the global schedule
  o each site optimizes the local subqueries

Network Topology

• Wide area networks (WAN) – point-to-point
  o characteristics
    ▪ low bandwidth
    ▪ low speed
    ▪ high protocol overhead
  o communication cost will dominate; ignore all other cost factors
  o global schedule to minimize communication cost
  o local schedules according to centralized query optimization
• Local area networks (LAN)
  o communication cost not that dominant
  o total cost function should be considered
  o broadcasting can be exploited (joins)
  o special algorithms exist for star networks

Step 1 – Query Decomposition

• Input: calculus query on global relations
• Normalization
  o manipulate query quantifiers and qualification
• Analysis
  o detect and reject “incorrect” queries
  o possible for only a subset of relational calculus
• Simplification
  o eliminate redundant predicates
• Restructuring
  o calculus query ⇒ algebraic query
  o more than one translation is possible
  o use transformation rules
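Two of the decomposition steps, simplification and semantic analysis, can be sketched on a qualification represented as a list of simple predicates. The representation and the sample query are hypothetical. The contradiction check covers only strict range predicates on real-valued attributes, far less than a real analyzer does.

```python
def simplify(conjuncts):
    """Eliminate redundant predicates using idempotency (p AND p == p),
    keeping the first occurrence of each predicate in order."""
    return list(dict.fromkeys(conjuncts))


def is_contradictory(conjuncts):
    """Return True if strict range predicates on the same real-valued attribute
    can never hold together (e.g. salary > 5000 AND salary < 3000).
    Only '<' and '>' are analysed; False means 'no contradiction found'."""
    lower, upper = {}, {}
    for attr, op, value in conjuncts:
        if op == ">":
            lower[attr] = max(lower.get(attr, value), value)
        elif op == "<":
            upper[attr] = min(upper.get(attr, value), value)
    return any(lower[a] >= upper[a] for a in lower.keys() & upper.keys())


# Hypothetical qualification of a query on EMP: (attribute, operator, constant)
query = [
    ("title", "=", "Engineer"),
    ("salary", ">", 5000),
    ("title", "=", "Engineer"),    # redundant duplicate
    ("salary", "<", 3000),
]
simplified = simplify(query)
print(len(simplified))                 # 3
print(is_contradictory(simplified))    # True: reject, the answer is always empty
```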

Step 2 – Data Localization

• Input: algebraic query on distributed relations
• Determine which fragments are involved
• Localization program
  o substitute for each global relation its materialization program
  o optimize
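Data localization for a horizontally fragmented relation can be sketched as follows. The global relation is the union of its fragments. Fragments whose defining predicate cannot overlap the query's selection are eliminated. The fragmentation of EMP by salary range and the rows are hypothetical.

```python
# Hypothetical horizontal fragmentation of EMP(ENO, SAL) by salary range [low, high)
FRAGMENTS = [("EMP1", 0, 3000), ("EMP2", 3000, 6000), ("EMP3", 6000, float("inf"))]
DATA = {
    "EMP1": [("E1", 2000), ("E2", 2800)],
    "EMP2": [("E3", 4500), ("E4", 5500)],
    "EMP3": [("E5", 9000)],
}


def localize(q_low, q_high):
    """Localization program: EMP = EMP1 UNION EMP2 UNION EMP3, reduced to the
    fragments whose range overlaps the query range [q_low, q_high)."""
    return [name for name, low, high in FRAGMENTS if low < q_high and q_low < high]


def run(q_low, q_high, fragments):
    """Evaluate SELECT * FROM EMP WHERE q_low <= SAL < q_high over the given fragments."""
    return sorted(r for f in fragments for r in DATA[f] if q_low <= r[1] < q_high)


print(localize(4000, 5000))                   # ['EMP2']
print(run(4000, 5000, localize(4000, 5000)))  # [('E3', 4500)]
print(run(4000, 5000, localize(4000, 5000)) == run(4000, 5000, list(DATA)))  # True
```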

Step 3 – Global Query Optimization

• Input: fragment query
• Find the best (not necessarily optimal) global schedule
  o minimize a cost function
  o distributed join processing
    ▪ bushy vs. linear trees
    ▪ which relation to ship where?
    ▪ ship-whole vs. ship-as-needed
  o decide on the use of semijoins
    ▪ a semijoin saves on communication at the expense of more local processing
  o join methods
    ▪ nested loop vs. ordered joins (merge join or hash join)
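The semijoin trade-off can be sketched by counting what each strategy ships. The relations R and S and their placement are hypothetical. The size measure, a count of attribute values rather than bytes, is a simplification. The sketch also ignores message counts, which the semijoin increases.

```python
# Hypothetical relations: R(ENO, NAME, PAD) at site 1, S(ENO, PNO) at site 2
R_site1 = [(eno, f"name{eno}", "x" * 50) for eno in range(1, 1001)]
S_site2 = [(eno, "P1") for eno in (5, 17, 42)]


def size(rows):
    """Rough transfer size: total number of attribute values shipped."""
    return sum(len(r) for r in rows)


# Strategy 1 (ship-whole): send all of R to site 2 and join there
cost_ship_whole = size(R_site1)

# Strategy 2 (semijoin): send the ENO values of S to site 1, reduce R there
# (R semijoin S), then send only the reduced R to site 2 for the final join
s_keys = {eno for eno, _ in S_site2}
r_reduced = [r for r in R_site1 if r[0] in s_keys]   # extra local processing
cost_semijoin = len(s_keys) + size(r_reduced)

result = [(r, s) for r in r_reduced for s in S_site2 if r[0] == s[0]]
print(cost_ship_whole, cost_semijoin, len(result))   # 3000 12 3
```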

Centralized Query Optimization

• INGRES
  o dynamic
  o interpretive
• System R
  o static
  o exhaustive search

12 Rules of DDBMS (Date, 1987)

1. Local autonomy

2. No reliance on a central site

3. Continuous operation

4. Location independence

5. Fragmentation independence

6. Replication independence

7. Distributed Query processing

8. Distributed transaction processing

9. Hardware independence

10. Operating System independence

11. Network independence

12. DBMS independence
