Translate

Thursday, September 6, 2012

LECTURE NOTES DBMS NOTES PART 5


 
DBMS NOTES PART 5
 
61. Why is XML a better markup language than HTML?

XML is a better markup language than HTML, primarily because XML provides a clear

separation between document structure, content, and materialization. Symbols cannot be used

ambiguously with XML.

62. What are the two means to describe the content of XML documents?

DTD (Document Type Declarations) and XML Schemas. An XML document that conforms to its

DTD is called type-valid. A document can be well-formed and not be type-valid, either because

it violates the structure of its DTD or because it has no DTD. However, DTDs have limitations

and to overcome these limits XML Schemas were created. XML Schemas are XML documents

that are the preferred method for defining document structure.

63. What is the difference between simple elements and complexType elements?

Simple elements have only one data value. ComplexType elements can have multiple

elements nested within them. ComplexTypes may also have attributes. The elements

contained in a complexType may be simple or other complexTypes. ComplexTypes may also

define element sequences.

64. What is ADO.NET?

ADO.NET is a new, improved, and greatly expanded version of ADO that was developed for the

Microsoft .NET initiative. ADO.NET incorporates all of the functionality of ADO, but adds much

more. In particular, ADO.NET facilitates the transformation of XML documents to and from

database data. Most importantly, ADO.NET introduces the concept of datasets, which are inmemory,

full-function, independent databases.

65. What is a dataset?

A dataset is an in-memory database that is disconnected from any regular database, but has

all the important characteristics of a regular database. Datasets can have multiple tables,

relationships, referential integrity rules, referential integrity actions, views and the equivalent

of triggers. Datasets are needed to provide a standardized, non-proprietary means to process

database views. They are especially important for the processing of views with multiple

multivalued paths.

66. Briefly describe the four JDBC driver types that Sun defines.

Type 1 drivers provide a bridge between Java and ODBC. Types 2-4 drivers are written

entirely in Java, but differ as to how they connect to the DBMS. Type 2 drivers rely on the

DBMS product for intermachine communication, if any. Type 3 drivers translate JDBC calls into

a DBMS-independent network protocol. Type 4 drivers translate JDBC calls into a DBMSdependent

network protocol.

67. What is the difference between a Java servlet and a Java applet?

An applet is a compiled Java bytecode program that is transmitted to a browser via HTTP and

is invoked using the HTTP protocol on the user's workstation. A servlet is a Java program that

is invoked on the server to respond to HTTP requests.

68. What is the coding pattern for using a JDBC driver?

1. Load the driver.

2. Establish a connection to the database.

3. Create a statement.

4. Do something with the statement.

69. What is a Java bean?

A Java bean is a Java class that has the following three properties. (1) There are no public

instance variables. (2) All persistent values are accessed using methods named getxxx and

setxxx. (3) The bean class must either have no constructors or it must have one explicitly

defined zero-argument constructor.

70. How are surrogate keys and metadata handled in MySQL?

MySQL uses integer data types combined with the property AUTO_INCREMENT to create

surrogate keys. This creates a sequence that starts at one (1) and increases by one (1) for

each new record. MySql maintain its metadata in a database named mysql. For example, this

database maintains two tables named user and db.

71. What is a data mart?

A data mart is a collection of data smaller in scope and size than a data warehouse. It is

dedicated to data from a particular business component of business functional area. A data

mart may function as a subset of a larger data warehouse. Users of a data mart are usually

knowledgeable analysts in the business area using the data mart.

72. What is RFM analysis?

RFM analysis is a Business Intelligence (BI) reporting system that analyzes and ranks

customers based on their purchasing patterns. R refers to "how recently" a customer placed

an order, F refers to "how frequently" the customer orders, and M refers to "how much

money" the customer spends. Typically, the customers are ranked into "20%" groups and

assigned a number to represent their ranking. Thus 1 means top 20%, 2 the next 20% and so

on. In this system a score of 1 is best and a score of 5 is worst. Thus a customer with an RFM

score = 1 5 1 would be one who has ordered recently, does not order frequently, and who

makes large purchases.

73. What are the functions of a reporting system?

A reporting system has three functions: 1. Report authoring -- connecting to data sources,

creating the report structure and formatting the report. 2. Report management -- defining

who receives which reports, when they receive them and how the reports are delivered. 3.

Report delivery -- based on report management metadata, either pushing the reports to the

recipients or allowing them to be pulled by the recipients.

74. What is OLAP?

OnLine Analytical Processing (OLAP) is a Business Intelligence (BI) reporting system. OLAP

provides the user with the capability to sum, count, average and do other simple arithmetic

operations on groups of data. An OLAP report has measures and dimensions. Measures are the

data values to be displayed. Dimensions are characteristics of the measures. OLAP reports are

called OLAP cubes, although such reports are not limited to three dimensions.

75. What is market basket analysis?

Market basket analysis is a data mining technique that determines which sets of products tend

to be purchased together. A common technique uses conditional probabilities. In addition to

the basic probability that an item will be purchased, three results are of particular interest:

Support -- the probability of two items being purchased together.

Confidence -- the probability of a second item being purchased GIVEN that another item has

been purchased.

Lift -- calculated as confidence divided by a basic probability, this shows the likelihood of a

second item being purchased IF an item is purchased.

76. Explain the differences between structured data and unstructured data.

Structured data are facts concerning objects and events. The most important structured data

are numeric, character, and dates. Structured data are stored in tabular form. Unstructured

data are multimedia data such as documents, photographs, maps, images, sound, and video

clips. Unstructured data are most commonly found on Web servers and Web-enabled

databases.

77. Explain why it is still necessary to have at least some familiarity with file

processing systems even though it has become evident that traditional file

processing systems have a number of shortcomings and limitations.

Many businesses still use file processing systems today. This is especially true in the creation

of backups for a database system. In addition, if you understand some of the limitations of a

file processing system such as program-data dependence, duplication of data, limited data

sharing, lengthy development times, and excessive program maintenance, you can try and

avoid them as you design and develop a databases.

78. What are some of the disadvantages associated with conventional file processing

systems?

There are five disadvantages. Program-data dependence occurs when file descriptions need to

be changed in all programs whenever a file description changes. Duplication of data is storing

the data more than one time. Limited data sharing occurs when the files are private so no one

outside of one application can access the data. Lengthy development times exist because file

processing systems takes longer to develop. Lastly, excessive program maintenance exists

since the effort to maintain a program is larger in this environment.

79. The range of database applications can be divided into five categories. Explain

the five different categories.

Databases can support from a single user (personal database) up to supporting the requests

of the world (internet database). In between, a database can support a workgroup (a

relatively small group of people), department database (a functional unit in an organization

such as marketing), or an enterprise database (entire organization).

80. Explain the differences between an intranet and an extranet.

An Internet database is accessible by everyone who has access to a Web site. An intranet

database limits access to only people within a given organization. An extranet database limits

access to only people within a company and a company's customers and suppliers.

81. Briefly explain the five components of an Information Systems Architecture and

their relationship to data.

Processes that manipulate the data. Networks that transport the data. People who process and

use the data. Events and point in time when processes need to be performed with the data.

Reasons for events and rules to govern the processing of the data.

82. Explain the systems development life cycle.

This is the traditional method to develop, maintain, and replace information systems that

includes the project identification and selection, project initiation and planning, analysis,

logical design, physical design, implementation, and maintenance steps. The process is a

series of steps that are ideally completed in a linear fashion. In reality, the process is not

linear and the process requires steps to be revisited and an overlap of steps.

83. Explain the differences of the two principal types of packaged data models.

Universal data models are common to many organizations. These models may be useful for

similar functions that are used across companies or organizations such as purchasing and

accounting. Industry-specific data models are used by specific industries.

84. Who can make up a systems or database team? Provide a brief explanation of

each.

The team includes a system analyst who identifies the need for information services to meet

opportunities of the business, database analysts who design the database, users who monitor

that the system will meet their needs, programmers who write computer programs, database

and data administrators who have responsibility for current and future databases and other

technical experts.

85. Briefly describe the six database activities that occur during the systems

development life cycle.

The enterprise modeling that analyzes the current data processing. Conceptual data modeling

that identifies entities, relationships, and attributes. The logical database design that identifies

data integrity and security requirements. The physical database design and definition that

defines the database to a DBMS. The database implementation that installs and converts data

from prior systems. Database maintenance that fixes errors in the database and database

applications.

86. Briefly explain an ERD.

An ERD is a detailed logical representation of the data for an organization. The ERD includes

entities, attributes, relationships, and cardinalities. An ERD is the mechanism where an entityrelationship

model is displayed.

87. List some of the chrematistics of good data definitions.

Definitions are gathered from the same sources and should be accompanied diagrams. A

definition will include special conditions, examples, how the data is created, whether the data

can change, who owns the data, whether the data is optional, and whether the data can be

broken into something more atomic.

88. Explain minimum and maximum cardinality.

Minimum cardinality is the minimum number of instances of an entity that can be associated

with each instance of another entity. Maximum cardinality is the maximum number of

instances of an entity that can be associated with each instance of another entity.

89. Describe the naming of Relationships?

Relationships are verb phrases and represent actions usually in the present tense. The name

should exclude vague names such as "has". The name should explain what action is being

taken and possibly why it is important.

90. Why is modeling time-dependent data with a time stamp important?

The values of data may change. A time stamp helps to ensure that the previous value of the

data stays in the database after it has changed so that you can see the before and after values

through time. Without a time stamp, you will most likely lose some of the history.

91. Explain the difference between total specialization and partial specialization.

Total specialization exists when every instance of a supertype must also be an instance of a

subtype. Partial specialization exists when every instance of a supertype does not have to be

an instance of a subtype.

92. Explain the difference between an ERD and EER.

An EER includes everything in an ERD and an EER allows for more complex relationships than

an ERD. An EER allows for object-oriented data modeling and include supertypes and subtypes

entities and inheritance.

93. Explain the difference between the disjoint and overlap rule.

The disjoint rule states an entity instance of a supertype can only be a member of one

subtype. The overlap rule states an entity instance of a supertype can be a member of

multiple subtypes.

94. List the three types of business rules and define each of them.

A derivation is a statement that is derived from other knowledge. A structured assertion is a

statement that expresses some aspect of the static structure of an organization. An action

assertion is a statement of a constraint on the actions of an organization.

95. Explain how a scenario is used for business rules.

A scenario is used to test business rules. It is a short script that describes how a business

reacts to certain situations.

96. Explain some of the main goals of normalization.

Normalization should minimize data redundancy. It should also simplify referential integrity

constraints. Normalization will also make it easier to insert, update, and delete data. And

finally, it provides better design.

97. List some of the properties of a relation.

Relations in a database have a unique name and no multivalued attributes exist. Each row is

unique and each attribute within a relation has a unique name. The sequence of both columns

and rows is irrelevant.

98. Explain what needs to happen to convert a relation to third normal form

First you must verify that a relation is in both first normal form and second normal form. If the

relation is not, you must convert into second normal form. After a relation is in second normal

form, you must remove all transitive dependencies.

99. Describe how a supertype/subtype relationship is mapped into a relation.

A separate relation is created for each supertype and subtype. The attributes common for all

of the subtypes are assigned to the supertype. Each subtype has the primary key from the

supertype assigned to it. A subtype discriminator is added to the supertype.

100. Describe domain constraints.

Domain constraints include entity integrity and referential integrity. The domain is a set of

values that may be assigned to an attribute. The entity integrity rule states that no part of a

primary key cannot be null. Referential integrity states that each foreign key value must

match a primary key value or be null.

101. What are the four objectives of the selection of a data type?

A data type should be selected so that all possible values are represented using minimal

storage space. The data type should help to ensure data integrity and support all possible data

manipulations (i.e., cannot place a letter in a field such as salary where a number is required).

102. Describe the four types of indexes.

A unique primary index is unique and is used to find and store a row. A nonunique primary

index is not unique and is used to find a row but also where to store a row (based on its

unique primary index). A unique secondary index is unique for each row and used to find table

rows. A nonunique secondary index is not unique and used to find table rows.

103. What is denormalization and why would someone consider doing so?

Denormalization is the process of taking normalized relations and changing them so that they

are not longer normalized. This process may lead to anomalies and create data redundancy as

negative consequences. However, the revised relations should improve database performance.

104. Compare a hierarchical and network database model?

The hierarchical model is a top-down structure where each parent may have many children

but each child can have only one parent. This model supports one-to-one and one-to-many

relationships. The network model can be much more flexible than the hierarchical model since

each parent can have multiple children but each child can also have multiple parents. This

model supports one-to-one, one-to-many, and many-to-many relationships.

105. Describe the differences between vertical and horizontal portioning.

Horizontal portioning is where the rows in a relation are separated by some criteria and placed

into a new relation or file with the same layout as the original relation (in this case only the

records in each file differ). Vertical portioning is where the columns in a relation are separated

by some criteria and placed into a new relation or file with a different layout as the original

relation.

106. Explain the difference between a dynamic and materialized view.

A dynamic view may be created every time that a specific view is requested by a user. A

materialized view is created and or updated infrequently and it must be synchronized with its

associated base table(s).

107. Discuss some of the techniques that can be used to tune operational

performance.

Choosing primary and secondary keys can increase the speed of row selection, joining, and

row ordering. Selecting the appropriate file organization for base tables and indexes can also

improve performance. Clustering related rows together and maintaining statistics about tables

and indexes can lead to increased efficiency.

108. Briefly describe the three types of SQL commands.

Data definition language commands are used to create, alter, and drop tables. Data

manipulation commands are used to insert, modify, update, and query data in the database.

Data control language commands help the DBA to control the database.

109. What are the steps to follow when preparing to create a table?

1. Identify the data type, length, and precision for each attribute. 2. Identify the columns that

can accept a null value. 3. Identify the columns that need to be unique. 4. Identify primary

and related foreign keys with the parent table being created before the child. 5. Determine

default values. 6. Determine where the domain values are that need to be constrained. 7.

Create the indexes.

110. What are some disadvantages of a standard language such as SQL?

A standard language can hinder the effort to create a new language. One standard is never

enough to meet all of the business needs. A standard can be a compromise among interested

parties which can cause the standard to not be ideal. If a standard is altered by some, than

portability between platforms could be hurt.

No comments:

Post a Comment