DBMS NOTES PART 5
61. Why is
XML a better markup language than HTML?
XML is a
better markup language than HTML, primarily because XML provides a clear
separation
between document structure, content, and materialization. Symbols cannot be
used
ambiguously
with XML.
62. What are
the two means to describe the content of XML documents?
DTD
(Document Type Declarations) and XML Schemas. An XML document that conforms to
its
DTD is
called type-valid. A document can be well-formed and not be type-valid, either
because
it violates
the structure of its DTD or because it has no DTD. However, DTDs have
limitations
and to
overcome these limits XML Schemas were created. XML Schemas are XML documents
that are the
preferred method for defining document structure.
63. What is
the difference between simple elements and complexType elements?
Simple
elements have only one data value. ComplexType elements can have multiple
elements
nested within them. ComplexTypes may also have attributes. The elements
contained in
a complexType may be simple or other complexTypes. ComplexTypes may also
define
element sequences.
64. What is
ADO.NET?
ADO.NET is a
new, improved, and greatly expanded version of ADO that was developed for the
Microsoft
.NET initiative. ADO.NET incorporates all of the functionality of ADO, but adds
much
more. In
particular, ADO.NET facilitates the transformation of XML documents to and from
database
data. Most importantly, ADO.NET introduces the concept of datasets, which are
inmemory,
full-function,
independent databases.
65. What is
a dataset?
A dataset is
an in-memory database that is disconnected from any regular database, but has
all the
important characteristics of a regular database. Datasets can have multiple
tables,
relationships,
referential integrity rules, referential integrity actions, views and the
equivalent
of triggers.
Datasets are needed to provide a standardized, non-proprietary means to process
database
views. They are especially important for the processing of views with multiple
multivalued
paths.
66. Briefly
describe the four JDBC driver types that Sun defines.
Type 1
drivers provide a bridge between Java and ODBC. Types 2-4 drivers are written
entirely in
Java, but differ as to how they connect to the DBMS. Type 2 drivers rely on the
DBMS product
for intermachine communication, if any. Type 3 drivers translate JDBC calls
into
a
DBMS-independent network protocol. Type 4 drivers translate JDBC calls into a
DBMSdependent
network
protocol.
67. What is
the difference between a Java servlet and a Java applet?
An applet is
a compiled Java bytecode program that is transmitted to a browser via HTTP and
is invoked
using the HTTP protocol on the user's workstation. A servlet is a Java program
that
is invoked
on the server to respond to HTTP requests.
68. What is
the coding pattern for using a JDBC driver?
1. Load the
driver.
2. Establish
a connection to the database.
3. Create a
statement.
4. Do
something with the statement.
69. What is
a Java bean?
A Java bean
is a Java class that has the following three properties. (1) There are no
public
instance
variables. (2) All persistent values are accessed using methods named getxxx
and
setxxx. (3)
The bean class must either have no constructors or it must have one explicitly
defined
zero-argument constructor.
70. How are
surrogate keys and metadata handled in MySQL?
MySQL uses
integer data types combined with the property AUTO_INCREMENT to create
surrogate
keys. This creates a sequence that starts at one (1) and increases by one (1)
for
each new
record. MySql maintain its metadata in a database named mysql. For example,
this
database
maintains two tables named user and db.
71. What is
a data mart?
A data mart
is a collection of data smaller in scope and size than a data warehouse. It is
dedicated to
data from a particular business component of business functional area. A data
mart may
function as a subset of a larger data warehouse. Users of a data mart are
usually
knowledgeable
analysts in the business area using the data mart.
72. What is
RFM analysis?
RFM analysis
is a Business Intelligence (BI) reporting system that analyzes and ranks
customers
based on their purchasing patterns. R refers to "how recently" a
customer placed
an order, F
refers to "how frequently" the customer orders, and M refers to
"how much
money"
the customer spends. Typically, the customers are ranked into "20%"
groups and
assigned a
number to represent their ranking. Thus 1 means top 20%, 2 the next 20% and so
on. In this
system a score of 1 is best and a score of 5 is worst. Thus a customer with an
RFM
score = 1 5
1 would be one who has ordered recently, does not order frequently, and who
makes large
purchases.
73. What are
the functions of a reporting system?
A reporting
system has three functions: 1. Report authoring -- connecting to data sources,
creating the
report structure and formatting the report. 2. Report management -- defining
who receives
which reports, when they receive them and how the reports are delivered. 3.
Report
delivery -- based on report management metadata, either pushing the reports to
the
recipients
or allowing them to be pulled by the recipients.
74. What is
OLAP?
OnLine
Analytical Processing (OLAP) is a Business Intelligence (BI) reporting system.
OLAP
provides the
user with the capability to sum, count, average and do other simple arithmetic
operations
on groups of data. An OLAP report has measures and dimensions. Measures are the
data values
to be displayed. Dimensions are characteristics of the measures. OLAP reports
are
called OLAP
cubes, although such reports are not limited to three dimensions.
75. What is
market basket analysis?
Market
basket analysis is a data mining technique that determines which sets of
products tend
to be
purchased together. A common technique uses conditional probabilities. In
addition to
the basic
probability that an item will be purchased, three results are of particular
interest:
Support --
the probability of two items being purchased together.
Confidence
-- the probability of a second item being purchased GIVEN that another item has
been
purchased.
Lift --
calculated as confidence divided by a basic probability, this shows the
likelihood of a
second item
being purchased IF an item is purchased.
76. Explain
the differences between structured data and unstructured data.
Structured
data are facts concerning objects and events. The most important structured data
are numeric,
character, and dates. Structured data are stored in tabular form. Unstructured
data are
multimedia data such as documents, photographs, maps, images, sound, and video
clips.
Unstructured data are most commonly found on Web servers and Web-enabled
databases.
77. Explain
why it is still necessary to have at least some familiarity with file
processing
systems even though it has become evident that traditional file
processing
systems have a number of shortcomings and limitations.
Many
businesses still use file processing systems today. This is especially true in
the creation
of backups
for a database system. In addition, if you understand some of the limitations
of a
file
processing system such as program-data dependence, duplication of data, limited
data
sharing,
lengthy development times, and excessive program maintenance, you can try and
avoid them
as you design and develop a databases.
78. What are
some of the disadvantages associated with conventional file processing
systems?
There are
five disadvantages. Program-data dependence occurs when file descriptions need
to
be changed
in all programs whenever a file description changes. Duplication of data is
storing
the data
more than one time. Limited data sharing occurs when the files are private so
no one
outside of
one application can access the data. Lengthy development times exist because
file
processing
systems takes longer to develop. Lastly, excessive program maintenance exists
since the
effort to maintain a program is larger in this environment.
79. The
range of database applications can be divided into five categories. Explain
the five
different categories.
Databases
can support from a single user (personal database) up to supporting the
requests
of the world
(internet database). In between, a database can support a workgroup (a
relatively
small group of people), department database (a functional unit in an
organization
such as
marketing), or an enterprise database (entire organization).
80. Explain
the differences between an intranet and an extranet.
An Internet
database is accessible by everyone who has access to a Web site. An intranet
database
limits access to only people within a given organization. An extranet database
limits
access to
only people within a company and a company's customers and suppliers.
81. Briefly
explain the five components of an Information Systems Architecture and
their
relationship to data.
Processes
that manipulate the data. Networks that transport the data. People who process
and
use the
data. Events and point in time when processes need to be performed with the
data.
Reasons for
events and rules to govern the processing of the data.
82. Explain
the systems development life cycle.
This is the
traditional method to develop, maintain, and replace information systems that
includes the
project identification and selection, project initiation and planning,
analysis,
logical
design, physical design, implementation, and maintenance steps. The process is
a
series of
steps that are ideally completed in a linear fashion. In reality, the process
is not
linear and
the process requires steps to be revisited and an overlap of steps.
83. Explain
the differences of the two principal types of packaged data models.
Universal
data models are common to many organizations. These models may be useful for
similar
functions that are used across companies or organizations such as purchasing
and
accounting.
Industry-specific data models are used by specific industries.
84. Who can
make up a systems or database team? Provide a brief explanation of
each.
The team
includes a system analyst who identifies the need for information services to
meet
opportunities
of the business, database analysts who design the database, users who monitor
that the
system will meet their needs, programmers who write computer programs, database
and data
administrators who have responsibility for current and future databases and
other
technical
experts.
85. Briefly
describe the six database activities that occur during the systems
development
life cycle.
The
enterprise modeling that analyzes the current data processing. Conceptual data
modeling
that
identifies entities, relationships, and attributes. The logical database design
that identifies
data
integrity and security requirements. The physical database design and
definition that
defines the
database to a DBMS. The database implementation that installs and converts data
from prior
systems. Database maintenance that fixes errors in the database and database
applications.
86. Briefly
explain an ERD.
An ERD is a
detailed logical representation of the data for an organization. The ERD
includes
entities,
attributes, relationships, and cardinalities. An ERD is the mechanism where an
entityrelationship
model is
displayed.
87. List
some of the chrematistics of good data definitions.
Definitions
are gathered from the same sources and should be accompanied diagrams. A
definition will
include special conditions, examples, how the data is created, whether the data
can change,
who owns the data, whether the data is optional, and whether the data can be
broken into
something more atomic.
88. Explain
minimum and maximum cardinality.
Minimum
cardinality is the minimum number of instances of an entity that can be
associated
with each
instance of another entity. Maximum cardinality is the maximum number of
instances of
an entity that can be associated with each instance of another entity.
89. Describe
the naming of Relationships?
Relationships
are verb phrases and represent actions usually in the present tense. The name
should
exclude vague names such as "has". The name should explain what
action is being
taken and
possibly why it is important.
90. Why is
modeling time-dependent data with a time stamp important?
The values
of data may change. A time stamp helps to ensure that the previous value of the
data stays
in the database after it has changed so that you can see the before and after
values
through
time. Without a time stamp, you will most likely lose some of the history.
91. Explain
the difference between total specialization and partial specialization.
Total
specialization exists when every instance of a supertype must also be an
instance of a
subtype.
Partial specialization exists when every instance of a supertype does not have
to be
an instance
of a subtype.
92. Explain
the difference between an ERD and EER.
An EER
includes everything in an ERD and an EER allows for more complex relationships
than
an ERD. An
EER allows for object-oriented data modeling and include supertypes and
subtypes
entities and
inheritance.
93. Explain
the difference between the disjoint and overlap rule.
The disjoint
rule states an entity instance of a supertype can only be a member of one
subtype. The
overlap rule states an entity instance of a supertype can be a member of
multiple
subtypes.
94. List the
three types of business rules and define each of them.
A derivation
is a statement that is derived from other knowledge. A structured assertion is
a
statement
that expresses some aspect of the static structure of an organization. An
action
assertion is
a statement of a constraint on the actions of an organization.
95. Explain
how a scenario is used for business rules.
A scenario
is used to test business rules. It is a short script that describes how a
business
reacts to
certain situations.
96. Explain
some of the main goals of normalization.
Normalization
should minimize data redundancy. It should also simplify referential integrity
constraints.
Normalization will also make it easier to insert, update, and delete data. And
finally, it
provides better design.
97. List
some of the properties of a relation.
Relations in
a database have a unique name and no multivalued attributes exist. Each row is
unique and
each attribute within a relation has a unique name. The sequence of both
columns
and rows is
irrelevant.
98. Explain
what needs to happen to convert a relation to third normal form
First you
must verify that a relation is in both first normal form and second normal
form. If the
relation is
not, you must convert into second normal form. After a relation is in second
normal
form, you
must remove all transitive dependencies.
99. Describe
how a supertype/subtype relationship is mapped into a relation.
A separate
relation is created for each supertype and subtype. The attributes common for
all
of the
subtypes are assigned to the supertype. Each subtype has the primary key from
the
supertype
assigned to it. A subtype discriminator is added to the supertype.
100.
Describe domain constraints.
Domain
constraints include entity integrity and referential integrity. The domain is a
set of
values that
may be assigned to an attribute. The entity integrity rule states that no part
of a
primary key
cannot be null. Referential integrity states that each foreign key value must
match a
primary key value or be null.
101. What
are the four objectives of the selection of a data type?
A data type
should be selected so that all possible values are represented using minimal
storage
space. The data type should help to ensure data integrity and support all
possible data
manipulations
(i.e., cannot place a letter in a field such as salary where a number is
required).
102.
Describe the four types of indexes.
A unique
primary index is unique and is used to find and store a row. A nonunique
primary
index is not
unique and is used to find a row but also where to store a row (based on its
unique
primary index). A unique secondary index is unique for each row and used to
find table
rows. A
nonunique secondary index is not unique and used to find table rows.
103. What is
denormalization and why would someone consider doing so?
Denormalization
is the process of taking normalized relations and changing them so that they
are not
longer normalized. This process may lead to anomalies and create data
redundancy as
negative
consequences. However, the revised relations should improve database
performance.
104. Compare
a hierarchical and network database model?
The
hierarchical model is a top-down structure where each parent may have many
children
but each
child can have only one parent. This model supports one-to-one and one-to-many
relationships.
The network model can be much more flexible than the hierarchical model since
each parent
can have multiple children but each child can also have multiple parents. This
model
supports one-to-one, one-to-many, and many-to-many relationships.
105.
Describe the differences between vertical and horizontal portioning.
Horizontal
portioning is where the rows in a relation are separated by some criteria and
placed
into a new
relation or file with the same layout as the original relation (in this case
only the
records in
each file differ). Vertical portioning is where the columns in a relation are
separated
by some
criteria and placed into a new relation or file with a different layout as the
original
relation.
106. Explain
the difference between a dynamic and materialized view.
A dynamic
view may be created every time that a specific view is requested by a user. A
materialized
view is created and or updated infrequently and it must be synchronized with
its
associated
base table(s).
107. Discuss
some of the techniques that can be used to tune operational
performance.
Choosing
primary and secondary keys can increase the speed of row selection, joining,
and
row ordering.
Selecting the appropriate file organization for base tables and indexes can
also
improve
performance. Clustering related rows together and maintaining statistics about
tables
and indexes
can lead to increased efficiency.
108. Briefly
describe the three types of SQL commands.
Data
definition language commands are used to create, alter, and drop tables. Data
manipulation
commands are used to insert, modify, update, and query data in the database.
Data control
language commands help the DBA to control the database.
109. What
are the steps to follow when preparing to create a table?
1. Identify
the data type, length, and precision for each attribute. 2. Identify the
columns that
can accept a
null value. 3. Identify the columns that need to be unique. 4. Identify primary
and related
foreign keys with the parent table being created before the child. 5. Determine
default
values. 6. Determine where the domain values are that need to be constrained.
7.
Create the
indexes.
110. What
are some disadvantages of a standard language such as SQL?
A standard
language can hinder the effort to create a new language. One standard is never
enough to
meet all of the business needs. A standard can be a compromise among interested
parties
which can cause the standard to not be ideal. If a standard is altered by some,
than
portability
between platforms could be hurt.
No comments:
Post a Comment