Pune University BE(CSE) Question Papers May June 2014 - 2008 Pattern / Previous year BE(Computer Science Engineering) question papers of Pune University / BE CSE Advanced Databases Question Paper / Pune University Questions with Answers
UNIVERSITY OF PUNE
B.E. (CSE) Examination - 2014
ADVANCED DATABASES
(2008 Pattern)
[Time : 3 Hours] [Max. Marks : 100]
[Total No. of Questions : 12] [Total No. of Printed Pages : 3]
Instructions:
(1) Answers to the two sections should be written in separate books.
(2) Neat diagrams must be drawn wherever necessary.
(3) Assume suitable data, if necessary.
(4) Section I: Q1 or Q2, Q3 or Q4, Q5 or Q6
(5) Section II: Q7 or Q8, Q9 or Q10, Q11 or Q12
SECTION-I
OR
Q2) a) What is meant by Skew? Explain the different ways of handling Skew. [10]
Answer:
Skew: When a single task is broken down into a number of smaller parallel tasks, it is very hard to make them equal in size. The running time of the whole task is then determined by the slowest processor, that is, the one that handles the largest sub-task. This uneven distribution of work is called skew. For example, if a task of size 100 is divided into 10 parts and the division is skewed, some parts may be smaller than 10 and some larger than 10; if even one part happens to be of size 20, the speedup obtained by running the parts in parallel is only five (100 / 20), instead of the ten we would have hoped for.
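The arithmetic in this example can be checked with a small sketch. It assumes one processor per sub-task and a running time proportional to sub-task size; the function name and the sample task sizes are illustrative and not part of the original answer.

```python
def parallel_speedup(subtask_sizes):
    """Speedup over sequential execution.

    Assumes one processor per sub-task and running time proportional to
    sub-task size, so the parallel run takes as long as the largest sub-task.
    """
    total_work = sum(subtask_sizes)      # sequential running time
    slowest = max(subtask_sizes)         # parallel running time
    return total_work / slowest


# A task of size 100 split evenly across 10 processors.
balanced = [10] * 10

# The same task with a skewed split: one sub-task of size 20.
skewed = [20, 10, 10, 10, 10, 10, 10, 10, 5, 5]

print(parallel_speedup(balanced))  # 10.0
print(parallel_speedup(skewed))    # 5.0
```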
Ways to handle skew:
- Use range partitioning instead of hash partitioning
  - Ensure that each range gets the same number of tuples
  - Example: {1, 1, 1, 2, 3, 4, 5, 6} --> [1,2] and [3,6]
- Virtual processor partitioning
  - Create more partitions than nodes
  - Be smart about scheduling the partitions onto the nodes
- Use subset-replicate (i.e., “skewedJoin”)
  - Given an extremely common value ‘v’
  - Distribute R tuples with value v randomly across k nodes (R is the build relation)
  - Replicate S tuples with value v to the same k machines (S is the probe relation)
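The first two techniques can be sketched roughly as follows. This is an illustration under stated assumptions, not a prescribed implementation: `balanced_ranges` and `schedule_virtual_partitions` are hypothetical helper names, the range split works on a fully sorted in-memory list, and the scheduler uses a simple greedy "largest partition to least-loaded node" rule as one possible way of being smart about scheduling.

```python
import heapq


def balanced_ranges(values, num_partitions):
    """Split values into ranges holding roughly the same number of tuples.

    Equal values are never split across two ranges, so a very frequent
    value can still make one range larger than the others.
    """
    ordered = sorted(values)
    n = len(ordered)
    ranges = []
    start = 0
    for p in range(1, num_partitions + 1):
        if start >= n:
            break
        end = max((p * n) // num_partitions, start + 1)
        # Keep duplicates of the boundary value inside the same range.
        while end < n and ordered[end] == ordered[end - 1]:
            end += 1
        ranges.append((ordered[start], ordered[end - 1]))
        start = end
    return ranges


def schedule_virtual_partitions(partition_sizes, num_nodes):
    """Assign many small partitions to few nodes, largest partition first,
    always onto the node with the least work so far."""
    heap = [(0, node) for node in range(num_nodes)]  # (current load, node id)
    heapq.heapify(heap)
    assignment = {node: [] for node in range(num_nodes)}
    by_size = sorted(enumerate(partition_sizes), key=lambda item: -item[1])
    for partition_id, size in by_size:
        load, node = heapq.heappop(heap)
        assignment[node].append(partition_id)
        heapq.heappush(heap, (load + size, node))
    return assignment


print(balanced_ranges([1, 1, 1, 2, 3, 4, 5, 6], 2))  # [(1, 2), (3, 6)]

sizes = [20, 5, 5, 5, 5, 10, 10, 10]   # eight virtual partitions, two nodes
plan = schedule_virtual_partitions(sizes, 2)
loads = {node: sum(sizes[p] for p in parts) for node, parts in plan.items()}
print(loads)  # {0: 35, 1: 35}
```

With eight virtual partitions on two nodes, the one large partition of size 20 is offset by smaller ones, so both nodes end up with the same load.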
OR
Q4) a) Remote backup systems and replication in Distributed Databases are two alternative approaches for providing high availability. Explain the difference between them. [6]
Q5) a) How is XML data stored in Relational Databases? Explain. [8]
b) Explain in detail XML schema. [8]
OR
Q6) a) Explain in detail XQuery. [10]
b) Write a short note on: XML applications. [6]
SECTION-II
Q7) a) In real-world data, tuples with missing values for some attributes are a common occurrence. Describe various methods for handling this problem. [10]
b) Explain with a suitable example any two operations on multidimensional data. [6]
OR
Q8) a) Explain the following with respect to data preprocessing: [6]
Answer:
i) Data reduction
Data reduction obtains a reduced representation of the data set that is much smaller in volume, yet produces the same (or almost the same) analytical results. Data reduction strategies include dimensionality reduction and numerosity reduction.
ii) Data Discretization
Data discretization is defined as a process of converting continuous data attribute values into a finite set of intervals with minimal loss of information.
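As a rough illustration of discretization, the sketch below maps continuous values to a fixed number of equal-width intervals. Equal-width binning is only one simple strategy and makes no attempt to minimize information loss; the function name and the sample ages are assumptions made for this example.

```python
def equal_width_bins(values, num_bins):
    """Map each continuous value to the index of an equal-width interval.

    Intervals cover [min, max] of the data; the maximum value is placed
    in the last interval.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]       # all values identical: one interval
    width = (hi - lo) / num_bins
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


ages = [22, 25, 37, 41, 58, 64, 70]
print(equal_width_bins(ages, 3))  # [0, 0, 0, 1, 2, 2, 2]
```

Here the intervals are [22, 38), [38, 54) and [54, 70], so seven distinct ages are reduced to three interval labels.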
b) Explain the different conceptual schema designs for a data warehouse with a suitable example. [10]
Q9) a) Explain classification and prediction with a suitable example. [8]
b) Explain outlier analysis. [8]
OR
Q10) a) How are decision trees used for classification? Explain with an example. [8]
b) State and explain the Apriori algorithm. [8]
Q11) a) Define Information Retrieval system. How is it different from a database system? [6]
Answer:
Relational Database Management Systems (RDBMS):
- Semantics of each object are well defined
- Complex query languages (e.g., SQL)
- Exact retrieval of what you ask for
- Emphasis on efficiency
Information Retrieval (IR):
- Semantics of objects are subjective, not well defined
- Usually simple query languages (e.g., natural language query)
- You should get what you want, even if the query is poorly formed
- Effectiveness is the primary issue, although efficiency is important
b) Explain the following terms [12]
i) Web Crawlers
ii) Vector space model
Answer:
The Vector Space Model (VSM) is based on the notion of similarity. The model assumes that the relevance of a document to a query is roughly equal to the document-query similarity. Both the documents and the queries are represented using the bag-of-words model. For a document collection, we first determine a set of terms (i.e., the vocabulary) and order the terms. Next, documents are represented as n-dimensional vectors, where each dimension corresponds to a term. Terms are weighted using tf-idf or BM25 methods. Queries are also represented as n-dimensional vectors.
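A minimal sketch of this model is given below. It assumes whitespace tokenization, the plain weighting tf × log(N / df), and cosine similarity as the document-query similarity measure; these choices, the function names and the three sample documents are illustrative assumptions, and BM25 is not shown.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Ordered set of terms appearing in the collection."""
    return sorted({term for doc in docs for term in doc.lower().split()})


def idf_weights(docs, vocab):
    """idf(t) = log(N / df(t)), where df(t) is the number of docs containing t."""
    n = len(docs)
    doc_terms = [set(doc.lower().split()) for doc in docs]
    return {t: math.log(n / sum(t in terms for terms in doc_terms)) for t in vocab}


def to_vector(text, vocab, idf):
    """Bag-of-words vector with one tf-idf weight per vocabulary term."""
    counts = Counter(text.lower().split())
    return [counts[t] * idf[t] for t in vocab]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Document indices ordered by decreasing similarity to the query."""
    vocab = build_vocabulary(docs)
    idf = idf_weights(docs, vocab)
    query_vec = to_vector(query, vocab, idf)
    scores = [cosine(query_vec, to_vector(doc, vocab, idf)) for doc in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])


docs = [
    "parallel database skew",
    "xml query language",
    "database query optimization",
]
print(rank("xml query", docs))  # [1, 2, 0]
```

The second document shares both query terms and ranks first; the first document shares none and gets a similarity of zero.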
iii) Synonyms
iv) Proximity
OR
Q12) a) How to measure retrieval effectiveness? [6]
b) Explain the following terms [12]
i) Page Rank ii) Full text retrieval iii) Ontologies iv) Homonyms
*******************