Pune University BE(CSE) Question Papers May June 2014 - 2008 Pattern / Previous year BE(Computer Science Engineering) question papers of Pune University / BE CSE Advanced Databases Question Paper / Pune University Questions with Answers

UNIVERSITY OF PUNE

B. E. (CSE) Examination - 2014

ADVANCED DATABASES

(2008 Pattern)

[Time : 3 Hours] [Max. Marks : 100]

[Total No. of Questions : 12] [Total No. of Printed Pages :3]

Instructions :

(1) Answers to the two sections should be written in separate answer books.

(2) Neat diagrams must be drawn wherever necessary.

(3) Assume suitable data, if necessary.

(4) Section I: Q1 or Q2, Q3 or Q4, Q5 or Q6

(5) Section II: Q7 or Q8, Q9 or Q10, Q11 or Q12

SECTION I

Q1) a) Compare the Round-robin and Range partitioning Techniques. [8]

b) Explain Fragment and Replicate Join. [8]

Q2) a) What is meant by Skew? Explain the different ways of handling Skew. [10]

Answer:

Skew: When a single task is broken down into a number of smaller sub-tasks that run in parallel, it is very hard to make them exactly equal in size. The overall completion time then depends on the slowest processor, i.e., the one that processes the largest sub-task. This uneven distribution of work is called skew. For example, if a task of size 100 is divided into 10 parts and the division is skewed, some sub-tasks may be smaller than 10 and others larger than 10. If even one sub-task has size 20, the speedup obtained by running the sub-tasks in parallel is only five (100/20) instead of the ten we would have hoped for.
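The arithmetic in this example can be reproduced with a short Python sketch. It assumes that each sub-task runs on its own processor and that every unit of work takes the same time, so the parallel running time is simply the size of the largest sub-task. The helper name `parallel_speedup` and the particular uneven split are made up for illustration.

```python
def parallel_speedup(subtask_sizes):
    """Speedup from running each sub-task on its own processor.

    Assumes one unit of work takes one unit of time everywhere, so the
    sequential time is the total work and the parallel time is the size
    of the largest (slowest) sub-task.
    """
    total_work = sum(subtask_sizes)
    slowest = max(subtask_sizes)
    return total_work / slowest


# A task of size 100 split evenly into 10 parts of size 10.
even_split = [10] * 10
# The same task split unevenly: one part has size 20 (still 100 in total).
skewed_split = [20, 5, 5, 10, 10, 10, 10, 10, 10, 10]

print(parallel_speedup(even_split))    # 10.0
print(parallel_speedup(skewed_split))  # 5.0
```

Making the other sub-tasks smaller does not help: under these assumptions the speedup is bounded by the total work divided by the largest sub-task.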

Ways to handle skew:

(1) Use range partitioning instead of hash partitioning, and choose the ranges so that each range gets the same number of tuples.

Example: {1, 1, 1, 2, 3, 4, 5, 6} --> [1,2] and [3,6] (four tuples in each range)

(2) Virtual processor partitioning: create more partitions than there are nodes, and be smart about scheduling the partitions onto the nodes.

(3) Use subset-replicate (i.e., “skewedJoin”). Given an extremely common value ‘v’:

Distribute the R tuples with value v randomly across k nodes (R is the build relation).

Replicate the S tuples with value v to the same k nodes (S is the probe relation).
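The balanced range partitioning and virtual processor partitioning ideas can be sketched in Python. This is a minimal illustration, not any particular system's implementation: `equi_depth_vector`, `range_partition` and `schedule_virtual_partitions` are hypothetical names, the range boundaries are taken from a full sort of the data (a real system would more likely estimate them from a sample or histogram), and "being smart about scheduling" is assumed here to mean a greedy largest-first assignment.

```python
import bisect


def equi_depth_vector(values, n_partitions):
    """Pick range boundaries so that each partition gets (about) the same
    number of tuples. Returns the inclusive upper bound of every partition
    except the last one. Assumes len(values) >= n_partitions."""
    ordered = sorted(values)
    size = len(ordered)
    return [ordered[size * i // n_partitions - 1] for i in range(1, n_partitions)]


def range_partition(values, vector):
    """Send each value to the first partition whose upper bound is >= value."""
    partitions = [[] for _ in range(len(vector) + 1)]
    for v in values:
        partitions[bisect.bisect_left(vector, v)].append(v)
    return partitions


def schedule_virtual_partitions(partition_sizes, n_nodes):
    """Virtual processor partitioning: there are more partitions than nodes.
    Assign the largest partitions first, each to the node with the smallest
    current load (a greedy heuristic). Returns the total load per node."""
    loads = [0] * n_nodes
    for size in sorted(partition_sizes, reverse=True):
        lightest = loads.index(min(loads))
        loads[lightest] += size
    return loads


data = [1, 1, 1, 2, 3, 4, 5, 6]
vector = equi_depth_vector(data, 2)
print(vector)                         # [2]
print(range_partition(data, vector))  # [[1, 1, 1, 2], [3, 4, 5, 6]]

# Eight virtual partitions (total work 100) shared by two nodes.
sizes = [40, 10, 10, 10, 10, 10, 5, 5]
print(schedule_virtual_partitions(sizes, 2))  # [50, 50]
```

Handing the first four virtual partitions to one node and the last four to the other would instead give loads of 70 and 30. If a single value is so frequent that no boundary can split it evenly, balanced ranges cannot help, which is where the subset-replicate technique comes in.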

b) What is the difference between interquery and intraquery parallelism? [6]

Q3) a) If we are to ensure atomicity, all sites in which a transaction T is executed must agree on the final outcome of the execution. T must either commit at all sites or it must abort at all sites. Describe the Protocol used to ensure this property. [8]

b) Explain in detail Replication with respect to Distributed Databases. [10]

Q4) a) Remote backup systems and replication in Distributed Databases are two alternative approaches for providing high availability. Explain the difference between them. [6]

b) How is deadlock handling done in distributed databases? Explain. [12]

Q5) a) How is XML data stored in relational databases? Explain. [8]

b) Explain in detail XML schema. [8]

Q6) a) Explain in detail XQuery. [10]

b) Write short note on: XML applications. [6]

SECTION II

Q7) a) In real-world data, tuples with missing values for some attributes are a common occurrence. Describe various methods for handling this problem. [10]

b) Explain with suitable example any two operations on multidimensional data. [6]

Q8) a) Explain the following with respect to data preprocessing [6]

Answer:

i) Data reduction

Data reduction obtains a reduced representation of the data set that is much smaller in volume, yet produces the same (or almost the same) analytical results. Data reduction strategies include dimensionality reduction and numerosity reduction.
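A histogram is one common numerosity-reduction technique. The Python sketch below (function names and data values are made up for illustration, and it assumes at least two distinct values) replaces ten values with three equal-width bucket counts and still estimates their mean reasonably well from the bucket midpoints.

```python
def equal_width_histogram(values, n_buckets):
    """Reduce values to counts in n_buckets equal-width buckets.
    Returns (lowest value, bucket width, counts)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("need at least two distinct values")
    width = (hi - lo) / n_buckets
    counts = [0] * n_buckets
    for v in values:
        # The maximum value would land just past the last bucket, so clamp it.
        index = min(int((v - lo) / width), n_buckets - 1)
        counts[index] += 1
    return lo, width, counts


def approx_mean(lo, width, counts):
    """Estimate the mean using only the reduced data (bucket midpoints)."""
    midpoints = [lo + (i + 0.5) * width for i in range(len(counts))]
    return sum(m * c for m, c in zip(midpoints, counts)) / sum(counts)


data = [1, 3, 5, 12, 14, 16, 18, 25, 27, 31]
lo, width, counts = equal_width_histogram(data, 3)
print(counts)                          # [3, 4, 3]
print(approx_mean(lo, width, counts))  # 16.0
print(sum(data) / len(data))           # 15.2 (exact mean)
```

The estimate is close but not identical to the exact mean, which is the trade-off behind "the same (or almost the same) analytical results".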

ii) Data Discretization

Data discretization is defined as a process of converting continuous data attribute values into a finite set of intervals with minimal loss of information.
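A minimal Python sketch of discretization, assuming the interval boundaries (cut points) have already been chosen. Here they are picked by hand for a hypothetical age attribute, whereas methods such as equal-width binning, equal-frequency binning or entropy-based splitting choose them from the data. The function name `discretize` is illustrative.

```python
import bisect


def discretize(values, cut_points, labels):
    """Replace each continuous value by the label of the interval it falls in.

    cut_points must be sorted in ascending order. Interval i covers
    [cut_points[i-1], cut_points[i]); the first and last intervals are
    open-ended, so there must be exactly one more label than cut points.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return [labels[bisect.bisect_right(cut_points, v)] for v in values]


ages = [13, 22, 35, 47, 61, 78]
labels = ["<20", "[20,40)", "[40,60)", ">=60"]
print(discretize(ages, [20, 40, 60], labels))
# ['<20', '[20,40)', '[20,40)', '[40,60)', '>=60', '>=60']
```

Six distinct ages are reduced to a set of four interval labels; the loss of information is the exact age within each interval.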

b) Explain the different conceptual schema designs for a data warehouse with a suitable example. [10]

Q9) a) Explain classification and prediction with suitable example. [8]

b) Explain outlier analysis. [8]

Q10) a) How are decision trees used for classification? Explain with example. [8]

b) State and explain apriori algorithm. [8]

Q11) a) Define an information retrieval system. How is it different from a database system? [6]

Answer:

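An information retrieval (IR) system finds documents, usually unstructured text, in a large collection that are relevant to a user's information need, and returns them ranked by estimated relevance. It differs from a relational database system as follows:

Relational Database Management Systems (RDBMS):

The semantics of each object are well defined
Complex query languages (e.g., SQL)
Exact retrieval: you get exactly what you ask for
Emphasis on efficiency

Information Retrieval (IR):

The semantics of objects are subjective and not well defined
Usually simple query languages (e.g., keyword or natural language queries)
Users should get what they want, even if the query is poorly formulated
Effectiveness is the primary issue, although efficiency is also important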
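The contrast between exact retrieval and "getting what you want even if the query is bad" can be illustrated with a toy Python example. The documents and function names are hypothetical, and the ranking is a deliberately crude count of shared terms rather than a real weighting scheme such as tf-idf.

```python
documents = {
    "d1": "parallel database query processing",
    "d2": "distributed database deadlock handling",
    "d3": "xml query languages",
}


def exact_match(docs, terms):
    """Database-style retrieval: return only documents containing every term."""
    wanted = set(terms)
    return sorted(d for d, text in docs.items() if wanted <= set(text.split()))


def ranked_match(docs, terms):
    """IR-style retrieval: rank documents by how many query terms they contain,
    dropping documents that share no term with the query."""
    wanted = set(terms)
    scores = {d: len(wanted & set(text.split())) for d, text in docs.items()}
    return sorted((d for d in scores if scores[d] > 0),
                  key=lambda d: (-scores[d], d))


query = ["database", "query", "optimization"]  # no document mentions "optimization"
print(exact_match(documents, query))   # []
print(ranked_match(documents, query))  # ['d1', 'd2', 'd3']
```

The exact-match version returns nothing because one query term matches no document, while the ranked version still puts the most relevant document first.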

b) Explain the following terms [12]

i) Web Crawlers

ii) Vector space model

Answer:

The Vector Space Model (VSM) is based on the notion of similarity. The model assumes that the relevance of a document to a query is roughly equal to the document-query similarity. Both documents and queries are represented using the bag-of-words model. For a document collection, we first determine a set of terms (i.e., the vocabulary) and order the terms. Each document is then represented as an n-dimensional vector, where n is the size of the vocabulary and each dimension corresponds to one term. Terms are weighted using methods such as tf-idf or BM25. Queries are represented as n-dimensional vectors in the same way, and documents are ranked by their similarity to the query vector, most commonly measured as the cosine of the angle between the two vectors.
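A minimal Python sketch of the model, assuming raw term frequency multiplied by idf = log(N/df) as the tf-idf variant (BM25 is not shown) and cosine similarity as the document-query similarity. The documents, query and function names are made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs is a list of token lists. Builds the ordered vocabulary, the idf
    of each term and one tf-idf weight vector per document."""
    vocab = sorted({t for doc in docs for t in doc})
    n_docs = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, idf, vectors


def query_vector(query, vocab, idf):
    """Weight the query the same way; terms outside the vocabulary are ignored."""
    tf = Counter(query)
    return [tf[t] * idf[t] for t in vocab]


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = [
    "xml query language".split(),
    "parallel query processing".split(),
    "parallel database systems".split(),
]
vocab, idf, vectors = tfidf_vectors(docs)
q = query_vector("parallel query".split(), vocab, idf)
scores = [cosine(q, d) for d in vectors]
ranking = sorted(range(len(docs)), key=lambda i: -scores[i])
print(ranking[0])  # 1: the only document containing both query terms
```

With this idf formula a term that occurs in every document gets weight zero, so it does not influence the ranking at all.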

iii) Synonyms

iv) Proximity

Q12) a) How is retrieval effectiveness measured? [6]

b) Explain the following terms [12]

i) Page Rank

ii) Full text retrieval

iii) Ontologies

iv) Homonyms

*******************


Past University Exam Papers - Engineering and Technology

Sunday, 25 April 2021

Advanced databases - BE-CSE - Pune University Questions April May 2014
