Sunday, 25 April 2021

Advanced databases - BE-CSE - Pune University Questions April May 2014

Pune University BE(CSE) Question Papers May June 2014 - 2008 Pattern / Previous year BE(Computer Science Engineering) question papers of Pune University / BE CSE Advanced Databases Question Paper / Pune University Questions with Answers




UNIVERSITY OF PUNE
B. E. (CSE) Examination - 2014
ADVANCED DATABASES
(2008 Pattern)
[Time : 3 Hours]                                                                [Max. Marks : 100]
[Total No. of Questions : 12]                                     [Total No. of Printed Pages :3]
Instructions :
(1) Answers to the two sections should be written in separate books.
(2) Neat diagrams must be drawn wherever necessary.
(3) Assume suitable data, if necessary.
(4) Section I: Q1 or Q2, Q3 or Q4, Q5 or Q6
(5) Section II: Q7 or Q8, Q9 or Q10, Q11 or Q12

SECTION-I
OR

Q2) a) What is meant by Skew? Explain the different ways of handling Skew. [10]

Answer:
Skew: When a single task is broken down into a number of smaller parallel sub-tasks, it is very hard to make them equal in size. The performance of the system then depends on the slowest CPU, the one that processes the largest sub-task. This uneven distribution of work is called skew. For example, if a task of size 100 is divided into 10 parts and the division is skewed, some sub-tasks may be smaller than 10 and others larger than 10; if even one sub-task happens to be of size 20, the speedup obtained by running the sub-tasks in parallel is only five, instead of the ten we would have hoped for.
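The arithmetic in this example can be checked with a short sketch. It assumes that run time is proportional to sub-task size, that each sub-task runs on its own CPU, and that startup and communication costs are ignored; the function name `parallel_speedup` and the sample sizes are illustrative only.

```python
def parallel_speedup(subtask_sizes):
    """Speedup over serial execution when every sub-task has its own CPU.

    Assumption: run time is proportional to size, with no other overheads.
    """
    total_work = sum(subtask_sizes)   # time taken by a single CPU
    slowest = max(subtask_sizes)      # the largest sub-task finishes last
    return total_work / slowest


balanced = [10] * 10
# Skewed split of the same 100 units: one sub-task of size 20,
# some smaller than 10 and some larger than 10.
skewed = [20, 16, 8, 8, 8, 8, 8, 8, 8, 8]

print(parallel_speedup(balanced))  # 10.0
print(parallel_speedup(skewed))    # 5.0
```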
 

Ways to handle skew:

  1. Use range partitioning instead of hash partitioning
    • Ensure that each range gets the same number of tuples
    • Example: {1, 1, 1, 2, 3, 4, 5, 6 } --> [1,2] and [3,6]
  2. Virtual processor partitioning
    • Create more partitions than nodes
    • And be smart about scheduling the partitions
  3. Use subset-replicate (i.e., “skewedJoin”)
    • Given an extremely common value ‘v’
    • Distribute R tuples with value v randomly across k nodes (R is the build relation)
    • Replicate S tuples with value v to the same k machines (S is the probe relation)
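
Two of the techniques above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: the data fits in memory, only a single hot value is handled, and the function names (`balanced_range_boundaries`, `range_partition`, `subset_replicate`) are hypothetical rather than taken from any particular system.

```python
import random
from bisect import bisect_left


def balanced_range_boundaries(values, num_partitions):
    """Pick upper bounds so each range holds about the same number of tuples."""
    ordered = sorted(values)
    depth = len(ordered) / num_partitions
    # The upper bound of range i is the last value in its share of the sorted list.
    return [ordered[int(depth * (i + 1)) - 1] for i in range(num_partitions - 1)]


def range_partition(value, boundaries):
    """Index of the first range whose upper bound is >= value."""
    return bisect_left(boundaries, value)


def subset_replicate(r_hot, s_hot, nodes, rng=random):
    """Place the tuples that carry one hot join value on the given nodes.

    Each R (build) tuple goes to one randomly chosen node; each S (probe)
    tuple is copied to all of them, so every R-S pair meets exactly once.
    """
    placement = {node: {"R": [], "S": []} for node in nodes}
    for r in r_hot:
        placement[rng.choice(nodes)]["R"].append(r)
    for s in s_hot:
        for node in nodes:
            placement[node]["S"].append(s)
    return placement


values = [1, 1, 1, 2, 3, 4, 5, 6]
boundaries = balanced_range_boundaries(values, 2)
parts = [[], []]
for v in values:
    parts[range_partition(v, boundaries)].append(v)
print(boundaries, parts)  # [2] [[1, 1, 1, 2], [3, 4, 5, 6]]
```

The range example reproduces the split [1,2] and [3,6] with four tuples in each range. In `subset_replicate`, the extra copies of S are the price paid for spreading the hot R tuples over k nodes instead of one.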
 

OR
Q4) a) Remote backup systems and replication in Distributed Databases are two alternative approaches for providing high availability. Explain the difference between them. [6]

Q5) a) How is XML data stored in Relational Databases? Explain. [8]
b) Explain in detail XML schema. [8]
OR
Q6) a) Explain in detail XQuery. [10]
b) Write short note on: XML applications. [6]

SECTION II
Q7) a) In real world data, tuples with missing values for some attributes is a common occurrence. Describe various methods for handling this problem. [10]
b) Explain with suitable example any two operations on multidimensional data. [6]
OR
Q8) a) Explain the following with respect to data preprocessing: [6]
Answer:
i) Data reduction 
Data reduction obtains a reduced representation of the data set that is much smaller in volume, yet produces the same (or almost the same) analytical results. Data reduction strategies include dimensionality reduction and numerosity reduction.
ii) Data Discretization 
Data discretization is the process of converting the values of a continuous attribute into a finite set of intervals with minimal loss of information.
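
As an illustration, equal-width binning is one simple discretization method among several (equal-frequency binning and entropy-based splitting are others). The sketch below assumes numeric input; the function name `equal_width_bins` and the sample ages are made up for the example.

```python
def equal_width_bins(values, num_bins):
    """Map each continuous value to the index of an equal-width interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    if width == 0:
        # All values are identical, so a single interval is enough.
        return [0] * len(values)
    # Clamp so that the maximum value lands in the last interval.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


ages = [22, 25, 31, 38, 45, 52, 60, 67]
print(equal_width_bins(ages, 3))  # [0, 0, 0, 1, 1, 2, 2, 2]
```

Here the range 22 to 67 is cut into three intervals of width 15, and each age is replaced by its interval index.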
 
b) Explain different conceptual schemas design for data warehouse with suitable example. [10]

Q9) a) Explain classification and prediction with suitable example. [8]
b) Explain outlier analysis. [8]
OR
Q10) a) How are decision trees used for classification? Explain with example. [8]
b) State and explain apriori algorithm. [8]

Q11) a) Define Information retrieval System. How is it different from a Database system? [6]

Answer:

Relational Database Management Systems (RDBMS):

  • Semantics of each object are well defined
  • Complex query languages (e.g., SQL)
  • Exact retrieval of what you ask for
  • Emphasis on efficiency

Information Retrieval (IR):

  • Semantics of objects are subjective and not well defined
  • Usually simple query languages (e.g., natural language queries)
  • You should get what you want, even if the query is poorly formulated
  • Effectiveness is the primary issue, although efficiency is important
 
b) Explain the following terms: [12]
i) Web Crawlers

ii) Vector space model 

Answer:
The Vector Space Model (VSM) is based on the notion of similarity. The model assumes that the relevance of a document to a query is roughly equal to the document-query similarity. Both documents and queries are represented using the bag-of-words model. For a document collection, we first determine a set of terms (i.e., the vocabulary) and fix an order for them. Documents are then represented as n-dimensional vectors, where n is the vocabulary size and each dimension corresponds to a term. Terms are weighted using methods such as tf-idf or BM25. Queries are represented as n-dimensional vectors in the same way.
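
A minimal sketch of this model follows. It assumes whitespace tokenization, raw term frequency multiplied by log(N/df) as the tf-idf variant (many others exist), and cosine similarity as the similarity measure; the three sample documents and all function names are invented for the illustration.

```python
import math
from collections import Counter


def build_idf(docs):
    """Return the ordered vocabulary and idf weights, idf(t) = log(N / df(t))."""
    vocabulary = sorted({term for doc in docs for term in doc})
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(len(docs) / doc_freq[term]) for term in vocabulary}
    return vocabulary, idf


def to_vector(tokens, vocabulary, idf):
    """Bag-of-words vector: one tf-idf weight per vocabulary term."""
    tf = Counter(tokens)
    return [tf[term] * idf[term] for term in vocabulary]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = [
    "parallel database systems".split(),
    "information retrieval systems".split(),
    "web crawlers index web pages".split(),
]
vocabulary, idf = build_idf(docs)
doc_vectors = [to_vector(doc, vocabulary, idf) for doc in docs]

query_vector = to_vector("information retrieval".split(), vocabulary, idf)
scores = [cosine(query_vector, dv) for dv in doc_vectors]
best = scores.index(max(scores))
print(best)  # 1, the only document that shares terms with the query
```

Documents are ranked by their score, so the second document comes first for this query while the other two, which share no terms with it, score zero.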
 
iii) Synonyms 
iv) Proximity
OR
Q12) a) How to measure retrieval effectiveness? [6]
b) Explain the following terms: [12]
i) Page Rank ii) Full text retrieval iii) Ontologies iv) Homonyms

*******************


