410253DC Big Data and Data Analytics: Pune University Question Paper with Answers, April 2019

*Exam*	B.E DEGREE SEMESTER EXAMINATIONS
*Academic Year*	April 2019
*Subject Code*	410253DC
*Subject Name*	Big Data and Data Analytics
*Branch*	Computer Engineering
*Semester*	Semester II
*Regulation*	2015

B.E DEGREE SEMESTER EXAMINATIONS, APR 2019

Computer Engineering

Semester II

410253DC – Big Data and Data Analytics

(Pattern 2015)

Time: 2 and a Half Hours    Answer ALL Questions    Max. Marks: 70

Q1) a) Explain, with the given dataset, how a Decision Support System will help a laptop shop predict whether a customer will buy a laptop or not. [5]

b) Differentiate Operational data and Informational data. [6]

c) Explain the following phases of the data analytics lifecycle with an example. [6]

i) Data Discovery

ii) Model Building

Q2) a) Explain Hadoop Eco system with diagram. [8]

b) Smooth the following data set using binning: 3, 12, 1, 7, 8, 5. [6]
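
One way to work this out is sketched below. The question does not state the bin size or the smoothing method, so the sketch assumes equal-depth bins of 3 values each and shows smoothing by bin means and by bin medians. The helper names `equal_depth_bins` and `smooth` are illustrative only.

```python
from statistics import mean, median


def equal_depth_bins(values, depth):
    """Sort the values and cut them into consecutive bins of `depth` items."""
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


def smooth(bins, summary):
    """Replace every value in a bin by the bin's summary (mean or median)."""
    return [[summary(b)] * len(b) for b in bins]


data = [3, 12, 1, 7, 8, 5]                # values from the question
bins = equal_depth_bins(data, depth=3)    # assumed bin depth of 3
by_means = smooth(bins, mean)
by_medians = smooth(bins, median)

print("Bins:      ", bins)        # [[1, 3, 5], [7, 8, 12]]
print("By means:  ", by_means)    # [[3, 3, 3], [9, 9, 9]]
print("By medians:", by_medians)  # [[3, 3, 3], [8, 8, 8]]
```

With a different bin depth, or with smoothing by bin boundaries, the smoothed values would differ.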

c) Justify that the Snowflake schema is better than the Star schema. [6]

Q3) a) What is linear regression? Explain with Example. [8]

b) What is the significance of the Support Vector Machine classifier model? Explain with an example. [5]

c) Differentiate between supervised and unsupervised learning. [4]

Answer:

The main distinction between the two approaches is the use of labeled datasets. To put it simply, supervised learning uses labeled input and output data, while an unsupervised learning algorithm does not.

In supervised learning, the algorithm “learns” from the training dataset by iteratively making predictions on the data and adjusting for the correct answer. While supervised learning models tend to be more accurate than unsupervised learning models, they require upfront human intervention to label the data appropriately. For example, a supervised learning model can predict how long your commute will be based on the time of day, weather conditions and so on. But first, you’ll have to train it to know that rainy weather extends the driving time.

Unsupervised learning models, in contrast, work on their own to discover the inherent structure of unlabeled data. Note that they still require some human intervention for validating output variables. For example, an unsupervised learning model can identify that online shoppers often purchase groups of products at the same time. However, a data analyst would need to validate that it makes sense for a recommendation engine to group baby clothes with an order of diapers, applesauce and sippy cups.
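
The contrast can be illustrated with a small sketch. The numbers below are made up for illustration, and the function names `fit_line` and `two_means_1d` are not from any library. The supervised part learns from labeled (input, output) pairs; the unsupervised part receives values with no labels and only groups them.

```python
def fit_line(xs, ys):
    """Supervised: least-squares fit of y = a*x + b from labeled pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def two_means_1d(values, iterations=10):
    """Unsupervised: split unlabeled numbers into two groups (1-D k-means, k=2).

    Assumes the values contain at least two distinct numbers.
    """
    low, high = min(values), max(values)  # initial centroids
    for _ in range(iterations):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return sorted(group_low), sorted(group_high)


# Labeled data (hypothetical): rainfall in mm -> commute time in minutes.
rain_mm = [0, 2, 4, 6, 8]
commute_min = [30, 34, 38, 42, 46]
slope, intercept = fit_line(rain_mm, commute_min)
predicted = slope * 10 + intercept        # predicted commute for 10 mm of rain

# Unlabeled data (hypothetical): basket sizes, with no target to predict.
basket_sizes = [1, 2, 2, 9, 10, 11]
small, large = two_means_1d(basket_sizes)

print(slope, intercept, predicted)  # 2.0 30.0 50.0
print(small, large)                 # [1, 2, 2] [9, 10, 11]
```

The supervised model can be scored against known answers, whereas someone still has to decide whether the two discovered groups are meaningful.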

Q4) a) What is logistic regression? Explain with example. [8]

Answer:

Logistic regression is a predictive algorithm that, like linear regression, uses independent variables to predict a dependent variable. The difference is that the dependent variable must be categorical (for example, pass/fail or buy/not buy).
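
As a minimal sketch of the idea, the following fits a one-variable logistic regression by plain gradient descent. The study-hours data, the learning rate, the epoch count and the names `train_logistic` and `predict_pass` are all assumptions made for illustration. The input is centered on its mean before training, which is a common preprocessing step rather than part of the definition.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data: hours studied -> passed (1) or failed (0).
hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]

mean_hours = sum(hours) / len(hours)
centered = [h - mean_hours for h in hours]
w, b = train_logistic(centered, passed)


def predict_pass(h):
    """Return (probability of passing, predicted class) for h study hours."""
    prob = sigmoid(w * (h - mean_hours) + b)
    return prob, int(prob >= 0.5)


for h in (1, 3, 4, 6):
    prob, label = predict_pass(h)
    print(f"{h} hours: P(pass) = {prob:.2f}, predicted class = {label}")
```

The output is a probability, and the categorical prediction comes from thresholding it (0.5 here, which is itself a choice).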

b) Explain, with a suitable example, how to predict whether a student will pass or not using a Support Vector Machine. [5]

c) What is time series analysis? Give an example. [4]

Q5) a) A database has 6 transactions. Let minimum support = 60% and Minimum confidence = 70%. Find all frequent item sets and association rules using Apriori algorithm [8]

Transaction ID

Toys Bought

{A, B, C, E, F}

{A, C, D, E}

{B, C, E, F}

{A, C, D, E}

{C, D, E, F}

{A, D, E}
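
The computation can be sketched in code as follows. This is an illustrative implementation of the Apriori join and prune steps rather than a reference answer. The six item sets are taken from the question in the order listed, and the function names are assumptions. With 6 transactions, 60% support means an itemset must appear in at least 4 of them.

```python
from itertools import combinations

transactions = [
    {"A", "B", "C", "E", "F"},
    {"A", "C", "D", "E"},
    {"B", "C", "E", "F"},
    {"A", "C", "D", "E"},
    {"C", "D", "E", "F"},
    {"A", "D", "E"},
]
MIN_SUPPORT = 0.60
MIN_CONFIDENCE = 0.70


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def apriori():
    """Return {frequent itemset: support}, built level by level."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= MIN_SUPPORT]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must be frequent; then check support.
        level = [c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                 and support(c) >= MIN_SUPPORT]
    return frequent


def rules(frequent):
    """Return (lhs, rhs, confidence) for rules meeting MIN_CONFIDENCE."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = sup / frequent[lhs]
                if confidence >= MIN_CONFIDENCE:
                    rhs = itemset - lhs
                    found.append((tuple(sorted(lhs)), tuple(sorted(rhs)), confidence))
    return sorted(found)


frequent = apriori()
found_rules = rules(frequent)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(sup, 2))
for lhs, rhs, confidence in found_rules:
    print(lhs, "->", rhs, round(confidence, 2))
```

On this data the frequent itemsets are {A}, {C}, {D}, {E}, {A, E}, {C, E} and {D, E}. No 3-itemset survives pruning, because {A, C}, {A, D} and {C, D} each appear in only 3 transactions. The rules that meet 70% confidence are A -> E, C -> E and D -> E (100% each) and E -> C (about 83%).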

b) What is agglomerative clustering? Give an example. [5]

c) Explain the role of Bayes theorem in decision making. [4]

Q6) a) What is Bayesian Classifier? Elaborate the training process of a Bayesian classifier with suitable example. [8]

b) Explain with example following terms: [4]

i) Lexicographic order

ii) Confidence

c) Differentiate between single link and complete link methods used in Hierarchical Clustering. [5]

Q7) a) Write and explain R code for Naive bayes classification. [8]

b) Differentiate between Data Frames and data lists. [4]

Answer:

A data frame is a list with the following characteristics:

The elements of the list are vectors and/or factors.
Those vectors and factors are the columns of the data frame.
The vectors and factors must all have the same length; in other words, all columns must have the same height.
The equal-height columns give a rectangular shape to the data frame.
The columns must have names.

A list has the following characteristics:

Lists are heterogeneous.

Lists can be indexed by position.

You can extract sublists from lists.

c) What is the role of R in machine learning? [4]

Q8) a) Explain data processing with R. [8]

b) How is data exported from R? [4]

c) Write short notes on Handling Data in R Workspace. [4]

***********

Pune University MCA Question Papers / Previous year question papers of Pune University / MCA Advanced Databases Question Paper

Total No of Questions: [12] SEAT NO. :

[Total No. of Pages : 02]

[4366]- 503

TYMCA (Engg. Faculty)

ADVANCED DATABASES

(Semester - V) (2008 Pattern) (710903)

MAY 2013 EXAMINATIONS

[Time: 3 Hours] [Max. Marks : 70]

Instructions to the candidates:

1) Answers to the two sections should be written in separate books.

2) Neat diagrams must be drawn wherever necessary.

3) Assume Suitable data if necessary.

SECTION I

Q1) a) With suitable diagrams explain the steps in query processing. [5]

b) Explain the external sort merge algorithm with suitable example. [6]

Q2) a) What are the measures of query cost? [5]

b) Explain the different ways of executing pipelines. [6]

Q3) a) Explain Transaction Server Process Structure. [6]

b) What are the implementation issues of distributed systems? [6]

Q4) a) Explain Speed up & Scale up. [6]

b) Explain centralized and client server database architecture [6]

Q5) a) Explain object identity and reference type? [6]

b) Why is an OODBMS required? Differentiate between DBMS, RDBMS and OODBMS. [6]

Q6) a) Explain Array and Multiset in SQL with example. [6]

b) Explain persistent C++ system. [6]

SECTION II

Q7) a) While analyzing the data, it was found that many tuples have no recorded values for several attributes. How can this problem of missing values be solved? [6]

b) Explain snowflake schema for multidimensional database. [6]

Q8) a) Explain in brief OLAP. What are the possible operations on cube? [6]

b) Explain star schema for multidimensional database. [6]

Q9) a) Form clusters using the K-Means clustering algorithm. Use an appropriate distance formula. [8]

RID	Age	Years of Service
1	30	5
2	50	25
3	50	15
4	25	5
5	30	10
6	55	25
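
A minimal sketch of the computation follows. The question gives neither the number of clusters nor the starting centroids, so the sketch assumes k = 2, takes the first two records as initial centroids, and uses Euclidean distance. The function names are illustrative, and `math.dist` requires Python 3.8 or later.

```python
import math

# (Age, Years of Service) for RID 1..6, from the table in the question.
points = [(30, 5), (50, 25), (50, 15), (25, 5), (30, 10), (55, 25)]


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until centroids stop moving."""
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Assumed initialisation: RID 1 and RID 2 as the starting centroids.
labels, centroids = kmeans(points, [points[0], points[1]])
print(labels)     # [0, 1, 1, 0, 0, 1]
print(centroids)  # approximately [(28.33, 6.67), (51.67, 21.67)]
```

Under these assumptions RIDs 1, 4 and 5 form one cluster and RIDs 2, 3 and 6 the other. A different k or different starting centroids may give a different grouping.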

b) Explain outlier analysis [4]

Q10) a) Find the frequent itemsets using the Apriori algorithm. [8]

TID	Items
100	1,3,4
200	2,3,5
300	1,2,3,5
400	2,5

b) Explain descriptive & predictive data mining. [4]

Answer:

Descriptive data mining - It uses the data to identify relationships, finding human-interpretable patterns that describe the data. Clustering, association rule mining and sequential pattern discovery are some of the descriptive approaches.

Predictive data mining - It uses the data to make predictions, taking some variables to predict unknown or future values of other variables. Classification and regression are some of the predictive approaches.

Q11) a) Describe the ranking using TF-IDF. [8]

b) Define the following terms. [3]

1) Hub 2) Authority 3) Web crawler

Q12) a) Describe the popularity ranking. [8]

b) Define the following terms- [3]

1) Ontology 2) Search engine spamming 3) False positive

Answer:

False positive: A false positive occurs when a test returns a positive result although the true result is negative. It is sometimes called a “false alarm” or “false positive error.” The term is common in the medical field, but it also applies in other areas such as software testing.
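
To make the definition concrete, the sketch below counts false positives in a small set of test results. The label lists are made up for illustration (1 = positive, 0 = negative), and `confusion_counts` is a hypothetical helper, not a library function.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            counts["TP"] += 1
        elif p == 1 and a == 0:
            counts["FP"] += 1      # test says positive, truth is negative
        elif p == 0 and a == 1:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


actual    = [1, 0, 0, 1, 0, 0, 1, 0]   # hypothetical true conditions
predicted = [1, 1, 0, 0, 0, 1, 1, 0]   # hypothetical test results

counts = confusion_counts(actual, predicted)
false_positive_rate = counts["FP"] / (counts["FP"] + counts["TN"])

print(counts)               # {'TP': 2, 'FP': 2, 'TN': 3, 'FN': 1}
print(false_positive_rate)  # 0.4
```

The false positive rate divides the false positives by all actual negatives, so here 2 of the 5 negative cases were wrongly flagged.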

************************

Past University Exam Papers - Engineering and Technology

Monday, 3 May 2021