Monday, 3 May 2021

410253DC Big Data and Data Analytics Pune University question answers April 2019

Big Data and Data Analytics course exam question paper with answers

 

Exam: B.E. Degree Semester Examinations

Academic Year: April 2019

Subject Code: 410253DC

Subject Name: Big Data and Data Analytics

Branch: Computer Engineering

Semester: Semester II

Regulation: 2015

 

B.E DEGREE SEMESTER EXAMINATIONS, APR 2019

Computer Engineering

Semester II

410253DC – Big Data and Data Analytics

(Pattern 2015)

Time: 2 and a half hours          Answer ALL Questions          Max. Marks: 70

 

Q1) a) Explain, with the given dataset, how a Decision Support System will help a laptop shop predict whether a customer will buy a laptop or not. [5]



b) Differentiate between operational data and informational data. [6]

c) Explain the following phases of the data analytics lifecycle with examples. [6]

i) Data Discovery

ii) Model Building

OR

Q2) a) Explain the Hadoop ecosystem with a diagram. [8]

b) Smooth the following data set using binning: 3, 12, 1, 7, 8, 5. [6]

c) Justify that the Snowflake schema is better than the Star schema. [6]

 

Q3) a) What is linear regression? Explain with an example. [8]

b) What is the significance of the Support Vector Machine classifier model? Explain with an example. [5]

c) Differentiate between supervised and unsupervised learning. [4]

Answer:

The main distinction between the two approaches is the use of labeled datasets. To put it simply, supervised learning uses labeled input and output data, while an unsupervised learning algorithm does not.

In supervised learning, the algorithm “learns” from the training dataset by iteratively making predictions on the data and adjusting for the correct answer. While supervised learning models tend to be more accurate than unsupervised learning models, they require upfront human intervention to label the data appropriately. For example, a supervised learning model can predict how long your commute will be based on the time of day, weather conditions and so on. But first, you’ll have to train it to know that rainy weather extends the driving time.

Unsupervised learning models, in contrast, work on their own to discover the inherent structure of unlabeled data. Note that they still require some human intervention for validating output variables. For example, an unsupervised learning model can identify that online shoppers often purchase groups of products at the same time. However, a data analyst would need to validate that it makes sense for a recommendation engine to group baby clothes with an order of diapers, applesauce and sippy cups.
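The contrast can be sketched in a few lines of Python. This is an illustrative sketch only and not part of the original answer: the sample data, the helper names `fit_line` and `two_means`, and the choice of least-squares regression and 2-means clustering as representatives of each approach are all assumptions made for the example.

```python
# Illustrative sketch; the data and helper names are hypothetical.

def fit_line(xs, ys):
    """Supervised: learn slope and intercept from labeled (x, y) pairs by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def two_means(points, iterations=10):
    """Unsupervised: split unlabeled 1-D points into two groups (k-means with k = 2)."""
    centers = [min(points), max(points)]  # simple deterministic initialisation
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[nearest].append(p)
        # Recompute each center as the mean of its group (keep the old center if empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


# Supervised: rainfall (mm) is the input, commute time (minutes) is the known label.
rain_mm = [0, 5, 10, 20]
commute_min = [30, 35, 40, 50]
slope, intercept = fit_line(rain_mm, commute_min)
predicted = slope * 15 + intercept  # prediction for an unseen rainfall value

# Unsupervised: order totals carry no labels; the algorithm finds the grouping itself.
order_totals = [12, 15, 14, 90, 95, 88]
small, large = two_means(order_totals)
```

In the supervised half the labels (`commute_min`) drive the fit; in the unsupervised half no labels exist, and a person would still need to judge whether the two discovered groups are meaningful.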

 

OR

Q4) a) What is logistic regression? Explain with example. [8]

Answer:

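Logistic regression is a predictive algorithm that, like linear regression, uses independent variables to predict a dependent variable. The difference is that the dependent variable must be categorical (for example, pass/fail or yes/no). The model estimates the probability of a class by passing a linear combination of the inputs through the logistic (sigmoid) function.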
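As an example, a minimal sketch of logistic regression with one independent variable might look like the following. It is not part of the original answer: the study-hours data, the function names `train_logistic` and `predict_probability`, and the learning rate and epoch count are assumptions chosen for illustration, and the model is fitted with plain batch gradient descent rather than a library routine.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit weight w and bias b by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # predicted probability minus true label
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


def predict_probability(x, w, b):
    """Estimated probability that the categorical outcome is 1 (here: pass)."""
    return sigmoid(w * x + b)


# Hypothetical data: hours studied (independent variable) and pass = 1 / fail = 0.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = train_logistic(hours, passed)
p_low = predict_probability(1, w, b)   # student who studied 1 hour
p_high = predict_probability(8, w, b)  # student who studied 8 hours
```

A student would be classified as "pass" when the predicted probability is at least 0.5; the dependent variable stays categorical even though the model outputs a number.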

 

b) Explain, with a suitable example, how to predict whether a student will pass or not using a Support Vector Machine. [5]

c) What is time series analysis? Give an example. [4]

 

Q5) a) A database has 6 transactions. Let minimum support = 60% and minimum confidence = 70%. Find all frequent itemsets and association rules using the Apriori algorithm. [8]

Transaction ID    Toys Bought

T1                {A, B, C, E, F}

T2                {A, C, D, E}

T3                {B, C, E, F}

T4                {A, C, D, E}

T5                {C, D, E, F}

T6                {A, D, E}

 

b) What is agglomerative clustering? Give an example. [5]

c) Explain the role of Bayes theorem in decision making. [4]

 

Q6) a) What is Bayesian Classifier? Elaborate the training process of a Bayesian classifier with suitable example. [8]

b) Explain the following terms with examples: [4]

i) Lexicographic order

ii) Confidence

c) Differentiate between single link and complete link methods used in Hierarchical Clustering. [5]

 

Q7) a) Write and explain R code for Naive Bayes classification. [8]

b) Differentiate between data frames and lists. [4]

Answer:

A data frame is a list with the following characteristics:

  • The elements of the list are vectors and/or factors.

  • Those vectors and factors are the columns of the data frame.

  • The vectors and factors must all have the same length; in other words, all columns must have the same height.

  • The equal-height columns give a rectangular shape to the data frame.

  • The columns must have names.

A list has the following characteristics:

  • Lists are heterogeneous.

  • Lists can be indexed by position.

  • You can extract sublists from lists.

 

c) What is the role of R in machine learning? [4]

OR

Q8) a) Explain data processing with R. [8]

b) How is data exported from R? [4]

c) Write short notes on Handling Data in R Workspace. [4]

 

***********

 

 

 
