410253DC Big Data and Data Analytics: Pune University exam question paper with answers, April 2019
Exam | B.E DEGREE SEMESTER EXAMINATIONS
Academic Year | April 2019
Subject Code | 410253DC
Subject Name | Big Data and Data Analytics
Branch | Computer Engineering
Semester | Semester II
Regulation | 2015
B.E DEGREE SEMESTER EXAMINATIONS, APR 2019
Computer Engineering
Semester II
410253DC – Big Data and Data Analytics
(Pattern 2015)
Time: 2 and a half hours    Answer ALL Questions    Max. Marks: 70
Q1) a) Explain, with the given dataset, how a Decision Support System will help a laptop shop predict whether a customer will buy a laptop or not. [5]
b) Differentiate Operational data and Informational data. [6]
c) Explain the following phases of the data analytics lifecycle with examples. [6]
i) Data Discovery
ii) Model Building
OR
Q2) a) Explain the Hadoop ecosystem with a diagram. [8]
b) Smooth the following data set using binning: 3, 12, 1, 7, 8, 5. [6]
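One way to work part b) can be sketched in Python. The question does not fix the bin size or the smoothing method, so this sketch assumes equal-frequency bins of three values each and smoothing by bin means; the function name `smooth_by_bin_means` is illustrative only.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-frequency bins,
    and replace every value in a bin with that bin's mean."""
    ordered = sorted(values)
    smoothed_bins = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed_bins.append([mean] * len(bin_values))
    return smoothed_bins


data = [3, 12, 1, 7, 8, 5]
# Assumption: bins of size 3 (the question does not specify a bin size).
bins = smooth_by_bin_means(data, bin_size=3)
print(bins)
```

Under these assumptions the sorted data 1, 3, 5, 7, 8, 12 splits into the bins (1, 3, 5) and (7, 8, 12), whose means are 3 and 9. Smoothing by bin boundaries or bin medians would give different values.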
c) Justify why the snowflake schema is better than the star schema. [6]
Q3) a) What is linear regression? Explain with Example. [8]
b) What is the significance of the Support Vector Machine classifier model? Explain with an example. [5]
c) Differentiate between supervised and unsupervised learning. [4]
Answer:
The main distinction between the two approaches is the use of labeled datasets. To put it simply, supervised learning uses labeled input and output data, while an unsupervised learning algorithm does not.
In supervised learning, the algorithm “learns” from the training dataset by iteratively making predictions on the data and adjusting for the correct answer. While supervised learning models tend to be more accurate than unsupervised learning models, they require upfront human intervention to label the data appropriately. For example, a supervised learning model can predict how long your commute will be based on the time of day, weather conditions and so on. But first, you’ll have to train it to know that rainy weather extends the driving time.
Unsupervised learning models, in contrast, work on their own to discover the inherent structure of unlabeled data. Note that they still require some human intervention for validating output variables. For example, an unsupervised learning model can identify that online shoppers often purchase groups of products at the same time. However, a data analyst would need to validate that it makes sense for a recommendation engine to group baby clothes with an order of diapers, applesauce and sippy cups.
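The contrast can be illustrated with a small Python sketch. The commute durations, the "dry"/"rainy" labels, and the function names are all made up for illustration. The supervised part learns one centroid per label from labeled samples, while the unsupervised part runs a one-dimensional k-means with k = 2 on the same numbers without labels, so it finds two groups but cannot name them.

```python
def train_nearest_centroid(samples):
    """Supervised: samples are (value, label) pairs; learn one mean per label."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Return the label whose learned centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two clusters (1-D k-means, k=2).
    Assumes the values are not all identical."""
    low, high = min(values), max(values)  # initial centroids
    for _ in range(iterations):
        cluster_low = [v for v in values if abs(v - low) <= abs(v - high)]
        cluster_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(cluster_low) / len(cluster_low)
        high = sum(cluster_high) / len(cluster_high)
    return cluster_low, cluster_high


# Hypothetical commute durations in minutes, labeled with the weather.
labeled = [(20, "dry"), (22, "dry"), (25, "dry"),
           (40, "rainy"), (45, "rainy"), (42, "rainy")]
centroids = train_nearest_centroid(labeled)
print(predict(centroids, 38))  # uses what was learned from the labels

# The same durations with the labels removed.
unlabeled = [value for value, _ in labeled]
short_group, long_group = two_means(unlabeled)
print(short_group, long_group)  # two groups, but no names attached to them
```

The clustering step recovers the same two groups here only because the toy data is cleanly separated. A person still has to decide what the groups mean, which is the validation step described above.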
OR
Q4) a) What is logistic regression? Explain with example. [8]
Answer:
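Logistic regression is a predictive algorithm that, like linear regression, uses independent variables to predict a dependent variable. The difference is that the dependent variable must be categorical (for example, pass/fail), so the model estimates the probability of a class rather than a continuous value.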
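As a minimal sketch of this idea, the Python code below fits a one-variable logistic regression by batch gradient descent on the log loss. The study-hours data, the learning rate, and the epoch count are assumptions chosen for illustration. In practice a library implementation would normally be used.

```python
import math


def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data: hours studied and whether the student passed (1) or not (0).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(hours, passed)


def prob_pass(h):
    """Estimated probability of passing after h hours of study."""
    return sigmoid(w * h + b)


print(round(prob_pass(1), 3), round(prob_pass(8), 3))
```

The categorical prediction is obtained by thresholding the probability, typically at 0.5: on this toy data a student with 1 hour of study is predicted to fail and one with 8 hours to pass.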
b) Explain, with a suitable example, how to predict whether a student will pass or not using a Support Vector Machine. [5]
c) What is time series analysis? Give an example. [4]
Q5) a) A database has 6 transactions. Let minimum support = 60% and minimum confidence = 70%. Find all frequent itemsets and association rules using the Apriori algorithm. [8]
Transaction ID | Toys Bought
T1 | {A, B, C, E, F}
T2 | {A, C, D, E}
T3 | {B, C, E, F}
T4 | {A, C, D, E}
T5 | {C, D, E, F}
T6 | {A, D, E}
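The computation asked for here can be checked with a short Python sketch of the level-wise Apriori procedure (join, prune, count). This is a compact illustration rather than an optimized implementation, and the function names are illustrative. With 6 transactions, 60% support means an itemset must appear in at least 4 of them.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those meeting the support threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, confidence) for rules meeting the confidence threshold."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rhs = itemset - lhs
                    rules.append((tuple(sorted(lhs)), tuple(sorted(rhs)), confidence))
    return rules


transactions = [
    {"A", "B", "C", "E", "F"},  # T1
    {"A", "C", "D", "E"},       # T2
    {"B", "C", "E", "F"},       # T3
    {"A", "C", "D", "E"},       # T4
    {"C", "D", "E", "F"},       # T5
    {"A", "D", "E"},            # T6
]
frequent = apriori(transactions, min_support=0.6)
rules = association_rules(frequent, min_confidence=0.7)
for lhs, rhs, confidence in sorted(rules):
    print(lhs, "->", rhs, round(confidence, 2))
```

For this dataset the frequent itemsets are {A}, {C}, {D}, {E}, {A, E}, {C, E} and {D, E}. The rules that reach 70% confidence are A -> E, C -> E and D -> E (each 100%) and E -> C (83%). E -> A and E -> D reach only 67% and are rejected.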
b) What is agglomerative clustering? Give an example. [5]
c) Explain the role of Bayes theorem in decision making. [4]
Q6) a) What is Bayesian Classifier? Elaborate the training process of a Bayesian classifier with suitable example. [8]
b) Explain with example following terms: [4]
ii) Confidence
c) Differentiate between single link and complete link methods used in Hierarchical Clustering. [5]
Q7) a) Write and explain R code for Naive Bayes classification. [8]
b) Differentiate between Data Frames and data lists. [4]
Answer:
A data frame is a list with the following characteristics:
The elements of the list are vectors and/or factors.
Those vectors and factors are the columns of the data frame.
The vectors and factors must all have the same length; in other words, all columns must have the same height.
The equal-height columns give a rectangular shape to the data frame.
The columns must have names.
A list has the following characteristics:
Lists are heterogeneous.
Lists can be indexed by position.
You can extract sublists from lists.
c) What is the role of R in machine learning? [4]
OR
Q8) a) Explain data processing with R. [8]
b) How is data exported from R? [4]
c) Write short notes on Handling Data in R Workspace. [4]
***********