Adv. Data Science Using R | Assignment – 3

Quiz – 3

Q1. Which of the following languages is used in data science?
R
C
C++
Ruby

Q2. What is the primary file type of R?
Vector
Text file
RScripts
Statistical file

Q3. Which one of the following R packages is used for data products?
haven
igraph
slidify
forecast

Q4. Which of the following is valid for checking a categorical variable?
Level
Table
Unique
All of the above

Q5. Suppose ABC is a matrix of 3 rows and 4 columns. Choose the correct option(s) to rename the rows:
row_names(ABC)= c("row1","row2","row3")
rownames(ABC)=c("row1","row2")
row(ABC)=c("row1","row2")
rownames(ABC)=c("row","row2","row3")

Q6. Arrange the data types in the proper order:
Logical, integer, numeric, character
Integer, numeric, character, logical
Character, logical, integer, numeric
Numeric, integer, character, logical

Q7. What is the output of the code below?
A=10
B=20
print(A,B)

10 20
Error
(10, 20)
None of the above

Q8. A return statement is compulsory when writing a function in R.
True
False

Q9. In R, the last variable in a function is returned by default.
True
False

Q10. Which package needs to be installed for reading Excel files?
Read_excel
Readxl
Readcsv
read_csv

Q11. What is the output of the code below?
logic1=c(T,F,F,T,F,T)
print(which(logic1))

1 4 6
2 3 6
6 4 1
1 2 3

Q12. If A = c(1, 13, 42, 13, 4), what is A after running A = A[-4]?
1, 13, 42, 4
1, 13, 42, 13
13
1, 42, 13, 4

Q13. Which function call can be used to split the string name?
Output will be: "Navin"      "Mr. Naresh J"

strsplit(name,"[.]")
charsplit(name,"[,]")
stringsplit(name)
strsplit(name,"[,]")

Q14. If i = 100, how do you find the data type of i?
Option 1
type(i)
class(i)
none of the above

Q15. dt = "01-12-2020" is stored as a character string. Which option converts it to a date in "MM-DD-YYYY" format?
To_date(dt, "MM-DD-YYYY")
date( x = dt, format = "%m / %d / %Y")
Date ( x = dt, format = "%m / %d / %Y")
none of the above


Assignment – 3

1. What is the KNN algorithm? What are its features? How does the KNN algorithm work? Write the KNN algorithm pseudocode and a practical implementation of KNN in R.

The K-Nearest Neighbors (KNN) algorithm is a non-parametric, instance-based method for classification and regression. It is a supervised learning algorithm that stores all available cases and classifies new cases based on a similarity measure, such as Euclidean distance.
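For two points p = (p_1, ..., p_n) and q = (q_1, ..., q_n) described by n numeric features, the Euclidean distance used as the similarity measure is computed as follows (p, q and n are notation introduced here for illustration):

```latex
d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} \left(p_i - q_i\right)^2}
```

A smaller distance means the two points are more similar, so the "nearest" neighbors are the training points with the smallest d.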

Features of KNN Algorithm:

  1. Simple to understand and implement
  2. No assumptions about the distribution of the data
  3. Can be used for both classification and regression problems
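Since KNN also handles regression, here is a minimal base-R sketch of that case, assuming plain Euclidean distance and an unweighted mean of the neighbors' responses. The function name knn_regress and the toy data are hypothetical, not part of any package:

```r
# Minimal KNN regression sketch (base R only): the prediction for a new
# point is the mean response of its k nearest training points.
knn_regress <- function(train_x, train_y, new_point, k = 3) {
  # Euclidean distance from new_point to every training row
  dists <- sqrt(rowSums(sweep(train_x, 2, new_point)^2))
  # Indices of the k smallest distances
  nearest <- order(dists)[seq_len(k)]
  # Regression output: average of the neighbors' responses
  mean(train_y[nearest])
}

# Toy data with one feature, where y is exactly 2 * x
train_x <- matrix(1:10, ncol = 1)
train_y <- 2 * (1:10)
knn_regress(train_x, train_y, new_point = 5, k = 3)   # 10 (mean of 8, 10 and 12)
```

For classification the only change is the last step: the neighbors' labels are counted and the most frequent one is returned instead of a mean.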

The algorithm works by taking a new data point and finding the k closest points to it in the training set. For classification, the new data point is assigned the majority class among these k nearest neighbors; for regression, it is assigned the average of their values.

Pseudocode for the KNN algorithm:

  1. Initialize the number of nearest neighbors (k)
  2. For each point in the dataset:
       a. Calculate the distance between the point and the new data point
       b. Add the distance and the point to a list
  3. Sort the list by distance
  4. Take the first k elements from the sorted list
  5. Determine the majority class among the k elements
  6. Classify the new data point as the majority class
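The pseudocode above can be sketched directly in base R. This is a minimal illustration, assuming Euclidean distance and numeric features; knn_classify and the toy points are hypothetical, and ties in the vote simply go to the first class listed:

```r
# From-scratch KNN classifier following the pseudocode step by step.
# Step 1: k is passed in as an argument.
knn_classify <- function(train_x, train_labels, new_point, k = 3) {
  # Step 2: distance between every training point and the new point
  dists <- sqrt(rowSums(sweep(train_x, 2, new_point)^2))
  # Steps 3-4: sort by distance and keep the first k points
  nearest <- order(dists)[seq_len(k)]
  # Step 5: count the classes among the k neighbors
  votes <- table(train_labels[nearest])
  # Step 6: return the majority class (ties go to the first level)
  names(votes)[which.max(votes)]
}

# Two well-separated toy clusters
train_x <- rbind(c(1, 1), c(1, 2), c(2, 1),   # labelled "A"
                 c(8, 8), c(8, 9), c(9, 8))   # labelled "B"
train_labels <- factor(c("A", "A", "A", "B", "B", "B"))

knn_classify(train_x, train_labels, new_point = c(1.5, 1.5), k = 3)   # "A"
knn_classify(train_x, train_labels, new_point = c(8.5, 8.5), k = 3)   # "B"
```

In practice class::knn() is faster and breaks voting ties at random; the sketch is only meant to make each pseudocode step visible.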

Practical Implementation of KNN Algorithm in R:

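# Load the class package, which provides knn()
library(class)

# Create a reproducible sample data set: 50 points, 2 features, 2 classes
set.seed(123)
x <- cbind(rnorm(50), rnorm(50))
y <- gl(2, 25, labels = c("A", "B"))

# knn() does not return a fitted model; it classifies the test rows
# directly against the training rows. Classify the training points
# themselves with k = 3 to check the in-sample accuracy.
fit <- knn(train = x, test = x, cl = y, k = 3)
mean(fit == y)

# Predict the class of new data points by passing them as the test set
newdata <- rbind(c(1, 2), c(3, 4))
predicted_class <- knn(train = x, test = newdata, cl = y, k = 3)
predicted_class

This code creates a sample dataset of 50 points with two features and two classes (A and B). It then classifies the training points with k = 3 and predicts the class of two new data points. Because class::knn() performs the neighbor search and the prediction in a single call, there is no separate predict() step: new points are classified by calling knn() again with them as the test argument.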

2. Develop a machine learning model using SVM in R to solve a business problem. Add screenshots of the graphs and code to validate your answer.

Applying SVM to a business use case

The data source is https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection

Read the data and check the structure of both the train and test data:

library(lubridate)
library(caret)
library(dplyr)
library(DMwR)
library(ROSE)
library(ggplot2)
library(randomForest)
library(rpart)
library(rpart.plot)
library(data.table)
library(e1071)
library(gridExtra)

train <- fread('../input/train_sample.csv', stringsAsFactors = FALSE,
               data.table = FALSE)
test <- fread('../input/test.csv', stringsAsFactors = FALSE,
              data.table = FALSE)
str(train)
str(test)

The train and test data are the same, except that the target (is_attributed), which we need to predict for the test data, and attributed_time (the time at which the app was downloaded) are not given in the test data.

Missing value checking

colSums(is.na(train))

There are no missing values at all; the data is clean.

colSums(train=='')

attributed_time has blank entries, which is logically correct: a download time exists only when the app was actually downloaded.
Let's check the target variable to see how many clicks in the train data did not lead to a download.

table(train$is_attributed)

Our assumption is correct: the number of blank attributed_time entries matches the number of clicks with no download (is_attributed = 0) in the train data.
As this is logically correct, no further action is needed on it.
Also note that this variable is not present in the test data, so there is no point in keeping it in the train data either.
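The same checks can be tried on a small made-up data frame to see what they return. This is a sketch under the assumption that the real train data has the same column types; the rows below are hypothetical, not taken from the Kaggle file. The last line checks the claim row by row rather than only comparing totals:

```r
# Hypothetical miniature stand-in for train (not real Kaggle rows)
toy <- data.frame(
  ip = c(101, 102, 103, 104, 105),
  is_attributed = c(0, 0, 1, 0, 1),
  attributed_time = c("", "", "2017-11-07 09:30:38", "", "2017-11-08 12:05:11"),
  stringsAsFactors = FALSE
)

# Blank entries per column, as colSums(train == '') reports
blank_counts <- colSums(toy == "")

# Compare totals: blank download times vs. clicks without a download
n_blank <- sum(toy$attributed_time == "")
n_not_downloaded <- sum(toy$is_attributed == 0)

# Row-level check: the time is blank exactly when there was no download
rows_match <- all((toy$attributed_time == "") == (toy$is_attributed == 0))
rows_match
```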

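train$attributed_time <- NULL
# svm() performs classification (not regression) only when the response is a factor
train$is_attributed <- factor(train$is_attributed)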
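The SVM code uses smote_train and test_val without showing how they are built. A minimal sketch of one way to build such splits, assuming a hypothetical toy data frame in place of train and substituting plain random oversampling for SMOTE (so balanced_train here only plays the role of smote_train, it is not SMOTE output):

```r
set.seed(1234)

# Hypothetical imbalanced stand-in for train: 95 clicks without download, 5 with
toy_train <- data.frame(
  app = sample(1:20, 100, replace = TRUE),
  channel = sample(100:120, 100, replace = TRUE),
  is_attributed = factor(rep(c(0, 1), times = c(95, 5)))
)

# Stratified 70/30 split: sample 70% of the row indices within each class
idx_by_class <- split(seq_len(nrow(toy_train)), toy_train$is_attributed)
train_idx <- unlist(lapply(idx_by_class, function(rows)
  rows[sample.int(length(rows), floor(0.7 * length(rows)))]))
train_part <- toy_train[train_idx, ]
test_val <- toy_train[-train_idx, ]

# Random oversampling (a simpler substitute for SMOTE): duplicate randomly
# chosen minority-class rows until both classes are equally frequent
counts <- table(train_part$is_attributed)
minority <- names(counts)[which.min(counts)]
minority_rows <- which(train_part$is_attributed == minority)
n_extra <- max(counts) - min(counts)
extra <- minority_rows[sample.int(length(minority_rows), n_extra, replace = TRUE)]
balanced_train <- rbind(train_part, train_part[extra, ])

table(balanced_train$is_attributed)
```

Balancing only the training part keeps test_val at the original class ratio, so the validation metrics still reflect the real imbalance. SMOTE differs in that it creates new synthetic minority rows by interpolating between neighbors instead of copying existing ones.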

Applying the SVM on the data
Linear Support Vector Machine


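Before building the model, let's tune the cost parameter. Here smote_train is a class-balanced training split (presumably produced with SMOTE from the DMwR package) and test_val is a held-out validation split of train; the code that creates them is not shown.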

set.seed(1234)
liner.tune <- tune.svm(is_attributed ~ ., data = smote_train, kernel = "linear",
                       cost = c(0.1, 0.5, 1, 5, 10, 50))
liner.tune


tune.svm() returns the best parameters for the linear-kernel SVM, selected by k-fold cross-validation (10-fold by default).
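To show what cross-validation does inside the tuning step, here is a minimal base-R sketch. It is an illustration only, not e1071's internal code: every row is assigned to one of 10 folds, each fold is held out once, and the misclassification rates are averaged. The tuned "parameter" is the number of neighbors in a hypothetical 1-D KNN, standing in for the SVM cost, and the data are simulated:

```r
set.seed(1234)

# Simulated 1-D data: class "A" centred at 0, class "B" centred at 3
x <- c(rnorm(30, mean = 0), rnorm(30, mean = 3))
y <- rep(c("A", "B"), each = 30)

# Assign every row to one of 10 folds at random (same folds for all candidates)
k_folds <- 10
folds <- sample(rep(seq_len(k_folds), length.out = length(y)))

# Hypothetical 1-D KNN: majority label of the k nearest training points
knn_1d <- function(train_x, train_y, test_x, k) {
  vapply(test_x, function(p) {
    nearest <- order(abs(train_x - p))[seq_len(k)]
    votes <- table(train_y[nearest])
    names(votes)[which.max(votes)]
  }, character(1))
}

# Cross-validated misclassification rate for one candidate value of k
cv_error <- function(k) {
  fold_err <- vapply(seq_len(k_folds), function(f) {
    held_out <- folds == f
    pred <- knn_1d(x[!held_out], y[!held_out], x[held_out], k)
    mean(pred != y[held_out])
  }, numeric(1))
  mean(fold_err)
}

# Try every candidate and keep the lowest CV error,
# analogous to tune.svm() trying each cost value
candidates <- c(1, 5, 15)
errors <- vapply(candidates, cv_error, numeric(1))
best_k <- candidates[which.min(errors)]
```

In e1071 the number of folds should be adjustable by passing tunecontrol = tune.control(cross = ...) to tune.svm().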

Let's see how the linear model performs.

First, extract the best linear model:

best.linear=liner.tune$best.model

# Predict the validation data (predict() for svm objects takes no type argument)

best.test <- predict(best.linear, newdata = test_val)
confusionMatrix(best.test, test_val$is_attributed)
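The main metrics reported by confusionMatrix() can be reproduced by hand. A minimal sketch with hypothetical predictions, treating "1" (app downloaded) as the positive class; note that caret::confusionMatrix() uses the first factor level (here "0") as the positive class unless positive = "1" is passed, so its sensitivity refers to non-downloads by default:

```r
# Accuracy, precision and recall from predicted vs. actual labels
class_metrics <- function(predicted, actual, positive = "1") {
  cm <- table(Predicted = predicted, Actual = actual)
  tp <- cm[positive, positive]       # predicted positive, actually positive
  fp <- sum(cm[positive, ]) - tp     # predicted positive, actually negative
  fn <- sum(cm[, positive]) - tp     # predicted negative, actually positive
  c(accuracy  = sum(diag(cm)) / sum(cm),
    precision = tp / (tp + fp),
    recall    = tp / (tp + fn))      # recall is caret's "sensitivity"
}

# Hypothetical predictions for 10 validation rows
actual    <- factor(c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1))
predicted <- factor(c(0, 0, 0, 0, 1, 0, 1, 1, 0, 1))
m <- class_metrics(predicted, actual)
m   # accuracy 0.80, precision 0.75, recall 0.75
```

On imbalanced click data, precision and recall for the download class are more informative than accuracy, because predicting "0" everywhere already gives a high accuracy.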

The Kernel Trick – Radial Support Vector Machine

set.seed(1234)
rd.poly <- tune.svm(is_attributed ~ ., data = smote_train, kernel = "radial",
                    gamma = seq(0.1, 5))
summary(rd.poly)
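The radial kernel replaces the plain dot product with a similarity that decays with distance; e1071 documents it as exp(-gamma * |u - v|^2). A small base-R sketch of that formula, using two hypothetical points, shows that larger gamma values make the similarity fall off faster. Note that seq(0.1, 5) steps by 1, so the tuning call above only tries gamma = 0.1, 1.1, 2.1, 3.1 and 4.1:

```r
# Radial (Gaussian) kernel: similarity between feature vectors u and v
radial_kernel <- function(u, v, gamma) {
  exp(-gamma * sum((u - v)^2))
}

u <- c(1, 2)
v <- c(2, 4)                    # squared distance to u is 1 + 4 = 5

gammas <- seq(0.1, 5)           # 0.1 1.1 2.1 3.1 4.1
similarity <- vapply(gammas, function(g) radial_kernel(u, v, g), numeric(1))
round(similarity, 4)
```

A very large gamma makes every point similar only to itself, which tends to overfit; a very small gamma makes the boundary close to linear. That trade-off is why gamma is tuned.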

Let's predict the validation data (test_val):

best.rd=rd.poly$best.model
pre.rd=predict(best.rd,newdata = test_val)
confusionMatrix(pre.rd,test_val$is_attributed)

3. Write down the step-by-step procedure for Naïve Bayes classification in R.

The steps for Naïve Bayes classification in R are as follows:

  1. Load the necessary libraries, such as the "e1071" library, which provides the naiveBayes() classifier.
  2. Prepare the data for the model. This includes splitting the data into a training and test set, and converting any categorical variables into factors.
  3. Train the model by fitting the Naive Bayes classifier to the training data.
  4. Use the trained model to predict the class of the test data.
  5. Evaluate the performance of the model by comparing the predicted class to the actual class of the test data. This can be done using metrics such as accuracy, precision, and recall.
  6. Repeat steps 3-5 for different input data and/or different model parameters to find the best model for the given data.
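The steps can be followed end to end in base R. This sketch assumes Gaussian class-conditional densities for numeric features (the same assumption e1071::naiveBayes() makes for metric predictors) and uses the built-in iris data with a hypothetical 70/30 split; predict_nb is a hand-written stand-in for calling predict() on a naiveBayes model:

```r
data(iris)
set.seed(42)

# Step 2: prepare the data with a random 70/30 split (Species is already a factor)
train_rows <- sample(nrow(iris), size = 105)
nb_train <- iris[train_rows, ]
nb_test  <- iris[-train_rows, ]
features <- names(iris)[1:4]
classes  <- levels(iris$Species)

# Step 3: train = estimate class priors and per-class means/sds of each feature
priors <- sapply(classes, function(cl) mean(nb_train$Species == cl))
means  <- sapply(classes, function(cl) colMeans(nb_train[nb_train$Species == cl, features]))
sds    <- sapply(classes, function(cl) sapply(nb_train[nb_train$Species == cl, features], sd))

# Step 4: predict = choose the class with the largest log-posterior,
# treating features as independent given the class (the "naive" assumption)
predict_nb <- function(newdata) {
  x <- as.matrix(newdata[, features])
  log_post <- sapply(classes, function(cl) {
    log_lik <- matrix(dnorm(x,
                            mean = rep(means[, cl], each = nrow(x)),
                            sd   = rep(sds[, cl],   each = nrow(x)),
                            log  = TRUE),
                      nrow = nrow(x))
    log(priors[[cl]]) + rowSums(log_lik)
  })
  log_post <- matrix(log_post, nrow = nrow(x))
  factor(classes[max.col(log_post, ties.method = "first")], levels = classes)
}

# Step 5: evaluate on the held-out rows
pred <- predict_nb(nb_test)
accuracy <- mean(pred == nb_test$Species)
table(Predicted = pred, Actual = nb_test$Species)
```

Working with log-probabilities avoids numerical underflow when many small per-feature likelihoods are multiplied together.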

