To view the text of corpus documents, use double brackets and as.character(). For several documents at once, apply as.character() with lapply():

lapply(sms_corpus[1:2], as.character)
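Double brackets are needed because single brackets on a list return a sub-list, while double brackets extract the element itself. The sketch below shows this in base R. It assumes a plain named list (`docs`, hypothetical toy data) standing in for the tm corpus.

```r
# A plain list standing in for a corpus (hypothetical toy data, not tm)
docs <- list(
  "1" = "Go until jurong point, crazy..",
  "2" = "Ok lar... Joking wif u oni..."
)

# Single brackets return a list of length 1, not the text itself
sub <- docs[1]
print(class(sub))        # "list"

# Double brackets extract the element, here a character string
first <- docs[[1]]
print(class(first))      # "character"

# lapply() + as.character() converts several documents at once,
# mirroring lapply(sms_corpus[1:2], as.character)
texts <- lapply(docs[1:2], as.character)
print(texts)
```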

Data Preview:

head(sms_raw)


 

##   type
## 1  ham
## 2  ham
## 3 spam
## 4  ham
## 5  ham
## 6 spam
##                                                                    text
## 1 Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...
## 2 Ok lar... Joking wif u oni...
## 3 Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's
## 4 U dun say so early hor... U c already then say...
## 5 Nah I don't think he goes to usf, he lives around here though
## 6 FreeMsg Hey there darling it's been 3 week's now and no word back! I'd like some fun you up for it still? Tb ok! XxX std chgs to send, 3022431.50 to rcv

  

str(sms_raw)

 

## 'data.frame':    5574 obs. of  2 variables:
##  $ type: Factor w/ 2 levels "ham","spam": 1 1 2 1 1 2 1 1 2 2 ...
##  $ text: chr  "Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat..." "Ok lar... Joking wif u oni..." "Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C"| __truncated__ "U dun say so early hor... U c already then say..." ...

 

Target Variable:

Class counts and proportions:

table(sms_raw$type)

 

## 
##  ham spam 
## 4827  747

 

 

round(prop.table(table(sms_raw$type)), digits = 2)

 

## 
##  ham spam 
## 0.87 0.13
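Each proportion is the class count divided by the total number of messages. The quick base-R check below uses the counts printed above. It also shows that a classifier that always predicts "ham" would already be correct about 87% of the time, which is a useful baseline when judging the trained model.

```r
# Class counts printed by table(sms_raw$type)
counts <- c(ham = 4827, spam = 747)

total <- sum(counts)              # 5574, matching the 5574 obs. in str(sms_raw)
props <- counts / total           # each count divided by the total
print(round(props, digits = 2))   # ham 0.87, spam 0.13

# Accuracy of always predicting the majority class "ham"
baseline <- max(props)
print(round(baseline, 3))         # 0.866
```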

 

Next, we convert the dataset into a bag of words. Each message is represented only by the words it contains and how often they occur, and word order is ignored.
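A minimal base-R sketch of the bag-of-words idea follows. It assumes simple lowercasing and whitespace tokenization, and it uses a hypothetical helper `bag_of_words()` rather than the tm cleaning steps. Two messages that contain the same words in a different order map to the same bag.

```r
# Two toy messages with the same words in a different order (hypothetical data)
msgs <- c("win a free prize now", "now win a prize free")

# Hypothetical helper: lowercase, split on whitespace, count each term
bag_of_words <- function(text) {
  tokens <- strsplit(tolower(text), "\\s+")[[1]]
  table(tokens)   # table() sorts the terms, so equal bags compare equal
}

bag1 <- bag_of_words(msgs[1])
bag2 <- bag_of_words(msgs[2])
print(bag1)

# Word order is discarded, so both messages give the same bag
print(identical(bag1, bag2))   # TRUE
```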

spam learnsms_corpus_clean ” “wk” “” “” …

 

# filter the DTM sparse matrix to only contain words with at least 5 occurrences
# reducing the features in our DTM
sms_dtm_freq_train 0, "Yes", "No")}
# apply to train and test reduced DTMs, applying to columns
sms_train ” “wk” “” “” …
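The filtering and Yes/No conversion described in these comments can be sketched in base R. The sketch makes several assumptions:

- A small hypothetical count matrix `dtm` stands in for the tm document-term matrix.
- `min_freq` and `convert_counts` are illustrative names.
- A term's frequency is its total count across all documents.

```r
# Hypothetical document-term matrix of raw counts (stands in for the tm DTM)
dtm <- matrix(
  c(2, 0, 1,
    0, 0, 3,
    1, 1, 0,
    4, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(Docs = paste0("d", 1:4), Terms = c("free", "lar", "win"))
)

# Keep only terms whose total count across documents is at least min_freq
min_freq <- 5
freq_terms <- colnames(dtm)[colSums(dtm) >= min_freq]
dtm_freq <- dtm[, freq_terms, drop = FALSE]

# Hypothetical helper: turn counts into categorical "Yes"/"No" indicators
convert_counts <- function(x) {
  ifelse(x > 0, "Yes", "No")
}

# MARGIN = 2 applies the function to each column (term)
train <- apply(dtm_freq, MARGIN = 2, convert_counts)
print(train)
```

Here "lar" (total count 1) is dropped, while "free" (7) and "win" (5) are kept. The result is a character matrix of the same kind that str(sms_test) reports.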

 

str(sms_test)

 

##  chr [1:1394, 1:1166] "No" "No" "No" "Yes" "No" "No" ...
##  - attr(*, "dimnames")=List of 2
##   ..$ Docs : chr [1:1394] "4181" "4182" "4183" "4184" ...
##   ..$ Terms: chr [1:1166] "" "wk" "" "" ...

 

 

 

 

 

 

Step 3 – Model training

Finally, we apply the Naïve Bayes algorithm.

# applying Naive Bayes to training set
sms_classifier NIR :
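Naive Bayes scores each class by its prior probability times the product of P(feature value | class), then predicts the highest-scoring class. R packages implement this, for example naiveBayes() in e1071. The code below is only a hand-rolled base-R sketch on hypothetical toy data, and it is not any package's implementation.

- `fit_nb()` and `predict_nb()` are illustrative names.
- Laplace smoothing (`laplace = 1`) keeps a term never seen with a class from zeroing out that class.
- Log-probabilities are summed to avoid numerical underflow.

```r
# Hand-rolled categorical Naive Bayes in base R (illustrative sketch only)

# Toy training data: Yes/No term indicators plus labels (hypothetical)
train <- data.frame(
  free = c("Yes", "No", "No", "Yes", "No"),
  lar  = c("No", "Yes", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)
labels <- c("spam", "ham", "ham", "spam", "ham")

# Fit class priors and P(value | class) for every feature
fit_nb <- function(X, y, laplace = 1) {
  classes <- sort(unique(y))
  vals <- c("No", "Yes")
  priors <- as.vector(table(factor(y, levels = classes))) / length(y)
  names(priors) <- classes
  likelihoods <- list()
  for (f in names(X)) {
    lik <- matrix(0, nrow = length(vals), ncol = length(classes),
                  dimnames = list(vals, classes))
    for (cl in classes) {
      counts <- as.vector(table(factor(X[[f]][y == cl], levels = vals)))
      # Laplace smoothing: add `laplace` to every count
      lik[, cl] <- (counts + laplace) / (sum(counts) + laplace * length(vals))
    }
    likelihoods[[f]] <- lik
  }
  list(classes = classes, priors = priors, likelihoods = likelihoods)
}

# Predict the most probable class for one message (named character vector)
predict_nb <- function(model, newrow) {
  scores <- sapply(model$classes, function(cl) {
    lp <- log(model$priors[[cl]])
    for (f in names(newrow)) {
      lp <- lp + log(model$likelihoods[[f]][newrow[[f]], cl])
    }
    lp
  })
  names(which.max(scores))
}

model <- fit_nb(train, labels)
print(predict_nb(model, c(free = "Yes", lar = "No")))  # "spam"
print(predict_nb(model, c(free = "No", lar = "Yes")))  # "ham"
```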