Naive Bayes Classifier

A Naive Bayes classifier is a probabilistic machine learning model used for classification tasks. The crux of the classifier is Bayes' theorem.

Bayes' Theorem:

Using Bayes' theorem, we can find the probability of A happening given that B has occurred. Here, B is the evidence and A is the hypothesis. The assumption made here is that the predictors/features are independent: the presence of one particular feature does not affect the others. Hence the classifier is called naive.
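Written as a formula, Bayes' theorem is:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```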

Let's explore the parts of Bayes' theorem:

·         P(A|B) - Posterior Probability: the conditional probability that event A occurs given that event B has occurred.

·         P(A) - Prior Probability: the probability of event A.

·         P(B) - Evidence: the probability of event B.

·         P(B|A) - Likelihood: the conditional probability of B occurring given that event A has occurred.

Now, let's explore the parts of Bayes' theorem through the eyes of someone doing machine learning:

·         P(A|B) - Posterior Probability: the conditional probability of the response variable (target variable) given the training data inputs.

·         P(A) - Prior Probability: the probability of the response variable (target variable).

·         P(B) - Evidence: the probability of the training data.

·         P(B|A) - Likelihood: the conditional probability of the training data given the response variable.
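Putting these pieces together for a feature vector X = (x_1, ..., x_n) and a class y, and applying the naive independence assumption to the likelihood, the classifier scores each class as:

```latex
P(y \mid X) = \frac{P(y)\, P(X \mid y)}{P(X)} \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The class with the highest score is the prediction; the evidence P(X) can be dropped because it is the same for every class.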


 

Example 1:

Let us take an example to get some better intuition. Consider the problem of deciding whether to play tennis. The dataset is represented below.

 

| Day | Outlook  | Temperature | Humidity | Wind   | Play Tennis |
|-----|----------|-------------|----------|--------|-------------|
| D1  | Sunny    | Hot         | High     | Weak   | No          |
| D2  | Sunny    | Hot         | High     | Strong | No          |
| D3  | Overcast | Hot         | High     | Weak   | Yes         |
| D4  | Rain     | Mild        | High     | Weak   | Yes         |
| D5  | Rain     | Cool        | Normal   | Weak   | Yes         |
| D6  | Rain     | Cool        | Normal   | Strong | No          |
| D7  | Overcast | Cool        | Normal   | Strong | Yes         |
| D8  | Sunny    | Mild        | High     | Weak   | No          |
| D9  | Sunny    | Cool        | Normal   | Weak   | Yes         |
| D10 | Rain     | Mild        | Normal   | Weak   | Yes         |
| D11 | Sunny    | Mild        | Normal   | Strong | Yes         |
| D12 | Overcast | Mild        | High     | Strong | Yes         |
| D13 | Overcast | Hot         | Normal   | Weak   | Yes         |
| D14 | Rain     | Mild        | High     | Strong | No          |

We want to classify a new day with the features (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong).


Steps:

1.   Convert the data set into frequency tables to obtain the prior probabilities.

2.   Create likelihood tables by finding the conditional probabilities.

3.   Use the Naive Bayes equation to calculate the posterior probability for each class.

Step 1: Prior Probabilities

Out of 14 instances, 9 are Yes and 5 are No:

P(Play Tennis = Yes) = 9/14 ≈ 0.64

P(Play Tennis = No) = 5/14 ≈ 0.36
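As a quick sanity check, the priors can be computed in a few lines of Python (a minimal sketch; the `labels` list simply hard-codes the Play Tennis column from the table above):

```python
from collections import Counter

# Play Tennis column from the table above (D1 through D14)
labels = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
          "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

counts = Counter(labels)                     # {'No': 5, 'Yes': 9}
priors = {c: n / len(labels) for c, n in counts.items()}
print(priors)                                # {'No': 0.357..., 'Yes': 0.642...}
```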

 

Step 2: Likelihood Tables

Outlook:

The first attribute, Outlook, has three categorical values (Sunny, Overcast, Rain). For each value we find its probability conditioned on the target attribute Play Tennis (Yes, No).

| Outlook  | Yes | No  |
|----------|-----|-----|
| Sunny    | 2/9 | 3/5 |
| Overcast | 4/9 | 0/5 |
| Rain     | 3/9 | 2/5 |


 

Temperature:

The second attribute, Temperature, has three categorical values (Hot, Mild, Cool). For each value we find its probability conditioned on the target attribute Play Tennis (Yes, No).

| Temperature | Yes | No  |
|-------------|-----|-----|
| Hot         | 2/9 | 2/5 |
| Mild        | 4/9 | 2/5 |
| Cool        | 3/9 | 1/5 |

 

Humidity:

The third attribute, Humidity, has two categorical values (High, Normal). For each value we find its probability conditioned on the target attribute Play Tennis (Yes, No).

| Humidity | Yes | No  |
|----------|-----|-----|
| High     | 3/9 | 4/5 |
| Normal   | 6/9 | 1/5 |

Wind:

The fourth attribute, Wind, has two categorical values (Strong, Weak). For each value we find its probability conditioned on the target attribute Play Tennis (Yes, No).

| Wind   | Yes | No  |
|--------|-----|-----|
| Strong | 3/9 | 3/5 |
| Weak   | 6/9 | 2/5 |
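The likelihood tables above can also be produced directly from the data, for example with pandas (a minimal sketch that re-enters the 14 rows of the dataset):

```python
import pandas as pd

# The Play Tennis dataset (rows D1 through D14 from the table above)
df = pd.DataFrame({
    "Outlook":     ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
                    "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"],
    "Temperature": ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool",
                    "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"],
    "Humidity":    ["High", "High", "High", "High", "Normal", "Normal", "Normal",
                    "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],
    "Wind":        ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
                    "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"],
    "PlayTennis":  ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                    "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

# One likelihood table per feature: P(feature value | class), one column per class
for feature in ["Outlook", "Temperature", "Humidity", "Wind"]:
    print(pd.crosstab(df[feature], df["PlayTennis"], normalize="columns"), "\n")
```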

Play:

The last attribute, Play Tennis, is the target and takes two values (Yes, No): Yes means play and No means do not play. Its probabilities are the prior probabilities computed in Step 1 from the 14 instances.

Step 3: Posterior Probabilities

We now classify the new instance:

| Outlook | Temperature | Humidity | Wind   | Play Tennis |
|---------|-------------|----------|--------|-------------|
| Sunny   | Cool        | High     | Strong | ?           |

Based on the Yes class:

P(X|Yes) P(Yes) = P(Sunny|Yes) * P(Cool|Yes) * P(High|Yes) * P(Strong|Yes) * P(Yes)

P(X|Yes) P(Yes) = 2/9 * 3/9 * 3/9 * 3/9 * 9/14 ≈ 0.0053

Based on the No class:

P(X|No) P(No) = P(Sunny|No) * P(Cool|No) * P(High|No) * P(Strong|No) * P(No)

P(X|No) P(No) = 3/5 * 1/5 * 4/5 * 3/5 * 5/14 ≈ 0.0206

In the end:

0.0206 > 0.0053

The score for No is higher than the score for Yes (normalizing the two scores so they sum to 1 gives roughly 0.80 for No and 0.20 for Yes), so the classifier predicts No: tennis should not be played.

Our predicted result is:

| Outlook | Temperature | Humidity | Wind   | Play Tennis |
|---------|-------------|----------|--------|-------------|
| Sunny   | Cool        | High     | Strong | No          |
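The same calculation in Python, plugging in the fractions read off the prior and likelihood tables above (a minimal sketch):

```python
from fractions import Fraction as F

# Scores for X = (Sunny, Cool, High, Strong)
score_yes = F(2, 9) * F(3, 9) * F(3, 9) * F(3, 9) * F(9, 14)   # ~0.0053
score_no  = F(3, 5) * F(1, 5) * F(4, 5) * F(3, 5) * F(5, 14)   # ~0.0206

print(float(score_yes), float(score_no))
print("Prediction:", "Yes" if score_yes > score_no else "No")   # No
```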


Example 2:

Here the features are Color, Type, and Origin, and the binary target takes the values Yes and No. The training set contains 10 instances, 5 of each class. We want to classify a new instance.

          

        

New Instance = (Red, SUV, Domestic)

Prior Probabilities:

P(Yes) = 5/10

P(No) = 5/10

P(Yes|New Instance) = P(Yes) * P(Color=Red|Yes) * P(Type=SUV|Yes) * P(Origin=Domestic|Yes)

P(Yes|New Instance) = 0.5 * 3/5 * 1/5 * 2/5 = 0.024

P(No|New Instance) = P(No) * P(Color=Red|No) * P(Type=SUV|No) * P(Origin=Domestic|No)

P(No|New Instance) = 0.5 * 2/5 * 3/5 * 3/5 = 0.072

P(Yes|New Instance) < P(No|New Instance)

The new instance is classified as No.
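The arithmetic can be checked in Python (a minimal sketch using the probabilities above):

```python
from fractions import Fraction as F

score_yes = F(1, 2) * F(3, 5) * F(1, 5) * F(2, 5)   # 0.024
score_no  = F(1, 2) * F(2, 5) * F(3, 5) * F(3, 5)   # 0.072

print(float(score_yes), float(score_no))              # 0.024 0.072
print("Prediction:", "Yes" if score_yes > score_no else "No")   # No
```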

Advantages of Naive Bayes Classifier:

      It is simple and easy to implement.

      It requires relatively little training data.

      It handles both continuous and discrete data.

      It is highly scalable with the number of predictors and data points.

      It is fast and can be used to make real-time predictions.

Pros:

·         It is easy and fast to predict the class of a test data set. It also performs well in multi-class prediction.

·         When the assumption of independence holds, the classifier performs better than other machine learning models such as logistic regression or decision trees, and requires less training data.

·         It performs well with categorical input variables compared to numerical ones. For numerical variables, a normal distribution (bell curve) is assumed, which is a strong assumption.

Cons:

·         If a categorical variable has a category in the test data set that was not observed in the training data set, the model will assign it zero probability and will be unable to make a prediction. This is often known as "Zero Frequency". To solve it, we can use a smoothing technique; one of the simplest is Laplace estimation (a short sketch follows this list).

·         Naive Bayes is also known to be a poor probability estimator, so the probability outputs from predict_proba should not be taken too seriously.

·         Another limitation of this algorithm is the assumption of independent predictors. In real life, it is almost impossible to get a set of predictors that are completely independent.
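To illustrate Laplace (add-one) smoothing, here is a minimal sketch; the helper function and its counts are hypothetical, and in scikit-learn the same idea is controlled by the `alpha` parameter of MultinomialNB and related classes:

```python
def smoothed_likelihood(count, class_total, n_categories, alpha=1.0):
    """Laplace-smoothed estimate of P(feature value | class)."""
    return (count + alpha) / (class_total + alpha * n_categories)

# Hypothetical counts for Outlook given Play Tennis = No (3 possible values)
print(smoothed_likelihood(0, 5, 3))   # Overcast never seen with No: 1/8 instead of 0
print(smoothed_likelihood(3, 5, 3))   # Sunny seen 3 times with No: 4/8 instead of 3/5
```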

Applications of Naive Bayes Algorithms

·         Real-time Prediction: The Naive Bayes classifier is an eager learner and is very fast, so it can be used to make predictions in real time.

·         Multi-class Prediction: The algorithm is also well known for multi-class prediction; it can estimate the probability of each class of the target variable.

·         Text Classification / Spam Filtering / Sentiment Analysis: Naive Bayes classifiers are widely used in text classification (thanks to good results on multi-class problems and the independence assumption) and often achieve higher success rates than other algorithms. As a result, they are popular for spam filtering (identifying spam e-mail) and sentiment analysis (in social media analysis, to identify positive and negative customer sentiment); a small text-classification sketch follows this list.

·         Recommendation System: A Naive Bayes classifier combined with collaborative filtering can build a recommendation system that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a given resource.
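A minimal text-classification sketch with scikit-learn's CountVectorizer and MultinomialNB; the tiny corpus and labels below are made up purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy spam/ham corpus (made up for illustration)
texts = ["win a free prize now",
         "lowest price guaranteed, click here",
         "meeting rescheduled to monday",
         "please review the attached report"]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words counts followed by a multinomial Naive Bayes classifier
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["free prize click now"]))          # likely 'spam'
print(model.predict(["see the report before monday"]))  # likely 'ham'
```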

Types of NB Models:

·         Gaussian Naive Bayes: GaussianNB is used in classification tasks and assumes that feature values follow a Gaussian (normal) distribution. A short scikit-learn sketch follows this list.

·         Multinomial Naive Bayes: It is used for discrete counts, for example in text classification. Instead of recording only whether a word occurs in a document (a Bernoulli trial), we count how often each word occurs in the document; you can think of it as the number of times outcome x_i is observed over n trials.

·         Bernoulli Naive Bayes: The Bernoulli model is useful if your feature vectors are boolean (i.e. zeros and ones). One application is text classification with a bag-of-words model, where the 1s and 0s mean "word occurs in the document" and "word does not occur in the document" respectively.

·         Complement Naive Bayes: It is an adaptation of Multinomial NB in which the complement of each class is used to calculate the model weights. It is suitable for imbalanced data sets and often outperforms MNB on text classification tasks.

·         Categorical Naive Bayes: It is useful when the features are categorically distributed. The categorical variables have to be encoded in numeric format, for example with an ordinal encoder, before using this algorithm.
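For instance, a minimal GaussianNB sketch on scikit-learn's built-in Iris dataset (continuous features, so the Gaussian variant is the appropriate one):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = GaussianNB()
clf.fit(X_train, y_train)

print("Test accuracy:", clf.score(X_test, y_test))
print("Class probabilities for one sample:", clf.predict_proba(X_test[:1]))
```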

 
