Performance Metrics for Regression Models

In machine learning, once a model is built, the next step is to evaluate it against various performance criteria.

The performance metrics used for regression models differ from those used for classification models; both are listed below.

Regression models: In regression analysis the output is a continuous value, so the following metrics are used to measure performance:

  1. Mean squared error (MSE)
  2. Mean Absolute Error (MAE)
  3. Root mean squared error (RMSE)
  4. R Square

Classification models: In a classification model the output is discrete, so the following metrics are used to measure performance:

  1. Confusion matrix
  2. Accuracy
  3. Precision
  4. Recall (sensitivity)
  5. Specificity
  6. ROC curve (AUC): the area under the ROC curve is useful when we are not specifically concerned with the positive class, in contrast to the F1 score, where performance on the positive class is what matters.
  7. F-score (the F1 score is useful when the positive class is relatively small)
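As a quick sketch of how these quantities relate, the counts of a binary confusion matrix are enough to compute most of the metrics above. The numbers below are hypothetical, chosen only for illustration:

```python
# Hypothetical counts from a binary confusion matrix, for illustration only.
tp, fp, fn, tn = 40, 10, 5, 45  # true positives, false positives, false negatives, true negatives

accuracy    = (tp + tn) / (tp + fp + fn + tn)  # fraction of all predictions that are correct
precision   = tp / (tp + fp)                   # of predicted positives, how many are real
recall      = tp / (tp + fn)                   # also called sensitivity
specificity = tn / (tn + fp)                   # true-negative rate
f1          = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
```

With these counts, accuracy is 0.85 and precision is 0.8; recall, specificity, and F1 follow directly from the same four numbers.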

Performance metrics should be chosen based on the problem domain, project goals, and objectives. 

1. Mean Squared Error:

MSE, or Mean Squared Error, is one of the most preferred metrics for regression tasks. It is simply the average of the squared differences between the target values and the values predicted by the regression model. Because it squares the differences, it penalizes even small errors, which can lead to over-estimating how bad the model is. It is preferred over other metrics because it is differentiable and can therefore be optimized more easily.

MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²

Here n is the number of samples, yᵢ is the target value, and ŷᵢ is the predicted value.
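The calculation can be sketched in a few lines of plain Python; the sample values below are made up for illustration:

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n

# Hypothetical targets and predictions.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mean_squared_error(y_true, y_pred))  # 0.375
```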

2. Mean Absolute Error (MAE)

MAE is the average absolute difference between the target values and the values predicted by the model. MAE is more robust to outliers and does not penalize errors as severely as MSE. It is a linear score, which means all individual differences are weighted equally. It is not suitable for applications where you want to pay more attention to outliers.

MAE = (1/n) Σᵢ |yᵢ − ŷᵢ|

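A minimal sketch of MAE in plain Python, using the same hypothetical values as the MSE example:

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    n = len(y_true)
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / n

# Hypothetical targets and predictions.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mean_absolute_error(y_true, y_pred))  # 0.5
```

Note that the single large error (7.0 vs 8.0) contributes only 1.0 here, whereas MSE squares it; this is the sense in which MAE penalizes outliers less.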
3. Root Mean Squared Error:

RMSE is the most widely used metric for regression tasks. It is the square root of the average squared difference between the target values and the values predicted by the model. It is preferred in some cases because the errors are squared before averaging, which places a high penalty on large errors. This makes RMSE useful when large errors are undesirable.

RMSE = √MSE = √((1/n) Σᵢ (yᵢ − ŷᵢ)²)
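
RMSE is simply the square root of MSE, as the short sketch below shows (hypothetical values again):

```python
import math

def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error."""
    n = len(y_true)
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n
    return math.sqrt(mse)

# Hypothetical targets and predictions; MSE here is 0.375.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(round(root_mean_squared_error(y_true, y_pred), 4))  # 0.6124
```

Because the square root brings the value back to the original units of the target, RMSE is easier to interpret than MSE.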
4. R Square:

The coefficient of determination, or R², is another metric used for evaluating the performance of a regression model. It compares the current model with a constant baseline and tells us how much better the model is. The constant baseline is obtained by taking the mean of the target values and drawing a horizontal line at that mean. R² is a scale-free score: no matter how large or small the values are, R² is always less than or equal to 1.

The Mean Squared Error (MSE) of the baseline model is calculated by taking the average of the squared differences between the actual values and the mean of the actual values.

The baseline model assumes that every prediction is simply the mean of the observed data, so the MSE of the baseline model is given by:

MSE_baseline = (1/n) Σᵢ (yᵢ − ȳ)²

Where:

·         yᵢ is the actual value of the dependent variable.

·         ȳ is the mean of the actual values.

·         n is the number of observations.

This value represents the average squared difference between the actual values and their mean.

The model's Mean Squared Error is calculated by taking the average of the squared differences between the actual values and the values predicted by the regression model:

MSE_model = (1/n) Σᵢ (yᵢ − ŷᵢ)²

·         yᵢ is the actual value.

·         ŷᵢ is the predicted value from the model.

·         n is the number of observations.

R² is then obtained by comparing the two errors: R² = 1 − (MSE_model / MSE_baseline). A value close to 1 indicates that the model's predictions are much closer to the actual values than the baseline mean, reflecting high accuracy, while a value near 0 means the model does no better than predicting the mean.
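Combining the baseline error and the model error gives R². A minimal sketch in plain Python, with the same hypothetical values used for the earlier metrics:

```python
def r_squared(y_true, y_pred):
    """R-squared: 1 - MSE_model / MSE_baseline, with the mean as the baseline."""
    n = len(y_true)
    mean = sum(y_true) / n
    # Baseline error: every prediction is simply the mean of the observed data.
    mse_baseline = sum((y - mean) ** 2 for y in y_true) / n
    # Model error: squared differences between actual and predicted values.
    mse_model = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n
    return 1 - mse_model / mse_baseline

# Hypothetical targets and predictions.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(round(r_squared(y_true, y_pred), 4))  # 0.9486
```

A score this close to 1 says the model explains most of the variance that the constant-mean baseline leaves unexplained.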
