
Data Science: The four (and a half) metrics to understand your model

Have you ever wondered what it means when a teacher grades your work? 👨‍🏫

  • For a bad teacher, grades are a form of punishment 💥
  • For a good teacher, grades are tools that make us better prepared for real-world problems 🚀

We should think about model evaluation metrics in the same way. They are tools to penalize weak models, but not only that: we should use them with understanding. These metrics tell us a story about the model, and each metric introduces a different character of that story. Do you want to learn more?

We will look into four (and a half) metrics that work well together. They cover almost everything a Data Science newbie should know about a model. They are the starter pack for data analysts, so don’t forget to add them to your toolbelt 🛠!


Forecast Bias

$$e_{fb} = \frac{\sum_{i}^{N}{y_{i} - \bar{y_{i}}}}{N}$$

Forecast Bias (FB) is probably the most straightforward metric. It is the mean of the differences between observations and predictions. With its simplicity comes evaluation power. Why? Because FB has an excellent property: it has a sign, ➕ or ➖, and that sign tells us a lot about our model’s performance!

  • A ➕ sign tells us that our model underestimates the actual observations. Is that a bad situation? Yes, it is. Imagine that you were asked to forecast the expenditures of your corporate branch for the following year. A few months before the end of the year, the costs are already dangerously close to the predicted value, and you must find a way to cut them, hurting your business. Are you ready to fire someone from your team?
  • A ➖ sign tells us that our model overestimates the actual observations. This is a common logistics problem: we shouldn’t overestimate the number of eggs customers will buy over a specific period. If we stock too many eggs because we overestimated the demand, we will lose money. The short sketch below illustrates both signs.
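
A minimal sketch of this sign convention in NumPy; the arrays below are invented for illustration only:

import numpy as np

observations = np.array([10.0, 12.0, 9.0, 11.0])

# Model A predicts too low, so observations minus predictions is positive.
underestimating_predictions = np.array([8.0, 10.0, 8.5, 9.0])
print(np.mean(observations - underestimating_predictions))  # 1.625 -> positive sign: the model underestimates

# Model B predicts too high, so observations minus predictions is negative.
overestimating_predictions = np.array([11.0, 13.5, 10.0, 12.0])
print(np.mean(observations - overestimating_predictions))   # -1.125 -> negative sign: the model overestimates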

Mean Absolute Error

$$e_{mae} = \frac{\sum_{i}^{N}{|y_{i} - \bar{y_{i}}|}}{N}$$

Mean Absolute Error (MAE) is easy to compute. It is the mean of the absolute differences between predictions and observations. Subtraction is a very fast operation, and if we aim for computational speed 🏎 over model accuracy, we should use this metric instead of RMSE or MSE. Sometimes we don’t care that much about accuracy: if we have a lot of well-prepared data, we can forget about MSE and RMSE and use MAE instead.

Real-world datasets are usually trickier, though, and we should use this metric along with RMSE. Why is that? You will learn in the next paragraph.


Root Mean Squared Error & Mean Squared Error

$$e_{rmse} = \sqrt{\frac{\sum_{i}^{N}{(y_{i} - \bar{y_{i}})^2}}{N}}$$

$$e_{mse} = \frac{\sum_{i}^{N}{(y_{i} - \bar{y_{i}})^2}}{N}$$

Mean Squared Error (MSE) is the mean of the squared differences between predictions and observations. It is always positive. The downside of this metric is that it doesn’t have the same units as the predicted values, so it is hard to grasp what a given MSE value means for a model. That’s why we take the root of it and get the Root Mean Squared Error (RMSE): the error value is then expressed in the same units and order of magnitude as the analyzed data.

Why do we use MSE in our ML models if RMSE gives a better glimpse of the data? The root operation is costly and slows down computation, so it is not wise to use RMSE as an accuracy metric during training.

RMSE, on the other hand, is great for human analysis. RMSE is not the same as MAE! Why is that? RMSE is sensitive to outliers (extreme differences between predictions and observations). Let’s compare the MAE and RMSE of this array of differences:

[1, 1, 2, 2, 10, 100]

$$e_{mae} = \frac{1 + 1 + 2 + 2 + 10 + 100}{6} = \frac{116}{6} \approx 19.33$$

$$e_{rmse} = \sqrt{\frac{1^{2} + 1^{2} + 2^{2} + 2^{2} + 10^{2} + 100^{2}}{6}} = \sqrt{\frac{1 + 1 + 4 + 4 + 100 + 10000}{6}} = \sqrt{\frac{10110}{6}} = \sqrt{1685} \approx 41.05$$

Do you see how huge the difference is? This is a great property for tracking extremely wrong predictions: we compare MAE and RMSE, and if they are far apart, we can be sure that our model is not the best performer.
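
The same comparison in NumPy, if you want to reproduce the numbers above (a quick sketch, separate from the full implementation at the end of this post):

import numpy as np

differences = np.array([1, 1, 2, 2, 10, 100])

mae = np.mean(np.abs(differences))         # 19.33
rmse = np.sqrt(np.mean(differences ** 2))  # 41.05

# A large gap between MAE and RMSE points to a few extremely wrong predictions.
print(round(mae, 2), round(rmse, 2))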

We are ready to wrap up information about MSE and RMSE:

  • MSE: great from the perspective of the machine 💻 It is faster to compute than RMSE, and at the same time it penalizes outliers more than MAE.
  • RMSE: pleasant for humans 👥 We can compare this metric directly to the data values.
  • RMSE + MAE: this pair has a unique skill for tracking wrong predictions 👀 Our model may seem fine, and its mean error may be relatively small, yet it can still produce bad results for some records. It is hard to track every record, but if we compare RMSE to MAE and see a big difference, we can be more than sure that our model is cheating and doesn’t fit the data well in every scenario.

Symmetric Mean Absolute Percentage Error

$$e_{smape} = \frac{100}{N} \sum_{i}^{N}{\frac{|\bar{y_{i}} - y_{i}|}{|y_{i}|+|\bar{y_{i}}|}}$$

MAE, MSE, and RMSE are the most popular metrics for evaluating model performance, but this set is far from complete. One of the more interesting error measures is the Symmetric Mean Absolute Percentage Error (SMAPE). It returns a relative error on a scale from 0 to 100%, so we can compare multiple models with different characteristics 🍎❓🍏❓🍐❓🍊❓🍋❓🍉. Unfortunately, this metric has its glitches. It penalizes underestimates more than overestimates; the differences are small but significant. Another danger with SMAPE is that if both the prediction and the observation are equal to zero, we divide zero by zero and get an undefined value. This is mostly an implementation concern, but it can happen.

However, SMAPE is a good metric, especially for reporting to customers and business intelligence 💸
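
A short sketch of both caveats with invented numbers: the same absolute miss of 10 units gives a higher SMAPE when the model underestimates, and a pair of zeros leads to a zero-by-zero division:

import numpy as np


def smape(predicted, real):
    # Inline version of the SMAPE formula used in this post.
    return 100 * np.mean(np.abs(predicted - real) / (np.abs(real) + np.abs(predicted)))


observations = np.array([100.0, 100.0])
print(round(smape(np.array([110.0, 110.0]), observations), 2))  # 4.76 -> overestimation by 10
print(round(smape(np.array([90.0, 90.0]), observations), 2))    # 5.26 -> underestimation by 10 is penalized more

# When both the prediction and the observation are 0.0, NumPy divides zero by zero,
# emits a RuntimeWarning, and the result is NaN.
print(smape(np.array([0.0]), np.array([0.0])))  # nan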


How do we use those metrics? – Summary

  • Forecast Bias: check whether the model underestimates or overestimates the observations.
  • Mean Absolute Error: fast accuracy computation for very large models.
  • Mean Squared Error: compute model accuracy when we want wrong predictions to be penalized heavily.
  • Root Mean Squared Error: slower than MSE, but it has the same units as the data, so it is better for human-to-human explanations.
  • Symmetric Mean Absolute Percentage Error: returns an error on a scale of 0-100%. Great for comparing multiple models over datasets with different orders of magnitude and for reporting results.

You can, and should, combine those insights! For example:

  • FB + MAE + RMSE: check whether the model is biased toward over- or underestimation and whether it produces large, unreliable errors for some records (the usage sketch after the implementation below follows this pattern).
  • FB + SMAPE: compare the accuracy of forecasting models built by different R&D teams and track bias in those models; a short sketch of this combination follows this list.
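
A hedged sketch of the second combination, with invented numbers: two models that forecast values on very different scales can still be compared on the same 0-100% SMAPE scale, while Forecast Bias shows the direction of each model's error:

import numpy as np

# Hypothetical forecasts from two teams working on very different scales.
eggs_real = np.array([40.0, 55.0, 60.0])               # daily egg demand
eggs_pred = np.array([45.0, 60.0, 62.0])
revenue_real = np.array([12000.0, 15000.0, 11000.0])   # monthly revenue
revenue_pred = np.array([11000.0, 14000.0, 10500.0])

for name, real, pred in [("eggs", eggs_real, eggs_pred), ("revenue", revenue_real, revenue_pred)]:
    fb = np.mean(real - pred)
    smape = 100 * np.mean(np.abs(pred - real) / (np.abs(real) + np.abs(pred)))
    print(f"{name}: FB={fb:.2f}, SMAPE={smape:.2f}%")

# eggs: FB=-4.00, SMAPE=3.96%    -> this model slightly overestimates
# revenue: FB=833.33, SMAPE=3.37% -> this model underestimates, yet both SMAPE values are directly comparable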

Python Implementation

import numpy as np


def forecast_bias(predicted_array: np.ndarray, real_array: np.ndarray) -> float:
    """Function calculates forecast bias of prediction.
    Parameters
    ----------
    predicted_array : numpy array
                      Array with predicted values.
    real_array : numpy array
                 Array with real observations.
    Returns
    -------
    fb : float
         Forecast Bias of prediction.
    Notes
    -----
    How do we interpret forecast bias? Here are two important properties:
        - A large positive value means that the observations are usually higher than the predictions: our model
          underestimates the real values.
        - A large negative value means that the predictions are usually higher than the observations: our model
          overestimates the real values.
    Equation:
    (1) $$e_{fb} = \frac{\sum_{i}^{N}{y_{i} - \bar{y_{i}}}}{N}$$
        where:
        * $e_{fb}$ - forecast bias,
        * $y_{i}$ - i-th observation,
        * $\bar{y_{i}}$ - i-th prediction,
        * $N$ - number of observations.
    """

    fb = float(np.mean(real_array - predicted_array))
    return fb


def mean_absolute_error(predicted_array: np.ndarray, real_array: np.ndarray) -> float:
    """Function calculates mean absolute error of prediction.
    Parameters
    ----------
    predicted_array : numpy array
                      Array with predicted values.
    real_array : numpy array
                 Array with real observations.
    Returns
    -------
    mae : float
          Mean Absolute Error of prediction.
    Notes
    -----
    MAE is not the same as RMSE. It is a good idea to compare the mean absolute error with the root mean squared error
    because it helps to detect very poor predictions. An RMSE that is much larger than the MAE is a sign that for
    some lags our predictions are very poor, and we should check those lags.
    Equation:
    (1) $$e_{mae} = \frac{\sum_{i}^{N}{|y_{i} - \bar{y_{i}}|}}{N}$$
        where:
        * $e_{mae}$ - mean absolute error,
        * $y_{i}$ - i-th observation,
        * $\bar{y_{i}}$ - i-th prediction,
        * $N$ - number of observations.
    """
    mae = float(
        np.mean(
            np.abs(real_array - predicted_array)
        )
    )
    return mae


def root_mean_squared_error(predicted_array: np.ndarray, real_array: np.ndarray) -> float:
    """Function calculates root mean squared error of prediction.
    Parameters
    ----------
    predicted_array : numpy array
                      Array with predicted values.
    real_array : numpy array
                 Array with real observations.
    Returns
    -------
    rmse : float
           Root Mean Squared Error of prediction.
    Notes
    -----
    Important hint: it is a good idea to compare the mean absolute error with the root mean squared error because it
    helps to detect very poor predictions. An RMSE that is much larger than the MAE is a sign that for some
    lags our predictions are very poor, and we should check those lags.
    Equation:
    (1) $$e_{rmse} = \sqrt{\frac{\sum_{i}^{N}{(y_{i} - \bar{y_{i}})^2}}{N}}$$
        where:
        * $e_{rmse}$ - root mean squared error,
        * $y_{i}$ - i-th observation,
        * $\bar{y_{i}}$ - i-th prediction,
        * $N$ - number of observations.
    """
    rmse = np.sqrt(
        np.mean(
            (real_array - predicted_array)**2
        )
    )
    return rmse


def symmetric_mean_absolute_percentage_error(predicted_array: np.ndarray, real_array: np.ndarray) -> float:
    """Function calculates symmetric mean absolute percentage error of prediction.
    Parameters
    ----------
    predicted_array : numpy array
                      Array with predicted values.
    real_array : numpy array
                 Array with real observations.
    Returns
    -------
    smape : float
            Symmetric Mean Absolute Percentage Error.
    Notes
    -----
    Symmetric Mean Absolute Percentage Error is an accuracy measure that returns the error as a percentage. It is a
    relative evaluation metric. It shouldn't be used alone because it returns different values for overforecasts and
    underforecasts: SMAPE penalizes underforecasting more, so it should be compared with Forecast Bias to get
    a full view of the model properties.
    SMAPE is better suited than RMSE or FB for comparing multiple models with a different number of parameters,
    for example, a different number of ranges.
    More about SMAPE here: https://en.wikipedia.org/wiki/Symmetric_mean_absolute_percentage_error
    Equation:
    (1) $$e_{smape} = \frac{100}{N} \sum_{i}^{N}{\frac{|\bar{y_{i}} - y_{i}|}{|y_{i}|+|\bar{y_{i}}|}}$$
        where:
        * $e_{smape}$ - symmetric mean absolute percentage error,
        * $y_{i}$ - i-th observation,
        * $\bar{y_{i}}$ - i-th prediction,
        * $N$ - number of observations.
    """
    smape = 100 * np.mean(
        np.abs(predicted_array - real_array) / (np.abs(real_array) + np.abs(predicted_array))
    )
    return smape
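
A quick usage sketch, assuming the functions above are defined in the same module (the arrays and the expected outputs in the comments are illustrative):

if __name__ == "__main__":
    predictions = np.array([102.0, 95.0, 130.0, 80.0, 110.0])
    observations = np.array([100.0, 98.0, 120.0, 85.0, 160.0])

    print(forecast_bias(predictions, observations))                              # 9.2 -> positive: the model underestimates
    print(mean_absolute_error(predictions, observations))                        # 14.0
    print(root_mean_squared_error(predictions, observations))                    # ~22.97 -> much larger than MAE: some predictions are very poor
    print(symmetric_mean_absolute_percentage_error(predictions, observations))   # ~5.62 (percent)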

Szymon