Evaluating Forecast Accuracy with Loss Functions

10m • ai-ml • tutorial • intermediate

Key Points

A loss function quantifies the error between an AI model’s predicted output and the actual value, with larger differences indicating higher loss.
In a real‑world case, a colleague’s model that forecasted YouTube video views performed poorly, illustrating the need to assess and improve predictions using loss metrics.
By calculating loss, we can iteratively adjust model parameters: decreasing loss means the model improves, while increasing loss signals deterioration, guiding the training process toward a predefined error threshold.
Loss functions fall into regression (for continuous targets like video views, house prices, or temperature) and classification categories; common regression losses include Mean Squared Error (MSE), which heavily penalizes large mistakes, and Mean Absolute Error (MAE), which is less sensitive to outliers.

**Source:** [https://www.youtube.com/watch?v=v_ueBW_5dLg](https://www.youtube.com/watch?v=v_ueBW_5dLg)

**Duration:** 00:10:09

Sections

- [00:00:00](https://www.youtube.com/watch?v=v_ueBW_5dLg&t=0s) **Loss Functions in Forecasting Models** - The speaker explains how loss functions quantify prediction errors and illustrates their use with a YouTube view‑forecasting AI model that performed poorly, underscoring the need for model adjustments.
- [00:03:05](https://www.youtube.com/watch?v=v_ueBW_5dLg&t=185s) **Choosing Between MSE, MAE, Huber** - The passage explains the characteristics of mean squared error, mean absolute error, and Huber loss—how they handle outliers—and offers guidance on selecting the appropriate regression loss based on the presence and impact of extreme values.
- [00:06:15](https://www.youtube.com/watch?v=v_ueBW_5dLg&t=375s) **Cross‑Entropy vs Hinge Loss** - The passage explains entropy, describes how cross‑entropy loss quantifies the uncertainty of model predictions against certain ground‑truth labels, and contrasts this with hinge loss, which enforces confident, margin‑based correctness especially in binary classification.
- [00:09:27](https://www.youtube.com/watch?v=v_ueBW_5dLg&t=567s) **Loss Function Guides Model Training** - The loss function measures model performance and, via its gradient, directs optimization algorithms to update weights and biases until the loss is minimized.

Full Transcript

0:00 How good is an AI model at forecasting? We can put an actual number on it. In machine learning, a loss function tracks the degree of error in the output from an AI model, and it does this by quantifying the difference, or the loss, between a predicted value and the actual value. So let's say the model gave us five as the output, and we compare that to the actual value. Maybe the actual value is ten, and we call that the ground truth.

0:40 Now, if the model's predictions are accurate, then the difference between these two numbers, the loss, in effect, is comparatively small. If its predictions are inaccurate, let's say it came back with an output of one instead of five, then the loss is larger.

1:03 So let me give you an example of how we can use this. I have a colleague who built an AI model to forecast how many views his videos would receive on YouTube. He fed the model YouTube titles, and the model forecast how many views each video would receive in its first week. Here they are. Little bit vain, if you ask me. But it wasn't me. It was my colleague.

1:29 Now, how well did the model do? Well, when comparing the model forecasts to the actual number of real YouTube views, the model wasn't getting too close. The model predicted that the cold brew video would bomb and that the pour over guide video would be a big hit. That just wasn't the case, though. Now, this is a hard problem to solve, and clearly this model needs some adjustments, and that's where loss functions can help.

1:54 Loss functions let us define how well a model is doing mathematically. And if we can calculate loss, we can then adjust model parameters and see if that increases loss, meaning it's made it worse, or decreases loss, meaning it's made it better. And at some point we can say that a machine learning model has been sufficiently trained: when loss has been minimized below some predefined threshold.

2:18 Now at a high level, we can divide loss functions into two types: regression loss functions and classification loss functions. Let's start with regression, which measures errors in predictions involving continuous values: predictions like the price of a house, or the temperature for a given day, or, well, the views for a YouTube video. In these cases, the loss function measures how far off the model's predictions are from the actual continuous target values. Now, regression loss must be sensitive to two things: not just whether the forecast is correct or not, but also the degree to which it diverges from the ground truth. And there are multiple ways to calculate regression loss functions.

3:05 Now, the most common of those is called MSE, or mean squared error. As its name suggests, MSE is calculated as the average of the squared difference between the predicted value and the true value across all training examples. Squaring the error means that MSE gives large mistakes a disproportionately heavy impact on overall loss, which strongly punishes outliers. So that's MSE.

3:36 MAE, or mean absolute error, measures the average absolute difference between the predicted value and the true value, and it is less sensitive to outliers than MSE because it doesn't square the errors.

3:51 So how do you decide which regression loss function to pick? Well, if your ground truth data has relatively few extreme outliers with minimal deviation, like, I don't know, the temperature ranges in the month of July in the southern US, which, trust me, is basically always hot, then MSE is a particularly useful option, as you want to heavily penalize predictions that are far off from the actual values. MAE is a better option when the data does contain more outliers and we don't want those outliers to overly influence the model.
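As a rough illustration of the regression losses in this part of the talk, here is a minimal Python sketch of MSE and MAE, plus the Huber loss that the speaker introduces next. The function names, the `delta` parameter, and the sample numbers are assumptions made for this example; they are not the view counts or the code used in the video. The Huber variant shown is the common textbook form (half the squared error below `delta`, linear above it).

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for errors up to delta, linear beyond it."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= delta:
            total += 0.5 * err ** 2              # MSE-like region
        else:
            total += delta * (err - 0.5 * delta)  # MAE-like region
    return total / len(y_true)


# Hypothetical values: three close forecasts and one large miss.
actual = [10.0, 12.0, 9.0, 11.0]
predicted = [11.0, 12.0, 8.0, 21.0]

print("MSE:  ", mse(actual, predicted))               # 25.5
print("MAE:  ", mae(actual, predicted))               # 3.0
print("Huber:", huber(actual, predicted, delta=2.0))  # 4.75
```

The single large miss dominates MSE (25.5) but moves MAE far less (3.0), with Huber in between, which matches the outlier behavior described above.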
4:26 Forecasting demand for a product is a good example of where MAE fits: occasional surges in sales shouldn't overly skew the model.

4:34 But there is a third choice, called Huber loss. Huber loss is a compromise between MSE and MAE. It behaves like MSE for small errors and MAE for large errors, which makes it useful when you want the benefits of penalizing large errors, but not too harshly.

4:57 Now I've calculated the loss functions for the YouTube example. This is the MAE value, averaging the absolute differences, meaning on average the predictions were off by about 16,000 views per video. The MSE loss function, that's over 400 million. It skyrockets, and that's due to the squaring of large errors. And the Huber loss also indicates poor predictions but provides a more balanced perspective, penalizing large errors less severely than MSE.

5:29 But look, these numbers don't mean a whole lot on their own. We want to adjust the model's parameters, generate new forecasts, and see where we move the needle on loss. But before we get to how to do that, let's talk about the other type of loss function: classification.

5:45 Unlike regression loss functions, which deal with predicting continuous numerical values, classification loss functions are focused on determining the accuracy of categorical predictions. Is an email spam or not spam? Are these plants classified into their correct species based on their features? So the loss function in classification tasks measures how well the predicted probabilities or labels match the actual categories.

6:16 Now, cross-entropy loss is one way of doing this, and it's the most widely used loss function for classification tasks.

6:28 Now, what is entropy? It's a measure of uncertainty within a system. So if you're flipping a coin, there are only two possible outcomes: heads or tails.
6:37 The uncertainty is pretty low, so low entropy. Rolling a six-sided die means there's more uncertainty about which of the six possible numbers will come up. The entropy is higher.

6:49 Now, cross-entropy loss measures how uncertain the model's predictions are compared to the actual outcomes. In supervised learning, model predictions are compared to the ground truth classifications provided by data labels. Those ground truth labels are certain, and so they have low, or in fact no, entropy. As such, we can measure the loss in terms of the difference between the certainty we'd have using the ground truth labels and the certainty of the labels predicted by the model.

7:17 Now, an alternative to this is called hinge loss. It is commonly used in support vector machines, and hinge loss encourages the model to make correct predictions and to do so with a certain level of confidence. It's all about measuring that level of confidence, and it focuses on maximizing the margin between classes, with the goal that the model is not just correct but confidently correct by a specified margin. This makes hinge loss particularly useful in binary classification tasks where the distinction between classes needs to be as clear and as far apart as possible.

7:59 So we've calculated our loss function. Great, but what can we do with that information? Remember that the primary reason for calculating the loss function is to guide the model's learning process. The loss function provides a numeric value that indicates how far off the model's predictions are from the actual results. By analyzing this loss, we can adjust the model's parameters, typically through a process called optimization. In essence, the loss function acts as a feedback mechanism, telling the model how well it's performing and where it needs to improve.
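The classification ideas described above (entropy for a coin versus a die, cross-entropy against certain ground truth labels, and hinge loss with a margin) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the `eps` clipping value, the margin of 1, and the example probabilities and scores are assumptions for this example and do not come from the video.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


def cross_entropy(true_dist, pred_dist, eps=1e-12):
    """Cross-entropy (natural log) of predicted probabilities against true ones."""
    # eps guards against log(0) when a predicted probability is exactly zero.
    return -sum(t * math.log(max(q, eps)) for t, q in zip(true_dist, pred_dist))


def hinge(label, score):
    """Hinge loss for a label in {-1, +1} and a raw score, with margin 1."""
    return max(0.0, 1.0 - label * score)


coin = [0.5, 0.5]
die = [1 / 6] * 6
label = [0.0, 1.0, 0.0]  # one-hot ground truth: fully certain

print(entropy(coin))   # two equally likely outcomes
print(entropy(die))    # six equally likely outcomes: higher entropy
print(entropy(label))  # a certain label carries no entropy

confident = [0.05, 0.90, 0.05]  # hypothetical model output, sure of class 1
unsure = [0.40, 0.30, 0.30]     # hypothetical model output, hedging
print(cross_entropy(label, confident))  # small loss
print(cross_entropy(label, unsure))     # larger loss

print(hinge(+1, 2.5))   # correct and beyond the margin: no loss
print(hinge(+1, 0.3))   # correct but inside the margin: still penalized
print(hinge(+1, -1.0))  # wrong side of the boundary: largest loss here
```

Note how hinge loss still penalizes a correct prediction whose score falls inside the margin, which is the "confidently correct" behavior the speaker describes.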
8:33 The lower the loss, the better the model's predictions align with the true outcomes. Now, after adjusting the YouTube prediction model, we get a new set of forecasts, and we can compare the loss functions between the two models. In all three cases, the loss function is now lower, indicating less loss, with the greatest effect on MSE, mean squared error, as the model reduced the large prediction error for the pour over video.

9:04 Now, that's the loss function as an evaluation metric, but it can also be used as input to an algorithm that actually adjusts the model parameters to minimize loss, for example, by using gradient descent. That works by calculating the gradient, or the slope, of the loss function with respect to each parameter. Using the gradient of the loss function, optimization algorithms determine which direction to step the model in order to move down the gradient and therefore reduce loss. The model learns by updating the weight and bias terms until the loss function has been sufficiently minimized.

9:48 So that's the loss function. It's both a scorekeeper that measures how well your model is performing and a guide that directs the model's learning process. And thanks to the loss function, my colleague can keep tweaking his YouTube AI model to minimize the loss and teach that model to make better predictions.
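To make the gradient descent description above concrete, here is a minimal sketch that fits a one-weight, one-bias linear model by repeatedly stepping against the gradient of an MSE loss until the loss drops below a threshold. Everything specific here is an assumption for illustration: the toy data (generated from y = 2x + 1), the learning rate, the threshold, and the function names. It is not the YouTube model from the video.

```python
def mse_loss(w, b, xs, ys):
    """MSE of the linear model y_hat = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradients(w, b, xs, ys):
    """Partial derivatives of the MSE loss with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


def train(xs, ys, lr=0.05, threshold=1e-6, max_steps=10_000):
    """Gradient descent: stop once loss falls below the threshold."""
    w, b = 0.0, 0.0
    history = []
    for _ in range(max_steps):
        loss = mse_loss(w, b, xs, ys)
        history.append(loss)
        if loss < threshold:
            break
        dw, db = gradients(w, b, xs, ys)
        w -= lr * dw  # step opposite to the slope to reduce loss
        b -= lr * db
    return w, b, history


# Hypothetical toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b, history = train(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, steps={len(history)}")
print(f"first loss={history[0]:.3f}, last loss={history[-1]:.2e}")
```

The `history` list plays the scorekeeper role (loss as an evaluation metric), while `gradients` plays the guide role (loss as the signal that updates the weight and bias).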