Learning Library

Decision Trees, Random Forests, Golf Choice

Key Points

  • A simple decision‑tree example classifies “golf yes” vs. “golf no” based on time availability, weather, and having clubs, illustrating how sequential rules make predictions.
  • Individual decision trees can suffer from bias and over‑fitting, prompting the use of ensemble methods like Random Forests.
  • Random Forest builds many trees on random subsets of data and features, aggregating their votes to improve accuracy and to mitigate over‑fitting and bias.
  • Setting up a Random Forest involves tuning parameters such as node size, number of trees, and number of features, balancing predictive performance against training time and memory usage.
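
The golf tree summarized in the first key point can be written as a few nested rules. The following is a minimal sketch, not something shown in the video: the function name `should_play_golf` and its three boolean parameters are hypothetical names chosen for illustration, and the return values reuse the class labels "golf yes" and "golf no" from the transcript.

```python
def should_play_golf(has_time: bool, is_sunny: bool, has_clubs: bool) -> str:
    """Walk the three decision points of the golf tree from top to bottom."""
    # First decision point: without time, nothing else matters.
    if not has_time:
        return "golf no"
    # Second decision point: sun alone is enough to play.
    if is_sunny:
        return "golf yes"
    # Third decision point: no sun, so it comes down to having the clubs handy.
    return "golf yes" if has_clubs else "golf no"


if __name__ == "__main__":
    # Time available, no sun, clubs handy.
    print(should_play_golf(has_time=True, is_sunny=False, has_clubs=True))
```

Each `if` corresponds to one node of the tree, and each `return` to a leaf, which is why a single tree is easy to read but also rigid: it only ever considers the criteria its author wrote down.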

Full Transcript

# Decision Trees, Random Forests, Golf Choice

**Source:** [https://www.youtube.com/watch?v=gkXX4h3qYm4](https://www.youtube.com/watch?v=gkXX4h3qYm4)

**Duration:** 00:05:07

## Summary

- A simple decision‑tree example classifies “golf yes” vs. “golf no” based on time availability, weather, and having clubs, illustrating how sequential rules make predictions.
- Individual decision trees can suffer from bias and over‑fitting, prompting the use of ensemble methods like Random Forests.
- Random Forest builds many trees on random subsets of data and features, aggregating their votes to improve accuracy and to mitigate over‑fitting and bias.
- Setting up a Random Forest involves tuning parameters such as node size, number of trees, and number of features, balancing predictive performance against training time and memory usage.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=gkXX4h3qYm4&t=0s) **Decision Tree Golf Example** - The speaker walks through a basic decision‑tree model for choosing whether to play golf, highlights its role as a binary classification task, and briefly introduces random forests as an ensemble approach to mitigate tree bias and overfitting.
- [00:03:08](https://www.youtube.com/watch?v=gkXX4h3qYm4&t=188s) **Configuring Random Forest Parameters** - The speaker explains how to set node size, number of trees, and feature selection for a random forest, balances accuracy against training time and memory usage, illustrates diverse real‑world applications, and humorously lets the model decide whether to play golf.

## Full Transcript

0:00 I just can't decide, should I play a round of golf today?
0:04 Well, let's use this decision tree to make the decision.
0:09 So first off, do I have the time?
0:13 If I don't, well, then that's an easy decision.
0:18 No golf.
0:20 But let's say I do.
0:22 Second decision point, is it sunny today?
0:27 If there's sun, then I don't care about any other factor. I'm playing golf.
0:33 If there's no sun, let's go down to the next level.
0:36 Well, do I have my clubs with me?
0:38 Do I have them handy?
0:40 If I do not, then I'm not going to bother playing if it's not sunny.
0:45 If I do, then I absolutely will.

0:51 The decision tree here is an example of a classification problem
0:55 where the class labels are "golf yes" and "golf no".
1:00 And, while they're helpful, decision trees can be prone to problems.
1:05 Things like bias and overfitting.
1:07 But that is where something called "random forest" comes into play.

1:18 Random forest is a type of machine learning model that uses an ensemble of decision trees to make its predictions.
1:24 And why do we call it random forest?
1:26 Well, the reason is because it's actually built by taking a random sample of my data
1:31 and then building an ongoing series of decision trees on the subsets.
1:35 So we're essentially creating a whole bunch of decision trees together.
1:45 And those give us a larger model or group.
1:49 Look, the chances are that other people have built different and maybe better decision trees to answer the same question.
1:56 Maybe those trees consider things like the time of day, which I didn't consider, or the difficulty of the course.
2:02 The more decision trees that I use with different criteria,
2:05 the better my random forest will perform because it's essentially increasing my prediction accuracy.
2:11 And if one or two of these smaller decision trees are not relevant on a certain day, well, we just ignore them.

2:21 One of the primary benefits of random forest is that it can help reduce overfitting.
2:34 And this occurs when your model starts to memorize the data
2:37 rather than trying to generalize for making predictions on future data.
2:41 Essentially, it helps me get around the limitations of my data,
2:45 which might not be fully representative of all golfers or all the best features in my model.
2:50 It can also help reduce something else, and that's bias.
2:54 Bias can occur when there is a certain degree of error introduced into the model.
2:59 Bias occurs when you're not evenly splitting your instance space during training.
3:03 So instead of seeing all of the data points, you might see only half because of how you set your model up.

3:08 Now to set up a random forest, you will set some parameters.
3:16 We have parameters for node size.
3:23 We have parameters for number of trees.
3:30 And we also have parameters for number of features.
3:39 And it can be challenging at first because you'll want to use a lot of trees, like as many as you can,
3:45 to get the best predictive accuracy, but you don't want so many trees that it'll take you a long time to train the model
3:51 and use a lot of memory space.
3:53 But once you've set up these parameters, you'll use a random forest model to make predictions on your test data.
3:59 And you can even segment or slice your results by different criteria.
4:03 Maybe you want to know how your random forest does on certain types of golf courses
4:07 or how it performs during different times of day.

4:10 Random forest is pretty popular among data science professionals and with good reason.
4:15 It can be extremely helpful in all sorts of classification problems.
4:20 In finance, for example, it can be used to predict the likelihood of a default. In medical diagnosis,
4:34 it can be used to predict prognosis or survival rates depending on treatment options. And in economics,
4:41 it can be used to sort of help understand whether a policy is effective or ineffective.

4:47 So, what do you think?
4:48 Should I play golf today?
4:50 Well, the sum of all my random forest decision trees says yes.
4:56 I'll see you out on the course.
4:58 If you have any questions, please drop us a line below,
5:01 and if you want to see more videos like this in the future, please like and subscribe.
5:06 Thanks for watching.
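
The random forest recipe described in the transcript (take random samples of the data, grow a tree on each, then let the trees vote) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the speaker's implementation or any library's API: every name below is hypothetical, the training data is synthetic and noise-free (generated from the hand-built golf tree), and `node_size`, `n_trees`, and `n_features` are stand-ins for the three parameters the speaker names. Here `node_size` means "a node with this many rows or fewer becomes a leaf", which is one possible reading of the term.

```python
import random
from collections import Counter
from itertools import product

FEATURES = ("has_time", "is_sunny", "has_clubs")


def golf_rule(row):
    """Label a day with the hand-built tree described in the transcript."""
    if not row["has_time"]:
        return "golf no"
    return "golf yes" if row["is_sunny"] or row["has_clubs"] else "golf no"


def make_toy_days(copies=10):
    """Synthetic, noise-free data: every feature combination, repeated."""
    combos = [dict(zip(FEATURES, v)) for v in product([True, False], repeat=3)]
    rows = combos * copies
    return rows, [golf_rule(row) for row in rows]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def build_tree(rows, labels, n_features, node_size, rng):
    """Grow one tree; each split scores up to n_features random features."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(rows) <= node_size or len(set(labels)) == 1:
        return majority  # leaf: predict the most common label in this node
    best, scored = None, 0
    for feature in rng.sample(FEATURES, len(FEATURES)):  # random order
        if scored == n_features:
            break
        yes = [i for i, row in enumerate(rows) if row[feature]]
        no = [i for i, row in enumerate(rows) if not row[feature]]
        if not yes or not no:
            continue  # feature is constant in this node, so it cannot split
        scored += 1
        impurity = sum(
            len(side) / len(rows) * gini([labels[i] for i in side])
            for side in (yes, no)
        )
        if best is None or impurity < best[0]:
            best = (impurity, feature, yes, no)
    if best is None:
        return majority
    _, feature, yes, no = best
    if_true, if_false = (
        build_tree([rows[i] for i in side], [labels[i] for i in side],
                   n_features, node_size, rng)
        for side in (yes, no)
    )
    return (feature, if_true, if_false)


def predict_tree(tree, row):
    """Follow the branches until a leaf (a plain label string) is reached."""
    while isinstance(tree, tuple):
        feature, if_true, if_false = tree
        tree = if_true if row[feature] else if_false
    return tree


def build_forest(rows, labels, n_trees=25, n_features=2, node_size=1, seed=0):
    """Train n_trees trees, each on its own bootstrap sample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(len(rows)) for _ in rows]  # with replacement
        forest.append(build_tree([rows[i] for i in picks],
                                 [labels[i] for i in picks],
                                 n_features, node_size, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote across all trees in the forest."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rows, labels = make_toy_days()
    forest = build_forest(rows, labels)
    today = {"has_time": True, "is_sunny": False, "has_clubs": True}
    print(predict_forest(forest, today))
```

Raising `n_trees` adds more voters at the cost of more training time and memory, which is the trade-off the speaker describes. Because this toy data has no noise, the sketch only demonstrates the mechanics; it does not show the reduction in overfitting that a forest can give on real, noisy data.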