Microsoft Office Excel 2003
This article was adapted from Microsoft Excel Data Analysis and Business Modeling by Wayne L. Winston. Visit Microsoft Learning to learn more about this book.
This classroom-style book was developed from a series of presentations by Wayne Winston, a well-known statistician and business professor who specializes in creative, practical applications of Excel. So be prepared: you may need to put your thinking cap on.
Sample files: You can download the sample files that relate to excerpts from Microsoft Excel Data Analysis and Business Modeling from Microsoft Office Online. This article uses the files NFL2002Ratings.xls, NBA02_03.xls, NBA01_02.xls, and NFL01.xls.
- Can I use Excel to set NFL point spreads?
Many of us follow basketball, football, hockey, and baseball. Bookmakers set point spreads on games in all these sports and others. For example, the bookmakers’ best guess was that the Oakland Raiders would win the 2003 Super Bowl by 3 points. How can you use Excel to come up with team "ratings" that generate reasonable point spreads?
Using a simple Solver model, you can generate reasonable point spreads. The changing cells for the Solver model will be a rating for each team and the size of the home-field advantage. For example, if the Indianapolis Colts have a rating of +5 and the New York Jets have a rating of +7, the Jets are considered 2 points better than the Colts.
With regard to the home-field edge, in most years, home professional football teams tend to win by an average of 3 points (while home college basketball teams tend to win by an average of 5 points). We can define the outcome of an NFL game to be the number of points by which the home team outscores the visitors. We can predict the outcome of each game by using the following equation (which I’ll refer to as equation 1).
(Predicted points by which home team outscores visitors) = (Home team rating) - (Visiting team rating) + (Home field edge)
For example, if the home-field edge equals 3 points, when the Colts host the Jets, the Colts will be a 1-point favorite (5 + 3 – 7). If the Jets host the Colts, the Jets will be a 5-point favorite (7 – 5 + 3).
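As a quick check of equation 1, the Colts and Jets example can be reproduced in a few lines of Python. This is only an illustrative sketch: the function name `predicted_margin` is made up here and is not part of the workbook.

```python
def predicted_margin(home_rating, visitor_rating, home_edge):
    """Equation 1: predicted points by which the home team outscores the visitors."""
    return home_rating - visitor_rating + home_edge


# Ratings and home-field edge taken from the example in the text.
colts, jets, home_edge = 5, 7, 3

colts_hosting = predicted_margin(colts, jets, home_edge)  # Colts at home
jets_hosting = predicted_margin(jets, colts, home_edge)   # Jets at home

print(colts_hosting)  # 1
print(jets_hosting)   # 5
```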
What target cell will yield "good" ratings? Our goal is to find the set of values for team ratings and home-field edge that best predict the outcome of all games. In short, we want the prediction for each game to be as close as possible to the outcome of each game. This suggests that we want to minimize the sum over all games of (Actual outcome) – (Predicted outcome). The problem with using this target is that positive and negative prediction errors cancel each other out. For example, if we overpredict the home team margin by 50 points in one game and underpredict the home team margin by 50 points in another game, our target cell would yield a value of 0, indicating perfect accuracy when in fact we were off by 50 points a game. We can remedy this problem by minimizing the sum over all games of [(Actual outcome) – (Predicted outcome)]². Because a squared error is never negative, positive and negative errors can no longer cancel each other out.
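The cancellation problem is easy to see with the two 50-point misses described above. The following sketch uses only those two hypothetical errors; the variable names are chosen for illustration.

```python
# Errors are (actual outcome) - (predicted outcome), one entry per game.
# Overpredicting by 50 gives -50; underpredicting by 50 gives +50.
errors = [-50, 50]

sum_of_errors = sum(errors)                           # errors cancel
sum_of_squared_errors = sum(e ** 2 for e in errors)   # errors accumulate

print(sum_of_errors)          # 0
print(sum_of_squared_errors)  # 5000
```

The raw sum reports no error at all, while the sum of squares grows with every miss, whichever direction it is in.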
Can I use Excel to set NFL point spreads?
Let’s now see how to determine accurate ratings for NFL teams by using the scores from the 2002–2003 regular season. You can find the data for this problem in the file NFL2002Ratings.xls, which is shown in the following figure. Note that I’ve hidden the ratings of some teams so that the ratings and model would fit on a single screen.
To begin, I named the range D2:D33, which contains each team’s rating, "rating". I also (for reasons that will soon become apparent) named the range B2:D33 "lookup" and the cell F2 "home_edge". I placed a trial home-edge value in that cell.
Starting in row 36, columns C and D contain the team code number (listed in B2:B33) for the home and away team for each game. For example, the first game (listed in row 36) is the San Francisco 49ers (team 28) playing at the New York Giants (team 21). Column E contains the home team’s score, and column F contains the visiting team’s score. As you can see, the 49ers beat the Giants 16-13. I can now compute the outcome of each game (the number of points by which the home team beats the visiting team) by entering the formula =E36-F36 in cell G36. By pointing to the lower-right portion of this cell and double-clicking the left mouse button, you can copy this formula down to the last game, which appears in row 291. (By the way, an easy way to find the last row of the data is to press CTRL+SHIFT+DOWN ARROW. This key combination takes you to the last row filled with data — row 291, in this case.)
In column H, I use equation 1 to generate the prediction for each game. The prediction for the first game is computed in cell H36 by using the formula =home_edge+VLOOKUP(C36,lookup,3)-VLOOKUP(D36,lookup,3). This formula creates a prediction for the first game by adding the home edge to the home team rating and then subtracting the visiting team rating. Note that VLOOKUP(C36,lookup,3) locates the home team rating by using the home team code number in column C, while VLOOKUP(D36,lookup,3) looks up the visiting team’s rating by using the visiting team’s code number in column D.
In column I, I compute the error (actual outcome – predicted outcome) for each game. Our error for the first game is computed in cell I36 by using the formula =G36-H36. In column J, I compute the squared error for each game. The squared error for the first game is computed in cell J36 by using the formula =I36^2. After selecting the cell range H36:J36, I copied the formulas down to the bottom of our spreadsheet (H291:J291).
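The calculations in columns G through J can be mirrored in Python for the first game. In this sketch a dictionary keyed by team code stands in for the "lookup" range. The two ratings and the trial home edge are invented placeholder values, not the numbers in NFL2002Ratings.xls. Only the team codes and the 16-13 score come from the text.

```python
# Hypothetical trial values (NOT the workbook's): rating by team code.
lookup = {21: 1.5, 28: 2.0}   # 21 = New York Giants, 28 = San Francisco 49ers
home_edge = 2.0               # hypothetical trial value for cell F2

# Row 36: home team code, visiting team code, home score, visitor score.
home, visitor, home_score, visitor_score = 21, 28, 13, 16

outcome = home_score - visitor_score                         # column G: =E36-F36
prediction = home_edge + lookup[home] - lookup[visitor]      # column H: equation 1
error = outcome - prediction                                 # column I: =G36-H36
squared_error = error ** 2                                   # column J: =I36^2

print(outcome, prediction, error, squared_error)  # -3 1.5 -4.5 20.25
```

Summing `squared_error` over every game gives the quantity that the model tries to make as small as possible.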
In cell J34, I’ve computed our target cell by summing all the squared errors by using the formula =SUM(J36:J291). (You can enter a formula for a large column of numbers such as this by typing =SUM( and then selecting the first cell in the range you want to add together. Press CTRL+SHIFT+DOWN ARROW to enter the range from the cell you’ve selected to the bottom row in the column, and then add the closing parenthesis.)
It is convenient to make our average team rating equal to 0. A team with a positive rating is better than average, and a team with a negative rating is worse than average. I’ve computed the average team rating in cell D34 by using the formula =AVERAGE(rating).
I can now fill in the Solver Parameters dialog box as shown in the following figure.
We minimize the sum of our squared prediction errors for all games (computed in cell J34) by changing each team’s rating and the home edge. The constraint D34=0 ensures that the average team rating is 0. From the following figure, we find that the home team has an advantage of 2.25 points over the visiting team. Our 10 highest-rated teams are shown in the following figure.
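For readers who want to experiment outside Excel, the same optimization can be sketched in Python. This is not Solver's algorithm: plain gradient descent is swapped in as a simple stand-in. The three-team league, the function names, the step count, and the learning rate are all assumptions made for illustration. The game margins were generated from ratings of 3, 0, and –3 with a 2-point home edge, so a correct fit should recover those values.

```python
def fit_ratings(games, teams, steps=2000, learning_rate=0.01):
    """Minimize the sum of squared prediction errors by gradient descent."""
    ratings = {team: 0.0 for team in teams}
    home_edge = 0.0
    for _ in range(steps):
        grad = {team: 0.0 for team in teams}
        grad_edge = 0.0
        for home, visitor, outcome in games:
            error = outcome - (ratings[home] - ratings[visitor] + home_edge)
            # Derivative of error**2 with respect to each changing value.
            grad[home] -= 2 * error
            grad[visitor] += 2 * error
            grad_edge -= 2 * error
        for team in teams:
            ratings[team] -= learning_rate * grad[team]
        home_edge -= learning_rate * grad_edge
        # Mirror the constraint D34=0: keep the average rating at 0.
        mean = sum(ratings.values()) / len(teams)
        for team in teams:
            ratings[team] -= mean
    return ratings, home_edge


def sum_squared_errors(games, ratings, home_edge):
    """Target cell: sum over all games of (actual - predicted) squared."""
    return sum(
        (outcome - (ratings[home] - ratings[visitor] + home_edge)) ** 2
        for home, visitor, outcome in games
    )


# Hypothetical league: (home team, visiting team, home margin of victory).
games = [
    ("A", "B", 5), ("A", "C", 8),
    ("B", "A", -1), ("B", "C", 5),
    ("C", "A", -4), ("C", "B", -1),
]
teams = ["A", "B", "C"]

ratings, home_edge = fit_ratings(games, teams)
print(ratings, home_edge)
print(sum_squared_errors(games, ratings, home_edge))
```

Because every game in this toy data fits equation 1 exactly, the sum of squared errors falls to essentially 0. With real scores it stays positive, and the fitted values are the ones that make it as small as possible.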
Note that we would have predicted that Oakland would play Tampa Bay in the Super Bowl. Unfortunately, our predicted Super Bowl outcome was Oakland by about 2 points (10.64 – 8.8 = 1.84 points), and Tampa Bay won the game. No home edge enters this prediction because there’s no home team in the Super Bowl.
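A neutral-site game is equation 1 with the home-field term set to 0. The sketch below uses the two ratings quoted above; the function name and its default argument are illustrative choices.

```python
def predicted_margin(rating_a, rating_b, home_edge=0.0):
    """Predicted margin for team A over team B; home_edge is 0 at a neutral site."""
    return rating_a - rating_b + home_edge


oakland, tampa_bay = 10.64, 8.8
neutral_site_margin = predicted_margin(oakland, tampa_bay)

print(round(neutral_site_margin, 2))  # 1.84
```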
Our model is not linear, because the target cell adds together terms of the form (Home Team Rating + Home Field Edge – Visiting Team Rating)². Recall that for a Solver model to be linear, the target cell must be created by adding together terms with the form (changing cell)*(constant). This relationship doesn’t exist in this case, so our model is not linear. Solver does obtain the correct answer, however, for any sports rating model in which the target cell minimizes the sum of squared errors.
- The file NBA02_03.xls contains scores for every regular season game during the 2002–2003 NBA season. Rate the teams.
- The file NBA01_02.xls contains scores for every game during the 2001–2002 NBA season. Rate the teams.
- The file NFL01.xls contains scores for every regular season game during the 2001 NFL season. Rate the teams. Which team would you have forecast to make the Super Bowl?
- True or False: An NFL team could lose every game and be an above average team.
- Our method of rating teams works fine for basketball. What problems arise if we apply our method to hockey or baseball?