Wednesday, September 2, 2009

Winning Streaks

After starting their season 0-2, the Indiana Fever embarked on an 11-game winning streak that propelled them to the top of the Eastern Conference after just 13 games of a 34-game season. They've been there ever since, and so far no one has come close to matching that run.

From what I can tell, here are the longest win streaks in each of the last six WNBA seasons. The lines are in the following format:

year-team-total wins over season-longest win streak

2009 Fever ??? 11
2008 Silver Stars 24 7
2007 Shock 24 7
2006 Sun 26 12
2006 Sparks 25 8
2005 Sun 26 8
2005 Monarchs 25 7
2004 Sparks 25 7

(The Shock had two different seven-game streaks in 2007.)

This leads to an interesting thought exercise:

"Suppose you were told that some time during the 2009 WNBA season, a team would win 11 straight games. You are not told the name of the team. You are not told where in the season this happens. Where do you suppose the team would finish? How many wins would that team have after 34 games?"

Answering this question is not mathematically straightforward, for several reasons. Conventional probability is good at answering the question, "Given a probability p of winning a game, what is the chance that the team wins x consecutive games starting on opening day?" It is not as good at answering questions about winning streaks that could happen anywhere in the season. That question is more about combinatorics than about probability.
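To make the difference between the two questions concrete, here is a minimal sketch in Python. It assumes every game is an independent trial with the same win probability p, and the function names are my own inventions for illustration. The first function is the "from opening day" case; the second tracks the current run game by game, which is one way to handle a streak that can start anywhere.

```python
def prob_streak_from_opening_day(p: float, x: int) -> float:
    """Chance of winning the first x games when each game is won with probability p."""
    return p ** x


def prob_streak_anywhere(p: float, n_games: int, streak: int) -> float:
    """Chance of at least one run of `streak` straight wins somewhere in n_games.

    state[j] is the probability that no such run has happened yet and the
    team is currently on a run of exactly j wins (0 <= j < streak).
    """
    state = [0.0] * streak
    state[0] = 1.0
    for _ in range(n_games):
        new_state = [0.0] * streak
        for j, mass in enumerate(state):
            new_state[0] += mass * (1.0 - p)   # a loss resets the run to zero
            if j + 1 < streak:
                new_state[j + 1] += mass * p   # a win extends the run
            # when j + 1 == streak the run is complete, so that mass leaves `state`
        state = new_state
    return 1.0 - sum(state)


if __name__ == "__main__":
    p = 0.55  # hypothetical win probability, chosen only for illustration
    print("11 straight from opening day:", prob_streak_from_opening_day(p, 11))
    print("11 straight anywhere in 34:  ", prob_streak_anywhere(p, 34, 11))
```

The second number is always at least as large as the first, because the opening-day streak is just one of the places an 11-game run could sit.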

We're also approaching the question from the probability end when it should be approached from the statistics end. I always think of the difference between the two this way: in probability, you start with probability p and ask about a result, whereas in statistics, you start with a result and ask about probability p.

Furthermore, a team's "true probability" need not match the number of games it actually wins. To claim that probability and wins must line up exactly would be like claiming that if you flip a coin 20 times, you will get 10 heads and 10 tails in every trial. For example, a team's "true probability" could correspond to winning 12 games in a 34-game season, but the team could finish with, say, 14 wins or 10 wins, because probability is the opposite of deterministic.
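The coin-flip point can be checked with the binomial formula. This is a small sketch, assuming independent flips (or games) with a fixed probability; the helper name is hypothetical.

```python
from math import comb


def prob_exact_wins(n: int, k: int, p: float) -> float:
    """Binomial probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


if __name__ == "__main__":
    # A fair coin lands exactly 10 heads in 20 flips well under half the time.
    print(f"exactly 10 heads in 20 flips: {prob_exact_wins(20, 10, 0.5):.3f}")  # 0.176

    # Same idea for a team whose true rate corresponds to 12 wins in 34 games.
    p = 12 / 34
    for wins in (10, 12, 14):
        print(f"exactly {wins} wins in 34 games: {prob_exact_wins(34, wins, p):.3f}")
```

Even the single most likely outcome is far from certain, which is why a team's record only hints at its true probability.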

I therefore created a spreadsheet that could simulate 10,000 34-game seasons. One could input a "true probability" of the team winning each of those games and the spreadsheet could also count the longest win streak. My approach was to then see how many seasons out of 10,000 had a winning streak of a given number of games.
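For anyone who would rather not build a spreadsheet, the same idea can be sketched in Python. This is not the spreadsheet itself, only a rough stand-in under the same assumption that every game is won independently with probability p. All function names are made up for illustration, and because the seasons are random the output changes from run to run and need not match the figures quoted in this post.

```python
import random


def longest_win_streak(results):
    """Length of the longest run of True values (wins) in a season."""
    best = current = 0
    for won in results:
        current = current + 1 if won else 0
        best = max(best, current)
    return best


def simulate_season(p, n_games=34, rng=random):
    """One simulated season: a list of True (win) / False (loss) results."""
    return [rng.random() < p for _ in range(n_games)]


def share_of_seasons_with_streak(p, streak=11, n_seasons=10_000, n_games=34, seed=None):
    """Fraction of simulated seasons whose longest win streak is at least `streak`."""
    rng = random.Random(seed)
    hits = sum(
        longest_win_streak(simulate_season(p, n_games, rng)) >= streak
        for _ in range(n_seasons)
    )
    return hits / n_seasons


if __name__ == "__main__":
    p = 0.55  # hypothetical "true probability" to try
    print(f"p = {p}: {share_of_seasons_with_streak(p):.2%} of seasons had an 11-game streak")
```

Calling `share_of_seasons_with_streak` once for each value of p gives one row of results per "true probability".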

I looked at probabilities between p = 0.25 (a 9-win season) and p = 0.75 (a 26-win season), incrementing p by 0.025 and simulating 10,000 seasons at each value. For each p, we count how many of the 10,000 seasons contain a streak of a given length, which gives the probability of seeing a streak that long given that the true probability of victory is "p".
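The link between each p and a season win total is just p times 34, rounded. Here is a short sketch of the grid of probabilities and the average wins each one implies; the variable names are my own.

```python
N_GAMES = 34

# 0.25, 0.275, ..., 0.75: 21 values, built from an integer index and rounded
# so that floating-point drift does not creep into the grid.
p_grid = [round(0.25 + 0.025 * i, 3) for i in range(21)]

# Average number of wins for each "true probability" over a 34-game season.
expected_wins = {p: p * N_GAMES for p in p_grid}

if __name__ == "__main__":
    for p, wins in expected_wins.items():
        print(f"p = {p:.3f}  average wins = {wins:.1f}")
```

The two ends of the grid come out to 8.5 and 25.5 average wins, which round to the 9-win and 26-win seasons mentioned above.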

(* * *)

For example, suppose the true probability of some team winning a game is 0.25 - that the team we're looking at is a 9-win, bottom of the barrel team on the average. In a simulation of 10,000 seasons, only in 9 of those seasons did the team put together an 11-game (or more) winning streak, or in just 0.09 percent of the trials. If you encounter a team with an 11-game winning streak, it's very unlikely that that team's true win probability is 0.25.

We can probably (there's that word again) start talking confidently about how good a team really is when we get to the level of 10 percent of all trials. For example, if the team's true winning probability is 0.55 - where the team averages 19 wins a season - we get an 11-game winning streak in 13.31 percent of all trials (1,331 times out of 10,000). We can probably say with more confidence that our team should be better than average if it can win 11 games in a row.

What probability p is associated with 25 percent of the trials having an 11-game winning streak? When p = 0.625, 27.49 percent of all trials/seasons yield an 11-game winning streak. p = 0.625 means that the team should win about 21 games on average.

If you want to find the probability where you have a 50 percent chance of a team having an 11-game win streak, look at the case where p = 0.725. In that case, 56.63 percent of all trials/seasons contained an 11-game win streak. p = 0.725 is associated with a 25-win season. The 50 percent mark - where the chances of having an 11-game win streak are 1 in 2 - is a team that should win somewhere between 24 and 25 games in a 34-game season.

As it turns out, the Indiana Fever are 20-8 right now, which is a .714 winning percentage. If they keep winning at that rate over their remaining games, they would finish 24-10, or right on the money. If someone asked "how many games in a season should a team that has an 11-game winning streak win?" an answer of "24 games or so" is a very good guess.
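The projection is simple arithmetic, sketched here in Python. The function name is hypothetical, and it assumes the team simply keeps winning at its current rate.

```python
def project_final_wins(wins: int, losses: int, season_games: int = 34) -> float:
    """Project a final win total by extending the current winning percentage."""
    played = wins + losses
    win_pct = wins / played
    remaining = season_games - played
    return wins + win_pct * remaining


if __name__ == "__main__":
    projected = project_final_wins(20, 8)
    print(f"winning percentage: {20 / 28:.3f}")      # 0.714
    print(f"projected final wins: {projected:.1f}")  # 24.3
```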

Here's another way to think about it. Indiana won 11 games in a row. That left 23 games that didn't belong to the win streak. Even if they played those 23 games at only .500 or so, that would give them about 22 or 23 wins total. Once Indiana went 11-2, and then lost, it seemed that Indiana was pretty much foreordained to finish with a 20+ win season. But you don't need probability or an Excel spreadsheet to tell you that.
