Wednesday, October 1, 2008

On Winning the WNBA Fantasy Game

..subtitled, "Why I Am a Moron".

With the fantasy games starting with the help of our beloved Musicman 55 at the RebKell message boards, the first question was "how can I breeze to victory?" The first thing I did was look for patterns - for example, for the Detroit-Indiana game I looked at how well the Detroit players did against Indiana at home and away, and vice versa. Some Detroit players did better playing at Auburn Hills than they did at Conseco Fieldhouse. I tried to draw a conclusion by claiming "this player will do well because they did well in Indiana before."

I got my hat handed to me. That was when I realized something about the importance of sample size.

Obviously, we can't have, say, Detroit and Indiana play 1000 games each in Detroit and Indiana and watch who does better at what in what situations. We don't have that luxury. Our total number of Detroit-Indiana matchups was limited to two this year before the playoffs - May 21st and September 5th. The May 21st matchup was so far back that many things had changed about both teams over the almost four months. This made the results of the September 5th matchup theoretically more reliable, but how much can you rely on the results of one game to tell you about the players in it? Detroit won 90-68, but this was in Detroit and the Shock had come off five days of rest versus three days of rest for the Fever. All you could do was look at player performance and team performance and make some kind of intelligent conjecture about how well the Shock would fare against the Fever.

This raises two questions:

How much basketball do you have to watch before you get a good idea about how good a player is?
How much basketball do you have to watch before you get a good idea about how good a team is?

A statistician would say, "tell me about the distribution of performance, and I will tell you the answer."

I'll make a stab at what "distribution" means. Suppose I could yank 25 players out of the current WNBA class at random. I could get anybody. I could get Candace Parker, or Chioma Nnamaka. I could get Taj McWilliams-Franklin, or Coco Miller. "Distribution" asks questions about how talent is distributed. How many talented players could I expect? And how many lousy ones?

Most people believe - erroneously, I think - that talent is distributed on the "bell curve" out of high school - there should be a small number of players that stink and a small number of players that are of Diana Taurasi-caliber, with everyone else clogging the middle. If you believe that, you won't need to look at very many games to determine how well this group of players is doing - there are specialized distributions like Student's t-distribution which can give you an idea of average performance based on small sample sizes. We watch all 25 or so players in a few games, and that's that.

I don't think talent, however, is normally distributed. I believe talent in basketball is more like a pyramid than a bell curve. There's a small number of really good players and you start sliding into crappy players almost immediately. It's more like a curve that starts high and drops rapidly, if talent is distributed from left to right.

So how many players do you have to look at to determine average player performance? How many games would you have to average for a single player to get an idea of how good (or bad) they are?

There is something in statistics called the Central Limit Theorem. The CLT says something like, "regardless of the distribution, if you add up enough independent draws from it and look at the mean and variance of the sum, the distribution of the sum starts to look like a normal distribution." Once you know the mean and variance of a normal-like distribution, you can begin making some amazing predictions. (This is why you never get called for a presidential poll - they only need about 600 people to get the level of accuracy they want. If Obama is leading McCain 55-45, it's unlikely that your 600 people are going to split 60-40 for McCain if it is a truly random sample.)
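The poll example can be checked with a little arithmetic. What follows is a rough sketch, not how any real pollster works: it assumes a simple random sample, uses the normal approximation to the sample proportion, and the function names are my own.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def poll_standard_error(p, n):
    """Standard error of a sample proportion (simple random sample assumed)."""
    return math.sqrt(p * (1 - p) / n)

n = 600            # people polled, as in the example above
true_share = 0.55  # Obama's assumed true share of the vote

se = poll_standard_error(true_share, n)
margin_95 = 1.96 * se  # approximate 95% margin of error

# Chance that the sample shows Obama at 40% or lower (a 60-40 McCain split),
# using the normal approximation.
z = (0.40 - true_share) / se
prob_flip = normal_cdf(z)

print(f"standard error: {se:.4f}")
print(f"95% margin of error: about +/- {margin_95:.1%}")
print(f"chance of a 60-40 McCain sample: {prob_flip:.2e}")
```

The margin of error depends on the size of the sample, not on the size of the country, which is why 600 people can be enough.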

How many draws do you have to add up to meet the requirements of the CLT? How big is "big enough"? As a rule of thumb, you probably need a sum of at least 30 draws from the distribution to get a good idea of mean and variance.
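Here is a small simulation of the "30 is enough" rule of thumb. It is only a sketch under loud assumptions: I stand in for the pyramid-shaped talent curve with an exponential distribution (starts high, drops rapidly), which is my choice for illustration and not a measured fact about WNBA players. The names `sample_means` and `skewed_talent` are made up for this example.

```python
import math
import random

def sample_means(draw, n_draws, n_trials, rng):
    """Return n_trials averages, each taken over n_draws independent draws."""
    return [
        sum(draw(rng) for _ in range(n_draws)) / n_draws
        for _ in range(n_trials)
    ]

def skewed_talent(rng):
    # Assumption: an exponential distribution with mean 1 stands in for a
    # "starts high and drops rapidly" talent curve.
    return rng.expovariate(1.0)

rng = random.Random(2008)  # fixed seed so the run is repeatable
n_draws = 30
means = sample_means(skewed_talent, n_draws, 2000, rng)

true_mean = 1.0
true_sd = 1.0  # an exponential with rate 1 has mean 1 and standard deviation 1
se = true_sd / math.sqrt(n_draws)

grand_mean = sum(means) / len(means)
# If the averages are roughly normal, about 95% of them should fall
# within 1.96 standard errors of the true mean.
coverage = sum(abs(m - true_mean) <= 1.96 * se for m in means) / len(means)

print(f"average of the 30-draw means: {grand_mean:.3f}")
print(f"share within 1.96 standard errors: {coverage:.3f}")
```

Even though a single draw is badly lopsided, the averages of 30 draws should cluster around the true mean in a roughly bell-shaped way.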

Likewise, if you add up an individual player's performance over 30 games, you can get a good idea of their mean performance and how much their performance varies from the mean. You might want to equalize performance by minute, but I believe that 30 games is enough to get the measure of a player.
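To make "equalize performance by minute" concrete, here is a minimal sketch. The game log is made-up data, scaling to 40 minutes is my own choice of yardstick, and `per_40_rates` and `summarize` are hypothetical helper names.

```python
import statistics

def per_40_rates(game_log):
    """Convert (points, minutes) pairs into points per 40 minutes.

    Games where the player logged no minutes are skipped.
    """
    return [40.0 * pts / mins for pts, mins in game_log if mins > 0]

def summarize(game_log):
    """Return (mean, sample standard deviation) of the per-40-minute rates."""
    rates = per_40_rates(game_log)
    mean = statistics.mean(rates)
    spread = statistics.stdev(rates) if len(rates) > 1 else 0.0
    return mean, spread

# Made-up (points, minutes) lines for illustration only; a real check
# would use 30 or so games from the regular season.
game_log = [(18, 32), (9, 25), (22, 36), (0, 0), (14, 30)]

mean, spread = summarize(game_log)
print(f"points per 40 minutes: mean {mean:.1f}, spread {spread:.1f}")
```

The mean tells you what to expect from the player, and the spread tells you how far a single game is likely to wander from that.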

If you're going to look at how well a player is likely to perform in a playoff game, you don't look at how well they did in the one or two instances they played their prospective opponent. You look at their regular season averages, which will be very accurate if they played a full season. Becky Hammon had a great efficiency average for 2008, and statistics suggests that that performance will carry over into the finals. Likewise, Deanna Nolan's efficiency average should also carry over. If you're going to look at how well Hammon or Nolan are going to perform, don't limit your view to the two games that San Antonio and Detroit played in 2008 - the sample size isn't big enough.
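Why two games aren't enough comes down to the standard error of an average, which shrinks with the square root of the number of games. A quick sketch, assuming a purely hypothetical game-to-game spread of 6 points (not a real figure for Hammon or Nolan):

```python
import math

def standard_error(sd, n_games):
    """Standard error of a player's average over n_games games."""
    return sd / math.sqrt(n_games)

game_to_game_sd = 6.0  # hypothetical spread in points, for illustration only

for n_games in (2, 30):
    se = standard_error(game_to_game_sd, n_games)
    print(f"{n_games:>2} games: average is good to about +/- {se:.1f} points")
```

With the same player and the same spread, the 30-game average is pinned down almost four times more tightly than the two-game average.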

When Laimbeer evaluates the San Antonio Silver Stars, he's not going to limit his film to the two games that he lost against them this year. He might be able to squeeze out more statistical performance by "tilting the table" - figuring out what San Antonio doesn't do well, and then taking advantage. Maybe he can decipher how San Antonio plays their perimeter game, and break it up. If the game is broken up and turnovers result, each player for the Shock might perform marginally better if they can turn those turnovers into buckets. Or maybe Taj McWilliams-Franklin knows that Player Y is vulnerable to a quick feint to the right, and takes advantage of that fact. Coaches at best can tilt the table, but they can't make players fundamentally better in the short time of the playoffs.

So if you're picking two players that will do best for your pick 'em game, my advice would be to play the percentages. Becky Hammon scoring 10 points in a game is an off day. Sandora Irvin scoring 10 points in a game is almost a career high. Bet accordingly.

P. S. I am toying with the idea of a keeper WNBA fantasy league next year. Someone talk me out of this madness.


Frisco Del Rosario said...

I am toying with the idea of a keeper WNBA fantasy league next year. Someone talk me out of this madness.

Great idea.

pt said...

Great idea to be doing one, or great idea to be talked out of doing it? :D