The Declarer (Floyd McWilliams' Blog)
Sunday, March 28, 2004
Posts to this blog have been scarce of late, as my time has been consumed with two activities: A crunch at work, including a two-day trip to Bellevue, Washington; and my fantasy baseball league draft.
Last May I read Moneyball, Michael Lewis' account of how the Oakland Athletics have put together winning teams over the past few years despite a minuscule payroll (just one-third what the Yankees pay their players). I became an A's fan and followed baseball for the first time since I was a kid. I decided that I wanted to play fantasy baseball, and joined a league in which my friends Eric and Brian play. I chose the name "Floyd Rage" as a pun on "Roid Rage" (the violent behavior that allegedly afflicts heavy users of steroids).
The rules for our league are as follows: There are nine teams of 25 players. Throughout the season, hitting, fielding, and pitching statistics are collected for each team's players. For any given statistic, like "home runs," the figures for all of a team's players are added together, and each team's total is then compared with the other teams' totals. At the end of the season, when all statistics have been collected, the teams are "matchpointed" (as in a bridge competition) in each statistic. First place gets 9 points, second gets 8, all the way down to ninth, which gets 1. For instance, if my hitters performed exactly as they did last year, they would hit 264 home runs. If three teams had more homers, and five others had fewer, then I would be fourth in that statistic and would get 6 points.
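For readers who think in code, the matchpoint scoring described above can be sketched in a few lines of Python. This is only an illustration: the function name and the rival teams' totals are made up (only the 264 comes from the post), and ties are not handled specially, since the league's tie rule is not given here.

```python
def matchpoints(totals, higher_is_better=True):
    """Return {team: points} for a single statistic.

    With n teams, first place gets n points, second gets n - 1,
    and so on down to 1. Ties are not handled specially (assumption).
    """
    ranked = sorted(totals, key=totals.get, reverse=higher_is_better)
    n = len(ranked)
    return {team: n - place for place, team in enumerate(ranked)}


# Hypothetical home run totals for a nine-team league.
# Only Floyd Rage's 264 is taken from the post.
home_runs = {
    "Team A": 290, "Team B": 281, "Team C": 270,
    "Floyd Rage": 264,
    "Team D": 255, "Team E": 240, "Team F": 231,
    "Team G": 220, "Team H": 205,
}

points = matchpoints(home_runs)
print(points["Floyd Rage"])  # fourth of nine teams -> 6
```

For a "negative" category such as errors, the same function would be called with higher_is_better=False. A team's final score would then be the sum of its points across all categories.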
All of this is scored by friendly computers at Yahoo. Leagues can choose what statistics are used for scoring, and ours uses the following:
Position Players (which just means "not pitchers"):
I mentioned that our teams had 25 players. We are required to field players at each position: Catcher, first base, second base, shortstop, third base, and left, right, and center field. Also there are positions for any infielder, any two outfielders, a "utility" player who could be anyone, and a second catcher. We also platoon twelve pitchers, at least five of whom must be starters; a typical mix is six starters and six relievers. Finally there is a "bench" spot for a 26th player who does not play, but can be substituted for another player who will then take the bench.
So which players should I pick? Now if I were lazy, or smart, I would just buy some fantasy baseball magazines (paper or online). But I thought it would be fun and educational to do my own research. The league commissioner had emailed us a spreadsheet he obtained from Yahoo that ranked the major league players and listed their 2003 statistics. Two weeks ago I opened the spreadsheet and went to work.
Four Stars in Floyd's Little Black Book
My first goal was to combine the various statistics into a single number that would express a player's value. To do this I had to adjust, or "normalize," the various stats. I couldn't just add them together because of the different scales of the numbers. Would you rather have a player with 30 home runs and a 0.250 batting average, or a player who bats 0.300 and hits 29 homers? Obviously the latter, but adding the numbers together gives you 30.25 versus 29.30, and the former number is larger.
My initial thought was to adjust each stat by dividing it by the average for all players for that statistic. But average data was hard to come by, and somewhat misleading as it might include many players who didn't play much. Instead I looked up the 100th best "qualified" player for each statistic, and then divided by these numbers to get a "normalized value." For the curious, a hypothetical 100th best position player has these batting stats:
(A "qualified" position player has 400 at bats. Curiously, there were just 165 qualified players in 2003, which is only 5.5 players per team.)
For each stat column in my spreadsheet, such as H (hits), I created another column for the normalized value. I would divide by the 100th best statistic. I then multiplied the result by 100 so that I could deal with whole numbers and view statistics as a percentage value. So my Excel formulae looked like this:
I decided that I liked using the 100th best player as a baseline, because I could view the 100th best player as a "replacement player" -- with 11 major leaguers per fantasy player better than him, such a player must be easy to pick up late in the draft. Thus each normalized statistic represented a major leaguer's percentage of a replacement player's contribution.
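The normalization can also be written out as a small Python sketch, reusing the two hypothetical hitters from earlier (30 homers and a 0.250 average versus 29 homers and 0.300). The baseline figures below are assumptions, not the actual 100th-best numbers from the spreadsheet, and summing the normalized values into one score is likewise an assumption about how the columns were combined.

```python
def normalize(value, baseline):
    """Express a stat as a percentage of the 100th-best player's figure."""
    return 100 * value / baseline


# Assumed baselines for the 100th-best qualified hitter (placeholders).
BASELINE = {"HR": 18, "AVG": 0.275}

player_a = {"HR": 30, "AVG": 0.250}
player_b = {"HR": 29, "AVG": 0.300}


def score(player):
    """Sum of normalized stats (summing is an assumption)."""
    return sum(normalize(player[stat], BASELINE[stat]) for stat in BASELINE)


# Adding raw stats ranks player A first, because home runs swamp the average.
raw_a = sum(player_a.values())
raw_b = sum(player_b.values())
print(raw_a > raw_b)

# Adding normalized stats ranks player B first, as intuition says it should.
print(score(player_b) > score(player_a))
```

With these placeholder baselines, both comparisons print True: the raw sums favor the 30-homer hitter, while the normalized sums favor the 0.300 hitter.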
Accentuating the Negative
I haven't mentioned errors, although they are a position player statistic. Errors are difficult because they are a "negative" statistic; you win the errors category by having fewer errors than other players. At first I adjusted errors by dividing 100 by the player's errors expressed as a fraction of the 100th best figure. Since 8 is the 100th best error number, the formula was:
There were two problems with this approach. First, a player with no errors would cause a division by zero error in the spreadsheet. Second, the formula vastly inflated the importance of having a small number of errors. A player with two errors would have a normalized error value of 400. If he committed one fewer error, that value would rise to 800! This is a scoring difference equivalent to hitting 72 additional home runs, yet obviously the difference between one and two errors is minuscule.
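Both problems are easy to reproduce in a short Python sketch of the reciprocal approach. The function name is made up, and the home run baseline of 18 is an inference from the statement that 400 points equal 72 home runs, not a figure quoted from the spreadsheet.

```python
def reciprocal_error_score(errors, baseline=8):
    """First attempt: 100 / (errors / baseline), i.e. 100 * baseline / errors."""
    return 100 * baseline / errors


print(reciprocal_error_score(8))  # 100.0, the 100th-best player
print(reciprocal_error_score(2))  # 400.0
print(reciprocal_error_score(1))  # 800.0

# Problem 1: an error-free player cannot be scored at all.
try:
    reciprocal_error_score(0)
except ZeroDivisionError:
    print("a player with no errors cannot be scored")

# Problem 2: one error fewer is worth 400 points.
IMPLIED_HR_BASELINE = 18  # inferred: 400 points are said to equal 72 homers
jump = reciprocal_error_score(1) - reciprocal_error_score(2)
print(jump / 100 * IMPLIED_HR_BASELINE)  # 72.0 home runs' worth of score
```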
So I switched to an addition-based approach. I decided, somewhat arbitrarily, that 12 errors would be worth 100 points, yielding a formula of:
Run Rabbit Run
Here are the normalized statistics for the top ten position players (as ranked by Yahoo in the aforementioned commissioner's mailing):
I was surprised to see Beltran rated so highly. I knew he was KC's top player, but was he really a better player than A-Rod or Bonds? Beltran's high score came from his 41 steals. Many stats, like batting average and hits, do not vary much -- in the top ten hitters, Pujols' 212 hits were 60% more than Bonds', and Pujols' 0.359 average was 35% better than Thome's 0.266. But steals are much more varied. Beltran had 41 and Soriano 35; A-Rod and Sheffield had half as many (17 and 18), and Helton, Delgado, and Thome had no steals at all.
Chaff from Wheat
I separated players by position, and then sorted them by normalized score. I was looking to see if there were positions which had limited numbers of good players. I found one right away when I generated the numbers for second basemen:
(I maintained the Yahoo rankings, which is why the players are not sorted by normalized score.)
Alfonso Soriano and Bret Boone stand head and shoulders above the other second basemen. I suspected that Boone's numbers were an aberration (a "career year") and he would revert to the mean, so Soriano was the only outstanding second baseman.
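One mechanical way to spot a thin position is to measure how far the best normalized score sits above, say, the third best at each position. The Python sketch below illustrates that idea only. The rows are placeholders rather than real player data, and the function name and the choice of the third-best player as the comparison point are arbitrary assumptions.

```python
from collections import defaultdict


def top_gap_by_position(players, depth=3):
    """For each position, return the top score minus the score ranked `depth`."""
    by_pos = defaultdict(list)
    for _name, pos, score in players:
        by_pos[pos].append(score)
    gaps = {}
    for pos, scores in by_pos.items():
        scores.sort(reverse=True)
        if len(scores) >= depth:
            gaps[pos] = scores[0] - scores[depth - 1]
    return gaps


# Placeholder rows: (name, position, normalized score). Not real data.
players = [
    ("2B-1", "2B", 150), ("2B-2", "2B", 145),
    ("2B-3", "2B", 105), ("2B-4", "2B", 100),
    ("3B-1", "3B", 120), ("3B-2", "3B", 117),
    ("3B-3", "3B", 114), ("3B-4", "3B", 110),
]

gaps = top_gap_by_position(players)
print(gaps)  # a large gap marks a position with few standout players
```

In this made-up data the second base gap is 45 points and the third base gap is 6, so second base is the position where missing out on the top players costs the most.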
Now I was fifth in the draft order, and Soriano was ranked fifth in the Yahoo material. I decided that I would draft Soriano if he was available. Also I had a backup plan; if I couldn't get Soriano, I would draft a second baseman late.
The story was similar at shortstop:
This time, three good shortstops. I did a little research on Renteria, who plays for the Cardinals, and found that he has produced at a high level his whole career. I decided I would draft Renteria as well.
What about third base?
No real studs here, though perhaps some players to avoid. I decided that I would draft a third baseman late, probably Polanco or Blalock.
The Other End of the At-Bat
What about pitchers? Well, by the time I got around to the men on the mound it was the day before the draft and I was hacking spreadsheets like mad on the plane ride home. Comparing pitchers is difficult, because there are three positions to consider:
All pitchers generate ERA, WHIP, and strikeouts. ERA and WHIP are pro-rated for the number of innings pitched, so starters influence these stats much more than relievers. However, I did not know this until 90 minutes before draft time, when I was chatting with Brian. I immediately scrambled to adjust my statistics. I had to scribble a modified value on my printouts, but at least I was spared the ignominy of drafting relievers earlier than their value would warrant.
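The effect of innings pitched is easy to see in a small Python sketch. It assumes that a team's ERA is computed by pooling earned runs and innings across the staff, which is the usual definition (ERA is earned runs per nine innings). All pitcher lines below are hypothetical.

```python
def era(earned_runs, innings):
    """Earned run average: earned runs allowed per nine innings."""
    return 9 * earned_runs / innings


def staff_era(pitchers):
    """Pooled ERA: total the earned runs and innings, then compute once."""
    total_er = sum(er for er, _ip in pitchers)
    total_ip = sum(ip for _er, ip in pitchers)
    return era(total_er, total_ip)


# Hypothetical (earned_runs, innings) lines.
average_staff = [(100, 200)] * 5  # a 4.50 ERA over 1000 innings
ace_starter = (50, 200)           # 2.25 ERA over 200 innings
ace_reliever = (20, 80)           # 2.25 ERA over 80 innings

with_starter = staff_era(average_staff + [ace_starter])
with_reliever = staff_era(average_staff + [ace_reliever])

print(round(with_starter, 3))   # 4.125
print(round(with_reliever, 3))  # 4.333
```

Both added pitchers have the same 2.25 ERA, but the starter pulls the staff figure down about twice as far because he contributes 200 innings to the pool instead of 80. WHIP, being walks plus hits per inning, behaves the same way.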
Here is my crude attempt to calculate normalized value for starting pitchers:
There were some pitchers with very high scores, but ... Halladay and Loaiza had great years, not great careers. Martinez is notoriously fragile. Schmidt is coming off elbow surgery. So I figured I would wait a few rounds and then draft one or two pitchers with scores in the 120's or 130's.
I won't reproduce my reliever stats, which had to be annotated with feverish scribblings an hour before draft time. Anyway, as we will see tomorrow, these stats were missing some very important information.
Onward Sabermetric Soldiers
I spent the last of my research time Friday night looking at injury reports for all teams. It was my nightmare that I would draft a player, only to have someone sneer: "He'll be a good player for you -- when he returns from Tommy John surgery in 2005." Fortunately there were no injuries to major players that I did not already know about.
Saturday morning I printed the stat sheets, summarized by position. I came up with a plan for the first few rounds: Draft Soriano, or Beltran if Soriano were not available. Then the best player available, then Renteria, then Ichiro Suzuki, then a good position player, then a pitcher. But I did not have a set plan beyond Soriano and Renteria. I would use my data to find players who were big improvements over other players at their respective positions.
Tomorrow: Draft Day. Will Floyd Throw a Chair through a Wall?