Monday, February 1, 2010

Why won't BP get with the program?


SG's 2010 CAIRO MLB standings projections were released last week. This made everything right with the Yankee Universe. CAIRO has the pinstripes smoking the field, cruising to 102 wins, with all of baseball in the rearview mirror. The sound you heard was the entire Yankee fan base smirking, drooling for the season to begin.

Then Baseball Prospectus released its projections ... with the Yankees finishing OUT OF THE PLAYOFFS. If the good people at BP thought we Yankee fans were going to take this slight lying down, well then they just don't follow baseball as closely as I thought. The nearest thing to a drama that one can get in the Yankee blogosphere ensued. SG immediately found genuine flaws in the BP methodology. According to SG, BP had far too many unearned runs in its projection for the Yankees and the rest of baseball. This was hard to believe, considering that the Yankees project to be a very good fielding team. Baseball Prospectus responded, recalculated its projections, and now has the Yankees tied with the Red Sox in 1st place.

BP uses a bottom-up approach to making projections. It first makes player projections. Then, those projections are used to make team projections. I've followed BP's projections for some time, and have come away a critic. In fact, my first Yankeeist post ever was a bit of a criticism of BP's inaccurate projections of the 2009 Yankee offense.

Once again I have criticisms, this time of their 2010 projections for the Yankees. Last time I benefited from hindsight when I explained why PECOTA was off the mark. This time I'm putting my proverbial money where my mouth is. Here's what PECOTA is getting wrong about the 2010 Yankees (or, Why I Prefer CAIRO Projections and the Bombers Should Smoke the Field):

1) Derek Jeter is not a 30.3 VORP player. Now, anyone who reads my work knows that my immediate previous post just slammed VORP as a statistic. That criticism stands, but it's impossible to analyze a BP projection without using VORP, since, you know, it's their baby. They had Jeter as a 72.8 VORP player last season. BP is projecting the Captain to lose more than half his value.

Here's what BP says Derek will do next season in slash stats: .296/.367/.425. That would rank among his worst seasons ever. Other than his injured 2008 campaign, Derek has never been so bad. Here's what he's done his last four seasons:

2006: .343/.417/.483
2007: .322/.388/.452
2008: .300/.363/.408
2009: .334/.406/.465

PECOTA seems to be very critical of future Hall of Famers, and aging players. For example, in 2009 it said that Albert Pujols would hit 35 home runs. He hit 47. Way to nail that one, BP. Joking aside, it is logical that PECOTA would struggle to predict that kind of performance. In general, only a handful of players can hit 35 dingers consistently, let alone practically 50. If you're in the prediction business you bet against Albert ever existing.

The same is true of Derek Jeter. A savvy gambler would never bank on a player accruing 2,500 hits in his career. It's too infrequent an occurrence relative to the number of baseball players. Jeter complicates things further because he's turning into a once-ever player. As I posted earlier, the players his career compares most favorably to were all finished by about 35. Jeter shows no evidence of slowing down, meaning that it's becoming less and less useful to use other players to project the trajectory of his remaining seasons. Unfortunately, PECOTA has to do just that, so it sees Jeter retiring very soon.

That's sloppy projecting. Consistently, the best projection of the immediate next outcome is the outcome that preceded it, so long as a full cycle of data is available. So, while I wouldn't use one at-bat to project a player's next at-bat, I would use the most recent season a player submitted as the strongest evidence of what he'll do next season, unless there is compelling evidence against using this methodology. For Derek, this means it's unlikely he'll submit one of his worst offensive seasons on the heels of one of his best, no matter how old he is. A safer bet would have at least been to predict he'll post his career line of .317/.388/.465, a regression from 2009 for sure, but far more probable than predicting Derek posts the second-worst OBP of his career.
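To make the "last season is the strongest evidence" argument concrete, here is a minimal Python sketch that uses only the slash lines quoted above. It compares PECOTA's line against two naive baselines: repeating 2009, and a simple unweighted average of 2006-2009. The unweighted average is a simplification (it ignores plate appearances), the function names are made up for illustration, and none of this is how PECOTA or CAIRO actually works.

```python
# Jeter's slash lines (AVG, OBP, SLG) as quoted in this post.
seasons = {
    2006: (0.343, 0.417, 0.483),
    2007: (0.322, 0.388, 0.452),
    2008: (0.300, 0.363, 0.408),
    2009: (0.334, 0.406, 0.465),
}
pecota_2010 = (0.296, 0.367, 0.425)


def simple_average(lines):
    """Unweighted mean of each slash component (ignores plate appearances)."""
    lines = list(lines)
    return tuple(sum(component) / len(lines) for component in zip(*lines))


def gap(projection, baseline):
    """Projection minus baseline for each slash component."""
    return tuple(p - b for p, b in zip(projection, baseline))


# Two naive baselines to hold the projection up against.
baselines = {
    "repeat 2009": seasons[2009],
    "2006-2009 average": simple_average(seasons.values()),
}

for name, line in baselines.items():
    diffs = gap(pecota_2010, line)
    print(name, " ".join(f"{d:+.3f}" for d in diffs))
```

Against either baseline, every component of the PECOTA line comes out negative, which is the gap I'm complaining about.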

2) I'm not sure their projection for Curtis Granderson accurately reflects his new team. BP adds its name to the list of prognosticators arguing that Curtis is going to come back in 2010. They see him posting a .270/.355/.502 slash line. That's excellent, but they also argue he'll only knock in 62 RBI.

I don't buy the RBI projection. While it's true that Granderson has consistently knocked in between 65 and 75 runners each season of his career, he's never been in the Yankee lineup before. (Also, BP is predicting that in the stacked bomber lineup Granderson will knock in fewer runs than he ever did in Detroit.)

Last season Johnny Damon hit 2nd (where BP says Granderson will hit) and put up an OBP of .365 -- one walk more than they predict for Granderson for every 100 plate appearances -- while knocking in 82 runs. The low RBI total projected for Granderson with virtually identical slash stats to Damon's in 2009 suggests that the computer is either underestimating the Yankee lineup (see above) or projecting for Granderson in a vacuum. Either way, it weakens the team projection.

3) A full season of 2010 Alex Rodriguez will be better than a partial season of 2009 Alex Rodriguez -- except if you're PECOTA. Once again I think a sense check was needed on the numbers. Last season BP put A-Rod at a 52.3 VORP. VORP is a counting stat, so it accumulates with playing time. There are many reasons why A-Rod's 2010 season will be at least as productive as his partial 2009 season, but PECOTA has him regressing to a 47.4 VORP.

The dropoff appears to come from the .388 OBP PECOTA predicts for A-Rod. That number alone is great, except that Alex hasn't gotten on base at that low a rate (by A-Rod's standards) since 2004.

4) Jorge Posada doesn't suck, but BP thinks he does. Jorge's bounceback 2009 was a big part of the Yankee offense. He posted a slash line of .285/.363/.522 for a VORP of 35.7. PECOTA sees Jorge posting .270/.355/.418 for a VORP of 15.8. (Brett Gardner is predicted to have a VORP of 18.2!)

The BA and OBP predictions are fine. It's smarter to be conservative about predictions, especially for aging catchers. But why would Jorge, a switch hitter who should at least be able to feast on the short porch in right, suddenly lose all his power? They predict he'll only hit 11 homers. He's never hit so few in a single full season of his career.

Those are the problems with the offensive projections (and I'm not critiquing their conservative projection for Robinson Cano). Those differences alone are worth, potentially, several wins. The pitching projections are just awful. And away we go ...
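As a rough check on "several wins," here is a small Python sketch that totals the VORP drops quoted above for Jeter, A-Rod and Posada. VORP is measured in runs; the conversion of about 10 runs per win is a common rule of thumb that I'm assuming here, not a figure from BP. The sketch also assumes the alternative to PECOTA is each hitter simply repeating his 2009 VORP, which is a generous baseline.

```python
# VORP figures (in runs) as quoted in this post.
vorp = {
    # player: (2009 actual, PECOTA 2010 projection)
    "Jeter": (72.8, 30.3),
    "Rodriguez": (52.3, 47.4),
    "Posada": (35.7, 15.8),
}

RUNS_PER_WIN = 10.0  # assumption: common rule of thumb, not a BP number


def vorp_drop(actual, projected):
    """Return runs lost versus last season and the share of value lost."""
    lost = actual - projected
    return lost, lost / actual


total_runs = 0.0
for player, (actual, projected) in vorp.items():
    lost, share = vorp_drop(actual, projected)
    total_runs += lost
    print(f"{player}: {lost:.1f} runs lost ({share:.0%} of 2009 value)")

print(f"Total: {total_runs:.1f} runs, about {total_runs / RUNS_PER_WIN:.1f} wins")
```

Under those assumptions the three projections together sit roughly 67 runs below 2009, or somewhere between six and seven wins.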

5) CC Sabathia will be better in 2010. Here's what CC has done in his last four seasons:

2006: ERA 3.22 WHIP 1.173
2007: ERA 3.21 WHIP 1.141
2008: ERA 2.70 WHIP 1.115
2009: ERA 3.37 WHIP 1.148

The projection calls for him to have an ERA of 3.58 and a WHIP of 1.2. Until the Rays lit him up when he was going for his 20th win last season, CC's ERA was around 3.15 and his WHIP was about 1.12. The Rays fiasco was an isolated data point. Without it, CC's performance was a bit better than it was in 2006 or 2007.

Therefore, his 2009 stats seem like a floor for his 2010 production (I believe he's going to have a beast of a season now that he's comfortable). BP has him posting his worst season since 2005.

6) A.J. Burnett is a good pitcher. He wasn't great in 2009, but he still posted an ERA of 4.04. Although his WHIP was 1.401, he still managed a pitcher VORP of 37.1. BP feels that A.J. will lose about a third of his value next year and post a VORP of 24.7 on the heels of an ERA of 4.35 and a WHIP of 1.35.

PECOTA is projecting that A.J. will only pitch 190 innings, down from 207 last season, but it is still hard to see him posting his career-worst ERA (for a full season). Even if he doesn't prove as durable as he's been the past two years it's hard to imagine A.J.'s ERA inflating much since he's been pitching in the AL East. If anything it should come down a bit since he doesn't have to face the Yankees.

7) PECOTA projects that Andy Pettitte will post an ERA of 4.94 next year, with a WHIP of 1.48. That's bad. Andy has never been worse than a 4.7 ERA in his whole career, although his WHIP has been as high as 1.59.

Andy has been fairly consistent the last two seasons. In 2008 he pitched 204 innings of 4.54 ERA and 1.412 WHIP baseball. In 2009 he pitched 194 innings of 4.16 ERA and 1.382 WHIP baseball. The strongest evidence I see to support PECOTA's greatly diminished 2010 projection for Pettitte is his age. That's fine, except the magnitude of the decline seems hard to believe. I struggle to see a pitcher who held up so well through the entire postseason suddenly being nearly a full run worse per 9 innings in 2010.

8) Mariano Rivera and Phil Hughes are good pitchers. In 2009 Mariano was a 29.5 VORP pitcher, which is phenomenal when you consider he only tossed 66.3 innings. Philthy was right on his heels, with a 25.6 VORP over 86 innings of work.

BP predicts Mo's ERA will bloat to 3.07 next season, the worst it would be since 2007. If you exclude 2007, that would represent Mo's worst mark since his rookie season. Last year Mariano Rivera posted an ERA of 1.76 with a WHIP of 0.905. In the playoffs he turned it UP A NOTCH, allowing only 1 run and 15 baserunners in 16 innings of work against the best teams in the Majors.

BP sees Phil Hughes posting an ERA of 3.93, a WHIP of 1.29 and pitching only 59 innings. Hughes pitched 86 innings of 3.03 ERA and 1.116 WHIP baseball last year. This year he projects to serve almost exclusively as the 8th inning setup guy, a role in which he completely dominated last season. As with Mariano, it's difficult to see one of the Yankees' strongest run preventing assets declining so rapidly in 2010.
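For a sense of scale on the pitching side, here is a short Python sketch that turns an ERA gap into earned runs. ERA is earned runs per 9 innings, so the extra runs are the ERA difference times innings divided by 9. It only covers Burnett and Hughes, because those are the two pitchers whose projected innings are quoted in this post; the function name is made up for illustration, and "repeat the 2009 ERA" is my assumed baseline.

```python
def extra_earned_runs(era_before, era_projected, innings):
    """Earned runs added by an ERA change over a given number of innings.

    ERA is earned runs per 9 innings, so runs = ERA * innings / 9.
    """
    return (era_projected - era_before) * innings / 9.0


# (2009 ERA, PECOTA 2010 ERA, PECOTA 2010 innings) as quoted in this post.
pitchers = {
    "Burnett": (4.04, 4.35, 190),
    "Hughes": (3.03, 3.93, 59),
}

for name, (era_2009, era_2010, innings) in pitchers.items():
    runs = extra_earned_runs(era_2009, era_2010, innings)
    print(f"{name}: {runs:+.1f} earned runs versus repeating his 2009 ERA")
```

Each of the two comes out around six extra earned runs over his projected workload, before counting Sabathia, Pettitte or Rivera.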

PECOTA's projections aren't entirely off the mark. It mostly nails the projections for the team's younger players (Granderson, Teixeira and Swisher, for example). It predicts an excellent season from Javier Vazquez (ERA of 3.79) and, frankly, sells way too high on Joba Chamberlain, predicting an ERA of 4.21 over 144 innings. But the misses pile up.

I've followed PECOTA's projections now for about two full seasons. It seems to be an excellent tool for fantasy owners looking for value in middle-of-the-road players. But these are the players who are easiest to predict because they compare so favorably to the vast majority of baseball data points.

PECOTA struggles to predict outstanding players, aging players who have quite a bit left in the tank, and volatile young players. The Yankees have all three kinds of guys on their squad next season. Jeter, Mo and A-Rod are outstanding players whose careers have defied prediction every season. They also join Jorge and Pettitte as aging players who have yet to show evidence of any real decline. I avoided talking about Robinson Cano's projection, but it comes in on the low side, which I attribute to the volatility of his numbers in the past couple of seasons.

In total, PECOTA doesn't seem like the best system for projecting the 2010 Yankees. PECOTA was systematically off when it projected the 2009 Yankees. So far it seems just as inaccurate for the 2010 Yankees.

2 comments:

  1. Re: Granderson--seems like they didn't account for the fact that he won't be batting leadoff anymore. Jeter, for all his 212 hits last year, only drove in 66 due to lineup placement, a career low for a full season. Put Granderson where people will be on base for him and 85-95 RBIs is not out of the question.

  2. To my eyes it's indicative of a process that is more computer driven than it should be. There needs to be a basic sense check on these numbers. No member of the Yankees last year - not even Leche - had fewer than 66 RBI. Granderson hitting 62 suggests flaws in the projections.
