
PECOTA projections


markakis8


I think if you took the fan forecasts and cut 10 wins a team off you'd do pretty well.
That would make the Orioles around a 77-win team?

Except for the Orioles, of course!

Here are the median results of Tony Soprano's annual "official" wins poll over the last five years (actual team results in parentheses):

2011: 82-85 (69)

2012: 71-75 (93)

2013: 87-91 (85)

2014: 86-89 (96)

2015: 86-89 (81)

Projected wins: 412 - 429

Actual wins: 424

Not too shabby!
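The arithmetic checks out. Here's a quick Python sketch that reproduces the totals and also measures each season's miss from the midpoint of the poll range (using the midpoint as the point estimate is my assumption; the poll itself reports a range):

```python
# Median fan-poll ranges (low, high) and actual Orioles wins, 2011-2015,
# copied from the post above.
poll = {
    2011: ((82, 85), 69),
    2012: ((71, 75), 93),
    2013: ((87, 91), 85),
    2014: ((86, 89), 96),
    2015: ((86, 89), 81),
}

low_total = sum(lo for (lo, _hi), _actual in poll.values())
high_total = sum(hi for (_lo, hi), _actual in poll.values())
actual_total = sum(actual for _rng, actual in poll.values())

# Per-season miss, measured from the midpoint of each poll range (an assumption).
misses = {
    year: actual - (lo + hi) / 2
    for year, ((lo, hi), actual) in poll.items()
}
mean_abs_miss = sum(abs(m) for m in misses.values()) / len(misses)

print(f"Projected: {low_total}-{high_total}, actual: {actual_total}")
print(f"Mean absolute miss per season: {mean_abs_miss:.1f} wins")
```

The five-year totals line up because the misses largely cancel: 2011 came in 14.5 wins under the midpoint and 2012 came in 20 wins over it. The average single-season miss is about 10.7 wins.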


I have a few gripes with the way that the PECOTA model is used:

First, the player forecasts are built on past performance data only, and as such they are intentionally blind to things like the performance impact of specific injuries and the potential for recovery. Moreover, I've not seen the historical standard error of their predicted player performance vs. actual, but I'd bet it's pretty wide. By the way, any reasonable statistician would NOT call unexplained variance "luck"; it is just unexplained by the data at hand, and better data might improve the predictive power of the model. That would include so-called intangibles, which are typically left out because no good quantitative measures exist, but which could be very important.
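To make the "unexplained variance" point concrete, here's a minimal sketch of how you could measure a projection system's error, assuming you have paired projected and actual values. The OPS numbers are made up for illustration, not real PECOTA output:

```python
def projection_accuracy(projected, actual):
    """Return (r_squared, rmse) for paired projected/actual values.

    r_squared is the share of the variance in `actual` that the projections
    account for; 1 - r_squared is the "unexplained" share. rmse is the typical
    size of a miss, in the stat's own units.
    """
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_resid = sum((a - p) ** 2 for p, a in zip(projected, actual))
    r_squared = 1 - ss_resid / ss_total
    rmse = (ss_resid / n) ** 0.5
    return r_squared, rmse


# Made-up OPS projections and results for five hitters (not real PECOTA output).
projected = [0.750, 0.820, 0.690, 0.710, 0.780]
actual = [0.720, 0.860, 0.650, 0.750, 0.790]
r2, rmse = projection_accuracy(projected, actual)
print(f"R^2 = {r2:.2f}, unexplained = {1 - r2:.0%}, RMSE = {rmse:.3f}")
```

Run over a few hundred real player-seasons, the unexplained share is exactly the number I'd want to see published.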

Second, as correctly noted, using PECOTA at the team level requires assumptions about how many at-bats each hitter gets, how many hitters each pitcher faces, and how many chances each fielder handles. That is VERY difficult to get right, and there is no evidence that the method used to make these assumptions is very scientific at all. There is also significant unexplained variance in how individual hitting stats combine to predict runs scored and, likewise, in how pitching and fielding stats combine to predict runs allowed.
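To see how much the playing-time assumptions matter, here's a toy sketch. It treats team offense as each hitter's runs-created rate times his projected plate appearances; the hitters, rates, and PA totals are invented, and this simple sum is a stand-in, not PECOTA's actual team method:

```python
# Hypothetical lineup: name -> (runs created per plate appearance, projected PA).
# These numbers are invented, and a simple rate-times-PA sum is only a
# stand-in for whatever PECOTA actually does at the team level.
lineup = {
    "Starter A": (0.135, 650),
    "Starter B": (0.120, 600),
    "Starter C": (0.100, 550),
    "Bench D": (0.085, 200),
}


def team_runs(players):
    """Total runs created: each player's rate times his projected PA."""
    return sum(rate * pa for rate, pa in players.values())


def shift_playing_time(players, from_player, to_player, pa):
    """Return a copy of `players` with `pa` plate appearances moved."""
    shifted = dict(players)
    rate_from, pa_from = shifted[from_player]
    rate_to, pa_to = shifted[to_player]
    shifted[from_player] = (rate_from, pa_from - pa)
    shifted[to_player] = (rate_to, pa_to + pa)
    return shifted


baseline = team_runs(lineup)
injured = team_runs(shift_playing_time(lineup, "Starter A", "Bench D", 200))
print(f"Runs lost if Starter A misses 200 PA: {baseline - injured:.1f}")
```

Under these made-up rates, moving 200 PA from the best hitter to the bench bat costs about 10 runs, which by the common rule of thumb of roughly 10 runs per win is about one win, all from a single playing-time assumption.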

Third, the so-called "Pythagorean" theorem converts predicted runs scored and runs allowed into predicted wins, and there is unexplained variance here too.
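For reference, the formula in question is Bill James's Pythagorean expectation: winning percentage = RS^2 / (RS^2 + RA^2). A minimal sketch; the exponent of 2 is James's original, 1.83 is a commonly used refinement, and I'm not claiming which variant Baseball Prospectus uses for its team projections:

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Bill James's Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins over a season of `games` games."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


# Hypothetical team: 750 runs scored, 700 allowed.
print(round(pythagorean_wins(750, 700), 1))
print(round(pythagorean_wins(750, 700, exponent=1.83), 1))

# A 20-run miss in the runs-scored forecast moves the win total by about 2.
print(round(pythagorean_wins(770, 700) - pythagorean_wins(750, 700), 1))
```

The last line shows why errors compound: a runs forecast that's off by 20 (under 3% of the total) already moves the win projection by about two games, before counting the formula's own unexplained variance.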

All of this reduces the predictive power of team PECOTA. Someone should backtest the model, going back as far as there are reliable player stats, to see how accurate it is.
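Here's roughly what such a backtest could look like, as a minimal sketch. The record layout (season, team, projected wins, actual wins) and the four rows are invented placeholders, not real PECOTA data; RMSE is reported alongside MAE because it penalizes the occasional 20-win miss more heavily:

```python
from collections import defaultdict


def backtest_by_season(records):
    """Summarize projection error per season.

    records: iterable of (season, team, projected_wins, actual_wins).
    Returns {season: (mean_absolute_error, root_mean_squared_error)}.
    """
    errors = defaultdict(list)
    for season, _team, projected, actual in records:
        errors[season].append(actual - projected)

    summary = {}
    for season in sorted(errors):
        errs = errors[season]
        mae = sum(abs(e) for e in errs) / len(errs)
        rmse = (sum(e * e for e in errs) / len(errs)) ** 0.5
        summary[season] = (mae, rmse)
    return summary


# Invented placeholder data; a real backtest would load every team-season
# for which both a projection and a final record exist.
records = [
    (2014, "Team A", 80, 85),
    (2014, "Team B", 90, 84),
    (2015, "Team A", 82, 82),
    (2015, "Team B", 78, 70),
]
for season, (mae, rmse) in backtest_by_season(records).items():
    print(season, f"MAE {mae:.2f}", f"RMSE {rmse:.2f}")
```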

BTW, if statistical modeling does such a poor job of predicting baseball records, why do we place so much confidence in the ability of statistical models to accurately predict the economy or the weather?


The inventor of PECOTA, Nate Silver, wrote an entire book about forecasting, The Signal and the Noise. It covers forecasting of the economy, the weather, earthquakes, sports, political races and other things. Silver would be the first to tell you that there are large standard deviations in virtually all types of forecasting.

If you have ever seen an actual PECOTA page for a player, it doesn't contain just one projection. It has a whole range of projections by percentile (e.g., the 50th percentile projection), and it also gives the chances of a "breakout" or a "collapse," plus an "attrition rate." I don't pretend to know how the sausage is made, but it's fair to say that when we talk about a player's PECOTA projection, we are oversimplifying.
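PECOTA builds its percentiles from comparable players, which I won't try to reproduce. As a rough illustration of what a percentile range means, here's a sketch that instead assumes a normal spread around a single projection; the 20-homer projection and the standard deviation of 6 are hypothetical:

```python
from statistics import NormalDist


def percentile_projection(mean, sd, percentiles=(10, 50, 90)):
    """Map each percentile to a projected value, assuming a normal spread."""
    dist = NormalDist(mu=mean, sigma=sd)
    return {p: dist.inv_cdf(p / 100) for p in percentiles}


def chance_above(mean, sd, threshold):
    """Probability the outcome exceeds `threshold` under the same assumption."""
    return 1 - NormalDist(mu=mean, sigma=sd).cdf(threshold)


# Hypothetical hitter: 20 home runs projected, standard deviation of 6.
for pct, value in percentile_projection(20, 6).items():
    print(f"{pct}th percentile: {value:.1f} HR")
print(f"Chance of more than 30 HR: {chance_above(20, 6, 30):.0%}")
```

The 50th percentile is the number people quote; the 10th and 90th are where the "collapse" and "breakout" talk lives.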


People shake their fists at the gods when a system projects the Orioles to win 80 games. Then they shake them some more when someone explains that the 80-win projection really says the Orioles are more likely than not to win somewhere between 75 and 85, that 70 or 90 isn't out of the question, and that one or two teams a year are off by 20 or 30.
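Even if a projection nailed a team's true talent, luck alone spreads the results out. A minimal sketch, assuming every game is an independent coin flip at the team's true winning percentage (it ignores injuries, trades, schedule, and the uncertainty in the talent estimate itself, all of which widen the range further):

```python
from math import comb


def prob_wins_between(true_win_pct, low, high, games=162):
    """P(low <= wins <= high), treating every game as an independent coin
    flip weighted by the team's true winning percentage."""
    return sum(
        comb(games, w) * true_win_pct**w * (1 - true_win_pct) ** (games - w)
        for w in range(low, high + 1)
    )


p = 80 / 162  # a team whose true talent is exactly 80 wins
inside = prob_wins_between(p, 75, 85)
outside = prob_wins_between(p, 0, 70) + prob_wins_between(p, 90, 162)
print(f"75-85 wins: {inside:.0%}; 70 or fewer, or 90+: {outside:.0%}")
```

For a true 80-win team this gives roughly 60% for 75 to 85 wins and roughly one in seven for 70 or fewer or 90 or more, from coin-flip luck alone.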


The question is, do their projections have any better margin of error than (1) simply projecting every team at 81-81, (2) simply projecting every team to have the same record as last year, or (3) something equally simplistic? If I'm planning to spend the day outside and the weather forecast is 70 degrees, plus or minus 10, I really don't know how to dress, so that forecast isn't very helpful to me.
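Given real data, that question is easy to test. A minimal sketch comparing a projection's mean absolute error against the two naive baselines; the six teams' numbers are invented purely to show the mechanics:

```python
def mean_absolute_error(predicted, actual):
    """Average size of a miss, in wins."""
    return sum(abs(a - p) for p, a in zip(predicted, actual)) / len(actual)


def compare_to_baselines(projected, last_year, actual):
    """MAE of a projection system versus two naive baselines."""
    n = len(actual)
    return {
        "projection": mean_absolute_error(projected, actual),
        "everyone 81-81": mean_absolute_error([81] * n, actual),
        "same as last year": mean_absolute_error(last_year, actual),
    }


# Invented wins for six teams, only to show the mechanics.
projected = [92, 85, 81, 76, 70, 88]
last_year = [96, 78, 84, 70, 64, 93]
actual = [89, 84, 79, 80, 68, 86]
for name, mae in compare_to_baselines(projected, last_year, actual).items():
    print(f"{name:>18}: {mae:.2f} wins")
```

Whether real projections win this comparison is an empirical question, and a single season of 30 teams is a small sample, so you'd want to run it over many seasons.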


1) Yes, it is more accurate than saying each team goes 81-81.

2) I have not compared it that way.


Projections for team performance are useless and irrelevant. That's why they play the game. What someone thinks they are going to do is worth nothing. If they picked the O's to win 95 games, you'd love them, but it would still be worthless. Bleacher Report projects the O's to be the 19th-best team in baseball and to finish fourth in the East. So what?

Let's watch and not worry about projections. BTW, WAR is unprovable.


The question is, do their projections have any better margin of error than (1) simply projecting every team at 81-81, (2) simply projecting every team to have the same record as last year, or (3) something equally simplistic? If I'm planning to spend the day outside and the weather forecast is 70 degrees, plus or minus 10, I really don't know how to dress, so that forecast isn't very helpful to me.

Some forecasting systems are better than projecting everyone at 81-81; most of the serious ones are. But polls of writers on ESPN or wherever are often worse. With increasing parity, you get closer to the point where random variation and clumped records make it harder to beat 81-81.

If you want really helpful projections, they should double the size of MLB or find some other way of putting separation between teams. I'd rather have 25 of 30 teams being reasonably competitive, and I'm guessing you would, too.

Forecasting today's MLB might be a little like forecasting which Hawaiian island will be warmer today when they're all in the 70s.
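To illustrate the parity point, here's a toy simulation (the talent spreads are hypothetical, not estimates of real MLB): even a forecaster who knew every team's true talent exactly beats 81-81 by less and less as talent bunches together, because the game-to-game randomness stays the same size.

```python
import random


def naive_gap(talent_sd, seasons=100, teams=30, games=162, seed=1):
    """How many wins of mean absolute error a perfect talent estimate saves
    over projecting every team at .500, in a simulated league.

    Each team's true winning percentage is drawn from a normal distribution
    centered on .500 with standard deviation `talent_sd`, and every game is an
    independent coin flip at that percentage. (A real league's wins must sum
    to zero; this toy model ignores that.)
    """
    rng = random.Random(seed)
    oracle_total = naive_total = 0.0
    samples = 0
    for _ in range(seasons * teams):
        true_pct = min(max(rng.gauss(0.5, talent_sd), 0.0), 1.0)
        wins = sum(rng.random() < true_pct for _ in range(games))
        oracle_total += abs(wins - games * true_pct)
        naive_total += abs(wins - games / 2)
        samples += 1
    return (naive_total - oracle_total) / samples


for sd in (0.06, 0.04, 0.02):
    print(f"talent SD {sd:.2f}: perfect knowledge beats 81-81 by "
          f"{naive_gap(sd):.1f} wins per team")
```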


I'm loving the competitive situation in the AL right now. The NL is nowhere near as balanced, with several teams in rebuild mode.


With interleague play and parity you could have an AL Lake Wobegon, where all the teams are above average. Every team goes 71-71 against the AL, and 11-9 against the NL and you pick the playoffs by drawing straws.
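The Lake Wobegon arithmetic does work out, assuming 15 teams per league and the 20 interleague games per team that an 11-9 split implies:

```python
AL_TEAMS = 15

al_record = (71, 71)           # each AL team vs. the rest of the AL
interleague_record = (11, 9)   # each AL team vs. the NL

season = (al_record[0] + interleague_record[0], al_record[1] + interleague_record[1])
print(f"Every AL team finishes {season[0]}-{season[1]}")  # Every AL team finishes 82-80

# AL-vs-AL games are zero-sum, so everyone going .500 there is consistent.
# The NL as a whole absorbs the AL's interleague surplus.
nl_vs_al = (AL_TEAMS * interleague_record[1], AL_TEAMS * interleague_record[0])
print(f"NL goes {nl_vs_al[0]}-{nl_vs_al[1]} against the AL")  # NL goes 135-165 against the AL
```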


The question is, do their projections have any better margin of error than (1) simply projecting every team at 81-81, (2) simply projecting every team to have the same record as last year, or (3) something equally simplistic? If I'm planning to spend the day outside and the weather forecast is 70 degrees, plus or minus 10, I really don't know how to dress, so that forecast isn't very helpful to me.

1) Easily.

2) Yes. Even without factoring in players added, PECOTA and systems like it try to mitigate fluctuations by doing things like averaging out three years of wins instead of just last year. It turns out a team that wins 60 games, then 90, is much more likely to win 75 the next season than 90.

3) To my knowledge no system has ever been able to beat PECOTA over a decent sample size, but I could be remembering incorrectly. I certainly know that no system with human input has ever been able to beat even fairly simple model projections. It just turns out a lot of what we view as common sense is incorrect.
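The 60-then-90 example can be sketched as a simple multi-year average with optional regression toward .500. This is not PECOTA's actual formula (its weights and regression amounts aren't given here); the equal weights and the `regress` parameter are assumptions for illustration:

```python
def project_wins(past_wins, regress=0.0, league_avg=81.0):
    """Project next season's wins from up to three prior seasons.

    Averages the most recent (up to) three seasons with equal weights, then
    pulls the result `regress` of the way toward league average. Both the
    equal weights and the regression amount are illustrative assumptions.
    """
    recent = past_wins[-3:]
    average = sum(recent) / len(recent)
    return league_avg + (1 - regress) * (average - league_avg)


print(project_wins([60, 90]))               # 75.0
print(project_wins([60, 90], regress=0.3))  # pulled a bit further toward 81
```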


I would like to see the studies. I've seen some that compare the various projection models in terms of how well they project player performance. I've never seen one that looks at how good any team projections were. And I don't know how the PECOTA-based team projections are done, but I believe there is some "human input" involved in deciding which players are projected to get the playing time.


I wish I had saved all of the different studies I've seen about projection systems.

PECOTA projects playing time based on depth charts, so yes, technically at this point those are projected depth charts, but eventually they will be the official ones. 538 seems to think that PECOTA is doing better than ever on individual and playing-time predictions:

http://fivethirtyeight.com/features/is-2015-the-year-baseballs-projections-failed/

The PECOTA Wikipedia article has some limited analysis of win accuracy:

An independent evaluation by the website Vegas Watch showed that PECOTA had the lowest error in predicting Major League team wins in 2008 of all the best known forecasts, both those that were sabermetrically based and those that relied on individual expertise.[36] In 2009, however, PECOTA lagged behind all the well-known forecasters.[37]

A summary for the 2003 through 2007 seasons shows that PECOTA's average error between the predicted and actual team wins declined:[38] 2003 5.91 wins; 2004 7.71 wins; 2005 5.14 wins; 2006 4.94 wins; 2007 4.31 wins. Silver conjectures that the improvement has come in part from taking defense into account in the forecasts beginning in 2005. In 2008 the average error was 8.5 wins.[39]
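A quick way to summarize the 2003-2007 numbers quoted above: average error plus a least-squares trend line. With only five seasons (and 2004 going the wrong way), treat the trend as suggestive at best; the 8.5-win error in 2008 shows it didn't hold.

```python
# PECOTA's average team-win error by season, as quoted above from Wikipedia.
errors = {2003: 5.91, 2004: 7.71, 2005: 5.14, 2006: 4.94, 2007: 4.31}

years = list(errors)
values = list(errors.values())
mean_year = sum(years) / len(years)
mean_error = sum(values) / len(values)

# Ordinary least-squares slope: change in average error per season.
slope = sum((y - mean_year) * (e - mean_error) for y, e in zip(years, values)) / sum(
    (y - mean_year) ** 2 for y in years
)
print(f"2003-2007 mean error: {mean_error:.2f} wins; trend: {slope:+.2f} wins/season")
```

That works out to an average miss of about 5.6 wins and a downward trend of roughly 0.6 wins per season over those five years.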