
Fangraphs: The Orioles and Accepting Random Variation


Can_of_corn

Recommended Posts

When your statistical model has an expected variation of 10-20 wins, I'm not sure what the value of it is.

Honestly, local sportswriters have been as good as (or better than) Fangraphs at predicting the standings for several years now. And the prediction models like ZiPS or whatever are no better than just jotting down the previous year's stats as the prediction for the coming year.


  • Replies 88

That's an interesting article, but I feel like it may be missing something in all the math. I get that the projection system allows for a certain number of outliers, and that the number of teams that beat the projection isn't statistically significant. But I find it very interesting that so many of the teams on that list were repeats: the Angels beat their projection five years in a row, and the Astros and Twins appeared multiple times. That, to me, flies in the face of "there's a certain amount of random variation, and this is within an acceptable level within this model." It suggests to me that there are certain attributes that make a team more likely to beat its projection, and I think it would behoove someone to do a close analysis of those teams to see in what ways they are similar. Maybe it's nothing, but I think it's worth investigating. (It's very interesting to see the patterns here on which teams have done this consistently. I was just thinking about this earlier this week.)


Jeez. You guys don't seem to understand what a model is or the point of it. If I made a model for dice rolling, and you rolled snake eyes two out of ten times, would you immediately start investigating the dice?

What he is saying is that, on the whole, this model is doing its job exactly as expected. If it weren't, a bunch of teams would be outliers along with the Orioles. Now, I don't think this is an attack on the O's or their style of baseball, just a defense of what is a very broad prediction system.

Actually, from this writeup I'd say that Cameron does not completely understand the distinction between uncertainty and randomness. A model that attributes a normal distribution of results completely to randomness may well be missing systematic effects that represent flaws in the model. To take your example, consider that I have two pairs of dice, one loaded to give preference to results below 7 and the other loaded to give results above 7. If I sum all the data over both pairs of dice, I might well conclude that the results are completely consistent with a purely random distribution about the mean. But if I look at results for each pair individually, I have to conclude that the distribution is not completely random. The normal distribution of overall results is not sufficient -- you also have to look at correlations.
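The two-pairs-of-dice example is easy to check numerically. A minimal sketch (the weights here are invented, chosen only so one pair leans below 7 and the other above):

```python
# A quick numerical check of the two-pairs-of-dice example above.
# The face weights are made up for illustration: one pair biased low,
# one pair biased high.
import random

random.seed(42)

FACES = [1, 2, 3, 4, 5, 6]
LOW_WEIGHTS = [3, 3, 3, 1, 1, 1]   # loaded toward results below 7
HIGH_WEIGHTS = [1, 1, 1, 3, 3, 3]  # loaded toward results above 7

def roll_pair(weights, n):
    """Sum of two loaded dice with the given face weights, n trials."""
    return [sum(random.choices(FACES, weights=weights, k=2)) for _ in range(n)]

low = roll_pair(LOW_WEIGHTS, 100_000)
high = roll_pair(HIGH_WEIGHTS, 100_000)
pooled = low + high

# Pooled, the mean looks like a fair 7.0; separately, each pair is clearly biased.
print(f"pooled mean:    {sum(pooled) / len(pooled):.2f}")
print(f"low-pair mean:  {sum(low) / len(low):.2f}")
print(f"high-pair mean: {sum(high) / len(high):.2f}")
```

Pooling the two samples hides the bias that is obvious in each sample on its own, which is exactly the correlation point.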

When there are teams that consistently beat their projections for several-year stretches, one should certainly suspect that the model is missing something, and is just rolling that missing element into larger-than-supportable random uncertainty estimates. Cameron argues that the missing element is unlikely to just be the manager. That may be true, but there are many other possibilities, such as bullpen strength, poor evaluation of defensive prowess in the model, etc.

Overall, I think Cameron's response to the comments in this piece is pretty lazy, intellectually.


When there are teams that consistently beat their projections for several-year stretches, one should certainly suspect that the model is missing something, and is just rolling that missing element into larger-than-supportable random uncertainty estimates. Cameron argues that the missing element is unlikely to just be the manager. That may be true, but there are many other possibilities, such as bullpen strength, poor evaluation of defensive prowess in the model, etc.

Overall, I think Cameron's response to the comments in this piece is pretty lazy, intellectually.

That particular part bothered me. He says that Scioscia was able to beat the projection for 5 years, but then dismisses the idea that he had anything to do with it because he "forgot" how to do it after that. Which ignores the many, many factors outside his control. Maybe Scioscia built a team with great defense, lots of home runs, and a great bullpen, and then his relievers got hurt, his power hitters left as free agents, and his best defenders retired. Dismissing the manager as a factor is pretty stupid without looking at how the composition of the team changed over time, and how that may have affected the team's performance.


Cameron's work and pride revolve around sabermetrics. In 2012 he and many other sabermetric people called the Orioles "lucky" and "fluky" because of our record in 1-run games and our Pythagorean record based on run differential. Of course, the next year we had a terrible record in 1-run games, our Pythagorean record was better than it was in 2012, and we won 85 games. The 1-run record and extra-inning record in 2012 were fluky, no question, but the 2013 season kind of blew the whole "Orioles are lucky, not good" mantra out of the water.

Now in 2014, our Pythagorean record is pretty close to our actual record. So you can see the precarious position Cameron is in now. All the stats that he used to show that the Orioles were lucky in 2012 can't be used anymore. So he has two options remaining: admit he was wrong, or come up with new excuses as to why the Orioles are lucky and not really good. I don't think it's driven by a hatred of the Orioles, just a desire to prove that he is not wrong. And of course, when all else fails, he can just bring out his "randomness/coinflip" pull string dolls.
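For readers following along, the "Pythagorean record" referenced here is the standard Bill James expectation: expected winning percentage ≈ RS² / (RS² + RA²). A minimal sketch with invented run totals (not the Orioles' actual figures):

```python
# Bill James Pythagorean expectation (classic exponent-2 form).
# The run totals below are made up for illustration.
def pythagorean_wins(runs_scored: float, runs_allowed: float, games: int = 162) -> float:
    """Expected wins over a season, from runs scored and allowed."""
    pct = runs_scored**2 / (runs_scored**2 + runs_allowed**2)
    return games * pct

# A roughly even run profile projects to roughly a .500 record (~82 wins).
print(round(pythagorean_wins(712, 705), 1))
```

A team whose actual wins sit well above this number for one season is the kind of case the "lucky" label gets attached to.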

I suspect shortly we'll be seeing pre-season predictions that give every team 78-84 wins, with a +/- 40 game margin of error, presented as highly sophisticated advanced projections.


I suspect shortly we'll be seeing pre-season predictions that give every team 78-84 wins, with a +/- 40 game margin of error, presented as highly sophisticated advanced projections.

Or at least presented as a statistical model. Which tells us essentially nothing.


The 2012 Orioles were a decent team that managed to distribute their runs in about the most effective manner possible, but there's just no evidence to suggest that this is a repeatable skill over significant periods of time. And sure enough, after going 29-9 in one run contests in 2012, the 2013 Orioles went 20-31 in games decided by a lone run. For one year, the Orioles defied the odds, but as we'd expect, they couldn't get that to carry over into the next season, and they won eight fewer games despite playing basically at the same level as the previous year.

I didn't really care for this. I won't argue whether it's a repeatable skill or not, but the 2013 Orioles don't prove that it's not. The 2013 team was worse in 1-run games because their bullpen was worse and their closer blew 9 saves. He might be right, but he's mistaken in thinking he's comparing apples to apples.


Actually, from this writeup I'd say that Cameron does not completely understand the distinction between uncertainty and randomness. A model that attributes a normal distribution of results completely to randomness may well be missing systematic effects that represent flaws in the model. To take your example, consider that I have two pairs of dice, one loaded to give preference to results below 7 and the other loaded to give results above 7. If I sum all the data over both pairs of dice, I might well conclude that the results are completely consistent with a purely random distribution about the mean. But if I look at results for each pair individually, I have to conclude that the distribution is not completely random. The normal distribution of overall results is not sufficient -- you also have to look at correlations.

When there are teams that consistently beat their projections for several-year stretches, one should certainly suspect that the model is missing something, and is just rolling that missing element into larger-than-supportable random uncertainty estimates. Cameron argues that the missing element is unlikely to just be the manager. That may be true, but there are many other possibilities, such as bullpen strength, poor evaluation of defensive prowess in the model, etc.

Overall, I think Cameron's response to the comments in this piece is pretty lazy, intellectually.

Thank you for making this clear.


That particular part bothered me. He says that Scioscia was able to beat the projection for 5 years, but then dismisses the idea that he had anything to do with it because he "forgot" how to do it after that. Which ignores the many, many factors outside his control. Maybe Scioscia built a team with great defense, lots of home runs, and a great bullpen, and then his relievers got hurt, his power hitters left as free agents, and his best defenders retired. Dismissing the manager as a factor is pretty stupid without looking at how the composition of the team changed over time, and how that may have affected the team's performance.

The Angels are my main problem with Cameron's dismissal of the arguments against his model. The Angels exceeded projections for 5 straight years and then regressed back to normal as far as the projections went. Of course they did: the roster aged, the roster changed, and the pen was atrocious. But that doesn't mean the Angels weren't actually much better than Cameron or his models believed them to be over that five-year stretch. I think his model vastly underrates bullpens. He is big on the Tigers this year, and yet their pen is atrocious (and we are seeing that now). Finally, if you flipped a coin 162 times, 5 rounds in a row, and heads far exceeded tails each round, I suspect you would begin looking at the coin to see what is wrong with it.
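The coin-flip analogy can actually be quantified. A sketch, taking 90+ heads in 162 flips as an arbitrary stand-in for "far exceeded":

```python
# Exact binomial tail: how often does a fair coin land heads at least
# 90 times in 162 flips, and how often does that happen 5 rounds running?
from math import comb

def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

one_round = prob_at_least(90, 162)   # roughly a 9% shot for a fair coin
five_rounds = one_round**5           # vanishingly small if the coin is fair
print(f"one round:   {one_round:.3f}")
print(f"five rounds: {five_rounds:.2e}")
```

So five straight rounds of heads "far exceeding" tails is strong evidence against the fair-coin assumption, which is the poster's point.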


I didn't really care for this. I won't argue whether it's a repeatable skill or not, but the 2013 Orioles don't prove that it's not. The 2013 team was worse in 1-run games because their bullpen was worse and their closer blew 9 saves. He might be right, but he's mistaken in thinking he's comparing apples to apples.

I don't see how this is disagreeing with him. You could easily say, "The 2013 team was worse in 1-run games because their bullpen regressed to the mean and was worse, and their closer regressed to the mean and blew 9 saves."

And he wasn't giving the 2013 O's as "proof" so much as an example of how extreme outliers usually regress.


Having consulted with Gisele on this one, you are way off.

“You [have] to catch the ball when you’re supposed to catch the ball. My husband cannot [expletive] throw the ball and catch the ball at the same time. I can’t believe they dropped the ball so many times"


Oh, I agree. It's just amusing that the model can seemingly stretch to include everything. And if that is true, what number is not valid?

They can say that with the O's players they should win 81 games. But if they win 100, the model is still correct within its standard deviation. No prediction is wrong under those terms.

Sure it's valid. You just need to accept that in baseball, random stuff makes up a fairly large percentage of the differences between teams. If every single team were an exact copy of every single other team, you'd still observe some teams winning 70 games and others winning 90.
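The claim about identical teams is easy to simulate. A sketch under the simplifying assumption that every game is a fair coin flip:

```python
# Simulate leagues of 30 identical .500 teams, each playing 162
# coin-flip games, and look at the spread of win totals.
import random

random.seed(7)

def season_wins(games: int = 162) -> int:
    """Wins for a team whose every game is a 50/50 coin flip."""
    return sum(random.random() < 0.5 for _ in range(games))

SEASONS = 100
mins, maxes = [], []
for _ in range(SEASONS):
    league = [season_wins() for _ in range(30)]
    mins.append(min(league))
    maxes.append(max(league))

avg_min = sum(mins) / SEASONS
avg_max = sum(maxes) / SEASONS
# Even with perfectly identical teams, the best and worst teams in a
# typical season land roughly 25 wins apart.
print(f"typical worst team: {avg_min:.0f} wins, typical best team: {avg_max:.0f} wins")
```

That spread comes purely from binomial noise (one season's standard deviation is about 6.4 wins), before any real talent differences are added on top.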


Archived

This topic is now archived and is closed to further replies.

