Fangraphs: The Orioles and Accepting Random Variation


Can_of_corn

The point is, you can't look at one result and say, "Well, the Orioles did better than the projections, clearly the system is flawed." That was what Cameron was getting at. Some outliers are to be expected. I just think he was very rigorous in this case, because the results he got matched with his preconceived notions.

Yes, you absolutely can look at one variable over time and judge a model. People get hired and fired every day when real money is riding on these types of forecasts.

For example, if I were a financial analyst and my model kept undervaluing a stock for five straight years and it ended up costing my firm $100M, you can bet that missing on that one variable would be significant enough to reassess the entire thing.

Look, I'm a statistician, I get the arguments being made and they are rational. But if something continues to happen for specific variables over time, there clearly are some extraneous variables impacting that variable. Simply chalking it up to "random variation" to your company's CEO when it keeps happening would get you fired. Cameron needs to look at things like HRs, ground ball rate, double plays, defense, etc., to answer why this is happening.


Yes, you absolutely can look at one variable over time and judge a model. People get hired and fired every day when real money is riding on these types of forecasts.

For example, if I were a financial analyst and my model kept undervaluing a stock for five straight years and it ended up costing my firm $100M, you can bet that missing on that one variable would be significant enough to reassess the entire thing.

Look, I'm a statistician, I get the arguments being made and they are rational. But if something continues to happen for specific variables over time, there clearly are some extraneous variables impacting that variable. Simply chalking it up to "random variation" to your company's CEO when it keeps happening would get you fired. Cameron needs to look at things like HRs, ground ball rate, double plays, defense, etc., to answer why this is happening.

I didn't say "one variable", I said "one result." As in, one team's performance in one season.

Upthread, I already agreed with everything you said about Cameron needing to look deeper into these numbers. I think there are factors that aren't being accounted for that might explain why certain teams are beating their projections for multiple years.


The point is, you can't look at one result and say, "Well, the Orioles did better than the projections, clearly the system is flawed." That was what Cameron was getting at. Some outliers are to be expected. I just think he was very rigorous in this case, because the results he got matched with his preconceived notions.

I don't think anyone is disputing this. But we are not looking at one result. We are looking at five straight years (for the Angels) and now three straight years (for the Orioles), as even last year we exceeded the projections, albeit by not as many wins. When it's the same team (and many of the same players) that has consistently exceeded the models over and over, at some point one must question why the model repeatedly underestimates that organization. As stated, I think part of the problem is that the models do not value the bullpens of a team as much as they should.
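
One rough way to put a number on "repeatedly" is a sign test. The sketch below assumes, purely for illustration, that an unbiased projection is equally likely to come in high or low for a given team in any season, and that seasons and teams are independent. Neither assumption comes from the article, and the function names are made up for this example.

```python
def streak_probability(years, p_over=0.5):
    """Chance that one pre-chosen team beats its projection in every one of `years` seasons."""
    return p_over ** years


def any_team_probability(years, teams=30, p_over=0.5):
    """Chance that at least one of `teams` independent teams has such a streak."""
    return 1.0 - (1.0 - streak_probability(years, p_over)) ** teams


for years in (3, 5):
    # years, single-team probability, probability that some team does it
    print(years, streak_probability(years), round(any_team_probability(years), 3))
```

Under these assumptions a five-year streak for one pre-chosen team is about a 3% event, while some team out of 30 having one is more likely than not. This only counts the direction of each miss, not its size, so it says nothing about how unusual repeated large misses would be.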


Yes, you absolutely can look at one variable over time and judge a model. People get hired and fired every day when real money is riding on these types of forecasts.

For example, if I were a financial analyst and my model kept undervaluing a stock for five straight years and it ended up costing my firm $100M, you can bet that missing on that one variable would be significant enough to reassess the entire thing.

Look, I'm a statistician, I get the arguments being made and they are rational. But if something continues to happen for specific variables over time, there clearly are some extraneous variables impacting that variable. Simply chalking it up to "random variation" to your company's CEO when it keeps happening would get you fired. Cameron needs to look at things like HRs, ground ball rate, double plays, defense, etc., to answer why this is happening.

I think baseball is a sufficiently complex game, played by human beings not robots, that any mathematical model can only have limited predictive power. When results deviate, it could be purely random sometimes, and it could be due to variables the model hasn't adequately captured at other times. Fans will probably want to assume the latter, but we have to acknowledge that sometimes it's the former. What's true of the current Orioles? I don't know, and I'm not sure I care.


I don't think anyone is disputing this. But we are not looking at one result. We are looking at five straight years (for the Angels) and now three straight years (for the Orioles), as even last year we exceeded the projections, albeit by not as many wins. When it's the same team (and many of the same players) that has consistently exceeded the models over and over, at some point one must question why the model repeatedly underestimates that organization. As stated, I think part of the problem is that the models do not value the bullpens of a team as much as they should.

I know. You and I were agreeing on this two pages ago. LOL

I think somehow I've gotten sucked into two different sides of the same argument...


I don't see anything wrong with that statement. The projection is just a most likely scenario. It's not an ironclad promise. But people act like every time a statistical projection is off, somehow the entire system behind it is bunk. Just because something is likely to happen doesn't mean it's going to.

If Team A is projected to win 73 games, what does that mean? They have a 60% chance of hitting that number? And maybe 10% chance of winning 93? Well, sooner or later, that 10% chance is going to happen.

But one and done scenarios are not the same as continually defying the projections, like the Orioles have been doing.


Where did they have the Red Sox? That would be two projections that were wrong by 20 wins. Right?

Yes. It happens.

One thing the model doesn't account for is fire sales and giving up. Twice in the last three years the Red Sox blew it up and, for the last 2-3 months of the season, didn't field the team the projections were based on.


I think baseball is a sufficiently complex game, played by human beings not robots, that any mathematical model can only have limited predictive power. When results deviate, it could be purely random sometimes, and it could be due to variables the model hasn't adequately captured at other times. Fans will probably want to assume the latter, but we have to acknowledge that sometimes it's the former. What's true of the current Orioles? I don't know, and I'm not sure I care.

My attitude is that the Orioles haven't built their team to expect 95 wins. They built the team to hope for 80-some wins and some things going right, and it's a lot of fun when that works out.

It's more fun to drive a slow car fast than a fast car slow, so I'd much rather be an 80-win team (on talent) that laps the division than a 97-win team that sneaks into the wildcard.


The best models will still be wrong some of the time. The best model isn't the one that predicts every single record correctly - that's impossible. That would just be extremely lucky. You have to accept that all models, even perfect ones, have the possibility of being off by 20 wins for any one team.

I understand that. So what's the point of the model if it's going to continually be off for a team like the Orioles or the Angels or whoever by that many wins?


The best models will still be wrong some of the time. The best model isn't the one that predicts every single record correctly - that's impossible. That would just be extremely lucky. You have to accept that all models, even perfect ones, have the possibility of being off by 20 wins for any one team.

That's the thing. If that's the case, you can just say every team will win about 78 games and be right almost every single time (within the margin of error).

This, in a nutshell, is what is wrong with Cameron's argument for me.
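
Whether a flat guess is "right almost every time" can be checked against a toy league. The sketch below is not Cameron's model or any real projection system. It invents a league where each team has a hypothetical true win probability (normally distributed around .500, with a spread of .050 that is purely an assumption), plays 162 independent games, and is projected two ways: a flat 81 wins for everyone (the league average, rather than the 78 mentioned above), and a projection that knows each team's true talent exactly. Teams do not actually play each other here, which is a simplification.

```python
import random


def simulate_rmse(seasons=500, teams=30, games=162, talent_sd=0.05, seed=0):
    """Return (rmse_flat, rmse_talent) over simulated team-seasons.

    rmse_flat:   error of projecting games/2 wins for every team.
    rmse_talent: error of projecting games * true win probability,
                 i.e. a projection that knows each team's talent exactly.
    """
    rng = random.Random(seed)
    flat_sq = talent_sq = 0.0
    n = seasons * teams
    for _ in range(n):
        # Hypothetical true talent: win probability centred on .500.
        p = min(max(rng.gauss(0.5, talent_sd), 0.0), 1.0)
        wins = sum(rng.random() < p for _ in range(games))
        flat_sq += (wins - games / 2) ** 2
        talent_sq += (wins - games * p) ** 2
    return (flat_sq / n) ** 0.5, (talent_sq / n) ** 0.5


rmse_flat, rmse_talent = simulate_rmse()
print(round(rmse_flat, 1), round(rmse_talent, 1))
```

In this toy setup even the projection with perfect knowledge of talent misses by several wins on average, which is the random-variation point, but the flat guess misses by noticeably more. So "any team can be off by 20" and "the model is no better than guessing the average" are separate claims, and a typical-error measure like this is one way to tell them apart.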


Another example is the hot hand question in basketball, where the researchers foolishly asserted there is no hot hand after looking at the data and finding it fell within a standard model, rather than the much more sensible assertion that the data confirms not only that the hot hand exists (and the cold hand), but also that hot streaks and cold streaks should be expected. Just because it's impossible to predict when and for how long such streaks will occur does not mean that the reason the streaks themselves occur is random.

This comment at the bottom of the article sums it up pretty well for me. Basically, just because outliers exist, doesn't mean we shouldn't be asking the question of why certain outliers exist. Especially when a team has been doing it for three years in a row.

Also, a point that I think is kind of lost in this is that the O's are only outperforming their Pythag by three games this year, meaning they'd still be in first place even if they were performing exactly as expected.


Sure it's valid. You just need to accept that in baseball random stuff makes up a fairly large percentage of the differences between teams. If every single team were an exact copy of every single other team, you'd still observe some teams winning 70 and other teams winning 90.

I don't think it's reasonable to assume that everything you don't understand gets thrown in a bucket called 'random variation.' Many things that follow a random distribution given a large enough sample have a causal factor for individual outliers. Cameron didn't attempt to show that managers don't have year-over-year correlation in ability to do better on BaseRuns - he just dismissed the argument because Mike Scioscia was on a 3-year negative streak with it.
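
The quoted claim about identical teams can be checked with a coin-flip league. This is a minimal sketch that assumes every game is an independent 50/50 event and, for simplicity, that teams do not actually play each other. The function name is made up for the example.

```python
import random
import statistics


def simulate_identical_league(teams=30, games=162, seed=1):
    """Win totals for `teams` clubs that each win every game with probability 0.5."""
    rng = random.Random(seed)
    return [sum(rng.random() < 0.5 for _ in range(games)) for _ in range(teams)]


# Theoretical standard deviation of wins for a true .500 team (binomial).
theoretical_sd = (162 * 0.5 * 0.5) ** 0.5  # about 6.4 wins

wins = simulate_identical_league()
print(min(wins), max(wins), round(statistics.pstdev(wins), 1), round(theoretical_sd, 1))
```

With a standard deviation of about 6.4 wins, the best and worst of 30 identical teams would typically land somewhere near two standard deviations from 81, so a spread of roughly 70 to 90 wins is in line with the claim. It does not settle the reply's point, though: luck being large in a single season does not show that a multi-year pattern for one team is luck.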



