
HHP: Pythagorean record and the Gaussian Copula-Function: How I learned to stop worrying and love


cityknight


The Pythagorean record and the Gaussian Copula-Function: or, How I learned to stop worrying and love...the O's

I don't post, basically ever. But I feel compelled to comment on a common debate around here. What does the Pythagorean record mean to the 2012 Orioles?

I work with numbers. I would not say that I am good at statistics or math in general. I am not mathematically illiterate, but I am no PhD either. I may be completely wrong with this little spiel, and if I am, please tell me why so I may understand. But here it goes:

1. The Pythagorean expectation for a baseball team is the best easily accessible metric that baseball fans have available to them that allows them to accurately predict the future performance of a club most of the time. The Pythagorean expectation relies on a Weibull distribution for its mathematical derivation. This means that the 'proof' famously published by Professor Miller in 2006 makes the following assumption: 'runs scored and runs allowed per game are statistically independent' (stolen gladly from Wikipedia).
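For anyone who has not seen the formula written out, here is a minimal sketch of it. It assumes the classic exponent of 2; the function name and the run totals are made up for illustration and are not real team data.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Expected winning percentage from runs scored and runs allowed.

    exponent=2.0 is the classic value; any other value is a tuning choice.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    if rs + ra == 0:
        raise ValueError("at least one run total must be positive")
    return rs / (rs + ra)


# Hypothetical totals, not real team data: a team outscored 500 to 550.
expected_pct = pythagorean_win_pct(500, 550)
expected_wins_per_162 = 162 * expected_pct
```

A team that has been outscored gets an expected winning percentage below .500, so a winning record on top of a negative run differential is exactly the kind of gap this thread is about.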

2. A common criticism of the record is that sometimes teams have 'pluck' or a 'great manager' who allows you to beat the statistical averages, thus the Pythagorean expectation is merely the hand-waving of nerds who don't understand 'real' baseball. I think this is a poor explanation. I'm a massive nerd who has loved baseball on the field, in the stands, and from my sofa since I was a young boy. I think that I understand real baseball. Don't dismiss the nerds.

3. However, the reverse applies as well: the hubris of mathematical 'proofs' is well documented. A rather apt comparison relates to the subprime mortgage crisis and what is commonly known as the Gaussian copula function. I'm not getting political here, so please read carefully. When the financiers were building the complicated mathematical equations and algorithms that allowed them to bundle up mortgages, repackage them and sell them later, one of the justifications that they used was based on a model built by a young man far removed from the buying and selling of homes. The 'Gaussian copula' was not a model or an equation that allowed one to predict the future; it was merely a good way to describe a relationship between disparate assets, like a bunch of houses in some part of the world. This was not the problem. The problem came when the limitations of the function were ignored by people who were so eager to believe the results that it seemed to spit out that they did not bother to learn what the function actually did. Many individuals took what they thought to be a concrete theory, and used it to create a value narrative that ended up being dramatically wrong. Some of these individuals probably did it out of pure greed, but I would posit that most thought they were just much cleverer than those old-fashioned folks who did not use the fancy formula. But they were ignoring one of the great pieces of wisdom from the older generation: understanding half of something can be far worse than having the wisdom to admit that you do not understand it at all.

4. This brings me to my conclusion. The Orioles are outperforming their Pythagorean record by a significant margin. Luck is a factor. That is to be celebrated, not bemoaned. But, is there something more? Is the Weibull distribution a fair representation of how the game is played out? Could it be that Buck's bullpen management means that scoring and allowing runs are more related than one would initially think? If certain poorer pitchers are only used in certain situations, then the mathematical foundations of the proof are themselves flawed. I'm not sure. But if I have time I'm going to find out.
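One way to get a feel for what the independence assumption buys you is a toy simulation. The sketch below is a simplified continuous version, not the published derivation: it assumes both run totals are drawn independently from Weibull distributions that share one shape parameter, and the function names and parameter values are hypothetical. Under those assumptions the chance of outscoring the opponent has a Pythagorean-style closed form, with the shared shape acting as the exponent.

```python
import random


def simulate_win_fraction(scale_for, scale_against, shape, games=200_000, seed=0):
    """Fraction of simulated games won when runs scored and runs allowed are
    drawn independently from Weibull distributions with the same shape."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape
        scored = rng.weibullvariate(scale_for, shape)
        allowed = rng.weibullvariate(scale_against, shape)
        if scored > allowed:
            wins += 1
    return wins / games


def pythagorean_form(scale_for, scale_against, shape):
    """Closed form for P(scored > allowed) under the same assumptions."""
    return scale_for ** shape / (scale_for ** shape + scale_against ** shape)


# Hypothetical parameters chosen only for illustration.
simulated = simulate_win_fraction(4.5, 5.0, 1.8)
closed_form = pythagorean_form(4.5, 5.0, 1.8)
```

The two numbers should agree closely here because the draws really are independent. If bullpen usage ties runs allowed to the score, that is the assumption being broken, and nothing in this sketch says how large the effect would be.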

Ultimately, there are so many smart passionate people writing about the Orioles as fans of the team and baseball alike. I hope that:

a. We recognize that the Pythagorean record of a team is fundamentally important

b. It has its limitations

c. Using it as a tool to close off debate and hiding behind the handwaving of math is boring. Break out the figures, learn some statistics, and let's figure out how to enjoy the game even more.

Pythagoras, the man, is a bit of an apocryphal character in history. He is variously a mathematician, a religious leader, and a fictional combination of a few disparate characters. The mythology tells us that the Pythagorean neophytes were the akousmatikoi ("listeners"). This is absolutely essential; we need to listen to one another. But that is not enough. Just because some guy on SI told me that the Orioles were doomed to failure does not mean that I have to believe him. We should all aspire to be like the Pythagorean inner circle, the mathematikoi ("learners"). Listening without questioning does not get us anywhere. I have been out of the country for a very long time, but I get back in just a few weeks. I hope with all of my soul that I will witness my first meaningful Orioles game in fifteen years. It has been a joy to wake up every morning and read about win after win. It makes even my spreadsheets bearable.

Tl;dr: The Pythagorean record of a team is important, but accepting it as the end of the debate on what makes a team win and lose is as foolish as the subprime crisis.


The most important point in your post is that the theory behind the Pythagorean record is that runs scored and runs allowed in a game are independent functions, but perhaps they are not, due to how pitchers get used, depending on the score. That is a very interesting point. I've decided that I don't want to spend my time thinking too hard about how the Orioles got here or how likely it is that they will continue to play above their Pythagorean record, though. I'm just savoring the season, game by game.


Bluedog, when you're looking to model something there are a few steps that one follows. First, you look for a correlation. Then, you consider if there are any factors one could control for (if a batter has RBIs, is he good? Well, if you control for the players around him, the more RBIs he has is a pretty good indication that he is a good hitter), and then you consider the big picture and give your alleged causal relationship a sense check. With Pythag analysis, it works: if you score more runs than your opponent, you're probably better than them. Statisticians also use various tests to check the relationship between the two variables. A classic is the r^2 value:

http://en.wikipedia.org/wiki/Coefficient_of_determination
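As a rough illustration of that check, here is a minimal sketch that computes r^2 as the squared Pearson correlation, which matches the coefficient of determination for a simple one-variable line fit. The function name and the sample numbers are made up for illustration and are not real results.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length samples.

    For a one-variable least-squares line fit this equals the
    coefficient of determination.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("both samples must vary")
    return (sxy * sxy) / (sxx * syy)


# Made-up numbers for illustration only: predicted vs. actual win percentages.
predicted = [0.45, 0.50, 0.55, 0.60]
actual = [0.47, 0.49, 0.56, 0.58]
fit = r_squared(predicted, actual)
```

A value near 1 means the predicted figures track the actual ones closely; a value near 0 means they barely move together.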

Frobby, that point is not original, I have read it here on the board and seen it elsewhere. But I think there is something to it all the same.

As for the parrot: City Forever.


If I am not mistaken, the "test" period for the development of the pythagorean method was 162 games. Not partial seasons.

The contango seen here between the O's current record and predicted record *should* minimize as the rest of the season's games get played out.

And that could be because the O's run differential improves or because their record worsens. With this team, we don't know which one it will be. And I find that exciting.


Great, great post.

I agree we should value statistical analysis. Likewise, we should recognize the hubris with which it is often misused. You make a great point I've tried to make several times here, much more concisely than I: Understanding half of something can be far more dangerous than admitting you don't understand it at all.

Regarding the pythag record, I think it is, obviously, a very useful tool. It is not perfect, but its accuracy is pretty well defined and quite acceptable.

I don't think RS and RA are independent of each other, but I also think the effect is fairly negligible over a 162-game season. For all the extra runs you might allow because you're so far ahead in a game, you'll get some back in a game where you're so far behind. Hence, we see very few true outliers from pythag records throughout the history of the game.

I do, however, want to make one point where I disagree with you in your OP: Pythag record makes no claim to being predictive. It simply measures what has already happened. To use it beyond that, you're going beyond its true intent.

BTW, good Greek.


1. The Pythagorean expectation for a baseball team is the best easily accessible metric that baseball fans have available to them that allows them to accurately predict the future performance of a club most of the time.

Maybe your post got better but I stopped reading at point number 1. This statement is wrong.


I don't see (too much) to argue about in this post, in the end, save for the idea that Pythag was being used predictively w/o significant caveats.

The portion that talks about bullpen management is fine - it's common knowledge that a good bullpen properly leveraged can help a team outpace its Pythag. This baffles me a bit:

If certain poorer pitchers are only used in certain situations, then the mathematical foundations of the proof are themselves flawed. I'm not sure. But if I have time I'm going to find out.

I have no reason to believe that this isn't something that all managers do. If that's the case, then the proof isn't flawed. If the answer is, well, the Orioles get blown out more, so it's exacerbated, then the response is: exactly. And teams that get blown out frequently tend to be pretty flawed teams.

Finally, to the extent that this post compares those who have used Pythag* for the heuristic that it is (heavily caveat'd and w/ limitations acknowledged) to those who bet the firm in the subprime market, it is both unfair and a bit wrong-headed. If anything, the Pythag was used as a check against inflation, to counsel against over-investment in short-term trades. Pythag has served as a hedge against those who may be over-valuing an asset (the Orioles) based on W-L.

Also, who is doing this exactly?

c. Using it as a tool to close off debate and hiding behind the handwaving of math is boring. Break out the figures, learn some statistics, and let's figure out how to enjoy the game even more.

*This seems to be the case, but frankly the post isn't entirely clear about its subject moment-to-moment.


Maybe your post got better but I stopped reading at point number 1. This statement is wrong.

I was going to say the same thing, but on a second level I sort of agree. If Pythag is a snapshot of true talent due to the difficulty of significant mid-season upgrades, from a probabilistic standpoint, I guess it makes some sense: most teams will probably continue to perform roughly as they have over a relatively large sample.

In retrospect, I'm revising above. This is probably as far as I'd go predictively: absent evidence of probable changes in the inputs (runs for/against), then a Pythag at 110 games simply tells you that your team is flawed, and to invest in it in the short-term with that in mind.

I largely agree with you, though. My other points were smaller-scale.
