
HHP: Pythagorean record and the Gaussian Copula-Function: How I learned to stop worrying and love


cityknight


Great post. I am aroused.

Also, Point One is basically correct. Pythagorean is the best of a few commonly available evils. I'd prefer a projection system like ZIPS or something, but I can't calculate that on the back of a napkin. Pyth is certainly more predictive than current record.

Great post. I am aroused.

Also, Point One is basically correct. Pythagorean is the best of a few commonly available evils. I'd prefer a projection system like ZIPS or something, but I can't calculate that on the back of a napkin. Pyth is certainly more predictive than current record.

This might be more accurate.

It's still not designed as a predictive tool, but I understand what you're saying.


Great post. I am aroused.

Also, Point One is basically correct. Pythagorean is the best of a few commonly available evils. I'd prefer a projection system like ZIPS or something, but I can't calculate that on the back of a napkin. Pyth is certainly more predictive than current record.

Predictive of what, exactly, I guess is the question?

BTW, what arouses you about the post? [To be clear, I ask this to generate discussion, not to question its value.]

I'm still struggling with the metaphor re: the subprime market, which - the more I think about it - seems just wholly, wildly incorrect.


Great post. I am aroused.

Also, Point One is basically correct. Pythagorean is the best of a few commonly available evils. I'd prefer a projection system like ZIPS or something, but I can't calculate that on the back of a napkin. Pyth is certainly more predictive than current record.

This might be more accurate.

It's still not designed as a predictive tool, but I understand what you're saying.


Predictive of what, exactly, I guess is the question?

BTW, what arouses you about the post? [To be clear, I ask this to generate discussion, not to question its value.]

I'm still struggling with the metaphor re: the subprime market, which - the more I think about it - seems just wholly, wildly incorrect.

Predictive of your winning percentage for the remainder of the season.

And I'm aroused by things, however flawed, that suggest an intelligence at work.

The subprime market thing seems like a slightly unnecessary and long but not awful allegory about the dangers of misusing a statistic you don't understand. But I don't really know anything about that.


I'm still struggling with the metaphor re: the subprime market, which - the more I think about it - seems just wholly, wildly incorrect.

In terms of what you're saying about the Pythag, I completely agree with you. I am saying that it should be used, but qualified as something interesting but not deterministic.

As for the metaphor:

Some background:

http://www.wired.com/techbiz/it/magazine/17-03/wp_quant?currentPage=all

People took an equation that was designed for a very limited purpose, ignored all of the disclaimers that the author wrote about its predictive powers, and then pointed to rather opaque math to justify what they were doing. It is like when you speak with a lawyer and they say, "well, actually, mens rea pari passu ad infinitum." They can use their Latinate phrases to cultivate ethos in lieu of logic.

Financiers abused the formula in the same fashion as a justification for what they wanted to say, but without really explaining why they were doing what they were doing. "It's math therefore I'm right".

So when I read things like,

"While the Orioles' run-differential suggests they're more of a pretender than a contender, the Rays may finally be a challenger capable of overtaking the Yankees (at 11 games behind and with an under-.500 record, the Red Sox are too far back at the moment to be considered a serious challenger)."

Read more: http://sportsillustrated.cnn.com/2012/baseball/mlb/08/13/fangraphs-power-rankings-week-18/index.html#ixzz23XZUaTfM

I wish that the author was a bit more nuanced and wrote things like,

"Even so, while run differential is predictive, it is not destiny, and the Birds' ability to even maintain a competitive front given the imbalance between their runs scored and runs allowed is one of those fascinating anomalies that bears watching"

http://mlb.si.com/2012/08/06/baltimore-orioles-postseason-drea/

I do not argue that Pythag should be treated as some sort of predictive algorithm, but with that said, it correlates with the odds of you winning on any given day better than previous W-L record does. I just hope that because Pythag is relatively intuitive, people don't stop digging deeper. Perhaps we can figure out how the O's run production exhibits some sort of superior variance that leads to more wins than one would predict. Who knows. All I'm saying is that it might be fun to find out, but in the meantime, I'm going to enjoy the ride the O's are taking us on and brashly declare on internet forums that naysayers can plant a fat one on my cyberbehind.
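To put rough numbers on "predictive, not destiny," here is a small Python sketch. It projects a final win total by applying the Pythagorean percentage to the games remaining, and it estimates how far a win total can drift through luck alone if each game is treated as an independent coin flip with a fixed win probability. That independence is an assumption, not a fact about baseball. The team line below (62-53, 480 runs scored, 520 allowed, 47 games left) is hypothetical and is not the Orioles' actual record, and the function names are mine.

```python
import math


def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean winning percentage; exponent 2 is the classic choice."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def projected_final_wins(wins, games_left, runs_scored, runs_allowed):
    """Wins already banked plus Pythagorean pace over the remaining games."""
    return wins + games_left * pythagorean_pct(runs_scored, runs_allowed)


def luck_sd(games, win_pct):
    """Standard deviation of a win total over `games` independent games
    with a fixed per-game win probability (binomial assumption)."""
    return math.sqrt(games * win_pct * (1.0 - win_pct))


if __name__ == "__main__":
    wins, losses, left = 62, 53, 47   # hypothetical record, not real data
    rs, ra = 480, 520                 # hypothetical run totals, not real data
    pct = pythagorean_pct(rs, ra)
    print(f"Pythagorean pct: {pct:.3f}")
    print(f"Projected final wins: {projected_final_wins(wins, left, rs, ra):.1f}")
    print(f"Luck SD over {wins + losses} games: {luck_sd(wins + losses, pct):.1f}")
```

The wins already in the book stay in the book; only the remaining games are projected at the Pythagorean rate, which is why a team can be "due for regression" and still finish with a good record.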


Predictive of your winning percentage for the remainder of the season.

And I'm aroused by things, however flawed, that suggest an intelligence at work.

The subprime market thing seems like a slightly unnecessary and long but not awful allegory about the dangers of misusing a statistic you don't understand. But I don't really know anything about that.

Forcing metaphors is a sanity-seeking-coping-mechanism: apologies.


The Pythagorean record and the Gaussian Copula-Function: or, How I learned to stop worrying and love...the O's

I don't post, basically ever. But I feel compelled to comment on a common debate around here. What does the Pythagorean record mean to the 2012 Orioles?

I work with numbers. I would not say that I am good at statistics or math in general. I am not mathematically illiterate, but I am no PhD either. I may be completely wrong with this little spiel, and if I am, please tell me why so I may understand. But here it goes:

1. The Pythagorean expectation for a baseball team is the best easily accessible metric that baseball fans have available to them that allows them to accurately predict the future performance of a club most of the time. Its mathematical derivation relies on a Weibull distribution. This means that the 'proof' famously published by Professor Miller in 2006 makes the following assumption: 'runs scored and runs allowed per game are statistically independent' (stolen gladly from Wikipedia).

2. A common criticism of the record is that sometimes teams have 'pluck' or a 'great manager' who allows you to beat the statistical averages, thus the Pythagorean expectation is merely the hand-waving of nerds who don't understand 'real' baseball. I think this is a poor explanation. I'm a massive nerd who has loved baseball on the field, in the stands, and from my sofa since I was a young boy. I think that I understand real baseball. Don't dismiss the nerds.

3. However, the reverse applies as well: the hubris of mathematical 'proofs' is well documented. A rather apt comparison relates to the subprime mortgage crisis and what is commonly known as the Gaussian copula function. I'm not getting political here, so please read carefully. When the financiers were building the complicated mathematical equations and algorithms that allowed them to bundle up mortgages, repackage them and sell them later, one of the justifications that they used was based on a model built by a young man far removed from the buying and selling of homes. The 'Gaussian copula' was not a model or an equation that allowed one to predict the future, it was merely a good way to describe a relationship between disparate assets, like a bunch of houses in some part of the world. This was not the problem. The problem came when the limitations of the function were ignored by people who were so eager to believe the results that it seemed to spit out that they did not bother to learn what the function actually did. Many individuals took what they thought to be a concrete theory, and used it to create a value narrative that ended up being dramatically wrong. Some of these individuals probably did it out of pure greed, but I would posit that most thought they were just much cleverer than those old fashioned folks who did not use the fancy formula. But they were ignoring one of the great pieces of wisdom from the older generation: understanding half of something can be far worse than having the wisdom to admit that you do not understand it at all.

4. This brings me to my conclusion. The Orioles are outperforming their Pythagorean record by a significant margin. Luck is a factor. That is to be celebrated, not bemoaned. But, is there something more? Is the Weibull distribution a fair representation of how the game is played out? Could it be that Buck's bullpen management means that scoring and allowing runs are more related than one would initially think? If certain poorer pitchers are only used in certain situations, then the mathematical foundations of the proof are themselves flawed. I'm not sure. But if I have time I'm going to find out.
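For anyone who wants to poke at points 1 and 4, here is a minimal sketch in Python. It is my own illustration, not code from Professor Miller's paper. It computes the Pythagorean expectation and then checks it against simulated games in which each side's runs are drawn independently from a Weibull distribution with the same shape. The function names, the scale values 4.8 and 4.2, the shape 1.8, and the seed are all made up for illustration. Real run totals are whole numbers, and real games are not guaranteed to be independent, which is exactly the assumption being questioned above.

```python
import random


def pythagorean_expectation(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed.

    An exponent of 2 is the classic choice; values near 1.83 are also
    commonly used.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def simulate_win_fraction(scale_for, scale_against, shape, games, seed=0):
    """Fraction of simulated games won when runs for and runs against are
    drawn INDEPENDENTLY from Weibull distributions with a shared shape.

    The draws are continuous, so ties cannot occur (a simplification).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        scored = rng.weibullvariate(scale_for, shape)       # (scale, shape)
        allowed = rng.weibullvariate(scale_against, shape)
        if scored > allowed:
            wins += 1
    return wins / games


if __name__ == "__main__":
    shape = 1.8  # illustrative value, not fitted to any real team
    # With equal shapes, the ratio of scales equals the ratio of mean runs,
    # so the scales can stand in for average runs in the formula.
    print(pythagorean_expectation(4.8, 4.2, exponent=shape))
    print(simulate_win_fraction(4.8, 4.2, shape, games=200_000))
```

Under those assumptions the two printed numbers should land close together. If scoring and allowing runs were linked, say through bullpen usage, the simulation would need dependent draws, and the match would no longer be guaranteed.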

Ultimately, there are so many smart passionate people writing about the Orioles as fans of the team and baseball alike. I hope that:

a. We recognize that the Pythagorean record of a team is fundamentally important

b. It has its limitations

c. Using it as a tool to close off debate and hiding behind the hand-waving of math is boring. Break out the figures, learn some statistics, and let's figure out how to enjoy the game even more.

Pythagoras, the man, is a bit of an apocryphal character in history. He is variously a mathematician, a religious leader, and a fictional combination of a few disparate characters. The mythology tells us that the Pythagorean neophytes were the akousmatikoi ("listeners"). This is absolutely essential; we need to listen to one another. But that is not enough. Just because some guy on SI told me that the Orioles were doomed to failure does not mean that I have to believe him. We should all aspire to be like the Pythagorean inner circle, the mathematikoi ("learners"). Listening without questioning does not get us anywhere. I have been out of the country for a very long time, but I get back in just a few weeks. I hope with all of my soul that I will witness my first meaningful Orioles game in fifteen years. It has been a joy to wake up every morning and read about win after win. It makes even my spreadsheets bearable.

Tl;dr: The Pythagorean record of a team is important, but accepting it as the end of the debate on what makes a team win and lose is as foolish as the sub-prime crisis.

You are Roger Bannister, aren't you ???


Predictive of your winning percentage for the remainder of the season.

And I'm aroused by things, however flawed, that suggest an intelligence at work.

The subprime market thing seems like a slightly unnecessary and long but not awful allegory about the dangers of misusing a statistic you don't understand. But I don't really know anything about that.

Except it gets it precisely backwards. Pythag wasn't being used as impetus for anything - it was being used for the very limited purpose of providing some kind of insight into a flawed index and the (potentially) inflated value of a commodity (the Orioles). It stands as an example of how a statistic can urge caution when superficial values suggest "going all in." In fact, there were those who took it so far as to support selling short. In other words, it's the exact opposite of substituting an equation for reality - which, in itself, seems overly-simplified re: subprime issues. Those who are suggesting that DD has come up w/ a team that can "outthink" basic rules of run differential are more generally the ones engaging in a kind of hubristic faith in subprime-like alchemy. [Now, maybe I've got the OP's analysis backward, and this is what he's saying. But it sure doesn't seem that way.]

In other words, Pythag stands as a simple way of double-checking what appears to be "reality," and that's all anyone used it for.

Which is what baffles me about this whole discussion. No one is saying that Pythag is more "real" than our actual W-L. Our W-L is (first, awesome, for a change; second) the absolute fact of where we stand at this juncture. But in outlining the range of possible future outcomes (some of which are more probable than others), Pythag counsels against over-confidence (and thus over-investment).

In other words, those who are looking at Pythag as a valid (but not all-defining) input are a lot closer to Nouriel Roubini than those who are doing logical and mathematical gymnastics to evade it.

Maybe mags like SI took it too far, but people on this board (save for the short-sellers) didn't.


In terms of what you're saying about the Pythag, I completely agree with you. I am saying that it should be used, but qualified as something interesting but not deterministic.

As for the metaphor:

Some background:

http://www.wired.com/techbiz/it/magazine/17-03/wp_quant?currentPage=all

People took an equation that was designed for a very limited purpose, ignored all of the disclaimers that the author wrote about its predictive powers, and then pointed to rather opaque math to justify what they were doing. It is like when you speak with a lawyer and they say, "well, actually, mens rea pari passu ad infinitum." They can use their Latinate phrases to cultivate ethos in lieu of logic.

Financiers abused the formula in the same fashion as a justification for what they wanted to say, but without really explaining why they were doing what they were doing. "It's math therefore I'm right".

So when I read things like,

"While the Orioles' run-differential suggests they're more of a pretender than a contender, the Rays may finally be a challenger capable of overtaking the Yankees (at 11 games behind and with an under-.500 record, the Red Sox are too far back at the moment to be considered a serious challenger)."

Read more: http://sportsillustrated.cnn.com/2012/baseball/mlb/08/13/fangraphs-power-rankings-week-18/index.html#ixzz23XZUaTfM

I wish that the author was a bit more nuanced and wrote things like,

"Even so, while run differential is predictive, it is not destiny, and the Birds' ability to even maintain a competitive front given the imbalance between their runs scored and runs allowed is one of those fascinating anomalies that bears watching"

http://mlb.si.com/2012/08/06/baltimore-orioles-postseason-drea/

I do not argue that Pythag should be treated as some sort of predictive algorithm, but with that said, it correlates with the odds of you winning on any given day better than previous W-L record does. I just hope that because Pythag is relatively intuitive, people don't stop digging deeper. Perhaps we can figure out how the O's run production exhibits some sort of superior variance that leads to more wins than one would predict. Who knows. All I'm saying is that it might be fun to find out, but in the meantime, I'm going to enjoy the ride the O's are taking us on and brashly declare on internet forums that naysayers can plant a fat one on my cyberbehind.

I agree with this 100%, CK, so sorry if it felt I was pressing too hard on your analysis. See my above comparison of the subprime issue with Pythag, which I hope illuminates, rather than defeats, what you're trying to do.

And I agree that this kind of thing is used in a dressy, pseudo-intellectual way by a lot of sportswriters out there. I was filtering it through many, many long discussions on this board, which (in hindsight, given your clarification) probably was unnecessary. That said, I hope the above posts spark something of interest.

Well, I agree until this:

All I'm saying is that it might be fun to find out, but in the meantime, I'm going to enjoy the ride the O's are taking us on and brashly declare on internet forums that naysayers can plant a fat one on my cyberbehind.

Too many of us get painted as "naysayers" because we try to be objective. We probably get a bit sensitive because it gets a little old. I watch nearly every game. I yell at my television. I stand up cheering. I post on here. Clearly I care, if anything, too much. I'm not a naysayer, but there's fun to be had in trying to suss out how the team is doing what it's doing, and whether it's sustainable (and even thinking about FO strategies based on that).


Except it gets it precisely backwards. Pythag wasn't being used as impetus for anything - it was being used for the very limited purpose of providing some kind of insight into a flawed index and the (potentially) inflated value of a commodity (the Orioles). It stands as an example of how a statistic can urge caution when superficial values suggest "going all in." In fact, there were those who took it so far as to support selling short. In other words, it's the exact opposite of substituting an equation for reality - which, in itself, seems overly-simplified re: subprime issues. Those who are suggesting that DD has come up w/ a team that can "outthink" basic rules of run differential are more generally the ones engaging in a kind of hubristic faith in subprime-like alchemy. [Now, maybe I've got the OP's analysis backward, and this is what he's saying. But it sure doesn't seem that way.]

In other words, Pythag stands as a simple way of double-checking what appears to be "reality," and that's all anyone used it for.

Which is what baffles me about this whole discussion. No one is saying that Pythag is more "real" than our actual W-L. Our W-L is (first, awesome, for a change; second) the absolute fact of where we stand at this juncture. But in outlining the range of possible future outcomes (some of which are more probable than others), Pythag counsels against over-confidence (and thus over-investment).

In other words, those who are looking at Pythag as a valid (but not all-defining) input are a lot closer to Nouriel Roubini than those who are doing logical and mathematical gymnastics to evade it.

Maybe mags like SI took it too far, but people on this board (save for the short-sellers) didn't.

I have no strong opinions on any of this. :P


Too many of us get painted as "naysayers" because we try to be objective. We probably get a bit sensitive because it gets a little old. I watch nearly every game. I yell at my television. I stand up cheering. I post on here. Clearly I care, if anything, too much. I'm not a naysayer, but there's fun to be had in trying to suss out how the team is doing what it's doing, and whether it's sustainable (and even thinking about FO strategies based on that).

Debating with you isn't fair if you're going to be magnanimous and articulate (and correct). No, my position is not directed at you, or those who are trying to be sensible about the Orioles' future. It is directed at people I'll never meet because being contemptuous of their disdain for the Orioles is deeply satisfying every time we put another one in the WIN column.


Debating with you isn't fair if you're going to be magnanimous and articulate (and correct). No, my position is not directed at you, or those who are trying to be sensible about the Orioles' future. It is directed at people I'll never meet because being contemptuous of their disdain for the Orioles is deeply satisfying every time we put another one in the WIN column.

I have a date w/ a Red Sox fan this weekend. Though, to be fair, she disdains Red Sox nation and the class-ascent of the Sox over the last few years. On the flip side, she didn't realize that O's fans loathe the Sox. It felt like Duke chanting "Not our rivals" at me all over again. ;)

I get it. I truly do.


Except it gets it precisely backwards. Pythag wasn't being used as impetus for anything - it was being used for the very limited purpose of providing some kind of insight into a flawed index and the (potentially) inflated value of a commodity (the Orioles). It stands as an example of how a statistic can urge caution when superficial values suggest "going all in." In fact, there were those who took it so far as to support selling short. In other words, it's the exact opposite of substituting an equation for reality - which, in itself, seems overly-simplified re: subprime issues. Those who are suggesting that DD has come up w/ a team that can "outthink" basic rules of run differential are more generally the ones engaging in a kind of hubristic faith in subprime-like alchemy. [Now, maybe I've got the OP's analysis backward, and this is what he's saying. But it sure doesn't seem that way.]

In other words, Pythag stands as a simple way of double-checking what appears to be "reality," and that's all anyone used it for.

Which is what baffles me about this whole discussion. No one is saying that Pythag is more "real" than our actual W-L. Our W-L is (first, awesome, for a change; second) the absolute fact of where we stand at this juncture. But in outlining the range of possible future outcomes (some of which are more probable than others), Pythag counsels against over-confidence (and thus over-investment).

In other words, those who are looking at Pythag as a valid (but not all-defining) input are a lot closer to Nouriel Roubini than those who are doing logical and mathematical gymnastics to evade it.

Maybe mags like SI took it too far, but people on this board (save for the short-sellers) didn't.

What Jim said. :D
