
HHP: Pythagorean record and the Gaussian Copula-Function: How I learned to stop worrying and love


cityknight


The Pythagorean record and the Gaussian Copula-Function: or, How I learned to stop worrying and love...the O's

I don't post, basically ever. But I feel compelled to comment on a common debate around here. What does the Pythagorean record mean to the 2012 Orioles?

I work with numbers. I would not say that I am good at statistics or math in general. I am not mathematically illiterate, but I am no PhD either. I may be completely wrong with this little spiel, and if I am, please tell me why so I may understand. But here it goes:

1. The Pythagorean expectation for a baseball team is the best easily accessible metric that baseball fans have available to them that allows them to accurately predict the future performance of a club most of the time. The mathematical derivation of the Pythagorean expectation relies on a Weibull distribution. This means that the 'proof' famously published by Professor Miller in 2006 makes the following assumption: 'runs scored and runs allowed per game are statistically independent' (stolen gladly from Wikipedia).

2. A common criticism of the record is that sometimes teams have 'pluck' or a 'great manager' who allows you to beat the statistical averages, thus the Pythagorean expectation is merely the hand-waving of nerds who don't understand 'real' baseball. I think this is a poor explanation. I'm a massive nerd who has loved baseball on the field, in the stands, and from my sofa since I was a young boy. I think that I understand real baseball. Don't dismiss the nerds.

3. However, the reverse applies as well: the hubris of mathematical 'proofs' is well documented. A rather apt comparison relates to the subprime mortgage crisis and what is commonly known as the Gaussian copula function. I'm not getting political here, so please read carefully. When the financiers were building the complicated mathematical equations and algorithms that allowed them to bundle up mortgages, repackage them and sell them later, one of the justifications that they used was based on a model built by a young man far removed from the buying and selling of homes. The 'Gaussian copula' was not a model or an equation that allowed one to predict the future, it was merely a good way to describe a relationship between disparate assets, like a bunch of houses in some part of the world. This was not the problem. The problem came when the limitations of the function were ignored by people who were so eager to believe the results that it seemed to spit out that they did not bother to learn what the function actually did. Many individuals took what they thought to be a concrete theory, and used it to create a value narrative that ended up being dramatically wrong. Some of these individuals probably did it out of pure greed, but I would posit that most thought they were just much cleverer than those old fashioned folks who did not use the fancy formula. But they were ignoring one of the great pieces of wisdom from the older generation: understanding half of something can be far worse than having the wisdom to admit that you do not understand it at all.

4. This brings me to my conclusion. The Orioles are outperforming their Pythagorean record by a significant margin. Luck is a factor. That is to be celebrated, not bemoaned. But, is there something more? Is the Weibull distribution a fair representation of how the game is played out? Could it be that Buck's bullpen management means that scoring and allowing runs are more related than one would initially think? If certain poorer pitchers are only used in certain situations, then the mathematical foundations of the proof are themselves flawed. I'm not sure. But if I have time I'm going to find out.
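For anyone who wants to break out the figures, here is a minimal sketch of the Pythagorean expectation discussed in the points above. The function names are made up for illustration, and the exponent is an assumption: 2 is the classic form, and values around 1.83 are also commonly used for baseball. The run totals in the usage lines are hypothetical, not the Orioles' actual numbers.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage: RS^x / (RS^x + RA^x).

    exponent=2.0 is the classic form; treat other values as a tuning choice.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("at least one run total must be positive")
    scored_term = runs_scored ** exponent
    allowed_term = runs_allowed ** exponent
    return scored_term / (scored_term + allowed_term)


def expected_wins(runs_scored, runs_allowed, games, exponent=2.0):
    """Pythagorean wins over a given number of games."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


# Hypothetical team that scores exactly as many runs as it allows.
print(expected_wins(700, 700, 162))  # 81.0
```

Comparing `expected_wins` against a team's actual win total gives the gap that the debate here is about; the sketch says nothing about why the gap exists.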

Ultimately, there are so many smart passionate people writing about the Orioles as fans of the team and baseball alike. I hope that:

a. We recognize that the Pythagorean record of a team is fundamentally important

b. It has its limitations

c. Using it as a tool to close off debate and hiding behind the handwaving of math is boring. Break out the figures, learn some statistics, and let's figure out how to enjoy the game even more.

Pythagoras, the man, is a bit of an apocryphal character in history. He is variously a mathematician, a religious leader, and a fictional combination of a few disparate characters. The mythology tells us that the Pythagorean neophytes were the akousmatikoi ("listeners"). This is absolutely essential; we need to listen to one another. But that is not enough. Just because some guy on SI told me that the Orioles were doomed to failure does not mean that I have to believe him. We should all aspire to be like the Pythagorean inner circle, the mathematikoi ("learners"). Listening without questioning does not get us anywhere. I have been out of the country for a very long time, but I get back in just a few weeks. I hope with all of my soul that I will witness my first meaningful Orioles game in fifteen years. It has been a joy to wake up every morning and read about win after win. It makes even my spreadsheets bearable.

Tl;dr: The Pythagorean record of a team is important, but accepting it as the end of the debate on what makes a team win and lose is as foolish as the subprime crisis.


The most important point in your post is that the theory behind the Pythagorean record is that runs scored and runs allowed in a game are independent functions, but perhaps they are not, due to how pitchers get used, depending on the score. That is a very interesting point. I've decided that I don't want to spend my time thinking too hard about how the Orioles got here or how likely it is that they will continue to play above their Pythagorean record, though. I'm just savoring the season, game by game.


Bluedog, when you're looking to model something there are a few steps that one follows. First, you look for a correlation. Then, you consider if there are any factors one could control for (If a batter has RBIs, is he good? Well, if you control for the players around him, having more RBIs is a pretty good indication that he is a good hitter), and then you consider the big picture and give your alleged causal relationship a sense check. With Pythag analysis, it works: if you score more runs than your opponent, you're probably better than them. Statisticians also use various tests to check the relationship between the two variables. A classic is the r^2 value:

http://en.wikipedia.org/wiki/Coefficient_of_determination
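As a rough illustration of the r^2 check mentioned above, here is a minimal sketch that fits a straight line by least squares and reports the coefficient of determination. The function name `r_squared` and the sample numbers are made up for the example; no real team data is used.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a simple least-squares line fit."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("both samples must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / ss_tot


# Made-up numbers: y is exactly 2 * x, so the line explains everything.
print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))
```

In this setting `xs` could be each team's Pythagorean winning percentage and `ys` its actual winning percentage; a value near 1 would mean the formula explains most of the variation in records.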

Frobby, that point is not original, I have read it here on the board and seen it elsewhere. But I think there is something to it all the same.

As for the parrot: City Forever.


If I am not mistaken, the "test" period for the development of the pythagorean method was 162 games. Not partial seasons.

The contango seen here between the O's current record and predicted record *should* minimize as the rest of the season's games get played out.

And that could be because the O's run differential improves or because their record worsens. With this team, we don't know which one it will be. And I find that exciting.


Great, great post.

I agree we should value statistical analysis. Likewise, we should recognize the hubris with which it is often misused. You make a great point I've tried to make several times here, much more concisely than I: Understanding half of something can be far more dangerous than admitting you don't understand it at all.

Regarding the pythag record, I think it is, obviously, a very useful tool. It is not perfect, but its accuracy is pretty well defined and quite acceptable.

I don't think RS and RA are independent of each other, but I also think that's fairly negligible over a 162-game season. For all the extra runs you might allow because you're so far ahead in a game, you'll get some back in a game where you're so far behind. Hence, we see very few true outliers from pythag records throughout the history of the game.
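One way to probe the independence question is a small simulation. The sketch below assumes the idealized case from the derivation the OP cites: runs scored and runs allowed are drawn independently from Weibull distributions with a shared shape parameter. Under that assumption the chance of outscoring the opponent works out to a^k / (a^k + b^k), where a and b are the two scale parameters and k is the shape. The parameter values and function names are made up for illustration, and nothing here models bullpen usage or any other dependence between the two.

```python
import random


def simulate_win_fraction(scale_scored, scale_allowed, shape, games, seed=0):
    """Fraction of simulated games won when both run totals are
    independent Weibull draws with the same shape parameter."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        scored = rng.weibullvariate(scale_scored, shape)
        allowed = rng.weibullvariate(scale_allowed, shape)
        if scored > allowed:
            wins += 1
    return wins / games


def pythagorean_from_scales(scale_scored, scale_allowed, shape):
    """Closed-form win probability under the same independence assumption."""
    scored_term = scale_scored ** shape
    allowed_term = scale_allowed ** shape
    return scored_term / (scored_term + allowed_term)


# Illustrative parameters only; they are not fitted to any real team.
simulated = simulate_win_fraction(5.0, 4.5, 1.8, games=20000, seed=1)
formula = pythagorean_from_scales(5.0, 4.5, 1.8)
print(simulated, formula)
```

With many simulated games the two numbers should land close together. Setting `games=162` and varying the seed shows how far a single season can drift from the formula by luck alone, which is the baseline any bullpen-usage effect would have to beat.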

I do, however, want to make one point where I disagree with you in your OP: Pythag record makes no claim to being predictive. It simply measures what has already happened. To use it beyond that, you're going beyond its true intent.

BTW, good Greek.


1. The Pythagorean expectation for a baseball team is the best easily accessible metric that baseball fans have available to them that allows them to accurately predict the future performance of a club most of the time.

Maybe your post got better but I stopped reading at point number 1. This statement is wrong.


I don't see (too much) to argue about in this post, in the end, save for the idea that Pythag was being used predictively w/o significant caveats.

The portion that talks about bullpen management is fine - it's common knowledge that a good bullpen properly leveraged can help a team outpace its Pythag. This baffles me a bit:

If certain poorer pitchers are only used in certain situations, then the mathematical foundations of the proof are themselves flawed. I'm not sure. But if I have time I'm going to find out.

I have no reason to believe that this isn't something that all managers do. If that's the case, then the proof isn't flawed. If the answer is, well, the Orioles get blown out more, so it's exacerbated, then the response is: exactly. And teams that get blown out frequently tend to be pretty flawed teams.

Finally, to the extent that this post compares those who have used Pythag* for the heuristic that it is (heavily caveat'd and w/ limitations acknowledged) to those who bet the firm in the subprime market, it is both unfair and a bit wrong-headed. If anything, the Pythag was used as a check against inflation, to counsel against over-investment in short-term trades. Pythag has served as a hedge against those who may be over-valuing an asset (the Orioles) based on W-L.

Also, who is doing this exactly?

c. Using it as a tool to close off debate and hiding behind the handwaving of math is boring. Break out the figures, learn some statistics, and let's figure out how to enjoy the game even more.

*This seems to be the case, but frankly the post isn't entirely clear about its subject moment-to-moment.


Maybe your post got better but I stopped reading at point number 1. This statement is wrong.

I was going to say the same thing, but at a second-level I sort-of agree. If Pythag is a snapshot of true talent due to the difficulty of significant mid-season upgrades, from a probabilistic stand-point, I guess it makes some sense: most teams will probably continue to perform roughly as they have over a relatively large sample.

In retrospect, I'm revising above. This is probably as far as I'd go predictively: absent evidence of probable changes in the inputs (runs for/against), then a Pythag at 110 games simply tells you that your team is flawed, and to invest in it in the short-term with that in mind.

I largely agree with you, though. My other points were smaller-scale.





