
OBP....and its importance


Sports Guy


rshack,

Interesting philosophical question raised by this discussion. Does causation trump correlation? Your stance suggests it does. I'm not so sure. I think maybe demonstrated causation is just a more tangible form of correlation. If you have mathematical proof that OBP is 1.8 times as important as SLG in contributing to runs, you have a causal relationship. And yet the correlation to runs is still equal. You can say the correlation is simply coincidental, or circumstantial, like the correlation with 5-10 in height. I would say that if, say, an increase in batters' height correlated strongly with runs produced, I would take seriously the recruitment of taller players even if I could not yet discover a causal relationship. Correlation might come about because of hidden factors of causation.

The analogy of walking and driving miles is an elegant way of illustrating your point. It also shows how we are talking about different issues here. One issue is, what are the most accurate weights of OBP and SLG in a formula that approximates RC?

- one answer (1.8:1) yields what is called OPS'

- one answer (1:1) yields what is called OPS

- yet, I believe (correct me if I'm wrong) that OPS' and OPS are actually closer to 1:1 when compared with each other, than 1.8 to 1.

The analogy of the miles breaks down when you consider the correlation chart as a whole, where different hitting metrics do not all share the same correlation with runs produced. Clearly homers contribute more to runs than walks - and this is reflected by higher correlation. Clearly SLG contributes more than homers alone - and this also is shown by higher correlation. Clearly OBP+SLG contributes, as well as correlates, more strongly to run production than either of those measures alone. The whole point of the chart is to show, by strength of correlation, which factors contribute most to run production. (And note, this is total runs scored, not the RC formula).

Correlation is not about how much one thing contributes to the other thing. It is about whether the relationship between them, whatever that relationship is, is loose enough to be just chance (an illusion) or tight enough to be something that really exists.

The weight-factor about OBP vs. SLG is a measure of the value-relationship. It's a statement about how much value they each have for production.

The correlation is not about that. It's about how much confidence we have that the relationship is real and how tightly coupled the two things are, whatever the relationship between them might be and regardless of what the value-factors might be. It is about how much the relationship exists and how tight it is, not about what the details of the value-relationship are.
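To put numbers on that distinction, here is a minimal Python sketch. The data is invented (not real team stats), and `pearson` and `slope` are just illustrative helper names. Stat B is stat A rescaled by 1.8, like measuring the same trip in different units, so both end up with the identical correlation with runs even though their run-value per point differs by a factor of 1.8.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def slope(xs, ys):
    """Least-squares slope of ys on xs: how many runs one unit of x is worth."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical numbers for six team-seasons (invented for illustration).
stat_a = [0.320, 0.325, 0.330, 0.335, 0.340, 0.345]
runs = [690, 720, 705, 760, 790, 775]

# Stat B is stat A stretched by 1.8 and shifted: same information, different scale.
stat_b = [1.8 * a + 0.05 for a in stat_a]

r_a = pearson(stat_a, runs)
r_b = pearson(stat_b, runs)
slope_a = slope(stat_a, runs)
slope_b = slope(stat_b, runs)

print(f"correlation with runs: A={r_a:.3f}  B={r_b:.3f}")
print(f"runs per unit:         A={slope_a:.0f}  B={slope_b:.0f}")
print(f"ratio of run values A/B: {slope_a / slope_b:.2f}")
```

The correlations match while the slopes do not, which is the sense in which a bar chart of correlations cannot tell you anything about the weights.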


a slow day on the farm here... ;)

Obviously high correlation (pirates and global temps.) does not always mean a relationship is real. Some relationships are merely coincidental; some are meaningful. I would suggest that in the realm of batting metrics, they are meaningful. After all, the OPS formula itself became popular not simply because of the ease of calculation, but because of its high correlation with RC.


I don't have time to look into it right now, but my first thoughts are that it's either wrong or being misinterpreted.

I believe it is being misinterpreted.

I am not even sure how the author of the article broke down SLG% into team runs.

Can someone explain to me how he did that?

Just looking at last year. The team's SLG% and R/G weren't anywhere near correlated.


If you're talking about the graph of the correlation between various stats and runs scored, it's not about SLG being broken down into how many runs it produces. It's about taking the SLG percentages of the different teams from 2000-2004 and putting them on one axis, putting the Runs Scored by those teams on the other axis, plotting the points, then finding the equation (a straight line here) that best fits the data. The correlation coefficient is pretty much a score, from -1 to 1, of how closely the data fits that line.

However, just because something has a high correlation, that doesn't mean there's a cause and effect. For example, a high SLG doesn't necessarily PRODUCE the runs scored, but a team that produces a lot of runs probably DOES have a lot of high SLG guys.

Math stuffs: http://en.wikipedia.org/wiki/Correlation_and_dependence
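For anyone who wants to see the mechanics, here is a rough Python sketch of that procedure: put SLG on one axis and runs on the other, find the best-fit straight line, and compute the correlation coefficient. The seven data points are invented for illustration; they are not the article's 2000-2004 data, and the variable names are my own.

```python
import math

# Hypothetical team-season data (invented, not the article's data set).
team_slg = [0.390, 0.405, 0.418, 0.430, 0.441, 0.455, 0.470]
team_runs = [650, 700, 735, 760, 800, 845, 890]

n = len(team_slg)
mean_x = sum(team_slg) / n
mean_y = sum(team_runs) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in team_slg)
syy = sum((y - mean_y) ** 2 for y in team_runs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(team_slg, team_runs))

# Least-squares line: runs ~ intercept + slope * SLG
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Pearson correlation coefficient: how tightly the points hug that line.
r = sxy / math.sqrt(sxx * syy)

print(f"best-fit line: runs = {intercept:.0f} + {slope:.0f} * SLG")
print(f"correlation coefficient r = {r:.3f}")
print(f"r squared (share of run variation the line accounts for) = {r * r:.3f}")
```

Note that the slope and the correlation are two separate outputs: the slope says how many runs a point of SLG is worth in this made-up sample, and r only says how well the line fits.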


It's being misinterpreted. When I looked at his graph, I misinterpreted it too. Then I had to read what he did to figure out what's what.

The guy took 5 years of team-seasons and looked at O-stats for each team as a whole. So, that's 5*30=150 data samples. He did not look at hitters, just at summary team data. So, whatever you might wanna know about hitters based on 5 years of data is not coming out of what he did, because he lumped each team's hitters into one pile of team-numbers, which makes all the hitter-stuff disappear into lump-sum data per team-season. The whole thing is about teams as a whole, not about what individual hitters did.

He then told Excel (or something) to figure out what the correlation was between team-runs and umpteen flavors of team-O-stats. He found that the correlation between team-OBP and team-runs was just about the same as between team-SLG and team-runs. Both correlations were stronger than the one for team-HR and team-runs, and for team-AVG and team-runs, and neither was as strong as the correlation between team-OPS and team-runs. But none of that is what he made a big deal about. He seemed to think it was a big deal that the correlation between team-OPS and team-runs was almost as good as the correlation between team-runs and fancier-team-stats. Not sure why that's a big deal, but then I don't care much about summary team-stats. (It's fine with me if other folks do, but I don't.)

The problem is that, along the way, he made a dang graph that shows how well each of the umpteen flavors of team-stats correlates with team-runs. That thing shows almost identical correlations between team-runs and both OBP and SLG. Because of that, some folks are getting the wrong idea, as if that means that OBP and SLG have the same value in production, as if the 1.8 factor (or whatever) between OBP and SLG just went poof into thin air, when it doesn't mean that at all. All it means is that the correlation is the same for those two things for teams as a whole, not that their production value is the same. It says that team-OBP and team-SLG each have about as much of a real relationship with team-runs as the other one does, but it makes no statement of any kind about exactly what that relationship actually is for OBP vs. SLG (which is the 1.8 part).

I imagine that people are gonna be pulling up that dang graph for years to come, and erroneously claiming that it shows OBP and SLG to have the same production value. I wish the guy had not made the stupid graph, because then none of this would have happened. Had he just said it in the article, nobody would notice, but since he made a picture, people look at the picture and get the wrong idea. The guy should have realized this would happen and put in a disclaimer about what the dang graph was *not* saying, but he didn't. Either that or they should make people pass a test before they can read the Hardball Times, because this is the kind of stat-distinction that almost everybody is not gonna get right away. What makes it worse is that he called the article "Run Estimation for the Masses", as if he's being clever, but what he's mostly doing is confusing everybody for no good reason.


My question is how he is getting his data. Is he taking a SLG%, multiplying it by some number to get how many runs the team should have scored, and then comparing that number to the actual runs they scored?

(example: NYY scored 915 runs and had a .478 SLG. So is it like 1900 x SLG = expected team runs?)

Or is he just comparing the rankings in SLG to the rankings in Runs scored?


OK, thank you very much. I should have read your post before I posted again.


If you're talking about the graph of the correlation between various stats and runs scored, it's not about SLG being broken down into how many runs it produces. It's about taking the SLG percentages of the different teams from 2000-2004 and putting them on one axis, putting the Runs Scored by those teams on the other axis, plotting the points, then finding the equation (a straight line here) that best fits the data. The correlation coefficient is pretty much a score, from -1 to 1, of how closely the data fits that line.

Right, and that's true regardless of what the details of the equation might be. The proper weighting for value might be a multiplicative factor bigger than 1 for OBP compared to SLG. They can each have different factors about how much they contribute to runs. Correlation doesn't know or care about any of that; correlation just cares about how right the equation is, whatever it is. How right it is can be seen visually by looking at the graph of the plotted points and seeing how near all the plotted points are to forming a straight line in the graph. The correlation coefficient is a numerical representation of how close to a straight line the plotted points are as a group. If you drew a line connecting each of the plotted points, and if that line formed a perfectly straight (rising) line, then it would be a perfect correlation of 1.0 for a linear equation. When we see a correlation coefficient of .9, it means the group of plotted points stays pretty close to an imaginary straight line, but is not exactly on it. The lower the correlation coefficient, the more the plotted points are scattered away from forming a straight line.
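A quick Python sketch of the "how close to a straight line" idea, using made-up points rather than any baseball data. The same underlying line (y = 2x + 1, chosen arbitrarily) is used three times, with the points pushed off it by bigger and bigger amounts; `pearson` and `scattered_line` are illustrative names.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = list(range(10))
wobble = [1, -1] * 5  # alternating up/down offsets, one per point

def scattered_line(amplitude):
    """Points on y = 2x + 1, each pushed off the line by +/- amplitude."""
    return [2 * x + 1 + amplitude * w for x, w in zip(xs, wobble)]

r_exact = pearson(xs, scattered_line(0))  # points sit exactly on the line
r_tight = pearson(xs, scattered_line(1))  # small scatter around the line
r_loose = pearson(xs, scattered_line(5))  # large scatter around the line

print(f"no scatter:    r = {r_exact:.3f}")
print(f"small scatter: r = {r_tight:.3f}")
print(f"large scatter: r = {r_loose:.3f}")
```

The line itself never changes across the three cases; only the scatter does, and that is the only thing the coefficient responds to.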

In the guy's article, he shows two different scatter graphs that show the plotted points and an imaginary straight line. One is for OPS, the other is for XR. Since his article is not about the OBP/SLG thing, he did not show the two scatter graphs for them. If he had, judging from the correlation numbers in his bar graph, those two scatter graphs would be similar to the ones we see, except that they would have slightly more scatter away from forming a straight line. While the divergence of the plotted points from the straight line would be similar in the two graphs, I'm guessing the scale on the X-axis, and probably the angle of the straight line, would differ between them.


Wikipedia's page also shows an example of 4 different graphs with the same correlation (0.816), but they have vastly different plot points. This would illustrate rshack's point about how you can't just look at the correlation coefficient without also looking at the graph for a point of reference.

http://upload.wikimedia.org/wikipedia/commons/b/b6/Anscombe.svg

As just a side note, the 2nd graph would get a much better fit from using a quadratic (parabolic) equation, if I'm remembering my maths stuff right.
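Here is a small Python check of both claims. The numbers are the Anscombe quartet values as I have them from the standard published table, so treat them as an assumption and compare against the Wikipedia page if it matters. The script computes the correlation for all four sets, then looks at the second set's "second differences": for evenly spaced x values, a parabola has constant second differences.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Anscombe's quartet (sets 1-3 share the same x values).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

correlations = [
    pearson(x123, y1),
    pearson(x123, y2),
    pearson(x123, y3),
    pearson(x4, y4),
]
print("correlations:", [round(r, 3) for r in correlations])

# Set 2: sort by x (x runs 4..14 in steps of 1), then difference twice.
ys_sorted = [y for _, y in sorted(zip(x123, y2))]
first_diffs = [b - a for a, b in zip(ys_sorted, ys_sorted[1:])]
second_diffs = [b - a for a, b in zip(first_diffs, first_diffs[1:])]
print("second differences of set 2:", [round(d, 2) for d in second_diffs])
```

Nearly constant second differences mean set 2 lies almost exactly on a parabola, even though its straight-line correlation is no better than the other three.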


Right. When I referred to forming a straight line, I was assuming we're talking about just linear equations. Whether it's linear or curved is something you can see visually from looking at the shape of the line that the scatter-plot (almost) forms...

ps: If the guy had not made his dang bar graph, we wouldn't have to worry about this kind of junk ;-)


Honestly, I think both are true. Hitters who are more feared usually will draw a lot of walks, but some guys just draw a lot of walks because they are disciplined.

I'm betting that the non-power hitters who draw lots of walks have very quick bats. You have to be a very good two-strike hitter to have the chops to take enough pitches to walk ... that is, assuming that, since you're not a big power threat, pitchers aren't working off the plate as obviously as they do to the Pujolses of the world.


- yet, I believe (correct me if I'm wrong) that OPS' and OPS are actually closer to 1:1 when compared with each other, than 1.8 to 1.

... and this, presumably, is because there is a very high correlation between OBP and SLG. These are not independent measures. For one thing, BA features prominently in both. For another, as many posters have pointed out in this thread, the best (power) hitters (i.e., high SLG) are also often intentionally walked and pitched around, hence more walks.
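A hedged Python illustration of that point, with invented OBP/SLG pairs for eight hypothetical hitters (not real players). It takes OPS as OBP + SLG and OPS' as 1.8 * OBP + SLG, following the 1:1 and 1.8:1 weightings discussed above, and checks how closely the two track each other when OBP and SLG move together.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented OBP/SLG pairs: they rise together, but not in lockstep.
obp = [0.310, 0.320, 0.325, 0.335, 0.340, 0.350, 0.360, 0.370]
slg = [0.380, 0.410, 0.395, 0.430, 0.450, 0.440, 0.480, 0.500]

ops = [o + s for o, s in zip(obp, slg)]              # 1:1 weighting
ops_prime = [1.8 * o + s for o, s in zip(obp, slg)]  # 1.8:1 weighting

r_obp_slg = pearson(obp, slg)
r_ops_pair = pearson(ops, ops_prime)

print(f"correlation of OBP with SLG:  {r_obp_slg:.3f}")
print(f"correlation of OPS with OPS': {r_ops_pair:.4f}")
```

With inputs this closely linked, changing the weight on OBP barely reshuffles the ordering, so the two versions rank hitters almost identically in this toy sample.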


This also raises the question of time-series vs. cross-sectional. Cross-sectional correlations would go much further toward suggesting causality than time series, in this case. If one year SLG and RS were both low, and the next year MLB introduced a new juiced ball, almost all teams would show a higher SLG and RS the second year. Each team's time-series correlation between SLG and RS would be astronomically high, but that wouldn't mean squat about whether or not SLG affects RS. However, a cross-sectional correlation, or better yet a series of cross-sectional correlations, would go much further toward suggesting causality.
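The juiced-ball scenario can be mocked up in a few lines of Python. Everything here is invented and deliberately extreme: four hypothetical teams, four seasons, a league-wide bump in both SLG and runs every year, and team-to-team differences rigged so that within any one season SLG and runs are unrelated.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

years = [0, 1, 2, 3]

# Team-specific offsets, chosen so they are uncorrelated with each other.
slg_offset = [-0.010, -0.005, 0.005, 0.010]
runs_offset = [10, -20, 20, -10]

def season(year):
    """League-wide SLG and runs both rise each year (the 'juiced ball')."""
    slg = [0.400 + 0.020 * year + d for d in slg_offset]
    runs = [700 + 50 * year + d for d in runs_offset]
    return slg, runs

# Time-series view: follow one team across the four seasons.
team = 0
team_slg = [season(y)[0][team] for y in years]
team_runs = [season(y)[1][team] for y in years]
r_time_series = pearson(team_slg, team_runs)

# Cross-sectional view: compare all teams within each single season.
r_cross_sections = [pearson(*season(y)) for y in years]

print(f"one team across seasons:   r = {r_time_series:.3f}")
print("all teams within a season:", [round(r, 3) for r in r_cross_sections])
```

The single-team series comes out perfectly correlated purely because of the league-wide trend, while every within-season comparison shows no relationship at all, which is the trap described above.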


Archived

This topic is now archived and is closed to further replies.

