
Estimating catcher defense (long)


KAZ97


This is a follow-up to a recent thread about estimating the defensive contributions of a catcher. I moved it over to the MLB section so it is easier to ignore for those who don’t care to get into statistical nitty gritty. My apologies in advance for the length.

It’s my opinion that the questions best answered with statistical data are very specific. Questions such as “Who is the best defensive catcher?” are too broad. If you attempt to answer a question that broad, you first have to define what is meant by “best”. That leads to a debate about which statistic captures this notion of “best”. So CERA, UZR and various framing metrics all get debated back and forth until we reach the conclusion that any given statistic is “worthless”.

I suggest stepping back: instead of trying to create a new stat that answers a broad question, we start with a narrow question and ask the data for its best answer. To that end, say we are broadly interested in making a quantitative estimate of the defensive value of two catchers. Who knows, maybe this is relevant to an arbitration decision, a free agent signing or even an in-game defensive replacement. What seems relevant is this: given a choice between the two players, which one will lead to my team giving up the fewest runs? First we must be even more specific than that. What time frame should we consider: a season? a game? an inning? an at bat? I suggest an inning. Each inning starts with none on and none out, giving us a clean slate. It is also the smallest unit that starts with a clean slate, so it gives us the most opportunities to observe each player’s performance.

If I can choose which catcher to send out there in a given inning, what is the expected difference in runs scored in that inning? I suggest that question is now narrow enough for statistics to be useful. (Those who are statistically trained may recognize the potential outcomes framework: catcher 1 and catcher 2 define the treatment levels, the units are the innings, the outcome is runs, and the treatment is assigned at the beginning of the inning.)
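In that notation (my own symbols, not from the original post), let Y_i(1) and Y_i(2) be the runs that would score in inning i with catcher 1 or catcher 2 behind the plate. The question asks for the average difference between the two across innings:

```latex
\tau \;=\; \mathbb{E}\big[\,Y_i(1) - Y_i(2)\,\big]
```

A negative value would mean fewer runs with catcher 1 catching.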

We can ask that question of the data. Naively, we might just look at the innings that catcher 1 started, average the runs scored per inning, and compare that to the same average for the innings that catcher 2 started. That would be one answer to our question, though most recognize that this answer would be wrong (in statistical terms, biased). This is because all sorts of things could be different between the innings we observe catcher 1 playing and the innings we observe catcher 2 playing. Recall how our question was phrased: “If I can choose which catcher to send out there …” Well, I can’t. Someone else made that decision (the manager in game, the GM at the beginning of the season deciding who will be on the 25-man roster, etc.). This runs into what is known as the fundamental problem of causal inference: for any given inning we only ever see the runs scored with the catcher who actually played, never what would have happened with the other one. There is no getting around it. If you perform a statistical analysis and don’t run into this problem, you haven’t framed your question specifically enough.
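To see the bias concretely, here is a small Python simulation with entirely made-up numbers: the two catchers are identical by construction, but catcher 1 is paired with the better pitching staff more often. The function names and all parameters are illustrative assumptions, not anything from the original analysis.

```python
import random


def simulate_innings(n=20000, seed=1):
    """Synthetic innings in which both catchers are equally good by construction.

    Catcher 1 is paired with the good staff more often, which is the kind of
    confounding described above. Every number here is made up.
    """
    rng = random.Random(seed)
    innings = []
    for _ in range(n):
        good_staff = rng.random() < 0.5
        # Catcher 1 catches the good staff 80% of the time, the weak staff 20%.
        p_catcher1 = 0.8 if good_staff else 0.2
        catcher = 1 if rng.random() < p_catcher1 else 2
        # Expected runs depend only on the staff, never on the catcher.
        mean_runs = 0.40 if good_staff else 0.60
        # Exponential draw as a crude continuous stand-in for runs in an inning.
        runs = rng.expovariate(1.0 / mean_runs)
        innings.append((catcher, runs))
    return innings


def naive_difference(innings):
    """Mean runs per inning with catcher 2 minus mean runs with catcher 1."""
    runs1 = [r for c, r in innings if c == 1]
    runs2 = [r for c, r in innings if c == 2]
    return sum(runs2) / len(runs2) - sum(runs1) / len(runs1)


print(round(naive_difference(simulate_innings()), 2))
```

The true catcher effect here is zero, yet in expectation the naive comparison makes catcher 2 look worse by about 0.12 runs per inning (0.8 × 0.40 + 0.2 × 0.60 = 0.44 for catcher 1 versus 0.56 for catcher 2), purely because of which staff he caught.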

There are two ways to move forward. Throw up your hands, give up and just call OldFan and ask him who is better, or make assumptions, assess those assumptions and make the best estimate we can. I prefer the latter, though I recognize the former might be more entertaining.

The relevant question now is: what else, besides the defensive abilities of our two catchers, can influence the number of runs scored in the inning? Suppose we believe the three most important factors are the talent of the pitchers in that inning, the talent of the hitters and the ball park the game is played in. There are most likely others, but if either (1) those other effects are small or (2) they are correlated with those three, then our estimate will still answer our question (or come close, after accounting for random variation). This lets us state our assumption explicitly: given we start the inning with a certain pitcher, against a certain lineup and in a certain park, are the innings we observe with catcher 1 and with catcher 2 roughly exchangeable? If that assumption holds, we can use some fancy math to come up with an answer.
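Written out in potential-outcomes notation (my own symbols: C_i is the catcher who started inning i, X_i the pitcher, lineup and park for that inning, Y_i the runs actually scored, and Y_i(1), Y_i(2) the runs that would score with catcher 1 or catcher 2), the exchangeability assumption plus an overlap condition is what lets the average catcher difference be computed from observed innings. The overlap condition says every kind of inning must have some chance of being caught by either catcher.

```latex
\begin{aligned}
&\{Y_i(1),\,Y_i(2)\} \perp\!\!\!\perp C_i \mid X_i,
\qquad 0 < \Pr(C_i = 1 \mid X_i) < 1 \\
&\Longrightarrow\quad
\mathbb{E}\big[Y_i(1) - Y_i(2)\big]
= \mathbb{E}_{X}\Big[\mathbb{E}[Y_i \mid C_i = 1, X_i] - \mathbb{E}[Y_i \mid C_i = 2, X_i]\Big]
\end{aligned}
```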

So let’s go through an example. Joe Mauer won the Gold Glove in 2010. If I could choose between sending Joe Mauer or Matt Wieters out there defensively, what is the difference in expected runs scored? To answer that question, let’s look at all the innings that Joe Mauer and Matt Wieters caught in 2010. Mauer caught 936 innings and Wieters caught 1005 innings, so that’s our dataset. Not one pitcher pitched to both of them in 2010. On average, Mauer caught the more talented staff: the average ERA of Mauer’s pitchers was 3.99 and the average ERA of Wieters’ pitchers was 4.59 (remember this is 2010; I know that is painful). On average, Wieters also faced slightly tougher lineups: the average OPS of the lineups Wieters faced was .740, and for Mauer it was .737. They both caught in the AL, so they played in roughly the same parks, except that Mauer was in Minnesota half the time and Wieters was in Baltimore. Their relative park factors were 0.973 for Mauer and 0.988 for Wieters. So all three important factors are stacking the deck against Wieters.

There is a statistical technique that allows us to summarize those three factors and estimate the likelihood of each catcher starting an inning in a given run-scoring environment. As you might expect, Wieters faced some real stinkers of situations in 2010. For example, on May 8th, Wieters had to face the Yankees at Camden Yards with Alberto Castillo on the mound. Mauer, on the other hand, never came close to a situation that bad. In fact, Mauer caught only 8 innings anywhere near that bad, and 7 of them came on August 23rd in Texas with Nick Blackburn on the mound. Recall again how our question was phrased: we want to compare sending out two different catchers in a given inning. If that inning is setting up to be a high-run-scoring inning, we would effectively be comparing Matt Wieters’ body of work with one day of Joe Mauer in Texas. Put another way, there simply isn’t data available to answer that question for high-run-scoring innings.
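The post doesn't name the technique, but estimating each catcher's probability of starting an inning from those three factors is what a propensity score does. A minimal sketch in plain Python, assuming a logistic model on standardized pitcher ERA, lineup OPS and park factor (my assumption; the original method isn't specified), followed by a common-support check that keeps only innings whose score falls inside the range where both catchers were actually observed. All function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def propensity(w, row):
    """Estimated probability that catcher 1 starts an inning with these features."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit P(catcher 1 starts the inning | X) by batch gradient ascent.

    X: list of feature rows (e.g. standardized ERA, OPS, park factor).
    y: 1 if catcher 1 started the inning, 0 if catcher 2 did.
    Returns weights with the intercept first.
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for row, target in zip(X, y):
            err = target - propensity(w, row)
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        # Step along the average gradient of the log-likelihood.
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def common_support(scores, treated):
    """Indices of innings whose score lies where both catchers were observed."""
    s1 = [s for s, t in zip(scores, treated) if t == 1]
    s0 = [s for s, t in zip(scores, treated) if t == 0]
    lo, hi = max(min(s1), min(s0)), min(max(s1), max(s0))
    return [i for i, s in enumerate(scores) if lo <= s <= hi]
```

Innings outside that range, like the Wieters innings with no comparable Mauer inning, get dropped, and the resulting estimate only speaks to the innings that remain.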

How well did our model do in matching up comparable innings? For one example, our model would predict roughly the same runs scored in the eighth inning of an April 20th game in Minnesota against Cleveland with Kevin Slowey pitching as in the 3rd inning of a game in Kansas City with Brian Matusz pitching. Note that assessing the balance of all 1900+ innings is not a matter of statistics; it’s a matter of baseball judgment. That’s why I never understood the dichotomy between “stats” and “scouts”. Every proper statistical analysis relies on baseball judgment, traditionally the domain of the “scout” crowd. And if you don’t believe that a scout’s 70 grade on arm strength or an opinion of a plus-plus pitch is a statistic, then you have too narrow a definition of what a statistic is. /minor digression over.

What’s the answer to our question? In comparable innings, we would expect 0.01 fewer runs per inning with Joe Mauer catching than with Matt Wieters. We can translate that into a number that is perhaps more familiar to baseball fans: since ERA counts runs per nine innings, 0.01 runs per inning works out to about 0.09, or roughly 0.1, runs per nine. That is the difference we would expect between two pitchers whose ERAs differ by about 0.1, say 4.15 and 4.05.
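As a one-line check in Python (the function name is my own), the conversion just multiplies by the nine innings that ERA is scaled to:

```python
NINE_INNINGS = 9  # ERA is runs allowed per nine innings


def per_inning_to_era_scale(run_diff_per_inning):
    """Express a per-inning run difference on the ERA (runs per nine innings) scale."""
    return run_diff_per_inning * NINE_INNINGS


print(round(per_inning_to_era_scale(0.01), 2))  # 0.09, i.e. roughly 0.1
```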

That answer, or more precisely that estimate, is the best guess from the available data. But many other estimates are also consistent with the data, including absolutely no difference in the defensive abilities of Joe Mauer and Matt Wieters in 2010. Put another way, the estimate comes with a lot of uncertainty (a wide confidence interval). If I were forced to write an executive summary of the defensive difference between Mauer and Wieters in 2010, I’d say they were roughly equivalent, with some evidence that Mauer was slightly better. It will be interesting to ask a similar question comparing Wieters to the 2011 Gold Glove winner, if indeed he doesn’t win it himself.
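One standard way to put a number on that uncertainty is a bootstrap confidence interval. A minimal sketch in Python; to keep it short it resamples a raw difference in mean runs per inning, whereas for the adjusted estimate you would refit the whole model on each resample. The function name and defaults are my own.

```python
import random


def bootstrap_diff_ci(runs_a, runs_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(runs_b) - mean(runs_a).

    Innings are resampled with replacement within each catcher's group.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        a = rng.choices(runs_a, k=len(runs_a))
        b = rng.choices(runs_b, k=len(runs_b))
        diffs.append(sum(b) / len(b) - sum(a) / len(a))
    diffs.sort()
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles.
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

If the interval includes zero, the data are consistent with no difference between the catchers.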

Thanks for indulging me. In the earlier thread that started all this, Drungo commented that if one could correct the flaws in CERA, that might be of interest to an MLB team. I’m not sure I’ve done that, but the above is how I would approach estimating the defensive impact of a catcher. On the off chance that Drungo is correct and someone from the Orioles reads this, I’ll give the hometown team first dibs. Orioles’ new GM: feel free to contact me, I’m season ticket holder #259670.


I don't see how you can avoid circularity if you are going to use the ERA of the pitchers in your analysis. That is because the ERA of a team's pitchers depends in some significant part on the defensive skills of its catchers, probably most of all on the pitches they call.



Why is a difference of 0.01 runs for a catcher equivalent to a 0.10 difference in ERA between two pitchers? Why doesn't 0.01 = 0.01?


Oh, duh (headslap).

Instead of estimating the park factors the way you did with OPS, you might want to consider the pitchers' ERA+ and work backwards from that. It should give you a better idea of the park/talent mix the pitching staff faced than your method does. If not, then at least use OPS+ and pro-rate the competition levels based on games played. That should be fairly easy to do and should narrow your margin of error.


Okay, look, I don't know how else to say this, and I'm sure you put a lot of effort into the OP, but it makes no sense. I've read it three times.

Now, it's possible that I'm missing something. But you show no work. I have no idea how you got your final figure of a difference of .10 ERA. I still have no idea what your methodology is. If you have something new, I'm intrigued. But right now it looks like you just threw a bunch of concepts and words into a post and randomly threw out a number.


A few comments. CA-Oriole, I did not use OPS to estimate park factors. I let ESPN estimate the park factors and used their published 2010 park factors. Sorry if I wasn't clear about that.

I've been playing around with this some. I was never comfortable with how earned runs are calculated. It seems to give a free pass a lot of the time after an error is committed, as if the pitcher and defense don't have any responsibilities after a two-out error. So I changed the above analysis to look at total runs scored in the inning while accounting for errors made. That means, for example, estimating the total number of runs scored given that two errors were committed in the inning. Granted, the catcher can make some of those errors and should be held accountable for them, so this method will basically estimate his defensive contribution independent of the errors he commits (that is, accounting for all the other things like pitch calling, framing, the running game, etc.).
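The post doesn't say exactly how errors entered the model. As an illustration only (not necessarily what was done), one simple way to compare innings with the same number of errors is to stratify on the error count and average the within-stratum differences, weighting each stratum by its number of innings. The tuple format and function name are hypothetical, and this holds only errors fixed; pitcher, lineup and park would still need adjusting for.

```python
from collections import defaultdict


def error_stratified_difference(innings):
    """Average within-stratum difference in runs, strata = errors in the inning.

    innings: iterable of (catcher, errors, runs) with catcher in {1, 2}.
    Strata seen with only one catcher are skipped (no comparison possible).
    Returns mean runs(catcher 2) - mean runs(catcher 1), weighted by stratum
    size, or None if no stratum contains both catchers.
    """
    strata = defaultdict(lambda: {1: [], 2: []})
    for catcher, errors, runs in innings:
        strata[errors][catcher].append(runs)

    total_weight = 0
    weighted = 0.0
    for groups in strata.values():
        r1, r2 = groups[1], groups[2]
        if not r1 or not r2:
            continue
        weight = len(r1) + len(r2)
        weighted += weight * (sum(r2) / len(r2) - sum(r1) / len(r1))
        total_weight += weight
    if total_weight == 0:
        return None
    return weighted / total_weight
```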

Which brings up a bigger point that was raised by Erstwhile (kudos to him for identifying the problem). If the catcher potentially influences the outcome of an at-bat, then using the aggregated results of those at-bats to estimate a pitcher's talent is flawed. To generalize his point, not only the pitcher's ERA would be influenced, but any measure based on results. For example, if the catcher's ability to frame pitches is important, then it's likely to influence strikeouts and walks. That would affect not only K/BB ratio or ERA, traditionally "pitcher stats"; if the catcher can influence strikeouts and walks, he also has an effect on OBP, traditionally thought of as a batter stat.

At the end of the day, we need a measure of the pitcher's talent and the batter's talent that is not based on the outcomes of at-bats and thus not influenced by the catcher. Ideally, that might be a scout's opinion of their talent at that moment. Since I don't have access to teams' reports from advance scouts, I used what I did have access to, flawed as it may be. In the example I gave above comparing Wieters to Mauer, that probably won't be a big deal for the hitters, because the majority of their at-bats were potentially influenced by other catchers. But with pitchers it's a big deal, especially when a pitcher such as Jeremy Guthrie throws the vast majority of his innings to only one or two catchers.
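One partial workaround, sketched here as my own illustration with a hypothetical function name, record format and a made-up 20-inning cutoff: estimate each pitcher's talent only from innings caught by someone other than the catchers being compared. That removes the direct circularity, but as the Guthrie example suggests, it leaves little or no data for pitchers who work mostly with one catcher, and the remaining innings still reflect the other catchers' influence.

```python
from collections import defaultdict


def era_excluding_catcher(innings, excluded_catchers, min_innings=20.0):
    """Pitcher ERA computed only from innings caught by other catchers.

    innings: iterable of (pitcher, catcher, innings_pitched, earned_runs),
    with innings_pitched as a decimal (6 1/3 innings = 6.333...).
    Returns {pitcher: ERA or None}; None means too few innings with other
    catchers to say anything.
    """
    ip = defaultdict(float)
    er = defaultdict(float)
    pitchers = set()
    for pitcher, catcher, innings_pitched, earned_runs in innings:
        pitchers.add(pitcher)
        if catcher in excluded_catchers:
            continue
        ip[pitcher] += innings_pitched
        er[pitcher] += earned_runs
    return {
        p: (9.0 * er[p] / ip[p] if ip[p] >= min_innings else None)
        for p in pitchers
    }
```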


I'll try to "show my work".

I started out with all the 2010 data (that's 191,835 plays) and kept only the plays where either Wieters or Mauer was the catcher. For each play, I looked at what park the game was played in and added the park factor from ESPN. I also looked up the pitcher and the batter for each play and merged in the pitcher's ERA and the batter's OPS. Those plays occurred in 1932 innings, so I created a spreadsheet with 1932 rows. Each row had columns indicating whether Wieters or Mauer was the catcher, the average of the pitchers' ERAs (weighted by how many plays in the inning each accounted for), the weighted average of the hitters' OPS, the park factor and the number of runs scored in the inning.
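A minimal sketch of the spreadsheet-building step in Python. The post doesn't say what data source or tools were used, so the play-record format and key names below are hypothetical. Grouping on the half-inning (not just the inning number) matters because in a Twins-Orioles game both catchers could appear in the same inning.

```python
from collections import defaultdict


def build_inning_rows(plays):
    """Collapse play-level records into one row per half-inning.

    plays: iterable of dicts with the (hypothetical) keys game_id, inning,
    half, catcher, pitcher_era, batter_ops, park_factor and runs, where runs
    is the number of runs that scored on that play.
    ERA and OPS are averaged over the plays in the half-inning, so a pitcher
    or hitter involved in more plays gets more weight. Assumes one catcher
    per half-inning; mid-inning catcher changes are not handled.
    """
    grouped = defaultdict(list)
    for play in plays:
        grouped[(play["game_id"], play["inning"], play["half"])].append(play)

    rows = []
    for (game_id, inning, half), group in grouped.items():
        n = len(group)
        rows.append({
            "game_id": game_id,
            "inning": inning,
            "half": half,
            "catcher": group[0]["catcher"],
            "pitcher_era": sum(p["pitcher_era"] for p in group) / n,
            "batter_ops": sum(p["batter_ops"] for p in group) / n,
            "park_factor": group[0]["park_factor"],
            "runs": sum(p["runs"] for p in group),
        })
    return rows
```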

For example, take the 3rd inning of the April 5th game between the Twins and the Angels. The game was played in Anaheim, and the park factor there for 2010 was 0.836. Scott Baker was on the mound for the Twins, and he had an ERA of 4.49 in 2010. The Angels sent three men to the plate (Torii Hunter, Hideki Matsui and Kendrys Morales), and they averaged an OPS of 0.824.

In essence, a regression model matches up all the innings with similar pitcher ERA, similar park factors and similar batter OPS (strictly, it fits runs as a function of those factors plus an indicator for the catcher, rather than literally pairing innings). The assumption is that if those things are the same, two equally good defensive catchers would give up the same expected number of runs. I then compared how many runs were scored when Wieters was the catcher and when Mauer was the catcher. Averaged across all similar-looking innings, 0.01 fewer runs per inning were scored when Mauer was the catcher than when Wieters was.
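A minimal sketch of such a model, assuming ordinary least squares with one linear term per factor (the post doesn't give the exact specification; a Poisson model for run counts would be an equally reasonable choice). It uses a hypothetical row format with keys catcher, pitcher_era, batter_ops, park_factor and runs, and solves the normal equations in plain Python so there are no dependencies. The coefficient on the Mauer indicator is the adjusted difference in runs per inning.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X: list of rows, each starting with 1.0 for the intercept.
    Solved with Gauss-Jordan elimination and partial pivoting; fine for a
    handful of columns, not meant for ill-conditioned problems.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def catcher_effect(rows):
    """Coefficient on the Mauer indicator, adjusting for ERA, OPS and park.

    Negative means fewer runs per inning with Mauer catching.
    """
    X = [[1.0,
          1.0 if r["catcher"] == "Mauer" else 0.0,
          r["pitcher_era"], r["batter_ops"], r["park_factor"]]
         for r in rows]
    y = [r["runs"] for r in rows]
    return ols(X, y)[1]
```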

I hope that helps.



That's actually very interesting. BTW, I assume the result you bolded (0.01 fewer runs with Mauer catching) is backwards, and that Wieters was 0.01 better at preventing runs, not Mauer.

Did you test for statistical significance? My gut reaction is that 1932 innings is actually a very small sample for this analysis.

The problem of circularity (i.e. pitcher ERA both affecting and being affected by your results) is a really hard one to overcome. It's the #1 stumbling block to isolating the effect of a catcher on a pitching staff. I think your analysis is way more legit than I initially thought, but I don't see that it does much, if anything, to address this.

Finally, on a "smell test" level, a difference of 0.01 runs saved per inning between two highly regarded defensive catchers works out to about 11 runs over a full season behind the plate (roughly 1,100 innings), which at about 10 runs per win is about 1.1 wins. That seems high but reasonable.
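The smell-test arithmetic as a small Python sketch. The roughly 1,100 innings caught is what the 11-run figure implies, and 10 runs per win is the usual rule of thumb implied by 11 runs being about 1.1 wins; both are assumptions rather than measured values, and the function name is my own.

```python
def season_impact(run_diff_per_inning, innings_caught, runs_per_win=10.0):
    """Scale a per-inning run difference to runs and wins over a season.

    runs_per_win=10 is the common rule of thumb, an assumption here.
    """
    runs = run_diff_per_inning * innings_caught
    return runs, runs / runs_per_win


runs, wins = season_impact(0.01, 1100)
print(f"{runs:.1f} runs, {wins:.2f} wins")  # 11.0 runs, 1.10 wins
```

With Mauer's actual 936 innings in 2010, the same arithmetic gives about 9.4 runs, just under one win.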

