
Why trade Roberts?


turtlebowl


Don't know what post you were reading.

What he said was, "So your incorrect intuition costs the team between 15 and 26 runs a year."

And that was based on the cute little Baseball Musings tool.

And the best way to prove someone's tools are inadequate to the task is to call them "cute" in a mocking, playground bully kind of way.

Maybe you could provide some insight into why you think simulation tools aren't (or maybe this particular simulation tool isn't) accurate to the degree davearm suggests. Do you have some metrics that show its failings or inaccuracies in the past?

Fortunately we don't just have to take your speculation as truth. We've got a neat lineup analyzer tool from Baseball Musings to rely on to see if Roberts leading off is worth having Soriano's numbers suffer.

Here's Lineup #1, with Soriano leading off and putting up his career OBP and SLG numbers from the leadoff spot (.341 and .551), and Roberts hitting second. According to the model, this lineup yields 5.196 runs per game, or 842 in a 162-game season.

Here's Lineup #2, with Roberts leading off, and Soriano in the #5 hole, producing at his career rates of .312 OBP and .513 SLG. This one yields 5.104 runs per game, or 827 total.

And finally here's Lineup #3, with Soriano in the 3 hole at .310 and .452. This one comes in at 5.035 runs per game, or 816 total.

So your incorrect intuition costs the team between 15 and 26 runs a year.
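The season totals and the 15-to-26-run gap follow directly from the per-game figures quoted above. A quick sketch of that arithmetic, assuming plain rounding to whole runs (the variable names are only for illustration, and nothing here re-runs the lineup tool itself):

```python
# Per-game run estimates as quoted in the post (output of the lineup tool).
GAMES_PER_SEASON = 162
runs_per_game = {
    "Soriano 1st, Roberts 2nd": 5.196,
    "Roberts 1st, Soriano 5th": 5.104,
    "Soriano 3rd": 5.035,
}

# Scale each per-game figure to a full season and round to whole runs.
season_runs = {name: round(rpg * GAMES_PER_SEASON) for name, rpg in runs_per_game.items()}

# Gap between the Soriano-leadoff lineup and each alternative.
baseline = season_runs["Soriano 1st, Roberts 2nd"]
gaps = [baseline - total for name, total in season_runs.items()
        if name != "Soriano 1st, Roberts 2nd"]

print(season_runs)
print(gaps)
```

Note how small the per-game differences are (less than 0.2 runs) before they are multiplied out over 162 games.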

Let me say first it is incredibly dubious to plug Soriano into three different lineup simulators, give him drastically better numbers in the 1 hole, and then claim that it's obvious the lineup with him hitting first is optimal. Dare I say, the lineup with him hitting best is optimal with all other things considered equal.

Secondly, if you scroll down the page you linked I think you'll find the best lineups for the Cubs in descending order of expected runs scored. Guess how many of those lineups Soriano leads off in? None. Just thought I'd point that out.

And since I know where this conversation is going I'll prematurely respond to your next post. You say, "Well, yes, but Soriano is a much better hitter leading off. He's said so himself; he feels more comfortable there."

To which I respond, "He's had more than 6 times as many ABs in the 1 hole than anywhere else. His increased production is much more a case of a good player (notice I didn't say excellent no matter how much $ the Cubs want to throw at him) performing at his natural level, and the decreased production at other spots much more a case of a decreased sample size.

And oh, by the way, he was pretty outspoken about his desire to stay at 2nd base too wasn't he? Yeah, but he was forced to move and that worked out pretty damn well for him in the end. Your angst about how he'll hit lower in the order is overblown."


And the best way to prove someone's tools are inadequate to the task is to call them "cute" in a mocking, playground bully kind of way.

Maybe you could provide some insight into why you think simulation tools aren't (or maybe this particular simulation tool isn't) accurate to the degree davearm suggests. Do you have some metrics that show its failings or inaccuracies in the past?

So now someone needs statistics to question the validity of certain statistics?:rolleyes: That's pretty funny.


Maybe you could provide some insight into why you think simulation tools aren't (or maybe this particular simulation tool isn't) accurate to the degree davearm suggests. Do you have some metrics that show its failings or inaccuracies in the past?

OK. (We've done this before, but that's OK, I can't remember what everybody says either.)

It's a naively simple calc-based tool that is based on two completely untenable assumptions:

  • That a baseball game is simply a series of discrete events (in this case, AB's) in which prior performance is a precise determinant of future performance, and
  • That you can adequately model different lineups by simply taking each player's stats from whatever past-situations, and simply daisy-chain them together in different sequences and get a valid result, as if there are no interdependencies involved.

Both of these things are just arbitrary assumptions with zero empirical evidence to support them. In contrast, a large body of work in trying to construct simulations of other real-world events shows that both of these assumptions are naive and not at all viable. Doing adequate simulations of complex discrete events is hard enough, but baseball is more than just a collection of discrete events. Baseball is not even close to being a deterministic system. Absent a precise statistical model of the game, this is precisely the kind of problem that you can't solve with computers.

Don't misunderstand me, I don't in any way criticize the good people who created the tool (assuming they did it competently). They were just doing what they could, which is to take a boatload of individual performance data and juggle them around. That's pretty much all they can do. Without extensive empirical data about lineup effects to go by, and without validated statistical models of the game of baseball, there's little else they could do.

The problem is when people mistake it for something it's not. It's a fine toy for people to play around with for fun. Nothing wrong with that. But that's all it is. Pretending it's something else is the problem. To say it can tell you how many runs you get in an actual season of baseball is ridiculous, much like fantasy baseball is ridiculous as a simulation of real baseball. There's a huge difference between pleasant little computer games and The Game.


So now someone needs statistics to question the validity of certain statistics?:rolleyes: That's pretty funny.

It's actually way worse than that. It's that you need statistics about lineup effects from actual reality to challenge the conclusions of a toy simulation that is 100% based on a *dearth* of statistics about actual lineup effects.


Let me say first it is incredibly dubious to plug Soriano into three different lineup simulators, give him drastically better numbers in the 1 hole, and then claim that it's obvious the lineup with him hitting first is optimal. Dare I say, the lineup with him hitting best is optimal with all other things considered equal.

Secondly, if you scroll down the page you linked I think you'll find the best lineups for the Cubs in descending order of expected runs scored. Guess how many of those lineups Soriano leads off in? None. Just thought I'd point that out.

And since I know where this conversation is going I'll prematurely respond to your next post. You say, "Well, yes, but Soriano is a much better hitter leading off. He's said so himself; he feels more comfortable there."

To which I respond, "He's had more than 6 times as many ABs in the 1 hole than anywhere else. His increased production is much more a case of a good player (notice I didn't say excellent no matter how much $ the Cubs want to throw at him) performing at his natural level, and the decreased production at other spots much more a case of a decreased sample size.

And oh, by the way, he was pretty outspoken about his desire to stay at 2nd base too wasn't he? Yeah, but he was forced to move and that worked out pretty damn well for him in the end. Your angst about how he'll hit lower in the order is overblown."

LOL did you even read the post I was responding to?

If at lead-off Soriano pelts out 5, even 10 more home-runs with a batting avg 30 points higher or so than if he were at the number 2 slot, the Cubs would still get more runs if a guy like Roberts were in front of him.

Since we're trying to prove or disprove this statement, the need to apply different stats at different lineup spots is painfully obvious. If you've got more appropriate values to plug in than the guy's career rates, then I'm all ears.

(FWIW, a literal interpretation of that comment, removing 10 HRs and 30 points of BA, and leaving all else unchanged, would shrink OBP from .341 to .314, and SLG from .551 to .476. In light of this, the adjustments I used are perfectly fair.)

And your sample size argument doesn't hold a lot of water considering Soriano has logged over 600 PAs in each of the 3 and 5 slots.
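For a rough sense of what 600 PAs buys you, here is a back-of-the-envelope sketch. It assumes every plate appearance is an independent trial with a fixed chance of reaching base, which is a simplifying assumption of mine and not something either side of this argument has established. The 600 figure is the floor implied by "over 600 PAs", and the helper name is made up for illustration.

```python
import math

def obp_standard_error(obp: float, plate_appearances: int) -> float:
    """Binomial standard error of an on-base percentage (independence assumed)."""
    return math.sqrt(obp * (1.0 - obp) / plate_appearances)

# Career OBP figures quoted earlier in the thread.
leadoff_obp = 0.341
fifth_spot_obp = 0.312
fifth_spot_pa = 600  # "over 600 PAs", so this is a lower bound

se = obp_standard_error(fifth_spot_obp, fifth_spot_pa)
gap = leadoff_obp - fifth_spot_obp

print(f"standard error at {fifth_spot_pa} PAs: {se:.4f}")
print(f"observed gap: {gap:.3f} ({gap / se:.1f} standard errors)")
```

Under those assumptions the gap comes out at roughly one and a half standard errors of the 5-hole sample alone, so 600 PAs is a real sample but not an overwhelming one.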

"The lineup with him hitting best is optimal with all other things considered equal" is pretty much exactly the point I'm making: the small gains from shuffling guys around aren't enough to make up for the loss in production associated with Soriano hitting outside of the 1 hole.



In truth I didn't. In that case I see your point.

However, I still think it's way too strong to assume that Soriano would drastically suffer from hitting out of the 1 hole. Soriano may have logged over 600 ABs in both those slots, but 1) they weren't consecutive, and 2) when did they come? For the most part in Texas, when he hit worse than he has at any other time in his career. To say that is a product of where he hit in the lineup is far-fetched.


OK. (We've done this before, but that's OK, I can't remember what everybody says either.)

It's a naively simple calc-based tool that is based on two completely untenable assumptions:

  • That a baseball game is simply a series of discrete events (in this case, AB's) in which prior performance is a precise determinant of future performance, and
  • That you can adequately model different lineups by simply taking each player's stats from whatever past-situations, and simply daisy-chain them together in different sequences and get a valid result, as if there are no interdependencies involved.

Both of these things are just arbitrary assumptions with zero empirical evidence to support them. In contrast, a large body of work in trying to construct simulations of other real-world events shows that both of these assumptions are naive and not at all viable. Doing adequate simulations of complex discrete events is hard enough, but baseball is more than just a collection of discrete events. Baseball is not even close to being a deterministic system. Absent a precise statistical model of the game, this is precisely the kind of problem that you can't solve with computers.

Don't misunderstand me, I don't in any way criticize the good people who created the tool (assuming they did it competently). They were just doing what they could, which is to take a boatload of individual performance data and juggle them around. That's pretty much all they can do. Without extensive empirical data about lineup effects to go by, and without validated statistical models of the game of baseball, there's little else they could do.

The problem is when people mistake it for something it's not. It's a fine toy for people to play around with for fun. Nothing wrong with that. But that's all it is. Pretending it's something else is the problem. To say it can tell you how many runs you get in an actual season of baseball is ridiculous, much like fantasy baseball is ridiculous as a simulation of real baseball. There's a huge difference between pleasant little computer games and The Game.

The more you talk, the more abundantly clear it becomes that you have absolutely no friggin idea what's behind this model.

Or perhaps more accurately, you think it makes absolutely no friggin difference what's behind this model because no matter what, the problem is far too complex to get back anything meaningful or useful, and thus you're an utter fool even to try, so why should anyone bother learning anything about the analytical process that's at work here since no analysis could ever be constructed that might shed some light on this issue.

Yep, might as well just pack it in, and remain wholly reliant on Morganisms to run your baseball team.


In truth I didn't. In that case I see your point.

However, I still think it's way too strong to assume that Soriano would drastically suffer from hitting out of the 1 hole. Soriano may have logged over 600 ABs in both those slots, but 1) they weren't consecutive, and 2) when did they come? For the most part in Texas, when he hit worse than he has at any other time in his career. To say that is a product of where he hit in the lineup is far-fetched.

Surely you don't intend to argue that Soriano's hitting suffered from playing home games in one of the best hitters' parks in the majors. The stats will disprove this notion decisively as well.


Surely you don't intend to argue that Soriano's hitting suffered from playing home games in one of the best hitters' parks in the majors.

Of course not; that would be counterintuitive. Kind of like claiming there is some mystical experience in hitting leadoff for him and that's why his #s are better there.

Guys have good years and bad years. Maybe Soriano liked the bars in Arlington too much those years; maybe he met some girl who rearranged his priorities for a while; hell, maybe he missed some girl in the Bronx. I really don't know but to claim it was because he was dropped in the order has no more credibility than the theories I just put forth.

BTW, how did moving to left field work out for him?


The more you talk, the more abundantly clear it becomes that you have absolutely no friggin idea what's behind this model.

OK, dave, then howsabout if you explain what's behind it, and how that's different than what I said...



If you're legitimately interested in educating yourself on this stuff, I would suggest you begin by reading Mark Pankin's work applying Markov Chains.

This method is applied here, and various other places linked here.

IMO Pankin's work is extremely interesting and the complexity of what he's done is really remarkable. I would laugh out loud if you read through his various articles and came to the conclusion that there is no merit to his analysis.

Cyril Morong has taken a simpler, purely regression-based approach here and here.

Morong's regression work is applied here and here, as well as in the lineup analysis tool you have already blown off.



OK, I followed those links and read all those things.

What's going on here is the difference between talking about the forest and the trees. Those articles talk about the trees, while I was talking about the forest. Based on reading those things, the gist of my response has 2 parts:

  • I can see how you were annoyed when I referred to the simulators as toys.
  • What they are doing does not change anything about the main point I was making.

What's going on is that the people doing this are trying to address something that is very hard to do. Plus, they face severe practical problems. Chief among their problems is that what they're trying to do is not something they can do actual experiments about. It's not like they can go to MLB and say, "Hey, we've got these interesting statistical models about lineups, and we wanna find out how right vs. wrong they are, so howsabout if you guys please let us tell all the teams how to do their lineups for a few years, and then we'll see, OK?" In short, they're trying to do all this by observation without experimentation (a fact which isn't their fault), so they've really got 1.75 of their 2 hands tied behind their back. The fact that they can't go into a lab and do stuff is a big part of what makes it next to impossible to really do science about this whole topic (or pretty much any topic concerning baseball). This is the same basic problem that helps keep any of the human sciences (psychology, sociology, economics, etc.) from being "real sciences" in the way that physics or chemistry or biology are. The point here is that it's not really their fault that they can't do this like a real scientist (or engineer) would do things.

So, they're pretty much stuck doing the best they can with what they've got. But that doesn't mean they're right, and it certainly does mean that it's very hard for them to determine whether or not they're right. What it does is create a situation where all they can really do is manipulate the stats they have, because they can't manipulate the Actual Baseball that generates those stats. The fact that it's not their fault that they can't do better doesn't mean that their stat-based conclusions are even close to adequate. All they are is current best-guesses while they are still in the very, very early stages of being in the "discovery phase" of trying to be a science.

Basically, they're doing more-or-less what artificial intelligence researchers did for 20+ years before they admitted it wasn't working and they were barking up the wrong tree: they go into the abstract hypothetical world and construct a hypothetical problem space where they can manipulate the math-based rules and do the best they can. So, like I said:

Don't misunderstand me, I don't in any way criticize the good people who created the tool (assuming they did it competently). They were just doing what they could, which is to take a boatload of individual performance data and juggle them around. That's pretty much all they can do. Without extensive empirical data about lineup effects to go by, and without validated statistical models of the game of baseball, there's little else they could do.

Reading the stuff you linked to suggests that they are doing it competently. But that doesn't change what I said. The fact is that they *are* doing what I said in the 2 bullet points from my previous post. That's all they can do, it's not their fault that they can't do better. But they're still doing those things, they're pretty much stuck with those assumptions because they cannot experiment, and those assumptions are still faulty assumptions. Just because they use fancy-sounding stuff like "Markov models" doesn't change that. All the Markov model does is describe the math of how they go about it, it's not some great technique that somehow provides insight that overcomes the basic problems I mentioned. A Markov model is not some complex model about baseball, it's just a name for the basic nuts and bolts of the arithmetic they're using, that's all it is. That's the forest vs. the trees part: they can do all the matrix-math 100% correctly and still get completely wrong answers, based on the unwarranted assumptions they're stuck with because they can't experiment.
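For anyone who wants to see what those "nuts and bolts" look like, here is a minimal sketch of a base-out Markov chain. To be clear about what is assumed: this is not Pankin's model and not the Baseball Musings tool. The event probabilities are made up, runners simply advance as many bases as the hit is worth, and every plate appearance is treated as independent of everything around it, which is exactly the assumption being argued about. All names (`AVERAGE`, `SLUGGER`, `advance`, `expected_runs_per_game`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-plate-appearance event probabilities (illustrative only,
# not real player data). Each profile sums to 1.
AVERAGE = {"out": 0.68, "walk": 0.08, "single": 0.15,
           "double": 0.05, "triple": 0.005, "homer": 0.035}
SLUGGER = {"out": 0.69, "walk": 0.05, "single": 0.14,
           "double": 0.06, "triple": 0.005, "homer": 0.055}

BASES_GAINED = {"single": 1, "double": 2, "triple": 3, "homer": 4}
EMPTY = (False, False, False)  # (first, second, third) occupied?

def advance(bases, event):
    """Return (new_bases, runs_scored) for a non-out event."""
    first, second, third = bases
    if event == "walk":
        # Only forced runners move on a walk.
        if first and second and third:
            return (True, True, True), 1
        if first and second:
            return (True, True, True), 0
        if first:
            return (True, True, third), 0
        return (True, second, third), 0
    gained = BASES_GAINED[event]
    runs, new_bases = 0, [False, False, False]
    # Position 0 is the batter; 1..3 are the occupied bases.
    for start in [0] + [i + 1 for i, occupied in enumerate(bases) if occupied]:
        end = start + gained
        if end >= 4:
            runs += 1
        else:
            new_bases[end - 1] = True
    return tuple(new_bases), runs

def expected_runs_per_game(lineup, innings=9, max_pa=120):
    """Push the state distribution forward one plate appearance at a time.

    State = (inning, outs, bases, batter index). max_pa truncates the chain;
    the probability of a game lasting longer is negligible for these inputs.
    """
    dist = {(0, 0, EMPTY, 0): 1.0}
    total_runs = 0.0
    for _ in range(max_pa):
        nxt = defaultdict(float)
        for (inning, outs, bases, idx), p in dist.items():
            next_idx = (idx + 1) % len(lineup)
            for event, q in lineup[idx].items():
                if event == "out":
                    if outs < 2:
                        nxt[(inning, outs + 1, bases, next_idx)] += p * q
                    elif inning + 1 < innings:
                        nxt[(inning + 1, 0, EMPTY, next_idx)] += p * q
                    # Otherwise the game is over and this probability mass is dropped.
                else:
                    new_bases, runs = advance(bases, event)
                    total_runs += p * q * runs
                    nxt[(inning, outs, new_bases, next_idx)] += p * q
        dist = nxt
    return total_runs

# Same nine hitters, two batting orders.
slugger_first = [SLUGGER] + [AVERAGE] * 8
slugger_fifth = [AVERAGE] * 4 + [SLUGGER] + [AVERAGE] * 4
print(expected_runs_per_game(slugger_first))
print(expected_runs_per_game(slugger_fifth))
```

The arithmetic can be exactly right for the inputs it is given; whether those inputs (fixed per-batter probabilities that don't depend on lineup slot, pitcher, or situation) describe real baseball is the separate question this post is about.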

From the standpoint of the guys doing that work, it is indeed research. They're taking what they've got, and they're working hard to see what they can do with it. That part is where the word "toy" does a disservice to them, and I apologize for that. However, from the standpoint of the fan who's not trying to create new knowledge, and instead is just trying to figure out if BRob will help the Cubs much, it *is* just a naive toy and should not be trusted. Using it for that purpose would be just as unwise as taking some of the early AI efforts to model the stock market and basing your own real-life stock portfolio on them: it would be an extremely dumb thing to do. In the very same way, it would be dumb to conclude that having BRob bat leadoff for the Cubs is gonna somehow cost your team between 15 and 26 runs just because some very rough and completely not-validated research tool spits out that answer. Same basic mistake.

So, whether it's "good work" vs. a "naive toy" depends on what you're trying to do with it. From the perspective of SABR research, it's one point on the early part of the discovery curve, but from the fan's perspective about their Actual Baseball Team it is just a toy. Here's when it will stop being a toy: when they can take models like that and predict with reliable accuracy how the season is gonna turn out before it happens, in the same way that they knew ahead of time where the Apollo missions were going when they sent them to the moon, and knew exactly how they were gonna get them back home again. You can do that with physics, but you can't do that with baseball. And unless the next Einstein shows up and completely changes the rules about how everybody thinks about the problem, they're not gonna be able to do that either. But they definitely oughta keep trying.

In the meantime, your Cubbies would be a whole lot better off with BRob leading off, and that's true no matter what that dang tool says. But I don't want you to have him ;-)


So, whether it's "good work" vs. a "naive toy" depends on what you're trying to do with it. From the perspective of SABR research, it's one point on the early part of the discovery curve, but from the fan's perspective about their Actual Baseball Team it is just a toy. Here's when it will stop being a toy: when they can take models like that and predict with reliable accuracy how the season is gonna turn out before it happens, in the same way that they knew ahead of time where the Apollo missions were going when they sent them to the moon, and knew exactly how they were gonna get them back home again. You can do that with physics, but you can't do that with baseball. And unless the next Einstein shows up and completely changes the rules about how everybody thinks about the problem, they're not gonna be able to do that either. But they definitely oughta keep trying.

Why? Why should they keep trying? From your perspective, from what you just said in the above paragraph, they're never going to get it right enough for it to matter. You want accuracy in predicting the results of future baseball games along the lines of Kepler's and Newton's laws of planetary motion. That's obviously impossible. At least within the context of the universe we currently know, or probably could hope to know in any of our lifetimes.

If predicting the position of the moon to a few miles in billions of years is what you're looking for in baseball simulation, you're setting all sabermetric analysis up for complete failure all of the time. You're looking for a system (more-or-less) that can tell us how many games the O's will win, right now, in 2073 to within 1/16th of a win. I can only assume you've asked for this level of accuracy knowing it can't happen, thus validating your opinion that this type of research is much more a silly toy than a serious science, and therefore can be dismissed when convenient.


Preconceived and antiquated? That a lead-off hitter should have a high on base percentage? Call me old fashioned then.

OK, so maybe you are right (although I don't think so) that Soriano's production will be higher at lead-off than elsewhere in the line-up. Even if I grant you that, which I don't have to, the Cubs as a team are sacrificing runs for an individual's stats. If at lead-off Soriano pelts out 5, even 10 more home-runs with a batting avg 30 points higher or so than if he were at the number 2 slot, the Cubs would still get more runs if a guy like Roberts were in front of him.

Not only are you spot-on correct that the Cubs would get more runs if Roberts batted leadoff and Soriano second, but Soriano would also hit better because he would benefit from the many occasions when BR would be distracting the pitcher with his antics on base.



