
Data driven decisions can be garbage


Baltimorecuse


It's really a no-brainer.  In a game like baseball, no data set can possibly consider every possible variable.  You see this in political polling all the time.

 

Example in baseball.  You've got 1st and 3rd with one out.  The percentages in the book say hit away because, checking back through every time the computer can find this situation, as far back as the records go, hitting away, ON THE AVERAGE, produces more runs.  But that's an average of every time this situation comes up.

So let's add some different variables.  The pitcher kills right-handed hitting.  The hitter can't hit righties.  It's the bottom of the eighth.  A run doubles your score.  Now every time that specific situation has occurred, it gets averaged in with the thousands of times the first-and-third, one-out situation has occurred.  That totally waters down the specific situation we're in, and the computer book decision is useless.

The problem with the data is that it's overly generalized.  For you Ravens fans, the book says you go for it on fourth and one.  How did that work out last year?
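To put toy numbers on that watering-down effect, here is a minimal sketch. Every figure in it is made up for illustration (the context labels, the counts, and the run totals are all hypothetical, not real baseball data). It only shows how an overall average can point one way while the average for one specific kind of situation points the other.

```python
from collections import defaultdict

# Hypothetical records: (context, strategy, times_seen, total_runs_scored).
# "typical" = ordinary matchup; "bad_matchup" = hitter can't hit this pitcher.
# All numbers are invented purely to illustrate the averaging argument.
RECORDS = [
    ("typical", "hit_away", 900, 1080),
    ("typical", "bunt", 300, 300),
    ("bad_matchup", "hit_away", 100, 60),
    ("bad_matchup", "bunt", 50, 45),
]


def runs_per_chance(records, context=None):
    """Average runs per chance for each strategy, optionally for one context only."""
    chances = defaultdict(int)
    runs = defaultdict(int)
    for ctx, strategy, times_seen, total_runs in records:
        if context is not None and ctx != context:
            continue
        chances[strategy] += times_seen
        runs[strategy] += total_runs
    return {s: runs[s] / chances[s] for s in chances}


def best_strategy(averages):
    """Return the strategy with the highest average runs per chance."""
    return max(averages, key=averages.get)


overall = runs_per_chance(RECORDS)
specific = runs_per_chance(RECORDS, context="bad_matchup")

print("All situations:", overall, "->", best_strategy(overall))
print("Bad matchup only:", specific, "->", best_strategy(specific))
```

With these invented numbers the "book" answer over all situations is to hit away, while the answer inside the bad-matchup subset is to bunt. Note that the subset has far fewer chances behind it, which is its own problem.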


2 minutes ago, ArtVanDelay said:

Is there a purpose to this thread or did you just feel the need to crap on analytics?

And I’m not trying to be a jerk here.  This thread just seems totally out of left field.  The O’s just got a big win to avoid a sweep and this is what’s on your mind?


You seem to be condemning the notion of data driven decisions... by suggesting that you could make better decisions if you had more data.

What am I missing?


1 minute ago, ArtVanDelay said:

Is there a purpose to this thread or did you just feel the need to crap on analytics?

Yes, when McKenna failed to get the bunt down a couple of games ago, someone jumped in and said the analytics mean you should hit away in that situation.  McKenna ended up hitting into a DP, which allowed the other side to tie the game with one run.

I argued the analytics in specific situations are crap.  Never got around to explaining why.  


1 minute ago, owknows said:

You seem to be condemning the notion of data driven decisions... by suggesting that you could make better decisions if you had more data.

What am I missing?

Actually it's more than that.  Sometimes incomplete data is worse than no data at all because it leads to the wrong decision.    Remember "New Coke"?


Just now, Baltimorecuse said:

Actually it's more than that.  Sometimes incomplete data is worse than no data at all because it leads to the wrong decision.    Remember "New Coke"?

So your beef isn't with data driven decisions... It's with insufficient data.


1 minute ago, Baltimorecuse said:

Yes, when McKenna failed to get the bunt down a couple of games ago, someone jumped in and said the analytics mean you should hit away in that situation.  McKenna ended up hitting into a DP, which allowed the other side to tie the game with one run. I argued the analytics in specific situations are crap.  Never got around to explaining why.

OK, BUT, the analytics and algorithms are only as good as the data, and with specific situations like the one you described, the dataset is minimal, well into small-sample-size (SSS) territory. We are only on the cusp of where data-driven decisions will go in the near future. Hyde's gut decisions, however, based on the generalities in the big datasets the Sig-bot chews on combined with his own experiences, are a whole nother can of worms. Probably belongs in the Hyde bashing thread.
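A rough sketch of why the small sample matters, using the textbook standard-error formula for an average. The spread of 1.0 runs per chance and the two sample sizes are assumptions picked for illustration, not measured values.

```python
import math


def margin_of_error(std_dev, n, z=1.96):
    """Approximate 95% margin of error for an average of n independent chances."""
    return z * std_dev / math.sqrt(n)


# Hypothetical inputs: an assumed spread of 1.0 runs per chance,
# 3000 chances for the general situation vs. 30 for the specific one.
ASSUMED_STD_DEV = 1.0
general = margin_of_error(ASSUMED_STD_DEV, 3000)
specific = margin_of_error(ASSUMED_STD_DEV, 30)

print(f"General situation (n=3000): +/- {general:.3f} runs")
print(f"Specific situation (n=30):  +/- {specific:.3f} runs")
```

Cutting the sample by a factor of 100 makes the margin of error about 10 times wider, so a small edge seen in the narrow subset can easily be noise.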

