Transition Probability Value

Long-time lurker, recent poster here. I've been drawn to OH by the quality of the discussions, and this is my first thread. Please forgive the longish post. I would like to get some comments on the following work, and I respect the level of collective knowledge on this board. So here goes: I am proposing a new stat that can assign value to any team, player, lineup, pitching staff or battery in baseball. Ambitious? You tell me. I call this stat:

Transition Probability Value (TPV)

TPV is the expected number of runs scored per inning given the true underlying talent of the observational unit (team, pitcher, lineup etc).

All of the gory details are here.

Essentially, the model I use deconstructs the game into states defined by the number of outs and the positions of the runners (e.g. 1 out, runner on 1B). If you define a "play" as any event that either changes the state of the game or scores a run, then you can estimate the probability of going from one state to another, as well as the number of runs expected on that transition.
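
To make the state space concrete, here is a minimal sketch (my own illustration, not code from the linked write-up). It assumes a state is encoded as a hypothetical tuple (outs, runner_on_1B, runner_on_2B, runner_on_3B), which gives 3 × 2^3 = 24 non-terminal states plus an absorbing "third out" state, and it estimates transition probabilities by simple counting (the maximum-likelihood estimate). The names STATES, END and estimate_transitions are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical encoding of a base-out state: (outs, on_1B, on_2B, on_3B).
# 3 out values x 2^3 base configurations = 24 non-terminal states.
STATES = [(outs, b1, b2, b3)
          for outs in range(3)
          for b1, b2, b3 in product((0, 1), repeat=3)]

# Absorbing state: use it as the end state of a play that records the third out.
END = "END"


def estimate_transitions(plays):
    """Maximum-likelihood transition probabilities from observed plays.

    `plays` is an iterable of (start_state, end_state) pairs, one per play.
    Returns {start_state: {end_state: probability}}.
    """
    counts = defaultdict(Counter)
    for start, end in plays:
        counts[start][end] += 1
    probs = {}
    for start, row in counts.items():
        total = sum(row.values())
        probs[start] = {end: n / total for end, n in row.items()}
    return probs


# Toy data: three plays starting from bases empty, none out.
plays = [((0, 0, 0, 0), (0, 1, 0, 0)),   # single
         ((0, 0, 0, 0), (1, 0, 0, 0)),   # out
         ((0, 0, 0, 0), (1, 0, 0, 0))]   # out
probs = estimate_transitions(plays)
print(probs[(0, 0, 0, 0)])  # single: 1/3, out: 2/3
```

Note that a solo home run with the bases empty goes from a state back to the same state; it still counts as a play because a run scored.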

For example, if a play starts with none on and none out and ends with a runner on 1B and still none out, then we know that no runs scored on that play. For the vast majority of transitions between states, we know with certainty how many runs score. A problem arises when there is some ambiguity, such as when a play starts with 1 out and a runner on 1B and ends with 1 out and a runner on 2B. That transition can result in 1 run (the batter hits a double) or 0 runs (a stolen base).
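
One way to see why the runs are usually known: every player who starts a play on base (plus the batter, if the play ends his plate appearance) must finish the play on base, cross the plate, or be put out. The sketch below applies that bookkeeping. It is my own illustration, assuming the same hypothetical (outs, on_1B, on_2B, on_3B) tuples, and it only covers plays that do not end the inning, since whether a run counts on a third-out play depends on how the out was made. The function name runs_on_play is hypothetical.

```python
def runs_on_play(start, end, batter_play):
    """Runs scored on a play that does NOT end the inning.

    start, end: (outs, on_1B, on_2B, on_3B) tuples.
    batter_play: True if the batter's plate appearance ended on this play
    (hit, walk, out, ...); False for a running play (steal, wild pitch, ...).
    """
    outs_before, *bases_before = start
    outs_after, *bases_after = end
    # Players who could be accounted for: runners on base, plus the batter.
    players_in = sum(bases_before) + (1 if batter_play else 0)
    # Players accounted for without scoring: still on base, or put out.
    players_out = sum(bases_after) + (outs_after - outs_before)
    return players_in - players_out


# The ambiguous example: 1 out, runner on 1B -> 1 out, runner on 2B.
print(runs_on_play((1, 1, 0, 0), (1, 0, 1, 0), batter_play=True))   # 1 (double)
print(runs_on_play((1, 1, 0, 0), (1, 0, 1, 0), batter_play=False))  # 0 (steal)
```

The same start and end states give different run totals, and the only extra information needed to resolve them is whether the batter took part in the play.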

A breakthrough comes when you consider batter plays separately from running plays. Once that is done, the number of runs scored on each state transition is a known constant! With that variability removed, the talent of a team or player can be expressed as a function of the probabilities of transitioning from one state to another.
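
Given those transition probabilities and the constant runs per transition, "expected runs per inning" can be computed as an absorbing Markov chain: V(s) = Σ p · (runs + V(next)), with V(third out) = 0, and the per-inning value is V at the bases-empty, none-out state. The sketch below solves that equation by fixed-point iteration. This is the standard Markov-chain expected-runs calculation, and it is my assumption that it matches the formulation in the linked write-up. The function name and table layout are hypothetical. Each state's outcomes are a list, so the same end state can appear twice with different run values (e.g. a double and a steal from 1 out, runner on 1B).

```python
END = "END"  # absorbing state: third out recorded


def expected_runs_per_inning(transitions, start=(0, 0, 0, 0), tol=1e-12, max_iter=100_000):
    """Expected runs from `start` until the third out (absorbing Markov chain).

    transitions maps each non-terminal state (outs, on_1B, on_2B, on_3B) to a
    list of (probability, next_state, runs) outcomes whose probabilities sum
    to 1. next_state must be END or another key of `transitions`.
    Solves V(s) = sum(p * (runs + V(next))) with V(END) = 0 by iteration.
    """
    values = {s: 0.0 for s in transitions}
    for _ in range(max_iter):
        new_values = {
            s: sum(p * (runs + (0.0 if nxt == END else values[nxt]))
                   for p, nxt, runs in outcomes)
            for s, outcomes in transitions.items()
        }
        delta = max(abs(new_values[s] - values[s]) for s in transitions)
        values = new_values
        if delta < tol:
            break
    return values[start]


# Toy "team": every batter homers with probability h, otherwise makes an out.
h = 0.2
toy = {}
for outs in range(3):
    state = (outs, 0, 0, 0)
    after_out = END if outs == 2 else (outs + 1, 0, 0, 0)
    toy[state] = [(h, state, 1), (1 - h, after_out, 0)]

print(expected_runs_per_inning(toy))  # approximately 0.75, i.e. 3h / (1 - h)
```

In the toy model each out state yields h / (1 - h) expected home runs before the out, so three outs give 3h / (1 - h), which the iteration reproduces.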

Sorry if that is confusing. I think I do a better job of explaining it at the link above.

I am certainly not a sabermetrician and am not familiar with all the stats out there, so I am relying on some of you to tell me if I have just reinvented the wheel. As far as I can tell, TPV is most similar to Bill James' Runs Created and Tom Tango's Run Expectancy, but there seem to be important differences from both. For example, RE estimates the expected runs through the end of the inning given a particular state, which requires some large assumptions about the run environment. TPV treats the runs scored on each transition as a constant and assigns value only to the talent required to transition from one state to another.

The big advantage of TPV is that the model relies only on the rules of the game. That means TPV can be used to assess the current major leagues, AAA, college, the dead-ball era, Little League or your beer-guzzling softball team. Further, it relies on only two fundamental rules: there are 3 outs to an inning, and two runners cannot safely occupy the same base. Therefore it can assess leagues with slightly different rules, such as DHs, aluminum bats, restrictions on baserunning, etc.

I ran some tests using 2008 data from the AL East (given in the link above). TPV seems to fare well, outperforming Runs Created when assessing offensive value.
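
The post does not say which version of Runs Created was used in the AL East test (James published several). As a hedged point of reference, the basic version is RC = (H + BB) × TB / (AB + BB). Because TPV is stated per inning, one simple way to put RC on the same scale is to divide by innings batted; that normalization and the helper names below are my own hypothetical choices, not the author's.

```python
def basic_runs_created(h, bb, tb, ab):
    """Bill James' basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


def runs_created_per_inning(h, bb, tb, ab, innings):
    """Hypothetical normalization: basic RC divided by innings batted,
    so it can be compared with a per-inning figure such as TPV."""
    return basic_runs_created(h, bb, tb, ab) / innings


# Made-up single-game line: 10 H, 5 BB, 15 TB, 40 AB over 9 innings.
print(basic_runs_created(10, 5, 15, 40))          # 5.0
print(runs_created_per_inning(10, 5, 15, 40, 9))  # about 0.556 runs per inning
```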

There are multiple cool applications of this, but I first wanted to get some feedback. If you are interested, please give the full description a read. Any and all comments are welcome. Thanks.
