Thursday, May 24, 2012

An In-Depth Look at wOBA

wOBA stands for “weighted on-base average.” It takes the different ways a batter can reach base and weights each one appropriately. It’s designed to look like OBP, a stat that most baseball fans already understand.

Slugging percentage deals with weights, in a way: it gives singles a “weight” of 1, doubles a “weight” of 2, triples a “weight” of 3 and home runs a “weight” of 4. There is a flaw in this thinking. Are doubles really worth twice as much as singles? Are home runs really worth twice as much as doubles and four times as much as singles? Of course not. This is the biggest flaw of slugging percentage.

On-base percentage does not deal with weights. It treats a home run and a walk the same, as well as a double and a triple; getting on base in any of these ways increases your on-base percentage by the same amount. One big advantage of on-base percentage is that it includes walks, something slugging percentage ignores. Add on-base percentage and slugging percentage and you get OPS. OPS is the best of both worlds: you have the weights for singles, doubles, triples and home runs, and you also include walks. OPS does a very good job of showing overall offensive production. However, there must be a better way to properly weight each event. And that’s what wOBA attempts to do.
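To make the comparison concrete, here is a minimal Python sketch of slugging percentage, on-base percentage and OPS using their standard definitions. The stat line is made up purely for illustration (it is not any real player's season), and the function names are my own.

```python
def slugging(singles, doubles, triples, home_runs, at_bats):
    # Total bases: each hit type is weighted by the number of bases, 1 through 4.
    total_bases = 1 * singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def on_base(hits, walks, hbp, at_bats, sac_flies):
    # Every time on base counts the same, whether it is a walk or a home run.
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)


# Hypothetical stat line, invented for this example.
singles, doubles, triples, home_runs = 100, 30, 5, 20
at_bats, walks, hbp, sac_flies = 500, 60, 5, 5
hits = singles + doubles + triples + home_runs

slg = slugging(singles, doubles, triples, home_runs, at_bats)
obp = on_base(hits, walks, hbp, at_bats, sac_flies)
ops = obp + slg

print(f"SLG = {slg:.3f}, OBP = {obp:.3f}, OPS = {ops:.3f}")
```

Notice that the walks never touch the slugging calculation, and the home runs count no more than the singles in the on-base calculation.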

wOBA uses a method called linear weights to determine how much each event is worth. This is the most confusing part of wOBA. You almost have to be a computer programmer to figure it out. From what I understand, it takes the actual number of runs scored in a season, breaks that total down by specific events, and looks at how likely a run is to score on each event, relative to an out. Like I mentioned, this is very complicated, and there are even different ways to do it that will give you slightly different results.

Tom Tango, the developer of wOBA, did this and came up with these values (NIBB is non-intentional walks):

HR = 1.70
3B = 1.37
2B = 1.08
1B = 0.77
NIBB = 0.62

He wasn’t satisfied with these results because the league average wasn’t close enough to OBP. So he scaled each value up by about 15%.

HR = 1.95
3B = 1.56
2B = 1.24
1B = 0.90
NIBB = 0.72
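As a quick check on the scaling step, this short Python sketch multiplies the first set of values by 1.15 and compares the result with the second set. Reading “15%” as a flat 1.15 multiplier is my assumption; the two lists only agree to within a couple of hundredths, so the actual scale was probably not exactly 1.15.

```python
# Unscaled linear-weight values from the first list above.
raw = {"HR": 1.70, "3B": 1.37, "2B": 1.08, "1B": 0.77, "NIBB": 0.62}
# Scaled values from the second list above.
published = {"HR": 1.95, "3B": 1.56, "2B": 1.24, "1B": 0.90, "NIBB": 0.72}

SCALE = 1.15  # assumption: the 15% bump is a flat multiplier

scaled = {event: value * SCALE for event, value in raw.items()}

for event in raw:
    print(f"{event:>4}: {scaled[event]:.3f} computed vs {published[event]:.2f} published")

# Largest disagreement between the computed and published values.
max_gap = max(abs(scaled[event] - published[event]) for event in raw)
print(f"largest gap: {max_gap:.3f}")
```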

He also decided that HBP (hit by pitch) and RBOE (reached base on error) were important enough to include:

HBP = 0.75
RBOE = 0.92

Multiply each event total by its coefficient, add the results together, and then divide by plate appearances, and you’ll have wOBA. Well, one version of wOBA. I’ll use Miguel Cabrera’s 2011 season as an example:

(0.72*NIBB + 0.75*HBP + 0.90*1B + 0.92*RBOE + 1.24*2B + 1.56*3B + 1.95*HR) / PA

(0.72*86 + 0.75*3 + 0.90*119 + 0.92*6 + 1.24*48 + 1.56*0 + 1.95*30)/688 = .429
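The same arithmetic can be sketched in a few lines of Python. The coefficients and event counts are the ones used in this post; the 30 home runs are Cabrera’s 2011 total. The function and variable names are just my own choices.

```python
# Scaled coefficients quoted in this post (one version of wOBA).
WEIGHTS = {
    "NIBB": 0.72, "HBP": 0.75, "1B": 0.90, "RBOE": 0.92,
    "2B": 1.24, "3B": 1.56, "HR": 1.95,
}


def woba(events, plate_appearances, weights=WEIGHTS):
    # Weighted sum of the event counts, divided by plate appearances.
    total = sum(weights[name] * count for name, count in events.items())
    return total / plate_appearances


# Miguel Cabrera, 2011.
cabrera_2011 = {"NIBB": 86, "HBP": 3, "1B": 119, "RBOE": 6, "2B": 48, "3B": 0, "HR": 30}

cabrera_woba = woba(cabrera_2011, 688)
print(f"{cabrera_woba:.3f}")  # prints 0.429
```

Swapping in a different year’s coefficients only means passing a different `weights` dictionary.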

If you go to FanGraphs, you’ll notice that this isn’t exactly the same value that they have, which is .436. The reason for this is that wOBA has been tweaked since this original formula. The main reason is that the run environments are different year-to-year. A single isn’t always worth 0.90 runs; a home run isn’t always worth 1.95 runs. The other reason is that FanGraphs decided to add in SB and CS to the formula. Now Matt Klassen at Beyond the Boxscore did an amazing job at showing these coefficients through the years, up to 2010:

However, if you work out the formula with these coefficients, you might not get exactly the same number as FanGraphs, most likely because of rounding. But the results come pretty darn close.

The problem is that the list hasn’t been, and probably won’t be, updated to include recent years. Well, I found a wOBA calculator in the form of an Excel worksheet:

The great thing about this calculator is that it subtracts out pitchers’ hitting stats (which is exactly what FanGraphs does). Excluding pitchers’ weak bats leaves you with the “true” hitters. Putting in Miguel Cabrera’s stats still gives us .429. The gap from FanGraphs remains because this calculator still doesn’t account for SB/CS, and it’s possible that it uses a different linear weights method. If you look at the Excel spreadsheet, you’ll notice that rows 20-37 are hidden. This is where the linear weights calculations are performed. You can unhide them to see exactly how complicated this method is. Don’t ask me any questions about it, because I don’t know either.

Now would be a good time to discuss the flaws of wOBA. First of all, why in the heck is reaching base on an error included?!? The reason IBB aren’t included is that the hitter didn’t do anything special to get on base; it was the pitcher (or more specifically the manager) who decided to put him on. The same philosophy can be applied to RBOE: the hitter didn’t do anything special to get on base; it was the fielder who messed up. The best reason I can think of is that there is a direct correlation between reaching base on an error and scoring that doesn’t show up with IBB. Also, guys rarely get on base on an error (Omar Infante reached on an error the most in 2011, a whole 13 times, or 2% of his PA). You can choose not to include it if you like (this is part of the “tweaking,” but it doesn’t really affect the numbers much). Also, IBB weren’t officially kept as a stat until 1955, so wOBA might overrate some older players because their total BB are used.

Secondly, some people choose not to include stolen bases and caught stealing, as in the Hardball Times calculator. They want to use wOBA as a true hitting stat, and stolen bases aren’t a product of hitting. Others do include them, because stolen bases are a part of offense and they want wOBA to be a total offensive stat.

Now from wOBA, we can create 3 other stats, wRAA, wRC and wRC+.

wRAA is weighted runs above average and is pretty easy to calculate: wRAA = (wOBA - lgwOBA)/scale*PA. Dividing by the scale is important because we initially added a percentage to make wOBA look more like OBP; dividing brings it back to a run value. Zero is average, so anything above 0 is above average and anything below 0 is below average.

Using Miguel Cabrera again:

Cabrera’s wOBA = 0.436
League wOBA = .316
Cabrera’s PA = 688
And from what I can tell, the scale = 1.26.

So, (0.436-0.316)/1.26*688 = 65.5 wRAA. FanGraphs shows a wRAA of 65.6, so there’s a rounding issue here.
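Here is the same wRAA calculation as a small Python sketch. The inputs are the numbers listed above, including my estimated scale of 1.26; the function name is my own.

```python
def wraa(woba, league_woba, scale, plate_appearances):
    # Runs above average: the wOBA gap, converted back to runs via the scale,
    # then multiplied by playing time.
    return (woba - league_woba) / scale * plate_appearances


cabrera_wraa = wraa(woba=0.436, league_woba=0.316, scale=1.26, plate_appearances=688)
print(f"{cabrera_wraa:.1f}")  # prints 65.5
```

A league-average hitter has a wOBA equal to the league wOBA, so his wRAA comes out to exactly 0 no matter how many plate appearances he gets.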

wRC is weighted runs created. Bill James created Runs Created; Tom Tango took that idea and applied it to wOBA. wRC = wRAA + (lgR/lgPA)*PA, where lgR and lgPA are the league’s runs and plate appearances and PA is the player’s plate appearances. Basically, you’re taking the league’s runs per PA, multiplying it by the player’s PA, and adding in the player’s wRAA. This removes the “above average” part of wRAA.

2011 League runs = 20808
2011 League Plate Appearances = 185245
League runs/PA = 0.112

Using Miguel Cabrera again, wRC = 65.6+(0.112*688) = 143, which is exactly what FanGraphs shows.
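In Python, the wRC step might look like the sketch below. It uses the unrounded league runs per PA instead of 0.112, which still rounds to the same answer; the function name is my own.

```python
def wrc(wraa, league_runs, league_pa, plate_appearances):
    # What an average hitter would create in this many PA, plus the
    # player's runs above average.
    league_runs_per_pa = league_runs / league_pa
    return wraa + league_runs_per_pa * plate_appearances


cabrera_wrc = wrc(wraa=65.6, league_runs=20808, league_pa=185245, plate_appearances=688)
print(round(cabrera_wrc))  # prints 143
```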

wRC+ is weighted runs created, adjusted to the league and ballpark. wRC+ = ((wRAA/PA)/(lgR/lgPA) + 1) * 100 * a ballpark adjustment, where lgR/lgPA is the league’s runs per plate appearance.

From what I can tell, the ballpark adjustment for Comerica Park is around 0.95 (again, rounding issues could prevent you from getting exactly the number as FanGraphs gets, but it’s close enough).

Using Miguel Cabrera one more time, wRC+ = ((65.6/688)/0.112+1)*95 = 176. FanGraphs has his wRC+ at 177, so there’s the rounding issue.

A 0.95 ballpark adjustment means that Comerica Park played as a slight hitter’s park in 2011, so wRC+ had to be scaled down a little.
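For completeness, here is the wRC+ calculation from this post as a Python sketch. The 0.95 park adjustment is my own estimate from above, not an official FanGraphs figure, and the function name is hypothetical.

```python
def wrc_plus(wraa, plate_appearances, league_runs_per_pa, park_adjustment):
    # Player's runs above average per PA relative to the league's runs per PA,
    # shifted so that 100 is average, then adjusted for the ballpark.
    relative = (wraa / plate_appearances) / league_runs_per_pa + 1
    return relative * 100 * park_adjustment


cabrera_wrc_plus = wrc_plus(
    wraa=65.6,
    plate_appearances=688,
    league_runs_per_pa=0.112,
    park_adjustment=0.95,  # assumption: estimated adjustment for Comerica Park
)
print(round(cabrera_wrc_plus))  # prints 176
```

With a park adjustment of 1.0 and a wRAA of 0, the function returns 100, which is the league-average baseline.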