As we've all seen, Xander Bogaerts has been a stud this year. However, the power numbers have not been quite what people may have expected. Is that going to change?
I'm going to look at Xander's minor league monthly trends to see if we can learn something.
First up: 2010 Minor League Numbers
Xander slashed .314/.396/.423/.819 in 2010 with the DSL Red Sox.
If we break it down by month:
June: 80 PA, .328/.425/.522/.947
July: 118 PA, .287/.373/.347/.719
August: 82 PA, .338/.402/.437/.839
As we can see, each month brought very different things from Xander. He was at his best in both OBP and SLG in June, though his best average came in August.
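For anyone who wants to check the slash lines, here's a quick sketch of how the last number in each line (OPS) and the ISO figure used later on fall out of the first three. The function names are just my own labels, not anything official.

```python
def ops(obp, slg):
    """On-base plus slugging: the fourth number in each slash line."""
    return obp + slg


def iso(avg, slg):
    """Isolated power: slugging percentage minus batting average."""
    return slg - avg


# 2010 monthly lines from above, as (AVG, OBP, SLG)
months_2010 = {
    "June": (0.328, 0.425, 0.522),
    "July": (0.287, 0.373, 0.347),
    "August": (0.338, 0.402, 0.437),
}

for month, (avg, obp, slg) in months_2010.items():
    print(f"{month}: OPS {ops(obp, slg):.3f}, ISO {iso(avg, slg):.3f}")
```

July comes out to .720 here rather than the listed .719, because we're adding OBP and SLG after they've already been rounded.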
Next, we'll take a look at 2011 MiLB numbers.
In 2011, Xander posted the following numbers (avg, obp, slg, ops) with the Greenville Drive:
Split by month, these are the numbers (same chart organization)
June, 65 PA:
July, 100 PA:
August, 113 PA:
September, 18 PA:
One thing we should notice here is that while Xander's average reached its nadir in July, his ISO was off the charts, at a whopping .287 in 100 PA.
Another thing we should notice is that the September numbers came in only 18 PA, so we should be cautious.
2012 MiLB Numbers
Splitting the season between Salem and Portland, these were Xander's season numbers in 2012:
Breaking it down by month again:
April, 89 PA:
May, 104 PA:
June, 120 PA:
July, 93 PA:
August, 115 PA:
September, 11 PA:
Again, the September sample is too small to carry much weight on its own, so we'll have to account for that when we do our final trend line.
Finally, the 2013 MiLB+MLB numbers:
Before you question why I'm including the MLB numbers, I should note that I think it's fair to include level jumps as I have in other years, as long as we know that it takes time to catch up. The nerd math later will help account for this.
In 2013, Xander played for Portland, Pawtucket, and Boston, slashing this on the year in 569 PA:
March, 4 PA:
April, 99 PA:
May, 121 PA:
June, 111 PA:
July, 108 PA:
August, 92 PA:
September, 34 PA:
The March numbers are, on their own, meaningless here. Since they're the only March numbers of his career, it'd be silly to include a category with 4 PA in our regression analysis, so we're saying goodbye to those March numbers.
After going back through all Xander's monthly numbers (ignoring HBP because that's not really under the batter's control) and adding those from every level, we get this:
April, 188 PA:
May, 228 PA:
June, 376 PA:
July, 419 PA:
August, 402 PA:
September, 63 PA:
Now, we're going to use these numbers to do a linear regression analysis based on statistical analyses of each month's OPS, assuming data are normal.
WARNING: SERIOUS NERD MATH COMING UP.
σ = sqrt((PQ)/n)
σ = sqrt((0.744681*0.255319)/188)
SE = σ/sqrt(n)
SE = (0.03180151276)/(sqrt(188))
I'm going to use α=0.05.
Confidence interval @ .95 = P +/- (z*)(SE). Note: z* is the critical value for the .95 confidence level, which comes out to 1.96.
So the 95% confidence interval for Xander's April OPS is (0.74013504716,0.74922695283).
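If you'd rather not punch that into a calculator, here's a minimal sketch of the April calculation. It follows the formulas exactly as written above (OPS treated like a proportion, with sqrt(n) divided out twice), so it mirrors this post's approach rather than any textbook method. The function name is made up for illustration.

```python
import math


def ops_confidence_interval(p, n, z_star=1.96):
    """Interval for a monthly OPS `p` over `n` PA, using this post's formulas."""
    sigma = math.sqrt(p * (1 - p) / n)  # sigma = sqrt(PQ / n)
    se = sigma / math.sqrt(n)           # SE = sigma / sqrt(n)
    margin = z_star * se                # z* = 1.96 at the 95% level
    return p - margin, p + margin


# April: OPS of 0.744681 over 188 PA
low, high = ops_confidence_interval(0.744681, 188)
print(f"({low:.6f}, {high:.6f})")
```

Run on the April inputs, this lands on the same interval quoted above to six decimal places.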
(No longer going to write out formulas, so just refer to the ones given in the April calculations if you're curious where things are coming from. Confidence level will stay 95% throughout.)
May: Confidence Interval = (0.724245,0.731895)
June: Confidence Interval = (0.859903,0.863502)
July: Confidence Interval = (0.725842,0.730005)
August: Confidence Interval = (0.819026,0.822765)
September: Confidence Interval = (0.781061,0.806241)
So now that we have our confidence intervals for each month, we have a few options: we can test a hypothesis for his numbers, or just take a good number within the interval. Since I care about the slope of the regression line and NOT the numbers themselves, we're just going to take the lower bound of each interval as the data point.
NOTE: There are many out there who will say that this whole thing is a waste of time due to small sample size. HOWEVER: sample size is accounted for TWICE in the Standard Error calculation, so a smaller sample leads to a wider confidence interval. Taking the lower bound actually inflates the effect of small sample size, so the slope we end up with may actually be too conservative based on sample size, but better that than the other way around.
Using excel, and graphing the month against the lower bound of the confidence interval, we get a graph that looks like this (regression line included):
So using the given equation of ŷ = 0.0101x + 0.7395, we see that, based on the data, his OPS climbs by about 0.0101 per month. We want to see how much wiggle room there is around that slope, because as you can see, the correlation coefficient isn't exactly high here.
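For the Excel-averse, here's a rough sketch of the same trend line done by hand with ordinary least squares. It assumes the five unlabeled intervals above run May through September in order, with April as month 1. The helper name is my own.

```python
def least_squares(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Month number (April = 1) against the lower bound of each month's interval
months = [1, 2, 3, 4, 5, 6]
lower_bounds = [0.740135, 0.724245, 0.859903, 0.725842, 0.819026, 0.781061]

slope, intercept = least_squares(months, lower_bounds)
print(f"y-hat = {slope:.4f}x + {intercept:.4f}")
```

Rounded to four decimals, the slope and intercept match the 0.0101 and 0.7395 from the Excel trend line.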
So, back to the nerd math:
In order to do this, we have to find each month's residual. Residuals are essentially the difference between the actual and expected value.
So, for April, we observe a value of 0.740135. However, based on the formula (in which x is month, where April is month 1), we have an expected value of 0.7496, giving us a residual of -0.009465.
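Here's a small sketch of that same residual calculation for all six months, using the rounded trend line and the lower bounds from earlier (again assuming the unlabeled intervals are May through September in order).

```python
SLOPE, INTERCEPT = 0.0101, 0.7395  # rounded trend line from the graph

month_names = ["April", "May", "June", "July", "August", "September"]
lower_bounds = [0.740135, 0.724245, 0.859903, 0.725842, 0.819026, 0.781061]

residuals = []
for x, (name, observed) in enumerate(zip(month_names, lower_bounds), start=1):
    expected = SLOPE * x + INTERCEPT       # value the trend line predicts
    residuals.append(observed - expected)  # actual minus expected
    print(f"{name}: expected {expected:.4f}, residual {residuals[-1]:+.6f}")

print(f"Sum of residuals: {sum(residuals):.6f}")
```

The sum isn't exactly zero only because the slope and intercept have been rounded to four decimals.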
To save time, I used an Excel function to calculate this, resulting in the following table:
Now we can use the residuals to calculate the variance. (They sum to 0.001112 rather than exactly zero only because the slope and intercept are rounded.)
The sum of the squared residuals divided by the sample size minus two is the formula for variance (it looks scary, sorry!). Squaring each of the six residuals and adding them up gives 0.013592.
(0.013592)/(6-2) = 0.003398
Now we can use the variance to find the standard deviation, which is just the root of variance.
sqrt(0.003398) = 0.05829
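As a sanity check, here's a sketch of the variance step the way the formula is usually applied: square each residual first, add up the squares, then divide by n - 2. It reuses the rounded trend line and the six lower bounds, so treat the output as approximate.

```python
import math

SLOPE, INTERCEPT = 0.0101, 0.7395  # rounded trend line
lower_bounds = [0.740135, 0.724245, 0.859903, 0.725842, 0.819026, 0.781061]

# Residual = observed lower bound minus the trend line's value (April = month 1)
residuals = [y - (SLOPE * x + INTERCEPT) for x, y in enumerate(lower_bounds, start=1)]

sse = sum(r ** 2 for r in residuals)    # sum of squared residuals
variance = sse / (len(residuals) - 2)   # divide by n - 2
std_dev = math.sqrt(variance)           # root of the variance

print(f"SSE = {sse:.6f}, variance = {variance:.6f}, std dev = {std_dev:.5f}")
```

Note that squaring the individual residuals is not the same thing as squaring their sum, so the order of operations matters a lot here.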
Now we can begin the calculation for the standard error of the slope.
Formula time!: SE = sqrt[ Σ(y_i - ŷ_i)^2 / (n - 2) ] / sqrt[ Σ(x_i - x̄)^2 ]
Orrrrrr we can cheat and use a calculator. Yeah, let's do that!
From the calculator, we get a value of 0.013934653.
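If you don't have a stats calculator handy, here's a rough sketch of that standard error formula in code. It refits the line itself rather than using the rounded Excel coefficients, and the function name is just something I picked.

```python
import math


def slope_standard_error(xs, ys):
    """SE of the slope: sqrt(SSE / (n - 2)) / sqrt(sum((x - x_bar)^2))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2)) / math.sqrt(sxx)


months = [1, 2, 3, 4, 5, 6]
lower_bounds = [0.740135, 0.724245, 0.859903, 0.725842, 0.819026, 0.781061]

se = slope_standard_error(months, lower_bounds)
print(f"SE of slope = {se:.6f}")
```

On these six points the function returns roughly 0.0139.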
Now we get to construct a confidence interval for the slope. This is where I'm going to get irresponsible for the sake of making a point.
I'm going to use a 99.9% confidence interval.
For those of you who haven't a clue what I just said, I'm going to radically increase our critical value to get the widest, most conservative interval I can. It's going to lead to some wonky results.
Namely, our confidence interval of (-.1098, .13012). Most statisticians would see that and immediately reject any hypotheses of a true positive regression. However, between the fact that it's a 99.9% confidence interval AND the fact that there is more ground above 0 than below it (meaning that, assuming normality, the slope is more likely to be positive than negative), I'm going to run with it.
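Here's a sketch of where an interval like that comes from. Two assumptions on my part: the critical value is the two-sided 99.9% t value with n - 2 = 4 degrees of freedom (8.610 from a t-table), and the standard error of the slope is 0.013934653, which is the value that reproduces the interval quoted above.

```python
T_STAR = 8.610         # assumed: two-sided 99.9% critical value, t with 4 df
SLOPE = 0.01016        # center of the interval
SE_SLOPE = 0.013934653  # assumed standard error of the slope

margin = T_STAR * SE_SLOPE
low, high = SLOPE - margin, SLOPE + margin
print(f"({low:.4f}, {high:.4f})")

# Share of the interval's width that sits above zero
share_above_zero = high / (high - low)
print(f"{share_above_zero:.1%} of the interval is above 0")
```

That gets within rounding of the interval above, and a bit more than half of it sits on the positive side of zero.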
I'm going to take the center of the confidence interval to create Xander Bogaerts' projected OPS by month for the rest of the year. That slope is 0.01016. We're going to set April as month 0 and jump in at June as month two, leaving the two months' worth of OPS intact.
Okay, you can breathe now. No more nerd math, I think.
Based on all those numbers, this is the graph we get for Xander's projected OPS by month (after May, since May has already occurred).
Notice that giant performance above expectation in May? That could mean one of a few things. It could mean he's unrepresentatively hot right now, or it could mean something that should tantalize us even more:
All this math? It doesn't even include development.
If that's not something to get excited about, I don't know what is. Based on Xander's career trend and April performance this year, the math suggests a possible .800 OPS in mid-July if Xander doesn't develop past day one.
The good news? The kid's 21. You add his normal monthly trends to his inevitable development? You have yourself a truly scary player, whether it's at short or third.
Speaking of which, here's a fun postscript.
Both the math and the eye say that Xander will get better with time. But wanna know something cool?
The kid already has the 42nd best OPS in the game. That might not seem that impressive, but let's break it down some more.
Bogaerts' .813 OPS is currently sixth among all left-side infielders (both SS and 3B) in the sport. All of 'em. Ahead of him are Troy Tulowitzki (by .361), Alexei Ramirez (by .030), Josh Donaldson (by .086), Nolan Arenado (by .010), and Todd Frazier (by .005).
Now, you tell me, you think the kid's gonna keep up a .425 slugging mark, good for a .129 ISO? Boy, that seems low to me.
Despite the painful math, this was really irresponsible, convoluted, presumptuous... fun.
No matter what the numbers say, I think it's pretty clear that Xander has a bright, bright future ahead of him.