FanPost

Xander Bogaerts: What does his MiLB career tell us about this year?

As we've all seen, Xander Bogaerts has been a stud this year. However, the power numbers have not been quite what people may have expected. Is that going to change?

I'm going to look at Xander's minor league monthly trends to see if we can learn something.

First up: 2010 Minor League Numbers

Xander slashed .314/.396/.423/.819 in 2010 with the DSL Red Sox.

If we break it down by month:

June: 80 PA, .328/.425/.522/.947

July: 118 PA, .287/.373/.347/.719

August: 82 PA, .338/.402/.437/.839

As we can see, each month brought very different things from Xander. He was at his best in both OBP and SLG in June, though his best average came in August.

Next, we'll take a look at 2011 MiLB numbers.

In 2011, Xander posted the following numbers (avg, obp, slg, ops) with the Greenville Drive:

.260 .324 .509 .834

Split by month, these are the numbers (same column order):

June, 65 PA:

.276 .354 .483 .837

July, 100 PA:

.207 .290 .494 .784

August, 113 PA:

.272 .319 .524 .843

September, 18 PA:

.412 .444 .588 1.033

One thing we should notice here is that while Xander's average reached its nadir in July, his ISO was off the charts, at a whopping .287 in 100 PA.
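ISO (isolated power) is simply slugging percentage minus batting average, so the July figure is easy to double-check. Here's a quick Python sketch over the 2011 monthly lines above; the `iso` helper is just an illustrative name.

```python
# 2011 Greenville monthly lines: (month, PA, AVG, SLG)
MONTHS_2011 = [
    ("June", 65, 0.276, 0.483),
    ("July", 100, 0.207, 0.494),
    ("August", 113, 0.272, 0.524),
    ("September", 18, 0.412, 0.588),
]


def iso(avg: float, slg: float) -> float:
    """Isolated power: slugging percentage minus batting average."""
    return slg - avg


for month, pa, avg, slg in MONTHS_2011:
    print(f"{month:<10} {pa:>3} PA   ISO {iso(avg, slg):.3f}")
```

July comes out to .287, the highest ISO of any 2011 month, even though it was his lowest-average month.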

Another thing we should notice is that the September numbers came in only 18 PA, so we should be cautious.

2012 MiLB Numbers

Splitting the season between Salem and Portland, these were Xander's season numbers in 2012:

.307 .373 .523 .896

Breaking it down by month again:

April, 89 PA:

.278 .360 .468 .828

May, 104 PA:

.258 .308 .381 .689

June, 120 PA:

.337 .433 .624 1.057

July, 93 PA:

.259 .337 .432 .769

August, 115 PA:

.374 .409 .673 1.082

September, 11 PA:

.364 .364 .455 .818

Again, the September numbers come from far too small a sample to count for much on their own, so we'll have to account for that when we build our final trend line.

Finally, the 2013 MiLB+MLB numbers:

Before you question why I'm including the MLB numbers, I should note that I think it's fair to include level jumps as I have in other years, as long as we know that it takes time to catch up. The nerd math later will help account for this.

In 2013, Xander played for Portland, Pawtucket, and Boston, slashing this on the year in 569 PA:

.295 .384 .470 .855

By month:

March, 4 PA:

.667 .750 1.000 1.750

April, 99 PA:

.299 .378 .402 .780

May, 121 PA:

.277 .380 .535 .915

June, 111 PA:

.305 .405 .505 .911

July, 108 PA:

.311 .426 .522 .948

August, 92 PA:

.302 .341 .384 .724

September, 34 PA:

.207 .294 .379 .673

The March numbers are, on their own, meaningless here. Since they're the only March numbers of his career, it'd be silly to include a category with 4 PA in our regression analysis, so we're saying goodbye to those March numbers.

After going back through all of Xander's monthly numbers (ignoring HBP, since that's not really under the batter's control) and combining those from every level, we get this (avg, obp, slg, ops, same order as before):

April, 188 PA:

0.289157 0.361702 0.382979 0.744681

May, 228 PA:

0.267677 0.328947 0.399123 0.72807

June, 376 PA:

0.314642 0.398936 0.462766 0.861702

July, 419 PA:

0.267409 0.346062 0.381862 0.727924

August, 402 PA:

0.321526 0.348259 0.472637 0.820896

September, 63 PA:

0.315789 0.365079 0.428571 0.793651

Now we're going to use these numbers to build a confidence interval for each month's OPS (assuming the data are normal), and then run a linear regression on those intervals.

WARNING: SERIOUS NERD MATH COMING UP.

April:

σ = sqrt((PQ)/n), where P is the month's OPS and Q = 1 - P

σ = sqrt((0.744681*0.255319)/188)

=0.03180151276

SE = σ/sqrt(n)

SE = (0.03180151276)/(sqrt(188))

= 0.00231936369

I'm going to use α=0.05.

Confidence interval @ .95 = P +/- (z* × SE), where z* is the critical value for the .95 confidence level, which comes out to 1.96

= 0.744681 +/- (1.96 × 0.00231936369)

= 0.744681 +/- 0.00454595

So the 95% confidence interval for Xander's April OPS is (0.74013504716,0.74922695283).
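The April steps above can be reproduced in a few lines of Python. This sketch follows the post's formulas exactly as written (σ from P and Q = 1 - P, then divided by sqrt(n) again to get SE); `ops_interval` is a hypothetical helper name, not an Excel function.

```python
import math
from typing import Tuple


def ops_interval(p: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Confidence interval for a month's OPS, following the steps in the post."""
    q = 1 - p
    sigma = math.sqrt(p * q / n)   # σ = sqrt(PQ/n)
    se = sigma / math.sqrt(n)      # SE = σ/sqrt(n)
    margin = z * se                # z* = 1.96 for 95% confidence
    return p - margin, p + margin


low, high = ops_interval(0.744681, 188)
print(f"April 95% interval: ({low:.6f}, {high:.6f})")
```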

May

(No longer going to write out formulas, so just refer to the ones given in the April calculations if you're curious where things are coming from. Confidence level will stay 95% throughout.)

Confidence Interval = (0.724245,0.731895)

June

Confidence Interval = (0.859903,0.863502)

July

Confidence Interval = (0.725842,0.730005)

August

Confidence Interval = (0.819026,0.822765)

September

Confidence Interval = (0.781061,0.806241)

So now that we have our confidence intervals for each month, we have a few options: we can test a hypothesis for his numbers, or just take a good number within the interval. Since I care about the slope of the regression line and NOT the numbers themselves, we're just going to take the lower bound of each interval as the data point.
NOTE: There are many out there who will say that this whole thing is a waste of time due to small sample size. HOWEVER: sample size is accounted for TWICE in the standard error calculation, so a smaller sample leads to a wider confidence interval. Taking the lower bound actually magnifies the effect of small sample size, so the slope we end up with may be too conservative, but better that than the other way around.
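Applying the same interval to every month and keeping only the lower bound gives the six data points for the regression. A Python sketch, assuming the combined monthly OPS and PA from the table above (March dropped); the printed bounds should match the intervals listed above to within one unit in the sixth decimal place. `lower_bound` is an illustrative name.

```python
import math

# Combined OPS and PA by month (March excluded)
MONTHLY = {
    "April": (0.744681, 188),
    "May": (0.72807, 228),
    "June": (0.861702, 376),
    "July": (0.727924, 419),
    "August": (0.820896, 402),
    "September": (0.793651, 63),
}


def lower_bound(p: float, n: int, z: float = 1.96) -> float:
    """Lower end of the 95% interval, using the same σ and SE steps as April."""
    se = math.sqrt(p * (1 - p) / n) / math.sqrt(n)  # sample size enters twice
    return p - z * se


lower_bounds = {month: lower_bound(p, n) for month, (p, n) in MONTHLY.items()}
for month, lb in lower_bounds.items():
    print(f"{month:<10} {lb:.6f}")
```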
Using Excel, and graphing the month against the lower bound of the confidence interval, we get a graph that looks like this (regression line included):
[Graph: lower bound of each month's 95% OPS confidence interval, with linear regression line]

So, using the fitted equation ŷ = 0.0101x + 0.7395, we see that, based on the data, his OPS trend line rises 0.0101 points per month. We want to see what the plausible range of that slope is, because as you can see, the correlation coefficient isn't exactly high here.
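Excel's linear trend line is an ordinary least-squares fit, so the equation and the correlation coefficient can be checked by hand. A minimal Python sketch using the six lower bounds with April as month 1 (`fit_line` is an illustrative name):

```python
import math

MONTHS = [1, 2, 3, 4, 5, 6]  # April = 1 ... September = 6
LOWER_BOUNDS = [0.740135, 0.724245, 0.859903, 0.725842, 0.819026, 0.781061]


def fit_line(x, y):
    """Least-squares slope and intercept, plus the Pearson correlation r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    r = s_xy / math.sqrt(s_xx * s_yy)
    return slope, intercept, r


slope, intercept, r = fit_line(MONTHS, LOWER_BOUNDS)
print(f"y-hat = {slope:.4f}x + {intercept:.4f}, r = {r:.2f}")
```

An r of about 0.34 (r² around 0.12) is why it's worth putting a range on the slope rather than trusting the point estimate.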
So, back to the nerd math:

In order to do this, we have to find each month's residual. Residuals are essentially the difference between the actual and expected values.

So, for April, we observe a value of 0.740135. However, based on the formula (in which x is month, where April is month 1), we have an expected value of 0.7496, giving us a residual of -0.009465.

To save time, I used an Excel function to calculate this, resulting in the following table:

Month Actual Expected Residual
April 0.740135 0.7496 -0.009465
May 0.724245 0.7597 -0.035455
June 0.859903 0.7698 0.090103
July 0.725842 0.7799 -0.054058
August 0.819026 0.79 0.029026
September 0.781061 0.8001 -0.019039
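The residual table can be rebuilt from the rounded trend-line coefficients. A Python sketch; the residuals only sum to 0.001112 because the coefficients were rounded to four decimals, since the residuals of an unrounded least-squares line with an intercept sum to zero.

```python
MONTH_NAMES = ["April", "May", "June", "July", "August", "September"]
ACTUAL = [0.740135, 0.724245, 0.859903, 0.725842, 0.819026, 0.781061]
SLOPE, INTERCEPT = 0.0101, 0.7395  # rounded coefficients from the trend line

residuals = []
for month_number, (name, actual) in enumerate(zip(MONTH_NAMES, ACTUAL), start=1):
    expected = SLOPE * month_number + INTERCEPT  # April is month 1
    residual = actual - expected
    residuals.append(residual)
    print(f"{name:<10} {actual:.6f} {expected:.4f} {residual:+.6f}")

print(f"Sum of residuals: {sum(residuals):.6f}")
```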

Now we can use the residuals to calculate the variance.

The formula for variance is the sum of the squared residuals divided by the sample size minus two (it looks scary, sorry!):

((-0.009465)^2 + (-0.035455)^2 + (0.090103)^2 + (-0.054058)^2 + (0.029026)^2 + (-0.019039)^2)/(6-2) = 0.013592/4 = 0.003398

Now we can use the variance to find the standard deviation, which is just the root of variance.

sqrt(0.003398) = 0.0583
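In code, the variance step squares each residual, sums the squares, and divides by n - 2 (the slope and intercept use up two degrees of freedom). A Python sketch using the residuals from the table:

```python
import math

# Residuals from the table (April through September)
RESIDUALS = [-0.009465, -0.035455, 0.090103, -0.054058, 0.029026, -0.019039]

n = len(RESIDUALS)
sse = sum(r ** 2 for r in RESIDUALS)  # sum of squared residuals
variance = sse / (n - 2)              # divide by n - 2, not n
std_dev = math.sqrt(variance)

print(f"SSE = {sse:.6f}, variance = {variance:.6f}, std dev = {std_dev:.4f}")
```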

Now we can begin the calculation for the standard error of the slope.

Formula time!: SE = sqrt[ Σ(yi - ŷi)^2 / (n - 2) ] / sqrt[ Σ(xi - x̄)^2 ]

Orrrrrr we can cheat and use a calculator. Yeah, let's do that!

From the calculator, we get a value of 0.013934653.

Now we get to construct a confidence interval for the slope. This is where I'm going to get irresponsible for the sake of making a point.

I'm going to use a 99.9% confidence interval.

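For those of you who haven't a clue what I just said, I'm going to radically increase our critical value to get the widest, most conservative interval I can (with only six data points, that's a t critical value of about 8.61 on 4 degrees of freedom). It's going to lead to some wonky results.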

Namely, our confidence interval comes out to (-.1098, .13012). Most statisticians would see that the interval includes zero and immediately conclude there's no real evidence of a positive trend. However, between the fact that it's a 99.9% confidence interval AND the fact that there is more ground above 0 than below it (meaning that, assuming normality, the slope is more likely to be positive than negative), I'm going to run with it.
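The calculator step and the 99.9% interval can be sketched together in Python. The standard error follows the formula above; the critical value is an assumption taken from a standard t table (two-sided 99.9%, n - 2 = 4 degrees of freedom, t ≈ 8.610) rather than computed, to keep the sketch free of extra libraries.

```python
import math

MONTHS = [1, 2, 3, 4, 5, 6]  # April = 1
LOWER_BOUNDS = [0.740135, 0.724245, 0.859903, 0.725842, 0.819026, 0.781061]
T_CRIT_999_DF4 = 8.610  # two-sided 99.9% t critical value, 4 df (table value)

n = len(MONTHS)
x_bar = sum(MONTHS) / n
y_bar = sum(LOWER_BOUNDS) / n
s_xx = sum((x - x_bar) ** 2 for x in MONTHS)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(MONTHS, LOWER_BOUNDS)) / s_xx
intercept = y_bar - slope * x_bar

# Standard error of the slope: sqrt[SSE / (n - 2)] / sqrt[sum of (x - x_bar)^2]
sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(MONTHS, LOWER_BOUNDS))
se_slope = math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)

margin = T_CRIT_999_DF4 * se_slope
print(f"slope = {slope:.5f}, SE = {se_slope:.6f}")
print(f"99.9% interval: ({slope - margin:.4f}, {slope + margin:.4f})")
```

The midpoint of the interval is the fitted slope itself (about 0.0101); the width comes entirely from the t value times the standard error.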

I'm going to take the center of the confidence interval as the slope for Xander Bogaerts' projected OPS by month for the rest of the year. That slope is 0.01016. We're going to set April as month 0 and jump in at June as month 2, leaving the two months' worth of actual OPS intact.

Okay, you can breathe now. No more nerd math, I think.

Based on all those numbers, this is the graph we get for Xander's projected OPS by month (after May, since May has already occurred).

[Graph: Xander Bogaerts' projected OPS by month]

Notice that giant performance above expectation in May? That could mean one of a few things. It could mean he's unrepresentatively hot right now, or it could mean something that should tantalize us even more:

All this math? It doesn't even include development.

If that's not something to get excited about, I don't know what is. Based on Xander's career trend and April performance this year, the math suggests a possible .800 OPS in mid-July if Xander doesn't develop past day one.

The good news? The kid's 21. You add his normal monthly trends to his inevitable development? You have yourself a truly scary player, whether it's at short or third.

Speaking of which, here's a fun postscript.

Both the math and the eye say that Xander will get better with time. But wanna know something cool?

The kid already has the 42nd best OPS in the game. That might not seem that impressive, but let's break it down some more.

Bogaerts' .813 OPS is currently sixth among all left-side infielders (both SS and 3B) in the sport. All of 'em. Ahead of him are Troy Tulowitzki (by .361), Alexei Ramirez (by .030), Josh Donaldson (by .086), Nolan Arenado (by .010), and Todd Frazier (by .005).

Now, you tell me, you think the kid's gonna keep up a .425 slugging mark, good for a .129 ISO? Boy, that seems low to me.

To conclude:

Despite the painful math, this was really irresponsible, convoluted, presumptuous fun.

No matter what the numbers say, I think it's pretty clear that Xander has a bright, bright future ahead of him.
