I was inspired by our fearless leaders at OTM to analyze whether a hot start to the season spells good things...for our fantasy teams! First, a few disclaimers: I only have last season's data to pull from, so the sample size is small. Across our three leagues, though, n is better than 30 (48: 16 teams x 3 leagues), so it is not terrible to begin with. More importantly, I haven't worked with stats in a couple of years, and without SPSS at the moment I only have basic online tools. So if you have expertise, please tell me if and when I make a critical error. For now I'm sticking to very basic linear regression analysis based on my limited knowledge and tools.
So my basic question is: how useful is April for gauging a team's success? To find out, I looked at the standings and points at the end of April in all three leagues, then compared them to the final standings and points.
Here is my dataset:
|Team (End of April)|Wins|Points|Team (End of Season)|Wins|Points|
|---|---|---|---|---|---|
|Fried Chick|3|1368|Bash Dash|16|6333.5|
|Bash Dash|2|1317.5|Fried Chick|12|6754.5|
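Each half of the table is sorted by its own standings, so a row does not describe a single team: the first row pairs Fried Chick's April numbers with Bash Dash's final numbers. Before running any regression, the April and end-of-season figures have to be matched by team name. Here is a minimal Python sketch, assuming the second column in each half is wins (the rows appear to be sorted by it) and using only the two rows shown above. If team names repeat across the three leagues, the key would also need to include the league.

```python
from dataclasses import dataclass


@dataclass
class Season:
    """April and end-of-season results for one team."""
    april_wins: int
    april_points: float
    final_wins: int
    final_points: float


# Rows as they appear in the table:
# (April team, April wins, April points, season team, final wins, final points).
# Only the two rows shown in the post are included here.
ROWS = [
    ("Fried Chick", 3, 1368, "Bash Dash", 16, 6333.5),
    ("Bash Dash", 2, 1317.5, "Fried Chick", 12, 6754.5),
]


def join_by_team(rows):
    """Match each team's April line with its own end-of-season line."""
    april = {r[0]: (r[1], r[2]) for r in rows}
    final = {r[3]: (r[4], r[5]) for r in rows}
    # Teams present in only one half of the table would silently pair wrongly.
    missing = april.keys() ^ final.keys()
    if missing:
        raise ValueError(f"teams missing from one half of the table: {sorted(missing)}")
    return {
        team: Season(april[team][0], april[team][1], final[team][0], final[team][1])
        for team in april
    }


teams = join_by_team(ROWS)
print(teams["Fried Chick"])
```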
First, I ran a basic linear regression to see how strongly final points correlate with final wins. I came up with an r of .7975. Pretty good! Points and wins are highly correlated, which is not a big surprise, though having the most points does NOT guarantee you the most wins by any stretch, as a look at the dataset shows.
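For anyone without SPSS, Pearson's r takes only a few lines of standard-library Python. This is a minimal sketch, assuming the two variables are held in equal-length lists ordered by team (for example, hypothetical lists `final_points` and `final_wins` built from the full 48-team dataset). On Python 3.10+, `statistics.correlation(xs, ys)` should give the same value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    # With only two points r is always +1 or -1, so require at least three.
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two equal-length lists with at least 3 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and of squared deviations for each variable.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with no variation has no correlation")
    return sxy / math.sqrt(sxx * syy)
```

Usage would look like `r = pearson_r(final_points, final_wins)`, with the team order identical in both lists.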
This shows up in the r^2 of .636, which is so-so. It means that points explain about 64% of the variation in wins. Not too surprising: other variables we might consider in the future include points against or some sort of "pitching blowups" factor.
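"Explains about 64% of the variation" is the definition of r^2 for a simple linear regression: 1 - SS_res/SS_tot, where SS_res is the squared error left over after fitting the least-squares line and SS_tot is the total squared deviation from the mean. With one predictor and an intercept, this equals the square of Pearson's r. The following minimal standard-library sketch fits the line and computes that ratio directly. As before, the input lists are assumed to be ordered by team.

```python
def r_squared(xs, ys):
    """Share of the variance in ys explained by a least-squares line on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Least-squares slope and intercept.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual (unexplained) and total sums of squares.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot
```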
Now let's do the same analysis with the April data. The predictability drops across the board: an r of .7442 indicates that points are still highly correlated with wins, but the r^2 is down to .554, so points explain less of the variation in wins. This is expected so early in the season, and you can probably pick out teams in the dataset above whose April points don't match their April wins.
But does April help explain the final standings and points? First, do wins in April explain final win totals? April wins give an r of .7394, indicating a fairly high degree of correlation. I was a little surprised by this, as I expected it to drop more. It does appear that if you do well in April, you are fairly likely to carry that high win total into September. Our r^2 is .547, though, so April win totals explain only about 55% of the variation in final win totals.
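With n = 48, any single r comes with a fair amount of uncertainty. A standard way to put a range on it is Fisher's z-transform: atanh(r) is approximately normal with standard error 1/sqrt(n - 3). This is a minimal sketch that uses the April-wins r and n = 48 from the post. It assumes the 48 teams are independent observations, and that is only roughly true, because within a league one team's win is another team's loss.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, confidence=0.95):
    """Approximate confidence interval for a Pearson r via Fisher's z-transform."""
    if n <= 3 or not -1 < r < 1:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r)            # move r onto an approximately normal scale
    se = 1 / math.sqrt(n - 3)    # standard error on that scale
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    # Transform the interval endpoints back to the r scale.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


low, high = correlation_ci(0.7394, 48)
print(f"95% CI for r: {low:.2f} to {high:.2f}")  # prints: 95% CI for r: 0.58 to 0.85
```

Because the interval stays well above zero, April wins are very unlikely to be unrelated to final wins. The interval is still wide, however, so the exact strength of the relationship is uncertain.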
Second to last, let's see whether April points correlate with final points. This is a very interesting result. I expected April points not to correlate well with final points at all. Instead, they clearly beat the win analysis with an r of .8006 and an r^2 of .641: April points are highly correlated with final points AND explain just over 64% of the variation in final points. 64.1% is still not fantastic, but it's slightly better than even the points vs. wins analysis (.636), which suggests that what you score in April might be a decent indicator of your team's overall scoring ability. Of course there are other factors to consider, like injuries or underperformance, and just ask Rogue Nine whether a bad April can bury an otherwise high-scoring team.
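To use April points as an indicator in practice, you would fit the line and read off a projection. The following minimal sketch returns the fitted line plus a rough band of two residual standard errors on either side. The list names in the usage line (`april_points`, `final_points`) are hypothetical. The band also ignores the uncertainty in the fitted line itself and assumes roughly normal residuals, so treat it as a ballpark figure.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope, intercept and residual standard error."""
    n = len(xs)
    if n < 3 or len(ys) != n:
        raise ValueError("need two equal-length lists with at least 3 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # n - 2 degrees of freedom: slope and intercept were both estimated.
    resid_se = math.sqrt(ss_res / (n - 2))
    return slope, intercept, resid_se


def project(april_value, slope, intercept, resid_se):
    """Projection with a rough +/- 2 residual-SE band (line uncertainty ignored)."""
    center = intercept + slope * april_value
    return center - 2 * resid_se, center, center + 2 * resid_se
```

For example, `project(1368, *fit_line(april_points, final_points))` would project a final total for a team that scored 1368 in April, as Fried Chick did.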
Finally, do the points you score in April help predict your final win total? This is useful for deciding what to do if your team is clunking along in April. April points turn out to predict final wins better than April wins do: the r is .761 and the r^2 is .579, compared with .7394 and .547 for April wins. That is a high degree of correlation but only a moderate r^2, so April points are a useful guide to your final win total, but exercise skepticism.
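Every comparison in this post uses a single predictor, so each r^2 is simply its r squared. Here is a quick Python check of the five reported pairs, which also prints the share of variation each one leaves unexplained. The values are taken directly from the text above.

```python
# Correlations reported in the post, keyed by (predictor, outcome): (r, r^2).
REPORTED = {
    ("final points", "final wins"): (0.7975, 0.636),
    ("April points", "April wins"): (0.7442, 0.554),
    ("April wins", "final wins"): (0.7394, 0.547),
    ("April points", "final points"): (0.8006, 0.641),
    ("April points", "final wins"): (0.761, 0.579),
}

for (x, y), (r, reported_r2) in REPORTED.items():
    r2 = r * r
    # For a one-predictor regression, r^2 is just r squared.
    assert round(r2, 3) == reported_r2, (x, y, r2)
    print(f"{x:>13} -> {y:<12} r = {r:.4f}  r^2 = {r2:.3f}  unexplained = {1 - r2:.1%}")
```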
This study is clearly limited by the lack of multivariate regression analysis, which I just don't have the tools to pull off right now, unfortunately. I would expect April's apparent predictive power to shrink once other possible causes are controlled for. Still, even with this limited dataset, the correlation between April wins or points and the final standings is much higher than I anticipated.
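The multivariate regression this post is missing can be done without SPSS. The following minimal pure-Python sketch runs ordinary least squares with several predictors by solving the normal equations (X^T X) b = X^T y. That approach is fine for a handful of predictors, though dedicated libraries use more numerically stable methods. The second predictor in the usage line, `points_against`, is hypothetical: it stands in for the "points against" variable suggested earlier and is not part of the dataset above.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def multiple_regression(columns, ys):
    """OLS fit of ys on several predictor columns; returns (coefficients, R^2).

    coefficients[0] is the intercept, followed by one slope per column.
    """
    if any(len(col) != len(ys) for col in columns):
        raise ValueError("every predictor column must match the length of ys")
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(ys))]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(ys) / len(ys)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in rows]
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return beta, 1 - ss_res / ss_tot
```

A call such as `multiple_regression([april_points, points_against], final_wins)` returns an intercept, one slope per predictor and the model's R^2. Adding a predictor can never lower the overall R^2, so the number to compare is the slope on April points. If that slope shrinks relative to a single-predictor fit, part of April's apparent effect was really the other variable.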
In conclusion, I would look at your April in the context of your own team (injuries, underperformance), but also keep these results in mind when deciding whether you can get back into it or whether it's time to retool.
P.S. I have the raw datasets on file if anyone is interested in pursuing this further.