If you follow me on Twitter, you probably see me joke about advanced statistics from time to time. Those that “know” me well know that I love to bring up the VORP statistic whenever possible, and I even had my Twitter handle as “VORP” for a little while to display my love for the statistic.
Because I often ruffle the feathers of people who are into advanced analytics, I think I might have a reputation for not valuing analytics or not understanding them. I really couldn’t care less about this reputation, even though I actually do highly value analysis and analytics.
But, I have wanted to write some words on this topic for a while now, so here goes:
I have worked for over 15 years analyzing data. In fact, I work with data and numbers all day, every day. Moreover, I often have to distill thousands of lines of data into digestible reports, tables, charts, etc. for executive-level leaders using various tools and database applications.
I have a graduate degree in business and have taken graduate-level statistics courses. During my time in grad school, I even took a Sports Analytics course with an instructor who oversaw data analytics for a professional team in the area.
Suffice it to say that I have read, studied, used, and tried to project data for a living. I am not approaching this as a complete novice.
Does any of this qualify me as an authority on the subject? Absolutely not. Is that going to stop me from opining on the topic? Absolutely not.
Do you really know what you’re talking about, or did you just watch/read Moneyball?
Perhaps my favorite thing about people that throw out “Advanced” Statistics is that they ALWAYS get upset if you challenge them or ask them questions about the statistic – or why one player ranks higher than someone else in that statistic. Why is this always the case? My strong belief is that most people that quote these statistics really don’t understand much about them (their limitations, their usefulness, who developed them and why, etc.).
Below are some questions I think you should ask yourself when someone throws out a statistic:
- Why is this stat useful?
- How is this stat calculated?
- Is this statistic skewed towards a certain type of player? Why?
- How well does this statistic isolate the player from the rest of the team?
- How well does this statistic isolate other variables? Pace? Opponent? Etc.
I know the PER stat that John Hollinger created has lost steam in recent years. Apparently, it is flawed in some way – from what I have been told on Twitter – but I have no idea why.
Regardless, below is the calculation of PER:
uPER = (1 / MP) *
[ 3P
+ (2/3) * AST
+ (2 - factor * (team_AST / team_FG)) * FG
+ (FT * 0.5 * (1 + (1 - (team_AST / team_FG)) + (2/3) * (team_AST / team_FG)))
- VOP * TOV
- VOP * DRB% * (FGA - FG)
- VOP * 0.44 * (0.44 + (0.56 * DRB%)) * (FTA - FT)
+ VOP * (1 - DRB%) * (TRB - ORB)
+ VOP * DRB% * ORB
+ VOP * STL
+ VOP * DRB% * BLK
- PF * ((lg_FT / lg_PF) - 0.44 * (lg_FTA / lg_PF) * VOP) ]
pace adjustment = lg_Pace / team_Pace
aPER = (pace adjustment) * uPER
PER = aPER * (15 / lg_aPER)
factor = (2 / 3) - (0.5 * (lg_AST / lg_FG)) / (2 * (lg_FG / lg_FT))
VOP = lg_PTS / (lg_FGA - lg_ORB + lg_TOV + 0.44 * lg_FTA)
DRB% = (lg_TRB - lg_ORB) / lg_TRB
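If you would rather poke at the formula than stare at it, here is a rough Python sketch of the same calculation. Treat it as an illustration, not an official implementation: the dictionary layout, the function names (league_terms, uper, per), and the sample numbers are all my own assumptions. It includes the made-threes (3P) term that opens the bracket in the formula as Basketball-Reference publishes it.

```python
# A minimal sketch of the PER formula above. The dict-based inputs and the
# function names are assumptions made for illustration only.

def league_terms(lg):
    """Return (factor, VOP, DRB%) computed from league totals."""
    factor = (2 / 3) - (0.5 * (lg["AST"] / lg["FG"])) / (2 * (lg["FG"] / lg["FT"]))
    vop = lg["PTS"] / (lg["FGA"] - lg["ORB"] + lg["TOV"] + 0.44 * lg["FTA"])
    drb_pct = (lg["TRB"] - lg["ORB"]) / lg["TRB"]
    return factor, vop, drb_pct


def uper(p, team, lg):
    """Unadjusted PER for one player, given player, team, and league totals."""
    factor, vop, drb_pct = league_terms(lg)
    ast_ratio = team["AST"] / team["FG"]  # share of team field goals that were assisted
    total = (
        p["3P"]
        + (2 / 3) * p["AST"]
        + (2 - factor * ast_ratio) * p["FG"]
        + p["FT"] * 0.5 * (1 + (1 - ast_ratio) + (2 / 3) * ast_ratio)
        - vop * p["TOV"]
        - vop * drb_pct * (p["FGA"] - p["FG"])
        - vop * 0.44 * (0.44 + 0.56 * drb_pct) * (p["FTA"] - p["FT"])
        + vop * (1 - drb_pct) * (p["TRB"] - p["ORB"])
        + vop * drb_pct * p["ORB"]
        + vop * p["STL"]
        + vop * drb_pct * p["BLK"]
        - p["PF"] * ((lg["FT"] / lg["PF"]) - 0.44 * (lg["FTA"] / lg["PF"]) * vop)
    )
    return total / p["MP"]


def per(uper_value, lg_pace, team_pace, lg_aper):
    """Pace-adjust uPER, then scale so the league average lands at 15."""
    aper = (lg_pace / team_pace) * uper_value
    return aper * (15 / lg_aper)


if __name__ == "__main__":
    # Made-up totals, chosen only so the arithmetic runs. Not real data.
    lg = {"AST": 2400, "FG": 4000, "FT": 1800, "PTS": 11000, "FGA": 8800,
          "ORB": 1000, "TOV": 1400, "FTA": 2300, "TRB": 4400, "PF": 2000}
    team = {"AST": 25, "FG": 40}
    player = {"MP": 36, "3P": 3, "AST": 6, "FG": 9, "FT": 5, "TOV": 3,
              "FGA": 18, "FTA": 6, "TRB": 7, "ORB": 1, "STL": 2, "BLK": 1, "PF": 2}
    u = uper(player, team, lg)
    print("uPER:", u)
    # lg_aPER has to come from running this for every player in the league;
    # the 0.3 here is a placeholder, not a real league average.
    print("PER:", per(u, lg_pace=100.0, team_pace=98.0, lg_aper=0.3))
```

Note what even this sketch makes obvious: every input is an ordinary box-score total, and the final PER depends on a league-wide average that you can only get by running the calculation for every player.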
You got all that? Of course you do. You know the limitations and usefulness of this statistic just by looking at it, right? It was the go-to statistic for a while, but now it is almost completely disregarded. Why?
Are these stats really telling us anything new?
Despite how long and drawn out the PER statistic above is, you can see that the calculation is just a derivative of a bunch of regular statistics. Weighting each item and adjusting for pace, etc. is neat and probably holds some value, but does doing all of this really tell you anything new? If you look at the top 10 in most of these advanced statistics, is it a shock to see Steph Curry, LeBron James, James Harden, etc. at the top of each of them?
Analysis vs. Analytics
Now that we have all of these fancy statistics, we should really know what to do with them, right? Can I put the top 5 VORP players together to get the best possible team that can be constructed? Or would putting them together hurt their individual VORPs? I know – you have no idea.
So, what do we do with these statistics? That brings us to the difference between Analysis and Analytics. Firstly, taking available data and scrubbing it, manipulating it, organizing it, distilling it, etc. allows you to come up with a metric or set of metrics. That is the analysis part. But, what do you do next?
This is the tricky part. It’s why most of this article has been in question format. Analytics is using all of the data and metrics that you have to try to project what will happen in the future. This involves building forecast models, including models that let you change variables and see the impact on a specific metric (most likely wins, points scored, points allowed, etc.). And then, you have to make decisions based on which models (or other data sources) you believe.
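To make "change a variable and see the impact" concrete, here is a toy what-if model in Python. It is my own illustration, not anything a front office uses. It assumes the well-known Pythagorean expectation, which estimates win percentage from points scored and points allowed. The exponent of 14 is an assumption in the range commonly quoted for the NBA, and the function names are hypothetical.

```python
# A toy "what-if" model. The Pythagorean expectation is a swapped-in,
# deliberately simple stand-in for a real forecast model; the exponent
# is a tunable assumption, not a fitted value.

def expected_win_pct(points_for, points_against, exponent=14.0):
    """Estimated win percentage from per-game points scored and allowed."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def what_if(points_for, points_against, delta_for=0.0, delta_against=0.0,
            games=82, exponent=14.0):
    """Projected change in wins over a season after nudging scoring or defense."""
    base = expected_win_pct(points_for, points_against, exponent) * games
    changed = expected_win_pct(points_for + delta_for,
                               points_against + delta_against,
                               exponent) * games
    return changed - base


if __name__ == "__main__":
    # Made-up team: scores 110 a game, allows 108.
    print("Baseline wins:", expected_win_pct(110, 108) * 82)
    print("Extra wins from +1 point scored per game:", what_if(110, 108, delta_for=1))
    print("Extra wins from 1 fewer point allowed:", what_if(110, 108, delta_against=-1))
```

The point is not the specific numbers. The model answers a question about the future, and you still have to decide whether you believe it.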
My idol, Sam Hinkie, said it best in his famous (among Sixers/NBA fans) resignation letter:
There is so much about projecting players that we still capture best by seeing it in person and sharing (and debating) those observations with our colleagues. What kind of teammate is he? How does he play under pressure? How broken is his shot? Can he fight over a screen? Does he respond to coaching? How hard will he work to improve? And maybe the key one: will he sacrifice—his minutes, his touches, his shots, his energy, his body—for the ultimate team game that rewards sacrifice? That information, as imperfect and subjective as it may be, comes to light most readily in gyms and by watching an absolute torrent of video.
I think this is my biggest gripe with the analytics people on Twitter. Save for a few, I don’t think many people really understand what to do with these statistics. How can we predict or project based on them? As an example, a guy like Robert Covington, who has been polarizing in some ways, has good advanced metrics. But would you want to have 5 Robert Covingtons in your starting lineup? Why not? Would 5 Steph Currys or 5 James Hardens on the court at the same time work? Obviously not. But why? Their advanced metrics are through the roof. Wouldn’t combining five players with very high PIPMs allow you to win just about every game?
So, why the hell did I even write this article?
I guess I just wanted to point out that it is cool to be into Analysis and Analytics. However, it’s off-putting when people look down their noses at others because they value what they see on the screen or in person vs. what a new statistic says. Or better yet, I just love when they accuse someone else of not being smart enough to understand one of these overly involved statistics. I was taught that you should observe something, then use data to test it, then build/maintain models to track and project it (if it is meaningful). Or, if you notice something from a data perspective first, test it with your eyes.
I believe that people who work in front offices are looking at data and metrics that aren’t publicly available in many cases. They are looking at things in interesting and exciting ways that will continue to change how players are evaluated and such. But, if you read Sam Hinkie’s resignation letter, you can see the humility and recognition that he displays when discussing the topic.
For example, the other day I saw a guy on Twitter, one I really like and respect, point out that a player’s PIPM was solid despite the perception that the player was under-performing. Then, a little while later, he said he had made a mistake, refreshed his Excel file, and found that the player’s PIPM was actually quite a bit lower.
You should be able to explain anomalies in any data or metric that you present – not get all upset and worked up over it. So, be ready for me to challenge you if you throw out “advanced” statistics – or be ready to block me.