Barkley and Morey battle about the value of analytics
Re: Barkley and Morey battle about the value of analytics
I agree with Mr. Kleen as to the use (and misuse) of the per-36 stat. It is a great tool that acts like an equalizer when comparing players whose actual minutes are close to one another, like starters. When it is used to compare players whose actual minutes are extremely different, like starters and bench players, it is less useful and less of a predictor.
It is absolutely useless when applied to older players whose minutes are limited for physical reasons: the older KG, Bill Walton, Kirilenko, Shaq, and others, for instance. To say that their production at 18 minutes per game would double if they played 36 minutes is ridiculous.
I have read that it is sometimes useful when applied to a young lottery-level player. Dudder probably believes this as well. (LOL) An early lottery pick often does not play starter minutes. In predicting how his stats will level out once he gains experience and starter minutes, the per-36 stat can be useful.
Per-36, per-40, per-48, per-1. No matter how you slice the minutes, it is most useful as a predictor when comparing like things and falls down a bit when comparing unlike things.
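For reference, a per-X figure is just the counting stat divided by minutes played, times X. The Python sketch below (hypothetical helper name `per_x`, made-up numbers) shows why reading per-36 as a projection is risky: the arithmetic simply assumes production scales linearly with minutes, which is the assumption that breaks down for a veteran whose minutes are capped for physical reasons.

```python
def per_x(stat_total: float, minutes: float, x: float = 36.0) -> float:
    """Scale a counting stat to a common window of x minutes (hypothetical helper)."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return stat_total * x / minutes


# Hypothetical veteran: 9 points in 18 minutes per game.
print(per_x(9, 18))      # 18.0 per 36 minutes
print(per_x(9, 18, 48))  # 24.0 per 48 minutes
# Both numbers are linear rescalings; neither says what would happen
# if this player were actually asked to play 36 or 48 minutes.
```

Changing the window from 36 to 48 changes the size of the number, not the comparison it supports.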
gyso
gyso- Posts : 23027
Join date : 2009-10-13
Re: Barkley and Morey battle about the value of analytics
I believe that there is a place for these various stat-driven evaluations, and there are just as many places for not using them. Barkley is not completely right or wrong on this topic. How he delivers his messages sometimes bothers people.
However, I would guess that most coaches have been combining both approaches to evaluation since the beginning of the game. My high school coach in the '60s certainly used the "eye" test and went with his "gut" in many situations, but his stat guy (another student) used to scratch down various things during the game, and I would total them up in first period the next morning. He had a formula of some sort into which he put all these stats, and he thought it helped him make decisions for the next practice and game.
Coach D did not have any computers or even video tape (or even film) of our games, as almost all coaches do these days, but he used whatever stats he found useful. I cannot tell you how much it made him a better coach or us a better team, but I do know that he believed in them. And I am sure that he was not the first basketball coach to do something very similar. I know that I used a similar system when I was coaching.
Who knows how much each coach believes in and will use the mountain of stats that are available, but there is a place for both the personal observation (Barkley's theory) and the stats.
wide clyde- Posts : 815
Join date : 2014-10-22
Re: Barkley and Morey battle about the value of analytics
Charles Barkley took a social-media shot from Daryl Morey and returned fire with a vengeance Tuesday night, calling the Houston Rockets general manager "one of those idiots who believes in analytics."
The Naismith Hall of Famer and TNT analyst then disavowed the widespread use of the practice in sports, saying its proponents were "a bunch of guys who have never played the game, and they never got the girls in high school."
"First of all I've always believed analytics was crap," Barkley said on TNT's postgame coverage of the Rockets' 127-118 win over the Phoenix Suns.
Whether he's right or wrong, I still love Barkley's direct take on things.
Love the quote about the 'bunch of guys who never played the game and never got the girls in high school.'
There certainly is a place for analytics in sports, however...
where the human element of competition is introduced, and desire dances with physical ability, statistics lose their value quickly if not completely.
Thanks Coach Utter.
NYCelt- Posts : 10794
Join date : 2009-10-12
Re: Barkley and Morey battle about the value of analytics
Think perhaps baseball, more than other professional sports, makes the best use of analytics. Not sure it serves much good in the NBA.
Prime example is the shift put on left-handed pull hitters, and the new commish wants to outlaw the practice. Sort of treating the players like Little Leaguers: "OK, Billy, this is where you play SS," etc.
beat
But did Sir Charles call him a "Knuckleheaaad"
beat- Posts : 7032
Join date : 2009-10-13
Age : 71
Re: Barkley and Morey battle about the value of analytics
Beat, totally agree baseball is much better suited for analytics than hoops. Good point...
Shamrock1000- Posts : 2711
Join date : 2013-08-19
Re: Barkley and Morey battle about the value of analytics
Per-36 is not predictive. Repeat: not predictive. Those who choose to use it for predictive reasons are wrong. That's not the stat's fault. It's not designed to be predictive. If I want to rank Celtics players by how well they avoid turnovers and I simply list them all in terms of number of turnovers per game, I'm penalizing the guys who play more minutes. That's wrong!!!!!
If, instead, I rank them on number of turnovers per minute, I'm doing what stats are supposed to do. I'm controlling for the greatest potential bias in the result by using the same frame of reference for each of the two players.
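To make the turnover example concrete, here is a small Python sketch with two hypothetical players and invented numbers. Ranked by raw turnovers per game, the player with more minutes looks worse; ranked by turnovers per minute, the order flips. The sketch only controls for minutes; it makes no claim about why the rates differ.

```python
# Hypothetical season lines: (turnovers per game, minutes per game).
players = {
    "Starter": (3.0, 36.0),
    "Reserve": (1.5, 12.0),
}

# Fewest turnovers first, using the raw per-game count.
by_per_game = sorted(players, key=lambda name: players[name][0])

# Fewest turnovers first, after dividing by minutes played.
by_per_minute = sorted(players, key=lambda name: players[name][0] / players[name][1])

print(by_per_game)    # ['Reserve', 'Starter']
print(by_per_minute)  # ['Starter', 'Reserve']
```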
If someone misconstrues or misuses the stat, it's not the stat's fault. It's not the statistician's fault, unless the statistician is making false claims about the stat and how it can be used (for which I personally take Hollinger to task every time I feel like ranting about something.)
• The stat does not conjecture whether fatigue or differences in the quality of competition or any other factor is the reason for the differential. People conjecture.
• The stat does not conclude that Player A is better than Player B. People draw conclusions.
• The stat does not predict what would happen to Player A's turnover rate if he played more. People make predictions.
• The stat does not suggest that the coach should change the two players' respective playing times. People make suggestions.
Essentially, a stat is a snapshot of a particular milieu within a given time or situational frame, and the milieu it's representing in this case is the dynamics of an activity (professional basketball) in which a coach decides to play some guys more and some guys less. That's reality. All the stat tries to do is to represent reality in as unbiased a fashion as possible.
Please don't write me telling me how the statistic is misused. I know how it's misused. I've cautioned countless people on how any number of basketball stats are misused. I'm a professional statistician. I'm not a keeper of those who use statistics. Don't blame the stat or those who originated it unless they're also trying to misrepresent it to the public as more than it really is. (If you've ever seen me take off on Hollinger, you'll know I practice what I preach.)
Blame those who misuse the stat. Blame those who publish convoluted "pop stats" designed to delude people into thinking the stats have credibility. Blame the reporters who cite +/- figures for individual players. Blame the columnists who persist in alluding to Hollinger's PER in evaluating players. Even worse, blame the columnists who use PER with a tiny caveat that it's largely offensive in nature (oh boy, is it offensive!), knowing the public will ignore the caveat.
I've been on record since the early BDC days as believing that basketball stats are generally over-used and misused. I believe that I made the claim one time that anyone could give me virtually any basketball stat and I could find a problem with it. (If I did make that claim, I'm glad people didn't take me up on it, because I get too little sleep as it is.) Many of you will recall hearing me use the term "context" over and again in regard to stats. Disregard of the context(s) within which a given stat is used is a huge problem. For example, believing that the per-X stat is predictive is evidence of a disregard of context.
Anyway, people are obviously free to use whatever stats they wish. But, if you find yourself disliking the results and are prone to blaming the construct of the stat itself, try diverting your attention to the conclusions or interpretations, who made them, and whether they're truly justified.
Sam
Re: Barkley and Morey battle about the value of analytics
Sam,
Why do you continually treat the per-minute stat as the bee's knees and put down the per-36? Mathematically, the ratio between two players' per-1 stats is exactly the same as the ratio between their per-36 stats. Math is math. I assure you that I fully understand math. The per-1 just offers up smaller numbers, but the ratio of the smaller numbers is the same as the ratio of the larger numbers. If Player A has 25% more turnovers per unit time than Player B, that holds whether the unit is 1 minute or 36.
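The ratio point is easy to check numerically. In the Python sketch below, the per-minute rates are invented so that Player A turns it over 25% more often than Player B; rescaling both to any common window (1, 36, 40, or 48 minutes) leaves the ratio unchanged, because the window cancels out of the division. The helper name `rescale` is hypothetical.

```python
# Hypothetical per-minute turnover rates (not real player data).
a_per_minute = 0.10
b_per_minute = 0.08


def rescale(rate_per_minute: float, window: float) -> float:
    """Convert a per-minute rate to a per-`window`-minutes rate."""
    return rate_per_minute * window


for window in (1, 36, 40, 48):
    ratio = rescale(a_per_minute, window) / rescale(b_per_minute, window)
    print(window, round(ratio, 6))  # prints 1.25 for every window
```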
You yourself said "I'm controlling for the greatest potential bias in the result by using the same frame of reference for each of the two players."
The frame of reference is a stat per some unit of time. Per-1 is just as useless as per-36 (or per-48 or per-40 or per-million) if the user of the tool doesn't understand its limitations.
Besides, not many players play only one minute, but many starters play 36 minutes (plus or minus). That is why many use per-36.
gyso
gyso- Posts : 23027
Join date : 2009-10-13
Re: Barkley and Morey battle about the value of analytics
Gyso - I don't want to speak for Sam, but I would like to add my two cents. When using a stat like per-36 to compare players, the underlying assumption in applying this stat is that the game conditions which two (or more) different players experience in a given time frame are approximately equal. If this condition is not met, then the per-36 stat loses meaning. For example, a bench player might have an inflated per-36 because he plays in garbage time. He is playing against inferior competition, and thus it becomes easier to produce. Additionally, a "star" might be asked to take more difficult, lower-percentage shots, whereas a rookie getting six minutes a night might be more inclined to take only wide-open shots. Furthermore, he is well rested for the few minutes he plays, and thus has more energy to execute basketball-related tasks. Obviously I am over-simplifying things, but hopefully you get the point. Nearly the exact same arguments could be made for plus/minus numbers. It is essential to understand the assumptions underlying any mathematical model, be it statistical or otherwise. This may be what Sam means by "context", but I will leave it to Sam to clarify.
I don't think anyone would argue stats are useless, but neither are they magical tools that reveal basketball-worthiness that is otherwise invisible. Player evaluation ain't easy; otherwise we would all be scouts/GMs/coaches rather than (or in addition to) posting here.
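The garbage-time example can be illustrated with a hypothetical split. The Python sketch below assumes, purely for illustration, that a reserve's minutes can be divided into competitive and garbage-time stretches (the numbers and the `splits` layout are invented). The pooled per-36 figure blends two very different rates, which is the kind of context a single number does not show.

```python
def per_36(points: float, minutes: float) -> float:
    """Points scaled to 36 minutes."""
    return points * 36 / minutes


# Hypothetical reserve, minutes split by game situation:
# situation -> (minutes, points)
splits = {
    "competitive": (8.0, 4.0),
    "garbage": (4.0, 6.0),
}

total_minutes = sum(minutes for minutes, _ in splits.values())
total_points = sum(points for _, points in splits.values())

print("pooled", per_36(total_points, total_minutes))  # 30.0
for situation, (minutes, points) in splits.items():
    print(situation, per_36(points, minutes))  # competitive 18.0, garbage 54.0
```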
Shamrock1000- Posts : 2711
Join date : 2013-08-19
Re: Barkley and Morey battle about the value of analytics
gyso, the real reason I use the per-minute stat is to deflect the criticism the concept draws because the figure that happens to be used is 36. Apparently, the figure 36, being a figure that appears frequently in box scores, fixates some people on the myth that the 36 somehow represents a durability test or something of the sort. They don't "get" the fact that the 36 is just an arbitrary constant used in the equation to put all players on the same mathematical plane.
I've explained, in this very thread, that all you have to do is divide the per 36 minute stat by 36 and you get the same relative result. Look back at my previous posts on this thread, and you'll see where I did that. What you don't get with the per minute stat is the constant reinforcement to some people of their misconception concerning the implications of the figure 36 when it's only a common divisor.
If you think I've ever put down the per-36 stat in terms of its validity, you're wrong. The only PER stat I put down is Hollinger's abortion, which is an entirely different thing. I just think that arbitrary per-36 winds up misleading more people than it enlightens because it is just realistic enough to encourage them to jump to the wrong conclusion about the intent of the stat. If you don't believe it misleads more than it enlightens, just reread this thread. I find that using per-1 at least makes many people stop and think conceptually rather than so literally.
Sam
I'm speaking only of the validity of the statistic per se, whether it's per-1 or per-36 or per-whatever, due to the way it controls for a potential bias.
I asked that people not write me about the misuse of the stat by unenlightened people. So perhaps you'll pardon my not responding to your comments about the "useless"ness of the stat "if the user of the tool doesn't understand its limitations."
I choose not to be the Johnny Appleseed of statistics, sowing the seeds of statistical clarification far and wide. However, I'm glad you recognized my attempt to do my little part of clarification by removing the 36 from the equation and replacing it with 1 (and I believe I also said 200 or something).
Re: Barkley and Morey battle about the value of analytics
Shamrock, thanks for the comments. Actually, the underlying assumption of the per-whatever stat is not that "the game conditions which two (or more) players experience in a given time frame are approximately equal." There's no assumption of anything in the construction of the stat. It's just a calculation of figures designed to put players on the same frame of reference. It doesn't assume anything. It doesn't predict anything. It just controls for the fact that the stats of different players may be influenced by the number of minutes they play—a likelihood that I believe would be borne out by statistics.
Of course there are many circumstances in basketball that can affect a player's production. What the per-whatever stat is saying is: "Here's a normalized calculation that tells you what various players produce during the conditions under which they play. This stat is not intended to prove or represent anything about the different conditions."
If one wants to take these raw per-whatever data and massage them with other factors that one believes represent extenuating circumstances in a given player's performance, one is free to do so. That's why God invented the phrase "It should be noted that....." Statisticians use that one a lot. You'll see it in my posts. It takes a little more work than just allowing a stat to stand naked by itself while assuming it means whatever one wants it to mean, but there's nothing wrong with presenting some clarifying qualifiers: "Young averages 333% as many made threes per minute as Turner, but it should be noted that Young averages 385% more attempted threes per minute than Turner, and James is not called upon for nearly as great a variety of taxing contributions or minutes as Evan."
Suppose I said on the board that Turner has a much higher ppg average than Young. How many people do you think would wrestle with their conviviality genes to keep from responding, "Of course he does, dummy. He plays a lot more minutes per game." Failing to account for that outrageously obvious fact would be a cardinal statistical sin. Accounting for it is exactly what the per-whatever stat does.
There are many other factors that could also be incorporated into the formula if stats could be found for them. For instance, how about a wear and tear factor? That would presumably answer questions about differences in playing time. But nay, nay. Would minutes be a valid "wear and tear" factor in comparing, for example, Sully with John Havlicek? (Even at their respective ages now.) Nay, nay. You'd need to introduce some more highfalutin' control factor. Or how about incorporating the length or difficulty of players' commutes? Yeah, that's silly, but I could think of hosts of realistic considerations that could be added. In fact, I included 21 of them in a paragraph that I deleted out of respect for everyone's boredom level. Differences in competition. Differences in pressure situations faced. Differences in the competence of teammates. Blah, blah, blah.
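One wrinkle in qualifiers like the one above: "X% as many" and "X% more" are different comparisons, and "X% as many" equals "(X minus 100)% more". The Python helpers below (hypothetical names, invented inputs; they do not recompute the Young/Turner figures) show the two conversions side by side.

```python
def pct_as_many(a: float, b: float) -> float:
    """A expressed as a percentage of B ("X% as many")."""
    return 100 * a / b


def pct_more(a: float, b: float) -> float:
    """How much larger A is than B, in percent ("X% more")."""
    return 100 * (a - b) / b


# Invented made threes per 100 minutes for two players.
made_a, made_b = 10, 3
print(round(pct_as_many(made_a, made_b), 2))  # 333.33
print(round(pct_more(made_a, made_b), 2))     # 233.33
```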
Any basketball statistic could be subjected to so many splits that you'd have 100 available stats just concerning assists. That's foolish. What good basketball stats do is to try to control for the most obvious biases and let 'er rip. The per-whatever stat is a perfect example.
Sam
Re: Barkley and Morey battle about the value of analytics
I love this board. Could you imagine having a conversation this intellectual on another sports board? I can't. Is it a bit dry? Yeah, like a bone, but it cuts to the heart of how we (i.e., sports fans) compare players. That inevitably leads to the inclusion of extraneous factors such as salaries (are they worth the money?) while excluding relevant comparisons like style of play (e.g., Rondo in an up-tempo system vs. a half-court one), and it affects all our other conversations about trades. In other words, it is an important topic.
bob
bobheckler- Posts : 62620
Join date : 2009-10-28
Re: Barkley and Morey battle about the value of analytics
sam wrote: Of course there are many circumstances in basketball that can affect a player's production. What the per-whatever stat is saying is: "Here's a normalized calculation that tells you what various players produce during the conditions under which they play. This stat is not intended to prove or represent anything about the different conditions."
sam wrote: Differences in competition. Differences in pressure situations faced. Differences in the competence of teammates. Blah, blah, blah.
In these two passages you basically define my problem with the per-whatever stat.
If you as a scientist or researcher or statistician come up with a finding or equation that, mathematically speaking, is perfectly valid, yet every day in all kinds of discussions your stat is regularly and repeatedly misused and misrepresented, how perfect can your stat actually be? Isn't part of scientific discovery not just discovering a new way to look at something, but ensuring that your findings have practical utility in the real world?
We could slow climate change and ozone problems by telling people to "stop driving," but clearly that isn't going to happen. So a realistic scientific response to climate change cannot end with "...well, we told you all to stop driving and you didn't, so whatever happens to the environment going forward is not our problem." Scientists who understood practical utility did not leave it with a "take it or leave it" attitude toward that finding; they dug deeper, looking at how much improvements in fuel efficiency and emissions could help over the long run. They didn't stop with "Blah, blah, blah"; they found a way to apply their findings to the real world.
And in the end, that is my only point. Even if a stat was created using sound mathematical and statistical principles, if its practical application is misunderstood and people on TV, radio and in newspapers are misusing it every single day, how successful can your stat actually be?
mrkleen09- Posts : 3873
Join date : 2009-10-16
Age : 55
Re: Barkley and Morey battle about the value of analytics
I enjoyed reading this thread too, although I'm not a real stat guy and not smart enough to make a valid point either way.
cowens/oldschool- Posts : 27707
Join date : 2009-10-18
Re: Barkley and Morey battle about the value of analytics
Mrkleen,
"...a finding or equation that is perfectly valid." There's no such thing. As I said before, miles per gallon is a widely accepted way to compare fuel expenditures in various cars. Yet, shouldn't the gas mileage of Subarus be calculated differently than Fords in order to allow for the fact that Subarus (with sales concentrated largely in cold-weather climes) are subjected, on average, to tougher driving conditions? OOoooh, I may never buy my 14th Subaru because there's a flaw in that stat.
Shouldn't MPH wind forecasts incorporate a factor for the duration of the wind because, after all, isn't one of the main uses of the wind stat devoted to estimating how much threat of damage or outages an anticipated wind involves? And isn't duration of the wind a factor? OOoooh, I may never watch another weather forecast because there's a flaw in that stat.
Name me a basketball stat that isn't, as you say, "regularly and repeatedly misused and misrepresented," and I'll very likely be able to cite one or more variables that should be controlled in an effort to attempt "perfection" (which never occurs) but are not controlled in existing formulas. Name one of Hollinger's stats, and I'll dance the little known "Meringue of the Flaws" on Hollinger's head.
Statisticians are not necessarily missionaries, spreading the Gospel on how to use their stats. I've already said where the blame for misuse lies. How is it that people have access to basketball stats that are "repeatedly and regularly misrepresented?" Do statisticians also own publishing houses?
Who should be the ultimate judge of whether a given basketball statistic has "practical application in the real world"? You? Well, then, why don't you also take on the side job of touring the country and making a real contribution to society by teaching, in effect, a "how to use statistics" course? Tell people how political polls are often used to change their opinions rather than to measure them.
Usually, when a statistician (even Hollywood, or whatever his name is) comes up with a formula, (s)he also produces a methodology which, among other things, includes a "margin of error." Do you think the margin of error is cited just for kicks? No. It (along with a methodological appendix) is designed to acquaint people with the details of the stat, including (1) points of caution in using it and (2) notification about the presence and size of the margin of error. If the media who publish the stat elect not to incorporate the methodological summary, is the statistician to blame?
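To make the margin-of-error point concrete, here is a minimal Python sketch, assuming the common normal-approximation (Wald) interval for a proportion such as a three-point percentage. The function name `margin_of_error` and the example shooting lines are hypothetical, and z = 1.96 assumes a 95% confidence level.

```python
import math


def margin_of_error(made: int, attempts: int, z: float = 1.96) -> float:
    """Normal-approximation (Wald) margin of error for a shooting percentage.

    z = 1.96 corresponds to roughly 95% confidence. The approximation is
    rough for small attempt counts or percentages near 0 or 1.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p = made / attempts
    return z * math.sqrt(p * (1 - p) / attempts)


# Hypothetical: the same 35% shooting on 20 attempts and on 200 attempts
for made, att in [(7, 20), (70, 200)]:
    moe = margin_of_error(made, att)
    print(f"{made}/{att}: {made / att:.1%} +/- {moe:.1%}")
```

The same 35% is far less settled on 20 attempts than on 200, because the margin shrinks with the square root of the number of attempts. The Wald interval is known to be rough for small samples, so treat this as an illustration rather than a recommended method.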
The statistician may also include statistical tests of how much variation in one statistic, called a "dependent variable" (say, points per game), is explained by the variation in another stat, called an "independent variable" (say, minutes played per game). If the statistician concludes that not enough variation in the dependent variable is explained by variation in the independent variable, (s)he may toss the whole exercise or go back and add more independent variables to the formula in order to explain more of the variation in the dependent variable (in the case I just cited, points per game). The objective is to achieve 100% explained variation in the dependent variable via the combination of independent variables in the formula. The next 100% explanation of variation could well be the first in history, but many important business decisions are made based on formulas that explain only a portion of the variation in the dependent variable, a portion deemed acceptable by those using the stat.
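The idea of "explained variation" for a single independent variable can be sketched in Python with ordinary least squares. This is a minimal example, assuming one independent variable; the function names and the minutes/points lists are hypothetical illustrations, not the Celtics data discussed in this thread.

```python
def fit_line(x, y):
    """Ordinary least squares with one independent variable: y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def explained_variation(x, y):
    """R^2: share of the variation in y (dependent) explained by x (independent)."""
    a, b = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_resid / ss_total


# Hypothetical per-game averages for six players
minutes = [10, 15, 20, 25, 30, 35]
points = [3, 8, 6, 12, 11, 17]
print(f"R^2 = {explained_variation(minutes, points):.2f}")
```

A value of 1.0 would mean minutes alone account for every bit of the variation in scoring; anything less leaves room for other independent variables.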
And between judgment and evaluative stats such as tests of explained variation, I'm sure that some statistician, somewhere, has justified the use of the per-minute (or per-36-minute) stat for certain basketball analyses. Perhaps (s)he even included more variables than just minutes per game in the equation and found that, of all the variables used, minutes played had a far greater influence on points scored than any of the others. So perhaps, instead of explaining (let's say) 85% of the variation in scoring with a combination of all variables, 79% could be explained by minutes alone. As they say, "...good enough for government work."
It's very easy for one to sit back and pontificate that perfection should characterize everything in the universe, but unfortunately I think that may not be the case. In statistics, you do your best with what is available to you, and then you decide whether it meets a statistical and/or judgmental criterion for validity.
One could apply your concern about the validity of the per-minute or per-36-minute stat to virtually every basketball stat. Let's take the simple measure of assists. Why doesn't the calculation of assists include a factor that allows for the difference between the criteria for awarding assists in the league's earlier years and the criteria in use now? People "repeatedly and regularly misrepresent the difference" between Stockton's and Cousy's assist totals by not considering that very essential factor. Personally, Mrkleen, I think you should organize an anti-assist-usage movement for reasons of statistical invalidity. Could you please do that for me?
Sam
Re: Barkley and Morey battle about the value of analytics
Mrkleen,
Remember when I said that some statistician somewhere must have done some test that convinced him that the per-X stat explains more variation in performance than any other single stat?
Well, now you know a statistician who did just that.
Me.
I did a correlation analysis on the 18 players who have appeared in a Celtics uniform this season. And I'm not interested in hearing "small sample" cries. It's not a sample. It's a census of every single Celtics player to appear in at least one game this season. Not one minute played this season for the Celtics is unaccounted for. If you'd like to replicate the exercise for every team in the league, perhaps over 35 or 40 seasons, I'll give you the formula.
In a correlation exercise, a result (called a "coefficient of correlation") of +1.0 represents perfect correlation between two variables (I've never seen one); zero represents no correlation; and -1.0 represents perfect inverse (negative) correlation (I've never seen one of those either).
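The coefficient described above is most commonly Pearson's r, which measures linear association between two variables. A minimal Python sketch, assuming plain lists of equal length (the function name `pearson_r` is hypothetical):

```python
import math


def pearson_r(x, y):
    """Pearson coefficient of correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfect positive: 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfect inverse: -1.0
```

Real data land somewhere between the two extremes, which is why neither +1.0 nor -1.0 shows up in practice.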
In a correlation exercise, there are two kinds of variables: independent variables (in this case, minutes on the floor), which are tested to determine how much influence they have, and dependent variables (the variable, such as scoring, that may or may not prove to be influenced by one or more of the independent variables). In this test, I used only two variables: points scored per game (dependent) and minutes played per game (independent).
In well over 500 research studies in which I've used this statistical test in business, social science and sports research, it's unusual to arrive at a coefficient of correlation substantially higher than +0.50. That's because most phenomena worth testing are complex enough that the usual result is one dominant influence and a host of lesser influences fighting over the remaining bits and pieces of the explained variation.
The correlation coefficient relating scoring to minutes played was +.645. That's not a percentage; it's a coefficient. And it's considered quite high. What it means is that, whatever less important variables might be fighting it out for the dregs of the remaining +.355, it's extremely likely that none of them could have a correlation coefficient remotely approaching the +.645 influence of minutes on scoring.
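One caution on reading the coefficient: in standard usage, the share of variance in the dependent variable that a single independent variable explains is the square of the correlation coefficient (r squared), not r itself. A quick Python check using the .645 figure from the post:

```python
r = 0.645                    # minutes vs. points per game, as reported in the post
r_squared = r ** 2           # share of variance in scoring explained by minutes
unexplained = 1 - r_squared  # share left over for every other influence and noise
print(f"r^2 = {r_squared:.3f}, unexplained = {unexplained:.3f}")  # r^2 = 0.416, unexplained = 0.584
```

On that reading, minutes account for roughly 42% of the variance in scoring across these players, with the rest split among other influences and noise.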
As I said before, the per-whatever stat attempts to control the single most influential factor on scoring, assists, rebounds, or practically any other stat on which players are evaluated. By "control," I mean (in this case) that the per-whatever statistic attempts to minimize the most serious bias that could creep into conclusions based on using the stats-per-game measure.
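The per-36 normalization itself is simple arithmetic. A minimal Python sketch (the function name `per_36` and the example numbers are hypothetical); note that it assumes production scales linearly with minutes, which is exactly the assumption questioned earlier in the thread for minutes-limited veterans:

```python
def per_36(stat_total: float, minutes_total: float) -> float:
    """Scale a season counting stat to a per-36-minute rate.

    Assumes production scales linearly with minutes played.
    """
    if minutes_total <= 0:
        raise ValueError("minutes_total must be positive")
    return stat_total * 36 / minutes_total


# Hypothetical: very different minutes, same per-minute scoring rate
starter = per_36(1440, 2400)  # 1440 points in 2400 minutes
reserve = per_36(180, 300)    # 180 points in 300 minutes
print(f"starter {starter:.1f}, reserve {reserve:.1f} points per 36")
```

Both players come out at 21.6 points per 36; the stat itself says nothing about whether the reserve could sustain that rate over starter minutes.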
It's ironic that the more influences anyone can think of that might affect scoring, the more variables there are to chop up relatively tiny pieces of the other +.355 of the correlation coefficient... and the more reason there is not to go to the trouble of adding another secondary variable that really won't explain a meaningful incremental amount of the variation in the dependent variable.
There's another reason for using the most dominant statistic. It involves "intervening variables." Let's say you wanted to introduce an independent variable that investigates how a player's ability level (however you want to measure it) influences his scoring. That should be a no-brainer of an independent variable. Right? Wrong! Because it's highly likely that the player's ability level influences the minutes he's given. The variation measured by talent level is largely duplicated by the per-X stat; and the per-X stat accomplishes a lot more too. The formulas for explained variation don't simply measure how much variation is reflected by a given independent stat. The formula identifies the single most influential stat (in this case, minutes) and then (if other independent variables were part of the exercise, which they were in the one I did), it evaluates the influence of the secondary variables based not on how much influence they have but rather on how much incremental influence they have that is not duplicated by the primary independent variable—in this case, minutes.
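The sequential logic described above (enter the dominant variable first, then credit a secondary variable only with the variation it adds) resembles what is usually called hierarchical or sequential regression. A minimal pure-Python sketch, assuming ordinary least squares with at most two independent variables; the function names and the "talent" rating are hypothetical illustrations, not the variables in Sam's own exercise:

```python
def _center(v):
    """Subtract the mean so intercepts drop out of the algebra."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def r2_one(x, y):
    """R^2 of y regressed on a single predictor x."""
    xc, yc = _center(x), _center(y)
    return _dot(xc, yc) ** 2 / (_dot(xc, xc) * _dot(yc, yc))


def r2_two(x1, x2, y):
    """R^2 of y regressed on two predictors (solves the 2x2 normal equations)."""
    a, b, yc = _center(x1), _center(x2), _center(y)
    s11, s22, s12 = _dot(a, a), _dot(b, b), _dot(a, b)
    s1y, s2y = _dot(a, yc), _dot(b, yc)
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return (b1 * s1y + b2 * s2y) / _dot(yc, yc)


def incremental_r2(primary, secondary, y):
    """Extra variation explained by `secondary` beyond what `primary` explains."""
    return r2_two(primary, secondary, y) - r2_one(primary, y)


# Hypothetical data for seven players
minutes = [8, 12, 18, 22, 28, 33, 36]
talent = [3, 4, 5, 5, 7, 8, 9]  # hypothetical 1-10 rating
points = [2, 5, 7, 9, 13, 16, 19]
print(f"minutes alone:     {r2_one(minutes, points):.3f}")
print(f"added by talent:   {incremental_r2(minutes, talent, points):.3f}")
```

When the secondary variable moves closely with the primary one, the increment tends to be small, because most of what it could explain is already claimed by the primary variable.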
Anyway, I'm going to stop before we get zapped by the big power man in the sky. Stay safe and warm, everyone. Good luck to BobH on your way home.
Sam
Re: Barkley and Morey battle about the value of analytics
Sam,
Quite a dissertation. Thanks.
2 more days on a beach (Koh Chang, Thailand), back to Bangkok for a day and a half and then I'm bingo SFO.
bob
bobheckler- Posts : 62620
Join date : 2009-10-28
Re: Barkley and Morey battle about the value of analytics
Bob,
Thanks.
You'll have time to deal with jet lag before the Celtics begin the stretch run or meltdown or whatever is going to happen.
I have no idea whether or not I'll be on the board during the next few days, as they're predicting power outages on the Cape. I'm sure my absence would give everyone some relief from my diatribes on statistics.
Safe travels,
Sam
Re: Barkley and Morey battle about the value of analytics
I think I mentioned it before, but it could be interesting to implement some version of the chess Elo or Glicko rating systems for different aspects of basketball, since that would involve not only what you do (score, rebound, steal) but also who you do it against. The problem would be deciding who is the winning and losing actor in each instance.
swedeinestonia- Posts : 2153
Join date : 2009-10-17
Age : 44
Re: Barkley and Morey battle about the value of analytics
Swede,
Sounds interesting. Could you walk us through it? As I understand it, those systems provide competition-specific contexts for evaluating players on an ongoing basis. I especially like the ongoing updating, because I usually see (and use myself) season-long stats in discussions, and I would favor updating based on splits over shorter intervals than the usual pre-All-Star vs. post-All-Star comparisons. A sort of "What have you done for me lately?" approach could be illuminating, especially in cases like Marcus Smart's three-point shooting percentage. I know this is not exactly what you are proposing, but the two ideas seem related.
Sam
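A "what have you done for me lately" split can be sketched as a rolling window. A minimal Python example, assuming a chronological list of (made, attempted) three-point totals per game; the function name, the 10-game default window, and the sample game log are hypothetical:

```python
from collections import deque


def rolling_pct(games, window=10):
    """Shooting percentage over the most recent `window` games.

    `games` holds (made, attempted) tuples in chronological order. Makes and
    attempts are pooled across the window rather than averaging per-game
    percentages. The entry for a game is None if the window has no attempts.
    """
    recent = deque(maxlen=window)  # the oldest game drops off automatically
    out = []
    for made, attempted in games:
        recent.append((made, attempted))
        m = sum(g[0] for g in recent)
        a = sum(g[1] for g in recent)
        out.append(m / a if a else None)
    return out


# Hypothetical game log: a cold start followed by a hot stretch
log = [(0, 4), (1, 5), (0, 3), (2, 5), (3, 6), (3, 5), (4, 7)]
for game, pct in enumerate(rolling_pct(log, window=3), start=1):
    print(game, "n/a" if pct is None else f"{pct:.1%}")
```

Pooling makes and attempts keeps a 1-for-1 night from counting as much as a 5-for-10 night; a shorter window reacts faster but is noisier.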
Re: Barkley and Morey battle about the value of analytics
They are both rating systems. Those two are generally used for chess or other 1v1 games and can be used both to rate the players and to predict outcomes. If a 2000-rated player competes against a 1700-rated player and wins, he might gain +3 points (the figures are just guesses, but they give the general idea), which the other player loses. They play each other again and, for whatever reason, the now 1697-rated player beats the 2003-rated player and gains +30 points (1727 and 1973). If the two players had the same rating, they would win or lose the same amount, but if the ratings differ, the stakes differ too. There are also mechanisms that make a rating more volatile the fewer games a player has played, so that new players find their "true rating" faster.
It is sort of a more advanced +/- statistic that also takes into account who you are competing against. In that case, a 75% win ratio against bench players is not worth as much as a 75% win ratio against starters.
http://en.wikipedia.org/wiki/Elo_rating_system
http://en.wikipedia.org/wiki/Glicko_rating_system
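The Elo update described above can be sketched in a few lines of Python. This follows the standard Elo formulas (logistic expected score on a 400-point scale); the K-factor of 32 is an assumed, commonly used value, and the function names are hypothetical. Glicko adds a per-player rating deviation, which this sketch leaves out.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Elo expected score for player A against player B (between 0 and 1)."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32):
    """Return new (r_a, r_b) after one game.

    score_a is 1 for an A win, 0 for a loss, 0.5 for a draw. The exchange is
    zero-sum: whatever A gains, B loses.
    """
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


a, b = elo_update(2000, 1700, 1)  # favorite wins: small gain
print(round(a, 1), round(b, 1))
a, b = elo_update(a, b, 0)        # upset: much larger swing
print(round(a, 1), round(b, 1))
```

With K = 32 the favorite gains only about 5 points for the expected win, while the upset moves about 27 points, the same pattern as the guessed figures in the post.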
Those two systems are used for 1v1 games such as chess and Go, but they could also be applied to tennis and similar 1v1 situations. So one might think they could be applied to rebounding or similar situations where two players compete for the same objective. That would not account for team efforts or credit people for things like boxing out, though, which raises the question of how to rate people performing in a multiplayer environment: accounting not only for a person's direct 1v1 skills but also for his ability to contribute to 1v1 contests from a 5v5 perspective (maybe someone is a monster at boxing out).
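As a rough illustration of the rebounding idea, each contested rebound could be treated as a 1v1 game between the two players competing for it, with the player who secures the ball scored as the winner. This Python sketch assumes that framing (which, as the post points out, ignores boxing out and other team contributions); the player labels, contest list, K = 32, and the 1500 starting rating are all hypothetical:

```python
from collections import defaultdict


def expected_score(r_a, r_b):
    """Elo expected score for A against B."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def rate_contests(contests, k=32, start=1500):
    """Run Elo over chronological (winner, loser) rebound contests."""
    ratings = defaultdict(lambda: start)  # unseen players start at `start`
    for winner, loser in contests:
        delta = k * (1 - expected_score(ratings[winner], ratings[loser]))
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


# Hypothetical contests: (player who got the board, player who lost it)
contests = [
    ("Center A", "Forward B"),
    ("Center A", "Guard C"),
    ("Forward B", "Center A"),
]
for player, rating in sorted(rate_contests(contests).items(), key=lambda kv: -kv[1]):
    print(f"{player}: {rating:.0f}")
```

The hard part Swede identifies, deciding who actually won each contest, is hidden in how the `contests` list gets built; the rating math itself is the easy step.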
Online computer and video games have ranking/rating systems that rate players and their performance/effect on a team of "randoms," which they also use for matchmaking, finding you opponents at your level. Microsoft TrueSkill is one such system:
http://en.wikipedia.org/wiki/TrueSkill
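For a flavor of how TrueSkill differs from Elo, here is a simplified Python sketch of its two-player update for a single decisive game, where each player carries a mean skill (mu) and an uncertainty (sigma). It assumes the commonly cited defaults (mu = 25, sigma = 25/3, beta = sigma/2) and deliberately leaves out draws, teams, and the between-game dynamics term; the full system handles teams of any size, which is the part that would matter for 5v5 basketball. Function names are hypothetical.

```python
import math

MU, SIGMA = 25.0, 25.0 / 3  # commonly cited default prior for a new player
BETA = SIGMA / 2            # assumed per-game performance noise


def _pdf(t):
    """Standard normal density."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


def _cdf(t):
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))


def update_1v1(winner, loser, beta=BETA):
    """Simplified TrueSkill update for one decisive 1v1 game (no draws, no tau).

    Each player is a (mu, sigma) pair; returns the updated (winner, loser).
    """
    (mu_w, s_w), (mu_l, s_l) = winner, loser
    c = math.sqrt(2 * beta ** 2 + s_w ** 2 + s_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)  # how far the means move
    w = v * (v + t)        # how much the uncertainties shrink (0 < w < 1)
    new_w = (mu_w + s_w ** 2 / c * v, s_w * math.sqrt(1 - s_w ** 2 / c ** 2 * w))
    new_l = (mu_l - s_l ** 2 / c * v, s_l * math.sqrt(1 - s_l ** 2 / c ** 2 * w))
    return new_w, new_l


new_player = (MU, SIGMA)
winner, loser = update_1v1(new_player, new_player)
print(f"winner mu={winner[0]:.2f} sigma={winner[1]:.2f}")
print(f"loser  mu={loser[0]:.2f} sigma={loser[1]:.2f}")
```

Both players' sigma shrinks after the game, so later results move their ratings less; that is the "find the true rating faster" behavior described earlier in the thread.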
Don't know if this was interesting to anybody or if people can decode the content in those links, but I try to contribute where I can.
swedeinestonia- Posts : 2153
Join date : 2009-10-17
Age : 44
Re: Barkley and Morey battle about the value of analytics
Swede,
For my part, I think it could have validity for basketball as long as it doesn't credit individual players with statistics that are earned by groups of players. It would be interesting to see whether other members are interested.
And your contributions are always appreciated.
Sam
Re: Barkley and Morey battle about the value of analytics
"Lies, damned lies, and statistics" is a phrase describing the persuasive power of numbers, particularly the use of statistics to bolster weak arguments. It is also sometimes colloquially used to doubt statistics used to prove an opponent's point.
The term was popularised in the United States by Mark Twain (among others), who attributed it to the 19th-century British Prime Minister Benjamin Disraeli (1804–1881): "There are three kinds of lies: lies, damned lies, and statistics."
NYCelt- Posts : 10794
Join date : 2009-10-12
Re: Barkley and Morey battle about the value of analytics
Sure, stats can be misleading. But other stats can be brought into play to help correct a misleading stat. I'm personally aware that any opinion I offer based on a stat may be invalidated by a relevant stat that I overlooked.
swish
swish- Posts : 3147
Join date : 2009-10-16
Age : 92