
Balt Sun: Q&A: Bill Ripken on his old school vs. new school book, his war on WAR, and the state of the Orioles


60 replies to this topic

#21 dude

dude

    HOF

  • Members
  • 7,065 posts
  • LocationColumbus, GA

Posted 12 February 2020 - 07:12 PM

We'll have to agree to disagree on that.

 

Calculation of the stat is math.  Are you disagreeing with math?



#22 dude

dude

    HOF

  • Members
  • 7,065 posts
  • LocationColumbus, GA

Posted 12 February 2020 - 07:17 PM

Also this previous thread.... 

 

https://www.baltimor...ve replacement

 

I don't get in on that until the last page or so, but I don't need to change my mind.

 

The question remains....what do you want to use it for?



#23 Ravens2006

Ravens2006
  • Members
  • 327 posts

Posted 13 February 2020 - 12:35 PM

Chris Davis has remained a consistent active player on this team for years now solely because of salary.  If he wasn't guaranteed all that money, he'd have been gone a long time ago.

 

Which is a horrible way to run a sports team by the way, from a competitive standpoint...



#24 Nigel Tufnel

Nigel Tufnel

    HOF

  • Members
  • 4,659 posts

Posted 13 February 2020 - 02:34 PM

Calculation of the stat is math.  Are you disagreeing with math?

 

I assume he's willing to believe that the arithmetic is correct, but maybe he's more concerned about the formula they're using.

 

Me:  I invented a new formula to calculate a player's value.  It's 2 x 6.  Every player has a value of 12.

 

You:  Every player has a value of 12?

 

Me:  DON'T ARGUE, IT'S MATH



#25 dude

dude

    HOF

  • Members
  • 7,065 posts
  • LocationColumbus, GA

Posted 14 February 2020 - 12:09 AM

Really?



#26 Nigel Tufnel

Nigel Tufnel

    HOF

  • Members
  • 4,659 posts

Posted 14 February 2020 - 01:12 PM

Really?

 

No, I'm actually still working on my formula.



#27 dude

dude

    HOF

  • Members
  • 7,065 posts
  • LocationColumbus, GA

Posted 15 February 2020 - 05:44 PM

I think I'm the only baseball fan that thinks that stat is flawed.

 

For Nigel, let's start over.

-----------

 

Ken, why do you think the stat is flawed?



#28 Nigel Tufnel

Nigel Tufnel

    HOF

  • Members
  • 4,659 posts

Posted 20 February 2020 - 04:17 PM

Bill James posted a pretty long article on his website about WAR today...  the below is just a small part of it.  He thinks WAR is flawed.

 

The process of saying what a player’s "value" is is immensely complicated, and in that long and complicated process the analyst has to make hundreds of choices about the interpretation of data.  The analyst has to make some decision about the treatment of fluke outcomes.  Norm Cash had almost exactly the same three true outcomes in 1962 as he had in 1961, but his batting average dropped 118 points.  Do you treat that as a fluke, or do you treat it as a reality?  There is no clearly correct answer.  Do you measure park effects in one-year increments, or five-year aggregates?  There is no clearly correct answer.  If a team wins 90 games but has numbers which suggest that they should have won 80, do you treat them as a 90-win group of players, or an 80-win group of players?  One answer is not necessarily better than the other. 

 

We’re choosing a pathway through a forest of choices.  You make one choice, you wind up at a lake; you make the other choice, you wind up on a mountain. 

 

At times, in designing Win Shares, I was absolutist when I should have chosen a middle ground.  I chose a narrow pathway when I should have chosen a broader one. Also, I made the system so damned complicated that almost nobody really understands it; people say all kinds of things about Win Shares that are clearly not true, but it’s my own fault for making the system so complicated.  Of perhaps more importance, making it so complicated makes it hard to fix, hard to update, hard to program. 

 

But. . .this is just my opinion; take it for what it is worth.   I think the problems of Win Shares are trivial compared with the problems of WAR.   Sabermetrics is supposed to be, as much as possible, an open road toward insight on an issue. The designers of WAR—friends of mine, almost without exception—have made choices which create an extremely narrow pathway through the forest of problems.

 

If you think about it, if you create a logical pathway toward Wins Above Replacement, you first have to measure WINS.  Right?  If you’re measuring Wins, and you are measuring Wins Above Replacement, which problem do you come to first?  AFTER you measure how many WINS each player has contributed to his team, THEN you are in a position to ask "How many of those wins would have been contributed by a Replacement Level Player, and how many are Wins Above Replacement?"

 

This has never been done.   The designers of WAR skipped the first problem, and tried to take a shortcut toward the second.

 

The problem of how many wins a player has contributed ABOVE replacement level is necessarily more complicated than the problem of how many Wins he has contributed.    In order to reach Wins Above Replacement, you have to solve all of the problems associated with measuring Wins, and then you have to solve an additional set of problems. 

 

This has never been done.  I spent two, three years working essentially full-time on Win Shares, trying to think through every little problem as best I was able.  I made some mistakes.  But if you’re REALLY going to measure Wins Above Replacement, rather than merely pretending that you are measuring it, you’re going to have to take a couple of years’ sabbatical from whatever else it is that you are doing, and think through all of the problems.  I don’t believe that anyone has ever done this, and I don’t believe that the structure of WAR was ever really thought through in a logical fashion.

 

If you think about it, this should be obvious:  that your measurement of the number of Wins a player is above Replacement can never be more accurate than your measure of his Wins. 

 

And then, WAR is a derivative stat, derived from an estimate of the player’s Wins and an estimate of the Replacement Level, one subtracted from the other.  But a derivative stat of this nature is inherently less accurate than EITHER of its component measurements.  It absolutely has to be.

 

I propose this, as a thought experiment.  This complicated math that we go through to find Win Shares or WAR, it is like a scale.  It is a scale that measures value—and, frankly, it is not a tremendously accurate scale.  It’s a best-we-can-do scale. 

 

This is my thought experiment.   Suppose that you create a universe of players, and suppose that, for each player, you create (a) a number of wins for him, and (b) a number of replacement-level wins, to be subtracted from his wins to find his value.  Suppose, however, that the scale on which you measure each one of those things is 10% inaccurate—just 10%.  Will the resulting derivative stat also be 10% inaccurate?

 

No; in fact, it will be something like 35% inaccurate.  Suppose that a player’s true Wins Contributed is 7.0, but that the replacement level player would have contributed 4.0 (a normal ratio of wins to WAR.)  His true value is 3.0 WAR.  But if each of the two major components is measured with a potential error of 10%, then the player’s measured Wins Above Replacement can be anywhere from 1.9 (that is, 6.3 minus 4.4) up to 4.1 (that is, 7.7 minus 3.6).   With a potential 10% measurement error on each element, a player with a WAR of 3.0 can be measured anywhere from 1.9 to 4.1. 
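To check the arithmetic in the paragraph above, here is a minimal Python sketch of James' worst case: wins read 10% high while replacement wins read 10% low, and vice versa. The function name `war_range` and its `err` parameter are made up for this illustration; 7.0 and 4.0 are James' example figures.

```python
def war_range(wins, repl_wins, err=0.10):
    """Worst-case (low, high) WAR when each component can be off by up to +/- err."""
    low = wins * (1 - err) - repl_wins * (1 + err)   # wins read low, replacement read high
    high = wins * (1 + err) - repl_wins * (1 - err)  # wins read high, replacement read low
    return low, high


true_war = 7.0 - 4.0
low, high = war_range(7.0, 4.0)
print(f"measured WAR: {low:.1f} to {high:.1f}")                      # measured WAR: 1.9 to 4.1
print(f"worst relative error: {(high - true_war) / true_war:.0%}")  # worst relative error: 37%
```

The half-width of that range is 1.1 wins on a true 3.0 WAR, roughly 37% in either direction, which is the "something like 35%" James describes.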

 

What I think that almost no one who uses WAR understands is how fantastically accurate your process measurements would have to be to get an accurate WAR.   A derivative estimate contains all of the inaccuracy in any of the components from which it is derived, combined in a geometric fashion.  And, in fact, they have NOT arrived at an accurate WAR. 
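James' 1.9-to-4.1 range is the worst case, where both errors hit at full strength in opposite directions. As a rough companion to his thought experiment, the sketch below measures the same hypothetical 7.0-win / 4.0-replacement player many times, assuming (our assumption, not James') that each component's error is independent and uniformly distributed within +/-10%. It reports the average and worst relative error in the resulting WAR.

```python
import random
import statistics

TRUE_WINS, TRUE_REPL = 7.0, 4.0
TRUE_WAR = TRUE_WINS - TRUE_REPL


def measured_war(rng, err=0.10):
    # Assumed error model: each component is off by an independent,
    # uniformly distributed fraction between -err and +err.
    wins = TRUE_WINS * (1 + rng.uniform(-err, err))
    repl = TRUE_REPL * (1 + rng.uniform(-err, err))
    return wins - repl


rng = random.Random(2020)  # fixed seed so the run is repeatable
rel_errors = [abs(measured_war(rng) - TRUE_WAR) / TRUE_WAR for _ in range(100_000)]

print(f"mean relative error:  {statistics.mean(rel_errors):.1%}")
print(f"worst relative error: {max(rel_errors):.1%}")
share_over_10 = sum(e > 0.10 for e in rel_errors) / len(rel_errors)
print(f"share of trials off by more than 10%: {share_over_10:.0%}")
```

Under these assumptions the average miss comes out around 13%, and more than half of the trials miss by over 10%, so the derived figure is noisier than either input even when the errors are not worst-case. A different error model (normal errors, correlated components) would give different numbers.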

 

WAR is. . . it’s not a fraud, because a fraud is a DELIBERATE attempt to mislead.  No one involved in the creation or promulgation of WAR has attempted to mislead you; they have merely over-estimated their ability to measure baseball value accurately, based on what we know.  We’re not there yet; we have not yet reached the point at which WAR estimates are even reasonably reliable.  Due to the remarkable skills of Sean Foreman, and his devotion to the concept of WAR, millions of people have come to attribute to WAR a reliability that the stat simply does not have.

 

When you review WAR estimates in the way that I have spent the last week doing, this becomes obvious.  B-WAR leads the user along a narrow pathway through the forest of decisions, and tells us that the best player in the American League in 1962 was:  Hank Aguirre.  This is a weird idea.  I hate to tell you this, but Hank Aguirre was really NOT the best player in the American League in 1962—nor one of the five best, nor one of the 20 best.  B-WAR leads the user along a narrow pathway through the statistics of the 1966 season, and tells you that, if you buy ALL of their choices, the best player in the American League was not Frank Robinson, it was Earl Wilson.  This is a weird idea.  It is a weird conclusion, and it is logically indefensible.

 

And there are quite a few of them. 

 

Do you know who WAR says was the Most Valuable Player in the American League in 2008?  Nick Markakis.   He wasn’t mentioned in the American League’s MVP voting, but. . .that’s what they want us to believe.

 

Nick Markakis back then was a good player.  These are Markakis’ stat lines from 2007 through 2009, copied directly from Sean Foreman’s wonderful site:

 

Doesn’t it look to you, kind of, like Nick Markakis was the same player in 2008 that he was in 2007 or 2009?  Isn’t that the conclusion that you would tend to reach?

 

But no, WAR says that Markakis’ value was 4.2 in 2007 and 2.9 in 2009, but 7.4 in 2008.  His value in 2008 was greater than his combined value in 2007 AND 2009.  It is, frankly, a weird thing to say.  His walks were up by 38 but his RBI were down by 25, his other stats really the same.  I buy it to the extent of saying that he had SOME more value in 2008 than in the other years.  If you said he was 4.2 in 2007 but 5.2 in 2008, I’d be OK with that.  Win Shares shows his value in those three seasons as 20-23-16—a moderate increase for 2008.  The conclusion that he was, for some reason, the American League’s best player in 2008 is weird. 
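The sketch below puts James' 2007-2009 figures for Markakis side by side and compares each system's 2008 number with the average of the seasons on either side of it. The `spike_ratio` helper is a hypothetical illustration, not a standard metric.

```python
# Nick Markakis, 2007-2009, as quoted by James: bWAR and Win Shares
bwar = {2007: 4.2, 2008: 7.4, 2009: 2.9}
win_shares = {2007: 20, 2008: 23, 2009: 16}


def spike_ratio(series, year=2008):
    """How many times larger `year` is than the average of the two neighboring seasons."""
    neighbors = (series[year - 1] + series[year + 1]) / 2
    return series[year] / neighbors


print(f"bWAR spike: {spike_ratio(bwar):.2f}x")              # bWAR spike: 2.08x
print(f"Win Shares spike: {spike_ratio(win_shares):.2f}x")  # Win Shares spike: 1.28x
print(bwar[2008] > bwar[2007] + bwar[2009])                 # True
```

bWAR puts 2008 at about twice the neighboring average, while Win Shares puts it about 28% higher, the "moderate increase" James says he can accept.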

 

I should not leave the impression that the 2008 calculation is mysterious and I don’t understand it, or some nitwit will write and explain it to me.  It results from a combination of his offensive and his defensive stats.  His walks spiked upward in 2008, leading to an increase in offensive value, and his defensive value also spiked upward.  BWAR says that his dWAR is negative in every season of his career up to 2015, except 2008, when it is tremendously positive.  His dWAR by season, beginning in 2006, was -0.1, -0.1, +1.8, -0.8, -1.7, -0.1, -1.2, -0.5, -1.4.  The spike in defensive value in 2008 explains most of why he was the American League’s best player that year.  I’m not saying that I don’t understand it; I’m saying that I don’t believe it. 
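The season-by-season dWAR figures James lists can be checked the same way. This sketch confirms that 2008 is the only positive season in the listed 2006-2014 run and measures how far it sits above the average of the other eight.

```python
# Markakis' dWAR by season, 2006-2014, as listed in the quote above
dwar = dict(zip(range(2006, 2015), [-0.1, -0.1, 1.8, -0.8, -1.7, -0.1, -1.2, -0.5, -1.4]))

positive_seasons = [year for year, value in dwar.items() if value > 0]
other_seasons = [value for year, value in dwar.items() if year != 2008]
typical = sum(other_seasons) / len(other_seasons)

print(positive_seasons)  # [2008]
print(f"average of the other seasons: {typical:+.1f}")
print(f"2008 above that average: {dwar[2008] - typical:+.1f}")
```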

 

WAR chooses a narrow pathway through the forest of numbers, to lead you to that conclusion—and people say, "Oh.  Okay.  If that’s what the formulas say, I guess that’s his value."

 

https://www.billjame...ks_at_the_mvps/



#29 BSLMikeRandall

BSLMikeRandall

    Sr. Ravens Analyst

  • Members
  • 20,491 posts

Posted 20 February 2020 - 04:41 PM

Complicated, sure. But I’ve never seen a high number assigned to a bad player or a low number assigned to a great player. Whatever it is, it’s fine as a quick snapshot of a caliber of player if you didn’t know anything about them.
@BSLMikeRandall

#30 bmore_ken

bmore_ken

    All Star

  • Members
  • 1,473 posts

Posted 20 February 2020 - 05:50 PM

This, from the James article, is essentially my problem with it. Well, not really a problem, just why I give it no real credence.

In order to reach Wins Above Replacement, you have to solve all of the problems associated with measuring Wins, and then you have to solve an additional set of problems. 



#31 Nigel Tufnel

Nigel Tufnel

    HOF

  • Members
  • 4,659 posts

Posted 21 February 2020 - 09:06 AM

Complicated, sure. But I’ve never seen a high number assigned to a bad player or a low number assigned to a great player. Whatever it is, it’s fine as a quick snapshot of a caliber of player if you didn’t know anything about them.

 

His issue with WAR wasn't that it's complicated, just that it isn't as accurate as people think.  But, like you said, good players will have high WAR, and bad players will have low WAR - a stat being flawed is a lot different from it being useless.  It's just that if player A has 7 WAR and player B has 5 WAR, that doesn't necessarily mean that player A was actually better.



#32 BSLMikeRandall

BSLMikeRandall

    Sr. Ravens Analyst

  • Members
  • PipPipPipPipPip
  • 20,491 posts

Posted 21 February 2020 - 09:35 AM

His issue with WAR wasn't that it's complicated, just that it isn't as accurate as people think. But, like you said, good players will have high WAR, and bad players will have low WAR - a stat being flawed is a lot different from it being useless. It's just that if player A has 7 WAR and player B has 5 WAR, that doesn't necessarily mean that player A was actually better.


I think it’s more like: a player with 8.6 WAR isn’t always better than one with 8.5 WAR. Last year that was Mike Trout or Alex Bregman.

The players with 7.0 WAR and 5.2 WAR were Anthony Rendon and Yasmani Grandal. It’s not a debate which of those two players is better.
@BSLMikeRandall
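One way to frame the 8.6-vs-8.5 and 7.0-vs-5.2 comparisons: if each WAR figure could be off by up to some amount in either direction, the ranking is only guaranteed when that amount is smaller than half the gap between the two players. The sketch below computes that threshold without assuming any particular error size; the function name is hypothetical.

```python
def max_error_for_safe_ranking(war_a, war_b):
    """If each figure can be off by up to e wins, the order is guaranteed only when e < gap / 2."""
    return abs(war_a - war_b) / 2


comparisons = {
    "8.6 vs. 8.5 (Trout / Bregman)": (8.6, 8.5),
    "7.0 vs. 5.2 (Rendon / Grandal)": (7.0, 5.2),
}
for label, (a, b) in comparisons.items():
    threshold = max_error_for_safe_ranking(a, b)
    print(f"{label}: order holds if each figure is within {threshold:.2f} WAR")
```

The 0.1 gap survives only if both figures are within 0.05 wins, while the 1.8 gap survives errors of up to 0.9 wins each; how large the real error is remains the open question.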

#33 Nigel Tufnel

Nigel Tufnel

    HOF

  • Members
  • 4,659 posts

Posted 21 February 2020 - 09:50 AM

I think it’s more like: a player with 8.6 WAR isn’t always better than one with 8.5 WAR. Last year that was Mike Trout or Alex Bregman.

The players with 7.0 WAR and 5.2 WAR were Anthony Rendon and Yasmani Grandal. It’s not a debate which of those two players is better.

 

Well, obviously not every 5 WAR player is better than every 7 WAR player.  That wouldn't make any sense.  It's just that it can happen.

 

Look at Markakis (7.4) vs Grady Sizemore (5.9) in 2008 - who do you think had a better season?



#34 BSLMikeRandall

BSLMikeRandall

    Sr. Ravens Analyst

  • Members
  • 20,491 posts

Posted 21 February 2020 - 10:42 AM

Well, obviously not every 5 WAR player is better than every 7 WAR player. That wouldn't make any sense. It's just that it can happen.

Look at Markakis (7.4) vs Grady Sizemore (5.9) in 2008 - who do you think had a better season?


Fangraphs has Sizemore at 7.4 and Markakis 6.1.

If you used BBRef, I’d suggest using FG instead for advanced metrics.
@BSLMikeRandall

#35 Nigel Tufnel

Nigel Tufnel

    HOF

  • Members
  • 4,659 posts

Posted 21 February 2020 - 11:48 AM

I'm not sure that saying that there are two different versions of WAR, which can return wildly different values, is an argument in support of its pinpoint accuracy.
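Putting the two sets of 2008 numbers quoted in this thread side by side shows the size of the disagreement. The figures are the ones posted above (Baseball-Reference: Markakis 7.4, Sizemore 5.9; FanGraphs: Sizemore 7.4, Markakis 6.1); the dictionary layout is just for illustration.

```python
# 2008 WAR as quoted in the thread: Baseball-Reference (bWAR) vs. FanGraphs (fWAR)
war_2008 = {
    "Markakis": {"bWAR": 7.4, "fWAR": 6.1},
    "Sizemore": {"bWAR": 5.9, "fWAR": 7.4},
}

for player, figures in war_2008.items():
    gap = figures["bWAR"] - figures["fWAR"]
    print(f"{player}: bWAR - fWAR = {gap:+.1f}")  # Markakis +1.3, Sizemore -1.5

for source in ("bWAR", "fWAR"):
    leader = max(war_2008, key=lambda p: war_2008[p][source])
    print(f"{source} ranks {leader} higher")      # bWAR: Markakis, fWAR: Sizemore
```

The two systems differ by more than a win for each player and reverse the order of the two.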



#36 DJ MC

DJ MC

    HOF

  • Members
  • 23,680 posts
  • LocationBeautiful Bel Air, MD

Posted 21 February 2020 - 12:17 PM

I'm not sure that saying that there are two different versions of WAR, which can return wildly different values, is an argument in support of its pinpoint accuracy.

 

Why are you demanding pinpoint accuracy?


@DJ_McCann

#37 Nigel Tufnel

Nigel Tufnel

    HOF

  • Members
  • 4,659 posts

Posted 21 February 2020 - 12:22 PM

Why are you demanding pinpoint accuracy?

 

I'm not, although I suppose it's the eventual goal for any stat.  But Mike seemed to be saying it was already that accurate, and I disagree is all.



#38 bmore_ken

bmore_ken

    All Star

  • Members
  • 1,473 posts

Posted 21 February 2020 - 12:53 PM

His issue with WAR wasn't that it's complicated, just that it isn't as accurate as people think.  But, like you said, good players will have high WAR, and bad players will have low WAR - a stat being flawed is a lot different from it being useless.  It's just that if player A has 7 WAR and player B has 5 WAR, that doesn't necessarily mean that player A was actually better.

However, that's what you're led to believe.



#39 DJ MC

DJ MC

    HOF

  • Members
  • 23,680 posts
  • LocationBeautiful Bel Air, MD

Posted 21 February 2020 - 02:54 PM

I'm not, although I suppose it's the eventual goal for any stat.  But Mike seemed to be saying it was already that accurate, and I disagree is all.

 

Except that it isn't what he said. He said that when you get to such narrow differences in the output number (the 8.6 vs. 8.5 of Trout vs. Bregman) it isn't accurate enough to say that one is definitively better. But at larger spreads, there is much more confidence in the result (7 WAR vs. 5.2 WAR).


@DJ_McCann

#40 BSLMikeRandall

BSLMikeRandall

    Sr. Ravens Analyst

  • Members
  • 20,491 posts

Posted 21 February 2020 - 03:06 PM

I'm not, although I suppose it's the eventual goal for any stat. But Mike seemed to be saying it was already that accurate, and I disagree is all.


I never said it had pinpoint accuracy. I even pointed out that decimal-place differences don’t mean much. Two-whole-point differences are pretty easy to see, though. Fangraphs' version of WAR is probably a better one than BBRef's because their foundation is metrics: they use metrics in their evaluation that BBRef doesn’t.

All I’m implying with WAR is that it is a snapshot. If you wanted to compare two players, you wouldn’t say, "X has 0.2 more WAR than Y, so X is better." You might say, "I didn’t expect X to compare as closely to Y. Let me look deeper into why that is."

Then you probably find your answer in the other stats.
@BSLMikeRandall



