Comments on: Comparing Distrubutions of Observations and Predictions: A Response to James Annan
http://cstpr.colorado.edu/prometheus/?p=4416

By: John V http://cstpr.colorado.edu/prometheus/?p=4416&cpage=1#comment-10006 Sat, 17 May 2008 05:19:18 +0000

Roger,

The existence of other observational trends *increases* the uncertainty and *improves* the agreement between model and observation. The assumption of perfect measurements makes the “consistent with” test more difficult to pass.

Since you have a math degree you must know this already.

By: hank roberts http://cstpr.colorado.edu/prometheus/?p=4416&cpage=1#comment-10005 Fri, 16 May 2008 22:59:46 +0000

> Comparing Distrubutions
Typo…

By: Roger Pielke, Jr. http://cstpr.colorado.edu/prometheus/?p=4416&cpage=1#comment-10004 Fri, 16 May 2008 21:30:53 +0000

Lupo- Tom C gets it exactly right! Thanks for the pointer, I’ve updated the most recent post with this comment.

By: Roger Pielke, Jr. http://cstpr.colorado.edu/prometheus/?p=4416&cpage=1#comment-10003 Fri, 16 May 2008 21:23:52 +0000

John V-

You observe — “What James did was equivalent to assuming the observational trend (-0.1) was perfect.”

Indeed, and this is the step that I take issue with. Note that I do not call him names, I simply think that this step is not correct. The existence of 4 other temp trend measures with different values would seem to be pretty strong evidence along these lines, no?

By: Lupo http://cstpr.colorado.edu/prometheus/?p=4416&cpage=1#comment-10002 Fri, 16 May 2008 20:20:05 +0000

Maybe this can help. Tom C says in a comment over at James’

James -

What you and Roger are arguing about is not worth arguing about. What is worth arguing about is the philosophy behind comparing real-world data to model predictions. I work in the chemical industry. If my boss asked me to model a process, I would not come back with an ensemble of models, some of which predict an increase in a byproduct, some of which predict a decrease, and then claim that the observed concentration of byproduct was “consistent with models”. That is just bizarre reasoning, but, of course, such a strategy allows for perpetual CYAing.

The fallacy here is that you are taking models, which are inherently different from one another, pretending that they are multiple measurements of a variable that differ only due to random fluctuations, then doing conventional statistics on the “distribution”. This is all conceptually flawed.

Moreover, the wider the divergence of model results, the better the chance of “consistency” with real-world observations. That fact alone should signal the conceptual problem with the approach assumed in your argument with Roger.

By: John V http://cstpr.colorado.edu/prometheus/?p=4416&cpage=1#comment-10001 Fri, 16 May 2008 19:56:00 +0000

Roger,

If the mean of one value falls in the confidence limits of the other, then it is not necessary to use the normal difference distribution to show consistency. The normal difference distribution will *always* enlarge the confidence intervals and thereby improve the agreement.

What James did was equivalent to assuming the observational trend (-0.1) was perfect. There was no need to extend the confidence limits to account for uncertainty in the observation.
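To put rough numbers on this, here is a minimal Python sketch. Only the -0.1 observed trend comes from the thread; the model mean, model spread, and observational uncertainty below are invented purely for illustration:

    import math

    # Hypothetical values (only -0.1 is from the thread; the rest are made up).
    obs_mean, obs_sigma = -0.1, 0.1   # observed trend and its uncertainty
    mod_mean, mod_sigma = 0.2, 0.2    # model trend mean and its spread

    # Treating the observation as perfect: how many model sigmas away is it?
    z_perfect = abs(obs_mean - mod_mean) / mod_sigma

    # Normal difference distribution: allow for uncertainty in both
    # quantities; the combined sigma is always >= either one alone.
    sigma_diff = math.sqrt(obs_sigma**2 + mod_sigma**2)
    z_combined = abs(obs_mean - mod_mean) / sigma_diff

    print(round(z_perfect, 2))   # 1.5
    print(round(z_combined, 2))  # 1.34, always smaller than z_perfect

Whatever numbers are substituted, z_combined can never exceed z_perfect, which is the sense in which assuming a perfect observation makes the "consistent with" test harder to pass.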

Hopefully somebody with more stats knowledge than I have can comment on your use of an unpaired t-test.

By: Roger Pielke, Jr. http://cstpr.colorado.edu/prometheus/?p=4416&cpage=1#comment-10000 Fri, 16 May 2008 19:39:39 +0000

John V-

Thanks, but James did not use this statistical test. He compared a single observational trend mean (-0.1) to the distribution of modeled trend means.

The tests result in different conclusions:
http://sciencepolicy.colorado.edu/prometheus/archives/prediction_and_forecasting/001431the_helpful_undergra.html
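For readers who cannot follow the link, here is a rough sketch of how the two approaches can disagree in principle; all trend values below are invented, and this is not Roger's or James's actual calculation:

    import math

    # Invented trend values, for illustration only.
    model_trends = [-0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    obs_trends   = [-0.10, -0.05, 0.00, 0.05, 0.10]

    def mean(xs): return sum(xs) / len(xs)
    def sd(xs):
        m = mean(xs)
        return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

    m_mod, s_mod = mean(model_trends), sd(model_trends)   # 0.25, ~0.24
    m_obs, s_obs = mean(obs_trends), sd(obs_trends)       # 0.00, ~0.08

    # Check 1: place a single observed trend in the spread of model trends.
    print(abs(obs_trends[0] - m_mod) <= 2 * s_mod)        # True: "consistent"

    # Check 2: unpaired t-test on the two means; the standard error of a
    # mean shrinks with sample size, so the same gap can look significant.
    se = math.sqrt(s_mod**2 / len(model_trends) + s_obs**2 / len(obs_trends))
    print(round((m_mod - m_obs) / se, 2))                 # ~2.67, above the ~2.26 critical value

With these made-up numbers the single observation sits comfortably inside the model spread, yet the difference between the two means is statistically significant, which is one way the two tests can point to different conclusions.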

By: John V http://cstpr.colorado.edu/prometheus/?p=4416&cpage=1#comment-9999 Fri, 16 May 2008 19:34:21 +0000

Roger,

I agree that you can’t just check for overlapping confidence intervals. Under the assumption of Gaussian (normal) distributions you have to use the normal difference distribution that I linked above, and that James Annan used in his calculations.

You are basically saying that the standard deviation of the difference is less than the sum of the individual standard deviations. Of course it is. It is, however, always larger than either of the individual standard deviations.

Therefore, if the mean of one value falls in the confidence limits of the other then you can skip the normal difference distribution. Applying it will only improve the degree of agreement.
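Written out, the relationship being described here (for independent, Gaussian errors) is:

    sigma_diff = sqrt(sigma_obs^2 + sigma_mod^2)
    max(sigma_obs, sigma_mod) <= sigma_diff <= sigma_obs + sigma_mod

Because sigma_diff is at least as large as either individual sigma, any gap between the two means that lies within k standard deviations under the simpler check also lies within k standard deviations of the difference distribution.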

By: Lupo http://cstpr.colorado.edu/prometheus/?p=4416&cpage=1#comment-9998 Fri, 16 May 2008 19:26:25 +0000

At what point does a mode of transportation move from being safe to unsafe? Not ‘more safe and less safe’ or ‘more unsafe and less unsafe’ or some mix. So at what point does it move from consistency to inconsistency? Then the term can be qualified to the extent of the state.

Are two 75% certain ranges that overlap 50% considered consistent? Two 50% certain ranges that overlap 75% considered consistent? Is it 75/75 50/50 75/95 95/95?

If you can get the wrong answer half the time or more, that would be inconsistent, correct?

The more of the time you can get the right answer the more consistent it is, above 50%. The more of the time you can get the wrong answer the more inconsistent it is, under 50%.

By: Roger Pielke, Jr. http://cstpr.colorado.edu/prometheus/?p=4416&cpage=1#comment-9997 Fri, 16 May 2008 17:10:37 +0000

John V-

Real Climate links to a useful paper which explains why overlapping uncertainty ranges are not a good test of significance. It also explains why the approach favored by James Annan, despite the bluster, is not the best one for this exercise.

http://www.gfdl.noaa.gov/reference/bibliography/2005/jrl0501.pdf

Here are some excerpts:

“In climate change studies, a trend estimate may be presented for competing datasets, or for models and observations. When the error bars for the different estimates do not overlap, it is presumed that the quantities differ in a statistically significant way. Unfortunately, as demonstrated by Schenker and Gentleman (2001, hereafter SG), this is in general an erroneous presumption.”

The paper found inappropriate use of error bars in asserting significance in a number of papers and in the IPCC TAR. It also says:

“The misperception regarding the use of error bars may arise because of a fundamental difference between one-sample and two- (or multi) sample testing. For a Gaussian-distributed variate, when only one quantity is estimated, a one-sample test (such as a Student’s t test) may be performed. The null hypothesis would be that the estimated quantity is equal to some constant (e.g., that an anomaly is zero). In the one-sample case, application of a t test is equivalent to placing error bars about the quantity to see if it overlaps with the hypothesized value. However, when the interest is in comparing estimated values from two different samples, use of error bars about each estimate, looking for overlap, is not equivalent to application of a two-sample t test.”

and further:

“In particular, this approach will lead to a conservative bias in that sometimes no difference is found when it should. However, this bias is not constant, varying depending on the relative magnitudes of the sampling errors in the two samples. The maximum bias is found when the sampling variability of the two samples is comparable.”

So take from this that (a) overlapping error bars do not convey statistical significance, and (b) the choice of statistical test involves some judgment (i.e., James Annan’s placing of a point representing observations in the distribution of model results and asserting consistency is not the best way to address this issue).
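As a concrete illustration of the paper's point that the overlap check is conservative, here is a minimal Python sketch; the two means and standard errors are invented, and the 1.96 factor assumes Gaussian errors:

    import math

    # Invented numbers: two estimates with equal standard errors.
    mean1, se1 = 0.0, 1.0
    mean2, se2 = 3.2, 1.0

    # Individual 95% intervals (Gaussian approximation).
    ci1 = (mean1 - 1.96 * se1, mean1 + 1.96 * se1)    # (-1.96, 1.96)
    ci2 = (mean2 - 1.96 * se2, mean2 + 1.96 * se2)    # ( 1.24, 5.16)
    overlap = ci1[1] >= ci2[0] and ci2[1] >= ci1[0]

    # Two-sample comparison: z-score of the difference of the means.
    se_diff = math.sqrt(se1**2 + se2**2)              # ~1.41
    z = abs(mean1 - mean2) / se_diff                  # ~2.26

    print(overlap)      # True: the error bars overlap
    print(z > 1.96)     # True: the difference is nonetheless significant

The error bars overlap, yet the two-sample test finds a significant difference; that gap between the two answers is the conservative bias described in the excerpt above, and it is largest when the two standard errors are comparable.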
