Comments on: Upcoming Talk and Panel This Week
http://cstpr.colorado.edu/prometheus/?p=3513

By: Kooiti Masuda | Sat, 09 Jul 2005 05:20:37 +0000 | http://cstpr.colorado.edu/prometheus/?p=3513&cpage=1#comment-1267

Excuse me for continuing a discussion outside the original context of this posting, but I think I should clarify one of my previous comments.

My comment here at July 5, 2005 03:00 AM was, unlike my other comments, aimed not at the so-called hockey-stick reconstruction but at climate science in general. Actually, my main concern was instrumental meteorological data and climate model code.

The word “norm” may be ambiguous. We should distinguish what is desirable from what is obligatory. I strongly believe it is desirable for climate scientists to contribute source code and data to public databases.

(But I still do not think it desirable to archive in public databases all the code and data produced at every stage of a scientific study. The cost of maintaining such archives is likely to be prohibitive.)

On the other hand, making it obligatory for climate scientists to deposit source programs and data (i.e., barring publication of studies which do not follow this norm) would result in a huge loss of scientific knowledge that could not be compensated for by a relative improvement in quality.

Though it may be unimaginable to U.S. citizens, a large part of the instrumental meteorological data of many countries is not freely available. I cannot give all the data I use to those who request it. Let me illustrate the situation with somewhat idealized examples. Countries X and Y each have 100 good stations, but each releases observation records from just 30 stations as a free contribution to the world, following the guidelines of the World Meteorological Organization. Each government considers the rest intellectual property.

Country X has a national data center with a data catalog and price tags. In this case my responsibility seems to be simply to cite the catalog entry corresponding to the data I used; the rest of reproducibility depends only on the funds available to the requester. This is a relatively easy case.

In the case of Country Y, the data became available to us by negotiation, and the conditions of availability and the price may differ for other users who also manage to obtain them. This is a case of low reproducibility, and the data from the other 70 stations in Country Y can be called “closed”. It is true that studies using “closed” data should be considered less reliable, from the standpoint of reproducibility, than those using open data only. On the other hand, results obtained from 100 stations may in some respects be much more informative than results from just 30, probably in spatial detail. (This is similar to the case of the oil company study mentioned previously.)

Certainly we should encourage those countries to release their data more openly. The U.S. government seems to be in the best position to make such a suggestion, but the response of bureaucracies is likely to be slow. Also, if those countries sense threats behind the words of the U.S. government, they will probably tend further toward protection. Publishing scientific results that use the data seems to be one of the best ways to help the data eventually join the global public goods.

Software is, in current legal principle, considered intellectual property. It is actually rare for programs written by climate scientists to be treated as economic goods, but the atmospheric part of a climate model is almost the same as a weather forecast model and can thus potentially be used for profit. Some institutions want to protect their code from free-riding; they think that any profit should be shared with the institutions that originated the code. More often, the authors dislike the bureaucracy of their own institutions, avoid claiming copyright officially, and keep their software in an informal state resembling a trade secret. They give the code to friendly collaborators, but they do not want to give it to unfriendly competitors. (They fear that their paper may not be considered an original study by scientific journals if it is submitted later than a similar paper by those competitors.) They also do not want to give it to unfriendly critics who may loudly announce faults in the code to the public before helping the authors correct them.

(The range of applicability of scientific software is usually not rigorously determined. The behavior of a program on input data the authors never imagined may or may not be good, i.e., it may or may not represent the physical model behind the code correctly. The authors would welcome users who report questionable behavior and help them understand the limitations of the code, but they would not welcome users who just shout that the code is bad. This is what I meant by “My code is not yet well documented …” in my previous comment.)
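
As an aside, one concrete way to narrow this problem is to state the range of applicability in the code itself. Below is a minimal Python sketch of that habit; the function and its calibration range are purely illustrative (a Magnus-type saturation vapor pressure fit), not taken from any model discussed here.

    import math

    def saturation_vapor_pressure(temp_k: float) -> float:
        """Saturation vapor pressure (hPa) over liquid water.

        Illustrative Magnus-type fit. The guarded interval is the fit's
        calibration range; outside it the formula may no longer represent
        the physics it approximates.
        """
        if not (228.15 <= temp_k <= 333.15):
            # Fail loudly on inputs the authors never imagined, rather than
            # returning a plausible-looking but unsupported number.
            raise ValueError(f"temp_k={temp_k} is outside the calibrated "
                             "range 228.15-333.15 K")
        t_c = temp_k - 273.15
        return 6.112 * math.exp(17.62 * t_c / (243.12 + t_c))

A user who hits the ValueError has encountered a documented limitation rather than a silent fault, which is the kind of report an author can actually act on.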

These attitudes may be selfish, but there seem to be no grounds for favoring their competitors or critics over them either.

Sometimes, especially when fraud is suspected, there will be a need for some measure to compel the submission of programs and data. Then we need a publicly recognized arbiter who can settle such disputes.

By: Frank H. Scammell | Thu, 07 Jul 2005 21:37:00 +0000 | http://cstpr.colorado.edu/prometheus/?p=3513&cpage=1#comment-1266

I am somewhat surprised that the debate is so circuitous. Surely some must recognize that having the hockey-stick handle flatter than the evidence suggests is merely a ploy to accentuate the blade, “the Instrumental Record”. “The Instrumental Record” continues to deviate from the radiosonde (balloon) and satellite records (see Junk Science) because the “Urban Heat Island” effect is incorrectly modelled (talk to Dr. Jim Hansen). If you examine the MBH results without the merging with “the Instrumental Record”, there certainly seems to be nothing of concern. Additionally, look at the scaling (chartsmanship at its finest); there is a long way to go to the projected worst cases. The political issue is control over energy usage: by whom, when, and how.

By: Roger Pielke Jr. | Thu, 07 Jul 2005 19:46:28 +0000 | http://cstpr.colorado.edu/prometheus/?p=3513&cpage=1#comment-1265

We won't be recording or transcribing the talk and panel, but three people have agreed to summarize the session, and we will post their reports here next week.

By: Kooiti Masuda | Thu, 07 Jul 2005 06:34:34 +0000 | http://cstpr.colorado.edu/prometheus/?p=3513&cpage=1#comment-1264

Re: my comment at July 5, 2005 01:45 AM (see also the correction at July 5, 2005 03:22 AM)

Please note also that my intuitive “finding” was not well founded and may simply be wrong.

As a time series analysis, the number of data points used by Mann et al. was not small. Most of the input data had values for every year over many hundreds of years. We usually consider that number of data points large enough to represent decadal mean values.

Spatially, too, the number of sample points was not so small. That is why Mann et al. thought it appropriate to use multivariate statistics (principal component analysis or something like it) to aggregate the information.

What made me feel the insufficiency of the data in this case was its strong spatial inhomogeneity. Useful data exist in some regions of the world but not in others. Another intuitive line of reasoning then follows. Our experience with modern climatological data suggests that regional anomalies often compensate for one another when global averages are taken. Thus we may expect smaller variability in actual global averages (or averages based on homogeneous sampling) than in global averages based on inhomogeneous sampling. But this reasoning implicitly assumes that forcings tending to produce global warming or cooling are unimportant, which is surely not proven even for the era before the industrial revolution.
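
The cancellation argument can be made concrete with a toy simulation. This is a minimal sketch under the artificial assumption of independent, unit-variance regional anomalies; the region counts are arbitrary and not from any real station network.

    import numpy as np

    rng = np.random.default_rng(0)
    n_years, n_regions = 1000, 60

    # Hypothetical regional temperature anomalies, one column per region.
    anoms = rng.normal(0.0, 1.0, size=(n_years, n_regions))

    full = anoms.mean(axis=1)           # homogeneous sampling: all 60 regions
    sparse = anoms[:, :6].mean(axis=1)  # inhomogeneous sampling: 6 regions

    print(f"std of full-coverage mean:   {full.std():.2f}")   # ~1/sqrt(60), about 0.13
    print(f"std of sparse-coverage mean: {sparse.std():.2f}") # ~1/sqrt(6), about 0.41

With fewer regions contributing, fewer anomalies are available to cancel one another, so the sparsely sampled average appears more variable than the true global one. A common forcing would add a shared component that no amount of spatial averaging removes, which is exactly the caveat above.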

Thus I cannot objectively say that the reconstruction by Mann et al. is too smooth to be true. I think the study by von Storch et al. (2004) strongly suggests something like my guess, but, strictly speaking, it is just one more case study and may not be universal.

By: Roger Pielke Jr. | Wed, 06 Jul 2005 14:44:38 +0000 | http://cstpr.colorado.edu/prometheus/?p=3513&cpage=1#comment-1263

Bob- A good idea. We'll look into it.

By: Bob | Wed, 06 Jul 2005 14:35:12 +0000 | http://cstpr.colorado.edu/prometheus/?p=3513&cpage=1#comment-1262

Any chance of publishing a transcript of this seminar/discussion on this website?

By: garhane | Tue, 05 Jul 2005 21:58:03 +0000 | http://cstpr.colorado.edu/prometheus/?p=3513&cpage=1#comment-1261

Seeking to learn how climate scientists work up graphs that do not resemble normal series for many types of data, we read a short piece by I. M. Scientist and found that he seems to hold some peculiar ideas about the adequacy of samples of data points where series present smooth curves. He states that such series “…tend to be smooth when the number of data points are large…”.

This seemed odd, even coming from a person who claims standing in climate science, so we decided to check the claim. We took statistical procedures A through G from data bank Z and subjected sample data to a series of standard statistical tests. Unlike I. M. Scientist, we have provided our data, which is here. We ran a series of tests using data with very few data points all the way up to data with very many. We also ran a series of standard tests of reliability, and our results, which can be seen to be very high, are here. The original claim of I. M. Scientist is found in web page blank, where it appeared with no supporting data whatever. So far as we can tell, the claim has not previously been published or subjected to peer review.

Our tests showed that the claim made by this scientist is completely false and his conclusion is spurious.

To ensure we had obtained the correct statement of his claim, we made repeated requests for disclosure to this individual, and for confirmation that he had indeed made this false claim, after we had written up our results and submitted them to a journal.

We have not yet been favored with a personal reply, but there is a most interesting follow-up. This well-known scientist who would not answer our inquiries has made another posting to the web page where he published his false claim and sought to alter his posting. Now he says, in web page comments, that his statement about smoother data when the number of data points is large should be read in reverse, as a statement that applies when the number of data points is not large. So much for those who relied on him.

We have again requested disclosure to determine how this peculiar outcome occurred. We believe that complete and fair disclosure of all data, methodology, calculations, working notes, source code, code for any programs existing on the same computer, and samples of the pens used in writing up notes, supported by an affidavit of the University Administrator, should accompany claims of this sort. Perhaps this will lead to earlier detection, so that such spurious claims will not be made.

By: Kooiti Masuda | Tue, 05 Jul 2005 09:22:21 +0000 | http://cstpr.colorado.edu/prometheus/?p=3513&cpage=1#comment-1260

Sorry!

Correction to my comment at July 5, 2005 01:49 AM

The second paragraph,
>number of data points are large enough
should read
number of data points are NOT large enough

By: Kooiti Masuda | Tue, 05 Jul 2005 09:00:34 +0000 | http://cstpr.colorado.edu/prometheus/?p=3513&cpage=1#comment-1259

Re: Steve Bloom’s comments.

I think that the main goal of science is shared knowledge, and that scientific achievements should be reproducible.

But I think that McIntyre is too demanding about disclosure of the program code and input data used by scientists. Scientists who can make all their code and data publicly available, such as Mann (even if partly under pressure from McIntyre), are admirable but actually exceptional. Part of the software I use may be proprietary and yet not readily sold as a commodity. Another part may have been written by fellow scientists who are happy for me to use it but do not release it publicly. Another part, written by myself, is not yet well documented, and I feel that giving it away without appropriate documentation invites misuse. Thanks to the open-source software movement, some scientists may be able to conduct all their computation with open-source software and also to make their own software open source. I think we should encourage such movements, but I do not think we can make them the norm of the scientific community.

Similar things can be said about data. This is actually my own main concern, but I will skip the details for now. I think it is very constructive to create common databases that everyone can access. But I do not think it constructive to bar studies that use data which cannot be entered into those databases. Nor do I think it constructive to require that all data used in a scientific study be preserved in a form that facilitates easy reproduction.

Usually, reproducibility as a norm of science does not mean exact reproduction with the same data and the same program. Obviously, in medical studies, readers cannot usually access the same patient (at least at the same stage of treatment) that the author describes. I think that reproducibility in science means that the same general conclusion can be drawn from similar, but not exactly the same, data.

It is true that we consider studies that withhold access to their original data less credible. For example, some people in oil companies reconstructed sea level over the past hundreds of millions of years, and the source data were trade secrets. We consider the result not so credible until another study with open data confirms its main conclusion. But even then, it is intellectually richer to have two results than just the open one.

By: Kooiti Masuda | Tue, 05 Jul 2005 07:49:06 +0000 | http://cstpr.colorado.edu/prometheus/?p=3513&cpage=1#comment-1258

Though I have not myself examined the paleoclimatic reconstruction by Mann et al., its reputation among my fellow scientists suggests that their practice is good at estimating the most plausible temperature for each past year, given the data they had chosen. I think that their studies are not seriously flawed as climate reconstructions.

I instinctively felt, however, that their reconstructed time series seemed too smooth to be true. But I think this is normal. Time series connecting best estimates (in the sense of least sum of squared errors or something similar) tend to be smooth when the number of data points are large enough. Even if we anticipate that there are troughs and ridges, we do not consider it a better reconstruction to put ridges and troughs at arbitrary positions than to leave the smooth “best estimate” curve intact.

I think that what IPCC (2001) wanted to use the reconstruction for was to put the latest decades in the context of past climate variability (e.g., the variance of decadal mean temperature over a millennium). To this end, the studies by Mann et al. (1998, 1999) were better than nothing, but far from ideal. The good practice of making pointwise best estimates almost inevitably leads to underestimating variability. But I was not confident about this intuitive finding of mine until I read the paper by von Storch et al. (2004, Science), which demonstrated the issue much more systematically.
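
The smoothing effect of pointwise least-squares estimation is easy to reproduce. Here is a minimal sketch with made-up numbers and a single idealized proxy, not anyone's actual reconstruction method: calibrating temperature on a noisy proxy by least squares yields fitted values with systematically less variance than the truth.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1000

    truth = rng.normal(0.0, 1.0, size=n)          # "true" annual anomalies
    proxy = truth + rng.normal(0.0, 1.0, size=n)  # proxy = truth + noise

    # Least-squares calibration of temperature on the proxy.
    slope, intercept = np.polyfit(proxy, truth, 1)
    recon = slope * proxy + intercept

    print(f"variance of truth:          {truth.var():.2f}")  # about 1.0
    print(f"variance of reconstruction: {recon.var():.2f}")  # about 0.5

Proxy noise attenuates the regression slope, so the pointwise best estimate is smoother than the series it reconstructs; this is the loss of variability that von Storch et al. (2004) demonstrated far more systematically.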

By using an artificial example, von Storch et al. (2004) demonstrated the weakness of the usual methods of climate reconstruction in estimating past variability. They did not, however, show better methods for reconstructing variability, except perhaps incorporating more input data (especially data excluded by Mann et al. for lacking annual time resolution). Reconstruction of climatic variability is a scientific challenge, and surely some advances will be made in this decade, but it is not guaranteed that a quality sufficient to serve as a basis for policymaking can be achieved.
