Inexpert Elicitation by RMS on Hurricanes
April 22nd, 2009 | Posted by: Roger Pielke, Jr.
Risk Management Solutions (RMS) is a leading provider of the catastrophe models used to assess risk in the insurance and reinsurance industries. RMS plays a very important role in enabling companies and regulators to quantify the risks and uncertainties associated with extreme events such as floods, hurricanes and terrorism.
I have in the past been somewhat critical of RMS for issuing short-term hurricane predictions (e.g., see here and here and here). I don’t believe that the science has demonstrated that such predictions can be made with any skill, and further, by issuing predictions RMS creates at least the appearance of a conflict of interest, as many of its customers will benefit (or lose) according to how these predictions come out. So RMS has a financial interest in the predictions that it issues. The issues faced by catastrophe modeling firms are similar to those faced by ratings agencies in the financial services sector.
A key part of the RMS short-term prediction of hurricane activity is an expert elicitation. The elicitation is described in a paper by Jewson et al. of RMS (PDF). I participated in the 2008 RMS expert elicitation, held last October. The elicitation allows experts to allocate weights to different models of hurricane activity, and these weights are then used to integrate the models for each expert, providing that particular expert’s view of the coming five years. For instance, if there were two models with annual landfall rates of 0.4 and 0.8 Category 3-4-5 hurricanes, and I decided to give the first 20% weight and the second 80%, then the result would be 0.4 * 20% + 0.8 * 80% = 0.08 + 0.64 = 0.72 landfalls per year. The views of the individual experts are combined to produce the results of the group elicitation. RMS conducts its elicitation with the experts blind to the results of the models, focusing instead on the models themselves.
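For readers who want to make the arithmetic concrete, here is a minimal sketch in Python of the per-expert weighting step. The rates and weights are the hypothetical numbers from the example above, not actual RMS model output.

    # Per-expert weighting step: combine model landfall rates using one
    # expert's weights. The rates and weights are the hypothetical
    # numbers from the worked example, not actual RMS model output.
    model_rates = [0.4, 0.8]   # annual Cat 3-4-5 landfall rates, two models
    weights = [0.20, 0.80]     # one expert's weights; they must sum to 1

    expert_view = sum(r * w for r, w in zip(model_rates, weights))
    print(round(expert_view, 2))  # 0.72 landfalls per year, as in the example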
The RMS elicitation has been controversial because it has resulted in short-term predictions of activity higher than the climatological average, which means higher risk, and for coastal property owners the result is higher insurance rates. In 2006, for example, the elicitation by the RMS panel of experts resulted in a prediction of 0.92 Category 3-4-5 storms making landfall per year for the period 2007 to 2011, considerably higher than the 1950-2006 average of 0.64. At the time, loss estimates increased by about 40%, with consequences for insurance and reinsurance rates.
Because Jewson et al. conveniently provide the results for future storm activity from the 20 models used in the 2006 elicitation, I recently decided to do my own inexpert elicitation. I created a panel of 5 “monkeys” by allocating weights randomly across the 20 models for each of my participating monkeys. My panel of 5 monkeys came up with an estimate of 0.91 storms per year for 2007 to 2011. I repeated this three more times, and these three different panels of “monkeys” projected 0.90, 0.93 and 0.92 landfalling Category 3-4-5 storms per year. The RMS experts came up with 0.92.
In short, my panels of “monkeys” produced essentially the same results as those produced by the RMS expert panels comprised of world-leading scientists on hurricanes and climate. How can this be? The reason for this outcome is that the 20 models used in the expert elicitation (i.e., from Jewson et al.) provide results that range from 0.63 on the low side to 1.21 at the high end, with an average of 0.90. Thus, if the elicitors spread their weights across the models, the process will always result in values considerably higher than the historical average. The more the weights are spread, and the more participants there are, the more the results will necessarily gravitate toward the average across the models. So, with apologies to my colleagues, we seem to be of no greater intellectual value to RMS than a bunch of monkeys. Though we do have credentials, which monkeys do not.
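For anyone who wants to replicate the monkey exercise, here is a minimal sketch, with one loud caveat: the actual 20 model rates from Jewson et al. are not reproduced here, so the sketch draws stand-in rates uniformly from the reported 0.63 to 1.21 range. The qualitative point survives the substitution: panels of random weighters reliably land near the average across the models.

    # "Monkey panel" sketch. The 20 stand-in model rates are drawn
    # uniformly from the 0.63-1.21 range reported in the text; they
    # are NOT the actual Jewson et al. model outputs.
    import random

    random.seed(1)
    model_rates = [random.uniform(0.63, 1.21) for _ in range(20)]

    def monkey_panel(n_monkeys, rates):
        """Average the views of n 'monkeys', each weighting models at random."""
        views = []
        for _ in range(n_monkeys):
            raw = [random.random() for _ in rates]   # random raw weights
            weights = [x / sum(raw) for x in raw]    # normalize to sum to 1
            views.append(sum(r * w for r, w in zip(rates, weights)))
        return sum(views) / n_monkeys

    for _ in range(4):
        print(round(monkey_panel(5, model_rates), 2))

Each run prints a value close to the mean of the 20 stand-in rates, just as each of my panels of five monkeys landed within a few hundredths of the RMS experts’ 0.92.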
The stability of the results for the first three times that RMS ran its elicitation provided a hint that something in the process constrained the possible results. In 2008 there was a downward shift in the results as compared to the first three years. Based on the monkey exercise, I’d guess the change is largely due to the fact that the number of models used in the elicitation expanded from 20 to 39, and while I don’t have the output from the 39 models, I’d bet a beer that their average is lower than that of the previously used 20 models.
What does this mean? It means that the RMS elicitation process is biased to give results above the historical average. Of course, it may in fact be the case (and probably is) that many of the elicited experts actually do believe that storm activity will be enhanced during this period. However, the fact that the RMS elicitation methodology does not distinguish between the views of these experts and a bunch of monkeys should give some concern about the fidelity of the process. Another way to look at it is that RMS is providing guidance on the future based on what RMS believes to be the case, and the experts are added to the mix to provide some legitimacy to the process. I’m not a big fan of experts being used as props in the marketing of false certainties.
The RMS expert elicitation process is based on questionable atmospheric science and plain old bad social science. This alone should lead RMS to get out of the near-term prediction business. Add in the appearance of a conflict of interest, since clients benefit when forecasts emphasize risk above the historical average, and the case for RMS to abandon this particular practice becomes stronger still. RMS is a leading company with an important role in a major financial industry. It should let its users determine what information on possible futures they want to incorporate when using a catastrophe model. RMS should abandon its expert elicitation and its effort to predict future hurricane landfalls for the good of the industry, but also in service of its own reputation.
April 22nd, 2009 at 2:13 am
Bravo!
April 22nd, 2009 at 4:59 am
“I’m not a big fan of experts being used as props in the marketing of false certainties.”
I’m not either. Which is why I’m an AGW skeptic. Of course, I’m not a fan of Algore being used, either.
April 22nd, 2009 at 7:43 am
These projections for 2007-2011 have two years of history that should allow an evaluation of the skill in their predictions. Does this ever get reported and plotted to see if the skill is even in range? Also, as an aside, did these people learn their trade estimating appropriate CEO salaries?
April 22nd, 2009 at 8:56 am
-3-Sean
See:
http://sciencepolicy.colorado.edu/prometheus/evaluation-of-near-term-hurricane-loss-predictions-4800
April 22nd, 2009 at 9:53 am
Hi Roger,
Good stuff. (As usual.)
But I think your picture may be of chimpanzees, rather than monkeys.
Here’s a good picture to use, the next time you write about expert panels of chimps:
http://www.solarnavigator.net/animal_kingdom/animal_images/Chimpanzee_thinking_poster.jpg
April 24th, 2009 at 8:04 am
Thanks to Mark Bahner for pointing out this important difference. It is unfortunate that monkeys are so often used to imply a lack of intelligence or mere randomness. Many well-regarded tests have shown that monkeys can in some cases demonstrate advanced forms of intellect, including numerical competence (see http://www.post-gazette.com/healthscience/20030412primate5.asp).
Chimpanzees, however, are the better-known subjects of cognitive research and studies of numerical reasoning. They are social creatures that appear to be capable of empathy, altruism, self-awareness, cooperation in problem solving and learning through example and experience, as documented by Jane Goodall’s studies of chimpanzees in Gombe National Park in northern Tanzania, begun in the 1960s. Chimps even outperform humans in some memory tasks.
See here for additional info
http://www.nytimes.com/2007/04/17/science/17chimp.html?ref=science
April 27th, 2009 at 2:36 am
I am the statistician who conducted the expert elicitation that Dr Pielke derides. I feel that I must answer his unbalanced criticism of the procedure that I adopted in collaboration with RMS. Like Dr Pielke, I was engaged by RMS as an expert to help them with the assessment of hurricane risks. My skills are in the area of probability and statistics, but in particular I have expertise in the process of eliciting expert judgements. I am frequently dismayed by the way that some scientists seem unprepared to acknowledge the expertise of specialists in fields other than their own, and seem willing to speak out on topics for which they themselves have no specific training. During the elicitation exercise it was essential for me to trust the undoubted expertise that he and the other participants had in the science of hurricanes, and I wish that he had had the courtesy to trust mine.
Let me now address Dr Pielke’s specific criticisms.
First, he says that the results obtained were indistinguishable from the results of randomly allocating weights between the various models, and he implies that this is inevitable. The latter implication is completely unjustified. I was not involved in the 2006 elicitation which Dr Pielke uses for his numerical illustration, but I can comment on the two most recent exercises. The experts were given freedom to allocate weights, and did so individually in quite non-random ways. In aggregate, they did not weight the models at all equally. The fact that the result came out in the middle of the range of separate model predictions in 2006 was therefore far from inevitable.
The elicitation exercise was designed to elicit the views of a range of experts. They were encouraged to share their views but to make their own judgements of weights. Dr Pielke says that the more experts we have, the more likely it is that the elicited average will come out in the middle, which is again fallacious. The result depends on the prevailing opinions in the community of experts from whom the participants were drawn. The experts who took part were not chosen by me or by RMS but by another expert panel. If, from amongst the models that RMS proposed, all the ones which would give high hurricane landfalling rates were rejected (and so given very low weights) by the experts, then the result would have ended up below the centre of the range of model predictions. The fact that it comes somewhere in the middle is suggestive, if it suggests anything at all, of RMS having done a good job in proposing models that reflected the range of scientific opinion in the field.
I think the above also answers Dr Pielke’s criticism of RMS’s potential conflict of interest. I agree that this potential is real. RMS is a commercial organisation and their clients are hugely money-focused. Nevertheless, as I have explained, the outcome of the elicitation exercise is driven by the judgements of the hurricane experts like Dr Pielke. Any attempt by RMS to bias the outcome by proposing biased models should fail if the experts are doing their job. If Dr Pielke is convinced, as he appears to be, that no model can improve on using the long-term average strike rate, then he could have allocated all of his weight to this model. That he did not do so is not the fault of RMS or of me.
This brings me back to the question of expertise. The elicitation was carefully designed to make full use of the expertise of the participants. We did not ask them to predict hurricane landfalling, which is in part a statistical exercise. What we asked them to do was to use their scientific skill and judgement to say which models were best founded in science, and so would give predictions that were most plausible to the scientific community. I believe that this shows full appreciation by RMS and myself of the expertise of Dr Pielke and his colleagues. For myself, the expertise that Dr Pielke seems to discount completely is based on familiarity with the findings of a huge and diverse literature, on practical experience eliciting judgements from experts in various fields, and on working with other experts in elicitation. In particular, I have collaborated extensively with psychologists and other social scientists. I don’t know how much Dr Pielke knows of such things, but to complain that what I do is “plain old bad social science” is an insult that I repudiate utterly.
Dr Pielke is no doubt highly-respected in his field, but should stick to what he knows best instead of casting unfounded slurs on the work of experts in other fields.
April 27th, 2009 at 9:35 am
Using monkeys, or great apes, for animal research is indefensible. Although you were not specific about which type of primate you used in your research (odd, considering you were so specific about their results), their importation has only perpetuated their species’ steady march towards extinction. As of now, 58 species of primates have been declared “endangered” by the US and another 13 are listed as “threatened”. Each year some 32,000 primates are captured from the wild, primarily for testing and research. Bringing your monkeys to the US for your research required the intrusion into and destruction of their natural habitat, either in South America or western Africa. This depletes the natural resources not only of the remaining few primates, but also of their less-developed native countries. Responsible scientists need to start being aware of the harmful effects of their research.
As more forests in western Africa are depleted to supply superfluous research, like yours, primate habitats shrink and their ranks come closer to extinction. Not only that, but the once verdant forests become stinking cesspools, ripe breeding grounds for malarial mosquitoes, contributing to countless more deaths and instability in that most troubled continent. And by depleting natural resources because monkeys were absolutely crucial to your research, the countries of western Africa now have even less chance of recovering their economies and creating stable governments. This is assuming you used chimpanzees, as the photo you used implies.
Assuming you were using monkeys, as you say in the article, their origins were probably from the South American rainforest. By having to enter the rainforest in order to obtain monkeys for export, the suppliers helped contribute to the world-wide destruction of the rainforest. An ecosystem that once covered 14% of the earth is now cut down to a mere 6% and still declining. In the Amazon rainforest not only native animals and plants are losing their habitat, but also the native peoples. An estimated ten million Native Amazonians lived in that forest in the year 1500, but that number is now below 200,000. Yes, disease and colonization of the New World contributed to many of their deaths, but there were many tribes living in isolation in the rainforest until the rest of the world started destroying their home and displacing entire tribes. One would have thought we’d have learned from the mistakes of the conquistadors. Instead, people burn the trees in order to force the monkeys to flee their homes—right into the waiting traps of suppliers ready to export them to research facilities all over the world.
So next time, instead of trying to make a clever statement by comparing competent, educated scientists to monkeys, consider using a computer model. It could make the same point without the terrible costs.
Help support the World Wildlife Fund and end the use of animals in research:
http://www.worldwildlife.org/home.html
April 29th, 2009 at 9:14 am
crc_fozzie
That was a wind-up, right? I sincerely hope so; otherwise you have no sense of humour whatsoever.
tonyohagan, I must confess that if I were deluded enough to think that I was an expert in a field, and that this somehow gave me authority over others, and I was then likened to a monkey, I’d be a bit miffed, just like you are. Thankfully, although many people would call me an expert in certain fields, e.g. web development, nuclear physics etc., I am humble enough to admit that I am not, particularly given the subjective nature of who is or is not classed as an expert on a given subject. It’s all about who makes the judgement as to who is or isn’t considered an expert. In this case it appears that you were the judge, so well done for picking a load of monkeys.
As Roger has more than adequately demonstrated, in this case it matters not a jot whether you use a team of experts or a team of monkeys. In deference to crc_fozzie, and assuming that the post above wasn’t a wind-up, I should point out that Roger didn’t in fact use monkeys, so no animals were harmed as a consequence of his exercise (sorry WWF! you’ll have to get the money to save the soon-to-be-extinct polar bears from someone else).
But on the other hand, it’s no doubt nice work if you can get it, I suspect (being chosen as an expert monkey, that is!)
KevinUK
April 29th, 2009 at 1:16 pm
[...] fruit of a computationally intensive calculation inspired by a long conversation that began with Inexpert Elicitation by RMS on Hurricanes, by Roger Pielke Jr. and continued in Tony O’Hagan Responds (Not on Behalf of [...]
April 30th, 2009 at 1:56 pm
[...] the doubt comes from a computationally intensive calculation inspired by an old discussion that began with Inexpert Elicitation by RMS on Hurricanes, by Roger Pielke, and continued in Tony O’Hagan Responds (Not on Behalf of [...]