Flood Damage in the United States: Sources of Inaccuracy in the Flood Damage Data

Flood Damage in the United States, 1926-2003
A Reanalysis of National Weather Service Estimates

4. Sources of Inaccuracy in the Flood Damage Data

Sections 4 and 5 analyze the accuracy of flood damage data received from the NWS Hydrologic Information Center. The goals are to (1) identify errors, inconsistencies, and uncertainties in the estimates, and (2) assess the accuracy of the estimates. The analyses focus on national and state annual damage estimates for the period 1955-1998.

Discussions with staff and comparison of the available materials revealed several sources of inaccuracy and inconsistency in the time series of historical damage estimates:

Clerical errors
Inconsistency in reporting over time
Low precision of reported estimates
Inadequate estimation methods

Each source of inaccuracy is described briefly below. Many of the clerical errors were correctable. Inconsistencies are inevitable in data collected over a long time period; their existence should be noted, but the effects are not measurable. Assessment of the inaccuracy introduced by poor estimation methods is undertaken in Section 5.

A. Clerical Errors

These include mistakes in data entry, transcription, and labeling. Clerical errors were found and corrected, if possible, by comparing the data sets with published sources and material in the archive files. Mistaken labeling included, for example, the statement that all damages were summed by fiscal year (Oct. - Sep.) when, in fact, the national data had been summed by calendar year (Jan. - Dec.) through 1982.

B. Inconsistency in Reporting Over Time

Published NWS reports of flood damage are uniform in format and content for extended periods, leading us to assume that fairly consistent methods were used within the periods 1934-1979 and 1983-present (see Section 2). However, collection of flood damage data was greatly curtailed in 1980, then restarted in 1983 with a new purpose and less detailed reporting. Before 1980, the data were aggregated by river basin and calendar year with several types of flood loss itemized separately. After 1982, data were aggregated by state and fiscal year (Oct.-Sep.), at first with distinction between damage to property and crops, later with only the total of the two. The difference in data collection between the two periods introduces errors when one attempts to develop a uniform data series for the full timespan.

Inconsistency in spatial units:

Flooding naturally occurs in river basins, not necessarily bounded by individual states. When rivers form the state lines or floods cross state lines, assigning historical losses to the proper state is problematic. Our efforts to assemble estimates for 1976-1979 shed some light on the uncertainties involved. For example, the Wabash River rises in Indiana, but it forms a part of the border between Indiana and Illinois. NWS records on floods in 1976 and 1977 did not indicate how Wabash River flood damage should be divided between Indiana and Illinois; therefore, we had to decide the allocation arbitrarily. Another example is the Pearl River, which rises in Mississippi and flows through Louisiana. The NWS reported high flood losses in 1979 in the Pearl River and adjoining basins, including parts of Alabama, but we could not accurately assign the damage among the three states. It is likely that similar uncertainties existed when the NWS converted 1955-1975 river basin damage estimates into state estimates. Thus, occasional mistakes in assigning damage to particular states should be expected.

Inconsistency in time periods:

NWS flood reports have usually been filed monthly, but aggregation periods have changed. Fiscal or calendar years are useful for accounting purposes; water years (which differ by geographic location) are more meaningful for scientific purposes. For example, NWS use of calendar years (through 1979) was problematic in aggregating data for locations along the Pacific coast. There, December - January is the peak flood season, leading to uncertainty in assigning damage to the correct year. (It appears that the NWS resolved this by assigning all the damage from a particular flood season to the year in which the hydrologic flooding peaked.) The present use of October - September fiscal years corresponds well to water years across the U.S., since fewer floods occur in the autumn dry season.
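The effect of the aggregation period can be illustrated with a minimal Python sketch. It assumes that an October - September fiscal year is labeled by the calendar year in which it ends (the usual U.S. federal convention; the report does not state the labeling rule). The function name and the two dates are hypothetical and chosen only for illustration.

```python
from datetime import date


def fiscal_year(d):
    """Return the Oct.-Sep. fiscal year containing date d.

    Assumption: the fiscal year is labeled by the calendar year in
    which it ends, so October 1996 falls in fiscal year 1997.
    """
    return d.year + 1 if d.month >= 10 else d.year


# Hypothetical dates from a single Pacific coast flood season.
early_flood = date(1996, 12, 30)
late_flood = date(1997, 1, 3)

# Calendar years split the season in two; fiscal years keep it together.
print(early_flood.year, late_flood.year)
print(fiscal_year(early_flood), fiscal_year(late_flood))
```

Under calendar-year aggregation the two dates fall in different years, whereas under fiscal-year aggregation both fall in the same year.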

Inconsistency in losses included:

NWS policies on what kinds of losses to include have changed somewhat over the years. Damage estimates published through 1975 focused primarily on damage to property and crops, but included some indirect losses (loss of business and wages, 1934-1947; a "miscellaneous" loss category, 1948-1975). Since 1975, estimates routinely collected for Storm Data have been labeled only as property damage and crop damage. Present policy is to focus exclusively on physical damage to property and crops (John Ogren, NWS, personal communication, 8/29/01). However, the estimates come from diverse independent sources, so other types of damage could be included occasionally.

The NWS process of collecting damage data has always focused more attention on larger floods. Possible inconsistencies related to the exclusion of floods involving low damage are examined in Section 6.

It is sometimes impossible to separate damage caused by flooding from damage due to other storm-related causes (e.g., wind, hail, snow, or ice). Typically, the full amount has been labeled as flood damage if heavy rain or river flow is considered the primary cause. Thus, NWS flood damage estimates are sometimes inflated by damage from other causes. Conversely, flood damage may be omitted when the major cause of damage is wind (hurricanes, tornadoes), snow, or ice. These uncertainties have existed throughout the data series and sometimes lead to incompatibilities with data from other agencies.

C. Low Precision of Reported Estimates

The estimates have always been collected from myriad sources that differ greatly in precision and accuracy. Field office estimates sometimes include very precise figures; more often they give only one or two significant digits. Aggregated sums give a misleading impression of greater precision. For example, separate estimates of $7 million, $400,000, and $17,000 add to a more precise-looking annual estimate of $7,417,000, but the accuracy of the sum is limited by that of the largest estimate ($7 million, in this case).
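The point about false precision can be made concrete with a short Python sketch. It is illustrative only and is not a procedure used in the reanalysis: the function name is hypothetical, and the sketch assumes the largest estimate carries one significant digit.

```python
import math


def round_to_place_of_largest(estimates, sig_digits=1):
    """Sum the estimates, then round the total to the decimal place
    of the last significant digit of the largest estimate.

    Assumption: the largest estimate has `sig_digits` significant digits.
    """
    total = sum(estimates)
    largest = max(estimates)
    # Power of ten of the last significant digit of the largest estimate.
    exponent = math.floor(math.log10(largest)) - (sig_digits - 1)
    place = 10 ** exponent
    return round(total / place) * place


estimates = [7_000_000, 400_000, 17_000]
print(sum(estimates))                        # 7417000
print(round_to_place_of_largest(estimates))  # 7000000
```

Reported at the precision of its largest component, the $7,417,000 total is simply $7 million.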

Even one-digit accuracy is not assured. Published reports sometimes disagree greatly on the amount of damage in a particular flood event. For example, shortly after the failure of the Teton Dam in Idaho in 1976, damage estimates ranged from $400 million to $1 billion (Chadwick et al. 1976). In subsequent reports from several agencies, the $1 billion estimate was used repeatedly with no further refinement (for example, USACE Walla Walla District 1977). A final report on the Teton Dam failure (Eikenberry et al. 1980) gave the only specific figures: loss of a $102.4 million project investment and over $315 million paid to more than 7,500 claimants. This establishes a minimum loss of about $417 million, but only covers a portion of the total damage. In creating the reanalyzed data set, we chose to use the geometric mean of the minimum and maximum estimates, producing a damage estimate of $650 million.
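The arithmetic behind the Teton Dam figure can be checked with a few lines of Python. This sketch assumes the minimum is the floor of about $417 million implied by the final report and the maximum is the $1 billion estimate; the variable names are illustrative, and rounding to the nearest $50 million is an assumption made here to reproduce the published figure.

```python
import math

# Values in millions of dollars, taken from the text above.
project_investment = 102.4
claims_paid = 315.0
minimum = project_investment + claims_paid  # about 417
maximum = 1000.0                            # the $1 billion estimate

# Geometric mean of the minimum and maximum estimates.
estimate = math.sqrt(minimum * maximum)
print(estimate)  # roughly 646

# Assumed rounding step: nearest $50 million.
rounded = round(estimate / 50) * 50
print(rounded)
```

If the early low estimate of $400 million were used as the minimum instead, the geometric mean would be about $632 million, which also rounds to $650 million at this step.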

After NWS reports on flood damage were discontinued in 1980, Storm Data became the primary source of flood damage estimates (see Section 2). From 1980 until about 1984, the accuracy of available estimates is limited by Storm Data reporting procedures. At that time, NWS field offices reported damage estimates by checking categories on the following logarithmic scale:

Less than $50
$50 to $500
$500 to $5,000
$5,000 to $50,000
$50,000 to $500,000
$500,000 to $5 million
$5 million to $50 million
$50 million to $500 million
$500 million to $5 billion

Such estimates indicated only the order of magnitude of the damage (e.g. roughly a $100,000 flood, a $1 million flood, a $10 million flood). Occasionally, more specific damage estimates were included in narrative descriptions of a flood event.

To add a set of these categorical estimates, each category must be assigned a point value. Proportional errors are minimized by using the geometric mean of a category’s end points. That is, category k is from $0.5 × 10^k to $5 × 10^k (when k > 1), so the best estimate is

(0.5 × 5)^0.5 × 10^k = (2.5)^0.5 × 10^k ≈ 1.58 × 10^k

However, the individual estimates could be in error by more than a factor of 3. For example, an event with damage originally estimated anywhere between $500,000 and $5 million would be entered into the data set as damage of $1.58 million. This is about 3 times higher than an estimate at the low end of the range, and about 1/3 of an estimate at the high end of the range.
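The category arithmetic can be sketched in Python as follows. The function names and the list of reports are hypothetical; the category bounds follow the definition given above, with k running from 2 ($50 to $500) to 9 ($500 million to $5 billion).

```python
import math


def category_bounds(k):
    """Lower and upper end points, in dollars, of category k (k > 1)."""
    return 0.5 * 10 ** k, 5 * 10 ** k


def category_point_value(k):
    """Geometric mean of the end points: sqrt(2.5) * 10**k."""
    low, high = category_bounds(k)
    return math.sqrt(low * high)


# The $500,000 to $5 million category corresponds to k = 6.
low, high = category_bounds(6)
point = category_point_value(6)
print(point)         # about 1.58 million
print(point / low)   # worst-case overestimate factor, sqrt(10), about 3.16
print(high / point)  # worst-case underestimate factor, also about 3.16

# Hypothetical set of categorical reports (values of k) summed into a total.
reports = [5, 6, 6, 7]
total = sum(category_point_value(k) for k in reports)
print(total)
```

Because the point value is the geometric mean, the worst-case proportional error is the same in both directions: a factor of the square root of 10, or slightly more than 3.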

Errors associated with these logarithmic categories are of concern primarily in the 1980-1984 flood damage estimates. By 1985, it appears that NWS-HIC had instituted some follow-up checking and refinement of the estimates, at least for major floods. Use of logarithmic categories in Storm Data was discontinued in 1995. Since then, one- or two-digit estimates have been given in thousands or millions of dollars (e.g. $60K or $3.2M).

D. Inadequate Estimation Methods

Potentially the most serious source of inaccuracy is the ad hoc approach to obtaining damage estimates from each NWS field office (described in Section 2). The estimates are collected by staff members who have little or no training in damage estimation and who rely on diverse sources. Estimation methods used by their sources are unknown, and completeness of coverage varies. Estimates are usually obtained within 2 months after a flood event and are not compared by the NWS with records of actual damage.

Incomplete reports and omissions:

A state emergency management official (Kay Phillips, Ohio Emergency Management Agency, personal communication, 7/25/00) complains that the NWS calls her asking for a damage estimate within a few weeks after a disaster. At that time, the extent of damage is unknown and emergency managers are scurrying to respond to immediate needs. They have some knowledge of losses to individuals, but little knowledge of damage to infrastructure, which makes up a large part of total losses. Thus, in her opinion, early loss estimates tend to be much too low in relation to final tabulations.

An example of underestimation is the NWS damage estimate for California flooding associated with Hurricane Kathleen in 1976. The NWS dataset (which had not been fully updated because annual summaries were discontinued that year) gave a damage estimate of $42 million, whereas estimates in subsequent published reports (e.g., Montane 1999) are 3 to 4 times higher.

Errors of omission occur when a significant flood event is overlooked entirely. For example, flash floods in California in July 1979 caused damage estimated at $26-50 million (Montane 1999), but the NWS dataset reported no damage.

Potential biases:

A substantial bias toward underestimation is expected due to incomplete reporting and omission of some floods. However, we hypothesize that some damage estimates provided to the NWS field offices might be biased upward if, for example, losses were exaggerated to improve chances of getting state or federal assistance. Accuracy and bias in early damage estimates are examined in Section 5.

A report of the University Corporation for Atmospheric Research, supported by the National Science Foundation, the National Weather Service, and the National Oceanic and Atmospheric Administration, Office of Global Programs, pursuant to NOAA Award No. NA96GP0451 through a cooperative agreement. In partnership between the Environmental and Societal Impacts Group of the National Center for Atmospheric Research and the Center for Science and Technology Policy Research, Cooperative Institute for Research in Environmental Sciences, University of Colorado.