Baron, J., & Greene, J. (1996). Determinants of insensitivity to quantity in valuation of public goods: contribution, warm glow, budget constraints, availability, and prominence. Journal of Experimental Psychology: Applied, 2, 107-125.

Determinants of insensitivity to quantity in valuation of public goods: Moral satisfaction, budget constraints, availability, and prominence

Jonathan Baron¹ and Joshua Greene²
University of Pennsylvania

Abstract

Insensitivity to quantity in valuation of public goods has been shown in three ways: the embedding effect (in which willingness to pay [WTP] for a good is smaller if assessed after a superordinate good), the quantity effect (relative insensitivity to numerical quantity), and the adding-up effect (WTP for two goods less than inferred from WTP for each good alone). We test four explanations of these effects: moral-satisfaction (people think of the task as a contribution, or they get a warm glow from participation), budget-constraints (people limit expenditures for certain goods), availability (people fail to think of other goods of the same type as the good evaluated), and prominence (people overattend to the type of good and underattend to its quantity). Contribution accounts are insufficient because insensitivity is found even when the contribution idea is removed through the use of a trigger-price mechanism. Budget constraint accounts are ruled out because subjects are still insensitive with willingness-to-accept, and because subjects are sensitive with unit-price WTP. Sensitivity is also increased by presentation of a full context (supporting the availability explanation), by asking about a private good rather than an equivalent public good (supporting moral satisfaction), by asking for good-good comparisons (supporting the prominence account) and by asking subjects to fill in the quantity of the good (again supporting prominence). These findings support the critics of contingent valuation, but they suggest that some of the methods of decision analysis can improve the elicitation of economic values.

Introduction

Researchers often measure the economic value of public goods by asking people how much money is equivalent to a good. In the contingent-valuation (CV) method, subjects indicate their willingness to pay (their WTP) for the good in question. CV is used in several countries for assessing environmental damage from oil and chemical spills and for evaluating environmental and safety effects of government policies. CV is a matter of contentious debate and litigation (Portney, 1994). A large part of the problem is that CV judgments are sometimes remarkably insensitive to the quantity or scope of the good provided. This insensitivity provides ammunition to critics who say that CV does not measure economic value (e.g., Diamond & Hausman, 1994). Defenders of CV have argued that insensitivity does not necessarily imply faulty measurement (e.g., Hanemann, 1994) or that CV can be made more sensitive if properly conducted (e.g., Schuman, 1995). At issue is the explanation of the insensitivity effects themselves. The source of the effects has implications for whether the problem of insensitivity is serious and, if it is, how to make people more sensitive.

We shall first summarize the evidence for insensitivity and then discuss some possible explanations and their implications. Insensitivity is a pervasive phenomenon, not dependent on the details of method. Accordingly, although we used a variety of methods to study these effects in experiments we shall report, we see no reason why our conclusions would not apply just as well to similar methods that we did not use.

Types of quantity insensitivity

Several types of effects have been identified.

Embedding. Kahneman and Knetsch (1992) asked some subjects their WTP for improved disaster preparedness and other subjects their WTP for improved rescue equipment and personnel. The improved equipment and personnel were thus ``embedded in'' the improved disaster preparedness, so the preparedness included the equipment and personnel, and other things too. WTP was, however, about the same for the larger good and for the smaller good included in it. Kahneman and Knetsch called this the ``perfect embedding effect,'' presumably because a demonstration of it requires perfect equality of WTP of the two goods. When subjects were asked their WTP for the smaller good after they had just been asked about the larger one, they gave much smaller values for the smaller good than for the larger one, and much smaller values than those given by subjects who were asked just about the smaller good. This order effect is called the ``regular embedding effect.'' It demonstrates that a good seen as embedded in a larger good has reduced value. Kemp & Maxwell (1993) replicated this regular embedding effect, starting with a broad spectrum of public goods, and narrowing the good down in several steps, obtaining WTPs for an embedded good that were 1/300 of WTP for the same good in isolation.

Adding up. In a related demonstration, Diamond et al. (1993) asked subjects their WTP values for preventing timber harvesting in federally protected wilderness areas. WTP for the prohibition in three areas was not much (if any) higher than WTP for prohibition in one of the areas alone. This result cannot be explained by assuming that subjects thought that protection of one area was sufficient: when they were asked their WTP to protect one area assuming that another was already protected, or a third area assuming that two were protected, their WTP values were just as high as those for protecting the first area. More generally, in this kind of ``adding-up effect,'' respondents are asked their WTP for good A (e.g., a single wilderness area), for good B assuming that good A has been provided already, and for goods A and B together. WTP for A and B together is much lower than the sum of the WTP for A and for B (with A provided). Schulze, McClelland, & Lazo (1994) found similar results in a within-subject design: each subject rated A, B, and A and B together.

Quantity effect. Jones-Lee, Loomes, and Philips (1993) asked subjects to evaluate hypothetical automobile safety devices that would reduce the risk of road injuries. WTP judgments were on the average only 20% higher for a risk reduction of 12 in 100,000 than for a reduction of 4 in 100,000. In general, the rate of substitution between money and the good, the dollars per unit, depends strongly on the amount of the good. This makes it difficult to generalize results to different amounts, a kind of generalization that is almost always required (Baron, 1995). Many other results of the same sort are reported by investigators who are content to show that WTP differs, if only by a small proportion, for a difference in goods of several orders of magnitude, resulting in extreme differences in rate of substitution (e.g., Boyle et al., 1994; Carson & Mitchell, 1993, p. 1265).
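The arithmetic behind this point can be sketched as follows. The baseline WTP of $100 is a hypothetical figure chosen only for illustration; the 20% difference and the two risk reductions are the ones reported above.

```python
def dollars_per_unit(wtp, quantity):
    """Rate of substitution: money per unit of the good."""
    return wtp / quantity

# Hypothetical baseline: $100 for a risk reduction of 4 in 100,000.
small_wtp = 100.0
# Reported pattern: WTP only 20% higher for a reduction of 12 in 100,000.
large_wtp = small_wtp * 1.20

small_rate = dollars_per_unit(small_wtp, 4)    # dollars per 1-in-100,000 reduction
large_rate = dollars_per_unit(large_wtp, 12)
rate_ratio = small_rate / large_rate           # how much the per-unit rate falls

print(small_rate, large_rate, rate_ratio)
```

Under these assumed numbers the good triples while the price rises by a fifth, so the implied dollars per unit differ by a factor of 2.5 between the two quantities. A value inferred from one quantity therefore cannot simply be scaled to the other.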

Explanations of quantity insensitivity

The present paper concerns the explanation of these effects. Several explanations are possible.

Moral satisfaction. Kahneman and his colleagues have suggested that WTP judgments are expressions of attitudes about the moral satisfaction to be obtained from contributing. WTP responses are highly correlated with judgments of satisfaction and of importance (Kahneman & Knetsch, 1992; Kahneman, Ritov, Jacowitz, & Grant, 1993) and with the judged importance of the issues examined and the respondent's feeling of responsibility for them (Guagnano et al., 1994).

Two subforms of the moral-satisfaction hypothesis may be distinguished, which we call the contribution account and the warm-glow account. By the contribution account, respondents think of the task as a charitable contribution. If you contribute to Oxfam, for example, your contribution goes into a pot of money that will help people. You can think of this pot as helping the whole world (spread very thin) or as helping a particular village or family. The amount of good your contribution does depends mainly on its size, not on the size of the pot that it goes into.

Crucial to the contribution account is the subjects' assumption that each contribution increases the total amount of the good. This account predicts, then, that subjects will be more sensitive to quantity if they understand that the size of their contribution does not affect the size of the good provided. In our experiments, we tell subjects that the size of the good is fixed and that their WTP will be compared to their fair share of the cost of the good. If more than half of the respondents are willing to pay at least their fair share, then the good will be provided, and otherwise it will not.

The warm-glow account holds that WTP depends on personal agency or participation rather than on the expected consequences of a contribution. This is, we believe, equivalent to the proposals of Andreoni (1990) and Margolis (1982). (Baron & Spranca, 1995, discuss other ``agent relative'' principles that affect WTP.) This account predicts that sensitivity to quantity should increase if subjects simply provide ratings of the importance of various public goods, since no participation is involved. Kahneman and Ritov (1994) found that ratings of importance and support for interventions were insensitive to quantity, and we have replicated this result with ratings of satisfaction and benefit (Baron et al., 1993, and other unpublished work). Jones-Lee et al. (1993) suggest that their demonstration of the quantity effect is difficult to explain in terms of moral satisfaction (presumably the warm-glow version), because the good in question, auto safety, is private.

We might imagine a form of the warm-glow account in which satisfaction with provision of a new good is largely independent of the quantity of the good and does not depend on the individual's involvement; we take this to be equivalent to the prominence hypothesis, which we describe later.

Budget constraints. People sometimes say that they will ``contribute as much as they can afford.'' Taken literally, this is implausible, but people may view certain expenditures as setting precedents for similar expenditures, or they may think of their spending in terms of separate, limited accounts (Thaler, 1993), including, perhaps, accounts for public goods. People may answer with their limit when they are asked their WTP for one unit of a good. According to this account, insensitivity would be found only in WTP judgments, not in judgments of willingness to accept money to give up the good in question (WTA). We tested this prediction. Dubourg et al. (1994, p. 128) found insensitivity in WTA judgments, although they did not point out the relevance of this finding to the budget hypothesis. We report three replications of this important result.

A related explanation holds that insensitivity results from substitution effects (Hanemann, 1994; Carson & Mitchell, 1995). For example, subjects may think of wilderness areas as substitutable, and they may think that preserving one of them is less valuable when another is already preserved. They would then pay little more for two than for one. (Note that this is not the result of Diamond et al., 1993, because subjects were asked how much they would pay for the second assuming they had the first.) More generally, the marginal utility of goods of a certain sort may be declining. This explanation, too, predicts that insensitivity would not be found in WTA judgments. Indeed, it predicts oversensitivity. For example, subjects would require a very large amount of money to give up the second of two wilderness areas although they might require only a little money to give up one of them. Because the substitution account makes the same predictions for the present studies as the budget-constraint account, we shall lump them together henceforth.

If these accounts explain the results, then insensitivity need not be a problem for certain purposes. People are giving their true WTP for what they are asked about.

Availability. This explanation holds that subjects fail to think of other goods of the same type as the good evaluated, unless these other goods are explicitly presented. It is related to the finding that subjects assign too little probability to ``all other problems'' when they are given a tree of possible faults in a system (Fischhoff, Slovic, & Lichtenstein, 1978). This account is also similar to the ``part-whole'' bias described by Mitchell and Carson (1989). This account implies that insensitivity is a real problem, but it might be cured by reminding people about other goods.

Prominence. This explanation of insensitivity is based on a loose analogy with several results from the literature on decision making and decision analysis (Fischer, 1995; Tversky et al., 1988; Zakay, 1990). The essence of the idea is that people assign ``importance'' to attributes, dimensions, or types of goods, even if they do not know the magnitude or range of variation at issue. For example, when most people are asked which is more important, life or money, they will confidently answer that life is. If they are then asked whether they would spend the annual GNP of the United States in order to reduce someone's chance of death by .0001 this year, they feel tricked. In decision analysis, naive subjects often try to assign numerical weights to dimensions without knowing the range of each dimension. In the case of CV, when people are asked to say something about the importance of cleaning up oil spills, they respond to the type of good rather than the amount of it, even though they are asked to respond with an amount of money. When they are asked about two goods, they respond to their average importance rather than their total. The prominence explanation also implies that insensitivity is a real problem, but one that might be cured by using the methods of decision analysis, where it is apparently not a problem.

The present studies test these hypotheses as necessary or sufficient accounts of the various forms of insensitivity to quantity. The studies do not test every explanation of every effect, and materials are varied from study to study, sacrificing systematicity for variety. Table 4, and the Discussion, present an overview of the hypotheses and results.

In general we speak of both ``sensitivity'' and ``insensitivity.'' Complete insensitivity means that subjects ignore quantity; proportional sensitivity means that the subject's price (e.g., WTP) is proportional to the magnitude of the good. We think of a continuum between these two poles. Sensitivity is the distance from complete insensitivity, and insensitivity is the distance from proportional sensitivity. A little insensitivity need not imply any problem of measurement. This measure, however, is useful for comparing experimental conditions.

Tests of moral satisfaction (contribution)

Study 1: Referendum and public safety

This study tests the contribution version of the moral-satisfaction account by comparing a standard condition, in which subjects are free to think of their WTP as a contribution, to a condition in which they are explicitly told that the good is fixed and that their WTP will be compared to their fair share of the good. The good will be provided if and only if at least half of the subjects are willing to pay their share. We call this a ``referendum'' condition because it is equivalent to majority rule: if your WTP is more than your share, then you would presumably vote for providing the good.

We tested this hypothesis in the context of tuition increases for increased public safety at the University of Pennsylvania, an urban university where safety is a salient issue for students. Subjects were all students. Within each group - referendum vs. no-referendum - about half of the subjects were asked about the small good only, and half were asked about the large good, and then about the small good. In order to encourage the contribution interpretation in the no-referendum condition, the amount of increase in the good was not specified, so that subjects were free to think that the increase would depend on their WTP. In the referendum condition, in order to discourage the contribution interpretation, the increase was specified as 10%, and subjects were told that the good would either be provided or not.

For about half of the subjects in each of these subgroups, the good would be provided only after the students graduated, so that the payment could not be seen as a private purchase of additional safety. Although the safety programs were truly public goods, many students in pilot studies tended to think of the payments as providing direct benefit to the individual who paid - although this benefit is a tiny fraction of the overall benefit. By saying that the program would begin only after the student left, we made it entirely a matter of altruism. In sum, then, we used a 2x2x2 between-subject design: referendum by order (large-small vs. small only) by altruism.

Method

Subjects were approached in campus residences and on walkways and offered $0.50 to complete a brief questionnaire. They were told that it was a psychology study, although the situation was so realistic that many had to be convinced that it was not being conducted for the University. There were 204 subjects in all, with at least 25 in each of the 8 groups.

In the no-referendum, altruism, large-small condition, the questionnaire read:

Consider a proposal to increase the security on Penn's campus and the surrounding area. The program would be funded by a student tuition increase. You would pay the tuition increase, but the new program would not begin until after you graduate.
For those students with financial aid, the increase will be pro-rated. Answer these questions on the assumption that you pay full tuition, even if you pay less. The exact tuition increase would depend on your response, as well as other Penn students' responses.
1. The extra amount of tuition would directly increase all safety programs equally - police patrols, escort services, emergency phones, etc. What is the greatest amount in dollars per year that you would be willing to pay? $
2. The proposal includes an increase in the amount of money spent on police in patrol cars. If the extra money were spent only on police in patrol cars (with all other spending unchanged), what is the largest amount in dollars per year that you would be willing to pay? (You would pay the tuition increase, but the new program would not begin until after you graduate.) $

In the small conditions, subjects were asked only about the patrol cars. In the no-altruism conditions, the statements about when the new program would begin were omitted. In the referendum conditions, all increases were specified as 10%, and the following sentence was added immediately after the question about willingness to pay but before the answer blank: ``If the median (middle) response to this question is greater than each student's part of the actual cost of the program, then the program will go into effect, and everyone will pay his or her part of the actual cost. Otherwise, the program will not go into effect.'' Subjects were also asked their sex and year in college.
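The referendum rule quoted above can be sketched as a small function. This is an illustration only: the function name, the response values, and the cost figures are hypothetical and do not come from the study.

```python
from statistics import median

def referendum_outcome(stated_wtps, cost_per_student):
    """Apply the rule: the program goes into effect only if the median
    stated WTP exceeds each student's part of the actual cost, in which
    case everyone pays that part of the cost (not their stated WTP)."""
    provided = median(stated_wtps) > cost_per_student
    payment = cost_per_student if provided else 0.0
    return provided, payment

# Hypothetical responses in dollars per year (median is 100).
responses = [0, 50, 100, 150, 1000]
outcome_cheap = referendum_outcome(responses, 80.0)    # median above cost
outcome_costly = referendum_outcome(responses, 120.0)  # median below cost

# One respondent inflating a stated WTP leaves the median, and so the
# outcome and the payment, unchanged.
outcome_inflated = referendum_outcome([0, 50, 100, 150, 100000], 120.0)

print(outcome_cheap, outcome_costly, outcome_inflated)
```

The point of the design is visible in the last call: a larger stated WTP does not buy a larger good, which is what removes the contribution interpretation.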

Results

The embedding effect was found overall, and its magnitude was insensitive to referendum vs. no-referendum.

Over all conditions, the mean response was $148 (s.d.=$240) for the large good and $162 (s.d.=$441) for the small good. However, as is usually found, the response distribution was highly skewed: many responses were zero (29% for the large good, 26% for the small good, over all conditions, including one response of $0.0001); and some responses were unrealistically high (12 responses of $1,000 or more). We therefore used a method suggested by McFadden & Leonard (1993, also used by Green et al., 1994), in which the WTPs were log transformed and the numbers of $0 responses were analyzed separately. Table 1 shows the geometric mean responses by condition, for nonzero WTPs.
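A minimal sketch of this two-part summary follows. It is not McFadden and Leonard's procedure in detail; it only shows zero responses being counted separately while the nonzero responses are log transformed. The cutoff that treats a near-zero response such as $0.0001 as zero, and the sample values, are assumptions made for illustration.

```python
import math

def summarize_wtp(responses, zero_cutoff=0.01):
    """Return (share of zero responses, geometric mean of nonzero responses).

    Responses at or below zero_cutoff (a hypothetical threshold) are
    counted as zero and excluded from the log-transformed analysis.
    """
    nonzero = [r for r in responses if r > zero_cutoff]
    zero_share = 1 - len(nonzero) / len(responses)
    if not nonzero:
        return zero_share, None
    mean_log = sum(math.log(r) for r in nonzero) / len(nonzero)
    return zero_share, math.exp(mean_log)

# Hypothetical skewed sample: two effective zeros and two very high values.
sample = [0, 0.0001, 10, 100, 1000, 1000]
zero_share, geo_mean = summarize_wtp(sample)
arith_mean = sum(sample) / len(sample)

print(zero_share, geo_mean, arith_mean)
```

In a sample like this the arithmetic mean is pulled upward by the extreme responses, whereas the geometric mean of the nonzero responses is far less affected by them.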

- Insert Table 1. -

In a multiple regression in which the log-transformed WTP for the small good was predicted from altruism vs. no altruism, referendum vs. no referendum, embedding (large good first) vs. no embedding, and the interaction of referendum and embedding, only the effect of embedding was significant (p = .009).³ The embedding effect was significantly greater in females than in males (p = .027 for the sex by embedding interaction). (The triple interaction of sex, embedding, and referendum was not significant. Nor was the triple interaction of altruism, embedding, and referendum.) A parallel analysis of the first response made - the large good in the embedded condition, the small good otherwise - found no effect of good size (embedding condition), altruism (whether the subject benefited), referendum, or any interaction. In sum, the WTP measure, excluding subjects with zero WTP, showed both the regular embedding effect and the perfect embedding effect, and no effect of altruism.

Log-linear analyses for the zero responses to the small good showed effects of embedding and altruism (simple 1 df χ² tests yielding χ² of 5.8 and 4.7, p = .030 and .016, respectively), but no effect of referendum, and no interactions. In sum, altruism affects the percent of zero responses (33% in the altruism condition vs. 18% in the no-altruism condition), although it does not affect WTP once the decision is made to pay something. Most importantly, an embedding effect is found here too, and, again, the effect does not differ between referendum and no-referendum conditions.

Study 2: Referendum and medical insurance for mental health

The second study tested the effect of a referendum format in the context of government medical insurance.

Method

Subjects were 87 people solicited in Philadelphia's main railroad station (36 females, 51 males, mean age 37).

The questionnaire asked subjects to imagine that, in a few years, everyone in the U.S. will have at least a standard medical insurance package costing $1,000 per year on the average, paid through payroll deductions and taxes, with payment adjusted for ability to pay. The mental health coverage in the standard package is limited to ``up to 5 hours of psychotherapy per year and for up to 10 days in a mental hospital.'' At issue was willingness to pay in order to extend the standard package so that it covered 25 hours of psychotherapy, 30 days in a mental hospital, or both. The questions appeared in three different orders: both, therapy, hospital; therapy, hospital, both; hospital, therapy, both.

In the referendum condition, the WTP question was followed by, ``If the typical answer to this question is high enough to cover the cost per person of the broader coverage, then the coverage will be increased and everyone will pay their share. Otherwise, the coverage will not be increased. The `typical answer' is the middle: half of the answers are higher, and half are lower.''

Results

Forty-nine percent of the subjects gave at least one $0 response. Analysis of the remaining subjects yielded no embedding effects, but it also showed no effect of one vs. two coverages. Accordingly, the analysis focused on the number of zero responses. The embedding effect was found in both conditions. That is, a zero response was more likely when a good was embedded than when it was presented first. The main result was that the referendum format did not reduce the effect. This and Study 1 are the first reported demonstrations of an embedding effect for number of zero WTPs.

Specifically, there were more zero responses for hospitalization when it followed ``both'' (60% in the referendum condition, 50% in the control condition) than when it came first in the questionnaire (15% referendum, 24% control). Likewise, there were more zeros for psychotherapy when it followed ``both'' (60% referendum, 50% control) than when it came first (31% referendum, 14% control). Log-linear analyses using order (first vs. second), referendum, and zero to classify subjects rejected a model in which zero responses were independent of order (p = .001 for hospitalization, p = .012 for psychotherapy). A model assuming that zero responses depended on order only was not rejected for either treatment. Hence, the effect of order seemed to be the same for both referendum and control conditions.

Study 3: Referendum and medical insurance for cancer and transplants

The third study used treatments that subjects were more likely to consider worth insuring against.

Method

Subjects were 210 people solicited in Philadelphia's main railroad station (90 females, 120 males, mean age 43) and 106 students (52 females, 54 males, mean age 20), solicited by advertisements and paid $6 per hour for filling out questionnaires in a quiet room.

The questionnaire was like that of Study 2, except that the standard package did not cover ``organ transplants'' and ``certain expensive cancer treatments.'' Only the rich would be able to afford extra insurance for these. We asked subjects how much extra they would be willing to pay (over $1,000) for extra coverage within the standard package for transplants, cancer treatments, and for both. At issue was whether their WTP for both was as high as the sum of their WTP for each of the two goods.

The questions appeared in three different orders: both, transplant, cancer; transplant, cancer, both; cancer, transplant, both. We folded the questionnaire so that subjects answered the first question without seeing the last two, to facilitate between-subjects analysis of the first question.

In the referendum condition, the WTP question was followed by, ``If most people (more than 50%) are willing to pay their share of the cost of the broader coverage, then it will be provided and everyone will pay their share. Otherwise, the broader coverage will not be provided.''

Results

Both conditions showed an embedding effect. The referendum format did not reduce it, either in a between-subject analysis of the first response or a within-subject analysis of differences in WTP for the three questions. In the between-subject analysis, log WTP for the first item did not depend significantly on whether it was single or double, in either the referendum or no-referendum condition, and condition did not interact with order. In the no-referendum condition, the geometric mean WTPs were $425 for the single good and $364 for the double good. In the referendum condition, the respective means were $467 and $364. The log of twice the WTP for the single good is significantly greater than the log WTP for the double good (t = 3.9, p < .001); hence, the procedure is capable of detecting a difference. The ``perfect'' embedding effect was thus found for the first item presented.
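The logic of the two comparisons can be illustrated with the geometric means reported above. This sketch is only a descriptive check on the direction and size of the gaps. The significance tests in the text were run on individual log responses, which are not reproduced here, and the function name is hypothetical.

```python
import math

def log_gap(single_wtp, double_wtp, multiplier=1):
    """log(multiplier * single) - log(double).

    multiplier=1 compares the single good with the double good directly;
    multiplier=2 compares twice the single good with the double good,
    which would be zero under proportional sensitivity.
    """
    return math.log(multiplier * single_wtp) - math.log(double_wtp)

# Geometric mean WTPs reported in the text: (single good, double good).
reported_means = {"no-referendum": (425, 364), "referendum": (467, 364)}

gaps = {}
for condition, (single, double) in reported_means.items():
    gaps[condition] = (
        log_gap(single, double),       # single vs. double
        log_gap(single, double, 2),    # twice single vs. double
    )
    print(condition, gaps[condition])
```

In both conditions the second gap is far from zero while the first is small, which is the pattern described as the ``perfect'' embedding effect.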

Twenty-nine responses to the first item, 9.5%, were zero. Twenty-eight of these responses were from nonstudents (p < .001 for the association). Otherwise, students and nonstudents did not differ in any way. (Nor were any effects of sex or age significant.) The proportion of zero responses did not depend on referendum condition, on whether the double item was presented first, or on the interaction of these variables (in a log-linear analysis).

In a within-subjects analysis the log of the sum of WTPs for the two goods presented individually was higher than the log of the WTP for both goods together, significantly for both conditions (geometric means of $727 vs. $513 for the referendum condition, $633 vs. $491 for the no-referendum condition; p < .001 for both by a Wilcoxon test). This difference was significantly higher for the referendum condition (p = .02, U test) - a result opposite to the hypothesis - but it was unaffected by other variables.

Tests of budget constraints (and moral satisfaction)

Study 4: Accepting tax cuts vs. paying tax increases

The results for ratings cast doubt on the budget-constraint hypothesis. Another test of this hypothesis (and related hypotheses based on substitution effects) is to compare WTP and WTA (willingness to accept). These hypotheses all predict substantial sensitivity in WTA. Study 4 examines the quantity effect and the adding-up effect within subjects in both WTP and WTA for government programs.

Method

Thirty-two students successfully completed a questionnaire. Seventeen did WTP judgments first, 15 did WTA judgments first. The questionnaire began, ``This questionnaire concerns the value you place on various programs paid for by the U.S. government. We ask how much you are willing to pay for increases in these programs, or how much tax reduction you are willing to accept in return for cuts.'' Subjects were asked to answer in mills, with one mill being 0.1% of current taxes, and a table was provided with taxes of $1,000, $5,000, $10,000, and $50,000 as examples. (For example, one mill would be $10 additional tax if you are now paying $10,000.)
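The conversion from mills to dollars follows directly from the definition given to subjects. The sketch below reconstructs the one-mill equivalents for the four example tax levels; it is not a reproduction of the table actually shown to subjects, and the function name is hypothetical.

```python
def mills_to_dollars(mills, current_tax):
    """One mill is 0.1% (one thousandth) of the respondent's current taxes."""
    return mills * current_tax / 1000

# The four example tax levels mentioned in the questionnaire.
example_taxes = [1_000, 5_000, 10_000, 50_000]
one_mill_table = {tax: mills_to_dollars(1, tax) for tax in example_taxes}

for tax, dollars in one_mill_table.items():
    print(f"current tax ${tax:,}: 1 mill = ${dollars:,.2f}")
```

Expressing responses in mills puts subjects with different tax bills on a common proportional scale.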

Subjects in the WTP-first condition were asked, ``What is the highest tax increase that you would be willing to have added to everyone's taxes for each of the following increases in government spending? Imagine that everyone in the U.S. was asked this question. If more than half said they were willing to have their taxes raised enough to pay for the increase, then government spending would be increased and taxes would be raised enough to pay for it. Also imagine that each item is the only proposal for increases this year.'' Subjects then answered a test question about whether they were to write the most they were willing to pay for each increase, the least they were willing to pay for each increase, the least they were willing to accept, etc.

The WTA condition read, ``What is the lowest tax reduction in everyone's taxes at which you would be willing to see each of the following cuts in government spending? Imagine that everyone in the U.S. was asked this question. If more than half said they were willing to accept a tax reduction less than what could be saved by cutting the program, then the program would be cut and everyone's taxes would be reduced by the amount saved. Also imagine that each item is the only proposal for cuts this year.'' The test question was the same. Twenty-three subjects (in addition to the 32 used) did not answer this question correctly, so their data were not used. Some of these subjects were interviewed, and they seemed to understand the task but not the test question itself, so this procedure is conservative.

The items were then presented in groups of five, e.g.:
10% increase in acquisition of land for national parks
20% increase in acquisition of land for national parks
10% increase in aid for family planning programs for the poor in the U.S.
20% increase in aid for family planning programs for the poor in the U.S.
10% increase in acquisition of land for national parks and in aid for family planning programs for the poor in the U.S.
(The word ``cut'' replaced ``increase'' in the WTA condition.) Comparison of 10% and 20% measures the quantity effect (insufficient sensitivity to quantity) and comparison of the two 10% items with the double item measures the adding-up effect.

The remaining pairs of items were based on the following programs, in order: aid for international family planning and women's health programs, aid for international vaccination programs against polio, etc.; aid for vaccination of low-income children in the U.S., aid to police departments for fighting crime; AIDS research, cancer research; cleanup of hazardous chemicals, drug treatment programs for users; enforcement of race-discrimination laws by the Justice Department, enforcement of sex-discrimination laws by the Justice Department; highway safety, low-income housing; maintenance of national parks, public transportation construction; research on climate change, research on education; research on heart disease, and support for U.N. peace-keeping activities.

Results

Sensitivity was no greater in the WTA condition than in the WTP condition. To measure insensitivity, we first averaged all the responses in each category for each subject in each condition (WTP and WTA): small (10%), large (20%), and both small goods together. To assess the quantity effect (a measure of insensitivity), we subtracted the log of the mean of the large goods from the log of twice the mean of the small goods. This measure would be zero if the rate of substitution were constant, that is, if subjects were completely sensitive to quantity, and positive if they were insensitive. To assess the adding-up effect, we subtracted the log mean of the double good from the log of twice the mean of the two small goods. This measure would also be zero if the price of the double good were the sum of the prices of its components, and it would be positive if subjects were insensitive.
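As an illustration only, the two measures just defined can be sketched for a single subject as follows. The response values are hypothetical, the function names are ours, and the use of natural logarithms is an assumption (the sign of each measure does not depend on the base).

```python
import math

def quantity_effect(mean_small, mean_large):
    """log(2 * mean of small goods) - log(mean of large goods).

    Zero under proportional sensitivity (large = 2 * small),
    positive when the subject is insensitive to quantity.
    """
    return math.log(2 * mean_small) - math.log(mean_large)

def adding_up_effect(mean_small, mean_double):
    """log(2 * mean of small goods) - log(mean of double good).

    Zero when the double good is priced at the sum of its two
    components, positive when it is priced at less than the sum.
    """
    return math.log(2 * mean_small) - math.log(mean_double)

# Hypothetical subject (not data from the study): mean responses
# for the 10% items, the 20% items, and the combined items.
small, large, double = 10.0, 14.0, 15.0

q = quantity_effect(small, large)     # positive: 14 < 2 * 10
a = adding_up_effect(small, double)   # positive: 15 < 2 * 10
```

A subject whose large-good response is exactly twice the small-good response would score zero on the first measure.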

- Insert Table 2. -

Table 2 shows the relevant geometric means across subjects. It is apparent, and statistical analysis confirmed, that both the quantity effect and the adding-up effect occurred in both conditions (WTA and WTP). That is, the values of the large goods were less than twice the value of the small goods, and the value of both goods was less than twice the value of the small goods (t > 4.5, p < .0005 for all comparisons). The magnitude of the quantity effect (difference of logs) was exactly equal in the WTA and WTP conditions, and the magnitude of the adding-up effect was slightly, but not significantly, larger in the WTA condition. The overall WTA vs. WTP difference was not quite significant.

The quantity effect for WTP was correlated with the quantity effect for WTA across subjects (r = .59, p < .0005), and likewise for the adding-up effect (r = .60, p < .0005). Correlations between the two types of effects were also mostly significant but were not so high (WTP adding-up with WTP quantity, r = .50, p = .004; WTA adding-up with WTA quantity, r = .31, p = .09; WTP adding-up with WTA quantity, r = .09, N.S.; WTA adding-up with WTP quantity, r = .44, p = .011). Canonical correlation analysis found only a single significant canonical correlate when the two adding-up measures (WTP and WTA) were predicted from the two quantity measures, but it found two significant correlates when the two WTP measures were predicted from the two WTA measures. In sum, different factors affect whether subjects show adding-up or quantity effects, but there was no evidence for different factors affecting WTA and WTP.

The results cast further doubt on the budget-constraint hypothesis, but they are consistent with the prominence hypothesis (which holds that people base their response on the type of good rather than the quantity). One subject gave identical responses for both large and small magnitudes of the same good, although her responses were different for different goods. When asked why she gave the same response for the two different magnitudes, she said (roughly), ``If I paid more for the larger amount of good X, then that would be more than what I was willing to pay for the smaller amount of good Y, and that would be wrong because Y is more important than X.'' Respondents could be making such comparisons implicitly, even when they are not asked about other goods in the same survey.

Study 5. Quantity sensitivity in WTP vs. WTA

Study 5 again compared quantity effects in WTP and WTA. The good was provision of public safety by the University of Pennsylvania.

Method

Subjects were 44 paid volunteers, including 42 students, 18 of these at the University of Pennsylvania, 24 at the nearby Philadelphia College of Pharmacy and Science.

A questionnaire began with a summary of crime statistics reported by the University: about 150 per 100,000 students and employees were victims of reported crimes against persons (assault, robbery, and rape) each year. The questionnaire also summarized the University's public-safety system: campus police, security personnel, emergency telephones, etc. The items in the questionnaire asked about WTP for tuition increases for increased public safety (prorated for those with financial aid) or WTA for reductions resulting from decreased public safety. The expected effects of the increases or reductions were described in terms of the annual rate of crimes against persons per 100,000. For half of the subjects, the first item asked for WTP for a reduction from 150 to 100, or ``WTP150-100.'' The other questions were WTA150-200, WTP200-150, WTP200-100, WTA100-150, and WTA100-200. The order was reversed for the remaining subjects.

Results

Subjects were equally insensitive to quantity for WTP and WTA. Also, WTA was larger than WTP. Geometric means were $1067 for WTA150-200, $791 for WTA100-150, $1227 for WTA100-200, $472 for WTP200-150, $404 for WTP150-100, and $729 for WTP200-100. (Two subjects also said that they would not accept any amount for an increase of 50, and one additional subject would not accept any amount for an increase of 100.) WTP insensitivity was defined as the log of (WTP200-150 + WTP150-100)/WTP200-100, and WTA insensitivity was analogous. This measure would be zero if subjects are perfectly sensitive, and positive if they are insensitive. The mean insensitivity measures were .226 and .421 for WTP and WTA, respectively. Both were significantly greater than zero (p < .0005, t test), but they were not significantly different. WTA was significantly larger than WTP (p < .0005, t test) for each of the three comparisons (e.g., WTP150-100 vs. WTA100-150). The results did not depend on whether subjects were students at the University of Pennsylvania or not.
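For orientation, the insensitivity measure can be applied to the geometric means reported above. This is a rough sketch, not the analysis itself: the reported means (.226 and .421) are averages of per-subject measures, so aggregate figures computed from group geometric means need not match them. Natural logarithms are an assumption.

```python
import math

def insensitivity(part_a, part_b, whole):
    """log of (sum of the two 50-point changes / the 100-point change).

    Zero if the whole is valued at the sum of its parts,
    positive if the whole is valued at less than the sum.
    """
    return math.log((part_a + part_b) / whole)

# Geometric means in dollars, as reported in the text.
wtp_agg = insensitivity(472, 404, 729)     # WTP200-150, WTP150-100, WTP200-100
wta_agg = insensitivity(1067, 791, 1227)   # WTA150-200, WTA100-150, WTA100-200

print(round(wtp_agg, 2), round(wta_agg, 2))  # roughly 0.18 and 0.41
```

Both aggregate values are positive, in the same direction as the per-subject means.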

Study 6. WTP vs. WTA, between subjects

Study 6 compared WTP and WTA for quantity effects in a between-subject design using several goods at once.

Method

Each of 43 paid subjects (students, selected as in previous studies) saw only one amount of each good, either high or low in quantity. Each of twelve items had a large- and small-quantity version. The items were (paraphrased, with the small version in brackets): saving 40,000 [4,000] acres of Alaskan forest; preventing 10,000 [1,000] Americans from passing a kidney stone; saving 30,000 [3,000] of a rare species of fish; preventing 80,000 [8,000] drunk driving accidents; eliminating pollution from rivers in the eastern U.S. [the Schuylkill River]; allowing 100 [10] dying men to see their only children once more; freeing 1,200 [120] innocent political prisoners; see your favorite football team deservedly win the Superbowl for three years [one year]; restore sight to 90,000 [9,000] blind Americans; allow 1,100 [110] American athletes to participate in the Olympics; inform 10,000 [1,000] people falsely diagnosed as having a fatal disease that they do not have it; allow 15,000 [1,500] alcoholics to recover. Half of the subjects did the WTP condition first, and half did the WTA condition first. For half the subjects in each order (WTA first, WTP first), high items were odd and low items even; for the other half the reverse was true.

The WTP instructions read: ``Imagine that you are an average American, paying $10,000 per year in taxes. How much would you be willing to pay in extra taxes to prevent each of the following events? That is, what is the largest tax increase that would be justified by prevention of each event? Please take every question seriously even if you find it ridiculous. Most people find these questions very difficult. Please take the time to think about them and give the most accurate and consistent answer you can.'' The introduction to the WTA condition read, ``Now please indicate how much you would accept in reduced taxes to allow each of these events to happen. That is, how great would the tax reduction have to be in order for you to feel that each additional event is justified by the saving?''

Results

Sensitivity to quantity was computed for each subject in each condition (WTA and WTP) as the log of the ratio of the geometric mean of the high-quantity items to the geometric mean of the low-quantity items, divided by log(10) (the value that the log ratio would have if subjects were proportionally sensitive). This measure would be zero if the subject were completely insensitive to quantity and 1 if the subject were proportionally sensitive. (Subjects with more than 2 out of 8 zero responses or ``no amount is enough'' responses in any of the four conditions were omitted from analyses including that condition.) Subjects were insensitive in the WTP condition (mean 0.04, not significantly greater than zero) but were somewhat sensitive in the WTA condition (mean 0.10, t = 2.81, p = .005). Both conditions were undersensitive, however; that is, sensitivity was less than 1.0 (p < .0005, t test). Sensitivity in the WTA condition was greater than in the WTP condition (t = 1.77, p = .045, one tailed).

No-amount-enough responses were confined to the WTA condition and did not depend on the size of the good (mean of 0.77 out of 8 possible large-good responses and 0.88 for small goods). Zero responses were more frequent in the WTP condition and did not depend on size (large WTP, 1.14; small WTP, 1.32; large WTA, 0.93, small WTA, 0.79). Unlike the last study, this one showed a large and highly significant (p < .0005, t test) difference between WTA and WTP values: geometric mean of $66 for WTP, $303 for WTA.

The evidence is consistent with the possibility that some of the subjects think in terms of declining marginal utility of goods or budget constraints (increasing marginal disutility) for expenditures. But even the WTA condition shows very little sensitivity to quantity (.10), and these hypotheses would both predict considerable sensitivity, specifically, more than 1 (assuming some insensitivity in the WTP condition).

Study 7. Matching money to risk vs. risk to money

Another test of the budget constraint hypothesis is to ask subjects to do the task in reverse, indicating the quantity of the good for which they would be willing to pay some given amount of money, rather than the amount they would pay for a given quantity of the good. If insensitivity to quantity is the result of budget constraints (real or perceived) or of declining marginal utility for the good, then subjects should be oversensitive in the new task. For example, they should want more than 10 times as much of the good for $50 as for $5.

Method

Each of 53 student subjects answered four pairs of questions, assuming that they were ``a U.S. citizen living alone, making $40,000 per year.'' The introductions to the respective pairs were:
1. Suppose that you are a typical driver. Statistics about death rates apply to you. Imagine a device that reduces death from a certain kind of auto accident. (Rates of injury are not affected.) Installation of the device is free, but each device requires a yearly maintenance fee for testing and adjustment.
2. Suppose that a flu epidemic is approaching. The flu is mild, but it can be fatal to some people who get it. A vaccine is available that prevents the flu.
3. Suppose there are small amounts of a cancer-causing chemical in your drinking water. The type of cancer has a 50% cure rate. The chemical can be removed by a filter, which must be replaced once a year.
4. Suppose you are in a foreign country and your work requires you to fly from one place to another several times. There are two airlines, A and B. Airline A has a poor safety record. Airline B has a perfect record but costs more. Your employer will pay only the cost of airline A, so you would have to pay the extra cost of B yourself. You must take all your trips on the same airline.

Half of the subjects were asked for their WTP for two levels of risk reduction after the first and third item, half after the second and fourth. For example, ``What is the largest amount that you would pay per year for the device if it prevented a type of accident that caused 1 [or 10] death per year for every 1,000,000 drivers?'' The high and low risks differed by a factor of 10, and the order of the two items in each pair was counterbalanced. For the other two items, subjects got risk questions, e.g., ``How high would the chance of death from this kind of accident have to be in order for you to be willing to pay $500 [$50] per year for the device? Answer in number of deaths per year for every million drivers.'' Again, amounts of money differed by a factor of 10, and order was counterbalanced. Note that, for each subject, each scenario was given only with one type of question, WTP or risk.

Results

To measure insensitivity to quantity, we computed the log ratio of ten times the small-amount answer to the large-amount answer for each item. The mean of these measures would be 0 if the rate of substitution were constant, and they would be 1 if subjects ignored quantity. For the money questions, the means of these measures across subjects were positive and significant (p < .001, t test) for all four items (1.32 overall - based on the means of the two items for each subject). This is just a replication of the quantity effect. For the risk questions, the measures were not significantly different from zero for any items or overall.

If subjects were basing their responses on a consistent marginal rate of substitution between risk and money, then the sum of the measures for the risk items and the money items would be zero. In fact, it was greater than zero across subjects (mean 0.50, t = 3.44, p = .001). Insensitivity was uncorrelated with (geometric mean) WTP (dollars per unit of risk), so this result cannot be explained in terms of any dependence of rate of substitution on WTP. Thus, it appears that the insensitivity to quantity does not result from the slope of the indifference function for money and risk, as implied by the idea of budget constraints or declining marginal utility.
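The logic of this consistency check can be sketched with two hypothetical subjects (none of the numbers below are data from the study, and base-10 logarithms are assumed so that ignoring quantity yields a value of 1). A subject who applies one constant rate of substitution in both tasks scores zero on each measure, so the sum is zero; a subject who is undersensitive in both tasks has a positive sum.

```python
import math

def insensitivity(small_answer, large_answer):
    """log10 of (10 * small-amount answer / large-amount answer).

    0 if answers are proportional to the factor-of-10 change,
    1 if the two answers are identical (quantity ignored).
    """
    return math.log10(10 * small_answer / large_answer)

# Hypothetical consistent subject: $5 per death per million drivers.
rate = 5.0
money = insensitivity(rate * 1, rate * 10)    # WTP for 1 vs. 10 deaths
risk = insensitivity(50 / rate, 500 / rate)   # risk matched to $50 vs. $500
consistent_sum = money + risk                 # zero

# Hypothetical undersensitive subject: WTP of $20 and $40 for the two
# risk levels, and risks of 30 and 100 deaths for the two payments.
under_sum = insensitivity(20, 40) + insensitivity(30, 100)  # positive
```

A subject constrained by a budget would instead be oversensitive in the risk task, which would push the risk measure, and hence the sum, below what undersensitivity produces.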

The fact that the log ratios for risk did not differ from zero - indicating a mean ratio of about 10 to 1 for the two items in each pair - was the result of a few subjects who produced large negative ratios - indicating oversensitivity of the sort predicted by budget constraints - and many subjects who produced positive ratios, indicating undersensitivity. For 45% of the subjects, the positive ratios outnumbered the negatives; for 18%, the negatives outnumbered the positives. The difference was significant by a sign test (p = .026, one tailed). Thus, a plurality of subjects is undersensitive for risk, just as almost all subjects (88%) are undersensitive for WTP (none oversensitive). The fact that some subjects are oversensitive for risk suggests that people without special training can understand the concept of marginal rates of substitution, even though most people do not spontaneously apply this concept to judgment tasks.

The results of this study are analogous to those of Delquié (1993). Although he explains his results in terms of compatibility effects, the prominence explanation can also explain them. Such an explanation is consistent with the large number of subjects who simply gave the same response to both members of the pair, ignoring quantity totally (26% for WTP, and 6% for risk responses, did this at least once).

Study 8. Unit-price WTP

Study 8 examined the quantity effect in two different methods of eliciting WTP. In the whole-good method, the usual one, the subject gives a WTP for 1 unit of a good and for 10 units. Sensitivity to quantity is the log ratio of the two prices. In the unit-price method, the subject gives a WTP for 1 unit and then a WTP per unit for 10 units. The subject then calculates WTP for the 10 units by multiplying the unit price by 10. The budget-constraint hypothesis implies that these two methods should give the same rate of substitution: to determine their unit price, subjects would first determine their whole-good price for 10 units and then divide by 10. Subjects with an implicit budget constraint would pay less per unit when the number of units in the purchase was larger.

Method

The items were the WTP versions of those used in Study 7. For each good in each condition, subjects provided their WTP for 1 and 10 units. The unit-price questions (using item 1 as an example) read, ``What is the largest amount that you would pay per year for each death prevented out of 1,000,000 for a device that prevented a type of accident that caused 10 deaths per year for every 1,000,000 drivers? How much is this for 10 deaths prevented out of 1,000,000? (Hint: Multiply your last answer by 10.)'' We used two whole-good conditions. The basic condition was identical to that used in Study 7. The divide-whole-good condition was identical except that, immediately after pricing 10 units, subjects were asked (e.g.), ``How much is this for each death prevented out of 1,000,000?'' The three conditions - unit-price, basic whole-good, divide whole-good - were between-subjects. Within each condition, half the subjects were asked about 1 unit, then 10, for each item, and half were asked about 10 units, then 1. Twenty-seven student subjects completed the questionnaire. Subjects were asked to indicate any sources of difficulty in answering the questions.

Results

Sensitivity was defined for each item as log([WTP for 10]/[WTP for 1])/log(10), yielding a value of 0 for no sensitivity and 1 for proportional sensitivity. Sensitivity for each subject was the mean across items. Mean sensitivity was 0.36 for the basic condition, showing the usual insensitivity, 0.54 for the divide-whole-good condition, and 1.04 for the unit-price condition. The unit-price condition differed significantly from the basic condition (p = .001, t test) and from the divide-whole-good condition (p = .018), but the basic and divide-whole-good conditions did not differ significantly. (This result was identical to that obtained in a follow-up study in which subjects matched risk to money for half of the items. That study is not reported because it simply replicated the findings of Studies 7 and 8.) Geometric mean WTPs for 1 and 10 units, respectively, were $24 and $196 for the unit-price condition, $17 and $50 for the divide-whole-good condition, and $33 and $79 for the basic condition. Order had no significant effect.
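As a rough check on the pattern, the same sensitivity formula can be applied to the group geometric means just reported. This sketch is not the per-subject analysis: the reported means (0.36, 0.54, 1.04) average item-level sensitivities within subjects, so the aggregate values below are expected to differ somewhat while showing the same ordering.

```python
import math

def sensitivity(wtp_1, wtp_10):
    """log(WTP for 10 / WTP for 1) / log(10): 0 = none, 1 = proportional."""
    return math.log(wtp_10 / wtp_1) / math.log(10)

# Geometric mean WTP in dollars for 1 and 10 units, as reported in the text.
reported = {
    "unit-price": (24, 196),
    "divide-whole-good": (17, 50),
    "basic": (33, 79),
}

aggregate = {name: sensitivity(*pair) for name, pair in reported.items()}
for name, value in aggregate.items():
    print(f"{name}: {value:.2f}")   # roughly 0.91, 0.47, 0.38
```

Only the unit-price condition comes close to 1 on this aggregate view.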

In the question about difficult aspects of the questionnaire, two subjects (out of 8 in the unit-price condition) did mention the difficulty of answering the unit-price question. Most complaints, however, were about the lack of any standard for making the judgment.

In sum, the unit-price method induced perfect proportional sensitivity to quantity. The result casts doubt upon any hypothesis in which responses are taken to reflect true economic value, such as the budget-constraint hypothesis. It is consistent with the prominence account, in that responses are dependent on the type of good rather than the amount of it. A similar result was reported by Kemp and Willetts (1995): judgments of the value of government services correlated highly with judgments of value per dollar spent, despite large differences in expenditures.

Test of availability

Study 9: Effect of fuller context

In two unreported studies, we found perfect sensitivity to quantity for medical insurance or risk reduction. Goods were presented in tabular form or on the same page, so that subjects could see all the goods before evaluating any of them. These findings suggested that putting the goods in a larger context can increase sensitivity. This would support the availability explanation of insensitivity. Loomis, Lockwood, and DeLacy (1993) also found what they considered to be only a small embedding effect in a CV study of forest preservation in Australia. The effect may have been small because the full context - all the possible forests - was presented at the beginning of the questionnaire. Even though subjects did not have to price the full context, its presence might have reduced the prices assigned to parts of the whole, compared to cases (not presented) in which the part was presented on its own, without any mention of the full context. Study 9 examines sensitivity to quantity with and without prior presentation of the full context.

Method

We tested 142 subjects: 51 female, 91 male; 73 students, 69 nonstudents; 45 tested in the laboratory, 97 responding by electronic mail (email) after volunteering in response to postings on various newsgroups and bulletin boards. Laboratory subjects were paid $6/hour. For each email subject, $1.50 was put into a kitty, and one person selected at random received the kitty when it reached $99 (or when the study was over).

The questionnaire asked about expensive cancer treatments and transplants, as in Study 3. Examples were given of transplants as well as of cancer treatments, but no other details were provided except that the standard policy cost $1,000/year. Three orders were used: both, cancer, transplants; transplants, both, cancer; and cancer, transplants, both. For half of the subjects who got cancer or transplants first, no other treatments were mentioned on the first page (or screen) of the questionnaire. The other half received a ``preview'' in which both treatments were described before the subjects were asked to evaluate one of them. Of course, the subjects who got both treatments first necessarily got the preview.

Results

Eleven subjects, nine of them nonstudent email subjects, gave zero answers to all cases (or all but one); the student-nonstudent difference, and the email-lab difference, did not quite reach significance by two-tailed Fisher tests. Otherwise students and nonstudent subjects did not differ either in overall WTP or in any effects, and the same was true for email and lab subjects.

Preview (presenting the context of both possible treatments) did increase sensitivity, but only in the within-subject measures. Between-subject comparisons yielded perfectly proportional sensitivity. Table 3 shows the geometric mean WTP responses for the three conditions.

- Insert Table 3. -

For within-subject analysis, the log of Both was less than the log of the Sum of the two separate treatments for the no-preview condition (t = 3.45, p = .001) and for the both-first condition (t = 2.84, p = .007), but not for the preview condition with one item given first. The single-first preview and no-preview conditions differed significantly in the magnitude of this effect (t = 2.48, p = .008, one tailed).

Between-subject analysis showed no difference between both-first and twice the WTP for the first item in any condition or overall, although the both-first WTP was larger than the raw WTP for the first item. In sum, no insensitivity was found between subjects.

The increase of within-subject sensitivity with preview suggests that the embedding effect does not require pricing of the larger good in order for prices assigned to the embedded good to be reduced. Subjects may need only to be reminded of the existence of the larger good as a context, as implied by the availability hypothesis. For practical purposes, however, asking subjects to price the superordinate good (as done by Kemp and Maxwell, 1993) may be more effective than simply reminding subjects of its existence (as done by Loomis et al., 1993).

Tests of prominence

Study 10: Good-money vs. good-good

The prominence explanation suggests that sensitivity can be increased by inducing subjects to attend more to the amount of the good. One way to do this might be to ask for responses in a more compatible medium (Tversky et al., 1988). For example, people might find it easier to trade off the size of a budget cut in government-program A with the size of a cut in program B than to trade off a cut in A with their taxes. Such tradeoffs are often requested in multi-attribute decision analysis, which has been suggested as an alternative to CV (Gregory, Lichtenstein, & Slovic, 1993). Study 10 compared sensitivity in this kind of good-good comparison with the usual good-money comparison.

Method

Twenty-six students completed a questionnaire, which began, ``This questionnaire concerns the value you place on various programs paid for by the U.S. government. Some of the programs must be cut to reduce the budget deficit. But each of them could be saved from cuts if taxes are raised or if other programs are cut. Our concern is how much you are willing to pay to prevent reductions in each of these programs, or how much you are willing to give up of one program in order to save another.''

The WTP condition, which half of the subjects completed first, asked, ``What is the highest tax increase that you would be willing to pay in order to prevent each of the following reductions in government spending? Imagine that everyone in the U.S. was asked this question. If more than half said they were willing to pay enough to prevent the cut, then everyone's taxes would be raised by what was needed and the cut would not be made. Also imagine that each item is the only proposal for tax reductions this year. Feel free to use fractions or decimals in your answer.'' Responses were in mills, as in Study 4. The questionnaire then presented the 20 goods used in Study 4, with 10% reductions and 50% reductions alternating. Then it presented the same goods with 50% and 10% switched.

The good-good condition asked subjects ``to evaluate one program in terms of another, in pairs. In each case, please fill in the blank so that the reduction in one program by the amount we give is just as bad as the reduction in the other program by the amount you write into the blank.

``Notice that, when you write a very small percent, that means you think that the program is very important, because a small cut in it is just as bad as a larger cut in the other program. You may write zero if you think that it is better to cut the other program than to cut the program with the blank by any amount.

``On the other hand, if you write a greater percent in the blank than the one we give you, that means you think that the program is less important. You cannot cut a program by more than 100%. So we will take answers of 100% to mean that you prefer the program with the blank to be cut completely rather than cut the other program by the amount we give.'' Test questions followed, and 12 subjects (not included in the 26) did not answer these questions or answered them incorrectly.

The first two pairs then read:

A. 10% reduction in acquisition of land for national parks
B. ____% reduction in aid for family planning programs for the poor in the U.S.

A. 50% reduction in aid for international family planning and women's health programs
B. ____% reduction in aid for international vaccination programs against polio, etc.

Then the same 10 pairs were listed, with 50% and 10% switched. Finally, the 20 pairs were listed again with the first member left blank and the second member marked with a percent (10% on odd items for the first ten pairs, on even for the second ten).

Results

We defined sensitivity to quantity in the WTP condition as

\[ \frac{\log(\mathrm{WTP}_{50}/\mathrm{WTP}_{10})}{\log 5} \]

for each pair, where the subscript indicates the size of the cut. Then we found the mean of these values for each subject. This measure is 0 when the subject is completely insensitive and 1 when the subject is proportionally sensitive. We defined sensitivity in the good-good condition analogously, using the equivalent percent reduction instead of WTP, excluding items with answers of 100%. (Subjects might want to indicate a higher ratio, but cuts of greater than 100% are impossible.)

The mean sensitivity was 0.266 in the WTP condition and 0.773 in the good-good condition. The difference was significant (t = 8.12, p < .001). In both conditions, sensitivity was less than 1 and greater than 0 (p < .001). In sum, sensitivity to quantity was greater in good-good comparisons than in WTP judgments. In principle, this effect could be explained by budget constraints, but Study 4 ruled out this account for the same stimuli. More likely, subjects find good-money tradeoffs difficult and therefore pay less attention to the quantity of the good, using global importance as a cue for WTP.

Study 11: Producing quantities and prices

A second way to force attention to quantity of the good is to ask subjects to produce the quantity information about the good as well as about the money. Study 11 asked subjects to provide an amount for the good, their WTP for that amount, a second amount (larger or smaller than the first), and their WTP for the second amount. The study thus compares a one-blank condition, in which the subject writes WTP in one blank, to a two-blank condition, in which the subject writes the size of the cut in one blank and WTP in the other.

Method

Thirty subjects did a questionnaire introduced as in Study 10. Instructions for the one-blank condition were also essentially identical. Instructions for the two-blank condition read, ``In this part, we give you each program twice, with two blanks for you to fill in each time. One blank is the size of the cut, in percent. The other blank is the highest tax increase that you would be willing to pay in order to prevent it. You are to fill in both blanks. The first case in each pair should have a lower percent or a lower number of mills than the second. Try to imagine very different numbers in the two cases. If this is not clear, please ask.'' The rest of the instructions were identical to the one-blank condition.

Half of the subjects did the one-blank condition first, and half did the two-blank condition first. For the former half, the small cut was on odd-numbered items and the large cut on even; for the latter half the reverse was true.

Results

We defined sensitivity to quantity for each pair of items as

\[ \frac{\log(\mathrm{WTP}_{\mathrm{large}}/\mathrm{WTP}_{\mathrm{small}})}{\log(\mathrm{size}_{\mathrm{large}}/\mathrm{size}_{\mathrm{small}})}, \]

where size refers to the size of the cut and large and small refer to the larger and smaller cuts in each pair, respectively. For the one-blank condition, the size ratio was always 50/10, of course. Sensitivity was 0 when responses were the same for both size cuts. Dividing by the log of the size ratio meant that sensitivity was 1 when WTP was proportional to quantity. For each subject, we computed the geometric mean sensitivity in each condition. (Pairs on which one or both responses were zero were, of course, omitted from this mean. Most commonly, both responses in a pair were zero rather than just one.)
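A minimal sketch of the per-pair computation, including the omission of pairs with zero responses, might look as follows. The response values are hypothetical, the function names are ours, and the step of averaging within each subject is left out. The choice of logarithm base does not matter because it cancels in the ratio.

```python
import math

def pair_sensitivity(wtp_small, wtp_large, size_small, size_large):
    """log(WTP ratio) / log(size ratio): 0 = no sensitivity, 1 = proportional."""
    return math.log(wtp_large / wtp_small) / math.log(size_large / size_small)

def usable_sensitivities(pairs):
    """Per-pair sensitivities, omitting pairs with a zero WTP response.

    Each pair is (wtp_small, wtp_large, size_small, size_large).
    """
    return [
        pair_sensitivity(ws, wl, ss, sl)
        for ws, wl, ss, sl in pairs
        if ws > 0 and wl > 0
    ]

# Hypothetical two-blank responses (WTP in mills, cut sizes in percent).
pairs = [
    (2.0, 6.0, 10, 30),   # WTP triples as the cut triples: sensitivity 1
    (4.0, 4.0, 10, 40),   # same WTP for both cuts: sensitivity 0
    (0.0, 0.0, 5, 50),    # both responses zero: omitted
]
values = usable_sensitivities(pairs)
```

In the one-blank condition the size ratio is fixed at 50/10, so the denominator reduces to log 5.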

Mean sensitivity was 0.570 for the one-blank condition and 0.885 for the two-blank condition. These two means differed from each other (t = 3.10, p = .004, two tailed). In addition, the one-blank mean was significantly less than 1 (t = 5.01, p < .0005), but the two-blank mean was not significantly less than 1. The fact that mean sensitivity in the one-blank condition was higher than in Study 10 may be partly ascribed to a (nonsignificant) order effect: sensitivity was higher when the two-blank condition was first (0.68 vs. 0.45). In the two-blank condition, subjects generally used ratios of cut sizes less than 5 to 1 (geometric mean, 3.34, significantly less than 5); although the ratio was negatively correlated (r = -0.29) with two-blank sensitivity, the correlation was not significant.

In sum, the two-blank condition made WTP almost proportionally sensitive to quantity. Presumably, asking subjects to fill in the size of the cut made them pay more attention to it.

Discussion

- Insert Table 4. -

Table 4 summarizes the results and their implications. All the results but one are consistent with the prominence hypothesis, and some are explained uniquely by this hypothesis. The one remaining result, the effect of context, indicates that availability is operating as well. Although Study 9 found increased sensitivity when the whole context was available, supporting the availability hypothesis, other studies (Study 4 for adding up, and Studies 4, 7, 8, 10, and 11 for quantity effects) found considerable insensitivity even with full context (or with all questions in view, for the quantity effect), suggesting that availability is not the only explanation. However, the fact that this hypothesis has some support suggests that reminding subjects of the broader context when eliciting values is a good idea (as is recommended by most authors now).

Three studies found that the use of a referendum format does not increase sensitivity, contradicting the contribution version of the moral-satisfaction hypothesis. Moreover, both versions of this hypothesis have difficulty with some other results. As noted earlier, insensitivity has been found for ratings as well as for pricing, and for private goods as well as public goods. The present findings of insensitivity for WTA as well as WTP are also difficult to explain. We would have to postulate a cold feeling of guilt that limits willingness to accept payment to allow harm, the opposite of a warm glow. And we would have to postulate a belief that the amount of harm is in proportion to the amount one accepts in order to maintain the contribution hypothesis. Note that Study 4 used a referendum format in both WTP and WTA conditions and found little sensitivity in both conditions.

Both forms of the moral-satisfaction hypothesis hold that the overall size of the contribution is crucial, so neither form is consistent with the increase in sensitivity from unit pricing (Study 8). Likewise, both hypotheses have difficulty explaining the fact that many subjects were insensitive to the amount of money when they matched goods to money in Study 7.

If moral satisfaction is a problem, then efforts should be made to make the judgment independent of the subject's action of contributing or participating. On the present evidence, this seems to be unnecessary. However, greater effects of participation may occur with different goods than those used here. Baron and Spranca (1995) have found that certain goods, such as prevention of the destruction of nature, seem to evoke "protected values" that relate specifically to personal participation.

The budget-constraint hypothesis and related hypotheses have been put forward to argue that we need not worry about quantity insensitivity when it occurs in CV. The present results give almost no support to this hypothesis. Insensitivity is found for WTA, and it disappears when subjects are asked for unit prices, which suggests that the whole phenomenon depends on how the questions are asked, not on the true value of the goods in question. Subjects are not committed to the tradeoffs they express through standard WTP questions. This conclusion, however, need not generalize to other methods of value elicitation, such as those used in decision analysis.

The prominence hypothesis is supported by the effects of manipulations designed to influence prominence: good-good matching and the two-blank condition, both of these drawn from standard practice in decision analysis. This hypothesis can explain all other results except the effect of providing full context. In all other studies, answers are governed largely by perceived importance of issues, which is independent of quantity. In the adding-up effect, some averaging of importance may occur. Further research should address the source of importance judgments. In the case of public goods, subjects may answer according to whether they think more or less of the good should be provided than is currently being provided, taking into account the cost of provision as they perceive it. Of course, this is not the judgment that is needed for policy decisions, since people are supposed to judge the benefits of proposals alone, not their benefit/cost ratio.

The prominence hypothesis has two practical implications. First, subjects must be induced to attend to the quantity of the good. Perhaps the simplest way to do this is to extend the method of Study 11 by asking subjects for an entire function relating WTP (or some other attribute) to the quantity of the good. This is standard practice in decision analysis. Second, it is necessary to check that subjects have taken quantity into account adequately. One way to do this is to reverse the judgment task, as done in Study 7. Another is to ask for tradeoff judgments between two goods and between each good and money, and then to check that the three functions are consistent. If the checks fail, then the subjects must be asked to repeat the judgments. These methods, too, are commonly used in decision analysis.
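The consistency check among the three tradeoff judgments can be sketched in Python. This is a minimal illustration, not a procedure from the studies: it assumes that each judgment can be reduced to a single exchange rate (a simplification of the "functions" mentioned above), and the function name, the tolerance, and the example numbers are all hypothetical.

```python
import math

def tradeoffs_consistent(wtp_a, wtp_b, units_b_per_a, tolerance=0.25):
    """Check a good-good judgment against two good-money judgments.

    wtp_a, wtp_b   -- money the subject would pay for one unit of A, of B
    units_b_per_a  -- units of B judged equal in value to one unit of A
    tolerance      -- allowed discrepancy on the log scale (hypothetical cutoff)

    If the judgments cohere, wtp_a / wtp_b should be close to units_b_per_a.
    """
    implied = wtp_a / wtp_b
    return abs(math.log(implied / units_b_per_a)) <= tolerance

# Hypothetical subject: pays $40 per unit of A and $10 per unit of B.
print(tradeoffs_consistent(40.0, 10.0, 4.0))   # True: 1 A is worth 4 B either way
print(tradeoffs_consistent(40.0, 10.0, 1.0))   # False: judgments conflict
```

A subject who fails the check would be asked to repeat the judgments, as described above. The discrepancy is measured on the log scale so that over- and under-valuation by the same factor count equally.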

It may be argued that these methods will increase the time required for CV analysis per subject. But current CV practice involves spending substantial amounts of time with each subject anyway. Moreover, the errors that result from insensitivity to quantity are large, sometimes several orders of magnitude. When such large errors can result from the procedure itself, the advantage of large samples of subjects is rather small. Much greater accuracy may be achievable by use of more extensive checking of the judgments of fewer subjects.
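One way to make this argument concrete is a standard error decomposition. The assumptions here are ours and are not stated in the text: each subject's response, on whatever scale the analysis uses (for example, log WTP), equals the true value v plus a procedure-induced bias b common to all subjects plus independent noise with variance \sigma^2. The expected squared error of the mean response of n subjects is then:

```latex
\mathrm{E}\left[(\bar{x}_n - v)^2\right] = b^2 + \frac{\sigma^2}{n}
```

Increasing n shrinks only the second term. When b is large, as it would be if insensitivity to quantity produces errors of several orders of magnitude, reducing b through more careful elicitation matters more than adding subjects.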

References

Baron, J. (1995). Rationality and invariance: Response to Schuman. In D. J. Bjornstad & J. Kahn (Eds.), The contingent valuation of environmental resources: methodological issues and research needs, pp. 145-163. London: Edward Elgar.

Baron, J., Chen, L., & Greene, J. (1993). Determinants of insensitivity to quantity in valuation of public goods. Poster presented to JDM Society, November, Washington, DC.

Baron, J. & Spranca, M. (1995). Protected values. Manuscript, University of Pennsylvania.

Boyle, K. J., Desvousges, W. H., Johnson, F. R., Dunford, R. W., & Hudson, S. P. (1994). An investigation of part-whole biases in contingent valuation studies. Journal of Environmental Economics and Management, 27, 64-83.

Carson, R. T., & Mitchell, R. C. (1993). The issue of scope in contingent valuation. American Journal of Agricultural Economics, 75, 1263-1267.

Carson, R. T., & Mitchell, R. C. (1995). Sequencing and nesting in contingent valuation surveys. Journal of Environmental Economics and Management, 28, 155-173.

Diamond, P. A., Hausman, J. A., Leonard, G. K., & Denning, M. A. (1993). Does contingent valuation measure preferences? Some experimental evidence. In J. A. Hausman (Ed.), Contingent valuation: A critical assessment. Amsterdam: North Holland Press.

Dubourg, W. R., Jones-Lee, M. W., & Loomes, G. (1994). Imprecise preferences and the WTP-WTA disparity. Journal of Risk and Uncertainty, 9, 115-133.

Fischhoff, B., Slovic, P., & Lichtenstein, S. (1978). Fault trees: Sensitivity of estimated failure probabilities to problem representation. Journal of Experimental Psychology: Human Perception and Performance, 4, 330-334.

Fischer, G. W. (1995). Range sensitivity of attribute weights in multiattribute value models. Organizational Behavior and Human Decision Processes, 62, 252-266.

Gregory, R., Lichtenstein, S., & Slovic, P. (1993). Valuing environmental resources: A constructive approach. Journal of Risk and Uncertainty, 7, 177-197.

Guagnano, G. A., Dietz, T., & Stern, P. C. (1994). Willingness to pay for public goods: a test of the contribution model. Psychological Science, 5, 411-415.

Jones-Lee, M. W., Loomes, G., & Philips, P. R. (1993). Valuing the prevention of non-fatal road injuries: Contingent valuation vs. standard gambles. Manuscript, Department of Economics, University of Newcastle upon Tyne, England.

Kahneman, D. & Knetsch, J. L. (1992a). Valuing public goods: The purchase of moral satisfaction. Journal of Environmental Economics and Management, 22, 57-70.

Kahneman, D., & Ritov, I. (1994). Determinants of stated willingness to pay for public goods: A study of the headline method. Journal of Risk and Uncertainty, 9, 5-38.

Kahneman, D., Ritov, I., Jacowitz, K. E., & Grant, P. (1993). Stated willingness to pay for public goods: A psychological perspective. Psychological Science, 4, 310-315.

Kemp, M. A., & Maxwell, C. (1993). Exploring a budget context for contingent valuation estimates. In J. A. Hausman (Ed.), Contingent valuation: A critical assessment. Amsterdam: North Holland Press.

Kemp, S., & Willetts, K. (1995). Rating the value of government-funded services: comparison of methods. Journal of Economic Psychology, 16, 1-21.

Loomis, J., Lockwood, M., & DeLacy, T. (1993). Some empirical evidence on embedding effects in contingent valuation of forest protection. Journal of Environmental Economics and Management, 24, 45-55.

Margolis, H. (1982). Selfishness, altruism, and rationality: A theory of social choice. New York: Cambridge University Press.

McFadden, D. L., & Leonard, G. K. (1993). Issues in the contingent valuation of environmental goods: Methodologies for data collection and analysis. In J. A. Hausman (Ed.), Contingent valuation: A critical assessment. Amsterdam: North Holland Press.

Mitchell, R. C., & Carson, R. T. (1989). Using surveys to value public goods: The contingent valuation method. Washington: Resources for the Future.

Portney, P. R. (1994). The contingent valuation debate: Why economists should care. Journal of Economic Perspectives, 8, 3-17.

Schulze, W., McClelland, G., & Lazo, J. (1994). Methodological issues in using contingent valuation to measure nonuse values. Paper presented at DOE/EPA Workshop on Using Contingent Valuation to Measure Non-market Values, May 19-20, 1994, Herndon, VA.

Schuman, H. (1995). The sensitivity of CV outcomes to CV survey methods. In D. J. Bjornstad & J. Kahn (Eds.), The contingent valuation of environmental resources: methodological issues and research needs, pp. 75-96. London: Edward Elgar.

Tversky, A., Sattath, S., & Slovic, P. (1988). Contingent weighting in judgment and choice. Psychological Review, 95, 371-384.

Zakay, D. (1990). The role of personal tendencies in the selection of decision-making strategies. Psychological Record, 40, 207-213.

Table 1: Geometric mean responses and percent zeros by condition.

                              Geometric mean            Zeros (%)
Condition      Order          Large good  Small good    Large good  Small good
referendum     small only                 $87.01                    21.6
               large-small    $84.10      $64.46        15.7        25.5
no-referendum  small only                 $130.71                   16.0
               large-small    $96.35      $50.65        42.3        38.5

Table 2: Geometric mean responses (in mills) by condition.

Condition  Small goods  Large good  Both small
WTP        6.46         9.15        10.39
WTA        8.66         12.27       12.99

Table 3: Geometric mean responses by condition.

Condition                 Both treatments  Sum of separate treatments
Single first, no preview  $155             $180
Both first                $182             $198
Single first, preview     $163             $165

Table 4: Results (with relevant studies in parentheses) and hypotheses that fail to explain the result (X).

Result                                                          Contribution  Warm glow  Budget  Availability  Prominence
Sensitivity not increased by referendum or trigger price (1-3)  X
Insensitivity in WTA as well as WTP (4-6)                                                X
No hyper-sensitivity with reverse match (7)                     X             X          X       X
Good sensitivity with unit price (8)                            X             X          X       X
Increased sensitivity with full context (9)                     X             X          X                     X
Increased sensitivity with good-good match (10)                 X             X          X       X
Increased sensitivity with production of good quantity (11)     X             X          X       X

Footnotes:

¹This research was supported by N.S.F. grant SBR92-23015. We thank Jane Beattie and Nicholas Maxwell for comments on an earlier draft and Lisa Chen for assistance in the first study. Send correspondence to Jonathan Baron, Department of Psychology, University of Pennsylvania, 3815 Walnut St., Philadelphia, PA 19104-6196, or (e-mail) baron@cattell.psych.upenn.edu.

²Now at Harvard College.

³An earlier report of this study (Baron et al., 1993) found a significant interaction between referendum and embedding, using a power transform instead of a log transform, thus keeping the zero responses. This transform is inappropriate when so many responses are zero: the assumption of normally-distributed error is seriously violated.
