Applied Demography In Action: A Case Study of “Population Identification.” *

133-158


Introduction
The research reported here was used to support the Biosphere component of the Total System Performance Assessment/Viability Assessment (TSPA/VA) for the high-level nuclear waste repository proposed for Yucca Mountain, Nevada, which is located approximately 100 miles north of Las Vegas (U.S. DOE 1998). The research was used to determine whether Yucca Mountain, Nevada, was a suitable site for a spent nuclear fuel and high-level radioactive waste repository. This determination was positive: the Secretary of Energy recommended Yucca Mountain to the President as the repository site for highly radioactive materials, and the President recommended the site to Congress. Exhibit 1 shows the general area around the site.
A key issue in determining whether the Yucca Mountain site was suitable for the high-level nuclear waste repository was the identification of the "critical group," an empirically based population deemed to be at highest risk from the repository, with risk related to exposure through the ingestion of radionuclides at levels dangerous to humans. The critical group was a crucial element in two areas: (1) deciding whether the repository should go forward; and (2) the design of man-made barriers for the repository. In identifying the critical group, two sets of "risk" parameters were generated: (1) a reasonable, conservative set; and (2) a high bounding set. The first set is consistent with the requirements for the critical group promulgated by the National Academy of Sciences, as implemented in 10 CFR 63 (64 FR 8640) proposed by the Nuclear Regulatory Commission; the second set reflects an extremely conservative approach. The critical group and its risk parameters represent a conceptual model that was referenced as an input to the process of generating "Biosphere Dose Conversion Factors" (BDCFs), which used the GENII-S computer code (SNL 1993).

Exhibit 1 The Yucca Mountain Study Area

David A Swanson CSP 2008, 35.1: 133-158

Data Parameters
The identification of a critical group and its characteristics relied on a 1997 Food Consumption Survey of the communities within the 50-mile grid centered on Yucca Mountain, Nevada (U.S. DOE 1997). The survey data were used primarily to determine the consumption levels of locally produced food and tap water needed for the ingestion exposure pathways. They were also used to develop a profile of the average member of the critical group for use in assessing exposure pathways other than food and water consumption.
In the survey, dietary and lifestyle data were collected on adults residing within the 50-mile grid centered on Yucca Mountain (U.S. DOE 199). Included within this grid are the communities of Amargosa Valley, Beatty, Indian Springs, and Pahrump (U.S. DOE 1997). The survey was a stratified random sample of 1,079 respondents, 195 of whom were in the Amargosa Valley.

Criteria
In February 1999, the U.S. Nuclear Regulatory Commission (NRC) issued proposed 10 CFR 63, which implemented the definition of a critical group and a reference biosphere in part 115 (64 FR 8640). Guidance issued by the Department of Energy (DOE) on the use of proposed 10 CFR 63 stated that the individuals reasonably expected to receive the highest exposure under reasonable assumptions were to be used as the critical group (Dyer 1999). The NRC provides the following definitions of the reference biosphere and the critical group in part 115 of proposed 10 CFR 63 (64 FR 8640):

a. Reference biosphere.
(1) Features, events, and processes that describe the reference biosphere shall be consistent with present knowledge of the conditions in the region surrounding the Yucca Mountain site.
(2) Biosphere pathways shall be consistent with arid or semi-arid conditions.
(3) Climate evolution shall be consistent with the geologic record of natural climate change in the region surrounding the Yucca Mountain site.
(4) Evolution of the geologic setting shall be consistent with present knowledge of natural processes.
b. Critical group.
(1) The critical group shall reside within a farming community located approximately 20 km south from the underground facility (in the general location of U.S. Route 95 and Nevada Route 373, near Lathrop Wells, Nevada).
(2) The behaviors and characteristics of the farming community shall be consistent with current conditions of the region surrounding the Yucca Mountain site. Changes over time in the behaviors and characteristics of the critical group including, but not necessarily limited to, land use, lifestyle, diet, human physiology, or metabolics; shall not be considered.
(3) The critical group resides within a farming community consisting of approximately 100 individuals, and exhibits behaviors or characteristics that will result in the highest expected annual doses.
(4) The behaviors and characteristics of the average member of the critical group shall be based on the mean value of the critical group's variability range. The mean value shall not be unduly biased based on the extreme habits of a few individuals.
(5) The average member of the critical group shall be an adult. Metabolic and physiological considerations shall be consistent with present knowledge of adults.

Analysis
With the critical group defined, summary descriptive statistics on the consumption of locally produced food and tap water were derived from the food consumption survey data.
Of the communities in the vicinity of Yucca Mountain that could be classified as farming communities on the basis of production information (TRW 1998), Amargosa Valley is physically closest to the area selected by the NRC for the location of the critical group. The only source of food consumption data specific to the Amargosa Valley was the 1997 survey. Table 1 shows the percentage of respondents consuming tap water and locally produced food, by type, for the Amargosa Valley and for the remainder of the study area. Specifically, Table 1 shows that: (1) 79 of every 100 adults in the Amargosa Valley ate some type of locally produced food during the year prior to the survey, compared to 57 of every 100 in the remainder of the study area; (2) 88 of every 100 adults in the Amargosa Valley reported consuming tap water, compared to 79 of every 100 in the remainder of the study area; and (3) with the exception of grains, a higher percentage of adults in the Amargosa Valley consumed locally produced food of every type than in the remainder of the study area. For purposes of this study, an adult was operationally defined as a person 18 years of age and over.
Table 2 shows that the average total consumption of locally produced food was higher in the Amargosa Valley (28.37 kg annually per adult) than in the remainder of the study area (12.20 kg annually per adult). The consumption of tap water also was higher in the Amargosa Valley (684 liters annually per adult) than in the remainder of the study area (646 liters annually per adult). With the exception of grains and milk, adults in the Amargosa Valley, on average, consumed more locally produced food of every type than adults in the remainder of the study area in the 1997 survey.
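According to the notes to Table 2, the values for the remainder of the study area were not taken directly from the survey report. They were found by algebraically solving a weighted-average formula for X, with 195/1079 as the Amargosa Valley share of the achieved sample. The short Python sketch below shows that algebra. It assumes the total-area mean equals (195/1079) × AVmean + (884/1079) × X. The function name `remainder_mean` is hypothetical, and the round trip uses the "all food types" means quoted above purely for illustration.

```python
def remainder_mean(total_mean, av_mean, n_av=195, n_total=1079):
    """Solve total = (n_av/n_total)*av_mean + ((n_total - n_av)/n_total)*X for X.

    Defaults are the achieved sample sizes reported for the Amargosa
    Valley (195) and the whole study area (1,079).
    """
    share_av = n_av / n_total
    share_rest = (n_total - n_av) / n_total
    return (total_mean - share_av * av_mean) / share_rest


# Round trip with the Table 2 "all food types" means: build the implied
# total-area mean from both pieces, then recover the remainder value.
av, rest = 28.37, 12.20
total = (195 / 1079) * av + (884 / 1079) * rest
print(round(total, 2))                      # implied total-area mean
print(round(remainder_mean(total, av), 2))  # should print 12.2
```

The round trip only checks the algebra. It does not show exactly how the published total-area means were combined.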
Of the 195 cases representing Amargosa Valley respondents, one had so many missing values that it was deemed unsuitable for analysis. Of the 194 usable cases, 77 reported both that they had consumed locally produced food during the year prior to the survey and that they had a food garden. These 77 cases were found to exhibit homogeneous behaviors and characteristics and, as such, were used to define a critical group consistent with proposed 10 CFR 63.

Note to Table 1: The specific food types shown are the same as those used in the biosphere analysis completed for TSPA/VA. Although the total sample was 195 in the Amargosa Valley and 884 in the remainder of the study area, some respondents either could not or would not provide specific information (i.e., they responded "don't know" or otherwise declined). The percentages shown do not reflect weighting.

The set of 77 cases was found by using the "filter" procedure in NCSS (Hintze 1995) to select from the survey respondent file those respondents who met three conditions: (1) located in the Amargosa Valley; (2) had a food garden in the previous year; and (3) consumed locally produced food. With this filter active, the NCSS procedure "Descriptive Tables" (Hintze 1995) was used to collect summary statistics, including a count of the respondents satisfying the three conditions. The procedure revealed that 77 respondents met the criteria.
Once it was known that 77 respondents met the criteria, the NCSS "Sort" procedure (Hintze 1995) was used in two steps to assemble the 77 cases representing these respondents at the top of the file. This was done only while the master survey file was active, so the sorted cases were not made a permanent feature of the master file. In the first step, the file was sorted so that the 194 cases from the Amargosa Valley were at the top. The remaining cases were then deleted from the active file, and the active file was saved as a new file. In the second step, the new active file was sorted simultaneously on two variables, presence of a garden and consumption of locally produced food, so that the 77 cases in question were at the top of the file. The topmost 77 cases were then kept by deleting the remaining 117.
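For readers working outside NCSS, the same selection can be sketched in Python. This is a minimal illustration, not the original procedure. The record type and its field names (`community`, `had_garden`, `ate_local_food`) are hypothetical stand-ins for the survey variables. The sketch shows that the filter approach and the two-step sort-and-truncate approach pick out the same cases.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    # Hypothetical fields standing in for the survey variables
    respondent_id: int
    community: str
    had_garden: bool
    ate_local_food: bool


def select_by_filter(respondents):
    """Keep respondents meeting all three filter conditions."""
    return [
        r for r in respondents
        if r.community == "Amargosa Valley" and r.had_garden and r.ate_local_food
    ]


def select_by_sorting(respondents):
    """Mimic the two-step sort-and-truncate procedure described above."""
    # Step 1: keep only the Amargosa Valley cases
    valley = [r for r in respondents if r.community == "Amargosa Valley"]
    # Step 2: sort so cases with both a garden and local food come first
    # (False sorts before True), then keep that leading block
    valley.sort(key=lambda r: (not r.had_garden, not r.ate_local_food))
    n_keep = sum(1 for r in valley if r.had_garden and r.ate_local_food)
    return valley[:n_keep]


people = [
    Respondent(1, "Amargosa Valley", True, True),
    Respondent(2, "Amargosa Valley", True, False),
    Respondent(3, "Beatty", True, True),
    Respondent(4, "Amargosa Valley", False, True),
    Respondent(5, "Amargosa Valley", True, True),
]
print([r.respondent_id for r in select_by_filter(people)])   # [1, 5]
print([r.respondent_id for r in select_by_sorting(people)])  # [1, 5]
```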
With the second step accomplished, the 77 remaining cases represented members of the hypothetical farming community located near Lathrop Wells. They formed a group that exhibited behaviors and habits expected to result in the highest doses. There were 28 male and 49 female respondents in this set.
As reported in the documentation for the survey, males were underrepresented both in the survey as a whole and in each of its constituent communities (U.S. DOE 1997). This would not matter if males and females had the same daily food intake, but they do not: on average, males consume different amounts than females (U.S. DOE 1997). It was known in advance of the survey that this disproportionate representation by gender was likely to occur, and weights were developed to compensate for it (U.S. DOE 1997). The proportion of adult females in the Amargosa Valley was estimated to be .49 (U.S. DOE 1997), while the proportion of adult females in the sample was .615. That is, 120 of the 195 sample respondents were female. Based on the female share of the adult population, only about 96 females would have been expected. Weighting is required so that input parameters such as the mean reflect the proportion of females in the Amargosa Valley adult population, not the proportion in the sample. For the Amargosa Valley, the gender weights had already been determined (U.S. DOE 1997): 0.80 for a female and 1.32 for a male. That is, in the weighted results for the Amargosa Valley, every 100 females count as 80 and every 100 males count as 132.
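The arithmetic behind the gender weights can be checked directly. The Python sketch below uses only the figures reported above: 195 respondents, 120 of them female, weights of 0.80 and 1.32, and a population female share of .49. It shows that the weighted sample reproduces a female share close to the population estimate. The variable names are illustrative only.

```python
# Figures reported for the Amargosa Valley stratum (U.S. DOE 1997)
n_sample = 195
n_female = 120
n_male = n_sample - n_female      # 75
pop_female_share = 0.49
W_FEMALE, W_MALE = 0.80, 1.32

# Females expected if the sample mirrored the adult population
expected_females = pop_female_share * n_sample  # about 95.55, i.e. roughly 96

# Under weighting, each respondent counts as his or her weight
weighted_females = n_female * W_FEMALE          # about 96
weighted_males = n_male * W_MALE                # about 99
weighted_total = weighted_females + weighted_males
weighted_female_share = weighted_females / weighted_total

print(f"sample female share:   {n_female / n_sample:.3f}")
print(f"expected females:      {expected_females:.2f}")
print(f"weighted total:        {weighted_total:.2f}")
print(f"weighted female share: {weighted_female_share:.3f}")
```

Because 120 × 0.80 + 75 × 1.32 = 195, the weights preserve the sample size for the Amargosa Valley as a whole. At the same time, they move the female share from .615 to about .49.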
Males were also underrepresented in the set of 77 respondents selected to live in the Lathrop Wells farming community: there were 28 males and 49 females. Again, females would be expected to represent about half of the Lathrop Wells farming community, but they represented about 64 percent. This suggested that the parameters for the critical group should be assembled from data weighted by gender (post-stratification). To achieve this end, the weights developed for the Amargosa Valley as a whole were applied to the set of 77 respondents: each male was weighted by a factor of 1.32 and each female by a factor of .80. This gender weighting scheme was deemed appropriate because the 77 respondents making up the hypothetical Lathrop Wells farming community were taken from a random sample of the Amargosa Valley population. This is, recall, the population deemed to be at highest risk of exposure. Because there was neither a "random sample" nor a "population" associated with the hypothetical Lathrop Wells farming community, there was no other empirical basis on which an alternative set of gender-based weights could have been developed.
While the weighting scheme selected for Lathrop Wells had the advantage of being based on the gender distribution of the Amargosa Valley population, it also had a slight drawback. When the results are weighted, there are, algebraically, 37 males and 39 females. That is, the weighted results sum to 76 rather than 77 respondents. This drawback was deemed acceptable in order to have weights based on a random sample of the "real" population deemed to be at highest risk, that of the Amargosa Valley. The parameters for the consumption of locally produced food and tap water for this weighted set are shown in Table 3.
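The following Python sketch shows where the weighted total of 76 comes from and how a gender-weighted (post-stratified) mean is computed. The head counts and weights are from the text. The consumption values in the second half are hypothetical, chosen only to show how weighting shifts a mean toward the underrepresented males.

```python
W_MALE, W_FEMALE = 1.32, 0.80  # Amargosa Valley gender weights


def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Weighted head counts for the 77-person Lathrop Wells set
weighted_males = 28 * W_MALE      # about 36.96
weighted_females = 49 * W_FEMALE  # about 39.2
weighted_n = weighted_males + weighted_females  # about 76.16
print(round(weighted_males), round(weighted_females), round(weighted_n))  # 37 39 76

# Hypothetical consumption values (kg/yr) for two males and three females
values = [10.0, 20.0, 0.0, 5.0, 15.0]
weights = [W_MALE, W_MALE, W_FEMALE, W_FEMALE, W_FEMALE]
print(round(weighted_mean(values, weights), 2))  # about 11.03, vs. unweighted 10.0
```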
The survey data underlying Table 3 were subject to error from a number of sources. However, tests of non-response bias, as well as validity and reliability tests, suggested that the survey data are valid, reliable, and generally adequate for biosphere modeling purposes. Thus, the data in Table 3, as well as other data from the survey, were found adequate for developing both sets of parameters: (1) the reasonable, conservative estimates and their statistical distributions, which are in accordance with proposed 10 CFR 63; and (2) the high bounding values, which, recall, were designed to provide an extremely conservative set of parameters.

4. The only known source of "locally produced" fish in the entire study area is a catfish "farm" in the Amargosa Valley. Thus, the values provided are specific to the consumption of fish from this location, but under the assumption that it is now located at the Lathrop Wells farming community.
The arithmetic mean is calculated by summing the annual consumption amount of locally produced food reported by those who responded and dividing this sum by the number responding. Keep in mind that many of the respondents reported that they consumed no locally produced food of the type in question. The conceptual denominator of this mean is the total resident adult population of the area in question (Amargosa Valley, Remainder of Study Area, Total Study Area), not just those who reported consuming locally produced food (or tap water) of the type in question.
The values shown reflect weighting by gender.
The data for the remainder of the study area were found by algebraically solving for X in the formula for the weighted average. In that formula, the mean consumption value for the given food type in the total study area is the sample-weighted average of the Amargosa Valley mean and X, the (unknown) mean consumption value for the given food type in the remainder of the study area.
3. The only known source of "locally produced" fish in the entire study area is a catfish "farm" in the Amargosa Valley. Thus, the values provided are specific to the consumption of fish from this location, but under the assumption that it is now located at the Lathrop Wells farming community.
1. The values shown for food are in kilograms; for milk and tap water they are in liters. The arithmetic mean is calculated by summing the annual consumption amount of locally produced food reported by those who responded and dividing this sum by the number responding. Keep in mind that many of the respondents reported that they consumed no locally produced food of the type in question. The conceptual denominator of this mean is the total resident adult population of the hypothetical farming community located near Lathrop Wells, not just those who reported consuming locally produced food (or tap water) of the type in question.

The values shown reflect weighting by gender for the Amargosa Valley stratum of the survey (male weight = 1.32; female weight = 0.80).
4. "All Food Types" is measured in kilograms consumed annually and includes: Leafy Vegetables; Root Vegetables; Grains; Fruit; Poultry; Meat; Fish; and Eggs.
5. This refers to water from a local ground source. It excludes any bottled water purchased from a commercial vendor.

For the reasonable, conservative estimates, the approximate statistical precision is ±5 percent at a 99 percent level of confidence for the sample as a whole. For subsets (e.g., the Amargosa Valley), the precision is lower. As an illustrative example, the survey estimated the mean annual consumption of locally produced leafy vegetables for all adults in the Amargosa Valley to be 8.01 kg/yr. Using the normal approximation, the 95 percent confidence interval around this estimate is 6.20 to 9.82 kg/yr. That is, one can be 95 percent confident that the true mean annual consumption of locally produced leafy vegetables by adults in the Amargosa Valley was between 6.20 and 9.82 kg/yr. Under the normal approximation, the lower and upper limits of a 95 percent confidence interval are calculated by multiplying the standard error by 1.96 and then subtracting this product from the mean and adding it to the mean, respectively. The standard error is calculated by dividing the standard deviation by the square root of the number responding.
As an illustration of statistical precision for the set of 77 respondents, consider the consumption of leafy vegetables among the 77 adults assigned to the Lathrop Wells community. The weighted number is 76, and the mean and standard deviation for the consumption of locally produced leafy vegetables are 15.47 and 15.31, respectively. The estimated standard error is therefore 1.76 (15.31/76^0.5). Thus, a 95 percent confidence interval using the normal approximation runs from 12.02 kg/yr (15.47 - 1.96 × 1.76) to 18.92 kg/yr (15.47 + 1.96 × 1.76). Similar confidence intervals can be constructed for the other food types as well as for milk and tap water consumption. Because no modeling was done in this analysis, a sensitivity analysis of the effect of sampling variation was not required.
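The normal-approximation interval described in the two paragraphs above can be sketched in a few lines of Python. The function name `normal_ci` and its `round_se` option are illustrative. By default the standard error is rounded to two decimals before use, which reproduces the hand calculation for the Lathrop Wells leafy-vegetable figures.

```python
import math


def normal_ci(mean, sd, n, z=1.96, round_se=True):
    """Normal-approximation interval: mean -/+ z * SE, with SE = sd / sqrt(n).

    round_se=True rounds SE to two decimals before use, mirroring the
    hand calculation in the text; set it to False for unrounded arithmetic.
    """
    se = sd / math.sqrt(n)
    if round_se:
        se = round(se, 2)
    half_width = z * se
    return round(mean - half_width, 2), round(mean + half_width, 2), se


# Leafy vegetables, Lathrop Wells group: mean 15.47, SD 15.31, weighted n = 76
low, high, se = normal_ci(15.47, 15.31, 76)
print(se, low, high)  # 1.76 12.02 18.92
```

With `round_se=False`, the limits shift by about 0.01 (12.03 to 18.91). This is a reminder that intermediate rounding can affect the reported second decimal.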
Part 115 of proposed 10 CFR 63 specifies that the mean value shall not be unduly biased based on the extreme habits of a few individuals (64 FR 8640). That is, there should not be extreme outliers on the high end. Boxplots were constructed and examined for extreme outliers. However, this analysis was not done for grains, poultry, meat, fish, and milk because the median consumption level for these food types was zero, which tends to make any nonzero consumption level appear as an outlier. The data for these food types were left "as is." For the remaining food types (leafy vegetables, root vegetables, fruit, and eggs), as well as tap water, the analysis was done.
A boxplot is a device that helps identify several distributional characteristics: location, spread, skewness, tail length, and outliers. The main component of a boxplot is a box whose endpoints bound the middle half of the distribution; the length of this box is known as the interquartile range (IQR). A crossbar in the interior of the box denotes the median, and the tails are represented by lines drawn from each end of the box to the most remote point that is not an outlier. These points are known as the upper and lower adjacent values, respectively. The upper adjacent value is the largest observation less than or equal to the 75th percentile plus 1.5 times the IQR; the lower adjacent value is the smallest observation greater than or equal to the 25th percentile minus 1.5 times the IQR (Hintze 1995).
The length of the box displays variability in the data. The relative position of the median in the box and the length and direction of the tails depict the distributional shape of the observations. A median closer to the lower end of the box with a long upper tail indicates a right-skewed distribution. Conversely, a median closer to the upper end of the box with a long lower tail suggests a left-skewed distribution. A median in the middle of the box with lower and upper tails of equal length is characteristic of a symmetrical distribution.
Keep in mind that the width of a boxplot has no substantive meaning.A given width is simply designed to provide a balance that is pleasing to the eye.This means that the tick marks on the horizontal axis have no substantive meaning and are simply an artifact of the NCSS boxplot procedure.
Values outside the upper and lower adjacent values are identified as outliers. There are two types of outliers, mild and severe (Hintze 1995). A mild outlier is one that is less than 3 IQRs from the nearest adjacent value; a severe outlier is 3 or more IQRs from the nearest adjacent value (Hintze 1995). The statistical package used to construct the boxplots (NCSS 6.0) can identify severe and mild outliers directly from a boxplot. That is, the package performs all the calculations, and the user need only specify that severe and mild outliers are represented by different symbols (Hintze 1995). For purposes of this analysis, a circle was selected to represent mild outliers and a square to represent severe outliers.
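The boxplot rules above can be sketched in Python as a small classification routine. It follows the description in the text: adjacent values are the most extreme observations within 1.5 IQR of the quartiles, and an outlier is severe when it lies 3 or more IQRs beyond the nearest adjacent value. One assumption should be flagged: the quartiles here use the "inclusive" method of Python's `statistics.quantiles`, which may differ slightly from the percentile definition used in NCSS. The data are hypothetical.

```python
from statistics import quantiles


def classify_outliers(values):
    """Return adjacent values plus mild and severe outliers for one variable.

    Assumption: quartiles use Python's "inclusive" method, which may
    differ slightly from the percentile definition used in NCSS.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    lower_adjacent, upper_adjacent = min(inside), max(inside)

    mild, severe = [], []
    for v in sorted(values):
        if v > upper_adjacent:
            distance = v - upper_adjacent
        elif v < lower_adjacent:
            distance = lower_adjacent - v
        else:
            continue  # within the tails: not an outlier
        # Severe if 3 or more IQRs beyond the nearest adjacent value
        (severe if distance >= 3 * iqr else mild).append(v)
    return {
        "lower_adjacent": lower_adjacent,
        "upper_adjacent": upper_adjacent,
        "mild": mild,
        "severe": severe,
    }


# Hypothetical annual consumption values (kg/yr) for twelve respondents
sample = [0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 80]
print(classify_outliers(sample))
```

For this sample, 20 is flagged as mild and 80 as severe. When most respondents report zero, the quartiles can collapse to zero. The IQR then becomes zero and every positive value is flagged, which illustrates why the food types with a median of zero were not screened this way.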
The boxplots for each of the variables of interest are shown below as figures 1a through 1e.In each part of the figure, the number shown on the vertical axis indicates average consumption per year.For food, this is given in kilograms, while for milk and tap water, it is given in liters.
The boxplots show that food consumption is right-skewed and truncated on the left at zero (nobody consumes a negative amount of locally produced food or tap water). This is supported by the finding that, for each of the nine food types, the median is less than the mean, as shown in Table 3. The distribution of tap water consumption, however, is not right-skewed.
For two of the five variables in which the median is not zero, root vegetables (Figure 1b) and tap water (Figure 1e), no outliers are identified; thus, there are no extreme values. For two of the remaining three, leafy vegetables (Figure 1a) and eggs (Figure 1d), each outlier is displayed as a circle, which means that none is severe. However, for the third, fruit consumption (Figure 1c), there is a single outlier displayed as a square, which means it is severe. This outlier is the maximum value for fruit consumption, 97.69, as shown in Table 3.
Table 4 shows the parameters for the critical group located in the hypothetical farming community near Lathrop Wells, as found using the outlier analysis. With the exception of fruit consumption, the parameters shown are the same as those in Table 3. For fruit consumption, omitting the extreme value of 97.69 reduced the mean consumption level from 15.05 to 14.17, the maximum value to 53.01, and the standard deviation from 18.10 to 15.46. The maximum values shown in Table 4 were used for the second set of parameters, the bounding values. This set of maximum values suited that purpose because it was consistent with the reasonable, conservative parameters: it provides bounding limits to the reasonable, conservative consumption levels.
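The reported change in the fruit mean can be sanity-checked from the summary figures alone. The Python sketch below rests on two assumptions that the text does not state. First, the weighted total is 76. Second, the respondent reporting 97.69 kg/yr carries the female weight of 0.80. Under those assumptions, removing that observation from a weighted mean of 15.05 yields about 14.17. The function name is hypothetical.

```python
def mean_after_removal(mean, total_weight, value, weight):
    """Weighted mean after dropping one observation of known value and weight."""
    weighted_sum = mean * total_weight
    return (weighted_sum - value * weight) / (total_weight - weight)


# Assumptions (not stated in the text): weighted total of 76, and the
# respondent reporting 97.69 kg/yr carries the female weight of 0.80.
print(round(mean_after_removal(15.05, 76, 97.69, 0.80), 2))  # 14.17
# With the male weight of 1.32 instead, the result would be about 13.59
print(round(mean_after_removal(15.05, 76, 97.69, 1.32), 2))
```

Only the female-weight assumption reproduces the reported 14.17. This suggests, but does not prove, that the extreme value came from a female respondent.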
Histograms showing the distribution of consumption for each food type as well as milk and tap water are provided as Figures 2 through 7. The numbers on the vertical axis of each histogram show the number of respondents, while the numbers on the horizontal axis of each histogram show the level of consumption.
The graphs and data suggest that the consumption of locally produced food of all types was likely to follow a negative exponential distribution, while tap water consumption was likely to follow a uniform distribution, although other distributions could provide an adequate fit as well. It was known that the software to be used to develop ingestion exposure estimates, GENII-S, accommodated a uniform distribution but not a negative exponential distribution (SNL 1993). Of the distributions available in GENII-S, the log uniform appeared to be the most suitable substitute for the negative exponential. Consequently, the log uniform distribution was recommended for use with all food types in the reasonable, conservative set of estimated parameters.
Table columns: Count; Percent Consuming; Mean; Median; Standard Deviation; Minimum; Maximum
1. The values shown for food are in kilograms; for milk and tap water they are in liters. The arithmetic mean is calculated by summing the annual consumption amount of locally produced food reported by those who responded and dividing this sum by the number responding. Keep in mind that many of the respondents reported that they consumed no locally produced food of the type in question. The conceptual denominator of this mean is the total resident adult population of the hypothetical farming community located near Lathrop Wells, not just those who reported consuming locally produced food (or tap water) of the type in question. The values shown reflect weighting by gender for the Amargosa Valley stratum of the survey (male weight = 1.32; female weight = 0.80).
2. "Meat" comprises beef and pork.
3. The only known source of "locally produced" fish in the entire study area is a catfish "farm" in the Amargosa Valley. Thus, the values provided are specific to the consumption of fish from this location, assuming that it is now located at the Lathrop Wells farming community.

Two parameters are required for the log uniform distribution: the minimum and the maximum (SNL 1993). However, the minimum value in the empirical data from which the log uniform distribution is generated cannot be zero (SNL 1993). Thus, the actual minimum of zero must be replaced. To avoid unduly biasing the mean by this action, a very small value is required. The means, standard deviations, minima, and maxima of the empirical data are reported to only two decimal places. It was therefore determined that replacing zero with a very small value (e.g., 1.00E-07) would not affect the reported parameters of the empirical data.
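The zero-replacement step can be illustrated with a short Python sketch. It makes two points. A log uniform draw needs a strictly positive minimum because ln(0) is undefined. And replacing zeros with 1.00E-07 leaves a two-decimal mean unchanged. The sampler is a generic log uniform sampler written for illustration, not GENII-S code. The consumption values are hypothetical.

```python
import math
import random
from statistics import mean


def sample_log_uniform(minimum, maximum, rng):
    """Draw one value whose logarithm is uniform on [ln(minimum), ln(maximum)]."""
    if minimum <= 0:
        # ln(0) is undefined, which is why a zero minimum must be replaced
        raise ValueError("log uniform minimum must be strictly positive")
    return math.exp(rng.uniform(math.log(minimum), math.log(maximum)))


# Hypothetical consumption values (kg/yr), with zeros for non-consumers
observed = [0.0, 0.0, 0.0, 5.5, 12.25, 53.01]
replaced = [v if v > 0 else 1.00e-07 for v in observed]

# At two-decimal precision the replacement leaves the mean unchanged
print(round(mean(observed), 2), round(mean(replaced), 2))

rng = random.Random(1)
draws = [sample_log_uniform(min(replaced), max(replaced), rng) for _ in range(5)]
print([round(d, 4) for d in draws])
```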

Study Recommendations
Both sets of parameters, the reasonable, conservative set and the high bounding set, are found in Table 5. For the reasonable, conservative set, the parameters are given by the minimum and maximum values for use with a log uniform distribution. For the high bounding set, the parameters are given by the maximum values, which are recommended to be treated as fixed, without a distribution. These parameters were duly supplied to the health physicists responsible for developing ingestion exposure estimates.

Recommendations for Applied Demographers
For applied demographers tasked with developing information, this case study suggests that a wide range of skills is needed to identify a population of interest. This is not a unique finding (Kintner et al., 1995; Murdock and Swanson, 2008; Pol and Thomas, 2001; Smith and McCarty, 1996). In this case, however, the population of interest is, indeed, a "special" one: extremely small in size, but with a huge impact. Identifying this population required not only knowledge of basic demographic methods and data sources, but also a reasonable level of knowledge of both survey research methods and inferential statistics. Understanding what data were available from public sources and what data needed to be collected were also important components in developing the information needed to complete the task.
The problem reported here is very different from the typical one facing most applied demographers. It asked for the identification of a "population" rather than an estimate (or forecast) of the size and composition of the population in a given geographic area. This can be taken as an example of the new types of challenges facing applied demography in the 21st century, some of which are listed by Swanson, Smith, and Tayman (2001). Further, as Smith and McCarty (1996) and Swanson et al. (2007) have demonstrated in regard to estimating the demographic effects of natural disasters, the sequelae of September 11, 2001 may foreshadow even more demanding challenges. This case study not only underscores the importance of having team specialists who share common ground in any major project, but also gives an idea of the extreme data needs likely to be demanded of demographers in the 21st century. The demographer in this project had to communicate with health physicists, mathematicians, engineers, federal agency representatives, and appointed officials while working under tight deadlines and the ubiquitous budget constraints. As such, budding applied demographers, especially those nearing completion of their graduate studies, should consider acquiring skills beyond traditional demography as opportunities present themselves (Morrison et al. 2000).

Figure 1a. Boxplot for Leafy Vegetables

Table 1 Percent of Resident Adults Consuming Locally Produced Food and Tap Water, by Food Type and Area 1
1. Data are taken from Table 2.3.1 (U.S. DOE 1997). The specific food types ...
... catfish "farm" in the Amargosa Valley. Thus, the values provided are specific to the consumption of fish from this location, but under the assumption that it is now located at the Lathrop Wells farming community.
... Poultry; Meat; Fish; and Eggs.
5. This refers to water from a local ground source. It excludes any bottled water ...

Table 2 Annual Mean Consumption Levels of Locally Produced Food and Tap Water for Resident Adults, by Food Type and Area 1
1. The data for the Amargosa Valley are taken from Table 2.3.5 (U.S. DOE 1997); the data for the total study area are taken from Table 2.3.2 (U.S. DOE 1997); ...
Where: (195/1079) is the proportion of the achieved sample in the Amargosa Valley; "Avmean" is ...