A New Approach in Measuring Local Migration and Population

This paper presents a new model for measuring local migration and population and reports results of a promising pilot application to Massachusetts. This model operationalizes Ravenstein’s classic “push-pull” paradigm, which posits that local migration is determined by the area’s relative attractiveness, a compound function of distinct factors that push migrants out of the area or pull them in. The attraction factors and their changes are measured using varied data sources, including decennial census migration flow data and data on group quarters and school enrollments. This model yields timely population estimates with accuracy superior to the corresponding estimates based on the Census Bureau’s methodology. Such results warrant further applications to test and refine this promising approach.


Introduction
Measuring annual population and population growth following a decennial census is called postcensal population estimation. In efforts to improve population estimates for states and local areas, a variety of methods have been created, including Component Methods I and II developed by the US Census Bureau (The US Census Bureau, 1947 and 1960), Regression Ratio Correlation Methods (Schmitt & Crosetti 1954; Schmitt & Crosetti 1956), Housing Unit Methods (Starsinic & Zitte 1968), Vital Rates/Censal Ratio Methods (Bogue 1950), Composite Methods (Bogue & Duncan 1959), and Survey Methods (Ericksen 1973; Rives 1982). These methods are well documented in a small number of books and in the Current Population Reports series published by the Census Bureau (Committee on National Statistics 1980; Lee & Goldsmith 1982; Murdock & Ellis 1991; Rives, et al. 1995; The Census Bureau 1996; Smith & Cody 1999; Bryan 2003). Our research is the latest in a series of efforts to produce accurate estimates.
The most widely adopted means of producing estimates has been the component method. This method estimates current population by adding births, subtracting deaths, and adding net migration to the base year population. While vital statistics such as births and deaths are generally available at various levels, finding adequate data and appropriate measurements for estimating migration, especially domestic migration, remains a challenge to demographers.
Currently, the US Census Bureau and many other agencies utilize tax return data (IRS) to estimate domestic migration for the population aged 0-64 and Medicare enrollments for the population aged 65 and older. This is the so-called Administrative Records Method (AR). However, as pointed out by Galdi (1978) and Smith (1999), geographic issues, including incorrect and outdated mailing addresses, inaccurate reflection of boundary changes, and coding changes, frequently complicate the IRS data, especially at the sub-county level. To overcome these issues, the Bureau replaced the AR method with the Housing Unit Method (HU) in estimating sub-county (or city/town) population beginning with the 1996 round of estimates. In doing so, a methodological inconsistency emerged, for the AR method is basically a component method while the HU method is not.
In using the HU method, two types of measurements are required: group quarters data and household data. The former involves the population living in mental hospitals, nursing homes, jails, college dormitories, and military barracks; the latter consists of the population in single-family homes. The need to assemble so many different types of measurements from separate agencies poses practical difficulties, and the collected measurements might not meet necessary standards of accuracy and completeness. Moreover, since there is no easy way to gather household information, in practice the U.S. Census Bureau collects only building permit data, treating these data as the measurement of annual changes in household population. However, building permit data are not adequate for measuring complex population changes, which are a joint effect of many variables, including migration, death, and birth. In fact, no component of population growth is well measured using this method.
Because of such shortcomings with the HU method, the U.S. Census Bureau, following the top-bottom procedure, first produces estimates at state and county levels using the AR method, then produces estimates at the sub-county level using the HU method, and finally adjusts sub-county estimates to county counts. As mentioned, the AR method depends heavily on the IRS data. The biggest issue with the IRS data, in addition to geographic inaccuracy, is under-coverage. For instance, college students who are claimed by their parents as dependents in tax reports, or who fill out IRS forms for the first time, are not classified as migrants even if they have changed their addresses to college towns. Immigrants, minorities, and individuals of low socioeconomic status are under-covered because many of them do not report their income to the IRS at all. Under-coverage can be a serious source of bias in migration estimation for regions such as Massachusetts, where a large number of students move in from other states and countries yearly. In the last two decades, population estimates constructed by the U.S. Census Bureau displayed a dramatic underestimation for that state. In preparing the 1991-2000 population estimates for Massachusetts, it was essential to explore a new approach in order to reduce underestimation at the state level while measuring population at the other two levels (i.e., county and sub-county) in an efficient and accurate way.

General Framework
This model retains the component method's logic but adds a refinement for estimating migration, domestic migration in particular. The theoretical framework for this approach derives from the classic push-pull paradigm originated by Ravenstein (1889), whereby human migration behavior in an area is governed by local attraction factors. These factors are a compound function of a series of push and pull variables influenced by socioeconomic conditions relative to outside areas.
Ravenstein's abstract theory is not practically applicable without concrete quantitative measurements of attractions. The focus here is placed on devising such measurements and validating their usefulness. The following formula describes the process:

$A_t = i_t - o_t = m_t$ (1)

$A_t$ is the variable of attractions, equaling the difference between pull and push factors. In-migrants ($I_t$) or the in-migration rate ($i_t$) are used to measure pull factors, and out-migrants ($O_t$) or the out-migration rate ($o_t$) to measure push factors. The difference between the in-migration rate $i_t$ and the out-migration rate $o_t$ is the net migration rate $m_t$, measuring the extent of the relative attraction in an area at a given time. A positive $m_t$ indicates a positive attraction, or the domination of pull over push factors. A negative $m_t$ indicates a negative attraction, or the domination of push over pull factors. The higher the net migration rate $m_t$, the more attractive the region, and vice versa. Pull and push factors can be operationalized in various ways.
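As a minimal sketch of the operationalization just described, the following Python function computes the in-migration rate, the out-migration rate, and their difference, the net migration rate used as the attraction measure. The function name, the argument names, the choice of the area's population as the denominator, and the sample figures are illustrative assumptions rather than part of the model's specification.

```python
def net_migration_rate(in_migrants, out_migrants, population):
    """Return (i_t, o_t, m_t) for one area and one period.

    Assumption: rates are expressed per person of the area's population.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    i_t = in_migrants / population   # pull measure: in-migration rate
    o_t = out_migrants / population  # push measure: out-migration rate
    m_t = i_t - o_t                  # net migration rate (relative attraction)
    return i_t, o_t, m_t


# Hypothetical figures: 1,200 in-migrants, 1,000 out-migrants, population 100,000.
i_t, o_t, m_t = net_migration_rate(1200, 1000, 100_000)
print(f"i_t={i_t:.4f}, o_t={o_t:.4f}, m_t={m_t:+.4f}")
```

A positive third value corresponds to pull dominating push; a negative value corresponds to push dominating pull.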
In the above formula, demographic measurements are employed in operationalization.In other applications, economic indexes and social indicators could be considered possibilities.
Attraction factors change or remain continuous over time depending on distinct socioeconomic determinants in that area. Certain determinants, such as geographic location, climate, natural environment, education, and transportation, are relatively constant; others, especially economic conditions, typically fluctuate. Therefore, two types of determinants or attraction factors are identified in this model: constant and variable. Formula 1 then evolves into:

$A_t = A_b + C_t$ (2)

Here, $A_t$, the attraction variable in an area, consists of $A_b$, the constant or basic attraction, and $C_t$, the variable attraction. $A_b$ is determined by a series of elemental or relatively constant conditions in that community. In this model it is operationalized as $m_b$, the net migration rate at the base time or base period (i.e., 1985-90 in this estimation), on the assumptions that the constant conditions in that community in the new period are basically the same as those in the last period and that these conditions are reflected in the migration flow. $C_t$ (Formula 3) is estimated from numerous timely changes in local socioeconomic environments, especially economic conditions. The more of these changes are recognized, the more precisely the variable attraction $C_t$ is measured. These changes, $C_{1t}$ through $C_{nt}$, are operationalized as a set of net migration change rates between the previous and the current periods. This is discussed in the next subsection.
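The split into constant and variable attraction can be illustrated with a short Python sketch. It assumes that the recognized change rates combine additively into the variable attraction; that additive form, the function name, and the sample values are assumptions made for illustration only.

```python
def attraction_rate(base_rate, change_rates):
    """Combine the constant attraction A_b with the variable attraction C_t.

    base_rate    -- m_b, the base-period net migration rate (stands in for A_b)
    change_rates -- net migration change rates C_1t ... C_nt
    Assumption: C_t is taken as the simple sum of the change rates.
    """
    c_t = sum(change_rates)
    return base_rate + c_t


# Hypothetical area: slightly negative base attraction, three recognized changes.
a_t = attraction_rate(-0.004, [0.0010, 0.0020, -0.0005])
print(f"A_t = {a_t:+.4f}")
```

With no recognized changes the estimate falls back to the base-period rate, which matches the idea that constant conditions carry over from the previous period.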

Framework for the Population Aged 0-64
In estimation, the Massachusetts population is divided into two age groups: 64 and younger, and 65 and older. The new method is applied primarily to the age group 64 and younger. The constant attraction $A_b$ and the variable attraction $C_t$ are estimated separately for each of the three levels: state, county, and sub-county or Minor Civil Divisions (MCD) (i.e., city/town in Massachusetts).
As discussed previously, $A_b$ was operationalized as the base time net migration rate $m_b$. In the age group 0-64, it is estimated on the basis of data from the long form questionnaires in the 1990 census, commonly referred to as the "place of residence five years ago" tables and files. A special tabulation, "Special Tape File 28," which shows the total number of persons living in a place by residence status five years ago, is available for 1990 Massachusetts residents at state, county, and sub-county or MCD levels. The Census Bureau prepares such a tabulation under contract for any state (End Note 1). For most states, data are available on a county-to-county basis. MCD-to-MCD migration flow data are available only for the New England region. The gross county-to-county migration flow data from the 2000 census have recently been released by the Bureau (End Note 2). From the data contained in the tabulation, one is able to calculate the five-year average net migration rate as $m_b$ for the state, counties, and MCDs. Refer to Appendix I for calculation formulas.
From Formula 3, the variable attraction $C_t$ is further operationalized through three components, $C_{1t}$, $C_{2t}$, and $C_{3t}$. $C_{1t}$ is defined here as the annual net migration change rate for the population aged 6-17, equaling the difference between $C_t^s$, the annual net migration rate for the population aged 6-17 at time $t$, and $C_b^s$, the five-year average net migration rate for the same group at the base time (i.e., 1985-90 for this estimation). These rates are calculated from school enrollments in grades 1 through 12 on the assumptions that the enrolled students are representative of the population aged 6-17 and that children change schools when their parents move. Enrollment data are collected annually by state education agencies as well as the National Center for Education Statistics (NCES). In Massachusetts, the data are gathered by the Massachusetts Department of Education (MDE) and are available by grade (from kindergarten to grade 12) and school district of student residence (i.e., city/town in Massachusetts). The data files are referred to as the annual "School Attending Children (SAC)" or January reports. In Component Methods I and II, the net migration rate for the population aged 0-64 is estimated entirely on the basis of elementary school enrollments. High school enrollments are ruled out owing to the high dropout rate in this group.
Since 1986, the MDE has collected annual data on high school dropouts as well as on dropouts who return to school by October 1 of the following year. Nationally, all local education agencies were required to report dropouts to the National Center for Education Statistics (NCES) beginning with the 1992-93 school year. In the 2000-01 school year, 45 states reported dropout data to the NCES using the Common Core of Data (CCD) forms.
The availability of dropout information allows one to fill in the gaps in high school statistics and extend the use of the SAC data from elementary schools to all schools (i.e., grades 1 through 12), giving this model an advantage over the quality achieved in Component Methods I and II (End Note 3).
$C_{2t}$ is defined here as the annual net migration change rate for the population aged 25-44, equaling the difference between $C_t^p$, the annual net migration rate for the population aged 25-44 at time $t$, and $C_b^p$, the five-year average net migration rate for the same population at the base time. It is assumed that this age group is the parental counterpart of the student population and that nonfamily persons have a migration pattern similar to that of family persons. Net migrants for this group are derived by multiplying net student-age migrants by 1.52, a ratio obtained by dividing the national ever-married population aged 25-44 (the sum of currently married, separated, widowed, and divorced persons in that age group) by the student-age population (aged 6 to 17) in the 1990 census. The ratio is calculated on a national rather than a local basis because of the importance of interstate migration to the estimates. In other cases, ratios estimated on the basis of local populations could be considered as alternatives.
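The enrollment-based steps can be sketched in Python under stated assumptions. The cohort comparison follows End Note 3, the 1.52 multiplier is the ratio given in the text, and the change rate uses the form that the appendix fragment gives for group quarters, applied here to other groups by analogy. All function names and sample figures are hypothetical, and the dropout adjustment is omitted for brevity.

```python
PARENT_RATIO = 1.52  # ratio reported in the text (1990 census, national basis)


def student_net_migrants(enrolled_k_to_11_prev, enrolled_1_to_12_curr,
                         student_deaths, student_net_international):
    """Net domestic migrants of the school-age population (per End Note 3).

    The cohort enrolled in K-11 last year should appear in grades 1-12 this
    year; the remaining difference, after adding back deaths and removing
    international migration, is attributed to domestic migration.
    """
    cohort_change = enrolled_1_to_12_curr - enrolled_k_to_11_prev
    return cohort_change + student_deaths - student_net_international


def parental_net_migrants(student_migrants, ratio=PARENT_RATIO):
    """Net migrants aged 25-44, treated as the parental counterpart."""
    return student_migrants * ratio


def change_rate(migrants_t, pop_t, migrants_base_5yr, pop_base):
    """Annual rate at time t minus the five-year average base-period rate."""
    return migrants_t / pop_t - (migrants_base_5yr / 5) / pop_base


# Hypothetical town: 9,000 pupils in K-11 last year, 9,050 in grades 1-12 now.
m_s = student_net_migrants(9000, 9050, student_deaths=3,
                           student_net_international=8)
m_p = parental_net_migrants(m_s)
c_2t = change_rate(m_p, pop_t=30_000, migrants_base_5yr=150, pop_base=30_000)
print(m_s, round(m_p, 1), round(c_2t, 5))
```

Passing a locally estimated ratio through the `ratio` argument corresponds to the alternative mentioned in the text.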
$C_{3t}$ is defined here as the annual net migration change rate for group quarters, equaling $C_t^q$, the annual net migration rate for group quarters at time $t$, minus $C_b^q$, the five-year average net migration rate for the same population at the base time. Group quarters populations are limited here to college students in school dormitories because of their importance to Massachusetts estimates, especially college town estimates. Another consideration is overlap, since certain group quarters populations, such as people in mental hospitals, military bases, and nursing homes, have been or will be counted in the age group populations.
Group quarters data are collected annually by state data agencies. Usually, these agencies are members of the Federal-State Cooperative Program for Local Population Estimates (FSCPE). In Massachusetts, the information is assembled by the state data center, an agency of the University of Massachusetts at Amherst. The collected data are reported to the Bureau and used in its annual estimates. Calculation formulas for $C_t$ are contained in Appendix II.

Framework for the Population Aged 65 plus
The Medicare data, regarded as the best source for estimating the elderly population because of the quality of their coverage, and the method developed by the Bureau (The US Census Bureau, 2002) are adopted in the estimation of old-age migration at state and county levels. The Medicare data are not available for cities/towns, and therefore the new method is applied at this level.
At this level, $A_b$ is estimated by the same approach as for the population aged 0-64 (see Appendix III for the formula). However, only one measurement of the variable attraction $C_t$ is identified this time. Because of data limitations, and also considering that the old-age population is relatively stable in mobility, it is assumed that $C_t$ in a specific city or town is affected by annual socioeconomic changes in its county, which can be operationalized as the annual net migration change rate, equaling the difference between the current migration rate and the base time rate in that county. Migrants estimated in this way are controlled to county counts (see Appendix IV for formulas).
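Controlling city/town migrants to county counts can be illustrated with a short Python sketch. Appendix IV is not reproduced here, so the rule used below, which spreads the county discrepancy across MCDs in proportion to their population, is an assumed stand-in and not necessarily the paper's formula; the names and numbers are hypothetical.

```python
def control_to_county(mcd_migrants, mcd_populations, county_migrants):
    """Adjust MCD net migrants so that they sum to the county figure.

    Assumption: the gap between the county figure and the sum of the MCD
    figures is allocated by population share. This works even when net
    migrants are negative or sum to zero, where simple ratio scaling fails.
    """
    total_pop = sum(mcd_populations)
    if total_pop <= 0:
        raise ValueError("total MCD population must be positive")
    discrepancy = county_migrants - sum(mcd_migrants)
    return [m + discrepancy * p / total_pop
            for m, p in zip(mcd_migrants, mcd_populations)]


# Hypothetical county with three towns whose raw estimates sum to 25, not 30.
adjusted = control_to_county([10, -5, 20], [1000, 3000, 6000], 30)
print(adjusted, sum(adjusted))
```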

Results
Applying the net domestic migration rate obtained in the previous section to the current year expected population (the sum of the survived population and net immigrants in the current year) generates domestic migration estimates (see Appendix V for the formula). Adding domestic migrants to the expected population yields the complete estimates for that year (see Appendix VI for the formula). Estimates in this case began with the 1990 census. Following the top-bottom procedure, estimates are first produced for the state, then for counties, and finally for cities/towns by age, sex, and race in Massachusetts (End Note 4). The annual vital data, including births and deaths, were gathered by the Department of Public Health in Massachusetts and were also used by the Bureau in its estimation. The figures for international migration, the same as those used by the Bureau, were provided by the INS (End Note 5).
Estimates must be evaluated to see whether any progress has been made. An estimate is considered accurate if it is close to the value of the parameter. The 2000 census counts are used as the parameter in the evaluation, even though the census itself is subject to various errors. Deviations between estimates and the census counts are assumed to be due to errors in the estimates. A similar comparison is carried out between the Bureau's estimates and the 2000 census counts to determine whether the new approach is superior to the Bureau's. Since the Bureau did not release the 2000 estimates publicly, comparisons are conducted among the 2000 census counts (April 1, 2000), this study's 2000 and 1999 estimates (April 1, 2000 and July 1, 1999), and the Bureau's 1999 estimates (July 1, 1999) (End Note 6). If the deviation between the 2000 census counts and this study's 1999 estimates is smaller than that between the census counts and the Bureau's 1999 estimates, an improvement is assumed to have been achieved.
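The two estimation steps described at the start of this section can be sketched as follows. Appendices V and VI are not reproduced here, so the sketch relies only on the verbal description; the function name and the sample inputs are hypothetical.

```python
def postcensal_estimate(survived_population, net_immigrants, net_domestic_rate):
    """Return (expected population, domestic migrants, estimate) for one year.

    expected population = survived population + net immigrants
    domestic migrants   = net domestic migration rate * expected population
    estimate            = expected population + domestic migrants
    """
    expected = survived_population + net_immigrants
    domestic_migrants = net_domestic_rate * expected
    return expected, domestic_migrants, expected + domestic_migrants


# Hypothetical area: 100,000 survivors, 500 net immigrants, rate of +0.2 percent.
expected, migrants, estimate = postcensal_estimate(100_000, 500, 0.002)
print(expected, round(migrants), round(estimate))
```

In a top-bottom run the same function would be applied first to the state, then to counties, and then to cities/towns, with each lower level controlled to the level above it.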
The progress made at the state level is quite evident. Table 1 shows that the 2000 estimates (labeled New in this table and the following tables) are only 3,441 less than the census counts, and that the 1999 estimates are 44,754 less, compared with the Bureau's 1999 estimates, which are 173,925 less than the census counts. The percent deviation (or error) of the 1999 estimates is -0.7, two percentage points smaller in magnitude than the Bureau's. Figure 1 describes population trends in Massachusetts as shown by the two decennial censuses and the annual estimates of the new method and the Bureau during the decade 1990-2000. It can be seen that the underestimation has been considerably reduced.
CSP 2008, 35.1: 27-48
Progress is made at the county level too. The mean absolute percent error (MAPE) for counties is 1.6 percent for the 1999 estimates, three percentage points lower than the Bureau's in the same year. Table 2 presents further comparisons at the county level. Of 14 counties in Massachusetts, 12 are improved in terms of the MAPE.
Although the Bureau did not release its 2000 estimates publicly, its released research indicates that the MAPE of the 2000 estimates for Massachusetts at the MCD level is 12.4 percent. In contrast, the MAPE of our estimates as shown in Table 1 is only 5.1 percent.
In general, estimates are more accurate for large populations and less accurate for small populations, as observed in Table 3. A similar pattern is found with both the Bureau's and this study's estimates, and this study's estimates exhibit a lower degree of error except for the population size of 2,500 to 9,999. For large cities with a population greater than 100,000, the MAPE of the 1999 estimates is 0.6 percent for the new method, 3.4 percentage points lower than the Bureau's.
Bias among subgroups, a situation in which estimates tend to be too high (upward bias) or too low (downward bias) for certain areas, can be measured as the absolute percent deviation or error (APE). High APE scores indicate a greater amount of bias, and low APE scores suggest a lesser degree of bias. Table 4 illustrates the distribution of the APE scores across regions. The new method once again presents a lower degree of bias than the Bureau's.
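The evaluation measures used in this section can be written out as a short Python sketch. The state-level deviations (44,754 and 173,925) come from the text; the 2000 census count of 6,349,097 for Massachusetts is not stated in the text and is supplied here as an outside assumption, and the three-area MAPE example uses made-up numbers.

```python
def percent_error(estimate, census):
    """Signed percent deviation of an estimate from the census count."""
    return (estimate - census) / census * 100


def mape(estimates, census_counts):
    """Mean absolute percent error (MAPE) across a set of areas."""
    errors = [abs(percent_error(e, c)) for e, c in zip(estimates, census_counts)]
    return sum(errors) / len(errors)


CENSUS_2000 = 6_349_097               # assumption: not given in the text
new_1999 = CENSUS_2000 - 44_754       # this study's 1999 estimate (Table 1)
bureau_1999 = CENSUS_2000 - 173_925   # the Bureau's 1999 estimate (Table 1)

print(round(percent_error(new_1999, CENSUS_2000), 1))
print(round(percent_error(bureau_1999, CENSUS_2000), 1))

# Made-up example with three areas, only to show the MAPE calculation.
print(round(mape([98, 205, 290], [100, 200, 300]), 2))
```

The absolute value of `percent_error` for a single area is its APE, so the MAPE is simply the average APE over the areas in a group.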

Discussion
This pilot application in Massachusetts has noticeably improved the estimates, especially for the state total and large populations. To a certain degree, this success could be attributed to progress made on data coverage, which the theoretical framework of the model makes possible. Enrollments are the primary data source of this estimation. Minorities, the poor, and immigrants, who are more likely to settle in large cities and also more likely to be left out of IRS records, are not overlooked by the school enrollment system. This accounts for a major share of the improvements made for large populations. Moreover, school data collectors do not treat students who no longer live within their districts as enrolled there, and they are more capable of tracking address changes than the IRS. This gives enrollment data another advantage over the IRS data: fewer geography-related errors.
Nevertheless, one cannot say that the issue of coverage no longer exists with enrollments. Both Component Methods I and II are based on elementary enrollments. The assumption that school children represent the whole population in migration is essential to the two methods. This assumption, however, is subject to question because a bias in terms of horizontal representativeness emerges when the inference drawn from enrollments is applied to the whole population.
To improve representativeness, it is necessary to seek new migration measurements that are unconnected to enrollments. That is why group quarters were introduced to represent the college-age population, who behave differently from other groups in migration. Meanwhile, the constant attraction variable in the model was operationalized as the base time migration rate, which in effect denotes the migration pattern of people who are counted in neither enrollments nor group quarters. Because these people, who are generally aged between 50 and 64, do not move as frequently as younger people, it was hypothesized that their migration behavior is determined by the elemental socioeconomic conditions in that area. As assumed, these conditions hardly change over time. Having adopted these measures, one is able to bring all groups with dissimilar migration patterns into the estimation. Consequently, the improved data coverage produced better accuracy in estimation.
Yet progress made for small populations is not as apparent as for large populations. Unlike enrollments, dropouts are recorded by the school of attendance rather than by the district of residence. In calculating net dropouts, one needs to aggregate the data by school district (or city/town) on the assumption that students attend schools located in their districts of residence. In reality, however, a certain degree of inconsistency between residence and attendance is unavoidable. Furthermore, it is not unusual for several small towns to share one high school. Under these circumstances, one must split the dropouts among these towns based on population percentages. This generates another type of inconsistency. Problems like these have very little impact on estimates for the state and counties, but would certainly bias estimates for cities/towns, especially small cities/towns.
Starting in 2005, the US Census Bureau plans to implement the American Community Survey (ACS) in all counties. The Bureau expects that once the survey is in full operation, the ACS will be able to provide annual migration information for areas and population groups of 65,000 or more beginning in summer 2006. The Bureau intends to use the ACS to replace the long form in the decennial census. The ACS, according to the Bureau, will sample about 2.5 percent of the population, while the long form is a survey of about 17 percent of the population. Even with the larger sample size, the long form in the 2000 census was unable to establish a true migration trend for many areas, especially small areas (End Note 8). Therefore, it is doubtful that the ACS, given such a small sample size, is capable of offering migration measurements as accurate as those of the long form.
The striving for methodological innovation in the field is by no means over. In this effort, our model offers an alternative direction and provides a flexible platform, especially for areas and developing countries where a timely random sample survey is impossible to carry out.
End Notes

1.
MCD-to-MCD data are available only for the New England area. For most states, data are available at only two levels, state and county, and are called county-to-county data.

3.
Annual net migration of the school-age population is determined by first comparing the number of enrollments from kindergarten to grade 11 in one year with the number enrolled in grades 1 to 12 in the following year, then adding student deaths and deducting student international migration.

4.
Stepwise derivations of migration and population estimates are provided upon request.

5.
The estimates for cities/towns in Massachusetts will be provided upon request.

6.
The published version of the 2000 estimates was revised on the basis of the 2000 census by the Bureau.

7.
Private schools represented about ten percent of the total students in Massachusetts in 2001.
8. This is not discussed in detail in this paper due to the limited space.

$M^p_t$: expected net domestic migrants of the population aged 25-44 for a given area at time $t$, on the assumption that the population aged 25-44 is the parental cohort of the student population.
$M^p_t = M^s_t \times 1.52$
$M^p_{85-90}$: expected net domestic migrants of the population aged 25-44 for a given area between 1985 and 1990, on the assumption that the population aged 25-44 is the parental cohort of the student population.
$M^p_{85-90} = M^s_{85-90} \times 1.52$
$P_t$ and $P_{90}$: defined above.

$C^q_t - C^q_b = \frac{M^q_t}{P_t} - \frac{M^q_{85-90} / 5}{P_{90}}$
$C^q_t$: annual net migration rate for group quarters at time $t$.
$C^q_b$: five-year average net migration rate for group quarters at the base time, 1985-1990.
$M^q_t$: expected net migrants of group quarters for a given area at time $t$, calculated as the difference between the number of current year group quarters and the number of previous year group quarters.
$M^q_{85-90}$: net migrants of group quarters for a given area between 1985 and 1990, calculated by the above-mentioned method.
$P_t$ and $P_{90}$: defined above.

$M_{85-90}$: net domestic migrants aged 65+ for a given MCD between 1985 and 1990 (from the special tabulation in the 1990 census).
$P_{90}$: population aged 65+ in the 1990 census for a given MCD, domestic migration not included.
Estimates (Bureau) and the 2000 population data are provided by the U.S. Census Bureau.

Table 4 Distribution of Absolute Percent Deviation or Error (APE) Massachusetts: 1999 and 2000
While enrollment data are available for both public and private schools in Massachusetts, dropouts are recorded only by public schools. Even though private schools make up a small proportion of schools in Massachusetts and have a lower dropout rate than public schools, this deficiency would surely result in an underestimation of in-migrants and, accordingly, undercounts in population estimates of cities/towns where private schools are situated (End Note 7).

Conclusion
Although this model made improvements in a single state during a single decade, it does demonstrate its potential and warrants further test applications elsewhere. All the data employed in this model, including vital statistics, immigrants, enrollments, group quarters, and dropouts, are obtainable in all states. The US Census Bureau has recently released the 2000 long form migration data (county-to-county for most states), which are required for estimating the constant attraction factor in the model.