|
5 | 5 | Introduction |
6 | 6 | ============= |
7 | 7 |
|
8 | | -PopulationSim is an open platform for population synthesis and survey weighting. It emerged from Oregon DOT's desire to |
9 | | -build a shared, open, platform that could be easily adapted for statewide, regional, and urban |
10 | | -transportation planning needs. |
| 8 | +PopulationSim is an open platform for population synthesis and survey weighting. It emerged from |
| 9 | +`Oregon DOT <https://www.oregon.gov/odot>`_'s desire to build a shared, open platform that could |
| 10 | +be easily adapted for statewide, regional, and urban transportation planning needs. |
11 | 11 |
|
12 | 12 | What is population synthesis? |
13 | 13 | ----------------------------- |
14 | | -Activity Based Models (ABMs) operate in a micro-simulation framework , wherein the travel choices of person and household decision-making agents are predicted by applying Monte Carlo methods to behavioral models. This requires a data set of households and persons representing the entire population in the modeling region. Population synthesis refers to the process used to create this data. |
15 | | - |
16 | | -The required inputs to population synthesis are a population sample and marginal distributions. The population |
17 | | -sample is commonly referred to as the *seed or reference sample* and the marginal distributions are referred to |
18 | | -as *controls or targets*. **The process of expanding the seed sample to match the marginal distribution |
19 | | -is termed population synthesis.** The software tool which implements this population synthesis process |
| 14 | +Activity based travel demand models such as `ActivitySim <http://www.activitysim.org>`_ operate at an individual |
| 15 | +level, wherein the travel choices of person and household decision-making agents are predicted by applying |
| 16 | +Monte Carlo methods to behavioral models. This requires a data set of households and persons representing |
| 17 | +the entire population in the modeling region. Population synthesis refers to the process used to create this data. |
| 18 | + |
| 19 | +The required inputs to population synthesis are a population sample and marginal distributions (or control totals). |
| 20 | +The population sample is commonly referred to as the *seed or reference sample* and the marginal distributions are |
| 21 | +commonly referred to as *controls or targets*. **The process of expanding the seed sample to match the marginal |
| 22 | +distribution is termed population synthesis.** The software tool which implements this population synthesis process |
20 | 23 | is termed a **Population Synthesizer**. |
21 | 24 |
|
22 | 25 | What does a Population Synthesizer produce? |
23 | 26 | ------------------------------------------- |
24 | 27 | The objective of a population synthesizer is to generate a synthetic population for |
25 | | -a modeling region. The main outputs from a population synthesizer include lists of persons and households |
26 | | -representing the entire population of the modeling region. These databases include household and person-level |
27 | | -attributes of interest. Examples of attributes at the household level include household income, household size, housing type, and number of vehicles. Examples of person attributes include |
| 28 | +a modeling region. The main outputs from a population synthesizer include tables of persons and households |
| 29 | +representing the entire population of the modeling region. These tables also include household and person-level |
| 30 | +attributes of interest. Examples of attributes at the household level include household income, household size, housing |
| 31 | +type, and number of vehicles. Examples of person attributes include |
28 | 32 | age, gender, work/school status, and occupation. Depending on the use case, a population synthesizer may also |
29 | 33 | produce multi-way distributions of demographic variables at different geographies to be used as an input |
30 | | -to aggregate travel models. In the case of PopulationSim specifically, an additional option is also included to |
31 | | -modify an existing regional synthetic population for a smaller geographical area. In this case, the outputs are a modified list of persons and households. |
| 34 | +to aggregate (four-step) travel models. In the case of PopulationSim specifically, an additional option is also included to |
| 35 | +modify an existing regional synthetic population for a smaller geographical area. In this case, the outputs are a modified |
| 36 | +set of persons and households. |
32 | 37 |
|
33 | 38 | How does a population synthesizer work? |
34 | 39 | --------------------------------------- |
35 | 40 | The main inputs to a population synthesizer are disaggregate population samples and marginal control |
36 | | -distributions. In the United States, the disaggregate population sample is typically obtained from the Census Public Use Microdata Sample (PUMS), but other sources, such as a household travel survey, can also be used. The seed sample should |
37 | | -include demographic variables corresponding to each marginal control termed as *controlled variables* (e.g., |
38 | | -household size, household income, etc.). The seed sample could also include other variables of interest but not |
39 | | -necessarily controlled via marginal controls. These are termed as *uncontrolled variables*. The seed sample should also include an initial weight on each household record. |
40 | | - |
41 | | -Base-year marginal distributions of person and household-level attributes of interest are available from Census. For future years, marginal distributions are either held constant, or forecasted. Marginal distributions can be for both household or person level variables and are specified at a specific geography (e.g., Block Groups, Traffic Analysis Zone or County). PopulationSim allows controls to be specified at multiple geographic levels. |
42 | | - |
43 | | -The objective of a population synthesizer is to |
44 | | -generate household weights which satisfies the marginal control distributions. This is achieved by use of |
45 | | -a data fitting technique. The most common fitting technique used by various population synthesizers is the |
46 | | -Iterative Proportional Fitting (IPF) procedure. Generally, the IPF procedure is used to obtain joint distributions of demographic |
47 | | -variables. Then, random sampling from PUMS generates the baseline synthetic population. |
| 41 | +distributions. In the United States, the disaggregate population sample is typically obtained from the `Census Public Use |
| 42 | +Microdata Sample (PUMS) <https://www.census.gov/programs-surveys/acs/microdata.html>`_, but other sources, such as a household |
| 43 | +travel survey, can also be used. The seed sample should include demographic variables corresponding to each marginal control |
| 44 | +termed as *controlled variables* (e.g., household size, household income, etc.). The seed sample could also include other |
| 45 | +variables of interest but not necessarily controlled via marginal controls. These are termed as *uncontrolled variables*. |
| 46 | +The seed sample should also include an initial weight on each household record. |
| 47 | + |
| 48 | +Current year marginal distributions of person and household-level attributes of interest are available from Census. For |
| 49 | +future years, marginal distributions are either held constant or forecasted. Marginal distributions can be for |
| 50 | +household- or person-level variables and are specified at a specific geography (e.g., Block Groups, Traffic Analysis Zone |
| 51 | +or County). PopulationSim allows controls to be specified at multiple geographic levels. |
| 52 | + |
| 53 | +The objective of a population synthesizer is to generate household weights that satisfy the marginal control |
| 54 | +distributions. This is achieved by use of a data fitting technique. The most common fitting technique used by various |
| 55 | +population synthesizers is the Iterative Proportional Fitting (IPF) procedure. Generally, the IPF procedure is used |
| 56 | +to obtain joint distributions of demographic variables. Then, random sampling from PUMS generates the baseline synthetic |
| 57 | +population. |
48 | 58 |
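The IPF procedure described above can be illustrated with a minimal sketch. It assumes a two-way seed table (for example, household size by income category) and hypothetical row and column control totals; the function name ``ipf`` and all numbers are illustrative and are not taken from PopulationSim.

```python
def ipf(seed, row_targets, col_targets, iterations=100, tol=1e-9):
    """Scale a 2-D seed table until its row and column sums match the targets."""
    table = [row[:] for row in seed]  # work on a copy of the seed
    for _ in range(iterations):
        # Step 1: scale each row so that it sums to its row control.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Step 2: scale each column so that it sums to its column control.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns match exactly after step 2; stop once rows also match.
        if all(abs(sum(table[i]) - t) < tol for i, t in enumerate(row_targets)):
            break
    return table


# Hypothetical seed cross-tabulation and marginal controls (both sum to 100).
seed = [[1.0, 2.0], [3.0, 4.0]]
fitted = ipf(seed, row_targets=[40.0, 60.0], col_targets=[30.0, 70.0])
```

The fitted table keeps the interaction pattern of the seed while matching both sets of marginals. Households would then be drawn at random from the sample according to these cell values.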
|
49 | 59 | One of the limitations of the simple IPF method is that it does not incorporate both household and person |
50 | 60 | level attributes simultaneously. Some population synthesizers use a heuristic algorithm called the |
51 | 61 | Iterative Proportional Updating Algorithm (IPU) to incorporate both person and household-level variables in the fitting procedure. |
52 | 62 |
|
53 | | -Besides IPF, entropy |
54 | | -maximization algorithms have been used as a fitting technique. In most of the entropy based methods, |
| 63 | +Besides IPF, entropy maximization algorithms have been used as a fitting technique. In most of the entropy based methods, |
55 | 64 | the relative entropy is used as the objective function. The relative entropy based optimization ensures |
56 | 65 | that the least amount of new information is introduced in finding a feasible solution. The base entropy |
57 | 66 | is defined by the initial weights in the seed sample. The weights generated by the entropy maximization |
58 | 67 | algorithm preserve the distribution of initial weights while matching the marginal controls. This is an |
59 | | -advantage of the entropy maximization based procedures over the IPF based procedures. PopulationSim uses the entropy maximization based list balancing to match controls specified at various geographic levels. |
| 68 | +advantage of the entropy maximization based procedures over the IPF based procedures. PopulationSim uses the entropy maximization |
| 69 | +based list balancing to match controls specified at various geographic levels. |
60 | 70 |
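As a rough illustration of entropy-based list balancing, the sketch below adjusts household weights so that a household-level control and a person-level control are matched at the same time. It is a simplified stand-in, not PopulationSim's implementation: the function name ``balance``, the single Newton-style update per control, and all input numbers are assumptions made for this example.

```python
def balance(initial_weights, incidence, controls, max_iter=1000, tol=1e-9):
    """Adjust weights so that sum_i w[i] * incidence[j][i] matches controls[j].

    Each weight stays of the form w0[i] * prod_j gamma_j ** incidence[j][i],
    which is the form of the minimum relative entropy solution.
    """
    weights = list(initial_weights)
    for _ in range(max_iter):
        for a, target in zip(incidence, controls):
            xx = sum(w * x for w, x in zip(weights, a))      # current total
            yy = sum(w * x * x for w, x in zip(weights, a))  # slope term
            if yy <= 0:
                continue
            # One Newton-style step for this control's scaling factor,
            # clamped so that weights stay positive.
            gamma = max(1.0 - (xx - target) / yy, 1e-6)
            weights = [w * gamma ** x for w, x in zip(weights, a)]
        # Converged when every control is matched within the tolerance.
        errors = [
            abs(sum(w * x for w, x in zip(weights, a)) - target) / target
            for a, target in zip(incidence, controls)
        ]
        if max(errors) < tol:
            break
    return weights


# Four hypothetical seed households with 1 to 4 persons, initial weight 10 each.
persons = [1, 2, 3, 4]
incidence = [[1, 1, 1, 1], persons]  # household control, then person control
weights = balance([10.0] * 4, incidence, controls=[50.0, 140.0])
```

Because every weight is the initial weight multiplied by the control factors, households that share an incidence pattern keep their original relative weights, which is the property the text describes as preserving the initial weight distribution.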
|
61 | | -Once the final weights |
62 | | -have been assigned, seed sample is expanded using these weights to generate a synthetic population. Most |
| 71 | +Once the final weights have been assigned, the seed sample is expanded using these weights to generate a synthetic population. Most |
63 | 72 | population synthesizers create distributions using final weights and employ random sampling to expand the |
64 | 73 | seed sample. PopulationSim uses Linear Programming to convert the final weights to integer values and expands |
65 | | -the seed sample using these integer weights. For detailed description of PopulationSim algorithm, please refer to the TRB paper link in the :ref:`docs` section. For information on software implementation refer to :ref:`core_components` and :ref:`model_steps`. To learn more about PopulationSim application and configuration, please follow the content index below. |
| 74 | +the seed sample using these integer weights. For a detailed description of the PopulationSim algorithm, please refer to the TRB paper |
| 75 | +link in the :ref:`docs` section. For information on software implementation refer to :ref:`core_components` and :ref:`model_steps`. To |
| 76 | +learn more about PopulationSim application and configuration, please follow the content index below. |
66 | 77 |
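The expansion step can be sketched as follows. Note that this example does not use Linear Programming: it substitutes a simple largest-remainder rounding rule to obtain integer weights, purely to show how integer weights turn seed records into a synthetic population. The function names, record fields, and weights are hypothetical.

```python
import math


def integerize_largest_remainder(weights):
    """Round weights to integers while preserving the (rounded) total.

    This is a simple stand-in for an LP-based integerization.
    """
    target = round(sum(weights))
    ints = [math.floor(w) for w in weights]
    shortfall = target - sum(ints)
    # Give the remaining units to the records with the largest fractional parts.
    order = sorted(
        range(len(weights)), key=lambda i: weights[i] - ints[i], reverse=True
    )
    for i in order[:shortfall]:
        ints[i] += 1
    return ints


def expand(records, int_weights):
    """Replicate each seed record as many times as its integer weight."""
    return [rec for rec, n in zip(records, int_weights) for _ in range(n)]


# Hypothetical seed households and their final (fractional) weights.
seed = [{"hh_id": 1, "size": 1}, {"hh_id": 2, "size": 2}, {"hh_id": 3, "size": 4}]
final_weights = [2.6, 1.7, 0.7]
int_weights = integerize_largest_remainder(final_weights)
synthetic = expand(seed, int_weights)
```

Unlike random sampling, replication from integer weights is deterministic, so repeated runs on the same inputs give the same synthetic population.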
|
67 | 78 | How does population synthesis work for survey weighting? |
68 | 79 | -------------------------------------------------------- |
|