Commit b664d22

Develop (#132)
* deprecate py2.7
* Multiprocess (#130)
1 parent e990822 commit b664d22

52 files changed

Lines changed: 622 additions & 289 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 sandbox/
 regress/
 example_test_no_integerizing/
+example_mtc/
 .idea
 .ipynb_checkpoints

.travis.yml

Lines changed: 9 additions & 6 deletions
@@ -3,24 +3,27 @@ language: python
 sudo: false
 
 python:
-- '2.7'
-- '3.6'
 - '3.7'
+- '3.8'
 
 install:
-- wget http://repo.continuum.io/miniconda/Miniconda-3.7.0-Linux-x86_64.sh -O miniconda.sh
+- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
 - bash miniconda.sh -b -p $HOME/miniconda
-- export PATH="$HOME/miniconda/bin:$PATH"
+- source "$HOME/miniconda/etc/profile.d/conda.sh"
 - hash -r
 - conda config --set always_yes yes --set changeps1 no
 - conda update -q conda
-- conda create -q -n test-environment python=$TRAVIS_PYTHON_VERSION future
-- source activate test-environment
+- conda info -a
+- conda create -q -n test-environment python=$TRAVIS_PYTHON_VERSION
+- conda activate test-environment
 - conda install pytest pytest-cov coveralls pycodestyle
 - pip install .
+- pip freeze
+
 script:
 - pycodestyle populationsim
 - py.test --cov populationsim --cov-report term-missing
+
 after_success:
 - coveralls
 # Build docs

LICENSE.txt

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+BSD 3-Clause License
+
 PopulationSim
 Contributions Copyright (C) by the contributing authors

MANIFEST.in

Lines changed: 0 additions & 1 deletion
@@ -1,5 +1,4 @@
 include ez_setup.py
-include README.rst
 graft example_calm
 graft example_calm_repop
 graft example_survey_weighting

docs/application_configuration.rst

Lines changed: 127 additions & 62 deletions
Large diffs are not rendered by default.

docs/conf.py

Lines changed: 3 additions & 3 deletions
@@ -20,9 +20,9 @@
 # -- Get Package Version --------------------------------------------------
 with open("../setup.py") as file:
 lines = file.readlines()
-for l in lines:
-if "version" in l:
-VERSION = l.replace("version='", "").replace("',", "").replace(" ", "")
+for line in lines:
+if "version" in line:
+VERSION = line.replace("version='", "").replace("',", "").replace(" ", "")
 print("package version: " + VERSION)
 
 # If extensions (or modules to document with autodoc) are in another directory,

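The docs/conf.py hunk above reads the package version by scanning setup.py for any line containing "version" and stripping fixed substrings. Because `readlines()` keeps line endings, the resulting string still carries a trailing newline. The following is a minimal sketch of a stricter variant, not the project's code: the function name `extract_version` and the sample setup.py text (including the version value) are hypothetical and exist only for illustration.

```python
import re


def extract_version(setup_py_text):
    """Return the first version='...' value in the given text, or None.

    Hypothetical helper: anchors on the quoted value instead of
    stripping fixed substrings, so no newline or comma leaks through.
    """
    match = re.search(r"version\s*=\s*['\"]([^'\"]+)['\"]", setup_py_text)
    return match.group(1) if match else None


# Illustrative stand-in for the contents of setup.py (an assumption).
sample = """
setup(
    name='populationsim',
    version='0.5',
)
"""

print("package version: " + extract_version(sample))
```

Returning `None` when nothing matches makes a missing version explicit, rather than leaving `VERSION` undefined.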
docs/getting_started.rst

Lines changed: 39 additions & 12 deletions
@@ -27,34 +27,35 @@ Installation
 
 ::
 
-conda create -n popsim python=3.7
+conda create -n popsim python=3.8
 
-#Windows
+# Windows
 activate popsim
 
-#Mac
+# Mac
 conda activate popsim
 
 4. Get and install the PopulationSim package on the activated conda Python environment:
 
 ::
 
+# best to use the conda version of pytables for consistency with activitysim
+conda install pytables
+
 pip install populationsim
 
 
-.. _anaconda_notes :
+.. _activitysim :
 
-Python 2 or 3?
-~~~~~~~~~~~~~~~
+ActivitySim
+~~~~~~~~~~~
 
 .. note::
 
-PopulationSim is a 64bit Python 2 or 3 library that uses a number of packages from the
+PopulationSim is a 64bit Python 3 library that uses a number of packages from the
 scientific Python ecosystem, most notably `pandas <http://pandas.pydata.org>`__
-and `numpy <http://numpy.org>`__. It relies heavily on the
-`ActivitySim <https://activitysim.github.io>`__ package. Both ActivitySim and PopulationSim
-currently support Python 2, but Python 2 will be `retired <https://pythonclock.org/>`__ at the
-end of 2019 so Python 3 is recommended.
+and `numpy <http://numpy.org>`__. It also relies heavily on the
+`ActivitySim <https://activitysim.github.io>`__ package.
 
 The recommended way to get your own scientific Python installation is to
 install 64 bit Anaconda, which contains many of the libraries upon which
@@ -67,7 +68,17 @@ Python 2 or 3?
 Run Examples
 ------------
 
-There are three examples for running PopulationSim, two created using data from the Corvallis-Albany-Lebanon Modeling (CALM) region in Oregon and the other using data from the Metro Vancouver region in British Columbia. The `example_calm`_ set-up runs PopulationSim in base mode, where a synthetic population is created for the entire modeling region. This takes approximately 12 minutes on a laptop with an Intel i7-4800MQ CPU @ 2.70GHz and 16 GB of RAM. The `example_calm_repop`_ set-up runs PopulationSim in the *repop* mode, which updates the synthetic population for a small part of the region. The `example_survey_weighting`_ set-up runs PopulationSim for the case of developing final weights for a household travel survey. More information on the configuration of PopulationSim can be found in the **Application & Configuration** section.
+There are four examples for running PopulationSim, three created using data from the
+Corvallis-Albany-Lebanon Modeling (CALM) region in Oregon and the other using data from
+the Metro Vancouver region in British Columbia.
+
+1. The `example_calm`_ set-up runs PopulationSim, where a synthetic population is created single-processed for the entire modeling region.
+
+2. The `example_calm_mp`_ set-up runs PopulationSim `multi-processed <http://docs.python.org/3/library/multiprocessing.html>`_, where a synthetic population is created for the entire modeling region by simultaneously balancing results using multiple processors on your computer, thereby reducing runtime.
+
+3. The `example_calm_repop`_ set-up runs PopulationSim in the *repop* mode, which updates the synthetic population for a small part of the region.
+
+4. The `example_survey_weighting`_ set-up runs PopulationSim for the case of developing final weights for a household travel survey. More information on the configuration of PopulationSim can be found in the **Application & Configuration** section.
 
 Example_calm
 ~~~~~~~~~~~~
@@ -84,6 +95,22 @@ Follow the steps below to run **example_calm** set up:
 
 * Review the outputs in the *output* folder
 
+Example_calm_mp
+~~~~~~~~~~~~~~~
+
+Follow the steps below to run **example_calm_mp** multiprocessed set up:
+
+* Open a command prompt in the example_calm folder
+* In ``configs_mp\setting.yaml``, set ``num_processes: 2`` to a reasonable number of processors for your machine
+* Run the following commands:
+
+::
+
+activate popsim
+python run_populationsim.py -c configs_mp -c configs
+
+* Review the outputs in the *output* folder
+
 Example_calm_repop
 ~~~~~~~~~~~~~~~~~~

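The added Example_calm_mp steps ask for ``num_processes`` to be set to "a reasonable number of processors for your machine" without saying how to choose one. The sketch below shows one possible heuristic using the standard library's `os.cpu_count()`. The function name `suggest_num_processes`, the idea of reserving one core, and the optional cap are assumptions made for illustration, not a PopulationSim recommendation; available memory may be the tighter limit in practice.

```python
import os


def suggest_num_processes(reserve=1, cap=None):
    """Suggest a worker count: logical CPUs minus a reserve, at least 1.

    Hypothetical heuristic; `reserve` leaves cores free for the OS and
    `cap` optionally bounds the result (for example on shared machines).
    """
    total = os.cpu_count() or 1  # cpu_count() may return None
    n = max(1, total - reserve)
    if cap is not None:
        n = min(n, cap)
    return n


# Print a value that could be copied into the num_processes setting.
print("num_processes:", suggest_num_processes())
```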
docs/index.rst

Lines changed: 43 additions & 32 deletions
@@ -5,64 +5,75 @@
 Introduction
 =============
 
-PopulationSim is an open platform for population synthesis and survey weighting. It emerged from Oregon DOT's desire to
-build a shared, open, platform that could be easily adapted for statewide, regional, and urban
-transportation planning needs.
+PopulationSim is an open platform for population synthesis and survey weighting. It emerged from
+`Oregon DOT <https://www.oregon.gov/odot>`_'s desire to build a shared, open, platform that could
+be easily adapted for statewide, regional, and urban transportation planning needs.
 
 What is population synthesis?
 -----------------------------
-Activity Based Models (ABMs) operate in a micro-simulation framework , wherein the travel choices of person and household decision-making agents are predicted by applying Monte Carlo methods to behavioral models. This requires a data set of households and persons representing the entire population in the modeling region. Population synthesis refers to the process used to create this data.
-
-The required inputs to population synthesis are a population sample and marginal distributions. The population
-sample is commonly referred to as the *seed or reference sample* and the marginal distributions are referred to
-as *controls or targets*. **The process of expanding the seed sample to match the marginal distribution
-is termed population synthesis.** The software tool which implements this population synthesis process
+Activity based travel demand models such as `ActivitySim <http://www.activitysim.org>`_ operate at an individual
+level, wherein the travel choices of person and household decision-making agents are predicted by applying
+Monte Carlo methods to behavioral models. This requires a data set of households and persons representing
+the entire population in the modeling region. Population synthesis refers to the process used to create this data.
+
+The required inputs to population synthesis are a population sample and marginal distributions (or control totals).
+The population sample is commonly referred to as the *seed or reference sample* and the marginal distributions are
+commonly referred to as *controls or targets*. **The process of expanding the seed sample to match the marginal
+distribution is termed population synthesis.** The software tool which implements this population synthesis process
 is termed as a **Population Synthesizer**.
 
 What does a Population Synthesizer produce?
 -------------------------------------------
 The objective of a population synthesizer is to generate a synthetic population for
-a modeling region. The main outputs from a population synthesizer include lists of persons and households
-representing the entire population of the modeling region. These databases include household and person-level
-attributes of interest. Examples of attributes at the household level include household income, household size, housing type, and number of vehicles. Examples of person attributes include
+a modeling region. The main outputs from a population synthesizer include tables of persons and households
+representing the entire population of the modeling region. These tables also include household and person-level
+attributes of interest. Examples of attributes at the household level include household income, household size, housing
+type, and number of vehicles. Examples of person attributes include
 age, gender, work\school status, and occupation. Depending on the use case, a population synthesizer may also
 produce multi-way distribution of demographic variables at different geographies to be used as an input
-to aggregate travel models. In the case of PopulationSim specifically, an additional option is also included to
-modify an existing regional synthetic population for a smaller geographical area. In this case, the outputs are a modified list of persons and households.
+to aggregate (four-step) travel models. In the case of PopulationSim specifically, an additional option is also included to
+modify an existing regional synthetic population for a smaller geographical area. In this case, the outputs are a modified
+set of persons and households.
 
 How does a population synthesizer work?
 ---------------------------------------
 The main inputs to a population synthesizer are disaggregate population samples and marginal control
-distributions. In the United States, the disaggregate population sample is typically obtained from the Census Public Use Microdata Sample (PUMS), but other sources, such as a household travel survey, can also be used. The seed sample should
-include demographic variables corresponding to each marginal control termed as *controlled variables* (e.g.,
-household size, household income, etc.). The seed sample could also include other variables of interest but not
-necessarily controlled via marginal controls. These are termed as *uncontrolled variables*. The seed sample should also include an initial weight on each household record.
-
-Base-year marginal distributions of person and household-level attributes of interest are available from Census. For future years, marginal distributions are either held constant, or forecasted. Marginal distributions can be for both household or person level variables and are specified at a specific geography (e.g., Block Groups, Traffic Analysis Zone or County). PopulationSim allows controls to be specified at multiple geographic levels.
-
-The objective of a population synthesizer is to
-generate household weights which satisfies the marginal control distributions. This is achieved by use of
-a data fitting technique. The most common fitting technique used by various population synthesizers is the
-Iterative Proportional Fitting (IPF) procedure. Generally, the IPF procedure is used to obtain joint distributions of demographic
-variables. Then, random sampling from PUMS generates the baseline synthetic population.
+distributions. In the United States, the disaggregate population sample is typically obtained from the `Census Public Use
+Microdata Sample (PUMS) <https://www.census.gov/programs-surveys/acs/microdata.html>`_, but other sources, such as a household
+travel survey, can also be used. The seed sample should include demographic variables corresponding to each marginal control
+termed as *controlled variables* (e.g., household size, household income, etc.). The seed sample could also include other
+variables of interest but not necessarily controlled via marginal controls. These are termed as *uncontrolled variables*.
+The seed sample should also include an initial weight on each household record.
+
+Current year marginal distributions of person and household-level attributes of interest are available from Census. For
+future years, marginal distributions are either held constant, or forecasted. Marginal distributions can be for both
+household or person level variables and are specified at a specific geography (e.g., Block Groups, Traffic Analysis Zone
+or County). PopulationSim allows controls to be specified at multiple geographic levels.
+
+The objective of a population synthesizer is to generate household weights which satisfies the marginal control
+distributions. This is achieved by use of a data fitting technique. The most common fitting technique used by various
+population synthesizers is the Iterative Proportional Fitting (IPF) procedure. Generally, the IPF procedure is used
+to obtain joint distributions of demographic variables. Then, random sampling from PUMS generates the baseline synthetic
+population.
 
 One of the limitations of the simple IPF method is that it does not incorporate both household and person
 level attributes simulatenously. Some population synthesizers use a heuristic algorithm called the
 Iterative Proportional Updating Algorithm (IPU) to incorporate both person and household-level variables in the fitting procedure.
 
-Besides IPF, entropy
-maximization algorithms have been used as a fitting technique. In most of the entropy based methods,
+Besides IPF, entropy maximization algorithms have been used as a fitting technique. In most of the entropy based methods,
 the relative entropy is used as the objective function. The relative entropy based optimization ensures
 that the least amount of new information is introduced in finding a feasible solution. The base entropy
 is defined by the initial weights in the seed sample. The weights generated by the entropy maximization
 algorithm preserves the distribution of initial weights while matching the marginal controls. This is an
-advantage of the entropy maximization based procedures over the IPF based procedures. PopulationSim uses the entropy maximization based list balancing to match controls specified at various geographic levels.
+advantage of the entropy maximization based procedures over the IPF based procedures. PopulationSim uses the entropy maximization
+based list balancing to match controls specified at various geographic levels.
 
-Once the final weights
-have been assigned, seed sample is expanded using these weights to generate a synthetic population. Most
+Once the final weights have been assigned, the seed sample is expanded using these weights to generate a synthetic population. Most
 population synthesizers create distributions using final weights and employ random sampling to expand the
 seed sample. PopulationSim uses Linear Programming to convert the final weights to integer values and expands
-the seed sample using these integer weights. For detailed description of PopulationSim algorithm, please refer to the TRB paper link in the :ref:`docs` section. For information on software implementation refer to :ref:`core_components` and :ref:`model_steps`. To learn more about PopulationSim application and configuration, please follow the content index below.
+the seed sample using these integer weights. For detailed description of PopulationSim algorithm, please refer to the TRB paper
+link in the :ref:`docs` section. For information on software implementation refer to :ref:`core_components` and :ref:`model_steps`. To
+learn more about PopulationSim application and configuration, please follow the content index below.
 
 How does population synthesis work for survey weighting?
 --------------------------------------------------------

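The docs/index.rst text above names Iterative Proportional Fitting (IPF) as the most common fitting technique among population synthesizers. As a rough illustration of that general procedure only, the sketch below fits a two-way seed table to row and column control totals by alternately rescaling rows and columns. It is not PopulationSim's method: per the text, PopulationSim uses entropy-maximization list balancing. The function name `ipf`, the 2x2 seed, and the control totals are all made up for the example.

```python
def ipf(seed, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Fit a 2-D table to row and column totals by alternate rescaling.

    Hypothetical helper illustrating simple two-way IPF in pure Python.
    """
    table = [row[:] for row in seed]  # copy so the seed is not mutated
    for _ in range(max_iter):
        # Scale each row so it sums to its row target.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [cell * target / total for cell in table[i]]
        # Scale each column so it sums to its column target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns match exactly after the column step; check the rows.
        max_err = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if max_err < tol:
            break
    return table


# Made-up seed counts and control totals (both sets of totals sum to 100).
seed = [[1.0, 2.0], [3.0, 4.0]]
row_targets = [40.0, 60.0]
col_targets = [30.0, 70.0]

fitted = ipf(seed, row_targets, col_targets)
for row in fitted:
    print([round(cell, 3) for cell in row])
```

Row and column scaling leave the seed's cross-product ratio unchanged, which is how the fitted table retains the seed's association structure while matching the controls.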
docs/software.rst

Lines changed: 6 additions & 4 deletions
@@ -9,8 +9,8 @@ This page describes the PopulationSim software implementation and how to contrib
 
 The implementation starts with
 the ActivitySim framework, which serves as the foundation for the software. The framework, as briefly described
-below, includes features for data pipeline management, expression handling, testing, etc. Built upon the
-framework are additional core components for population synthesis such as balancers and integerizers.
+below, includes features for data pipeline management, expression handling, multiprocessing, testing, etc. Built upon
+the framework are additional core components for population synthesis such as balancers and integerizers.
 Built upon the population synthesis core components are the model steps that make up a PopulationSim run,
 such as the inputs pre-processor, setting up the data strucutres, doing the initial seed balancing, etc.
 
@@ -42,7 +42,8 @@ being implemented in the ActivitySim framework means:
 * Model Orchestrator
 
 * `ORCA <https://github.com/UDST/orca>`__ is used for running the overall model system and for defining dynamic data tables, columns, and injectables (functions). ActivitySim wraps ORCA functionality to make a Data Pipeline tool, which allows for re-starting at any model step.
-
+* Support for `multiprocessing <http://docs.python.org/3/library/multiprocessing.html>`_ to reduce runtime
+
 * Expressions
 
 * Model expressions are in CSV files and contain Python expressions, mainly pandas/numpy expression that operate on the input data tables. This helps to avoid modifying Python code when making changes to the model calculations.
@@ -236,4 +237,5 @@ Release Notes
 * v0.4 - transfer to ActivitySim.org
 * v0.4.1 - package updates
 * v0.4.2 - validation script in Python
-* v0.4.3 - allow non-binary incidence
+* v0.4.3 - allow non-binary incidence
+* v0.5 - support for multiprocessing
