Commit dc87cf5

hmgaudecker and mj023 authored
Endogenize investments (#80)
* Revert to using 64-bit precision.
* Add function to check whether the model has investments.
* [WIP] Refactor: Include stuff in process_model / process_data.
* Backup commit before removing fill_list() feature.
* Fix tests, model processing seems to work with additional periods now. Missing data and constraints.
* Remove fill_list().
* [WIP] re-write process_data s.t. it returns a dictionary. Not working yet with investments.
* [WIP] Test for augmenting data with investments.
* Stack data for investments.
* Fix interfaces.
* Readability.
* Update environment.
* Jax 0.5 installed via pypi, only way that works right now.
* [WIP] implement fixed constraints right away.
* Refactor, enforce bounds on fixed parameters.
* Refactor test for augmented data so we can re-use the fixture.
* Implement restrictions for augmented periods. Allow stages + investments.
* Update pixi on GHA.
* Fix requirements for MacOS.
* Correct misnomer.
* Update & run hooks.
* In case investments are present, leave out entire 'raw' period (=2 skillmodels periods) for transitions.
* Fast QR Factorization on GPU (#81)
* Implement QR Factorization
* Move QR and add tests
* Fix Seed, Split tests
* Jax via conda.
* Fix (?) periods for shocks.
* Remove the last two internal periods for transitions if investments are present.
* Do not accidentally modify the params_template when enforcing constraints.
* Use local optimagic.
* Clip at -1e30 in case we end up running in 32bit mode. Use bounds_distance instead of 0, stick to dicts for the moment though code for custom constraints is there.
* Reduce memory usage (#82)
* Move Checkpoints
* Add splitting for gradient
* Speed up QR
* Update hooks and GHA pipeline.
* Use optimagic from GitHub.
* Fix heatmaps, update hooks etc.
* Fix heatmaps also for factor correlations.
* Got started updating transition equations.
* Be pedantic about using 'periods' for interface and 'aug_periods' for internal usage.
* Implement Janos' suggestion to only ever talk of endogenous factors.
* Get rid of deprecation warning.
* Use 'aug_period' as the index.
* More changes of periods -> aug_periods.
* Use 'aug_period' also in DataFrame.
* Use 'aug_period' as index also in update_info.
* More fixes to aug_period / period.
* Do not include correction factors as dependent variables in transition plots by default.
* Fix tests, basically allowing non-augmented models again.
* Update actions/pyproject.
* Use codecov token again.
* Fix docstring.
* Update GHA, pre-commit hooks.
* Add CLAUDE.md
* Move from mypy -> ty. Allow anchoring and endogenous factors at the same time.

---------

Co-authored-by: Max Jahn <max.jahn45@gmail.com>
1 parent 8417b4b commit dc87cf5

43 files changed

Lines changed: 9504 additions & 14317 deletions


.github/workflows/main.yml

Lines changed: 12 additions & 36 deletions
@@ -23,51 +23,27 @@ jobs:
           - macos-latest
           - windows-latest
         python-version:
-          - '3.12'
+          - '3.13'
     steps:
-      - uses: actions/checkout@v4
-      - uses: prefix-dev/setup-pixi@v0.8.0
+      - uses: actions/checkout@v6
+      - uses: prefix-dev/setup-pixi@v0.9.3
         with:
-          pixi-version: v0.29.0
+          pixi-version: v0.62.2
           cache: true
           cache-write: ${{ github.event_name == 'push' && github.ref_name == 'main' }}
           environments: test-cpu
           activate-environment: true
+          frozen: true
       - name: Run pytest
+        if: runner.os != 'Linux' || matrix.python-version != '3.13'
+        shell: bash -l {0}
+        run: pixi run -e test-cpu tests
+      - name: Run pytest with coverage
+        if: runner.os == 'Linux' && matrix.python-version == '3.13'
         shell: bash -l {0}
         run: pixi run -e test-cpu tests-with-cov
       - name: Upload coverage report
-        if: runner.os == 'Linux' && matrix.python-version == '3.12'
-        uses: codecov/codecov-action@v4
+        if: runner.os == 'Linux' && matrix.python-version == '3.13'
+        uses: codecov/codecov-action@v5
         env:
           CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
-  # run-mypy:
-  #   name: Run mypy on Python 3.12
-  #   runs-on: ubuntu-latest
-  #   strategy:
-  #     fail-fast: false
-  #   steps:
-  #     - uses: actions/checkout@v4
-  #     - uses: prefix-dev/setup-pixi@v0.8.0
-  #       with:
-  #         pixi-version: v0.28.2
-  #         cache: true
-  #         cache-write: ${{ github.event_name == 'push' && github.ref_name == 'main' }}
-  #         environments: mypy
-  #     - name: Run mypy
-  #       shell: bash -l {0}
-  #       run: pixi run mypy
-  # run-explanation-notebooks:
-  #   name: Run explanation notebooks on Python 3.12
-  #   runs-on: ubuntu-latest
-  #   steps:
-  #     - uses: actions/checkout@v4
-  #     - uses: prefix-dev/setup-pixi@v0.8.0
-  #       with:
-  #         pixi-version: v0.28.2
-  #         cache: true
-  #         cache-write: ${{ github.event_name == 'push' && github.ref_name == 'main' }}
-  #         environments: test
-  #     - name: Run explanation notebooks
-  #       shell: bash -l {0}
-  #       run: pixi run -e test explanation-notebooks

.pre-commit-config.yaml

Lines changed: 11 additions & 9 deletions
@@ -6,11 +6,11 @@ repos:
       - id: check-useless-excludes
       # - id: identity # Prints all files passed to pre-commits. Debugging.
   - repo: https://github.com/lyz-code/yamlfix
-    rev: 1.17.0
+    rev: 1.19.1
     hooks:
       - id: yamlfix
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.6.0
+    rev: v6.0.0
     hooks:
       - id: check-added-large-files
         args:
@@ -30,43 +30,45 @@ repos:
         args:
          - --fix=lf
         description: Forces to replace line ending by the UNIX 'lf' character.
+      - id: no-commit-to-branch
+        args:
+          - --branch
+          - main
       - id: name-tests-test
         args:
           - --pytest-test-first
       - id: trailing-whitespace
       - id: check-ast
       - id: check-docstring-first
   - repo: https://github.com/adrienverge/yamllint.git
-    rev: v1.35.1
+    rev: v1.37.1
     hooks:
       - id: yamllint
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.6.4
+    rev: v0.14.10
     hooks:
-      # Run the linter.
-      - id: ruff
+      - id: ruff-check
         types_or:
           - python
           - pyi
           - jupyter
         args:
           - --fix
           # - --unsafe-fixes
-      # Run the formatter.
       - id: ruff-format
         types_or:
           - python
           - pyi
           - jupyter
   - repo: https://github.com/kynan/nbstripout
-    rev: 0.7.1
+    rev: 0.8.2
     hooks:
       - id: nbstripout
         args:
           - --extra-keys
           - metadata.kernelspec metadata.language_info.version metadata.vscode
   - repo: https://github.com/executablebooks/mdformat
-    rev: 0.7.17
+    rev: 1.0.0
     hooks:
       - id: mdformat
         additional_dependencies:
CLAUDE.md

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in
+this repository.
+
+## Project Overview
+
+skillmodels is a Python implementation of estimators for nonlinear dynamic latent factor
+models, primarily used for skill formation research in economics. It implements Kalman
+filter-based maximum likelihood estimation following Cunha, Heckman, Schennach (2010).
+
+## Development Commands
+
+```bash
+# Run tests
+pixi run tests
+
+# Run tests with coverage
+pixi run tests-with-cov
+
+# Run a single test file
+pixi run -e test-cpu pytest tests/test_kalman_filters.py
+
+# Run a single test
+pixi run -e test-cpu pytest tests/test_kalman_filters.py::test_function_name
+
+# Type checking
+pixi run ty
+
+# Install pre-commit hooks (required before committing)
+pre-commit install
+
+# Build documentation (from docs/ directory)
+make html
+```
+
+## Architecture
+
+### Core Pipeline Flow
+
+```
+Model Dict + Data
+
+process_model() → Validates/extends model specification
+
+process_data() → Transforms data to estimation format
+
+get_maximization_inputs() → Creates optimization problem (likelihood, gradients, constraints)
+
+[optimagic optimization]
+
+get_filtered_states() → Extract estimated latent factors
+```
+
+### Key Modules
+
+- **process_model.py**: Model specification validation and preprocessing. Handles
+  dimensions, labels, stagemap, anchoring, and endogenous factors.
+- **kalman_filters.py**: Core Kalman filter implementation (predict/update steps). Uses
+  square-root form for numerical stability.
+- **likelihood_function.py**: Log-likelihood computation using Kalman filtering.
+  Includes soft clipping for numerical stability.
+- **constraints.py**: Generates parameter constraints (bounds, equalities from stagemap,
+  fixed values) for optimization.
+- **parse_params.py**: Converts flat parameter vectors to structured model parameters.
+- **transition_functions.py**: Pre-built transition equations (`linear`, `log_ces`,
+  `constant`). Custom functions can be added.
+
+### JAX Usage
+
+All computation-heavy code uses JAX for automatic differentiation and JIT compilation.
+The codebase uses:
+
+- `jax.vmap` for vectorization across observations
+- `jax.jit` for compilation
+- JAX arrays throughout the estimation pipeline
+- Optional GPU support via CUDA
+
+### Public API
+
+The main package exports three functions:
+
+- `get_maximization_inputs()`: Prepare optimization problem for parameter estimation
+- `get_filtered_states()`: Extract filtered latent factor estimates
+- `simulate_dataset()`: Generate synthetic data from model specification
+
+## Code Style
+
+- Uses Ruff for linting (target: Python 3.13, line length: 88)
+- Google-style docstrings
+- Pre-commit hooks enforce formatting and linting
+- Type checking via `ty` with strict rules
+
+## Testing
+
+- pytest with markers: `wip`, `unit`, `integration`, `end_to_end`
+- Test files mirror source structure in `tests/`
+- Memory profiling available via pytest-memray (Unix only)
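The vmap/jit pattern this file describes (vectorize across observations, then compile) can be illustrated with a stand-alone sketch. The `log_density` function and its arguments are invented for illustration and are not part of the skillmodels API:

```python
import jax
import jax.numpy as jnp


def log_density(state, obs, loading):
    # Gaussian measurement log-density (up to an additive constant)
    # for a single observation; purely illustrative.
    resid = obs - loading * state
    return -0.5 * resid**2


# jax.vmap maps the scalar function over the observation axis;
# jax.jit compiles the resulting batched function.
batched_log_density = jax.jit(jax.vmap(log_density, in_axes=(0, 0, None)))

states = jnp.array([0.0, 1.0, 2.0])
observations = jnp.array([0.0, 1.0, 1.0])
print(batched_log_density(states, observations, 1.0))
```

The same composition order (`jit` around `vmap`) is what lets one compiled kernel serve all observations at once instead of recompiling per data point.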

docs/source/getting_started/tutorial.ipynb

Lines changed: 1 addition & 3 deletions
@@ -289,9 +289,7 @@
     "metadata": {},
     "outputs": [],
     "source": [
-    "from om.process_constraints import process_constraints\n",
-    "\n",
-    "pc, pp = process_constraints(constraints, params)\n",
+    "pc, pp = om.process_constraints(constraints, params)\n",
     "params[\"group\"] = params.index.get_level_values(\"category\")\n",
     "params.loc[\"controls\", \"group\"] = params.loc[\"controls\"].index.get_level_values(\"name2\")\n",
     "\n",

docs/source/how_to_guides/model_specs.rst

Lines changed: 3 additions & 9 deletions
@@ -96,14 +96,8 @@ Any python string is possible as factor name. The values are dictionaries with the following
entries:

 - measurements: A nested list that is as long as the number of periods of the model.
-  Each sublist contains the names of the measurements in that period. If A factor has
-  no measurements in a period, it has to be an empty list. If a factor only has
-  measurements up to a certain period you can leave out the empty lists at the end. In
-  the example this is done for factor 3. Note that even in that case, the measurements
-  have to be specified as nested list. If a factor only starts having measurements in
-  some period, you still have to specify the empty lists for all periods before that
-  period.
+  Each sublist contains the names of the measurements in that period. If a factor has
+  no measurements in a period, it has to be an empty list.

 - normalizations: This entry is optional. It is a dictionary that can have the keys
   ``"loadings"`` and ``"intercepts"``. The values are lists of dictionaries. The list
@@ -114,7 +108,7 @@ entries:
 - transition_equation: A string with the name of a pre-implemented transition equation
   or a custom transition equation. Pre-implemented transition equations are
   linear, log_ces (in the known location and scale version), constant and translog.
-  The example model dictionary only uses pre-implement transition functions.
+  The example model dictionary only uses pre-implemented transition functions.

 To see how to use custom transition functions, assume that the yaml file shown above
 has been loaded into a python dictionary called ``model`` and look at the following
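The tightened measurements rule (one sublist per period, with explicit empty lists for measurement-free periods) can be sketched with a toy specification; the factor and measurement names below are invented, not taken from the documentation's example:

```python
# Toy model specification; "cognitive", "y1", etc. are hypothetical names.
n_periods = 3
factors = {
    "cognitive": {
        # Exactly one sublist per period. Period 2 has no measurements,
        # so it is an explicit empty list rather than being omitted.
        "measurements": [["y1", "y2"], ["y3"], []],
    },
}

# A simple validity check mirroring the rule stated in the docs:
for name, spec in factors.items():
    assert len(spec["measurements"]) == n_periods, f"{name}: need one sublist per period"
```

Requiring the full-length nested list everywhere removes the special cases the old wording allowed (trailing empty lists could be dropped), so validation reduces to a single length check.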

environment.yml

Lines changed: 0 additions & 31 deletions
This file was deleted.

0 commit comments
