Add framesets (.dtas) #28
Support signing Stata .dtas framesets end-to-end: add a frameset_file option to complete_datasignature.ado (bumped version to 3.0.4) which signs each frame, concatenates results, and ignores time-tainted frlink_* chars; add skip_char to control excluded characteristics and tolerate empty frames. Add dev helper dev_adopath_prefix to let tests point Stata at a local src/ during development. Extend Python SCons integration: accept a file_arg in get_datasign, emit the dev adopath prefix into generated recipes, add get_dtas_sign wrapper, and register .dtas in special_sig_fns so SCons pipelines can build/consume framesets. Add SCons test pipeline and several Stata smoke/unit test scripts (producer/consumer, smoke tests) and update tests/SConstruct and statacons_test.do to exercise the .dtas path.
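The exclusion of time-tainted characteristics described in this commit can be pictured roughly as below. This is only a Python sketch of the idea: the real logic lives in `complete_datasignature.ado`, and the function name `drop_skipped_chars`, the pattern tuple, and the sample characteristic names are all made up for illustration.

```python
from fnmatch import fnmatchcase


def drop_skipped_chars(chars, skip_patterns=("frlink_*",)):
    """Return only the characteristics whose names match none of skip_patterns.

    Hypothetical helper: it mirrors the idea of skip_char, not its actual code.
    """
    return {
        name: value
        for name, value in chars.items()
        if not any(fnmatchcase(name, pat) for pat in skip_patterns)
    }


# Made-up characteristic names and values, for illustration only.
chars = {
    "note1": "district panel",
    "frlink_date": "12 Mar 2025 10:41",
    "frlink_fname": "districts.dta",
}
kept_chars = drop_skipped_chars(chars)
print(kept_chars)
```

Only the characteristics that survive the filter would feed into the signature, so re-saving a frameset at a different time would not change it.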
Remove embedded program definitions from tests/statacons_test.do and add separate helper files: tests/store_modts.ado, tests/touch_dta.ado, and tests/write_txt.ado. This separates reusable test helper programs (store_modts, touch_dta, write_txt) into their own .ado files for clarity and reuse; no functional changes intended.
Both are local-only private files (project instructions and session log) that should not appear in the public repo. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Includes Stata-bundled datasets (auto, census) and Stata Press webuse datasets (persons, txcounty, family, discharge1/2, hsng) along with _refresh_datasets.do to re-download the webuse datasets from Stata's servers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Four examples covering: frame create/change/copy/put/drop/reset (01), frlink/frget/fralias/frval (02), frame post Monte Carlo pattern (03), and frames save/use/modify/describe (04). Datasets loaded via `local datasets "../datasets"` relative to frames/datasets/. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three files: sources-format.md (bibliography of .dtas format sources), sources-applications.md (bibliography of frames application sources), and stata-features-frameset.md (format summary notes). Verbatim Stata IP (.sthlp, PDFs, blog-post conversions) stays in a private repo. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… signing Tests: basic round-trip, empty default frame, name collision, frame count, and signature stability across re-saves. All cases pass. Must be run by opening Stata interactively and doing the file -- not with -e do. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduce a comprehensive test harness for .dtas support under frames/tests: SConstruct files for scons-driven builds, producer/consumer Stata do-files for dtas and linked workflows, interactive and smoke test scripts (including SCons rerun checks), and a Python helper (make_malformed_dtas.py) to generate malformed .dtas fixtures. Also add testlib helpers, run_all wrapper, and binary fixtures for expected outputs. These tests validate signature stability across compression/topology/alias/linked workflows, error handling for malformed archives, and SCons no-rebuild behavior on identical reruns.
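One simple way to get a malformed fixture is to truncate a valid file. The sketch below does this without assuming anything about the .dtas layout. It is not necessarily what `make_malformed_dtas.py` does; `write_truncated_copy` and the placeholder bytes are hypothetical.

```python
import os
import tempfile


def write_truncated_copy(src_path, dst_path, keep_fraction=0.5):
    """Write the leading keep_fraction of src_path to dst_path.

    Returns the number of bytes written. Hypothetical helper for building
    a deliberately incomplete fixture.
    """
    with open(src_path, "rb") as f:
        data = f.read()
    keep = int(len(data) * keep_fraction)
    with open(dst_path, "wb") as f:
        f.write(data[:keep])
    return keep


# Demo on stand-in bytes; a real test would start from a valid .dtas file.
workdir = tempfile.mkdtemp()
good = os.path.join(workdir, "good.dtas")
bad = os.path.join(workdir, "truncated.dtas")
with open(good, "wb") as f:
    f.write(bytes(range(256)) * 4)  # 1024 placeholder bytes, not Stata data
kept = write_truncated_copy(good, bad)
```

A test can then check that signing the truncated file fails with a clear error rather than returning a signature.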
Move legacy small-data .dtas test scaffolding from the top-level tests/ folder into frames/tests. Added SConstruct-legacy, legacy producer/consumer do-files, legacy interactive round-trip and legacy smoke scripts, and integrated them into run_all.do and README-tests.md. Removed the old duplicates from tests/ to centralize the harness and keep branch-only legacy scripts out of the top-level tests tree.
Introduce a new 'frameset_signing' SCons config (auto|enabled|disabled) to control .dtas (frameset) signing on Stata <18. Implement logic in pystatacons/stata_utils: get_dtas_sign now detects Stata version once, delegates to get_datasign on Stata 18+, and in Stata <18 either falls back to MD5 with a one-time warning (auto), raises a hard error (enabled), or always uses MD5 (disabled). init_env defaults frameset_signing to 'auto' and only registers the .dtas special signature function when not disabled. Also improve error reporting from get_datasign to include the Stata log when a batch run fails. Add tests and SCons test scaffolding for Stata 17, update config templates and docs, and bump package and ado/documentation versions to 3.1.0-alpha2. Note: switching between frameset-aware and MD5 signatures requires a one-time full rebuild due to incompatible sconsign entries.
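The three modes described in this commit amount to a small decision table. The sketch below is a reading of the commit message, not the code in `pystatacons/stata_utils`: `choose_dtas_signer`, its return strings, and the `_warned` flag are hypothetical names.

```python
import warnings

_warned = False  # module-level flag so the fallback warning is emitted once


def choose_dtas_signer(mode, stata_version):
    """Return 'datasign' or 'md5' for a .dtas file (hypothetical helper)."""
    global _warned
    if mode not in ("auto", "enabled", "disabled"):
        raise ValueError(f"unknown frameset_signing mode: {mode!r}")
    if mode == "disabled":
        return "md5"  # never use frameset-aware signing
    if stata_version >= 18:
        return "datasign"  # framesets are available from Stata 18
    if mode == "enabled":
        raise RuntimeError("frameset signing requires Stata 18 or later")
    # mode == "auto" on Stata < 18: fall back to MD5, warning only once
    if not _warned:
        warnings.warn("Stata < 18 detected: using MD5 signatures for .dtas files")
        _warned = True
    return "md5"


print(choose_dtas_signer("auto", 19))
```

Because the two signature kinds are not comparable, changing the outcome of this choice for an existing project is what triggers the one-time full rebuild noted in the commit.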
…ple do-files Add consistent header comments to example and test .do scripts clarifying the assumed working directory (frames/examples, frames/tests, tests/, etc.) and showing how to run each script interactively or in batch (StataMP-64.exe -e). Mark legacy/interactive roundtrip tests as interactive-only and adjust run_all.do header and ordering (place clear all/do testlib.do after header). Also update the datasets refresh script comment to use paths relative to frames/datasets/.
A few comments:
Other than that, I can't test the frames code, so I'm assuming there are sufficient tests and that they pass on your machine.
Ideas not implemented: Allow SConscripts to refer to specific frames. Currently if one frame changes, the signature for the Stata 16 and 17 could |
Yes, these are all I wasn't too worried about keeping them on the repo since they are <0.5 MB total, but we could
Claude says no, and suggests a fix. I will give it a try and run the tests. Copilot (GPT 5.4) also says no:
Gemini says no
Yes, these passed on Stata 17 and 19 on my PC and Stata 19 on our cluster (only tested batch on the cluster I think). It also worked well in an "out of sample" test - a project where I had written up an SConscript with framesets ( |
Save and restore the previously-active frame when complete_datasignature loads a frameset in interactive mode. Implemented by capturing `c(frame)` before saving frames and calling `frame change` after restoring from the temporary .dtas. Add a legacy interactive test to verify restoration when calling from a non-default frame, and update documentation to mention that the previously-active frame is restored.
I implemented a fix for your comment 2 above and ran the interactive and batch mode tests again successfully. This is in a new commit 683c8f5, with tag 3.1.0-alpha3. I'm not sure what the etiquette is, do I close this PR and open a new one from that commit? Actually it looks like those commits show up in here so I guess no action needed?
@bquistorff should I have created a fork instead of a branch? So we can merge in milestone versions before publishing 3.1.0? If you think this is close to publication-ready then we can just work here, I guess, and I'll do a branch for future work (tutorial, frames within framesets).
Introduce several new frames/tests: smoke_dtas_frame_order, smoke_dtas_volatile_chars, smoke_dtas_fralias, smoke_dtas_degenerate, smoke_stata17_fallback, and interactive_collision. Update run_all.do to include the new non-interactive smoke tests and expand README-tests.md with descriptions, expected behavior, and diagnostics for each new test (including notes about interactive vs batch usage). The new tests cover frame ordering, volatile dataset characteristics and skip_char(), fralias aliasing behavior, single-/empty-frameset edge cases, Stata 17 version-guard behavior, and an interactive name-collision restoration scenario. Logs and cleanup steps are included in the test files.
Add support for Stata framesets (.dtas).
The main substantive changes are in `pypkg/src/pystatacons/stata_utils.py` and `src/complete_datasignature.ado`.
Checksum for .dtas is concatenated checksums of individual frames, in alphabetical order, for example
`output/data/project-area.dtas: LGD_district=23:4(30770):1387127574:1572535773:1213271928|LGD_subdistrict=97:9(94209):517290162:1301972081:2412684702`
Ignores the order in which frames are saved within frameset, ignores frame's date characteristic.
`statacons.ado` and `stataconsign.ado` are not substantively altered (just bump version number).
Also added examples in `frames/examples` and tests in `frames/tests`.
Still works on Stata 16 and 17 even though these cannot use framesets (.dtas) -- uses default scons checksum for .dtas. (Works in the sense that statacons will parse things properly, of course code using .dtas with Stata < 18 will fail -- gives warnings.)
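The checksum layout in the description above can be reproduced with a few lines of Python. This sketch only illustrates the `name=signature|name=signature` layout, using the two per-frame values from the example; `frameset_signature` is a hypothetical name, and Python's default string sort is assumed to match the "alphabetical order" the PR uses.

```python
def frameset_signature(frame_sigs):
    """Join per-frame signatures as name=sig, sorted by frame name, with '|'."""
    return "|".join(f"{name}={sig}" for name, sig in sorted(frame_sigs.items()))


# Per-frame values copied from the example in the PR description.
sigs_a = {
    "LGD_subdistrict": "97:9(94209):517290162:1301972081:2412684702",
    "LGD_district": "23:4(30770):1387127574:1572535773:1213271928",
}
# Same frames, stored in the opposite order.
sigs_b = dict(reversed(list(sigs_a.items())))

combined = frameset_signature(sigs_a)
print(combined)
```

Sorting by frame name before joining is what makes the result independent of the order in which frames were saved.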