# AGENTS-pisd.md — AI Coding Assistant Guide: `pisd_shape` Module

- **Version:** 1.0.0
- **Module:** `pisd_shape` (Pflugerville ISD Attendance Boundary Shapefile Extractor)
- **Environment:** Python 3.12+, uv, ruff, pytest, GitHub Actions CI
- **Model:** Claude Sonnet 4.6 (claude-sonnet-4-6)
- **Repository:** `Abstract-Data/RyanData-Address-Utils`
- **Branch convention:** `claude/<slug>-<id>` (e.g., `claude/continue-work-uO5cO`)

---

## Module Purpose

`pisd_shape` extracts Pflugerville ISD (PFISD) school attendance boundary layers from an ArcGIS
Experience Builder WebMap and writes them as ESRI Shapefiles for use in GIS tools (QGIS, ArcGIS Pro, etc.).

Layers extracted:
- `Elementary_School_Locations` — point geometries, school site locations
- `Elementary_Schools_2025-26` — polygon attendance boundaries
- `Middle_School_Locations` — point geometries
- `Middle_Schools_2025-26` — polygon attendance boundaries
- `High_School_Locations` — point geometries
- `High_Schools_2025-26` — polygon attendance boundaries
- `Pflugerville_ISD_Boundary` — district boundary polygon

- **Source:** https://experience.arcgis.com/experience/0bc78994af534cd1a703c8959abeac9d
- **WebMap JSON:** `https://Pflugervilleisd.maps.arcgis.com/sharing/rest/content/items/bb587c1043a949cca04f1b1904c235e3/data?f=json`

---

## Agent Scope

### Reads
- `src/pisd_shape/pfisd_extract_shapefiles.py` — the only file in this module that contains logic
- `src/pisd_shape/__init__.py` — module docstring
- `src/pisd_shape/export/` — output shapefiles (reference only; the agent does not parse them)
- `pyproject.toml` — dependency and tool config

### Writes
- `src/pisd_shape/pfisd_extract_shapefiles.py` — geometry helpers, layer extraction, CLI
- `src/pisd_shape/__init__.py` — module-level exports, if any are added
- `src/pisd_shape/export/` — shapefile outputs (`.shp`, `.dbf`, `.shx`, `.prj`, `.cpg`)
- `tests/` — new test files for `pisd_shape` (no tests exist yet)

### Executes
```bash
python src/pisd_shape/pfisd_extract_shapefiles.py                     # fetch from ArcGIS Online
python src/pisd_shape/pfisd_extract_shapefiles.py --local data.json   # load from local JSON
uv run ruff check src/pisd_shape/                                     # lint
uv run ruff format src/pisd_shape/                                    # format
uv run mypy src/pisd_shape/                                           # type check
uv run pytest tests/ -k pisd                                          # run pisd-specific tests
```

### Off-limits (do not touch without explicit instruction)
- `src/ryandata_address_utils/` — main address parsing package; unrelated to this module
- Existing tests for that package (`tests/test_address_parser.py`, `tests/test_factories.py`,
  `tests/test_unified_model.py`, and the rest)
- `.github/workflows/` — CI configuration
- The `[project.scripts]` section of `pyproject.toml` — `pisd_shape` has no CLI entrypoint yet

---

## File Structure

```
src/pisd_shape/
├── __init__.py                    # Module docstring only; no public API exports yet
└── pfisd_extract_shapefiles.py    # All logic: fetch → parse → reproject → write shapefiles
    ├── CONFIG block               # WEBMAP_URL, OUTPUT_DIR, transformer (EPSG:3857 → 4326)
    ├── Geometry helpers           # reproject_ring(), esri_polygon_to_shapely(), esri_point_to_shapely()
    ├── Layer extraction           # extract_layer() → GeoDataFrame
    ├── Filename sanitizer         # safe_filename()
    └── main()                     # argparse CLI + orchestration

src/pisd_shape/export/             # Committed shapefile outputs (pre-extracted)
├── Elementary_School_Locations.*
├── Elementary_Schools_2025-26.*
├── Middle_School_Locations.*
├── Middle_Schools_2025-26.*
├── High_School_Locations.*
├── High_Schools_2025-26.*
└── Pflugerville_ISD_Boundary.*
```

---

## Data Flow

```
ArcGIS Online WebMap JSON
    │
    ▼  requests.get(WEBMAP_URL)   [or --local <file>]
webmap["operationalLayers"]
    │
    ▼  for each layer
layer["featureCollection"]["layers"]
    │
    ▼  extract_layer(sub_layer, title)
featureSet["features"]
    │
    ├─ esriGeometryPolygon → esri_polygon_to_shapely()
    │     └─ reproject_ring()  [EPSG:3857 → EPSG:4326 via pyproj.Transformer]
    │           └─ Polygon / MultiPolygon (Shapely, cleaned with .buffer(0))
    │
    └─ esriGeometryPoint → esri_point_to_shapely()
          └─ transformer.transform(x, y) → Point (Shapely)
    │
    ▼
gpd.GeoDataFrame(rows, crs="EPSG:4326")
    │
    ▼  gdf.to_file(path, driver="ESRI Shapefile")
src/pisd_shape/export/<safe_filename>.shp
```
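
The first three steps of this flow walk from `operationalLayers` down to each layer's inline `featureCollection.layers`. They can be sketched as a plain generator. This is a minimal sketch and not the module's actual code. The name `iter_feature_layers` is hypothetical, and using the operational layer's `title` (falling back to `"untitled"`) as the layer title is an assumption.

```python
from collections.abc import Iterator
from typing import Any


def iter_feature_layers(webmap: dict[str, Any]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Hypothetical helper: yield (title, sub_layer) pairs for inline feature collections."""
    for layer in webmap.get("operationalLayers", []):
        collection = layer.get("featureCollection")
        if not collection:
            # URL-backed layers carry no inline features, so there is nothing to extract
            continue
        # Assumption: the operational layer's title names every sub-layer it contains
        title = layer.get("title") or "untitled"
        for sub_layer in collection.get("layers", []):
            yield title, sub_layer
```

Each yielded pair matches the `extract_layer(sub_layer, title)` step in the diagram.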

### Key data facts
- All source geometry is **Web Mercator (EPSG:3857)**; output is always **WGS84 (EPSG:4326)**
- Layers are **inline Feature Collections**; there is no FeatureServer REST endpoint to query
- ESRI JSON polygons mark outer rings as clockwise and holes as counter-clockwise. The current code
  ignores winding order: it treats each ring as an independent polygon and cleans the result with
  `buffer(0)`, so holes are not kept as interior rings (acceptable for boundary data)
- Shapefile field names are truncated to **10 characters** (dBASE III limitation)
- Features with missing or empty geometry are skipped and counted; the module prints warnings
  rather than raising exceptions

---

## CLI Reference

```bash
# Fetch live from ArcGIS Online (requires network access):
python src/pisd_shape/pfisd_extract_shapefiles.py

# Use a pre-downloaded local WebMap JSON (for offline/testing):
python src/pisd_shape/pfisd_extract_shapefiles.py --local path/to/webmap.json
python src/pisd_shape/pfisd_extract_shapefiles.py -l path/to/webmap.json
```
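
The `--local`/`-l` handling can be sketched with the standard library alone. This sketch follows the output convention: fatal problems print `[ERROR]` and exit with status 1. The helper names (`build_parser`, `load_local_webmap`) and the `"operationalLayers"` sanity check are assumptions for illustration. The real `main()` may be structured differently.

```python
import argparse
import json
import sys
from pathlib import Path
from typing import Any


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented --local / -l flag."""
    parser = argparse.ArgumentParser(description="Extract PFISD layers to shapefiles.")
    parser.add_argument(
        "-l",
        "--local",
        type=Path,
        default=None,
        help="Path to a saved WebMap JSON file (skips the network fetch).",
    )
    return parser


def load_local_webmap(path: Path) -> dict[str, Any]:
    """Read a WebMap JSON file, exiting with [ERROR] if it is missing or malformed."""
    resolved = path.resolve()
    if not resolved.is_file():
        print(f"[ERROR] Local WebMap JSON not found: {resolved}")
        sys.exit(1)
    try:
        data = json.loads(resolved.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        print(f"[ERROR] Could not parse {resolved}: {exc}")
        sys.exit(1)
    # Assumption: any usable WebMap has a top-level operationalLayers key
    if not isinstance(data, dict) or "operationalLayers" not in data:
        print(f"[ERROR] {resolved} does not look like a WebMap (no operationalLayers)")
        sys.exit(1)
    return data
```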

There is currently **no `pyproject.toml` script entrypoint** for this module, so run it directly
with `python`. Adding one under `[project.scripts]` is an ASK FIRST item (see Boundaries).

---

## Code Style

### General
- **Python version:** 3.12+ (matches `pyproject.toml` `requires-python`)
- **Line length:** 100 characters (matches `[tool.ruff]` config)
- **Formatter/linter:** `ruff format` + `ruff check` with `E, F, I, UP, B, SIM` rules
- **Type checker:** `mypy` — `disallow_untyped_defs = true`, `ignore_missing_imports = true`
- **Function names:** `snake_case`
- **Class names:** `PascalCase` (none currently exist in this module)
- **Type hints:** required on all function signatures

### Geometry helpers pattern
```python
from pyproj import Transformer

# always_xy=True returns (lon, lat); EPSG:4326's official axis order is (lat, lon)
transformer = Transformer.from_crs("EPSG:3857", "EPSG:4326", always_xy=True)


def reproject_ring(ring: list[list[float]]) -> list[tuple[float, float]]:
    """Convert a list of [x, y] Web Mercator coords to (lon, lat) WGS84."""
    return [transformer.transform(x, y) for x, y in ring]
```
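
Tests that should not depend on pyproj can cross-check `reproject_ring()` output against the closed-form spherical Web Mercator inverse. This is a sketch that assumes the standard EPSG:3857 sphere radius of 6,378,137 m. The function names are hypothetical. pyproj with `always_xy=True` should agree with these formulas very closely for EPSG:3857 → EPSG:4326.

```python
import math

R = 6_378_137.0  # EPSG:3857 sphere radius in metres (WGS84 semi-major axis)


def mercator_to_lonlat(x: float, y: float) -> tuple[float, float]:
    """Inverse spherical Web Mercator: EPSG:3857 metres -> (lon, lat) in degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(math.atan(math.sinh(y / R)))
    return lon, lat


def lonlat_to_mercator(lon: float, lat: float) -> tuple[float, float]:
    """Forward spherical Web Mercator, handy for building test fixtures."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y
```

For example, `lonlat_to_mercator(-97.62, 30.44)` gives fixture coordinates close to Pflugerville. `mercator_to_lonlat(-10880000, 3637000)` puts the testing fixture's point at a longitude of about -97.74.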

### Layer extraction pattern
```python
import geopandas as gpd


def extract_layer(layer_data: dict, layer_title: str) -> gpd.GeoDataFrame | None:
    """Return a GeoDataFrame for a single ESRI featureCollection layer, or None on failure."""
    ...  # read geometryType and featureSet["features"] from layer_data
    rows: list[dict] = []
    skipped = 0
    for feat in features:
        geom = ...  # dispatch by geom_type
        if geom is None or geom.is_empty:
            skipped += 1
            continue
        row = {"geometry": geom}
        row.update(feat.get("attributes") or {})  # ESRI attributes may be missing or null
        rows.append(row)
    ...  # print the [INFO] skipped count; return None if no rows survived
    return gpd.GeoDataFrame(rows, crs="EPSG:4326")
```

### Warning/error output convention
- Use `print(f" [WARN] ...")` for recoverable geometry issues
- Use `print(f" [INFO] ...")` for skipped-feature counts
- Use `print(f"[ERROR] ...")` followed by `sys.exit(1)` for fatal failures (bad URL, unreadable file)
- Do **not** raise exceptions inside `extract_layer`; return `None` and let `main()` skip the layer

---

## Key Dependencies

| Package | Role |
|---------|------|
| `requests` | Fetch WebMap JSON from ArcGIS Online |
| `geopandas` | Build GeoDataFrames; write ESRI Shapefiles via `to_file()` |
| `shapely` | `Polygon`, `MultiPolygon`, `Point` geometry objects |
| `pyproj` | CRS transformation: EPSG:3857 (Web Mercator) → EPSG:4326 (WGS84) |
| `fiona` | Optional shapefile I/O engine for geopandas (geopandas 1.0+ defaults to `pyogrio`) |

These are **not** declared in `pyproject.toml`; install them into the project environment
separately (e.g., `uv pip install geopandas shapely pyproj requests fiona`). If they are added to
`pyproject.toml` (ask first), put them in an optional extras group, e.g.
`[project.optional-dependencies] pisd = [...]`.

---

## Testing

There are currently **no tests** for `pisd_shape`. When adding them:

- **Framework:** pytest (already configured in `pyproject.toml`)
- **Test file:** `tests/test_pisd_shape.py` (`pytest -k pisd` also matches on the module name)
- **Hypothesis:** use for property-based geometry tests (ring winding, coordinate validity)
- **Offline-first:** always use `--local` fixture JSON; never hit ArcGIS Online in CI

### Testing patterns

```python
import pytest

# Assumes the repo root is importable; with pytest `pythonpath = ["src"]`,
# import from `pisd_shape.pfisd_extract_shapefiles` instead.
from src.pisd_shape.pfisd_extract_shapefiles import (
    extract_layer,
    reproject_ring,
    safe_filename,
)

# Fixture: minimal featureCollection sub-layer (inline, no network required)
POINT_LAYER = {
    "layerDefinition": {"geometryType": "esriGeometryPoint"},
    "featureSet": {
        "features": [
            {"geometry": {"x": -10880000, "y": 3637000}, "attributes": {"NAME": "Pflugerville HS"}}
        ]
    },
}


def test_reproject_ring_returns_lon_lat_tuples() -> None:
    ring = [[-10880000, 3637000], [-10881000, 3637000], [-10881000, 3638000]]
    result = reproject_ring(ring)
    assert all(isinstance(pt, tuple) and len(pt) == 2 for pt in result)
    # x = -10,880,000 m is about -97.74° lon; the wide band still catches swapped (lat, lon) output
    assert all(-102 < lon < -94 for lon, _ in result)


@pytest.mark.parametrize(
    "title,expected",
    [
        ("Elementary Schools 2025-26", "Elementary_Schools_2025-26"),
        ("My Layer/Name!", "My_Layer_Name_"),
    ],
)
def test_safe_filename(title: str, expected: str) -> None:
    assert safe_filename(title) == expected


def test_extract_layer_returns_geodataframe_for_valid_points() -> None:
    gdf = extract_layer(POINT_LAYER, "Test Layer")
    assert gdf is not None
    assert len(gdf) == 1
    assert gdf.crs.to_epsg() == 4326


def test_extract_layer_returns_none_for_empty_features() -> None:
    empty_layer = {
        "layerDefinition": {"geometryType": "esriGeometryPoint"},
        "featureSet": {"features": []},
    }
    assert extract_layer(empty_layer, "Empty") is None
```

---

## Git Workflow

- **Branch convention:** `claude/<slug>-<id>` (current: `claude/continue-work-uO5cO`)
- **Commit style:** [Conventional Commits](https://www.conventionalcommits.org/)
  - `feat(pisd): add argparse --output-dir flag`
  - `fix(pisd): handle empty rings in esri_polygon_to_shapely`
  - `test(pisd): add offline layer extraction tests`
  - `chore(pisd): add geopandas to optional pisd extras in pyproject.toml`
- **Push target:** `origin/claude/continue-work-uO5cO`
- **PR target:** `main`
- **CI checks that must pass:** `ruff check`, `ruff format --check`, `mypy src/`, `pytest`

---

## Security

- **No hardcoded credentials** — the ArcGIS WebMap is a public endpoint that requires no auth token
- **No secrets in code** — if auth is ever added, load it through `pydantic-settings` from environment variables
- **URL validation** — `WEBMAP_URL` is a module-level constant; if a future CLI flag accepts a
  user-supplied URL, validate it before fetching
- **Local file input** — `--local` accepts arbitrary paths; if expanding this flag, normalize the path
  with `Path.resolve()` and check that the file exists before calling `open()`
- **No parameterized queries** — there is no database, so this does not apply
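
For the URL-validation point, one option is a scheme-and-host allowlist built on `urllib.parse`. This is a sketch under stated assumptions. The helper name and the `arcgis.com` allowlist are hypothetical and should be confirmed before any URL flag is added.

```python
from urllib.parse import urlparse

ALLOWED_HOST_SUFFIX = ".arcgis.com"  # hypothetical allowlist


def is_allowed_webmap_url(url: str) -> bool:
    """Accept only https URLs whose host is arcgis.com or one of its subdomains."""
    parsed = urlparse(url)
    # .hostname is lower-cased and excludes any port or user:password@ prefix
    host = parsed.hostname or ""
    return parsed.scheme == "https" and (
        host == ALLOWED_HOST_SUFFIX.lstrip(".") or host.endswith(ALLOWED_HOST_SUFFIX)
    )
```

Matching on the leading dot means a look-alike host such as `evilarcgis.com` is rejected.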

---

## Definition of Done

Before marking any change complete:

- [ ] `uv run ruff check src/pisd_shape/` passes with no errors
- [ ] `uv run ruff format --check src/pisd_shape/` reports no files to reformat
- [ ] `uv run mypy src/pisd_shape/` reports no errors
- [ ] `uv run pytest tests/ -k pisd` passes (until pisd tests exist, every test is deselected and pytest exits with code 5; treat that as expected)
- [ ] Geometry output CRS is WGS84 (EPSG:4326); verify with `gdf.crs`
- [ ] `safe_filename()` truncates to ≤60 characters and replaces unsafe characters
- [ ] `--local` flag works end-to-end with a saved WebMap JSON fixture
- [ ] No live network calls in tests (mock `requests.get` or use `--local`)
- [ ] Commit message follows the Conventional Commits format
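
Here is a minimal `safe_filename()` consistent with this checklist and with the parametrized cases in the Testing section. Characters other than letters, digits, `_`, and `-` become `_`, and the result is capped at 60 characters. These rules are an assumption about the real function, not a copy of it.

```python
import re

MAX_FILENAME_LEN = 60  # limit taken from the Definition of Done

# Anything that is not an ASCII letter, digit, underscore, or hyphen is treated as unsafe
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9_-]")


def safe_filename(title: str, max_len: int = MAX_FILENAME_LEN) -> str:
    """Replace filesystem-unsafe characters with '_' and truncate to max_len."""
    return _UNSAFE_CHARS.sub("_", title.strip())[:max_len]
```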

---

## Tool Resolution Priority

When looking up APIs or documentation:

1. **Context7 MCP** (`resolve-library-id` + `get-library-docs`) — first stop for geopandas,
   shapely, pyproj, fiona, requests
2. **GitHub MCP** — check `Abstract-Data/RyanData-Address-Utils` issues/PRs for known problems
3. **Web search** — ArcGIS REST API docs, EPSG.io for CRS details
4. **Read source** — check `src/pisd_shape/pfisd_extract_shapefiles.py` directly before guessing

---

## Boundaries

### ALWAYS DO
- Reproject all output geometry to WGS84 (EPSG:4326) before writing shapefiles
- Apply `.buffer(0)` to Shapely polygons to fix self-intersections from ESRI rings
- Truncate GeoDataFrame column names to 10 characters before `gdf.to_file()`
- Skip `None` or empty geometries with a `[WARN]` log rather than raising an exception
- Use `OUTPUT_DIR.mkdir(parents=True, exist_ok=True)` before writing
- Run `ruff check` and `mypy` before committing
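
Plain slicing (`name[:10]`) can map two long attribute names to the same field. The sketch below truncates without collisions. `shapefile_column_map` is a hypothetical helper, and the `_1`, `_2` suffix scheme is an arbitrary choice. The shapefile driver may truncate names on its own, but an explicit mapping keeps the output field names predictable.

```python
FIELD_NAME_LIMIT = 10  # dBASE field-name limit for shapefile attributes


def shapefile_column_map(columns: list[str], limit: int = FIELD_NAME_LIMIT) -> dict[str, str]:
    """Map each column to a unique name of at most `limit` characters."""
    mapping: dict[str, str] = {}
    used: set[str] = set()
    for col in columns:
        if col == "geometry":  # the geometry column is not a dBASE field; leave it alone
            mapping[col] = col
            continue
        candidate = col[:limit]
        n = 1
        # On a clash, shorten further and append _1, _2, ... until the name is unique
        while candidate in used:
            suffix = f"_{n}"
            candidate = col[: limit - len(suffix)] + suffix
            n += 1
        used.add(candidate)
        mapping[col] = candidate
    return mapping
```

Usage before writing: `gdf = gdf.rename(columns=shapefile_column_map(list(gdf.columns)))`.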

### ASK FIRST
- Adding new CLI flags to `argparse` beyond `--local`
- Adding a `pyproject.toml` script entrypoint for `pisd_shape`
- Adding `pisd` optional dependencies to `pyproject.toml`
- Changing the output directory from `src/pisd_shape/export/` to somewhere else
- Modifying how ESRI winding order is handled (current simplified approach is intentional)
- Adding geometry type support beyond Polygon and Point (e.g., Polyline)
- Committing updated shapefiles in `export/` (large binary files — confirm with user first)

### NEVER DO
- Touch `src/ryandata_address_utils/` — completely separate package from `pisd_shape`
- Make live HTTP requests to ArcGIS Online in automated tests
- Remove the `--local` flag (required for offline/CI use)
- Raise exceptions inside `extract_layer()` — return `None` and let `main()` handle it
- Write output shapefiles outside `src/pisd_shape/export/` without explicit instruction
- Hardcode auth tokens or API keys anywhere in source code
- Force-push to `main`