
Dataset Uploader: allow CSV → GeoJSON conversion with user-defined coordinate columns#224

Merged
rudokemper merged 12 commits into main from
218/csv-geojson-dataset-uploader
Apr 6, 2026

@rudokemper rudokemper commented Mar 17, 2026

Goal

Closes #218. (Note that most of the discussion describing the desired behavior is actually in #197.)

Screenshots

If tabular data (CSV, XLS) has been uploaded:
image

If lon/lat columns have been selected:
image

If spatial data has been uploaded, the toggle is disabled:
image

What I changed and why

GC Dataset Uploader

  • On step 2:
    • In 2_upload_and_convert_file.inline_script.py, track column_names so they can be fed into the Select fields on step 3.
  • On step 3:
    • Add a lonLatToggle that is disabled if outputFormat !== "csv". (Note: all tabular data is converted "down" to CSV on step 2.)
    • longitudeCol and latitudeCol Select fields allow selection from column_names. Each field's default is a heuristic guess: the first column whose name starts with lon or lat, respectively.
  • On step 4:
    • If state.longitudeCol and state.latitudeCol are set, we do an optional conversion to GeoJSON in 4_apply_transformation_and_write_to_database.inline_script.py.
    • This leverages the work from #217 ("Decouple input from output in data conversion; support CSV to GeoJSON") to use to_geojson() from the geo_utils.py common logic module.
    • (Minor thing here -- I moved main() to the top, as per the repo-wide convention. This makes diffing kind of ugly, sorry.)
  • Validations across the app to check the desired behavior.
  • Adjustments to the app UI to make room for this new behavior.
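
For illustration, the heuristic default for the lon/lat Select fields on step 3 could be sketched as below. This is not the app's actual code: guess_column is a hypothetical helper name, and the case-insensitive matching is an assumption (the description above only says the column names "start with" lon and lat).

```python
from typing import Optional, Sequence


def guess_column(column_names: Sequence[str], prefix: str) -> Optional[str]:
    """Return the first column whose name starts with `prefix`, or None.

    Matching is case-insensitive and ignores surrounding whitespace;
    both are assumptions made for this sketch.
    """
    for name in column_names:
        if name.strip().lower().startswith(prefix):
            return name
    return None


# Example header, as it might be tracked in column_names on step 2.
column_names = ["id", "Longitude", "lat_deg", "notes"]
longitude_col = guess_column(column_names, "lon")
latitude_col = guess_column(column_names, "lat")
```

Returning None when nothing matches leaves the Select field empty, so the user has to make an explicit choice rather than silently accepting a wrong column.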

data_conversion.py

In #217 I had this code expect a coord_col string, with the expectation that the dataset uploader app would construct a [lon, lat] string upstream. When working on this I decided that didn't make sense. (For one, the validation logic required to inspect a [lon, lat] format is way too complex.) So I switched to expecting separate longitude and latitude arguments.
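
To make the shape of that interface concrete, here is a minimal, standard-library-only sketch of a CSV → GeoJSON conversion that takes separate longitude and latitude column names. It is not the to_geojson() implementation from this repo: csv_to_geojson is a hypothetical name, and dropping the coordinate columns from the feature properties is an assumption made for the example.

```python
import csv
import io
from typing import Any


def csv_to_geojson(csv_text: str, longitude: str, latitude: str) -> dict[str, Any]:
    """Convert CSV text into a GeoJSON FeatureCollection of Point features."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []

    # With two plain column names, validation is a simple membership check.
    for col in (longitude, latitude):
        if col not in fieldnames:
            raise ValueError(f"Column {col!r} not found in CSV header")

    features = []
    for row in reader:
        # float() raises ValueError on non-numeric cells.
        lon = float(row[longitude])
        lat = float(row[latitude])
        # Assumption: the coordinate columns are not repeated in properties.
        properties = {
            key: value
            for key, value in row.items()
            if key not in (longitude, latitude)
        }
        features.append(
            {
                "type": "Feature",
                # GeoJSON positions are ordered [longitude, latitude].
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": properties,
            }
        )
    return {"type": "FeatureCollection", "features": features}


sample = "name,lon,lat\nCamp A,-55.5,4.25\n"
collection = csv_to_geojson(sample, longitude="lon", latitude="lat")
```

Compared with parsing a combined [lon, lat] string, the caller only passes two column names, and each one can be checked against the header independently.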

Note to the reviewer

Per the app README.md,

The app.yaml file generated by Windmill is highly impractical to review directly. Instead, it is recommended to sync the app to a Windmill instance, and review the code in the Windmill App builder UI.

This is absolutely the case for this PR. Also, +484 of the +845 line changes are in app.yaml! The only code I would actually review is:

  • 2_upload_and_convert_file.inline_script.py (+18 / -9)
  • 4_apply_transformation_and_write_to_database.inline_script.py (+193 / -144 (but includes moving main()))
  • data_conversion.py (+47 / -31) (and tests, +39 / -30)

Otherwise, I recommend test-driving the app on demo. I am happy to jump on a call to walk through how this works, or to discuss feedback that way instead of through comments or code review here on GitHub!

I am requesting a review from @nicopace, as you will most likely work with this functionality the most, either yourself or with users, and will generate documentation for it.

LLM use disclosure

None

@rudokemper rudokemper requested a review from nicopace March 17, 2026 01:37
nicopace commented Mar 26, 2026

I've tested the UI.
I tried uploading files that were not CSV, including a GeoJSON renamed to .csv (it didn't allow me to carry on with the import, as I was expecting).
I noticed that it doesn't check the values of the columns once you have selected them. An improvement could be to sample the selected columns to make sure their values resemble latitude and longitude.
Will send the code review now.
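
As a rough idea of what such a sampling check could look like (a hypothetical sketch, not code from this PR; looks_like_coordinates and its sample_size default are made-up names and values):

```python
from typing import Iterable


def looks_like_coordinates(
    values: Iterable[str], limit: float, sample_size: int = 20
) -> bool:
    """Check that sampled values parse as numbers within [-limit, limit].

    Inspects up to `sample_size` non-empty values. Use limit=180 for a
    longitude column and limit=90 for a latitude column.
    """
    checked = 0
    for raw in values:
        if checked >= sample_size:
            break
        text = raw.strip()
        if not text:
            continue  # skip empty cells rather than failing on them
        try:
            number = float(text)
        except ValueError:
            return False
        # NaN and infinities also fail this range comparison.
        if not -limit <= number <= limit:
            return False
        checked += 1
    # A column with no usable values does not look like coordinates.
    return checked > 0


lon_ok = looks_like_coordinates(["-55.5", "", "12.0"], limit=180.0)
lat_bad = looks_like_coordinates(["4.25", "95.2"], limit=90.0)
```

A range check like this would catch swapped or non-numeric columns in many cases, but not all: a longitude column whose values all fall within ±90 would still pass as a latitude.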

@nicopace nicopace left a comment

Finished reviewing, all good. Thanks!!

@rudokemper rudokemper merged commit d6d2ecd into main Apr 6, 2026
1 check passed
@rudokemper rudokemper deleted the 218/csv-geojson-dataset-uploader branch April 6, 2026 20:49