You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: README.md
+29-6Lines changed: 29 additions & 6 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -135,13 +135,36 @@ Like [selectors](#selectors), `keep-keys` and `drop-keys` are comma-separated li
135
135
```bash
136
136
lightbeam validate -c path/to/config.yaml
137
137
```
138
-
You may `validate` your JSONL before transmitting it. This checks that the payloads
139
-
1. are valid JSON
140
-
1. conform to the structure described in the Swagger documents for [resources](https://api.ed-fi.org/v5.3/api/metadata/data/v3/resourcess/swagger.json) and [descriptors](https://api.ed-fi.org/v5.3/api/metadata/data/v3/descriptors/swagger.json) fetched from your API
141
-
1. contain valid descriptor values (fetched from your API and/or from descriptor values in your JSONL files)
142
-
1. contain unique values for any natural key
138
+
You may `validate` your JSONL before transmitting it. Configuration for `validate` goes in its own section of `lightbeam.yaml`:
139
+
```yaml
140
+
validate:
141
+
methods:
142
+
- schema # checks that payloads conform to the Swagger definitions from the API
143
+
- descriptors # checks that descriptor values are either locally-defined or exist in the remote API
144
+
- uniqueness # checks that local payloads are unique by the required property values
145
+
- references # checks that references resolve, either locally or in the remote API
146
+
# or
147
+
# methods: "*"
148
+
```
149
+
Default `validate`.`methods` are `["schema", "descriptors", "uniqueness"]` (not `references`; see below). In addition to the above methods, `lighteam validate` will also (first) check that each payload is valid JSON.
150
+
151
+
The `references` `method` can be slow, as a separate `GET` request may be made to your API for each reference. (Therefore the validation method is disabled by default.) `lightbeam` tries to improve efficiency by:
152
+
* batching requests and sending several concurrently (based on `connection`.`pool_size` of `lightbeam.yaml`)
153
+
* caching responses and first checking the cache before making another (potentially identical) request
154
+
155
+
Even with these optimizations, checking `references` can easily take minutes for even relatively small amounts of data. Therefore `lightbeam.yaml` also accepts a further configuration option:
156
+
```yaml
157
+
validate:
158
+
references:
159
+
max_failures: 10 # stop testing after X failed payloads ("fail fast")
160
+
```
161
+
This is optional; if absent, references in every payload are checked, no matter how many fail.
162
+
163
+
**Note:** Reference validation efficiency may be improved by first `lightbeam fetch`ing certain resources to have a local copy. `lightbeam validate` checks local JSONL files to resolve references before trying the remote API, and `fetch` retrieves many records per `GET`, so total runtime can be faster in this scenario. The downsides include
164
+
* more data movement
165
+
* `fetch`ed data becoming stale over time
166
+
* needing to track which data is your own vs. was `fetch`ed (all the data must coexist in the `config.data_dir` to be discoverable by `lightbeam validate`)
143
167
144
-
This command will not find invalid reference errors, but is helpful for finding payloads that are invalid JSON, are missing required fields, or have other structural issues.
0 commit comments