Commit ce30c37

Merge pull request #61 from edanalytics/feature/validate_references_selector
implementing validate references selector, behavior and remote switch, plus update docs
2 parents 57ab369 + 9b03e2f commit ce30c37

2 files changed

Lines changed: 74 additions & 17 deletions

README.md

Lines changed: 41 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -64,6 +64,19 @@ count:
6464
separator: ,
6565
fetch:
6666
page_size: 100
67+
validate:
68+
methods:
69+
- schema # checks that payloads conform to the Swagger definitions from the API
70+
- descriptors # checks that descriptor values are either locally-defined or exist in the remote API
71+
- uniqueness # checks that local payloads are unique by the required property values
72+
- references # checks that references resolve, either locally or in the remote API
73+
# or `methods: "*"`
74+
references:
75+
selector:
76+
- studentAssessments.studentReference
77+
- studentSchoolAssociations.schoolReference
78+
behavior: exclude # or `include`
79+
remote: False # default=True
6780
force_delete: True
6881
log_level: INFO
6982
show_stacktrace: True
@@ -94,6 +107,7 @@ show_stacktrace: True
94107
* (optional) Whether to `verify_ssl`. The default is `True`. Set to `False` when working with `localhost` APIs or to live dangerously.
95108
* (optional) for [`lightbeam count`](#count), optionally change the `separator` between `Records` and `Endpoint`. The default is a "tab" character.
96109
* (optional) for [`lightbeam fetch`](#fetch), optionally specify the number of records (`page_size`) to GET at a time. The default is 100, but if you're trying to extract lots of data from an API increase this to the largest allowed (which depends on the API, but is often 500 or even 5000).
110+
* (optional) for [`lightbeam validate`](#validate), specify the list of validation `methods` to run (from `schema`, `descriptors`, `uniqueness`, and `references`). If validating `references`, you may specify a list of `selector`s to either `include` or `exclude` (per `behavior`) when validating. You may also disable `remote` reference validation (enabled by default).
97111
* (optional) Skip the interactive confirmation prompt (for programmatic use) when using the [`delete`](#delete) command. The default is `False` (prompt).
98112
* (optional) Specify a `log_level` for output. Possible values are
99113
- `ERROR`: only output errors like missing required sources, invalid references, invalid [YAML configuration](#yaml-configuration), etc.
@@ -147,6 +161,12 @@ validate:
147161
- references # checks that references resolve, either locally or in the remote API
148162
# or
149163
# methods: "*"
164+
references:
165+
selector:
166+
- studentAssessments.studentReference
167+
- studentSchoolAssociations.schoolReference
168+
behavior: exclude # or `include`
169+
remote: False # default=True
150170
```
151171
Default `validate`.`methods` are `["schema", "descriptors", "uniqueness"]` (not `references`; see below). In addition to the above methods, `lightbeam validate` will also (first) check that each payload is valid JSON.
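The default and wildcard behavior described above can be sketched as follows. This is a minimal illustration, not lightbeam's actual code: `resolve_methods` is a hypothetical helper, and treating a single non-wildcard string as a one-item list is an assumption.

```python
# Default methods, as stated in the README text above.
DEFAULT_METHODS = ["schema", "descriptors", "uniqueness"]


def resolve_methods(configured=None):
    """Return the list of validation methods to run (hypothetical helper)."""
    if configured is None:
        # Nothing configured: fall back to the defaults (no `references`).
        return list(DEFAULT_METHODS)
    if isinstance(configured, str):
        if configured.lower() in ("*", "all"):
            # Wildcard: every default method plus `references`.
            return DEFAULT_METHODS + ["references"]
        # Assumption: a single method name given as a plain string.
        return [configured]
    return list(configured)
```

Building a new list on each call keeps the shared defaults from being modified when `references` is appended.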
152172

@@ -167,6 +187,8 @@ This is optional; if absent, references in every payload are checked, no matter
167187
* `fetch`ed data becoming stale over time
168188
* needing to track which data is your own vs. was `fetch`ed (all the data must coexist in the `config.data_dir` to be discoverable by `lightbeam validate`)
169189

190+
You may specify a `selector` list of the form `someEndpoint.path.to.someReference` to include or exclude (according to `behavior`) specific references from reference validation. You may also specify `remote: False` to validate references only against local data in your JSONL files.
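One way to read the `selector` and `behavior` settings is as a per-reference yes/no decision. The sketch below is illustrative only: `should_check_reference` is a hypothetical name, and joining the endpoint, nested path, and reference key with dots is an assumption based on the `someEndpoint.path.to.someReference` form described above.

```python
def should_check_reference(endpoint, path, key, selectors, behavior="exclude"):
    """Decide whether one reference is subject to reference validation."""
    if behavior not in ("exclude", "include"):
        raise ValueError("behavior must be either 'exclude' or 'include'")
    # Build the dotted name, e.g. "studentSchoolAssociations.schoolReference".
    # Empty parts (such as an empty path for top-level references) are skipped.
    full_name = ".".join(part for part in (endpoint, path, key) if part)
    selected = full_name in selectors
    # include: check only the listed references.
    # exclude: check every reference except the listed ones.
    return selected if behavior == "include" else not selected
```

With `behavior: exclude` and an empty `selector` list, every reference is checked, which matches the documented default of validating all references.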
191+
170192

171193
## `send`
172194
```bash
@@ -209,6 +231,17 @@ Running the `truncate` command will prompt you to type "yes" to confirm. This co
209231

210232
`truncate` is a convenience command which should be used sparingly, as it can generate large numbers of `deletes` records and cause performance issues when pulling from `deletes` endpoints. If you want to wipe an entire Ed-Fi ODS, a better approach may be to drop and recreate the database (and re-send Descriptors and other default resources as needed).
211233

234+
## `create`
235+
```bash
236+
lightbeam create -s students,schools,studentSchoolAssociations -c path/to/config.yaml
237+
```
238+
Creates a skeleton of an [`earthmover`](https://edanalytics.github.io/earthmover/) project for _creating_ JSONL Ed-Fi data which one can then `lightbeam send` to an Ed-Fi API. It uses the Ed-Fi API's [OpenAPI specification](https://spec.openapis.org/oas/latest.html) to determine the schema of the endpoints you select (`lightbeam create` should usually be used with the `-s` [selector](#selectors)). Then, in the current directory, it:
239+
* creates (if it doesn't already exist, otherwise adds/overwrites) a partial `earthmover.yml` configuration file with empty `sources` and `transformations` but `destinations` for each selected endpoint, plus comments indicating what column names and data types/values are required, and the required grain of the table (based on `isIdentity` flags in the OpenAPI definitions)
240+
* creates (or overwrites) `templates/*.jsont` for each selected endpoint, with skeleton Jinja-JSON that includes all the required fields (including nested ones), optional fields wrapped in conditionals, and comments with some of the property metadata from OpenAPI definitions, such as `type`, `description`, `isIdentity`, etc.
241+
242+
The purpose of `lightbeam create` is to save developers time if they want to use `earthmover` to create Ed-Fi-shaped data from other data sources. See the [`earthmover` documentation](https://edanalytics.github.io/earthmover/) for more information.
243+
244+
212245
## Other options
213246
See a help message with
214247
```bash
@@ -222,6 +255,14 @@ lightbeam -v
222255
lightbeam --version
223256
```
224257

258+
Override specific configurations in `lightbeam.yml` from the command-line using the `--set` flag
259+
```
260+
lightbeam fetch --set fetch.page_size 1000
261+
lightbeam validate --set log_level WARN show_stacktrace True
262+
lightbeam send --set connection.timeout 15
263+
```
264+
(`--set` must be followed by a set of key-value pairs.)
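To illustrate what "a set of key-value pairs" with dotted keys could mean for a nested configuration, here is a minimal sketch. It is not lightbeam's implementation: `apply_overrides` is a hypothetical helper, and it leaves values as the strings received from the command line, since how lightbeam converts types is not shown here.

```python
def apply_overrides(config, pairs):
    """Apply ["key", "value", "key", "value", ...] overrides to a nested dict."""
    if len(pairs) % 2 != 0:
        raise ValueError("--set must be followed by key-value pairs")
    # Walk the flat list two items at a time: (key, value), (key, value), ...
    for dotted_key, value in zip(pairs[0::2], pairs[1::2]):
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Create intermediate sections (e.g. `fetch`) if they are missing.
            node = node.setdefault(name, {})
        node[leaf] = value  # assumption: the value is kept as a string
    return config
```

For example, the pairs `fetch.page_size 1000` would replace the `page_size` entry inside the `fetch` section while leaving other settings untouched.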
265+
225266
226267
# Features
227268
This tool includes several special features:

lightbeam/validate.py

Lines changed: 33 additions & 17 deletions
Original file line number | Diff line number | Diff line change
@@ -39,7 +39,17 @@ def validate(self):
3939
if type(self.validation_methods)==str and (self.validation_methods=="*" or self.validation_methods.lower()=='all'):
4040
self.validation_methods = self.DEFAULT_VALIDATION_METHODS
4141
self.validation_methods.append("references")
42-
42+
self.validation_references_selector = self.lightbeam.config.get("validate",{}).get("references",{}).get("selector", [])
43+
for selector in self.validation_references_selector:
44+
if "." not in selector:
45+
self.logger.error(f"`config.validate.references.selector` {selector} is incorrectly formatted (should be `someEndpoint.someReference`, such as `studentSchoolAssociations.schoolReference`)")
46+
self.validation_references_behavior = self.lightbeam.config.get("validate",{}).get("references",{}).get("behavior", "exclude")
47+
if self.validation_references_behavior not in ["exclude", "include"]:
48+
self.logger.error("`config.validate.references.behavior` must be either `exclude` (default) or `include`")
49+
self.validation_references_remote = self.lightbeam.config.get("validate",{}).get("references",{}).get("remote", True)
50+
if "references" in self.validation_methods and not self.validation_references_remote:
51+
self.logger.info(f"(references will only be validated against local data, since `config.validate.references.remote: False`)")
52+
4353
self.lightbeam.api.load_swagger_docs()
4454
self.logger.info(f"validating by methods {self.validation_methods}...")
4555
if "descriptors" in self.validation_methods:
@@ -305,7 +315,7 @@ async def do_validate_payload(self, endpoint, file_name, data, line_number):
305315
# check references values are valid
306316
if "references" in self.validation_methods and "Descriptor" not in endpoint: # Descriptors have no references
307317
self.lightbeam.api.do_oauth()
308-
error_message = self.has_invalid_references(payload, path="")
318+
error_message = self.has_invalid_references(endpoint, payload, path="")
309319
if error_message != "":
310320
self.log_validation_error(endpoint, file_name, line_number, "references", error_message)
311321

@@ -399,40 +409,46 @@ def has_invalid_descriptor_values(self, payload, path=""):
399409
return ""
400410

401411
# Validates references for a single payload (returns an error message or empty string)
402-
def has_invalid_references(self, payload, path=""):
412+
def has_invalid_references(self, endpoint, payload, path=""):
403413
for k in payload.keys():
404414
if isinstance(payload[k], dict) and not k.endswith("Reference"):
405-
value = self.has_invalid_references(payload[k], path+("." if path!="" else "")+k)
415+
value = self.has_invalid_references(endpoint, payload[k], path+("." if path!="" else "")+k)
406416
if value!="": return value
407417
elif isinstance(payload[k], list):
408418
for i in range(0, len(payload[k])):
409-
value = self.has_invalid_references(payload[k][i], path+("." if path!="" else "")+k+"["+str(i)+"]")
419+
value = self.has_invalid_references(endpoint, payload[k][i], path+("." if path!="" else "")+k+"["+str(i)+"]")
410420
if value!="": return value
411421
elif isinstance(payload[k], dict) and k.endswith("Reference"):
422+
check_this_reference = (
423+
(f"{endpoint}.{path}{k}" in self.validation_references_selector and self.validation_references_behavior=="include")
424+
or (f"{endpoint}.{path}{k}" not in self.validation_references_selector and self.validation_references_behavior=="exclude")
425+
)
426+
if not check_this_reference: continue
412427
is_valid_reference = False
413428
original_endpoint = self.resolve_reference_to_endpoint(k)
414429

430+
params = payload[k].copy()
431+
if "link" in params.keys(): del params["link"]
432+
415433
# this deals with the fact that an educationOrganizationReference may be to a school, LEA, etc.:
416434
endpoints_to_check = self.EDFI_GENERICS_TO_RESOURCES_MAPPING.get(original_endpoint, [original_endpoint])
417-
for endpoint in endpoints_to_check:
435+
for endpt in endpoints_to_check:
418436
# check if it's a local reference:
419-
if endpoint not in self.local_reference_cache.keys(): break
437+
if endpt not in self.local_reference_cache.keys(): break
420438
# construct cache_key for reference
421-
cache_key = self.get_cache_key(payload[k])
422-
if cache_key in self.local_reference_cache[endpoint]:
439+
cache_key = self.get_cache_key(params)
440+
if cache_key in self.local_reference_cache[endpt]:
423441
is_valid_reference = True
424442
break
425-
if not is_valid_reference: # not found in local data...
426-
for endpoint in endpoints_to_check:
443+
if not is_valid_reference and self.validation_references_remote: # not found in local data...
444+
for endpt in endpoints_to_check:
427445
# check if it's a remote reference:
428-
params = payload[k].copy()
429-
if "link" in params.keys(): del params["link"]
430-
value = self.remote_reference_exists(endpoint, params)
446+
value = self.remote_reference_exists(endpt, params)
431447
if value:
432448
is_valid_reference = True
433449
break
434-
if not is_valid_reference:
435-
return f"payload contains an invalid {k} " + (" (at "+path+"): " if path!="" else ": ") + json.dumps(params)
450+
if not is_valid_reference:
451+
return f"payload contains an invalid {k} " + (" (at "+path+"): " if path!="" else ": ") + json.dumps(params)
436452
return ""
437453

438454
@staticmethod
@@ -497,7 +513,7 @@ def remote_reference_exists(self, endpoint, params):
497513
else:
498514
pass # await asyncio.sleep(1)
499515
curr_token_version = int(str(self.lightbeam.token_version))
500-
elif status=='404':
516+
elif status=='404' or status=='400':
501517
return False
502518
elif status in ['200', '201']:
503519
# 200 response might still return zero matching records...
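The local-then-remote lookup order introduced in `has_invalid_references`, including the new `remote` switch, can be summarized with a simplified sketch. This is not the code from the commit: `reference_is_valid` and the `remote_lookup` callable are hypothetical, the cache key is a plain tuple rather than whatever `get_cache_key` produces, and reference values are assumed to be flat (hashable).

```python
def reference_is_valid(reference, endpoints, local_cache, remote_lookup=None, remote=True):
    """Check one reference against local data first, then (optionally) remotely."""
    # Drop `link`, which is not part of the reference's identifying values.
    params = {k: v for k, v in reference.items() if k != "link"}
    # Assumption: a sorted tuple of items serves as the cache key.
    cache_key = tuple(sorted(params.items()))
    # 1. Look for the reference in locally loaded data.
    for endpoint in endpoints:
        if cache_key in local_cache.get(endpoint, set()):
            return True
    # 2. Fall back to the remote API only if remote validation is enabled.
    if remote and remote_lookup is not None:
        return any(remote_lookup(endpoint, params) for endpoint in endpoints)
    return False
```

Passing several `endpoints` mirrors the case noted in the diff where a generic reference such as `educationOrganizationReference` may resolve to a school, an LEA, and so on.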
