Conversation
edfi_api_client/edfi_endpoint.py
Outdated
|
|
||
|
|
||
| def get(self, limit: Optional[int] = None, *, params: Optional[dict] = None, **kwargs) -> List[dict]: | ||
| def get(self, url: Optional[str] = None, limit: Optional[int] = None, *, params: Optional[dict] = None, **kwargs) -> requests.Response: |
There was a problem hiding this comment.
We should keep the same List[dict] output as the current code.
Could we instead allow the token argument to be passed into this method?
There was a problem hiding this comment.
Option 1: Call session.get_response() separately in get and get_pages. (decouple)
Option 2: Pass token as an optional argument to get. If passed, change output type from List[dict] to Union[List[dict], Tuple[List[dict], Optional[str]]].
There was a problem hiding this comment.
I went with option 1 in my most recent updates for more output predictability and consistency + break get_pages into get_pages_offset and get_pages_cursor. I do recognize that there's a bit of repetition between the 2 getters, but since cursor-paging has different API requirements, headers used, compatibility rules, and fallback behaviors than off-set paging, I feel like this decomposition is justified. If you have any code design suggestion, that'd be great!
edfi_api_client/edfi_endpoint.py
Outdated
|
|
||
|
|
||
| elif cursor_paging and partitioning: | ||
| ods_version = tuple(map(int, self.client.get_ods_version().split(".")[:2])) |
There was a problem hiding this comment.
We should move the version-check to the top of the method and fail immediately if any disallowed arguments are passed.
There was a problem hiding this comment.
I guess with the bifurcation of get_pages(), our notes about disallowed arguments don't apply for now? I did put some built-in checkpoints for get_pages_cursor so that:
- If ODS version is not 7.3, the paging method will switch to reverse-offset (Up to debate whether this should be the desired behavior or if it should error out completely)
- Since deletes and key changes are not supported by cursor paging, if the client tries to fetch them via get_pages_cursor() → throws warning and switch to reverse-offset (Again, up to debate if we should require more manual input from client if their requests are incompatible with the new feature)
edfi_api_client/edfi_endpoint.py
Outdated
| token = result.headers.get("Next-Page-Token") | ||
| return results | ||
|
|
||
| results = [element for sublist in Parallel(n_jobs=len(paged_tokens), backend="threading")(delayed(partitioning_with_token)(token) for token in paged_tokens) for element in sublist] |
There was a problem hiding this comment.
I'll need to dive deeper into this approach. There is a future where we enable async in the edfi_api_client and will need this method to be compatible.
jayckaiser
left a comment
There was a problem hiding this comment.
I've left a bunch of comments for things to change, most minor. I want to make sure that we:
- maintain backwards compatibility
- minimize code-hopping
I'd also request going through and making sure whitespace is kept the same between this branch and main, just to minimize the number of lines changed.
edfi_api_client/edfi_endpoint.py
Outdated
| paged_params['limit'] = limit | ||
|
|
||
| # Fall back to reverse-offset paging if incompatible with cursor paging | ||
| def _fall_back_to_pages_by_offset(): |
There was a problem hiding this comment.
I'd rather put all conditional logic on choosing the internal helper method into get_rows() instead of defining a fallback. The choice of pagination should be made and declared immediately.
edfi_api_client/edfi_endpoint.py
Outdated
|
|
||
| ### Prepare pagination variables | ||
| ### First request should not have any `page_token` and `page_size` defined | ||
| paged_params.init_page_by_token(page_token = None, page_size = None) |
There was a problem hiding this comment.
Page_size should be defined regardless, since we always set a default value.
edfi_api_client/edfi_endpoint.py
Outdated
|
|
||
| ### Prepare pagination variables | ||
| ### First request should not have any `page_token` and `page_size` defined | ||
| paged_params.init_page_by_token(page_token = None, page_size = None) |
There was a problem hiding this comment.
Let's consolidate 317 and 332 and put it at the start of the while-loop. We can define page_token = None above the loop for the first pass.
edfi_api_client/edfi_endpoint.py
Outdated
| """ | ||
| # Override init params if passed | ||
| paged_params = EdFiParams(params or self.params).copy() | ||
| logging.info(f"[Get {self.component}] Endpoint: {self.url}") |
There was a problem hiding this comment.
We can remove this logging statement.
jayckaiser
left a comment
There was a problem hiding this comment.
@gnguyen87 A couple last notes as I review this for release.
| self.page_size = None | ||
| self.change_version_step_size = None | ||
| self.page_token = None | ||
| self.number = None |
There was a problem hiding this comment.
Where is this number attribute used?
| while True: | ||
| logger.info(f"[Get {self.component}] Parameters: {paged_params}") | ||
|
|
||
| result = self.client.session.get_response(self.url, params=paged_params, **kwargs) |
There was a problem hiding this comment.
@gnguyen87 One last question before we approve and merge this to main: what happens on the first pull before we set page-size? Does the first page retrieve the API default limit of records, or are no records returned until pageSize is explicitly set? If the former, can the first request include the pageSize parameter for consistency?
No description provided.