Adjusting data downloader and subscriber to use earthaccess login methods#191
Adjusting data downloader and subscriber to use earthaccess login methods#191jpl-btlunsfo wants to merge 2 commits into
Conversation
…moving old podaac_access methods
| # params[i] = ("token", token) | ||
| # #params['token'] = token | ||
| token = earthaccess.get_edl_token()["access_token"] | ||
| results = pa.get_search_results(params, args.verbose) |
There was a problem hiding this comment.
The 401 retry no longer refreshes anything — params still carries the stale token (Line 193 -237). Details below.
The commented-out loop was the key piece here. Since params is a list[tuple], the value CMR ultimately sees is the ("token", <token>) entry inside that list. Rebinding the local token variable alone does not update params, so the retry path still sends the same expired token and will likely hit another 401. The same pattern exists in podaac_data_downloader.py around lines 261–274 as well.
In addition, earthaccess.get_edl_token() appears to return an existing cached token in many cases, rather than guaranteeing a freshly minted one. The previous refresh_token() logic (delete_token + create_token) forced a refresh, and we'll still want that behavior to support automatic 401 retries.
To keep the retry flow working reliably, we could:
- re-authenticate (earthaccess.login(...))
- retrieve the refreshed token
- explicitly update the params list before retrying
for example:
except HTTPError as e:
if e.code == 401:
earthaccess.login(strategy="netrc")
token = earthaccess.get_edl_token()["access_token"]
params = [("token", token) if k == "token" else (k, v) for k, v in params]
results = pa.get_search_results(params, args.verbose)
else:
raise
Also worth noting: downstream calls such as get_cmr_collection_id() and create_citation_file() would still see the stale token after a downloader retry, since cmr_downloader currently only rebinds the local token variable. If we keep the refresh flow, it may help to propagate the updated token more consistently through the call path.
There was a problem hiding this comment.
Given the pretty common flow for login->get_edl_token (and ->update token in params), I moved this all into a different method in podaac_access rather than duplicate the code everywhere.
downstream calls such as get_cmr_collection_id() and create_citation_file()
Both of these call get_cmr_collections, so I worked the refresh_token flow into that method instead. I'm wondering if it would be optimal to move that same 401-check wrapping pa.get_search_results (when called in both _downloader and _subscriber) down into the search method itself?
There was a problem hiding this comment.
@viviant100 poking for another pass over the changes, re-requested review.
There was a problem hiding this comment.
@jpl-btlunsfo Yes, push it down. You've already done it for get_cmr_collections, so doing the same for get_search_results keeps the two CMR-query functions consistent and means callers never have to know about token-refresh semantics.
- moved token fetch back into podaac_access (still calling earthaccess), with a wrapper for updating param iterables - modified get_cmr_collections to allow refreshing token on 401 - added earthaccess login tests
earthaccesspackage, and adjusted python minimum requirementspodaac_data_{subscriber,downloader}.pyto use theearthaccessmodule's login methods and token fetching.podaac_accessmodule's old login and token methods, and the relevant pytest scripts.Should close #175, unless we want to make use of more of
earthaccess's search/download methods.