Commit ac1aab3

Issues/37 Add function for returning an iterator instead of sequence (#91)
* Added function for returning an iterator instead of a sequence. Updated unit tests
* fix botched merge
* Update CHANGELOG.md

Co-authored-by: Chuck Daniels <chuck@developmentseed.org>

---------

Co-authored-by: Chuck Daniels <chuck@developmentseed.org>
1 parent 9936bfe commit ac1aab3

13 files changed

Lines changed: 32626 additions & 10207 deletions

CHANGELOG.md

Lines changed: 8 additions & 0 deletions
@@ -8,6 +8,14 @@ adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ## [Unreleased]
 
+### Added
+
+- Add method `Query.results` for returning results as an iterator instead of sequence ([#37](https://github.com/nasa/python_cmr/issues/37))
+
+### Changed
+
+- Deprecate methods `Query.get` and `Query.get_all` in favor of the new `Query.results` method. These deprecated methods will likely be removed for the 1.0.0 release. ([#37](https://github.com/nasa/python_cmr/issues/37))
+
 ## [0.13.0]
 
 ### Added
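
As a rough illustration of the migration this changelog entry implies, the sketch below shows how a caller of the deprecated `Query.get(limit)` or `Query.get_all()` might build the same lists from an iterator. Both `fake_results` and `first_n` are hypothetical names introduced here, not part of python_cmr; `fake_results` only stands in for `query.results()` so that the snippet runs without network access.

```python
from itertools import islice
from typing import Any, Iterator, List


def first_n(results: Iterator[Any], limit: int) -> List[Any]:
    """Collect at most `limit` items from an iterator (hypothetical helper)."""
    return list(islice(results, limit))


def fake_results() -> Iterator[Any]:
    """Hypothetical stand-in for `query.results()`; yields items lazily."""
    for i in range(10):
        yield {"id": f"G{i}"}


# Roughly what `query.get(3)` used to return, now taken from an iterator.
first_three = first_n(fake_results(), 3)

# Roughly what `query.get_all()` used to return.
everything = list(fake_results())
```

Note that, per the new method's docstring, non-JSON formats yield whole pages of text rather than individual results, so a limit applied this way would count pages in that case.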

cmr/queries.py

Lines changed: 59 additions & 2 deletions
@@ -7,6 +7,8 @@
 from datetime import date, datetime, timezone
 from inspect import getmembers, ismethod
 from re import search
+from typing import Iterator
+
 from typing_extensions import (
     Any,
     List,
@@ -20,7 +22,7 @@
     Tuple,
     TypeAlias,
     Union,
-    override,
+    override, deprecated,
 )
 from urllib.parse import quote
 
@@ -58,6 +60,7 @@ def __init__(self, route: str, mode: str = CMR_OPS):
         self.concept_id_chars: Set[str] = set()
         self.headers: MutableMapping[str, str] = {}
 
+    @deprecated("Use the 'results' method instead, but note that it produces an iterator.")
     def get(self, limit: int = 2000) -> Sequence[Any]:
         """
         Get all results up to some limit, even if spanning multiple pages.
@@ -115,6 +118,7 @@ def hits(self) -> int:
 
         return int(response.headers["CMR-Hits"])
 
+    @deprecated("Use the 'results' method instead, but note that it produces an iterator.")
     def get_all(self) -> Sequence[Any]:
         """
         Returns all of the results for the query. This will call hits() first to determine how many
@@ -123,8 +127,61 @@ def get_all(self) -> Sequence[Any]:
 
         :returns: query results as a list
         """
+
+        return list(self.get(self.hits()))
+
+    def results(self, page_size: int = 2000) -> Iterator[Any]:
+        """
+        Return an iterator (generator) of all results matching the query
+        criteria.
+
+        Because a query may produce a large number of results (perhaps
+        10s or 100s of thousands), such results are fetched using
+        multiple CMR requests, each returning a "page" of results, as
+        returning all results in a single request would be impractical.
+        The size of each page (in terms of the number of results
+        in a page) is controlled by the `page_size` parameter. A smaller
+        page size means fewer items in memory (per page), requiring
+        more CMR queries to fetch all results (if desired). Conversely,
+        a larger page size means more items in memory (per page)
+        and fewer CMR queries.
+
+        When the query is configured to use the `"json"` format, each
+        element produced by the returned iterator is a element of the
+        "feed.entry" array (see
+        <https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#json>).
+        In this case, the iterator may produce as many elements as there
+        are results matching the query criteria.
+
+        For all other formats, each element produced by the returned
+        iterator is an unparsed (text) page of results (i.e., the caller
+        is responsible for parsing the page of results into individual
+        elements). In this case, the iterator will produce only as many
+        pages as required (based on `page_size`) to produce all results
+        matching the query criteria.
+
+        :param page_size: maximum number of results per page (min 1,
+            max 2000 [default]) requested from the CMR
+        :returns: query results as an iterator (generator)
+        """
+
+        url = self._build_url()
+        headers = dict(self.headers or {})
+        params = {"page_size": min(max(1, page_size), 2000)}
+
+        while True:
+            response = requests.get(url, headers=headers, params=params)
+            response.raise_for_status()
+
+            if self._format == "json":
+                yield from response.json()["feed"]["entry"]
+            else:
+                yield response.text
+
+            if not (cmr_search_after := response.headers.get("cmr-search-after")):
+                break
 
-        return self.get(self.hits())
+            headers["cmr-search-after"] = cmr_search_after
 
     def parameters(self, **kwargs: Any) -> Self:
         """

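The paging loop added in `Query.results` can be sketched in isolation, with the HTTP call swapped for an injected function so the control flow can be exercised offline. This is a minimal sketch under stated assumptions, not the library's implementation: `paginate`, `make_fake_fetch`, and the `Fetch` alias are hypothetical names, and the fake token is simply the index of the next page rather than a real `cmr-search-after` value. It also uses plain dictionaries with lowercase keys, whereas `requests` response headers are case-insensitive.

```python
from typing import Any, Callable, Dict, Iterator, List, Mapping, Sequence, Tuple

# Hypothetical fetch signature: request headers in, (items, response headers) out.
Fetch = Callable[[Mapping[str, str]], Tuple[Sequence[Any], Mapping[str, str]]]


def paginate(fetch: Fetch) -> Iterator[Any]:
    """Yield items page by page, following the search-after token until it is absent."""
    headers: Dict[str, str] = {}
    while True:
        items, response_headers = fetch(headers)
        yield from items
        token = response_headers.get("cmr-search-after")
        if not token:
            break  # no token in the response: treat this as the last page
        headers["cmr-search-after"] = token


def make_fake_fetch(pages: List[List[Any]]) -> Tuple[Fetch, List[str]]:
    """Build an in-memory stand-in for the HTTP request (assumption: token = next page index)."""
    seen_tokens: List[str] = []

    def fetch(headers: Mapping[str, str]) -> Tuple[Sequence[Any], Mapping[str, str]]:
        token = headers.get("cmr-search-after", "0")
        seen_tokens.append(token)
        index = int(token)
        next_index = index + 1
        response_headers = (
            {"cmr-search-after": str(next_index)} if next_index < len(pages) else {}
        )
        return pages[index], response_headers

    return fetch, seen_tokens


fetch, seen_tokens = make_fake_fetch([[1, 2], [3, 4], [5]])
items = paginate(fetch)  # creating the generator performs no fetch yet
calls_before_iteration = len(seen_tokens)
collected = list(items)  # iterating drives one fetch per page
```

Because the function is a generator, no request is made until the caller starts iterating, and a caller who stops early never pays for the remaining pages. That laziness is the main behavioral difference from the deprecated list-returning methods.
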
tests/fixtures/vcr_cassettes/MOD02QKM_2000.yaml renamed to tests/fixtures/vcr_cassettes/TestMultipleQueries.test_get.yaml

Lines changed: 4015 additions & 4013 deletions
Large diffs are not rendered by default.

tests/fixtures/vcr_cassettes/TELLUS_GRAC.yaml renamed to tests/fixtures/vcr_cassettes/TestMultipleQueries.test_get_all_less_than_2k.yaml

Lines changed: 26 additions & 22 deletions
@@ -8,13 +8,11 @@ interactions:
       - gzip, deflate
       Connection:
       - keep-alive
-      User-Agent:
-      - python-requests/2.31.0
     method: GET
     uri: https://cmr.earthdata.nasa.gov/search/granules.json?short_name=TELLUS_GRAC_L3_JPL_RL06_LND_v04&page_size=0
   response:
     body:
-      string: '{"feed":{"updated":"2023-08-14T17:02:36.801Z","id":"https://cmr.earthdata.nasa.gov:443/search/granules.json?short_name=TELLUS_GRAC_L3_JPL_RL06_LND_v04&page_size=0","title":"ECHO
+      string: '{"feed":{"updated":"2024-09-24T21:02:25.663Z","id":"https://cmr.earthdata.nasa.gov:443/search/granules.json?short_name=TELLUS_GRAC_L3_JPL_RL06_LND_v04&page_size=0","title":"ECHO
         granule metadata","entry":[]}}'
     headers:
       Access-Control-Allow-Origin:
@@ -25,37 +23,41 @@ interactions:
       CMR-Hits:
       - '163'
       CMR-Request-Id:
-      - 5855d714-8aff-4d0f-b4cc-e556f02ef96a
+      - a22de087-6217-4845-b19f-734cad960bce
       CMR-Took:
-      - '52'
+      - '275'
       Connection:
       - keep-alive
+      Content-MD5:
+      - 4c37cbd504ace09da5a3997968626ea5
+      Content-SHA1:
+      - 5416d1b30f3c052f18b3d65dc42b5ce01d69739c
       Content-Type:
       - application/json;charset=utf-8
       Date:
-      - Mon, 14 Aug 2023 17:02:36 GMT
+      - Tue, 24 Sep 2024 21:02:25 GMT
       Server:
       - ServerTokens ProductOnly
       Strict-Transport-Security:
-      - max-age=31536000
+      - max-age=31536000; includeSubDomains; preload
       Transfer-Encoding:
       - chunked
       Vary:
       - Accept-Encoding, User-Agent
       Via:
-      - 1.1 cc58556a6e846289f4d3105969536e4c.cloudfront.net (CloudFront)
+      - 1.1 4208ca8c7c521bdbe71d5b0a82523074.cloudfront.net (CloudFront)
       X-Amz-Cf-Id:
-      - qj9VuAc1JQu-rnMVDg3mGwstR-jGQA4rd7MKVRAEpXeTDbKZT5p5jg==
+      - 5SXRX7_nObc-1MC-o1YvNCocGFg33JIsTWClbrYt375rj1K5YAqxpg==
       X-Amz-Cf-Pop:
-      - SFO53-C1
+      - LAX50-C1
       X-Cache:
       - Miss from cloudfront
       X-Content-Type-Options:
       - nosniff
       X-Frame-Options:
       - SAMEORIGIN
       X-Request-Id:
-      - qj9VuAc1JQu-rnMVDg3mGwstR-jGQA4rd7MKVRAEpXeTDbKZT5p5jg==
+      - 5SXRX7_nObc-1MC-o1YvNCocGFg33JIsTWClbrYt375rj1K5YAqxpg==
       X-XSS-Protection:
       - 1; mode=block
       content-length:
@@ -72,13 +74,11 @@ interactions:
       - gzip, deflate
       Connection:
       - keep-alive
-      User-Agent:
-      - python-requests/2.31.0
     method: GET
     uri: https://cmr.earthdata.nasa.gov/search/granules.json?short_name=TELLUS_GRAC_L3_JPL_RL06_LND_v04&page_size=163
   response:
     body:
-      string: '{"feed":{"updated":"2023-08-14T17:02:40.416Z","id":"https://cmr.earthdata.nasa.gov:443/search/granules.json?short_name=TELLUS_GRAC_L3_JPL_RL06_LND_v04&page_size=163","title":"ECHO
+      string: '{"feed":{"updated":"2024-09-24T21:02:26.149Z","id":"https://cmr.earthdata.nasa.gov:443/search/granules.json?short_name=TELLUS_GRAC_L3_JPL_RL06_LND_v04&page_size=163","title":"ECHO
         granule metadata","entry":[{"boxes":["-89.5 0.5 89.5 180","-89.5 -180 89.5
         -0.5"],"time_start":"2002-04-04T00:00:00.000Z","updated":"2023-04-17T15:27:21.022Z","dataset_id":"JPL
         TELLUS GRACE Level-3 Monthly Land Water-Equivalent-Thickness Surface Mass
@@ -2045,39 +2045,43 @@ interactions:
       CMR-Hits:
       - '163'
       CMR-Request-Id:
-      - 60eb29b2-95e1-453c-8efe-6e59cf649eb5
+      - a2563e97-3528-4d5a-9849-4dd17e703065
       CMR-Search-After:
       - '["pocloud",1495497600000,2658328520]'
       CMR-Took:
-      - '4959'
+      - '332'
       Connection:
       - keep-alive
+      Content-MD5:
+      - 5199240b27f047a4f32ada5932c96a1b
+      Content-SHA1:
+      - f74df0ad201ceea97aaae8228a500c0392625b11
       Content-Type:
       - application/json;charset=utf-8
       Date:
-      - Mon, 14 Aug 2023 17:02:42 GMT
+      - Tue, 24 Sep 2024 21:02:26 GMT
       Server:
       - ServerTokens ProductOnly
       Strict-Transport-Security:
-      - max-age=31536000
+      - max-age=31536000; includeSubDomains; preload
       Transfer-Encoding:
       - chunked
       Vary:
       - Accept-Encoding, User-Agent
       Via:
-      - 1.1 44933b72098305e9c31fc50b2e6554a0.cloudfront.net (CloudFront)
+      - 1.1 924eb6575c2679d663c17bd1e792d09a.cloudfront.net (CloudFront)
       X-Amz-Cf-Id:
-      - 9TJ3JRMGc6mUxKegR4f2HSLC_1Cfwei5QHZuicg_aLsWEJS3T6XCNg==
+      - dMoCuQG8wTJ2-UTtNbyJ-2SJPskyH6fndTldIhVmZodoXMVr1BugXg==
       X-Amz-Cf-Pop:
-      - SFO53-C1
+      - LAX50-C1
       X-Cache:
       - Miss from cloudfront
       X-Content-Type-Options:
       - nosniff
       X-Frame-Options:
       - SAMEORIGIN
       X-Request-Id:
-      - 9TJ3JRMGc6mUxKegR4f2HSLC_1Cfwei5QHZuicg_aLsWEJS3T6XCNg==
+      - dMoCuQG8wTJ2-UTtNbyJ-2SJPskyH6fndTldIhVmZodoXMVr1BugXg==
       X-XSS-Protection:
       - 1; mode=block
       content-length:
