Commit 81dc0b5

committed
update hydroshare wq example
1 parent 12320f3 commit 81dc0b5

1 file changed

Lines changed: 137 additions & 71 deletions

File tree

demos/hydroshare/USGS_dataretrieval_WaterSamples_Examples.ipynb

@@ -10,7 +10,7 @@
1010
"source": [
1111
"# USGS dataretrieval Python Package `get_usgs_samples()` Examples\n",
1212
"\n",
13-
"This notebook provides examples of using the Python dataretrieval package to retrieve water quality sample data for United States Geological Survey (USGS) monitoring sites. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA)."
13+
"This notebook provides examples of using the Python dataretrieval package to retrieve water quality sample data for United States Geological Survey (USGS) monitoring sites. The dataretrieval package provides a collection of functions to get data from the USGS Samples database and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA)."
1414
]
1515
},
1616
{
@@ -60,7 +60,7 @@
6060
},
6161
"outputs": [],
6262
"source": [
63-
"from dataretrieval import nwis\n",
63+
"from dataretrieval import samples\n",
6464
"from IPython.display import display"
6565
]
6666
},
@@ -70,16 +70,119 @@
7070
"source": [
7171
"### Basic Usage\n",
7272
"\n",
73-
"The dataretrieval package has several functions that allow you to retrieve data from different web services. This examples uses the `get_qwdata()` function to retrieve water quality sample data for USGS monitoring sites from NWIS. The following arguments are supported:\n",
73+
"The dataretrieval package has several functions that allow you to retrieve data from different web services. This example uses the `get_usgs_samples()` function to retrieve water quality sample data for USGS monitoring sites from the USGS Samples database. The following arguments are supported:\n",
7474
"\n",
75-
"Arguments (Additional arguments, if supplied, will be used as query parameters)\n",
76-
"\n",
77-
"* **sites** (string or list of strings): A list of USGS site identifiers for which to retrieve data. If the qwdata parameter site_no is supplied, it will overwrite the sites parameter.\n",
78-
"* **parameterCd** (string or list of strings): A list of USGS parameter codes for which to retrieve data.\n",
79-
"* **start** (string): The beginning date for a period for which to retrieve data. If the qwdata parameter begin_date is supplied, it will overwrite the start parameter.\n",
80-
"* **end** (string): The ending date for a period for which to retrieve data. If the qwdata parameter end_date is supplied, it will overwrite the end parameter.\n",
81-
"* **datetime_index** (boolean): If True, create a datetime index\n",
82-
"* **wide_format** (boolean): If True, return data in wide format with multiple samples per row and one row per time."
75+
"* **ssl_check** : boolean, optional\n",
76+
" Check the SSL certificate.\n",
77+
"* **service** : string\n",
78+
" One of the available Samples services: \"results\", \"locations\", \"activities\",\n",
79+
" \"projects\", or \"organizations\". Defaults to \"results\".\n",
80+
"* **profile** : string\n",
81+
" One of the available profiles associated with a service. Options for each\n",
82+
" service are:\n",
83+
" results - \"fullphyschem\", \"basicphyschem\",\n",
84+
" \"fullbio\", \"basicbio\", \"narrow\",\n",
85+
" \"resultdetectionquantitationlimit\",\n",
86+
" \"labsampleprep\", \"count\"\n",
87+
" locations - \"site\", \"count\"\n",
88+
" activities - \"sampact\", \"actmetric\",\n",
89+
" \"actgroup\", \"count\"\n",
90+
" projects - \"project\", \"projectmonitoringlocationweight\"\n",
91+
" organizations - \"organization\", \"count\"\n",
92+
"* **activityMediaName** : string or list of strings, optional\n",
93+
" Name or code indicating environmental medium in which sample was taken.\n",
94+
" Check the `activityMediaName_lookup()` function in this module for all\n",
95+
" possible inputs.\n",
96+
" Example: \"Water\".\n",
97+
"* **activityStartDateLower** : string, optional\n",
98+
" The start date if using a date range. Takes the format YYYY-MM-DD.\n",
99+
" The logic is inclusive, i.e. it will also return results that\n",
100+
" match the date. If left as None, will pull all data on or before\n",
101+
" activityStartDateUpper, if populated.\n",
102+
"* **activityStartDateUpper** : string, optional\n",
103+
" The end date if using a date range. Takes the format YYYY-MM-DD.\n",
104+
" The logic is inclusive, i.e. it will also return results that\n",
105+
" match the date. If left as None, will pull all data after\n",
106+
" activityStartDateLower up to the most recent available results.\n",
107+
"* **activityTypeCode** : string or list of strings, optional\n",
108+
" Text code that describes type of field activity performed.\n",
109+
" Example: \"Sample-Routine, regular\".\n",
110+
"* **characteristicGroup** : string or list of strings, optional\n",
111+
" Characteristic group is a broad category of characteristics\n",
112+
" describing one or more results. Check the `characteristicGroup_lookup()`\n",
113+
" function in this module for all possible inputs.\n",
114+
" Example: \"Organics, PFAS\"\n",
115+
"* **characteristic** : string or list of strings, optional\n",
116+
" Characteristic is a specific category describing one or more results.\n",
117+
" Check the `characteristic_lookup()` function in this module for all\n",
118+
" possible inputs.\n",
119+
" Example: \"Suspended Sediment Discharge\"\n",
120+
"* **characteristicUserSupplied** : string or list of strings, optional\n",
121+
" A user supplied characteristic name describing one or more results.\n",
122+
"* **boundingBox**: list of four floats, optional\n",
123+
"    Filters on the associated monitoring location's point location\n",
124+
" by checking if it is located within the specified geographic area. \n",
125+
" The logic is inclusive, i.e. it will include locations that overlap\n",
126+
" with the edge of the bounding box. Values are separated by commas,\n",
127+
" expressed in decimal degrees, NAD83, and longitudes west of Greenwich\n",
128+
" are negative.\n",
129+
"    The list consists of, in order:\n",
130+
" - Western-most longitude\n",
131+
" - Southern-most latitude\n",
132+
" - Eastern-most longitude\n",
133+
"      - Northern-most latitude\n",
134+
" Example: [-92.8,44.2,-88.9,46.0]\n",
135+
"* **countryFips** : string or list of strings, optional\n",
136+
" Example: \"US\" (United States)\n",
137+
"* **stateFips** : string or list of strings, optional\n",
138+
" Check the `stateFips_lookup()` function in this module for all\n",
139+
" possible inputs.\n",
140+
" Example: \"US:15\" (United States: Hawaii)\n",
141+
"* **countyFips** : string or list of strings, optional\n",
142+
" Check the `countyFips_lookup()` function in this module for all\n",
143+
" possible inputs.\n",
144+
" Example: \"US:15:001\" (United States: Hawaii, Hawaii County)\n",
145+
"* **siteTypeCode** : string or list of strings, optional\n",
146+
" An abbreviation for a certain site type. Check the `siteType_lookup()`\n",
147+
" function in this module for all possible inputs.\n",
148+
" Example: \"GW\" (Groundwater site)\n",
149+
"* **siteTypeName** : string or list of strings, optional\n",
150+
" A full name for a certain site type. Check the `siteType_lookup()`\n",
151+
" function in this module for all possible inputs.\n",
152+
" Example: \"Well\"\n",
153+
"* **usgsPCode** : string or list of strings, optional\n",
154+
" 5-digit number used in the US Geological Survey computerized\n",
155+
" data system, National Water Information System (NWIS), to\n",
156+
" uniquely identify a specific constituent. Check the \n",
157+
" `characteristic_lookup()` function in this module for all possible\n",
158+
" inputs.\n",
159+
" Example: \"00060\" (Discharge, cubic feet per second)\n",
160+
"* **hydrologicUnit** : string or list of strings, optional\n",
161+
" Max 12-digit number used to describe a hydrologic unit.\n",
162+
" Example: \"070900020502\"\n",
163+
"* **monitoringLocationIdentifier** : string or list of strings, optional\n",
164+
" A monitoring location identifier has two parts: the agency code\n",
165+
" and the location number, separated by a dash (-).\n",
166+
" Example: \"USGS-040851385\"\n",
167+
"* **organizationIdentifier** : string or list of strings, optional\n",
168+
" Designator used to uniquely identify a specific organization.\n",
169+
" Currently only accepting the organization \"USGS\".\n",
170+
"* **pointLocationLatitude** : float, optional\n",
171+
" Latitude for a point/radius query (decimal degrees). Must be used\n",
172+
" with pointLocationLongitude and pointLocationWithinMiles.\n",
173+
"* **pointLocationLongitude** : float, optional\n",
174+
" Longitude for a point/radius query (decimal degrees). Must be used\n",
175+
" with pointLocationLatitude and pointLocationWithinMiles.\n",
176+
"* **pointLocationWithinMiles** : float, optional\n",
177+
" Radius for a point/radius query. Must be used with\n",
178+
"    pointLocationLatitude and pointLocationLongitude.\n",
179+
"* **projectIdentifier** : string or list of strings, optional\n",
180+
" Designator used to uniquely identify a data collection project. Project\n",
181+
" identifiers are specific to an organization (e.g. USGS).\n",
182+
" Example: \"ZH003QW03\"\n",
183+
"* **recordIdentifierUserSupplied** : string or list of strings, optional\n",
184+
" Internal AQS record identifier that returns 1 entry. Only available\n",
185+
" for the \"results\" service."
83186
]
84187
},
85188
{
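The `boundingBox` description in the hunk above fixes an ordering (west, south, east, north) that is easy to get wrong. A minimal sketch of a pre-flight check, assuming a hypothetical helper named `validate_bounding_box` that is not part of dataretrieval, and assuming the box does not cross the antimeridian:

```python
from typing import Sequence


def validate_bounding_box(bbox: Sequence[float]) -> list[float]:
    """Check a [west, south, east, north] box given in decimal degrees.

    Hypothetical helper for illustration; not part of dataretrieval.
    """
    if len(bbox) != 4:
        raise ValueError("boundingBox needs exactly four values")
    west, south, east, north = (float(v) for v in bbox)
    if not (-180.0 <= west <= 180.0 and -180.0 <= east <= 180.0):
        raise ValueError("longitudes must lie between -180 and 180")
    if not (-90.0 <= south <= 90.0 and -90.0 <= north <= 90.0):
        raise ValueError("latitudes must lie between -90 and 90")
    # Assumption: the box does not wrap around the antimeridian.
    if west >= east or south >= north:
        raise ValueError("expected west < east and south < north")
    return [west, south, east, north]


# The example box from the argument list above.
bbox = validate_bounding_box([-92.8, 44.2, -88.9, 46.0])
```

The validated list could then be passed as `boundingBox=bbox` to `samples.get_usgs_samples()`. Swapping the two longitudes raises a `ValueError` locally instead of sending a malformed query.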
@@ -103,8 +206,8 @@
103206
},
104207
"outputs": [],
105208
"source": [
106-
"siteID = '10109000'\n",
107-
"wq_data = nwis.get_qwdata(sites=siteID)\n",
209+
"siteID = 'USGS-10109000'\n",
210+
"wq_data = samples.get_usgs_samples(monitoringLocationIdentifier=siteID)\n",
108211
"print('Retrieved data for ' + str(len(wq_data[0])) + ' samples.')"
109212
]
110213
},
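The change above swaps a bare NWIS site number for an agency-prefixed identifier (`'10109000'` becomes `'USGS-10109000'`). For notebooks that still carry lists of bare site numbers, a small conversion helper may ease the migration. This is a sketch only; `to_monitoring_location_id` is a hypothetical name and not a dataretrieval function:

```python
def to_monitoring_location_id(site_no: str, agency: str = "USGS") -> str:
    """Return '<agency>-<site number>' for a bare site number.

    Hypothetical helper for illustration; not part of dataretrieval.
    Identifiers that already carry the agency prefix are returned unchanged.
    """
    site_no = str(site_no).strip()
    prefix = agency + "-"
    if site_no.startswith(prefix):
        return site_no
    if not site_no.isdigit():
        raise ValueError(f"unexpected site number: {site_no!r}")
    return prefix + site_no


# Site numbers taken from the examples in this notebook.
legacy_sites = ["04024430", "04024000"]
site_ids = [to_monitoring_location_id(s) for s in legacy_sites]
print(site_ids)
```

Keeping site numbers as strings matters here, since leading zeros such as the one in `04024430` would be lost if the value were stored as an integer.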
@@ -114,7 +217,7 @@
114217
"source": [
115218
"### Interpreting the Result\n",
116219
"\n",
117-
"The result of calling the `get_qwdata()` function is an object that contains a Pandas data frame object and an associated metadata object. The Pandas data frame contains the water quality sample data for the requested site, and or observed variables and time frame.\n",
220+
"The result of calling the `get_usgs_samples()` function is a tuple that contains a pandas data frame and an associated metadata object. The data frame contains the water quality sample data for the requested site(s), observed variables, and time frame.\n",
118221
"\n",
119222
"Once you've got the data frame, there are several useful things you can do to explore the data."
120223
]
@@ -127,7 +230,7 @@
127230
}
128231
},
129232
"source": [
130-
"Display the data frame as a table. The default data frame for this function is a wide, cross-tabulated table, with columns for each observed variable and a row for each sample date (wide_format=True)."
233+
"Display the data frame as a table. The default data frame for this function is a long, flat table, with a row for each observed variable at a given site and date/time."
131234
]
132235
},
133236
{
@@ -175,7 +278,7 @@
175278
"cell_type": "markdown",
176279
"metadata": {},
177280
"source": [
178-
"The other part of the result returned from the `get_qwdata()` function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses contain a descriptive header that defines and can be helpful in interpreting the contents of the response."
281+
"The other part of the result returned from the `get_usgs_samples()` function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses contain a descriptive header that describes the contents of the response and can be helpful in interpreting them."
179282
]
180283
},
181284
{
@@ -192,7 +295,7 @@
192295
},
193296
"outputs": [],
194297
"source": [
195-
"print('The query URL used to retrieve the data from NWIS was: ' + wq_data[1].url)"
298+
"print('The query URL used to retrieve the data from USGS Samples was: ' + wq_data[1].url)"
196299
]
197300
},
198301
{
@@ -218,27 +321,9 @@
218321
},
219322
"outputs": [],
220323
"source": [
221-
"site_ids = ['04024430', '04024000']\n",
324+
"site_ids = ['USGS-04024430', 'USGS-04024000']\n",
222325
"parameter_code = '00065'\n",
223-
"wq_multi_site = nwis.get_qwdata(sites=site_ids, parameterCd=parameter_code)\n",
224-
"print('Retrieved data for ' + str(len(wq_multi_site[0])) + ' samples.')\n",
225-
"display(wq_multi_site[0])"
226-
]
227-
},
228-
{
229-
"metadata": {},
230-
"cell_type": "markdown",
231-
"source": "The following example is the same as the previous example but with multi index turned off (multi_index=False)"
232-
},
233-
{
234-
"metadata": {},
235-
"cell_type": "code",
236-
"outputs": [],
237-
"execution_count": null,
238-
"source": [
239-
"site_ids = ['04024430', '04024000']\n",
240-
"parameter_code = '00065'\n",
241-
"wq_multi_site = nwis.get_qwdata(sites=site_ids, parameterCd=parameter_code, multi_index=False)\n",
326+
"wq_multi_site = samples.get_usgs_samples(monitoringLocationIdentifier=site_ids, usgsPCode=parameter_code)\n",
242327
"print('Retrieved data for ' + str(len(wq_multi_site[0])) + ' samples.')\n",
243328
"display(wq_multi_site[0])"
244329
]
@@ -251,7 +336,7 @@
251336
}
252337
},
253338
"source": [
254-
"#### Example 3: Retrieve water quality sample data for multiple sites, including a list of parameters, within a time period defined by start and end dates"
339+
"#### Example 3: Retrieve water quality sample data for multiple sites and a list of parameters, from a start date until the present"
255340
]
256341
},
257342
{
@@ -268,44 +353,22 @@
268353
},
269354
"outputs": [],
270355
"source": [
271-
"site_ids = ['04024430', '04024000']\n",
356+
"site_ids = ['USGS-04024430', 'USGS-04024000']\n",
272357
"parameterCd = ['34247', '30234', '32104', '34220']\n",
273358
"startDate = '2012-01-01'\n",
274-
"endDate = ''\n",
275-
"wq_data2 = nwis.get_qwdata(sites=site_ids, parameterCd=parameterCd,\n",
276-
" start=startDate, end=endDate)\n",
359+
"wq_data2 = samples.get_usgs_samples(monitoringLocationIdentifier=site_ids, usgsPCode=parameterCd,\n",
360+
" activityStartDateLower=startDate)\n",
277361
"print('Retrieved data for ' + str(len(wq_data2[0])) + ' samples.')\n",
278362
"display(wq_data2[0])\n"
279363
]
280364
},
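Example 3 relies on the behavior described in the argument list: both date bounds take `YYYY-MM-DD` strings, and an omitted upper bound means "up to the most recent results". A minimal sketch of building those keyword arguments with a local format check, assuming a hypothetical helper named `date_range_kwargs` that is not part of dataretrieval:

```python
from datetime import datetime
from typing import Optional


def date_range_kwargs(lower: Optional[str] = None,
                      upper: Optional[str] = None) -> dict:
    """Build date-range keyword arguments for a Samples query.

    Hypothetical helper for illustration; not part of dataretrieval.
    Bounds left as None are omitted, leaving that side of the range open.
    """
    parsed = {}
    kwargs = {}
    for name, value in (("activityStartDateLower", lower),
                        ("activityStartDateUpper", upper)):
        if value is None:
            continue
        # Raises ValueError if the string cannot be parsed as year-month-day.
        parsed[name] = datetime.strptime(value, "%Y-%m-%d")
        kwargs[name] = value
    if len(parsed) == 2 and (parsed["activityStartDateLower"]
                             > parsed["activityStartDateUpper"]):
        raise ValueError("lower bound is later than upper bound")
    return kwargs


# Open-ended range, as in Example 3: from 2012-01-01 until the present.
open_ended = date_range_kwargs(lower="2012-01-01")
print(open_ended)
```

The resulting dictionary could be unpacked into the call, for example `samples.get_usgs_samples(monitoringLocationIdentifier=site_ids, **open_ended)`.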
281-
{
282-
"metadata": {},
283-
"cell_type": "markdown",
284-
"source": "The following example is the same as the previous example but with multi index turned off (multi_index=False)"
285-
},
286-
{
287-
"metadata": {},
288-
"cell_type": "code",
289-
"outputs": [],
290-
"execution_count": null,
291-
"source": [
292-
"site_ids = ['04024430', '04024000']\n",
293-
"parameterCd = ['34247', '30234', '32104', '34220']\n",
294-
"startDate = '2012-01-01'\n",
295-
"endDate = ''\n",
296-
"wq_data2 = nwis.get_qwdata(sites=site_ids, parameterCd=parameterCd,\n",
297-
" start=startDate, end=endDate, multi_index=False)\n",
298-
"print('Retrieved data for ' + str(len(wq_multi_site[0])) + ' samples.')\n",
299-
"display(wq_data2[0])"
300-
]
301-
},
302365
{
303366
"cell_type": "markdown",
304367
"metadata": {},
305368
"source": [
306-
"#### Example 4: Retrieve water quality sample data for one site in serial format\n",
369+
"#### Example 4: Retrieve water quality sample data for one site and convert to a wide format\n",
307370
"\n",
308-
"Each row in the resulting table represents a single observation of a single parameters. Each sample may be analyzed for multiple parameters and so a single water quality sample can result in multiple rows in serial format."
371+
"Note that the USGS Samples database returns multiple parameters in a \"long\" format: each row in the resulting table represents a single observation of a single parameter. Furthermore, every observation has 181 fields of metadata. However, if you want to place your water quality data into a \"wide\" format, where each column represents a water quality parameter code, the code below details one solution."
309372
]
310373
},
311374
{
@@ -314,16 +377,19 @@
314377
"metadata": {},
315378
"outputs": [],
316379
"source": [
317-
"siteID = '10109000'\n",
318-
"wq_data = nwis.get_qwdata(sites=siteID, wide_format=False)\n",
319-
"print('Retrieved data for ' + str(len(wq_data[0])) + ' sample results.')\n",
320-
"display(wq_data[0])"
380+
"siteID = 'USGS-10109000'\n",
381+
"wq_data,_ = samples.get_usgs_samples(monitoringLocationIdentifier=siteID)\n",
382+
"print('Retrieved data for ' + str(len(wq_data)) + ' sample results.')\n",
383+
"\n",
384+
"wq_data[\"characteristic_unit\"] = wq_data[\"Result_Characteristic\"] + \", \" + wq_data[\"Result_MeasureUnit\"]\n",
385+
"wq_data_wide = wq_data.pivot_table(index=['Location_Identifier', 'Activity_StartDate', 'Activity_StartTime'], columns=\"characteristic_unit\", values=\"Result_Measure\", aggfunc='first')\n",
386+
"display(wq_data_wide)\n"
321387
]
322388
}
323389
],
324390
"metadata": {
325391
"kernelspec": {
326-
"display_name": "Python 3 (ipykernel)",
392+
"display_name": "dr-test",
327393
"language": "python",
328394
"name": "python3"
329395
},
@@ -337,7 +403,7 @@
337403
"name": "python",
338404
"nbconvert_exporter": "python",
339405
"pygments_lexer": "ipython3",
340-
"version": "3.9.7"
406+
"version": "3.11.12"
341407
}
342408
},
343409
"nbformat": 4,
