diff --git a/_episodes/03-starting-with-data.md b/_episodes/03-starting-with-data.md
index ecfa2f9fd..1fe993bd2 100644
--- a/_episodes/03-starting-with-data.md
+++ b/_episodes/03-starting-with-data.md
@@ -544,12 +544,12 @@ is much larger than the wave heights classified as 'windsea'.
 > 2. What happens when you group by two columns using the following syntax and
 > then calculate mean values?
 > - `grouped_data2 = waves_df.groupby(['Seastate', 'Quadrant'])`
-> - `grouped_data2.mean()`
+> - `grouped_data2.mean(numeric_only=True)`
 > 3. Summarize Temperature values for swell and windsea states in your data.
 >
 >> ## Solution
 >> 1. The most complete answer is `waves_df.groupby("Quadrant").count()["record_id"][["north", "west"]]` - note that we could use any column that has a value in every row - but given that `record_id` is our index for the dataset it makes sense to use that
->> 2. It groups by 2nd column _within_ the results of the 1st column, and then calculates the mean (n.b. depending on your version of python, you might need `grouped_data2.mean(numeric_only=True)`)
+>> 2. It groups by the 2nd column _within_ the results of the 1st column, and then calculates the mean (n.b. older versions of pandas might need `grouped_data2.mean()` without the `numeric_only=True` parameter)
 >> 3.
 >>
 >> ~~~
diff --git a/_episodes/04-data-types-and-format.md b/_episodes/04-data-types-and-format.md
index 0d7079699..f7ce34896 100644
--- a/_episodes/04-data-types-and-format.md
+++ b/_episodes/04-data-types-and-format.md
@@ -374,7 +374,7 @@ dates.apply(datetime.datetime.strftime, args=("%a",))
 {: .language-python}
 
 >## Watch out for tuples!
-> _Tuples_ are data structure similar to a list, but are _immutable_. They are created using parentheses, with items separated by commas:
+> _Tuples_ are a data structure similar to a list, but are _immutable_. They are created using parentheses, with items separated by commas:
 > `my_tuple = (1, 2, 3)`
 > However, putting parentheses around a single object does not make it a tuple! Creating a tuple of length 1 still needs a trailing comma.
 > Test these: `type(("a"))` and `type(("a",))`.
diff --git a/_episodes/06-merging-data.md b/_episodes/06-merging-data.md
index d2a06b971..a250aa6ef 100644
--- a/_episodes/06-merging-data.md
+++ b/_episodes/06-merging-data.md
@@ -127,9 +127,9 @@ new_output = pd.read_csv('data/out.csv', keep_default_na=False, na_values=[""])
 >> # group by buoy_id, and output some summary statistics
 >> combined_data.groupby("buoy_id").describe()
 >> # write to csv
->> combined_data.to_csv("combined_wave_data.csv", index=False)
+>> combined_data.to_csv("data/combined_wave_data.csv", index=False)
 >> # read in the csv
->> cwd = pd.read_csv("combined_wave_data.csv", keep_default_na=False, na_values=[""])
+>> cwd = pd.read_csv("data/combined_wave_data.csv", keep_default_na=False, na_values=[""])
 >> # check the results are the same
 >> cwd.groupby("buoy_id").describe()
 >> ~~~
diff --git a/_episodes/07-pandas-matplotlib.md b/_episodes/07-pandas-matplotlib.md
index 4faa0298d..77f380cdc 100644
--- a/_episodes/07-pandas-matplotlib.md
+++ b/_episodes/07-pandas-matplotlib.md
@@ -108,8 +108,8 @@ import matplotlib.pyplot as plt
 Now, let's read data and plot it!
 
 ~~~
-waves = pd.read_csv("data/waves.csv")
-my_plot = waves.plot("Tpeak", "Wave Height", kind="scatter")
+waves_df = pd.read_csv("data/waves.csv")
+my_plot = waves_df.plot("Tpeak", "Wave Height", kind="scatter")
 plt.show() # not necessary in Jupyter Notebooks
 ~~~
 {: .language-python}
@@ -229,7 +229,7 @@ provide, offering a consistent environment to make publication-quality visualiza
 
 ~~~
 fig, ax1 = plt.subplots() # prepare a matplotlib figure
-waves.plot("Tpeak", "Wave Height", kind="scatter", ax=ax1)
+waves_df.plot("Tpeak", "Wave Height", kind="scatter", ax=ax1)
 
 # Provide further adaptations with matplotlib:
 ax1.set_xlabel("Tpeak (highest energy wave periodicity; seconds)")
@@ -271,6 +271,10 @@ plt.show() # not necessary in Jupyter Notebooks
 What about plotting after joining DataFrames? Let's plot the water depths at each of the buoys
 
 ~~~
+# reload the buoys data just in case we don't have it loaded still
+buoys_df = pd.read_csv("data/buoy_data.csv")
+
+
 # water depth in the buoys dataframe is currently a string (it's suffixed by "m") so we need to fix that
 def fix_depth_string(i, depth):
     if type(depth) == str:
@@ -317,11 +321,11 @@ Note that the return type of `.unique` is a Numpy ndarray, even though the colum
 > >
 > > ~~~
 > > fig, ax1 = plt.subplots()
-> > waves[waves["buoy_id"] == 16].plot("Tpeak", "Wave Height", kind="scatter", ax=ax1)
+> > waves_df[waves_df["buoy_id"] == 16].plot("Tpeak", "Wave Height", kind="scatter", ax=ax1)
 > > ax1.set_xlabel("Highest energy wave period")
 > > ax1.tick_params(labelsize=16, pad=8)
-> > ax1.set_xbound(0, waves[waves["buoy_id"] == 16].Tpeak.max()+1)
-> > ax1.set_ybound(0, waves[waves["buoy_id"] == 16]["Wave Height"].max()+1)
+> > ax1.set_xbound(0, waves_df[waves_df["buoy_id"] == 16].Tpeak.max()+1)
+> > ax1.set_ybound(0, waves_df[waves_df["buoy_id"] == 16]["Wave Height"].max()+1)
 > > fig.suptitle('Scatter plot of wave height versus Tpeak for West Hebrides', fontsize=15)
 > > ~~~
 > > {: .language-python}
@@ -339,7 +343,7 @@ Note that the return type of `.unique` is a Numpy ndarray, even though the colum
 > > ## Answers
 > >
 > > ~~~
-> > data = waves.groupby("buoy_id").max("Wave Height")
+> > data = waves_df.groupby("buoy_id").max("Wave Height")
 > > x = data["Temperature"]
 > > y = data["Wave Height"]
 > > fig, plot = plt.subplots() # although we're not using the `fig` variable, subplots returns 2 objects
@@ -354,8 +358,8 @@ Note that the return type of `.unique` is a Numpy ndarray, even though the colum
 > >
 > > ~~~
 > > fig, ax = plt.subplots()
-> > wh = waves[waves["buoy_id"] == 16]
-> > pb = waves[waves["buoy_id"] == 11]
+> > wh = waves_df[waves_df["buoy_id"] == 16]
+> > pb = waves_df[waves_df["buoy_id"] == 11]
 > >
 > > ax.scatter(wh["Tpeak"], wh["Wave Height"])
 > > ax.scatter(pb["Tpeak"], pb["Wave Height"], marker="*")
diff --git a/_episodes/08-geopandas.md b/_episodes/08-geopandas.md
index 86c539748..b3770032d 100644
--- a/_episodes/08-geopandas.md
+++ b/_episodes/08-geopandas.md
@@ -252,7 +252,7 @@ We can even display the Cairngorms data directly over the Scotland plot, which v
 
 ~~~
 scotland_plot = scotland.explore()
-cairngorms.explore(map=scotland_plot, style_kwds={"fillColor":"lime"})
+cairngorms.explore(m=scotland_plot, style_kwds={"fillColor":"lime"})
 ~~~
 {: .language-python}
 
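
A quick way to check the tuple warning edited in `04-data-types-and-format.md` is to run the suggested `type(("a"))` and `type(("a",))` tests directly. The sketch below uses only the standard library. The variable names and the example date are illustrative assumptions, not taken from the lesson, and the last step only mimics (rather than calls) the way `Series.apply` hands `args=("%a",)` on to `datetime.datetime.strftime` as extra positional arguments.

```python
import datetime

# Parentheses on their own only group an expression, so this is a str
not_a_tuple = ("a")
# The trailing comma is what makes a tuple of length 1
one_item_tuple = ("a",)
print(type(not_a_tuple))     # <class 'str'>
print(type(one_item_tuple))  # <class 'tuple'>

# Tuples are immutable: assigning to an item raises TypeError
my_tuple = (1, 2, 3)
assignment_failed = False
try:
    my_tuple[0] = 10
except TypeError:
    assignment_failed = True
print("item assignment failed:", assignment_failed)

# Mimics args=("%a",): the tuple's items are unpacked as extra
# positional arguments after the date (hypothetical example date)
fmt_args = ("%a",)
some_date = datetime.datetime(2024, 1, 1)
weekday = datetime.datetime.strftime(some_date, *fmt_args)
print(weekday)  # Mon
```

Dropping the trailing comma from `fmt_args` would make it the plain string `"%a"`, which `*` unpacks into two separate arguments (`"%"` and `"a"`), so the `strftime` call would then fail with a `TypeError`.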