Commit 00ae14b

authored
Update analyze-baseball-stats-with-pandas-and-matplotlib.mdx
1 parent 59611b6 commit 00ae14b

1 file changed

Lines changed: 10 additions & 4 deletions

projects/analyze-baseball-stats-with-pandas-and-matplotlib/analyze-baseball-stats-with-pandas-and-matplotlib.mdx

@@ -44,7 +44,11 @@ Download here: https://sabr.app.box.com/s/y1prhc795jk8zvmelfd3jq7tl389y6cd

 In a moment, we'll introduce some questions that we want to answer, but for now, it can be helpful to click into a few files to get a sense of what data we're working with.

-We'll primarily use the files **Batting.csv**, **People.csv**, and **Teams.csv**, but feel free to check out other files that might be interesting to you!
+We'll primarily use three files:
+
+- **Batting.csv**
+- **People.csv**
+- **Teams.csv**

 As a brief example, this is what the top of the **Batting.csv** file looks like:

@@ -93,7 +97,9 @@ Alternatively, we could use `batting['playerID'].nunique()` to get the number im

 ## Filtering Out Inactive Players

-It can be helpful to look through your data before jumping into any heavy analysis because there are often quirks to the data that can be hard to spot without subject matter expertise. For example, when we looked at `batting.head()`, the very first row showed the player `aardsda01` from the year `2004` had 0 at bats, 0 runs, 0 hits, 0 strike outs, and so on. It seems like this player was on the team, but never actually played in any games.
+It can be helpful to look through your data before jumping into any heavy analysis because there are often quirks to the data that can be hard to spot without subject matter expertise.
+
+For example, when we looked at `batting.head()`, the very first row showed the player `aardsda01` from the year `2004` had 0 at bats, 0 runs, 0 hits, 0 strike outs, and so on. It seems like this player was on the team, but never actually played in any games.

 If this is a common occurrence, then that might drastically alter some of these summary statistics. If there are a ton of players that are in the database but have `0`s for all their stats, then that will drag down all of the averages that we're looking at.

@@ -193,7 +199,7 @@ First, let's find the total number of home runs per year. This will look very fa
 avg_hr_by_year = batting.groupby('yearID')['HR'].sum()
 ```

-We can now plot this using Matplotlib's `plot()` function. This function needs a list of X and Y values. In our case, we want the year to be on the X axis and the total number of home runs to be on the Y axis:
+We can now plot this using Matplotlib's `.plot()` function. This function needs a list of X and Y values. In our case, we want the year to be on the X axis and the total number of home runs to be on the Y axis:

 ```py
 import matplotlib.pyplot as plt
@@ -273,7 +279,7 @@ plt.ylabel('Home Runs')
 plt.legend()

 plt.show()
-``
+```

 As expected, the altitude in Denver has caused some pretty high home run numbers!
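
The "Filtering Out Inactive Players" hunk above argues that rows full of `0`s drag down averages. A minimal sketch of that effect follows, using made-up rows rather than the real **Batting.csv** data. The `AB` (at bats) column name and the `AB > 0` filter are assumptions for illustration; the tutorial's own filtering step is not shown in this diff and may differ.

```python
import pandas as pd

# Made-up rows shaped like Batting.csv; player IDs and values are illustrative only.
batting = pd.DataFrame({
    "playerID": ["p1", "p2", "p3", "p4"],
    "yearID": [2004, 2004, 2004, 2004],
    "AB": [0, 0, 400, 600],   # assumed column name for at bats
    "HR": [0, 0, 10, 30],
})

# Average over every row, including players who never batted.
mean_hr_all = batting["HR"].mean()

# Assumed filter: keep only rows with at least one at bat.
active = batting[batting["AB"] > 0]
mean_hr_active = active["HR"].mean()

# Fraction of rows with zero at bats, to judge how common the quirk is.
share_inactive = (batting["AB"] == 0).mean()

print(f"Mean HR, all rows:    {mean_hr_all}")
print(f"Mean HR, active only: {mean_hr_active}")
print(f"Share with 0 at bats: {share_inactive}")
```

In this toy data, half of the rows have no at bats, so the mean over all rows is half of the mean over active rows. Checking `share_inactive` on the real data first shows whether the filter is worth applying at all.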
