<p>Determine whether each variable in the <code>paralympic_1500</code> dataset is numerical or categorical. For numerical variables, further classify them as continuous or discrete. For categorical variables, determine if the variable is ordinal.</p>
854
854
<hr>
855
-
<p>The numerical variables in the dataset are <code>year</code> (discrete), and <code>time_min</code> (continuous). The categorical variables are <code>city</code>, <code>country_of_games</code>, <code>division</code>, <code>type</code>, <code>name</code>, and <code>country_of_athlete</code>. The <code>time</code> variable is trickier to classify – we can think of it as numerical, but it is classified as categorical. The categorical classification is due to the colon <code>:</code> which separates the hours from the seconds. Sometimes the data dictionary (presented in <a href="#tbl-paralympic-var-def" class="quarto-xref">Table <span>3.3</span></a>) isn’t sufficient for a complete analysis, and we need to go back to the data source and try to understand the data better before we can proceed with the analysis meaningfully.</p>
855
+
<p>The numerical variables in the dataset are <code>year</code> (discrete), and <code>time_min</code> (continuous). The categorical variables are <code>city</code>, <code>country_of_games</code>, <code>division</code>, <code>type</code>, <code>name</code>, and <code>country_of_athlete</code>. The <code>time</code> variable is trickier to classify – we can think of it as numerical, but it is classified as categorical. The categorical classification is due to the colon <code>:</code> which separates the minutes from the seconds. Sometimes the data dictionary (presented in <a href="#tbl-paralympic-var-def" class="quarto-xref">Table <span>3.3</span></a>) isn’t sufficient for a complete analysis, and we need to go back to the data source and try to understand the data better before we can proceed with the analysis meaningfully.</p>
856
856
</div>
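<p>To make the point about the colon concrete, here is a minimal sketch of how a text value in minutes:seconds form could be turned into a number of minutes. The helper name <code>time_to_minutes</code> and the sample strings are assumptions for illustration; they are not taken from <code>paralympic_1500</code>, and this is not necessarily how <code>time_min</code> was computed.</p>

```python
def time_to_minutes(time_str: str) -> float:
    """Convert a 'minutes:seconds' string such as '3:30.0' to decimal minutes."""
    minutes, seconds = time_str.split(":")
    return int(minutes) + float(seconds) / 60


# Illustrative strings only; they are not rows from paralympic_1500.
sample_times = ["3:30.0", "4:15.0"]
time_min = [time_to_minutes(t) for t in sample_times]
print(time_min)  # [3.5, 4.25]
```

<p>Once the text is split at the colon, the result is an ordinary continuous numerical variable.</p>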
857
857
<p>Next, let’s try to get to know each variable a little bit better. For categorical variables, this involves figuring out what their levels are and how commonly represented they are in the data. <a href="#fig-paralympic-cat" class="quarto-xref">Figure <span>3.1</span></a> shows the distributions of two of the categorical variables in this dataset. We can see that the United States has hosted the Games most often, but runners from Great Britain and Kenya have won the 1500m most often. A large number of countries have had a single gold medal winner of the 1500m. Similarly, a large number of countries have hosted the Games only once. Over the last century, the name describing the country for athletes from one particular region has changed and includes Russian Federation, Unified Team, and Russian Paralympic Committee. Both of the visualizations are bar plots, which you will learn more about in <a href="explore-categorical.html" class="quarto-xref"><span>Chapter 4</span></a>.</p>
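<p>Finding the levels of a categorical variable and how often each appears amounts to a frequency count. The sketch below uses a short made-up list standing in for a column such as <code>country_of_athlete</code>; the values and counts are hypothetical and do not reproduce the figure.</p>

```python
from collections import Counter

# Hypothetical values standing in for one categorical column.
country_of_athlete = [
    "Great Britain", "Kenya", "Great Britain",
    "Tunisia", "Kenya", "Great Britain",
]

# Counter maps each level to the number of times it occurs.
counts = Counter(country_of_athlete)

# most_common() lists the levels from most to least frequent,
# which is the ordering a sorted bar plot would display.
for level, n in counts.most_common():
    print(level, n)
```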
858
858
<p>Similarly, we can examine the distributions of the numerical variables as well. We already know that the 1500m times are mostly between 3.5 min and 4.5 min, based on <a href="#tbl-paralympic-df-tail" class="quarto-xref">Table <span>3.1</span></a> and <a href="#tbl-paralympic-df-head" class="quarto-xref">Table <span>3.2</span></a>. We can break down the 1500m time by division and type of race. <a href="#tbl-paralympic-summary" class="quarto-xref">Table <span>3.4</span></a> shows the mean, minimum, and maximum 1500m times broken down by division and race type. Recall that the Men’s Olympic division has taken place since 1896, whereas the Men’s Paralympic division has happened only since 1960. The maximum race time, therefore, should be interpreted in the context of the year of the Games.</p>
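<p>A grouped summary of this kind can be sketched as follows. The rows, the group labels, and the helper <code>summarize</code> are assumptions made for illustration; the numbers are invented and are not the values reported in the summary table.</p>

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (division, type, time_min) rows; the values are made up.
rows = [
    ("Men", "Olympic", 3.5),
    ("Men", "Olympic", 4.0),
    ("Men", "Paralympic", 4.0),
    ("Men", "Paralympic", 4.5),
    ("Women", "Olympic", 4.0),
]


def summarize(rows):
    """Return mean, min, and max of time_min for each (division, type) pair."""
    groups = defaultdict(list)
    for division, race_type, time_min in rows:
        groups[(division, race_type)].append(time_min)
    return {
        key: {"mean": mean(times), "min": min(times), "max": max(times)}
        for key, times in groups.items()
    }


summary = summarize(rows)
for (division, race_type), stats in sorted(summary.items()):
    print(division, race_type, stats)
```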
<div id="fig-paralympic-cat-1" class="quarto-float quarto-figure quarto-figure-center anchored" alt="Two separate bar plots. The left panel shows a bar plot counting the number of gold medal athletes from each country. Great Britain has had 8 top finishers, Kenya has had 7 top finishers, and Tunisia and Algeria have both had 5. The right panel shows a bar plot counting the number of Games which have happened in each country. The USA has hosted 4 Games, the UK has hosted 3 Games, and each of Japan, Greece, Germany, France, and Australia have hosted the Games twice. ">
863
+
<div id="fig-paralympic-cat-1" class="quarto-float quarto-figure quarto-figure-center anchored" alt="Two separate bar plots. The left panel shows a bar plot counting the number of gold medal athletes from each country. Great Britain has had 8 top finishers, Kenya has had 7 top finishers, and Tunisia and Algeria have both had 5. The right panel shows a bar plot counting the number of Games which have happened in each country. The USA has hosted 4 Games, the UK has hosted 3 Games, and each of Japan, Greece, Germany, France, and Australia have hosted the Games twice. ">
<a href="data-applications_files/figure-html/fig-paralympic-cat-1.png" class="lightbox" data-gallery="fig-paralympic-cat" title="Figure 3.1 (a): Country of origin of the athlete"><img src="data-applications_files/figure-html/fig-paralympic-cat-1.png" class="img-fluid figure-img" style="width:90.0%" data-ref-parent="fig-paralympic-cat" alt="Two separate bar plots. The left panel shows a bar plot counting the number of gold medal athletes from each country. Great Britain has had 8 top finishers, Kenya has had 7 top finishers, and Tunisia and Algeria have both had 5. The right panel shows a bar plot counting the number of Games which have happened in each country. The USA has hosted 4 Games, the UK has hosted 3 Games, and each of Japan, Greece, Germany, France, and Australia have hosted the Games twice. "></a>
865
+
<a href="data-applications_files/figure-html/fig-paralympic-cat-1.png" class="lightbox" data-gallery="fig-paralympic-cat" title="Figure 3.1 (a): Country of origin of the athlete"><img src="data-applications_files/figure-html/fig-paralympic-cat-1.png" class="img-fluid figure-img" style="width:90.0%" data-ref-parent="fig-paralympic-cat" alt="Two separate bar plots. The left panel shows a bar plot counting the number of gold medal athletes from each country. Great Britain has had 8 top finishers, Kenya has had 7 top finishers, and Tunisia and Algeria have both had 5. The right panel shows a bar plot counting the number of Games which have happened in each country. The USA has hosted 4 Games, the UK has hosted 3 Games, and each of Japan, Greece, Germany, France, and Australia have hosted the Games twice. "></a>
<div id="fig-paralympic-cat-2" class="quarto-float quarto-figure quarto-figure-center anchored" alt="Two separate bar plots. The left panel shows a bar plot counting the number of gold medal athletes from each country. Great Britain has had 8 top finishers, Kenya has had 7 top finishers, and Tunisia and Algeria have both had 5. The right panel shows a bar plot counting the number of Games which have happened in each country. The USA has hosted 4 Games, the UK has hosted 3 Games, and each of Japan, Greece, Germany, France, and Australia have hosted the Games twice. ">
873
+
<div id="fig-paralympic-cat-2" class="quarto-float quarto-figure quarto-figure-center anchored" alt="Two separate bar plots. The left panel shows a bar plot counting the number of gold medal athletes from each country. Great Britain has had 8 top finishers, Kenya has had 7 top finishers, and Tunisia and Algeria have both had 5. The right panel shows a bar plot counting the number of Games which have happened in each country. The USA has hosted 4 Games, the UK has hosted 3 Games, and each of Japan, Greece, Germany, France, and Australia have hosted the Games twice. ">
<a href="data-applications_files/figure-html/fig-paralympic-cat-2.png" class="lightbox" data-gallery="fig-paralympic-cat" title="Figure 3.1 (b): Country in which the Games gook place"><img src="data-applications_files/figure-html/fig-paralympic-cat-2.png" class="img-fluid figure-img" style="width:90.0%" data-ref-parent="fig-paralympic-cat" alt="Two separate bar plots. The left panel shows a bar plot counting the number of gold medal athletes from each country. Great Britain has had 8 top finishers, Kenya has had 7 top finishers, and Tunisia and Algeria have both had 5. The right panel shows a bar plot counting the number of Games which have happened in each country. The USA has hosted 4 Games, the UK has hosted 3 Games, and each of Japan, Greece, Germany, France, and Australia have hosted the Games twice. "></a>
875
+
<a href="data-applications_files/figure-html/fig-paralympic-cat-2.png" class="lightbox" data-gallery="fig-paralympic-cat" title="Figure 3.1 (b): Country in which the Games took place"><img src="data-applications_files/figure-html/fig-paralympic-cat-2.png" class="img-fluid figure-img" style="width:90.0%" data-ref-parent="fig-paralympic-cat" alt="Two separate bar plots. The left panel shows a bar plot counting the number of gold medal athletes from each country. Great Britain has had 8 top finishers, Kenya has had 7 top finishers, and Tunisia and Algeria have both had 5. The right panel shows a bar plot counting the number of Games which have happened in each country. The USA has hosted 4 Games, the UK has hosted 3 Games, and each of Japan, Greece, Germany, France, and Australia have hosted the Games twice. "></a>
<p>Let’s start by considering how the 1500m gold medal race times have changed over the years. <a href="#fig-paralympic-ungrouped" class="quarto-xref">Figure <span>3.3</span></a> shows a scatterplot describing 1500m race times and year for Men’s Olympic and Paralympic (T11) athletes with a line of best fit (to the entire dataset) superimposed (see <a href="model-slr.html" class="quarto-xref"><span>Chapter 7</span></a> where we will present fitting a line to a scatterplot). Notice that the line of best fit shows a <em>positive</em> relationship between race time and year. That is, for later years, the predicted gold medal time is higher than in earlier years.</p>
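<p>The sign of a line of best fit can depend on whether it is fit to the pooled data or to each group separately. The sketch below computes least-squares slopes on invented (year, time) pairs, chosen so that each group trends downward while the pooled slope is positive. The numbers and the helper <code>slope</code> are assumptions for illustration and say nothing about the real dataset.</p>

```python
def slope(xs, ys):
    """Least-squares slope of y on x: sum of cross-deviations over sum of squared x-deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Made-up (year, time in minutes) values for two groups.
olympic_years, olympic_times = [1900, 1960, 2020], [3.9, 3.7, 3.5]
para_years, para_times = [1980, 2000, 2020], [4.6, 4.4, 4.2]

# Slope within each group, then slope for the two groups pooled together.
olympic_slope = slope(olympic_years, olympic_times)
para_slope = slope(para_years, para_times)
pooled_slope = slope(olympic_years + para_years, olympic_times + para_times)

print(olympic_slope < 0, para_slope < 0, pooled_slope > 0)
```

<p>In this toy example the slower group only appears in later years, which pulls the pooled line upward even though times fall within each group.</p>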
984
984
<div class="cell">
985
985
<div class="cell-output-display">
986
-
<div id="fig-paralympic-ungrouped" class="quarto-float quarto-figure quarto-figure-center anchored" alt="A scatterplot with year on the x-axis and gold medal 1500m time on the y-axis. A line of best fit is drawn over the points. ">
986
+
<div id="fig-paralympic-ungrouped" class="quarto-float quarto-figure quarto-figure-center anchored" alt="A scatterplot with year on the x-axis and gold medal 1500m time on the y-axis. A line of best fit is drawn over the points. ">
<a href="data-applications_files/figure-html/fig-paralympic-ungrouped-1.png" class="lightbox" data-gallery="quarto-lightbox-gallery-4" title="Figure 3.3: 1500m race time for Men’s Olympic and Paralympic (T11) athletes. The line represents a line of best fit to the entire dataset."><img src="data-applications_files/figure-html/fig-paralympic-ungrouped-1.png" class="img-fluid figure-img" style="width:90.0%" alt="A scatterplot with year on the x-axis and gold medal 1500m time on the y-axis. A line of best fit is drawn over the points. "></a>
988
+
<a href="data-applications_files/figure-html/fig-paralympic-ungrouped-1.png" class="lightbox" data-gallery="quarto-lightbox-gallery-4" title="Figure 3.3: 1500m race time for Men’s Olympic and Paralympic (T11) athletes. The line represents a line of best fit to the entire dataset."><img src="data-applications_files/figure-html/fig-paralympic-ungrouped-1.png" class="img-fluid figure-img" style="width:90.0%" alt="A scatterplot with year on the x-axis and gold medal 1500m time on the y-axis. A line of best fit is drawn over the points. "></a>
<p>In this case study, we introduced you to the very first steps a data scientist takes when they start working with a new dataset. In the next few chapters, we will introduce exploratory data analysis, and you’ll learn more about the various types of data visualizations and summary statistics you can make to get to know your data better.</p>
1017
1017
<p>Before you move on, we encourage you to think about whether the following questions can be answered with this dataset and, if so, how you might go about answering them. It’s okay if your answer is “I’m not sure”; we simply want to get your exploratory juices flowing to prime you for what’s to come!</p>
1018
1018
<ol type="1">
1019
-
<li>Has there every been a year when a visually impaired paralympic gold medal athlete beat the Olympic gold medal athlete?</li>
1020
-
<li>When comparing the paralympic and Olympic 1500m gold medal athletes, does Simpson’s paradox hold in the Women’s division?</li>
1019
+
<li>Has there ever been a year when a visually impaired Paralympic gold medal athlete beat the Olympic gold medal athlete?</li>
1020
+
<li>When comparing the Paralympic and Olympic 1500m gold medal athletes, does Simpson’s paradox hold in the Women’s division?</li>
1021
1021
<li>Is there a biological boundary which establishes a time under which no human could run 1500m?</li>