Commit 9d8163f

Merge pull request #748 from ElectionDataAnalysis/issue738-bar-charts
Issue738 bar charts
2 parents 903b022 + dc77140 commit 9d8163f

10 files changed

Lines changed: 159 additions & 173 deletions

README.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -39,7 +39,7 @@ If you are interested in contributing, or just staying updated on the progress o
3939
See [documentation directory](docs), which includes
4040
* for users
4141
* [Installation instructions](docs/Installation.md)
42-
* Instructions for a [sample dataloading session](docs/Sample_Dataloading_Session.md)
42+
* Instructions for a [sample dataloading session](docs/Sample_Session.md)
4343
* Detailed [User Guide](docs/User_Guide.md)
4444
* for developers
4545
* [Information about the code](docs/About_the_Code.md)
Lines changed: 29 additions & 7 deletions
@@ -1,6 +1,6 @@
11
# Sample Dataloading Session
22

3-
This document walks the reader through a simple example, from setting up project directories, through loading data and performing analyses. We assume that the package has been installed in an environment with all the necessary components (as described in [Installation.md](Installation.md). As an example, we will load the xml results file from Georgia in the repository at [tests/000_data_for_pytest/2020-General/Georgia/GA_detail_20201120_1237.xml](../tests/000_data_for_pytest/2020-General/Georgia/GA_detail_20201120_1237.xml).
3+
This document walks the reader through a simple example, from setting up project directories, through loading data and performing analyses. We assume that the package has been installed in an environment with all the necessary components (as described in [Installation.md](Installation.md)). As an example, we will load the xml results file from Georgia in the repository at [tests/000_data_for_pytest/2020-General/Georgia/GA_detail_20201120_1237.xml](../tests/000_data_for_pytest/2020-General/Georgia/GA_detail_20201120_1237.xml).
44

55
## Directory and File Structure
66
The package offers a fair amount of flexibility in the directory structures used. For this sample session, we assume the user will call the program from a working directory with the following structure and files:
@@ -34,7 +34,7 @@ password=
3434
You may wish to check that these postgresql credentials will work on your system via the command `psql -h localhost -p 5432 -U postgres postgres`. If this command fails, or if it prompts you for a password, you will need to find the correct connection parameters specific to your postgresql instance. (Note that the `dbname` parameter is arbitrary, and determines only the name of the postgresql database created by the package.)
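The connection check above can also be derived from the parameter file itself. The following is a minimal sketch, not part of the package: the section name `postgresql`, the key names, and the sample `dbname` value are assumptions made for illustration, and the helper `psql_check_command` is hypothetical.

```python
import configparser

# Hypothetical contents of run_time.ini; the section name, key names and
# dbname value are assumptions, not taken from the package documentation.
SAMPLE_INI = """
[postgresql]
host=localhost
port=5432
dbname=sample_session_db
user=postgres
password=
"""


def psql_check_command(ini_text, section="postgresql", maintenance_db="postgres"):
    """Build the psql command that tests the credentials in ini_text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    params = parser[section]
    # Connect to the default maintenance database, since the database named
    # by dbname may not exist until the package creates it.
    return (
        f"psql -h {params['host']} -p {params['port']} "
        f"-U {params['user']} {maintenance_db}"
    )


command = psql_check_command(SAMPLE_INI)
print(command)
```

Run the printed command in a shell. If it fails or prompts for a password, adjust the values in the parameter file before loading data.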
3535

3636
### Contents of `GA_detail_20201120_1237.xml`
37-
Copy the file of the same name in the repository: 000_template.mungerGA_detail_20201120_1237.xml](../tests/000_data_for_pytest/2020-General/Georgia/GA_detail_20201120_1237.xml)
37+
Copy the file of the same name in the repository: [000_template.mungerGA_detail_20201120_1237.xml](../tests/000_data_for_pytest/2020-General/Georgia/GA_detail_20201120_1237.xml)
3838

3939
## Load Data
4040
```
@@ -62,7 +62,7 @@ After this command executes successfully, you will see a database in your postgr
6262
* a subdirectory `compare_to_Georgia_xxxx_xxxx` with the results of the comparison to the [reference results](../src/reference_results/Georgia.tsv)
6363
* a subdirectory `load_or_reload_all_xxxx_xxxx` with warnings from the data uploading. You may wish to look at the file `Georgia_jurisdiction_dictionary.txt.warnings`, which lists the contests present in the xml results file that were not recognized during processing. If these contests and their candidates had been added to the Georgia-specific information in the repository (see "Creating Jurisdiction files" below, or in the [User Guide](User_Guide.md)), these warnings would not appear.
6464

65-
## Exporting and analyzing data
65+
## Export and analyze data
6666
To pull data out, you will need to use the Analyzer class:
6767
```
6868
>>> an = ed.Analyzer()
@@ -98,13 +98,18 @@ The program (v.2.0.1 and higher) can also produce a string of data in the NIST C
9898
>>> an.export_nist_json_as_string("2020 General", "Georgia")
9999
```
100100

101-
### Scatter Plots
102-
To draw pictures automatically, you will need [`orca` installed on your system](https://github.com/plotly/orca). If `orca` is not installed, you can still pull the information necessary to make plots
101+
### Plots
102+
To draw pictures automatically, you will need [`orca` installed on your system](https://github.com/plotly/orca). Plots will be exported to the `reports_and_plots_directory` and may also appear in a browser window.
103103

104+
If `orca` is not installed, the system will not automatically create pictures. Note that in any case the routines return text information sufficient to create plots in any system you may wish to use.
105+
106+
#### Scatter Plots
104107
You can create scatter plots of results by county. For example, create a jpeg comparing Biden's vote totals to Trump's vote totals with:
105108
```
106109
>>> biden_v_trump = an.scatter("Georgia","2020 General","Candidate total","Joseph R. Biden","2020 General","Candidate total","Donald J. Trump",fig_type="jpeg")
107110
```
111+
![Biden vs. Trump scatter plot of total votes by county for Georgia, 2020 General Election](images/scatter_Joseph-R-Biden-2020-General-US-President-GA-_Donald-J-Trump-2020-General-US-President-GA.jpeg)
112+
108113
Or compare Biden's votes on election day with votes on absentee mail ballots:
109114
```
110115
>>> biden_eday_v_abs = an.scatter("Georgia","2020 General","Candidate election-day","Joseph R. Biden","2020 General","Candidate absentee-mail","Joseph R. Biden",fig_type="jpeg")
@@ -146,7 +151,24 @@ Use any category name in place of "Party absentee-mail" to see counts available
146151
147152
Categories starting with "Contest" give number of votes tallied in that contest in each county, lumping all candidates together. Categories starting with "Party" give number of votes tallied for members of that party in a particular contest type (e.g., "Libertarian congressional").
148153
149-
### Analysis
154+
#### Curated One-County Outlier Bar Charts
155+
156+
The system attempts to find interesting one-county outliers within the election results. The specific algorithm is described in [an article by Singer, Srungavarapu & Tsai in _MAA Focus_, Feb/March 2021, pp. 10-13](http://digitaleditions.walsworthprintgroup.com/publication/?m=7656&i=694516&p=10&ver=html5).
157+
158+
For example:
159+
```
160+
>>> outliers = an.bar("2020 General","Georgia",contest_type="congressional",fig_type="png")
161+
```
162+
This will export up to three bar charts (to `reports_and_plots_directory`), each showing a pair of candidates whose vote shares in one county, for some type of ballot, differ significantly from their vote shares in the other counties of the same district. The output of the command above includes a chart showing how Clarke County differs from other Georgia counties in the 9th US House District.
163+
164+
![Chart showing Pandy outperforming Clyde in Clarke County GA on early ballots, while results in all other counties favor Clyde](images/Andrew-Clyde-R-_Devin-Pandy-D-_early_US-House-GA-District-9.png)
165+
166+
The output variable `outliers` contains even more information, including the contest margin and an estimate of the votes at stake if the outlier were brought in line with the other counties.
167+
168+
Options for `contest_type` for this data set are: `congressional`, `state` (for statewide contests), `state-house` and `state-senate`.
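To make the idea of a one-county outlier concrete, here is a rough sketch using made-up counts. It is not the algorithm from the article cited above and not the package's implementation: it simply compares each county's two-candidate vote share with the pooled share of the remaining counties, and estimates the votes at stake as that gap multiplied by the county's two-candidate total. The county names, the counts, and the function `one_county_outlier` are all hypothetical.

```python
# Hypothetical counts for one vote type: county -> (candidate A, candidate B)
votes = {
    "County 1": (900, 2100),
    "County 2": (800, 2200),
    "County 3": (2600, 1400),
    "County 4": (700, 1800),
}


def share(a, b):
    """Candidate A's share of the two-candidate vote."""
    return a / (a + b)


def one_county_outlier(votes):
    """Return (county, gap, votes_at_stake) for the county whose share
    deviates most from the pooled share of all other counties."""
    best = None
    for county, (a, b) in votes.items():
        others_a = sum(x for c, (x, _) in votes.items() if c != county)
        others_b = sum(y for c, (_, y) in votes.items() if c != county)
        gap = share(a, b) - share(others_a, others_b)
        # Votes that would shift if this county matched the other counties
        at_stake = gap * (a + b)
        if best is None or abs(gap) > abs(best[1]):
            best = (county, gap, at_stake)
    return best


county, gap, at_stake = one_county_outlier(votes)
print(county, round(gap, 3), round(at_stake))
```

Comparing the votes at stake with the contest margin gives a sense of whether an outlier could matter to the outcome.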
169+
170+
171+
#### Difference-in-Difference Analysis
150172
The program offers difference-in-difference analysis where results are available by vote type, following [Herron's analysis of congressional contests](https://www.liebertpub.com/doi/full/10.1089/elj.2019.0544). The following code will create a tab-separated file `GA_diffs.tsv` in the working directory:
151173
```
152174
>>> (did_frame, missing) = an.diff_in_diff_dem_vs_rep("2020 General")
@@ -155,7 +177,7 @@ The program offers difference-in-difference analysis where results are available
155177
(Note: the variable `missing` is a list of diff-in-diff comparisons that failed.)
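For readers unfamiliar with the term, the sketch below shows a generic difference-in-difference computation on made-up counts for a single county. It is an illustration under stated assumptions, not the package's implementation of `diff_in_diff_dem_vs_rep` and not necessarily the exact statistic used in Herron's paper; the counts and the helper functions are hypothetical.

```python
# Hypothetical counts for one county:
# (contest, vote type) -> (Democratic votes, Republican votes)
counts = {
    ("US President", "absentee-mail"): (600, 400),
    ("US President", "election-day"): (450, 550),
    ("US House", "absentee-mail"): (550, 450),
    ("US House", "election-day"): (440, 560),
}


def dem_share(dem, rep):
    """Democratic share of the two-party vote."""
    return dem / (dem + rep)


def diff_in_diff(counts, contest_a, contest_b, type_1, type_2):
    """Difference between two contests of the vote-type gap in Democratic share."""

    def gap(contest):
        return dem_share(*counts[(contest, type_1)]) - dem_share(
            *counts[(contest, type_2)]
        )

    return gap(contest_a) - gap(contest_b)


did = diff_in_diff(
    counts, "US President", "US House", "absentee-mail", "election-day"
)
print(round(did, 4))
```

A value near zero means the two contests show a similar gap between vote types in that county. A large value flags a county where one contest's vote-type pattern departs from the other's.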
156178
157179
## Optional steps
158-
The sample session above uses the information already in the repository about Georgia and the particular results file. If you wish to create these files yourself from scratch, follow thses optional steps.
180+
The sample session above uses the information already in the repository about Georgia and the particular results file. If you wish to create these files yourself from scratch, follow these optional steps.
159181
160182
### Specify contest totals for data quality check
161183
The data loading process includes checking contest totals against reference totals in [src/reference_results/Georgia.tsv](../src/reference_results/Georgia.tsv). You may wish to add some reference totals to this file.

docs/Testing_Code_with_pytest.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -20,15 +20,15 @@ The dataloading tests rely on having some raw results data to load. And the resu
2020
+-- reports_and_plots
2121
+-- run_time.ini
2222
```
23-
The file `run_time.ini` can be the same as in the [Sample Dataloading Session](Sample_Dataloading_Session.md).
23+
The file `run_time.ini` can be the same as in the [Sample Dataloading Session](Sample_Session.md).
2424

2525
## Note on dataloading tests
2626
The tests in [test_dataloading_by_ej.py](../tests/dataloading_tests/test_dataloading_by_ej.py) will attempt to load all raw results files in `input_results` that are specified by some file in the [`ini_file_for_results` directory](../src/ini_files_for_results). You can check which jurisdictions had files loaded:
2727
* if the test is successful, look at the `compare_*` directories in the `reports_and_plots` directory.
2828
* if the test fails, look in the output from the test.
2929

3030
## Running the tests
31-
You will need pytest to be installed on your system (see [pytest installation instructions](https://docs.pytest.org/en/6.2.x/getting-started.html) if necessary). Commands are run from the shell
32-
* dataloading routines: `pytest ~/PycharmProjects/electiondata/tests/dataloading_tests`
33-
* jurisdiction prep routines: `pytest ~/PycharmProjects/electiondata/tests/jurisdiction_prepper_tests/`
34-
* analysis routines: `pytest ~/PycharmProjects/electiondata/tests/analyzer_tests/ `
31+
You will need pytest to be installed on your system (see [pytest installation instructions](https://docs.pytest.org/en/6.2.x/getting-started.html) if necessary). Commands are run from the shell, referencing the local path to the repository:
32+
* dataloading routines: `pytest path/to/repo/tests/dataloading_tests`
33+
* jurisdiction prep routines: `pytest path/to/repo/tests/jurisdiction_prepper_tests/`
34+
* analysis routines: `pytest path/to/repo/tests/analyzer_tests/`
