docs/Sample_Session.md (29 additions, 7 deletions)
@@ -1,6 +1,6 @@
 # Sample Dataloading Session

-This document walks the reader through a simple example, from setting up project directories, through loading data and performing analyses. We assume that the package has been installed in an environment with all the necessary components (as described in [Installation.md](Installation.md). As an example, we will load the xml results file from Georgia in the repository at [tests/000_data_for_pytest/2020-General/Georgia/GA_detail_20201120_1237.xml](../tests/000_data_for_pytest/2020-General/Georgia/GA_detail_20201120_1237.xml).
+This document walks the reader through a simple example, from setting up project directories, through loading data and performing analyses. We assume that the package has been installed in an environment with all the necessary components (as described in [Installation.md](Installation.md)). As an example, we will load the xml results file from Georgia in the repository at [tests/000_data_for_pytest/2020-General/Georgia/GA_detail_20201120_1237.xml](../tests/000_data_for_pytest/2020-General/Georgia/GA_detail_20201120_1237.xml).

 ## Directory and File Structure
 The package offers a fair amount of flexibility in the directory structures used. For this sample session, we assume the user will call the program from a working directory with the following structure and files:
@@ -34,7 +34,7 @@ password=
 You may wish to check that these postgresql credentials will work on your system via the command `psql -h localhost -p 5432 -U postgres postgres`. If this command fails, or if it prompts you for a password, you will need to find the correct connection parameters specific to your postgresql instance. (Note that the `dbname` parameter is arbitrary, and determines only the name of the postgresql database created by the package.)
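The `psql` check above can also be scripted. The following is a hypothetical helper, not part of the package: it assumes the connection parameters sit in a `[postgresql]` section of `run_time.ini` with `host`, `port`, `user` and `password` keys (only the `password=` line is visible in this diff), and it rebuilds the command shown above. It adds `-w`, so `psql` fails rather than prompting for a password, and `-c "SELECT 1"`, so it exits right after connecting.

```python
import configparser
import os
import shutil
import subprocess
from typing import List


def _read_section(ini_text: str, section: str) -> configparser.SectionProxy:
    # interpolation=None so a '%' in a password is taken literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser[section]  # hypothetical section name, see the lead-in


def psql_check_command(ini_text: str, section: str = "postgresql") -> List[str]:
    """Rebuild `psql -h localhost -p 5432 -U postgres postgres` from the ini settings."""
    params = _read_section(ini_text, section)
    return [
        "psql",
        "-w",  # never prompt for a password; fail instead
        "-h", params.get("host", fallback="localhost"),
        "-p", params.get("port", fallback="5432"),
        "-U", params.get("user", fallback="postgres"),
        "-c", "SELECT 1",  # run a trivial query, then exit
        "postgres",  # the default database, as in the command above
    ]


def credentials_work(ini_text: str, section: str = "postgresql") -> bool:
    """True if psql is installed and connects with these settings without prompting."""
    if shutil.which("psql") is None:
        return False
    env = dict(os.environ)
    password = _read_section(ini_text, section).get("password", fallback="")
    if password:
        env["PGPASSWORD"] = password  # libpq reads the password from this variable
    try:
        result = subprocess.run(psql_check_command(ini_text, section),
                                capture_output=True, env=env, timeout=30)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

Calling `credentials_work(Path("run_time.ini").read_text())` (with `from pathlib import Path`) would return `False` if `psql` is missing or the connection fails, which is the same signal as the manual check.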

 ### Contents of `GA_detail_20201120_1237.xml`
-Copy the file of the same name in the repository: 000_template.mungerGA_detail_20201120_1237.xml](../tests/000_data_for_pytest/2020-General/Georgia/GA_detail_20201120_1237.xml)
+Copy the file of the same name in the repository: [GA_detail_20201120_1237.xml](../tests/000_data_for_pytest/2020-General/Georgia/GA_detail_20201120_1237.xml)

 ## Load Data
 ```
@@ -62,7 +62,7 @@ After this command executes successfully, you will see a database in your postgr
 * a subdirectory `compare_to_Georgia_xxxx_xxxx` with the results of the comparison to the [reference results](../src/reference_results/Georgia.tsv)
 * a subdirectory `load_or_reload_all_xxxx_xxxx` with warnings from the data uploading. You may wish to look at the file `Georgia_jurisdiction_dictionary.txt.warnings`, which lists the contests present in the xml results file that were not recognized during processing. If these contests and their candidates had been added to the Georgia-specific information in the repository (see "Creating Jurisdiction files" below, or in the [User Guide](User_Guide.md)), these warnings would not appear.

-## Exporting and analyzing data
+## Export and analyze data
 To pull data out, you will need to use the Analyzer class:
 ```
 >>> an = ed.Analyzer()
@@ -98,13 +98,18 @@ The program (v.2.0.1 and higher) can also produce a string of data in the NIST C
-To draw pictures automatically, you will need [`orca` installed on your system](https://github.com/plotly/orca). If `orca` is not installed, you can still pull the information necessary to make plots
+### Plots
+To draw pictures automatically, you will need [`orca` installed on your system](https://github.com/plotly/orca). Plots will be exported to the `reports_and_plots_directory` and may also appear in a browser window.

+If `orca` is not installed, the system will not automatically create pictures. Note that in any case the routines return text information sufficient to create plots in any system you may wish to use.
+
+#### Scatter Plots
 You can create scatter plots of results by county. For example, create a jpeg comparing Biden's vote totals to Trump's vote totals with:
 ```
 >>> biden_v_trump = an.scatter("Georgia","2020 General","Candidate total","Joseph R. Biden","2020 General","Candidate total","Donald J. Trump",fig_type="jpeg")
 ```
+
+
 Or compare Biden's votes on election day with votes on absentee mail ballots:
 ```
 >>> biden_eday_v_abs = an.scatter("Georgia","2020 General","Candidate election-day","Joseph R. Biden","2020 General","Candidate absentee-mail","Joseph R. Biden",fig_type="jpeg")
@@ -146,7 +151,24 @@ Use any category name in place of "Party absentee-mail" to see counts available

 Categories starting with "Contest" give the number of votes tallied in that contest in each county, lumping all candidates together. Categories starting with "Party" give the number of votes tallied for members of that party in a particular contest type (e.g., "Libertarian congressional").

-### Analysis
+#### Curated One-County Outlier Bar Charts
+
+The system attempts to find interesting one-county outliers within the election results. The specific algorithm is described in [an article by Singer, Srungavarapu & Tsai in _MAA Focus_, Feb/March 2021, pp. 10-13](http://digitaleditions.walsworthprintgroup.com/publication/?m=7656&i=694516&p=10&ver=html5).
+This will export up to three bar charts (to `reports_and_plots_directory`) showing a pair of candidates for which the vote shares in one county, for some type of ballot, differ significantly from the vote shares in other counties in the same district. The output of the command above includes a chart showing how Clarke County differs from other Georgia counties in the 9th US House District.
+
+
+
+The output variable `outliers` contains even more information, including the contest margin and an estimate of the votes at stake if the outlier were brought in line with the other counties.
+
+Options for `contest_type` for this data set are: `congressional`, `state` (for statewide contests), `state-house` and `state-senate`.
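The package's own algorithm is the one described in the _MAA Focus_ article cited above. As a much simpler, swapped-in illustration of the underlying idea (one county whose vote share stands apart from the other counties in the same district), the hypothetical sketch below computes a leave-one-out z-score for each county and reports the most extreme one. The function names and the input format (county name mapped to a candidate's vote share for one ballot type) are assumptions, not the package's API.

```python
from statistics import mean, stdev
from typing import Dict, Optional, Tuple


def leave_one_out_z_scores(shares: Dict[str, float]) -> Dict[str, float]:
    """For each county, how many standard deviations its share lies from the others.

    Mean and sample standard deviation are taken over the *other* counties only,
    so a single extreme county cannot mask itself.
    """
    scores: Dict[str, float] = {}
    for county, share in shares.items():
        others = [s for c, s in shares.items() if c != county]
        if len(others) < 2:
            continue  # stdev needs at least two values
        spread = stdev(others)
        if spread == 0:
            continue  # all other counties identical; z-score undefined
        scores[county] = (share - mean(others)) / spread
    return scores


def most_extreme_county(shares: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return (county, z-score) with the largest absolute z-score, or None."""
    scores = leave_one_out_z_scores(shares)
    if not scores:
        return None
    return max(scores.items(), key=lambda item: abs(item[1]))
```

Unlike the published method, this sketch ignores district size, ballot-type breakdowns and the votes at stake; it only shows what "differs significantly from the other counties" can mean numerically.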
+
+
+#### Difference-in-Difference Analysis
 The program offers difference-in-difference analysis where results are available by vote type, following [Herron's analysis of congressional contests](https://www.liebertpub.com/doi/full/10.1089/elj.2019.0544). The following code will create a tab-separated file `GA_diffs.tsv` in the working directory:
@@ -155,7 +177,7 @@ The program offers difference-in-difference analysis where results are available
 (Note: the variable `missing` is a list of diff-in-diff comparisons that failed.)
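For readers new to the technique, a difference-in-differences compares two gaps rather than two raw numbers. The sketch below is a generic illustration, not the package's implementation and not necessarily the exact specification in Herron's article cited above: for one county it takes a party's vote share in two contests (say, a congressional contest and a statewide contest) under two vote types (say, election-day and absentee-mail) and returns the gap between vote types in the first contest minus the same gap in the second. The `Tally` class and `diff_in_diff` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Tally:
    """Votes for one party, and all votes cast, in one contest for one vote type."""
    party_votes: int
    total_votes: int

    @property
    def share(self) -> float:
        # Guard against empty vote types rather than dividing by zero
        return self.party_votes / self.total_votes if self.total_votes else 0.0


def diff_in_diff(contest_a: Dict[str, Tally], contest_b: Dict[str, Tally],
                 vote_type_1: str, vote_type_2: str) -> float:
    """(share_A[type 1] - share_A[type 2]) - (share_B[type 1] - share_B[type 2])."""
    gap_a = contest_a[vote_type_1].share - contest_a[vote_type_2].share
    gap_b = contest_b[vote_type_1].share - contest_b[vote_type_2].share
    return gap_a - gap_b
```

A value near zero means the shift between vote types is about the same in both contests; a large positive or negative value marks a county where one contest's shift stands out.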

 ## Optional steps
-The sample session above uses the information already in the repository about Georgia and the particular results file. If you wish to create these files yourself from scratch, follow thses optional steps.
+The sample session above uses the information already in the repository about Georgia and the particular results file. If you wish to create these files yourself from scratch, follow these optional steps.

 ### Specify contest totals for data quality check
 The data loading process includes checking contest totals against reference totals in [src/reference_results/Georgia.tsv](../src/reference_results/Georgia.tsv). You may wish to add some reference totals to this file.
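The data-quality check described above amounts to a keyed comparison of two sets of totals. The sketch below is hypothetical: the actual column layout of `Georgia.tsv` is not shown here, so the key and count columns are passed in as parameters (`key_col`, `count_col`), and the `loaded` dictionary stands in for whatever totals the package computes from the database.

```python
import csv
from typing import Dict, Iterable, List, Optional, Tuple


def read_reference_totals(lines: Iterable[str], key_col: str, count_col: str) -> Dict[str, int]:
    """Parse tab-separated reference totals into {key: total}."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row[key_col]: int(row[count_col]) for row in reader}


def find_mismatches(reference: Dict[str, int],
                    loaded: Dict[str, int]) -> List[Tuple[str, int, Optional[int]]]:
    """List (key, reference total, loaded total or None) wherever the two disagree."""
    mismatches = []
    for key, expected in reference.items():
        actual = loaded.get(key)  # None if the contest was not loaded at all
        if actual != expected:
            mismatches.append((key, expected, actual))
    return mismatches
```

In practice one would open `src/reference_results/Georgia.tsv` with `newline=""` and pass the file object as `lines`, using that file's real column names.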
docs/Testing_Code_with_pytest.md (5 additions, 5 deletions)
@@ -20,15 +20,15 @@ The dataloading tests rely on having some raw results data to load. And the resu
 +-- reports_and_plots
 +-- run_time.ini
 ```
-The file `run_time.ini` can be the same as in the [Sample Dataloading Session](Sample_Dataloading_Session.md).
+The file `run_time.ini` can be the same as in the [Sample Dataloading Session](Sample_Session.md).

 ## Note on dataloading tests
 The tests in [test_dataloading_by_ej.py](../tests/dataloading_tests/test_dataloading_by_ej.py) will attempt to load all raw results files in `input_results` that are specified by some file in the [`ini_files_for_results` directory](../src/ini_files_for_results). You can check which jurisdictions had files loaded:
 * if the test is successful, look at the `compare_*` directories in the `reports_and_plots` directory.
 * if the test fails, look in the output from the test.

 ## Running the tests
-You will need pytest to be installed on your system (see [pytest installation instructions](https://docs.pytest.org/en/6.2.x/getting-started.html) if necessary). Commands are run from the shell
+You will need pytest to be installed on your system (see [pytest installation instructions](https://docs.pytest.org/en/6.2.x/getting-started.html) if necessary). Commands are run from the shell, referencing the local path to the repository