README.md (14 additions & 7 deletions)
@@ -2,30 +2,36 @@
# Overview
-This repository hopes to provide reliable tools for consolidation and analysis of raw election results from the most reliable sources -- the election agencies themselves.
+This repository provides tools for consolidation and analysis of raw election results from the most reliable sources -- the election agencies themselves.
* Consolidation: take as input election results files from a wide variety of sources and load the data into a relational database
-* Export: create tab-separated flat export files of results sets rolled up to any desired intermediate geography (e.g., by county, or by congressional district)
-* Analysis: provide a variety of analysis tools
-* Visualization: provide a variety of visualization tools.
+* Export: create consistent-format export files of results sets rolled up to any desired intermediate geography
+  * tabular (tab-separated text)
+  * xml (following NIST Election Results Reporting Common Data Format V2)
+  * json (following NIST Election Results Reporting Common Data Format V2)
+* Analysis:
+  * Curates one-county outliers of interest
+  * Calculates difference-in-difference for results available by vote type
+* Visualization:
+  * Scatter plots
+  * Bar charts

# Target Audience
This system is intended to be of use to candidates and campaigns, election officials, students of politics and elections, and anyone else who is interested in assembling and understanding election results.

# How to Contribute Code
-Please contribute code that works in python 3.7, with the package versions specified in [requirements.txt](requirements.txt). We follow the [black](https://pypi.org/project/black/) format.
+Please contribute code that works in python 3.9, with the package versions specified in [requirements.txt](requirements.txt). We follow the [black](https://pypi.org/project/black/) format.

# How to Help in Other Ways
If you have skills to contribute to building the system, we can definitely use your help:
* Creating visualizations
-* Importing and exporting data via xml feeds
* Preparing for intake of specific states' results files
* Managing collection of data files in real time
* Writing documentation
* Merging other data sets of interest (e.g., demographics)
* Building our open source community
* What else? Let us know!

-If you are a potential end user -- an election official, political scientist or campaign consultant, for instance -- we would love to talk with you about what you want to from this system.
+If you are a potential end user -- an election official, political scientist or campaign consultant, for instance -- let us know what you want from this system.

If you are interested in contributing, or just staying updated on the progress of this project, please [contact Stephanie Singer](http://symmetrysinger.com/index.php?id=contact).

@@ -45,6 +51,7 @@ Detailed instructions can be found [here](docs/User_Guide.md).
Funding provided October 2019 - September 2021 by the National Science Foundation
* Award #1936809, "EAGER: Data Science for Election Verification"
* Award #2027089, "RAPID: Election Result Anomaly Detection for 2020"
+Data collection and consolidation for the 2020 US General Election funded in part by the Verified Voting Foundation.
docs/User_Guide.md (16 additions & 8 deletions)
@@ -34,7 +34,7 @@ If the munger for the format of your results file doesn't already exist:
### \[format\]
There are two required format parameters: `file_type` and `count_location`.
-The `file_type` parameter controls which function from the python `pandas` module reads the file contents. Related optional and required parameters must be given under the `[format]` header.
+The `file_type` parameter controls which function from the python `pandas` module reads the file contents. Related optional and required parameters must be given under the `[format]` header. Acceptable values are 'flat_text', 'excel', 'xml', 'json-nested'. The `count_location` parameter indicates where the vote counts are to be found. For 'flat_text' or 'excel' file types, either `count_location=by_name:<list of names of columns containing vote counts>` or `count_location=by_number:<list of positions of columns containing vote counts>`.
* 'flat_text': Any tab-, comma-, or other-separated table in a plain tabular text file.
  * (required) a field delimiter `flat_text_delimiter` to be specified (usually `flat_text_delimiter=,` for csv or `flat_text_delimiter=tab` for .txt)

@@ -47,6 +47,10 @@ If the munger for the format of your results file doesn't already exist:
  * (required if `count_location=by_name`) specify location of field names for count columns with integer `count_field_name_row` (NB: top row not skipped is 0, next row is 1, etc.)
  * (required):
    * Either `all_rows=data` or designate row containing column names for the candidate, reporting unit, etc. with the `noncount_header_row` parameter. (NB: top row not skipped is 0, next row is 1, etc.)
+
+* 'xml'
+
+* 'json-nested'
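
As a rough illustration of the `[format]` parameters described above, the sketch below reads a hypothetical munger fragment with Python's standard `configparser` and splits `count_location` into its mode and its list of columns. The column names (`Votes`, `Early`) and the helper `parse_count_location` are assumptions made for this example; they are not part of the package, and the package's own parsing may differ.

```python
import configparser

# Hypothetical [format] section; the column names are invented for illustration.
SAMPLE_MUNGER = """
[format]
file_type=flat_text
flat_text_delimiter=,
count_location=by_name:Votes,Early
count_field_name_row=0
noncount_header_row=0
"""


def parse_count_location(value):
    """Split 'by_name:a,b' or 'by_number:3,4' into (mode, list of entries)."""
    mode, _, raw = value.partition(":")
    if mode not in ("by_name", "by_number"):
        raise ValueError(f"unrecognized count_location mode: {mode!r}")
    items = [item.strip() for item in raw.split(",") if item.strip()]
    if mode == "by_number":
        # Column positions are integers; column names stay as strings.
        items = [int(item) for item in items]
    return mode, items


# Use '=' as the only key/value delimiter so the ':' in count_location is kept.
parser = configparser.ConfigParser(delimiters=("=",))
parser.read_string(SAMPLE_MUNGER)
fmt = parser["format"]
mode, columns = parse_count_location(fmt["count_location"])
print(fmt["file_type"], mode, columns)
```

Restricting the delimiter to `=` is a choice made for this sketch, so that the colon in `by_name:` can never be mistaken for a key/value separator.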

Available if appropriate for any file type, under the `[format]` header:
* (required if any munging information needs to be read from the `<results>.ini` file) `constant_over_file`, a comma-separated list of elements to be read, e.g., `constant_over_file=CandidateContest,CountItemType`.
@@ -398,22 +402,24 @@ analyzer.export_election_to_tsv("tabular_results.tsv", "2020 General", "South Ca
This code will produce all South Carolina data from the 2020 general election, grouped by contest, county, and vote type (total, early, absentee, etc).

-### NIST Common Data Format
-This package also provides functionality to export the data to xml according to the [NIST election results reporting schema (Version 2)](https://github.com/usnistgov/ElectionResultsReporting/raw/version2/NIST_V2_election_results_reporting.xsd). This is as simple as identifying an election and jurisdiction of interest:
+### NIST Common Data Format Export
+This package provides functionality to export the data to xml or json according to the [NIST election results reporting schema (Version 2)](https://github.com/usnistgov/ElectionResultsReporting/raw/version2/NIST_V2_election_results_reporting.xsd).
+
+This is as simple as identifying an election and jurisdiction of interest. For xml:
The output is a string, the contents of the json file.
-Both of these can take an optional `major_subdivision` parameter to control the level to which results are rolled up. The default is to roll up to the subdivision type indicated in the [`000_major_subjurisdiction_type.txt file](../jurisdictions/000_major_subjurisdiction_types.txt).
+The subdivision type for the roll-up is determined by the [`000_major_subjurisdiction_types.txt` file](../jurisdictions/000_major_subjurisdiction_types.txt).
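
To make "roll-up" concrete, here is a minimal sketch that sums counts from smaller reporting units up to a chosen subdivision (county, in this case). The record layout, the counts, and the `roll_up` helper are all hypothetical and exist only for this example; this is not the package's implementation.

```python
from collections import defaultdict

# Hypothetical precinct-level records with invented counts:
# (county, precinct, contest, vote_type, count)
records = [
    ("Richland", "P1", "Governor", "early", 120),
    ("Richland", "P2", "Governor", "early", 80),
    ("Richland", "P1", "Governor", "absentee", 30),
    ("Aiken", "P1", "Governor", "early", 50),
]


def roll_up(rows):
    """Sum counts over precincts, keyed by (county, contest, vote_type)."""
    totals = defaultdict(int)
    for county, _precinct, contest, vote_type, count in rows:
        totals[(county, contest, vote_type)] += count
    return dict(totals)


county_totals = roll_up(records)
for key, total in sorted(county_totals.items()):
    print(key, total)
```

Rolling up to a different subdivision type would amount to changing which field plays the role of `county` in the grouping key.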


## Unload and reload data with `reload_juris_election()`
@@ -518,7 +524,9 @@ If there are hidden columns in an Excel file, you may need to omit the hidden co
### NIST Common Data Format imports
To import results from a file that is valid NIST V2 xml -- that can be formally validated against the [NIST election results reporting schema (Version 2)](https://github.com/usnistgov/ElectionResultsReporting/raw/version2/NIST_V2_election_results_reporting.xsd) -- use the file_type 'nist_v2_xml'.

-Some xml files (e.g., Ohio 2020 General) use the older Version 1 common data format. Our convention is that if the munger name contains "nist" and the file_type is xml, then the system will look for a namespace declaration.
+Some xml files (e.g., Ohio 2020 General) use the older Version 1 common data format. For these files use the
+
+Our convention is that if the munger name contains "nist" and the file_type is xml, then the system will look for a namespace declaration.
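
For readers unfamiliar with XML namespaces, the sketch below shows one way to check whether a file's root element carries a namespace, using only the standard library. The helper name `root_namespace` and the sample namespace URI are hypothetical; this illustrates what "looking for a namespace declaration" can mean and is not the package's own logic.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents; the namespace URI is a placeholder, not a real schema.
WITH_NAMESPACE = (
    '<ElectionReport xmlns="http://example.org/hypothetical-cdf">'
    "<Election/></ElectionReport>"
)
WITHOUT_NAMESPACE = "<ElectionReport><Election/></ElectionReport>"


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree writes namespaced tags as '{uri}LocalName'.
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


print(root_namespace(WITH_NAMESPACE))
print(root_namespace(WITHOUT_NAMESPACE))
```

This only inspects the namespace that applies to the root element; a prefixed namespace that is declared but not used on the root would need a different check.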

### Difference-in-Difference calculations
The system provides a way to calculate difference-in-difference statistics. For any particular election, `Analyzer.diff_in_diff_dem_vs_rep` produces a dataframe of values for any county with results by vote type, with Democratic or Republican candidates, and any comparable pair of contests both on some ballots in the county. Contests are considered "comparable" if their districts are of the same geographical district type -- e.g., both statewide, or both state-house, etc. The method also returns a list of jurisdictions for which vote counts were zero or missing.
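
The text above does not spell out the formula, so the following is only a sketch under a stated assumption: for one county, two comparable contests, and two vote types, the statistic is taken to be the difference between vote types in the Democratic share of the Democratic-plus-Republican vote for the first contest, minus the same difference for the second contest. The function names and all counts are invented for illustration and may not match what `Analyzer.diff_in_diff_dem_vs_rep` actually computes.

```python
def dem_share(dem, rep):
    """Democratic share of the Dem + Rep vote; None when there are no votes."""
    total = dem + rep
    return dem / total if total else None


def diff_in_diff(contest_a, contest_b, type_1, type_2):
    """(share_A[type_1] - share_A[type_2]) - (share_B[type_1] - share_B[type_2]).

    Each contest maps vote type -> (dem_count, rep_count).
    Returns None if any of the four shares is undefined (zero votes).
    """
    shares = [
        dem_share(*contest[vote_type])
        for contest in (contest_a, contest_b)
        for vote_type in (type_1, type_2)
    ]
    if any(share is None for share in shares):
        return None
    a1, a2, b1, b2 = shares
    return (a1 - a2) - (b1 - b2)


# Invented counts for one hypothetical county: vote type -> (Dem, Rep).
president = {"absentee": (600, 400), "election-day": (450, 550)}
senate = {"absentee": (550, 450), "election-day": (450, 550)}

result = diff_in_diff(president, senate, "absentee", "election-day")
print(round(result, 4))
```

Returning `None` for zero totals mirrors, loosely, the idea of reporting jurisdictions with zero or missing counts separately rather than dividing by zero.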
results_note=revised by hand to disambiguate counties & towns with same name (Carroll, Grafton, Hillsborough, Sullivan). Also, candidate Andrew Olding. As of 8/27/2021, the electiondata code throws a (seemingly harmless) warning when processing this file ( /usr/local/lib/python3.9/site-packages/openpyxl/worksheet/header_footer.py:48: UserWarning: Cannot parse header or footer so it will be ignored
-warn("""Cannot parse header or footer so it will be ignored"""))