Commit 83a8943

Merge pull request #725 from ElectionDataAnalysis/issue723-no-NIST-v1
Issue723 no nist v1
2 parents 2596f88 + 548c724 commit 83a8943

20 files changed

Lines changed: 259 additions & 87 deletions

README.md

Lines changed: 14 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -2,30 +2,36 @@
22

33

44
# Overview
5-
This repository hopes to provide reliable tools for consolidation and analysis of raw election results from the most reliable sources -- the election agencies themselves.
5+
This repository provides tools for consolidation and analysis of raw election results from the most reliable sources -- the election agencies themselves.
66
* Consolidation: take as input election results files from a wide variety of sources and load the data into a relational database
7-
* Export: create tab-separated flat export files of results sets rolled up to any desired intermediate geography (e.g., by county, or by congressional district)
8-
* Analysis: provide a variety of analysis tools
9-
* Visualization: provide a variety of visualization tools.
7+
* Export: create consistent-format export files of results sets rolled up to any desired intermediate geography
8+
* tabular (tab-separated text)
9+
* xml (following NIST Election Results Reporting Common Data Format V2)
10+
* json (following NIST Election Results Reporting Common Data Format V2)
11+
* Analysis:
12+
* Curates one-county outliers of interest
13+
* Calculates difference-in-difference for results available by vote type
14+
* Visualization:
15+
* Scatter plots
16+
* Bar charts
1017

1118
# Target Audience
1219
This system is intended to be of use to candidates and campaigns, election officials, students of politics and elections, and anyone else who is interested in assembling and understanding election results.
1320

1421
# How to Contribute Code
15-
Please contribute code that works in python 3.7, with the package versions specified in [requirements.txt](requirements.txt). We follow the [black](https://pypi.org/project/black/) format.
22+
Please contribute code that works in python 3.9, with the package versions specified in [requirements.txt](requirements.txt). We follow the [black](https://pypi.org/project/black/) format.
1623

1724
# How to Help in Other Ways
1825
If you have skills to contribute to building the system, we can definitely use your help:
1926
* Creating visualizations
20-
* Importing and exporting data via xml feeds
2127
* Preparing for intake of specific states' results files
2228
* Managing collection of data files in real time
2329
* Writing documentation
2430
* Merging other data sets of interest (e.g., demographics)
2531
* Building our open source community
2632
* What else? Let us know!
2733

28-
If you are a potential end user -- an election official, political scientist or campaign consultant, for instance -- we would love to talk with you about what you want to from this system.
34+
If you are a potential end user -- an election official, political scientist or campaign consultant, for instance -- let us know what you want from this system.
2935

3036
If you are interested in contributing, or just staying updated on the progress of this project, please [contact Stephanie Singer](http://symmetrysinger.com/index.php?id=contact).
3137

@@ -45,6 +51,7 @@ Detailed instructions can be found [here](docs/User_Guide.md).
4551
Funding provided October 2019 - September 2021 by the National Science Foundation
4652
* Award #1936809, "EAGER: Data Science for Election Verification"
4753
* Award #2027089, "RAPID: Election Result Anomaly Detection for 2020"
54+
Data collection and consolidation for the 2020 US General Election funded in part by the Verified Voting Foundation.
4855

4956
# License
5057
See [LICENSE.md](./LICENSE.md)

docs/User_Guide.md

Lines changed: 16 additions & 8 deletions
@@ -34,7 +34,7 @@ If the munger for the format of your results file doesn't already exist:
3434

3535
### \[format\]
3636
There are two required format parameters: `file_type` and `count_location`.
37-
The `file_type` parameter controls which function from the python `pandas` module reads the file contents. Related optional and required parameters must be given under the `[format]` header.
37+
The `file_type` parameter controls which function from the python `pandas` module reads the file contents. Related optional and required parameters must be given under the `[format]` header. Acceptable values are 'flat_text', 'excel', 'xml', 'json-nested'. The `count_location` parameter indicates where the vote counts are to be found. For 'flat_text' or 'excel' file types, either `count_location=by_name:<list of names of columns containing vote counts>` or `count_location=by_number:<list of positions of columns containing vote counts>`.
3838
* 'flat_text': Any tab-, comma-, or other-separated table in a plain tabular text file.
3939
* (required) a field delimiter `flat_text_delimiter` to be specified (usually `flat_text_delimiter=,` for csv or `flat_text_delimiter=tab` for .txt)
4040

@@ -47,6 +47,10 @@ If the munger for the format of your results file doesn't already exist:
4747
* (required if `count_location=by_name`) specify location of field names for count columns. with integer `count_field_name_row` (NB: top row not skipped is 0, next row is 1, etc.)
4848
* (required):
4949
* Either `all_rows=data` or designate row containing column names for the candidate, reporting unit, etc. with the `noncount_header_row` parameter. (NB: top row not skipped is 0, next row is 1, etc.)
50+
51+
* 'xml'
52+
53+
* 'json-nested'
5054

5155
Available if appropriate for any file type, under the `[format]` header:
5256
* (required if any munging information needs to be read from the `<results>.ini` file) `constant_over_file`, a comma-separated list of elements to be read, e.g., `constant_over_file=CandidateContest,CountItemType`.
@@ -398,22 +402,24 @@ analyzer.export_election_to_tsv("tabular_results.tsv", "2020 General", "South Ca
398402

399403
This code will produce all South Carolina data from the 2020 general election, grouped by contest, county, and vote type (total, early, absentee, etc).
400404

401-
### NIST Common Data Format
402-
This package also provides functionality to export the data to xml according to the [NIST election results reporting schema (Version 2)](https://github.com/usnistgov/ElectionResultsReporting/raw/version2/NIST_V2_election_results_reporting.xsd). This is as simple as identifying an election and jurisdiction of interest:
405+
### NIST Common Data Format Export
406+
This package provides functionality to export the data to xml or json according to the [NIST election results reporting schema (Version 2)](https://github.com/usnistgov/ElectionResultsReporting/raw/version2/NIST_V2_election_results_reporting.xsd).
407+
408+
This is as simple as identifying an election and jurisdiction of interest. For xml:
403409
```
404410
import electiondata as ea
405411
analyzer = ea.Analyzer()
406-
election_report = analyzer.export_nist_v2("2020 General", "Georgia")
412+
election_report = analyzer.export_nist_xml_as_string("2020 General", "Georgia")
407413
```
408414
The output is a string, the contents of the xml file.
409415

410-
There is also an export in the NIST V1 json format:
416+
And for json:
411417
```
412418
analyzer = ea.Analyzer()
413-
analyzer.export_nist_v1_json("2020 General","Georgia")
419+
analyzer.export_nist_json_as_string("2020 General","Georgia")
414420
```
415421
The output is a string, the contents of the json file.
416-
Both of these can take an optional `major_subdivision` parameter to control the level to which results are rolled up. The default is to roll up to the subdivision type indicated in the [`000_major_subjurisdiction_type.txt file](../jurisdictions/000_major_subjurisdiction_types.txt).
422+
The subdivision type for the roll-up is determined by the [`000_major_subjurisdiction_type.txt` file](../jurisdictions/000_major_subjurisdiction_types.txt).
417423

418424

419425
## Unload and reload data with `reload_juris_election()`
@@ -518,7 +524,9 @@ If there are hidden columns in an Excel file, you may need to omit the hidden co
518524
### NIST Common Data Format imports
519525
To import results from a file that is valid NIST V2 xml -- that can be formally validated against the [NIST election results reporting schema (Version 2)](https://github.com/usnistgov/ElectionResultsReporting/raw/version2/NIST_V2_election_results_reporting.xsd) -- use the file_type 'nist_v2_xml'
520526

521-
Some xml files (e.g., Ohio 2020 General) use the older Version 1 common data format. Our convention is that if the munger name contains "nist" and the file_type is xml, then the system will look for a namespace declaration.
527+
Some xml files (e.g., Ohio 2020 General) use the older Version 1 common data format. For these files use the
528+
529+
Our convention is that if the munger name contains "nist" and the file_type is xml, then the system will look for a namespace declaration.
522530

523531
### Difference-in-Difference calculations
524532
The system provides a way to calculate difference-in-difference statistics. For any particular election, `Analyzer.diff_in_diff_dem_vs_rep` produces a dataframe of values for any county with results by vote type, with Democratic or Republican candidates, and any comparable pair of contests both on some ballots in the county. Contests are considered "comparable" if their districts are of the same geographical district type -- e.g., both statewide, or both state-house, etc. The method also returns a list of jurisdictions for which vote counts were zero or missing.

src/electiondata/__init__.py

Lines changed: 12 additions & 12 deletions
@@ -3010,15 +3010,15 @@ def export_nist(
30103010
election: str,
30113011
jurisdiction,
30123012
) -> Union[str, Dict[str, Any]]:
3013-
"""picks either version 1.0 (json) or version 2.0 (xml) based on value of constants.nist_version"""
3014-
if electiondata.constants.nist_version == "1.0":
3015-
return self.export_nist_v1_json(election, jurisdiction)
3016-
elif electiondata.constants.nist_version == "2.0":
3017-
return self.export_nist_v2(election, jurisdiction)
3013+
"""picks either json or xml based on value of constants.default_nist_format"""
3014+
if electiondata.constants.default_nist_format == "json":
3015+
return self.export_nist_json(election,jurisdiction)
3016+
elif electiondata.constants.default_nist_format == "xml":
3017+
return self.export_nist_xml_as_string(election,jurisdiction)
30183018
else:
30193019
return ""
30203020

3021-
def export_nist_v1_json(self, election: str, jurisdiction: str) -> Dict[str, Any]:
3021+
def export_nist_json(self,election: str,jurisdiction: str) -> Dict[str,Any]:
30223022
election_id = db.name_to_id(self.session, "Election", election)
30233023
jurisdiction_id = db.name_to_id(self.session, "ReportingUnit", jurisdiction)
30243024

@@ -3045,16 +3045,16 @@ def export_nist_v1_json(self, election: str, jurisdiction: str) -> Dict[str, Any
30453045

30463046
return election_report
30473047

3048-
def export_nist_v1(
3048+
def export_nist_json_as_string(
30493049
self,
30503050
election: str,
30513051
jurisdiction: str,
30523052
) -> str:
3053-
"""exports NIST v1 json string"""
3054-
json_string = json.dumps(self.export_nist_v1_json(election, jurisdiction))
3053+
"""exports NIST v2 json string"""
3054+
json_string = json.dumps(self.export_nist_json(election,jurisdiction))
30553055
return json_string
30563056

3057-
def export_nist_v2(
3057+
def export_nist_xml_as_string(
30583058
self,
30593059
election: str,
30603060
jurisdiction: str,
@@ -3716,7 +3716,7 @@ def compare_to_results_file(
37163716
)
37173717
if not not_found_in_db.empty:
37183718
nfid_str = (
3719-
f"\nSome expected constests not found. For details, see {sub_dir}"
3719+
f"\nSome expected contests not found. For details, see {sub_dir}"
37203720
)
37213721
err = ui.add_new_error(
37223722
err,
@@ -3925,7 +3925,7 @@ def load_results_df(
39253925
err,
39263926
"jurisdiction",
39273927
juris_true_name,
3928-
f"No contest-selection pairs recognized via munger {munger_name}",
3928+
f"No contest-selection pairs recognized in file {file_name} via munger {munger_name}",
39293929
)
39303930
return err
39313931
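The dispatch in `export_nist` above can be illustrated with a small, self-contained sketch. Everything here is a stand-in: `DEFAULT_NIST_FORMAT` mimics `constants.default_nist_format`, the two helper functions are hypothetical, and the element names are not the NIST schema. Note that, as in the commit, the json branch returns a dictionary while the xml branch returns a string. Unlike the committed code, which returns an empty string for an unrecognized value, this sketch raises an error; that is a design choice of the example, not the package's behavior.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, Union

# Hypothetical stand-in for electiondata.constants.default_nist_format
DEFAULT_NIST_FORMAT = "json"  # other option is "xml"


def report_as_dict(election: str, jurisdiction: str) -> Dict[str, Any]:
    """Hypothetical helper: build a toy report as a dictionary."""
    return {"Election": election, "Jurisdiction": jurisdiction}


def report_as_xml_string(election: str, jurisdiction: str) -> str:
    """Hypothetical helper: build a toy report as an xml string."""
    root = ET.Element("ElectionReport")
    ET.SubElement(root, "Election").text = election
    ET.SubElement(root, "Jurisdiction").text = jurisdiction
    return ET.tostring(root, encoding="unicode")


def export_report(
    election: str, jurisdiction: str, fmt: str = DEFAULT_NIST_FORMAT
) -> Union[str, Dict[str, Any]]:
    """Pick the json (dict) or xml (string) export based on the format name."""
    if fmt == "json":
        return report_as_dict(election, jurisdiction)
    elif fmt == "xml":
        return report_as_xml_string(election, jurisdiction)
    # Assumption of this sketch: fail loudly instead of returning ""
    raise ValueError(f"Unrecognized format: {fmt}")


json_report = export_report("2020 General", "Georgia")
xml_report = export_report("2020 General", "Georgia", fmt="xml")
```

Callers that need a uniform return type could wrap the json branch in `json.dumps`, as `export_nist_json_as_string` does in the diff.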

src/electiondata/constants/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -602,7 +602,7 @@ def jurisdiction_wide_contests(abbr: str) -> List[str]:
602602

603603
# constants dictated by NIST
604604
if 1:
605-
nist_version = "1.0"
605+
default_nist_format = "json" # other option is "xml"
606606
default_issuer = (
607607
"unspecified user of code base at github.com/ElectionDataAnalysis/electiondata"
608608
)

src/electiondata/munge/__init__.py

Lines changed: 12 additions & 5 deletions
@@ -624,6 +624,8 @@ def melt_to_one_count_column(
624624
if "in_count_headers" in p["munge_field_types"]:
625625
# split header_0 column into separate columns
626626
# # get header_rows
627+
# TODO: the following throws PerformanceWarning for Kansas House of Representatives 2020g. Rather than
628+
# assigning values, need to use melted = pd.concat([melted, <new_columns>])
627629
melted[
628630
[f"count_header_{idx}" for idx in p["count_header_row_numbers"]]
629631
] = pd.DataFrame(melted["header_0"].str.split(";:;", expand=True).values)[
@@ -691,8 +693,8 @@ def add_contest_id(
691693
working, new_err = replace_raw_with_internal_ids(
692694
working,
693695
juris_true_name,
694-
file_name,
695696
munger_name,
697+
file_name,
696698
df_for_type[c_type],
697699
f"{c_type}Contest",
698700
"Name",
@@ -741,7 +743,8 @@ def add_contest_id(
741743
# fail if fatal errors or no contests recognized (in reverse order, just for fun
742744
if working_temp.empty:
743745
err = ui.add_new_error(
744-
err, "jurisdiction", juris_true_name, f"No contests recognized."
746+
err, "jurisdiction", juris_true_name,
747+
f"No contests recognized from file {file_name} with munger {munger_name}."
745748
)
746749
else:
747750
working = working_temp
@@ -1979,7 +1982,8 @@ def to_standard_count_frame(
19791982
)
19801983

19811984
# loop through dataframes in list
1982-
standard[sheet] = pd.DataFrame()
1985+
# create list of standard-form dataframes from dataframes in list
1986+
standard_list = list()
19831987
for n in range(len(df_list)):
19841988
raw = df_list[n]
19851989
working = raw.copy()
@@ -2050,9 +2054,12 @@ def to_standard_count_frame(
20502054
# clean Unnamed:... out of any values
20512055
working = blank_out(working, constants.pandas_default_pattern)
20522056

2053-
# append data from the nth dataframe to the standard-form dataframe
2057+
# append standard-form data from the nth dataframe to the list
20542058
## NB: if df_list[n] fails it should not reach this statement
2055-
standard[sheet] = pd.concat([standard[sheet], working])
2059+
standard_list.append(working)
2060+
2061+
# put all the good standard-form dataframes together into one
2062+
standard[sheet] = pd.concat(standard_list)
20562063

20572064
# if even one df lacks a fatal error, consider all errors non-fatal for this sheet
20582065
non_fatal_dfs = [
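The change in `to_standard_count_frame` replaces repeated `pd.concat` calls inside the loop with a list that is concatenated once at the end. A minimal sketch of that pattern follows, with made-up column names and a hypothetical helper `combine_frames`. The `ignore_index=True` argument and the empty-list guard are additions of this sketch and do not appear in the commit.

```python
import pandas as pd


def combine_frames(frames):
    """Concatenate a list of dataframes once, rather than growing one frame in a loop."""
    good = [df for df in frames if df is not None and not df.empty]
    if not good:
        # pd.concat raises ValueError on an empty list, so return an empty frame instead
        return pd.DataFrame()
    return pd.concat(good, ignore_index=True)


# Toy stand-ins for the per-munger dataframes in df_list
df_list = [
    pd.DataFrame({"ReportingUnit": ["A", "B"], "Count": [10, 20]}),
    pd.DataFrame({"ReportingUnit": ["C"], "Count": [5]}),
]

# Collect the standard-form frames, then concatenate them in one step
standard_list = []
for raw in df_list:
    working = raw.copy()
    standard_list.append(working)

standard = combine_frames(standard_list)
```

Because `pd.concat` raises `ValueError` when given an empty list, a guard like the one above may be worth considering wherever every dataframe in the list could fail.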

src/ini_files_for_results/Kansas/ks_20g_ks_house_official.ini

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
[election_results]
22
results_file=Kansas/2020_General_Election_Kansas_House_of_Representatives_results_by_precinct.xlsx
3-
munger_list=ks_gen_main,ks_gen_johnson_count_from_B,ks_gen_shawnee_count_from_B,ks_gen_sedgwick,ks_gen_wyandotte_4_line_header_first_count_col_3
3+
munger_list=ks_gen_main,ks_gen_johnson_count_from_B,ks_gen_shawnee_count_from_B,ks_gen_sedgwick,ks_gen_wyandotte_4_line_header_first_count_col_3,ks_gen_wyandotte_3_line_header_first_count_col_3,ks_gen_wyandotte_4_line_header_first_count_col_3_merged_rows
44
jurisdiction=Kansas
55
election=2020 General
66
results_short_name=ks_20g_kshouse
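For readers unfamiliar with the `.ini` layout above, the comma-separated `munger_list` can be read with Python's standard `configparser`. This is only a sketch: the function `read_munger_list` is hypothetical, the munger list is shortened for illustration, and the package's own parsing may differ.

```python
import configparser

# Abbreviated copy of the ini file above; the munger list is shortened for illustration
INI_TEXT = """
[election_results]
results_file=Kansas/2020_General_Election_Kansas_House_of_Representatives_results_by_precinct.xlsx
munger_list=ks_gen_main,ks_gen_johnson_count_from_B,ks_gen_sedgwick
jurisdiction=Kansas
election=2020 General
results_short_name=ks_20g_kshouse
"""


def read_munger_list(ini_text):
    """Return the munger names listed under [election_results] as a list of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    raw = parser["election_results"]["munger_list"]
    # Split on commas and drop any stray whitespace or empty entries
    return [name.strip() for name in raw.split(",") if name.strip()]


mungers = read_munger_list(INI_TEXT)
```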

src/ini_files_for_results/New-Hampshire/nh20g_CD2_official.ini

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,6 @@ results_short_name=nh20g_cd2
77
results_download_date=2020-12-22
88
results_source=https://sos.nh.gov/elections/elections/election-results/
99
results_note=revised by hand to disambiguate counties & towns with same name (Carroll, Grafton, Hillsborough, Sullivan). Also, candidate Andrew Olding. As of 8/27/2021, the electiondata code throws a (seemingly harmless) warning when processing this file ( /usr/local/lib/python3.9/site-packages/openpyxl/worksheet/header_footer.py:48: UserWarning: Cannot parse header or footer so it will be ignored
10-
warn("""Cannot parse header or footer so it will be ignored"""))
11-
CountItemType=total
1210
is_preliminary=False
11+
CountItemType=total
1312

src/jurisdictions/Kansas/Candidate.txt

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -125,15 +125,12 @@ Rick Kloos
125125
Rachel Willis
126126
Brenda S. Dietrich
127127
Anthony Hensley
128-
Under Votes
129-
Over Votes
130128
Laura McConwell
131129
Ethan Corson
132130
Diana Whittington
133131
Cindy Holscher
134132
Vail Fruechting
135133
Ty Masterson
136-
Total Votes Cast
137134
Timothy Don Fry II
138135
Mary Ware
139136
Dan Kerschen
@@ -356,3 +353,6 @@ Vic (T-Bone) Miller
356353
Vicki Schmidt
357354
Virgil Weigel
358355
Wendy Bingesser
356+
Jordan Michael Mackey
357+
Greg Conchola
358+
Rick Parsons

src/jurisdictions/Kansas/dictionary.txt

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -167,7 +167,6 @@ Candidate Molly Baumgardner Baumgardner, Molly
167167
Candidate Monica Murnan Murnan, Monica
168168
Candidate Nancy J. Ingle Ingle, Nancy J.
169169
Candidate Other Other
170-
Candidate Over Votes Over Votes
171170
Candidate Pat Pettey Pat Pettey
172171
Candidate Pat Proctor Proctor, Pat
173172
Candidate Patrick Penn Penn, Patrick
@@ -224,13 +223,11 @@ Candidate Todd Maddox Maddox, Todd
224223
Candidate Tom Hawk Hawk, Tom
225224
Candidate Tom Holland Holland, Tom
226225
Candidate Tory Marie Arnberger Arnberger, Tory Marie
227-
Candidate Total Votes Cast Total Votes Cast
228226
Candidate Tracey Mann Mann, Tracey
229227
Candidate Trevor Jacobs Jacobs, Trevor
230228
Candidate Troy L. Waymaster Waymaster, Troy L.
231229
Candidate Ty Masterson Masterson, Ty
232230
Candidate Ty Masterson Ty Masterson
233-
Candidate Under Votes Under Votes
234231
Candidate Vail Fruechting Vail Fruechting
235232
Candidate Virgil Peck Peck, Virgil
236233
Candidate W. Michael Shimeall Shimeall, W. Michael
@@ -6761,3 +6758,6 @@ ReportingUnit Kansas;Wilson County Kansas;Wilson
67616758
ReportingUnit Kansas;Woodson County Kansas;Woodson
67626759
ReportingUnit Kansas;Wyandotte County Kansas;Wyandotte
67636760
CandidateContest KS Attorney General KS;Attorney General;statewide
6761+
Candidate Jordan Michael Mackey Jordan Michael Mackey
6762+
Candidate Greg Conchola Greg Conchola
6763+
Candidate Rick Parsons Rick Parsons

src/mungers/ks_gen_johnson_count_from_B.munger

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -49,14 +49,11 @@ CandidateContest=<count_header_0>
4949
Party=<count_header_2>
5050

5151

52-
53-
54-
55-
5652
# Values to ignore (optional) #
5753
[ignore]
5854
## E.g: Candidate=Total Votes Cast,Registered Voters ##
5955
ReportingUnit=JOHNSON;COUNTY TOTALS,Johnson;COUNTY TOTALS
56+
Candidate=Write-in,Under Votes,Over Votes
6057

6158
# Lookup formula sections #
6259
## Required when foreign keys are used in munge formulas and ##
