Commit fcfb792

documentation and set major subdivision types when Analyzer is created
1 parent 43e7b78 commit fcfb792

9 files changed

Lines changed: 343 additions & 192 deletions

docs/User_Guide.md

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -17,7 +17,7 @@ You will need a main parameter file to specify paths and database connection inf
1717
See the [template file](../src/parameter_file_templates/run_time.ini.template) for required parameters. Avoid percent signs and line breaks in the parameter values.
1818

1919
### Other recommended files
20-
To avoid the overhead of deriving the major subdivision type for each jurisdiction from the database, make sure that your repository has a [000_major_subjurisdiction_types.txt](../src/jurisdictions/000_major_subjurisdiction_types.txt) in the [jurisdictions directory](../src/jurisdictions/). This file allows the user to specify other major subdivisions. For example, it may make sense to consider towns as the major subdivisions in Connecticut rather than counties. Or a user may wish to use congressional districts as the major subdivision -- though such a user should not assume that the nesting relationships (say, of precincts within congressional districts) have been coded in the [`ReportingUnit.txt` file](../src/jurisdictions/Connecticut/ReportingUnit.txt) or the database.
20+
To avoid the overhead of deriving the major subdivision type for each jurisdiction from the database, make sure that your repository has a [major_subjurisdiction_types.txt](../src/jurisdictions/000_for_all_jurisdictions/major_subjurisdiction_types.txt) file under the [jurisdictions directory](../src/jurisdictions/). This file allows the user to specify other major subdivisions. For example, it may make sense to consider towns as the major subdivisions in Connecticut rather than counties. Or a user may wish to use congressional districts as the major subdivision -- though such a user should not assume that the nesting relationships (say, of precincts within congressional districts) have been coded in the [`ReportingUnit.txt` file](../src/jurisdictions/Connecticut/ReportingUnit.txt) or the database.
2121

2222
## Determining a Munger
2323
Election result data comes in a variety of file formats. Even when the basic format is the same, file columns may have different interpretations. The code is built to ease -- as much as possible -- the chore of processing and interpreting each format. Following the [Jargon File](http://catb.org/jargon/html/M/munge.html), which gives one meaning of "munge" as "modify data in some way the speaker doesn't need to go into right now or cannot describe succinctly," we call each set of basic information about interpreting an election result file a "munger".
@@ -231,7 +231,7 @@ Texas;Harrison County county
231231
```
232232
Counties must be added by hand.
233233

234-
NB: in some jurisdictions, the major subdivision type is not 'county. For instance, Louisiana's major subdivisions are called 'parish'. In the `elections.analyze` module, several routines roll up results to the major subdivision -- usually counties. The ReportingUnitType of the major subdivision is read from the file `src/jurisdictions/000_major_subjurisdiction_types.txt` if possible; if that file is missing, or does not provide a subdivision type for the particular jurisdiction in question, the system will try to deduce the major subdivision type from the database.
234+
NB: in some jurisdictions, the major subdivision type is not 'county'. For instance, Louisiana's major subdivisions are of type 'parish'. In the `elections.analyze` module, several routines roll up results to the major subdivision -- usually counties. By default, the ReportingUnitType of the major subdivision is read from the file [major_subjurisdiction_types.txt](../src/jurisdictions/000_for_all_jurisdictions/major_subjurisdiction_types.txt) if possible; if that file is missing, or does not provide a subdivision type for the particular jurisdiction in question, the system will try to deduce the major subdivision type from the database. A different file of subdivision types can be specified with the optional `major_subdivision_file` parameter in `Analyzer()` or `DataLoader()`.
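
As a rough illustration of how a subdivision-type file might be turned into a per-jurisdiction lookup, here is a minimal sketch. It assumes the file is tab-separated with the columns `jurisdiction` and `major_sub_jurisdiction_type` (the column names used by the file reader that this commit removes from `database/__init__.py`). The function name `load_major_subdivision_types` and the sample rows are hypothetical and are not part of `electiondata`.

```python
import csv
import io
from typing import Dict, TextIO


def load_major_subdivision_types(f: TextIO) -> Dict[str, str]:
    """Map each jurisdiction to its major subdivision type.

    Hypothetical helper: assumes tab-separated text with a header row
    containing 'jurisdiction' and 'major_sub_jurisdiction_type'.
    Rows missing either value are skipped.
    """
    reader = csv.DictReader(f, delimiter="\t")
    return {
        row["jurisdiction"]: row["major_sub_jurisdiction_type"]
        for row in reader
        if row.get("jurisdiction") and row.get("major_sub_jurisdiction_type")
    }


# Made-up sample content, for illustration only
sample = io.StringIO(
    "jurisdiction\tmajor_sub_jurisdiction_type\n"
    "Louisiana\tparish\n"
    "Texas\tcounty\n"
)
subdivision_types = load_major_subdivision_types(sample)
```

A dictionary of this shape matches the `Dict[str, str]` type that `get_relevant_contests` now expects for its `major_subdivision_dictionary` argument.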
235235

236236
The system assumes that internal database names of ReportingUnits carry information about the nesting of the basic ReportingUnits (e.g., counties, towns, wards, etc., but not congressional districts) via semicolons. For example:
237237
* `Pennsylvania;Philadelphia;Ward 8;Division 6` is a precinct in

src/electiondata/__init__.py

Lines changed: 283 additions & 96 deletions
Large diffs are not rendered by default.

src/electiondata/constants/__init__.py

Lines changed: 4 additions & 1 deletion
@@ -127,9 +127,11 @@
127127
default_subdivision_type = "county"
128128
subdivision_reference_file_path = os.path.join(
129129
"jurisdictions",
130-
"000_major_subjurisdiction_types.txt",
130+
"000_for_all_jurisdictions",
131+
"major_subjurisdiction_types.txt",
131132
)
132133

134+
133135
def jurisdiction_wide_contests(abbr: str) -> List[str]:
134136
"""
135137
Inputs:
@@ -148,6 +150,7 @@ def jurisdiction_wide_contests(abbr: str) -> List[str]:
148150
f"{abbr} Secretary of State",
149151
]
150152

153+
151154
# display information
152155
if 1:
153156
"""maps ReportingUnitType of election district of contest to the user-facing label for that type of contest

src/electiondata/database/__init__.py

Lines changed: 33 additions & 68 deletions
@@ -5,6 +5,7 @@
55
import sqlalchemy
66
import sqlalchemy as sa
77
import sqlalchemy.orm
8+
from sqlalchemy.orm import Session
89
from sqlalchemy import (
910
MetaData,
1011
Table,
@@ -21,7 +22,6 @@
2122
TIMESTAMP,
2223
Boolean,
2324
) # these are used, even if syntax-checker can't tell
24-
from sqlalchemy.orm import Session
2525
import io
2626
import csv
2727
import inspect
@@ -947,6 +947,30 @@ def vote_type_list(
947947
return vt_list, err_str
948948

949949

950+
def jurisdiction_id_list(session: Session) -> List[int]:
951+
"""
952+
Required inputs:
953+
session: Session,
954+
955+
Returns:
956+
List[int], list of jurisdiction ids for jurisdictions with data in db
957+
referenced by <session>
958+
"""
959+
q = """
960+
SELECT DISTINCT "ReportingUnit_Id" FROM _datafile;
961+
"""
962+
connection = session.bind.raw_connection()
963+
cursor = connection.cursor()
964+
cursor.execute(q)
965+
results = cursor.fetchall()
966+
juris_id_list = [x[0] for x in results]
967+
if cursor:
968+
cursor.close()
969+
if connection:
970+
connection.close()
971+
return juris_id_list
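
The query in `jurisdiction_id_list` can be tried in isolation. The sketch below is a stand-in, not the project's code: it runs the same `SELECT DISTINCT` against an in-memory SQLite database instead of the PostgreSQL connection obtained from `session.bind.raw_connection()`, and it uses `contextlib.closing` so the cursor is closed even if the query raises. The table layout, the sample rows, and the name `jurisdiction_id_list_sketch` are assumptions made for illustration.

```python
import sqlite3
from contextlib import closing
from typing import List


def jurisdiction_id_list_sketch(connection: sqlite3.Connection) -> List[int]:
    """Return the distinct ReportingUnit_Id values found in _datafile.

    ORDER BY is added here (it is not in the original query) only so that
    the result order is deterministic for the example.
    """
    q = 'SELECT DISTINCT "ReportingUnit_Id" FROM _datafile ORDER BY "ReportingUnit_Id"'
    with closing(connection.cursor()) as cursor:
        cursor.execute(q)
        return [row[0] for row in cursor.fetchall()]


# Hypothetical minimal table; the real _datafile table has more columns
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute(
        'CREATE TABLE _datafile ("Id" INTEGER PRIMARY KEY, "ReportingUnit_Id" INTEGER)'
    )
    conn.executemany(
        'INSERT INTO _datafile ("ReportingUnit_Id") VALUES (?)',
        [(7,), (7,), (12,)],
    )
    ids = jurisdiction_id_list_sketch(conn)
```

Two datafiles share jurisdiction id 7 in the sample data, so the duplicate is collapsed by `DISTINCT`.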
972+
973+
950974
def data_file_list_cursor(
951975
cursor: psycopg2.extensions.cursor,
952976
election_id: int,
@@ -1127,15 +1151,14 @@ def get_relevant_election(session: Session, filters: List[str]) -> pd.DataFrame:
11271151

11281152

11291153
def get_relevant_contests(
1130-
session: Session, filters: List[str], repository_content_root: str
1154+
session: Session, filters: List[str], major_subdivision_dictionary: Dict[str, str]
11311155
) -> pd.DataFrame:
11321156
"""
11331157
Required inputs:
11341158
session: Session, sqlalchemy database session
11351159
filters: List[str], list containing one jurisdiction name and one election name
11361160
(and possibly other strings as well)
1137-
repository_content_root: str, path to repository content root directory (so that major subdivision can
1138-
be found)
1161+
major_subdivision_dictionary: Dict[str, str], for finding major subdivision by jurisdiction
11391162
11401163
Returns:
11411164
pd.DataFrame, dataframe of all contests that have results for the first election and first jurisdiction
@@ -1144,25 +1167,16 @@ def get_relevant_contests(
11441167
11451168
Notes:
11461169
<filters> is expected to have exactly one election and exactly one jurisdiction. If there are more than
1147-
one, only the first of each will be used.
1148-
counts for ReportingUnits that don't roll up to a major subdivision (e.g., PR legislative results by district
1170+
one, only the first of each will be used.
1171+
Counts for ReportingUnits that don't roll up to a major subdivision (e.g., PR legislative results by district
11491172
when major subdivision is municipality) will not be included.
11501173
"""
11511174

11521175
election_id = list_to_id(session, "Election", filters)
11531176
jurisdiction_id = list_to_id(session, "ReportingUnit", filters)
1154-
jurisdiction = name_from_id(
1155-
session, "ReportingUnit", jurisdiction_id
1156-
)
1157-
subdivision_type = get_major_subdiv_type(
1158-
session,
1159-
jurisdiction,
1160-
file_path=os.path.join(
1161-
repository_content_root,
1162-
"jurisdictions",
1163-
"000_major_subjurisdiction_types.txt",
1164-
),
1165-
)
1177+
jurisdiction = name_from_id(session, "ReportingUnit", jurisdiction_id)
1178+
subdivision_type = major_subdivision_dictionary[jurisdiction]
1179+
11661180
working = unsummed_vote_counts_with_rollup_subdivision_id(
11671181
session,
11681182
election_id,
@@ -1188,55 +1202,6 @@ def get_relevant_contests(
11881202
return result_df
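
Note that the new lookup `major_subdivision_dictionary[jurisdiction]` raises `KeyError` if the jurisdiction is absent from the dictionary, whereas the removed `get_major_subdiv_type` fell back to the database. One way a caller could guard against a missing key is sketched below. The helper name `subdivision_type_for` is hypothetical, and the fallback value mirrors `default_subdivision_type = "county"` from `constants`. Whether a silent fallback is preferable to an explicit error is a design decision this diff does not settle.

```python
from typing import Dict

# Mirrors constants.default_subdivision_type in this commit
DEFAULT_SUBDIVISION_TYPE = "county"


def subdivision_type_for(
    jurisdiction: str,
    major_subdivision_dictionary: Dict[str, str],
    default: str = DEFAULT_SUBDIVISION_TYPE,
) -> str:
    """Look up the major subdivision type for <jurisdiction>,
    returning <default> when the jurisdiction is not in the dictionary.
    Hypothetical helper, not part of electiondata."""
    return major_subdivision_dictionary.get(jurisdiction, default)


known = subdivision_type_for("Louisiana", {"Louisiana": "parish"})
unknown = subdivision_type_for("Ohio", {"Louisiana": "parish"})
```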
11891203

11901204

1191-
def get_major_subdiv_type(
1192-
session: Session,
1193-
jurisdiction: str,
1194-
file_path: Optional[str] = None,
1195-
content_root: Optional[str] = None,
1196-
) -> Optional[str]:
1197-
"""Returns the type of the major subdivision, if found. Tries first from <file_path> (if given);
1198-
if that fails, or no file_path given, tries from a particular file in the repository
1199-
if the content root is given; if
1200-
that fails, tries to deduce from database. If nothing found, returns None"""
1201-
# if file is given,
1202-
if file_path:
1203-
# try to get the major subdivision type from the file
1204-
subdiv_from_file = get_major_subdiv_from_file(file_path, jurisdiction)
1205-
if subdiv_from_file:
1206-
return subdiv_from_file
1207-
elif content_root:
1208-
# try from file in repo
1209-
subdiv_from_repo = get_major_subdiv_from_file(
1210-
os.path.join(
1211-
content_root,
1212-
constants.subdivision_reference_file_path,
1213-
),
1214-
jurisdiction,
1215-
)
1216-
if subdiv_from_repo:
1217-
return subdiv_from_repo
1218-
# if not found in file or repo, calculate major subdivision type from the db
1219-
jurisdiction_id = name_to_id(session, "ReportingUnit", jurisdiction)
1220-
subdiv_type = get_jurisdiction_hierarchy(session, jurisdiction_id)
1221-
return subdiv_type
1222-
1223-
1224-
def get_major_subdiv_from_file(f_path: str, jurisdiction: str) -> Optional[str]:
1225-
"""return major subdivision of <jurisdiction> from file <f_path> with columns
1226-
jurisdiction, major_sub_jurisdiction_type.
1227-
If anything goes wrong, return None"""
1228-
try:
1229-
df = pd.read_csv(f_path, sep="\t")
1230-
mask = df.jurisdiction == jurisdiction
1231-
if mask.any():
1232-
subdiv_type = df.loc[mask, "major_sub_jurisdiction_type"].unique()[0]
1233-
else:
1234-
subdiv_type = None
1235-
except:
1236-
subdiv_type = None
1237-
return subdiv_type
1238-
1239-
12401205
def get_jurisdiction_hierarchy(session: Session, jurisdiction_id: int) -> Optional[str]:
12411206
"""get reporting unit type id of reporting unit one level down from jurisdiction.
12421207
Omit particular types that are contest types, not true reporting unit types
@@ -1776,7 +1741,7 @@ def read_external(
17761741
) -> pd.DataFrame:
17771742
"""returns a dataframe with columns <fields>,
17781743
where each field is in the ExternalDataSet table.
1779-
If <major_subdivisions_only> is True, returns only major sub-divisions
1744+
If <subdivision_type> is given, returns only reporting units of that subdivision_type
17801745
(typically counties)"""
17811746
if restrict_by_label:
17821747
label_restriction = f""" AND "Label" = '{restrict_by_label}'"""

src/electiondata/juris/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -93,9 +93,9 @@ def check_dictionary(dictionary_path: str) -> Optional[dict]:
9393
dictionary_dir = Path(dictionary_path).parent.name
9494

9595
# dedupe the dictionary
96-
clean_and_dedupe(dictionary_path,clean_candidates=True)
96+
clean_and_dedupe(dictionary_path, clean_candidates=True)
9797
# check that no entry is null
98-
df = pd.read_csv(dictionary_path,**constants.standard_juris_csv_reading_kwargs)
98+
df = pd.read_csv(dictionary_path, **constants.standard_juris_csv_reading_kwargs)
9999
null_mask = df.T.isnull().any()
100100
if null_mask.any():
101101
# drop null rows and report error

src/electiondata/munge/__init__.py

Lines changed: 14 additions & 5 deletions
@@ -136,7 +136,9 @@ def add_regex_column(
136136
# replace via regex if possible; otherwise msg
137137
# # put informative error message in new_col (to be overwritten if no error)
138138
old = working[old_col].copy()
139-
working[new_col] = working[old_col] + f"{constants.regex_failure_string} {pattern_str}"
139+
working[new_col] = (
140+
working[old_col] + f"{constants.regex_failure_string} {pattern_str}"
141+
)
140142

141143
# # where regex succeeds, replace error message with good value
142144
mask = working[old_col].str.match(p)
@@ -216,7 +218,9 @@ def add_column_from_formula(
216218

217219
# add column to <working> dataframe via the concatenation formula
218220
if last_text:
219-
working = add_constant_column(working, new_col, last_text[0], dtype="string")
221+
working = add_constant_column(
222+
working, new_col, last_text[0], dtype="string"
223+
)
220224
else:
221225
err = ui.add_new_error(
222226
err,
@@ -321,11 +325,16 @@ def replace_raw_with_internal_name(
321325
dictionary = raw_to_internal_dictionary_df(dictionary_df, element)
322326

323327
# report values not matched by regex
324-
regex_fail_mask = working[f"{element}_raw"].str.contains(constants.regex_failure_string)
328+
regex_fail_mask = working[f"{element}_raw"].str.contains(
329+
constants.regex_failure_string
330+
)
325331
if regex_fail_mask.any():
326332
failed = "\n".join(sorted(working[regex_fail_mask][f"{element}_raw"].unique()))
327333
err = ui.add_new_error(
328-
err, "warn-munger", munger_name, f"\nSome raw {element} values in {file_name} not matched by regular expression:\n{failed}"
334+
err,
335+
"warn-munger",
336+
munger_name,
337+
f"\nSome raw {element} values in {file_name} not matched by regular expression:\n{failed}",
329338
)
330339
if drop_unmatched:
331340
working = working[~regex_fail_mask]
@@ -353,7 +362,7 @@ def replace_raw_with_internal_name(
353362
# lines where regex failed don't count as dictionary failures
354363
unmatched_raw = [
355364
x for x in unmatched_raw if constants.regex_failure_string.strip() not in x
356-
] # TODO redundant with calculation above
365+
] # TODO redundant with calculation above
357366
if len(unmatched_raw) > 0 and element != "BallotMeasureContest":
358367
unmatched_str = "\n".join(unmatched_raw)
359368
e = f"\n{element}s (found with munger {munger_name}) not found in dictionary.txt :\n{unmatched_str}\n\n"

src/electiondata/nist/__init__.py

Lines changed: 4 additions & 15 deletions
@@ -17,9 +17,7 @@ def nist_v2_xml_export_tree(
1717
session: Session,
1818
election: str,
1919
jurisdiction: str,
20-
rollup: bool = False,
21-
major_subdivision: Optional[str] = None,
22-
sub_div_type_file: Optional[str] = None,
20+
rollup_subdivision_type: Optional[str] = None,
2321
issuer: str = constants.default_issuer,
2422
issuer_abbreviation: str = constants.default_issuer_abbreviation,
2523
status: str = constants.default_status,
@@ -29,9 +27,7 @@ def nist_v2_xml_export_tree(
2927
from the given election and jurisdiction. Note that all available results will
3028
be exported. I.e., if database has precinct-level results, the tree will
3129
contain precinct-level results.
32-
Major subdivision for rollup is <major_subdivision> if that's given;
33-
otherwise major subdivision is read from <sub_div_type_file> if given;
34-
otherwise pulled from db.
30+
Results are rolled up to <rollup_subdivision_type> if it is given; if it is None, no rollup is performed.
3531
"""
3632
err = None
3733
# set up
@@ -50,16 +46,9 @@ def nist_v2_xml_export_tree(
5046
# include jurisdiction id in gp unit ids
5147
gpu_idxs = {jurisdiction_id}
5248

53-
if rollup:
54-
# get major subdivision type if not provided
55-
if not major_subdivision:
56-
major_subdivision = db.get_major_subdiv_type(
57-
session, jurisdiction, file_path=sub_div_type_file
58-
)
59-
60-
# get vote count data
49+
# get vote count data (if rollup_subdivision_type is None, no rollup will happen)
6150
results_df = db.read_vote_count_nist(
62-
session, election_id, jurisdiction_id, rollup_ru_type=major_subdivision
51+
session, election_id, jurisdiction_id, rollup_ru_type=rollup_subdivision_type
6352
)
6453

6554
# collect ids for gp units that have vote counts, gp units that are election districts

src/electiondata/userinterface/__init__.py

Lines changed: 1 addition & 3 deletions
@@ -717,9 +717,7 @@ def report(
717717
for nk in only_warns:
718718
# prepare output string
719719
nk_name = Path(nk).name
720-
out_str = (
721-
f"{et.title()} warnings ({nk_name}):\n{msg[(f'warn-{et}', nk)]}\n"
722-
)
720+
out_str = f"{et.title()} warnings ({nk_name}):\n{msg[(f'warn-{et}', nk)]}\n"
723721

724722
# write output
725723
# write info to a .warnings file named for the error-type and name_key

src/jurisdictions/000_major_subjurisdiction_types.txt renamed to src/jurisdictions/000_for_all_jurisdictions/major_subjurisdiction_types.txt

File renamed without changes.
