Commit 97a25c8

Merge branch 'main' into SCC-3775/get-request-retries
2 parents 63341c7 + f32ff08 commit 97a25c8

12 files changed

Lines changed: 327 additions & 20 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 .DS_Store
 dist/
 __pycache__/
+.vscode/
 *env/
 *.py[cod]
 *$py.class

.python-version

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+3.9.16

CHANGELOG.md

Lines changed: 14 additions & 1 deletion
@@ -1,5 +1,8 @@
 # Changelog
-## v1.1.0
+## v1.1.2
+- Update config_helper to accept list environment variables
+
+## v1.1.0/v1.1.1
 - Add retries for empty responses in oauth2 client. This was added to address a known quirk in the Sierra API where this response is returned:
 ```
 > GET / HTTP/1.1
@@ -8,6 +11,16 @@
 > Accept: */*
 >
 ```
+- Due to an accidental deployment, v1.1.0 and v1.1.1 were both released but are identical
+
+## v1.0.4 - 6/28/23
+- Enforce Kinesis stream 1000 records/second write limit
+
+## v1.0.3 - 5/19/23
+- Add research_catalog_identifier_helper function
+
+## v1.0.2 - 5/18/23
+- Identical to v1.0.1 -- this was mistakenly deployed to QA without any changes
 
 ## v1.0.1 - 4/3/23
 - Add transaction support to RedshiftClient

README.md

Lines changed: 44 additions & 8 deletions
@@ -14,20 +14,40 @@ This package contains common Python utility classes and functions.
 * Making requests to the Oauth2 authenticated APIs such as NYPL Platform API and Sierra
 
 ## Functions
-* Reading a YAML config file and putting the contents in os.environ
+* Reading a YAML config file and putting the contents in os.environ -- see `config/sample.yaml` for an example of how the config file should be formatted
 * Creating a logger in the appropriate format
 * Obfuscating a value using bcrypt
+* Parsing/building Research Catalog identifiers
+
+## Usage
+```python
+# test_file.py
+from nypl_py_utils.classes.kinesis_client import KinesisClient
+from nypl_py_utils.functions.config_helper import load_env_file
+
+load_env_file(...)
+kinesis_client = KinesisClient(...)
+```
+
+```bash
+# requirements.txt
+
+# Do not use any version below 1.0.0
+# All available optional dependencies can be found in pyproject.toml.
+# See the "Managing dependencies" section below for more details.
+nypl-py-utils[kinesis-client,config-helper]==1.1.2
+```
 
 ## Developing locally
 In order to use the local version of the package instead of the global version, use a virtual environment. To set up a virtual environment and install all the necessary dependencies, run:
 
 ```
-python3 -m venv testenv
-source testenv/bin/activate
+python3 -m venv .venv
+source .venv/bin/activate
 pip install --upgrade pip
 pip install .
 pip install '.[development]'
-deactivate && source testenv/bin/activate
+deactivate && source .venv/bin/activate
 ```
 
 ## Managing dependencies
@@ -37,8 +57,23 @@ When a new client or helper file is created, a new optional dependency set shoul
 
 The optional dependency sets also give the developer the option to manually list out the dependencies of the clients rather than relying upon what the package thinks is required, which can be beneficial in certain circumstances. For instance, AWS lambda functions come with `boto3` and `botocore` pre-installed, so it's not necessary to include these (rather hefty) dependencies in the lambda deployment package.
 
-### Troubleshooting
-If running `main.py` in this virtual environment produces the following error:
+## Troubleshooting
+### Using PostgreSQLClient in an AWS Lambda
+Because `psycopg` requires a statically linked version of the `libpq` library, the `PostgreSQLClient` cannot be installed as-is in an AWS Lambda function. Instead, it must be packaged as follows:
+```bash
+pip install --target ./package nypl-py-utils[postgresql-client]==1.1.2
+
+pip install \
+    --platform manylinux2014_x86_64 \
+    --target=./package \
+    --implementation cp \
+    --python 3.9 \
+    --only-binary=:all: --upgrade \
+    'psycopg[binary]'
+```
+
+### Using PostgreSQLClient locally
+If using the `PostgreSQLClient` produces the following error locally:
 ```
 ImportError: no pq wrapper available.
 Attempts made:
@@ -48,7 +83,7 @@ Attempts made:
 ```
 
 then try running:
-```
+```bash
 pip uninstall psycopg
 pip install "psycopg[c]"
 ```
@@ -62,6 +97,7 @@ This repo uses the [Main-QA-Production](https://github.com/NYPL/engineering-gene
 - Cut a feature branch off of `main`
 - Commit changes to your feature branch
 - File a pull request against `main` and assign a reviewer (who must be an owner)
+- Include relevant updates to pyproject.toml and README
 - In order for the PR to be accepted, it must pass all unit tests, have no lint issues, and update the CHANGELOG (or contain the `Skip-Changelog` label in GitHub)
 - After the PR is accepted, merge into `main`
 - Merge `main` > `qa`
@@ -70,4 +106,4 @@ This repo uses the [Main-QA-Production](https://github.com/NYPL/engineering-gene
 - Deploy app to production on GitHub and confirm it works
 
 ## Deployment
-The utils repo is deployed as a PyPI package [here](https://pypi.org/project/nypl-py-utils/) and as a Test PyPI package for QA purposes [here](https://test.pypi.org/project/nypl-py-utils/). In order to be deployed, the version listed in `pyproject.toml` **must be updated**. To deploy to Test PyPI, create a new release in GitHub and tag it `qa-vX.X.X`. The GitHub Actions deploy-qa workflow will then build and publish the package. To deploy to production PyPI, create a release and tag it `production-vX.X.X`.
+The utils repo is deployed as a PyPI package [here](https://pypi.org/project/nypl-py-utils/) and as a Test PyPI package for QA purposes [here](https://test.pypi.org/project/nypl-py-utils/). In order to be deployed, the version listed in `pyproject.toml` **must be updated**. To deploy to Test PyPI, [create a new release](https://github.com/NYPL/python-utils/releases) in GitHub and tag it `qa-vX.X.X`. The GitHub Actions deploy-qa workflow will then build and publish the package. To deploy to production PyPI, create a release and tag it `production-vX.X.X`.

config/sample.yaml

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+---
+PLAINTEXT_VARIABLES:
+  STRING_VAR: string-var
+  INT_VAR: 1
+  LIST_VAR:
+    - string-var2
+    - 2
+ENCRYPTED_VARIABLES:
+  ENCRYPTED_STRING_VAR: AQECAHh7ea2tyZ6phZgT4B9BDKwguhlFtRC6hgt+7HbmeFsrsgAAAGowaAYJKoZIhvcNAQcGoFswWQIBADBUBgkqhkiG9w0BBwEwHgYJYIZIAWUDBAEuMBEEDCvE8Pc8PiUEiCGpEAIBEIAnf8fz6YXH959A0ygrM4S95giFnwvp9dYFzp/2ViAIlD5GZ1S04vay
+  ENCRYPTED_INT_VAR: AQECAHh7ea2tyZ6phZgT4B9BDKwguhlFtRC6hgt+7HbmeFsrsgAAAF8wXQYJKoZIhvcNAQcGoFAwTgIBADBJBgkqhkiG9w0BBwEwHgYJYIZIAWUDBAEuMBEEDFQdg7ua7D8XH7UZGgIBEIAcpkIN6+56sbR3Vbk12NX2QDY28dnL8IWgVdnBRA==
+  ENCRYPTED_LIST_VAR:
+    - AQECAHh7ea2tyZ6phZgT4B9BDKwguhlFtRC6hgt+7HbmeFsrsgAAAGswaQYJKoZIhvcNAQcGoFwwWgIBADBVBgkqhkiG9w0BBwEwHgYJYIZIAWUDBAEuMBEEDMTe0jJyHxGaiy0PHQIBEIAo4+qpfJp/gfZqhl1GtN/q9ebn2isiVOn5QLK/fcUtWeG182jiKPdOFA==
+    - AQECAHh7ea2tyZ6phZgT4B9BDKwguhlFtRC6hgt+7HbmeFsrsgAAAF8wXQYJKoZIhvcNAQcGoFAwTgIBADBJBgkqhkiG9w0BBwEwHgYJYIZIAWUDBAEuMBEEDGsG/1m7nes884q8vQIBEIAc9eBDgUgVzVsK3lyebNmc09kGfP7Gzwm6ESJAiA==
+...

pyproject.toml

Lines changed: 5 additions & 2 deletions
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "nypl_py_utils"
-version = "1.0.1"
+version = "1.1.2"
 authors = [
     { name="Aaron Friedman", email="aaronfriedman@nypl.org" },
 ]
@@ -63,8 +63,11 @@ config-helper = [
 obfuscation-helper = [
     "bcrypt>=4.0.1"
 ]
+research-catalog-identifier-helper = [
+    "requests>=2.28.1"
+]
 development = [
-    "nypl_py_utils[avro-encoder,kinesis-client,kms-client,mysql-client,oauth2-api-client,postgresql-client,postgresql-pool-client,redshift-client,s3-client,config-helper,obfuscation-helper]",
+    "nypl_py_utils[avro-encoder,kinesis-client,kms-client,mysql-client,oauth2-api-client,postgresql-client,postgresql-pool-client,redshift-client,s3-client,config-helper,obfuscation-helper,research-catalog-identifier-helper]",
     "flake8>=6.0.0",
     "freezegun>=1.2.2",
     "mock>=4.0.3",

src/nypl_py_utils/classes/kinesis_client.py

Lines changed: 9 additions & 1 deletion
@@ -39,14 +39,22 @@ def close(self):
     def send_records(self, records):
         """
         Sends list of records (usually represented as Avro-encoded byte
-        strings) to Kinesis in batches of size self.batch_size.
+        strings) to Kinesis in batches of size self.batch_size. Kinesis can
+        only handle 1000 records per second, so this method waits a second
+        between each 1000 records.
         """
+        records_sent_since_pause = 0
         for i in range(0, len(records), self.batch_size):
             encoded_batch = records[i:i + self.batch_size]
             kinesis_records = [{'Data': record, 'PartitionKey':
                                 str(int(time.time() * 1000000000))}
                                for record in encoded_batch]
+
+            if records_sent_since_pause + len(encoded_batch) > 1000:
+                records_sent_since_pause = 0
+                time.sleep(1)
             self._send_kinesis_format_records(kinesis_records, 1)
+            records_sent_since_pause += len(encoded_batch)
 
     def _send_kinesis_format_records(self, kinesis_records, call_count):
         """
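
Review note: the throttling added to `send_records` can be exercised without a Kinesis stream. The sketch below is a standalone approximation of the same counting logic, not the client's actual code. The function name `send_in_throttled_batches` and its `send_batch`, `max_per_window` and `sleep` parameters are hypothetical, introduced only so the pause behaviour can be observed in isolation.

```python
import time


def send_in_throttled_batches(records, send_batch, batch_size=500,
                              max_per_window=1000, sleep=time.sleep):
    """Send records in batches, pausing before the running count of
    records sent since the last pause would exceed max_per_window."""
    records_sent_since_pause = 0
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        # Pause *before* sending the batch that would break the limit.
        if records_sent_since_pause + len(batch) > max_per_window:
            records_sent_since_pause = 0
            sleep(1)
        send_batch(batch)
        records_sent_since_pause += len(batch)


# Usage: record the batches and pauses instead of calling Kinesis.
sent_batches = []
pauses = []
send_in_throttled_batches(list(range(2500)), sent_batches.append,
                          batch_size=500, sleep=pauses.append)
```

Note that this approach counts records rather than measuring elapsed time, so it always pauses for a full second even when the previous batches took longer than that to send. That is conservative but simple.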

src/nypl_py_utils/functions/config_helper.py

Lines changed: 22 additions & 4 deletions
@@ -1,3 +1,4 @@
+import json
 import os
 import yaml
 
@@ -11,10 +12,20 @@ def load_env_file(run_type, file_string):
     """
     This method loads a YAML config file containing environment variables,
     decrypts whichever are encrypted, and puts them all into os.environ as
-    strings.
+    strings. For a YAML variable containing a list of values, the list is
+    exported into os.environ as a json string and should be loaded as such.
 
     It requires the YAML file to be split into a 'PLAINTEXT_VARIABLES' section
-    and an 'ENCRYPTED_VARIABLES' section.
+    and an 'ENCRYPTED_VARIABLES' section. See config/sample.yaml for an example
+    config file.
+
+    Parameters
+    ----------
+    run_type: str
+        The name of the config file to use, e.g. 'sample'
+    file_string: str
+        The path to the config files with the filename as a variable to be
+        interpolated, e.g. 'config/{}.yaml'
     """
 
     env_dict = None
@@ -35,11 +46,18 @@ def load_env_file(run_type, file_string):
 
     if env_dict:
         for key, value in env_dict.get('PLAINTEXT_VARIABLES', {}).items():
-            os.environ[key] = str(value)
+            if type(value) is list:
+                os.environ[key] = json.dumps(value)
+            else:
+                os.environ[key] = str(value)
 
         kms_client = KmsClient()
         for key, value in env_dict.get('ENCRYPTED_VARIABLES', {}).items():
-            os.environ[key] = kms_client.decrypt(value)
+            if type(value) is list:
+                decrypted_list = [kms_client.decrypt(v) for v in value]
+                os.environ[key] = json.dumps(decrypted_list)
+            else:
+                os.environ[key] = kms_client.decrypt(value)
         kms_client.close()
 
 
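
Review note: since list values are now exported as JSON strings, callers have to decode them with `json.loads` rather than reading them as plain strings. The sketch below mirrors only the plaintext branch of `load_env_file`, using the values from `config/sample.yaml`. The helper name `export_plaintext` and its `environ` parameter are hypothetical, and a plain dict stands in for `os.environ` so the example has no side effects.

```python
import json


def export_plaintext(variables, environ):
    """Mirror of the PLAINTEXT_VARIABLES branch: lists become JSON
    strings, everything else is converted with str()."""
    for key, value in variables.items():
        if type(value) is list:
            environ[key] = json.dumps(value)
        else:
            environ[key] = str(value)


# A dict stands in for os.environ here.
env = {}
export_plaintext({'STRING_VAR': 'string-var',
                  'INT_VAR': 1,
                  'LIST_VAR': ['string-var2', 2]}, env)

# Scalars come back as strings; lists must be decoded explicitly.
int_var = int(env['INT_VAR'])
list_var = json.loads(env['LIST_VAR'])
```

One consequence worth noting: element types inside a list survive the round trip (`2` stays an integer), whereas a scalar such as `INT_VAR` is always read back as the string `'1'`.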

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
+import os
+import re
+import requests
+from requests.exceptions import JSONDecodeError, RequestException
+
+CACHE = {}
+
+
+def parse_research_catalog_identifier(identifier: str):
+    """
+    Given a RC identifier (e.g. "b1234", "pb9876", "pi4567"), returns a dict
+    defining:
+     - nyplSource: One of sierra-nypl, recap-pul, recap-cul, or recap-hl (at
+                   writing)
+     - nyplType: One of bib, holding, or item
+     - id: The numeric string id
+    """
+    if not isinstance(identifier, str):
+        raise ResearchCatalogIdentifierError(
+            f'Invalid RC identifier: {identifier}')
+
+    # Extract prefix from the identifier:
+    match = re.match(r'^([a-z]+)', identifier)
+    if match is None:
+        raise ResearchCatalogIdentifierError(
+            f'Invalid RC identifier: {identifier}')
+    prefix = match[0]
+
+    # The id is the identifier without the prefix:
+    id = identifier.replace(prefix, '')
+    nyplType = None
+    nyplSource = None
+
+    # Look up nyplType and nyplSource in nypl-core based on the prefix:
+    for _nyplSource, mapping in nypl_core_source_mapping().items():
+        if mapping.get('bibPrefix') == prefix:
+            nyplType = 'bib'
+        elif mapping.get('itemPrefix') == prefix:
+            nyplType = 'item'
+        elif mapping.get('holdingPrefix') == prefix:
+            nyplType = 'holding'
+        if nyplType is not None:
+            nyplSource = _nyplSource
+            break
+
+    if nyplSource is None:
+        raise ResearchCatalogIdentifierError(
+            f'Invalid RC identifier: {identifier}')
+
+    return {
+        'nyplSource': nyplSource,
+        'nyplType': nyplType,
+        'id': id
+    }
+
+
+def research_catalog_id_prefix(nyplSource: str, nyplType='bib'):
+    """
+    Given a nyplSource (e.g. 'sierra-nypl') and nyplType (e.g. 'item'), returns
+    the relevant prefix used in the RC identifier (e.g. 'i')
+    """
+    if nypl_core_source_mapping().get(nyplSource) is None:
+        raise ResearchCatalogIdentifierError(
+            f'Invalid nyplSource: {nyplSource}')
+
+    if not isinstance(nyplType, str):
+        raise ResearchCatalogIdentifierError(
+            f'Invalid nyplType: {nyplType}')
+
+    prefixKey = f'{nyplType}Prefix'
+    if nypl_core_source_mapping()[nyplSource].get(prefixKey) is None:
+        raise ResearchCatalogIdentifierError(f'Invalid nyplType: {nyplType}')
+
+    return nypl_core_source_mapping()[nyplSource][prefixKey]
+
+
+def nypl_core_source_mapping():
+    """
+    Builds a nypl-source-mapping by retrieving the mapping from NYPL-Core
+    """
+    name = 'nypl-core-source-mapping'
+    if not CACHE.get(name) is None:
+        return CACHE[name]
+
+    url = os.environ.get('NYPL_CORE_SOURCE_MAPPING_URL',
+                         'https://raw.githubusercontent.com/NYPL/nypl-core/master/mappings/recap-discovery/nypl-source-mapping.json') # noqa
+    try:
+        response = requests.get(url)
+        response.raise_for_status()
+    except RequestException as e:
+        raise ResearchCatalogIdentifierError(
+            'Failed to retrieve nypl-core source-mapping file from {url}:'
+            ' {errorType} {errorMessage}'
+            .format(url=url, errorType=type(e), errorMessage=e)) from None
+
+    try:
+        CACHE[name] = response.json()
+        return CACHE[name]
+    except (JSONDecodeError, KeyError) as e:
+        raise ResearchCatalogIdentifierError(
+            'Failed to parse nypl-core source-mapping file: {errorType}'
+            ' {errorMessage}'
+            .format(errorType=type(e), errorMessage=e)) from None
+
+
+class ResearchCatalogIdentifierError(Exception):
+    def __init__(self, message=None):
+        self.message = message
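
Review note: the prefix lookup in this helper can be illustrated without fetching the nypl-core mapping. The sketch below is a simplified, offline approximation and not the helper's code. `SAMPLE_MAPPING` is a hypothetical, hard-coded stand-in whose prefixes are inferred from the docstring examples ("b1234", "pb9876", "pi4567", and 'i' for sierra-nypl items). The `'h'` holding prefix is purely an assumption, and `parse_identifier` and `build_identifier` are illustrative names.

```python
import re

# Hypothetical stand-in for the mapping fetched from nypl-core.
SAMPLE_MAPPING = {
    'sierra-nypl': {'bibPrefix': 'b', 'itemPrefix': 'i',
                    'holdingPrefix': 'h'},  # 'h' is assumed
    'recap-pul': {'bibPrefix': 'pb', 'itemPrefix': 'pi'},
}


def parse_identifier(identifier, mapping=SAMPLE_MAPPING):
    """Split an identifier such as 'pb9876' into source, type and id."""
    match = re.match(r'^([a-z]+)', identifier)
    if match is None:
        raise ValueError(f'Invalid RC identifier: {identifier}')
    prefix = match[0]
    for source, prefixes in mapping.items():
        for nypl_type in ('bib', 'item', 'holding'):
            if prefixes.get(f'{nypl_type}Prefix') == prefix:
                # Slice off only the leading prefix.
                return {'nyplSource': source, 'nyplType': nypl_type,
                        'id': identifier[len(prefix):]}
    raise ValueError(f'Invalid RC identifier: {identifier}')


def build_identifier(source, nypl_type, id, mapping=SAMPLE_MAPPING):
    """Inverse operation: look up the prefix and prepend it to the id."""
    prefix = mapping.get(source, {}).get(f'{nypl_type}Prefix')
    if prefix is None:
        raise ValueError(f'Invalid source/type: {source}/{nypl_type}')
    return prefix + id


parsed = parse_identifier('pb9876')
rebuilt = build_identifier(parsed['nyplSource'], parsed['nyplType'],
                           parsed['id'])
```

The sketch slices the prefix off by length, whereas the helper uses `identifier.replace(prefix, '')`, which removes every occurrence of the prefix. The two agree for purely numeric ids, but would differ if an id ever contained the prefix letters again.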
