
Commit f710774

Author: Mohan Kishore (committed)

    Mostly documentation - but few minor fixes as well

1 parent c1099d0 · commit f710774

9 files changed

Lines changed: 756 additions & 175 deletions

File tree

README.rst

Lines changed: 38 additions & 6 deletions
@@ -16,8 +16,9 @@ Python DynamoDB Lock
 
-Python library that emulates the java-based dynamo-db-client from awslabs
+This is a general purpose distributed locking library built on top of DynamoDB. It is heavily
+"inspired" by the java-based AmazonDynamoDBLockClient (https://github.com/awslabs/dynamodb-lock-client)
+library, and supports both coarse-grained and fine-grained locking.
 
 * Free software: Apache Software License 2.0
 * Documentation: https://python-dynamodb-lock.readthedocs.io.
@@ -26,12 +27,43 @@ Python library that emulates the java-based dynamo-db-client from awslabs
 Features
 --------
 
-* TODO
+* Acquire named locks - with configurable retry semantics
+* Periodic heartbeat/update for the locks to keep them alive
+* Auto-release the locks if there is no heartbeat for a configurable lease-duration
+* Notify an app-callback function if the lock is stolen, or gets too close to lease expiry
+* Store arbitrary application data along with the locks
+* Uses a monotonically increasing clock to avoid issues due to clock skew and/or DST etc.
+* Auto-delete the database entries after a configurable expiry-period
+
+
+Consistency Notes
+-----------------
+
+Note that while the lock itself can offer fairly strong consistency guarantees, it does NOT
+participate in any kind of distributed transaction.
+
+For example, you may wish to acquire a lock for some customer-id "xyz", and then make some changes
+to the corresponding database entry for this customer-id, and then release the lock - thereby
+guaranteeing that only one process changes any given customer-id at a time.
+
+While the happy path looks okay, consider a case where the application changes take a long time,
+and some errors/gc-pauses prevent the heartbeat from updating the lock. Then, some other client
+can assume the lock to be abandoned, and start processing the same customer in parallel. The original
+lock-client will recognize that its lock has been "stolen" and will let the app know through a callback
+event, but the app may have already committed its changes to the database. This can only be solved by
+having the application changes and the lock-release be part of a single distributed transaction - which,
+as indicated earlier, is NOT supported.
+
+That said, in most cases, where the heartbeat is not expected to get delayed beyond the lock's lease
+duration, the implementation should work just fine.
+
+Refer to an excellent post by Martin Kleppmann on this subject:
+https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
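The failure mode described above can be made concrete with a small, self-contained sketch of the fencing-token idea from Kleppmann's post (everything here - the `storage` dict, the `fence` column, the helper names - is an illustrative stand-in, not part of this library): each lock acquisition hands out a strictly larger token, and the datastore rejects writes carrying a stale token, so a client whose lock was silently stolen cannot clobber the new owner's work.

```python
import itertools

# Hypothetical stand-ins: a "datastore" dict guarded by fencing tokens,
# and a token counter owned by the lock service.
token_counter = itertools.count(1)
storage = {"customer-xyz": {"fence": 0, "value": "old"}}

def acquire_fencing_token():
    # Each successful lock acquisition hands out a strictly larger token.
    return next(token_counter)

def guarded_write(key, value, fence):
    # The datastore only accepts writes whose token is at least as large as
    # the last one it saw - stale writers (whose lock was stolen) are rejected.
    row = storage[key]
    if fence < row["fence"]:
        raise RuntimeError("stale fencing token - lock was lost")
    storage[key] = {"fence": fence, "value": value}

t1 = acquire_fencing_token()   # client A acquires the lock
t2 = acquire_fencing_token()   # client B takes over after A's lease lapses
guarded_write("customer-xyz", "written-by-B", t2)
try:
    guarded_write("customer-xyz", "written-by-A", t1)  # A's delayed write
except RuntimeError:
    pass  # A's write is rejected; B's write survives
```

Without such a check on the datastore side, A's delayed write would silently overwrite B's - which is exactly why the lock alone cannot substitute for a distributed transaction.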

 Credits
 -------
 
-This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.
+* AmazonDynamoDBLockClient: https://github.com/awslabs/dynamodb-lock-client
+* Cookiecutter: https://github.com/audreyr/cookiecutter
+* Cookiecutter Python: https://github.com/audreyr/cookiecutter-pypackage
 
-.. _Cookiecutter: https://github.com/audreyr/cookiecutter
-.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage

docs/_static/empty_dir_marker_file

Whitespace-only changes.

docs/conf.py

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@
 # autodoc config
 autoclass_content = 'both'
 add_module_names = False
+autodoc_member_order = 'bysource'
 
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']

docs/modules.rst

Lines changed: 0 additions & 7 deletions
This file was deleted.

docs/python_dynamodb_lock.rst

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ python\_dynamodb\_lock package
 
 python\_dynamodb\_lock module
------------------------------------------------------
+-----------------------------
 
 .. automodule:: python_dynamodb_lock.python_dynamodb_lock
    :members:

docs/usage.rst

Lines changed: 260 additions & 2 deletions
Usage
=====

To use Python DynamoDB Lock in a project::

    from python_dynamodb_lock.python_dynamodb_lock import *


Basic Usage
-----------

You would typically create the DynamoDBLockClient at application startup and close it
at shutdown::

    # get a reference to the DynamoDB resource
    dynamodb_resource = boto3.resource('dynamodb')

    # create the lock-client
    lock_client = DynamoDBLockClient(dynamodb_resource)

    ...

    # close the lock_client
    lock_client.close()

Then, you would wrap the lock acquisition and release around the code-block that needs to be
protected by a mutex::

    # acquire the lock
    lock = lock_client.acquire_lock('my_key')

    # ... app logic that requires the lock ...

    # release the lock after you are done
    lock.release()

Both the lock_client constructor and the acquire_lock method support numerous arguments to help
control/customize the behavior. Please look at the :doc:`API documentation <./python_dynamodb_lock>`
for more details.

Context Management
------------------

The DynamoDBLock class implements the context-management interface, so you can auto-release the
lock by doing something like this::

    with lock_client.acquire_lock('my_key'):
        # ... app logic that requires the lock ...

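The behavior above can be sketched with a minimal, self-contained stand-in (the `FakeLock` class below is purely illustrative, not the library's actual implementation): the key property of the context-management protocol is that `__exit__` releases the lock even when the guarded block raises.

```python
class FakeLock:
    """Illustrative stand-in for a lock object with context-management support."""

    def __init__(self, key):
        self.key = key
        self.released = False

    def release(self):
        self.released = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Release on normal exit AND on exceptions; returning False
        # lets any exception propagate to the caller.
        self.release()
        return False

lock = FakeLock('my_key')
try:
    with lock:
        raise ValueError("app logic failed")
except ValueError:
    pass

# lock.released is now True - released even though the block raised
```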
Table Creation
--------------

The DynamoDBLockClient provides a helper class-method to create the table in DynamoDB::

    # get a reference to the DynamoDB client
    ddb_client = boto3.client('dynamodb')

    # create the table
    DynamoDBLockClient.create_dynamodb_table(ddb_client)

The above code snippet will create a table with the default name, partition/sort-key column-names,
and read/write throughput, but the method supports optional parameters to configure all of these.

That said, you can always create the table offline (e.g. using the AWS console) and use whatever
table and column names you wish. Please do remember to set up the TTL attribute to enable auto-deletion
of old/abandoned locks.

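As a rough sketch of how such a TTL attribute works (the durations below are assumptions chosen for illustration; the library's real defaults live in the API docs): the value stored is an epoch timestamp, and DynamoDB deletes the row some time after the current epoch passes it.

```python
import time
from datetime import timedelta

# Assumed-for-illustration durations, NOT the library's defaults.
lease_duration = timedelta(seconds=30)
expiry_period = timedelta(hours=1)

def ttl_value(now=None):
    """Epoch timestamp after which DynamoDB may auto-delete the lock row."""
    now = time.time() if now is None else now
    return int(now + lease_duration.total_seconds() + expiry_period.total_seconds())

# e.g. a lock written at epoch 1_000_000 becomes deletion-eligible at:
print(ttl_value(now=1_000_000))  # 1_000_000 + 30 + 3600 = 1003630
```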
Error-Handling
--------------

There are a lot of things that can go wrong when dealing with distributed systems - the library
tries to strike the right balance between hiding these errors and allowing the application to handle
specific kinds of errors as needed. Let's go through the different use-cases one at a time.

Lock Acquisition
~~~~~~~~~~~~~~~~

This is a synchronous use-case where the caller waits till it receives a lock. In this case,
most of the errors are wrapped inside a DynamoDBLockError and raised up to the caller. The key error
scenarios are the following:

* **Some other client holds the lock**

  * This is not treated as a real error scenario. This client would just wait for a configurable
    retry_period, and then try to acquire the lock again.

* **Race-condition amongst multiple lock-clients waiting to acquire lock**

  * Whenever the "old" lock is released (or expires), there may be multiple "new" clients trying
    to grab the lock - in which case, one of them would succeed, and the rest would get
    DynamoDB's ConditionalCheckFailedException. This is also not treated as a real error scenario, and
    the client would just wait for the retry_period and then try again.

* **This client goes over the configurable retry_timeout period**

  * After repeated retry attempts, this client might eventually go over the retry_timeout period
    (as provided by the caller) - then, a DynamoDBLockError with code == ACQUIRE_TIMEOUT is raised.

* **Any other error/exception**

  * Any other error would be wrapped inside a DynamoDBLockError with code == UNKNOWN_ERROR and raised
    to the caller.

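The retry_period / retry_timeout semantics described above amount to a plain polling loop. The sketch below is illustrative only - `try_once`, the injected `sleep`, and the error type are stand-ins, not the library's code:

```python
from datetime import timedelta

class AcquireTimeoutError(Exception):
    """Illustrative stand-in for DynamoDBLockError(code=ACQUIRE_TIMEOUT)."""

def acquire_with_retries(try_once, retry_period, retry_timeout, sleep=lambda s: None):
    """Poll try_once() every retry_period until it succeeds or retry_timeout elapses."""
    waited = 0.0
    while True:
        lock = try_once()
        if lock is not None:
            return lock
        if waited >= retry_timeout.total_seconds():
            raise AcquireTimeoutError("gave up after retry_timeout")
        sleep(retry_period.total_seconds())
        waited += retry_period.total_seconds()

# The lock frees up on the third attempt:
attempts = iter([None, None, "the-lock"])
lock = acquire_with_retries(lambda: next(attempts),
                            retry_period=timedelta(seconds=5),
                            retry_timeout=timedelta(seconds=30))
print(lock)  # the-lock
```

This also shows why contended acquisition drives the read throughput up: every failed `try_once` in the real client costs at least one ``get_item`` call.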
Lock Release
~~~~~~~~~~~~

While this is also a synchronous use-case, in most cases, by the time this method is called, the caller
would have already committed its application-data changes, and would not have any real rollback options.
Therefore, this method defaults to the best_effort mode, where it will try to release the lock properly,
but will log and swallow any exceptions encountered in the process. Callers that are interested
in being notified of the errors can pass in best_effort=False and have all the errors wrapped inside
a DynamoDBLockError and raised up to them. The specific error scenarios could be one of the below:

* **This client does not own the lock**

  * This can happen if the caller tries to use this client to release a lock owned by some other client.
    The client will raise a DynamoDBLockError with code == LOCK_NOT_OWNED.

* **The lock was stolen by some other client**

  * This should typically not happen unless someone messes with the back-end DynamoDB table directly. The
    client will raise a DynamoDBLockError with code == LOCK_STOLEN.

* **Any other error/exception**

  * Any other error would be wrapped inside a DynamoDBLockError with code == UNKNOWN_ERROR and raised
    to the caller.

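The best_effort behavior amounts to a log-and-swallow wrapper around the real release. A minimal sketch, assuming a hypothetical `_release_impl` hook and logger name (neither is the library's real API):

```python
import logging

logger = logging.getLogger("dynamodb_lock_sketch")

def release(best_effort=True, _release_impl=None):
    """Release the lock; swallow errors in best_effort mode, re-raise otherwise."""
    try:
        _release_impl()
    except Exception as err:
        if best_effort:
            # default mode: the caller usually cannot roll back anyway,
            # so just record the failure and move on
            logger.warning("best-effort release failed: %s", err)
        else:
            raise  # best_effort=False: let the caller see the error

def flaky_release():
    raise RuntimeError("network blip")

release(best_effort=True, _release_impl=flaky_release)   # logged and swallowed
try:
    release(best_effort=False, _release_impl=flaky_release)
except RuntimeError:
    print("caller was notified")
```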
Lock Heartbeat
~~~~~~~~~~~~~~

This is an asynchronous use-case, where the caller is not directly available to handle any errors. To handle
any error scenarios encountered while sending a heartbeat for a given lock, the client allows the caller to
pass in an app_callback function at the time of acquiring the lock.

* **The lock was stolen by some other client**

  * This should typically not happen unless someone messes with the back-end DynamoDB table directly. The
    client will call the app_callback with code == LOCK_STOLEN. The callback is expected to terminate the
    related application processing and roll back any changes made under this lock's protection.

* **The lock has entered the danger zone**

  * If the send_heartbeat call for a given lock fails multiple times, the lock could go over the configurable
    safe_period. The client will call the app_callback with code == LOCK_IN_DANGER. The callback is expected
    to complete/terminate the related application processing, and call lock.release() as soon as possible.

Note: the client spins up two separate threads - one to send out the heartbeats, and
another one to check the lock-statuses. If, for whatever reason, the send_heartbeat calls start hanging or
taking too long, the second thread still allows the client to notify the app about the locks getting into the
danger-zone. The actual app_callbacks are executed on a dedicated ThreadPoolExecutor.

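A typical app_callback might look like the sketch below. The code constants and the helper functions are illustrative assumptions - the real callback codes and signature live in the API documentation:

```python
# Illustrative stand-ins for the callback codes the client passes in.
LOCK_STOLEN = "LOCK_STOLEN"
LOCK_IN_DANGER = "LOCK_IN_DANGER"

events = []  # records what our fake app did, for demonstration

def cancel_work_and_rollback(lock):
    events.append(("rollback", lock))

def finish_up_and_release(lock):
    events.append(("release", lock))

def app_callback(code, lock):
    """Invoked on the lock client's callback executor, not by the app itself."""
    if code == LOCK_STOLEN:
        # another client owns the lock now: stop and undo our changes
        cancel_work_and_rollback(lock)
    elif code == LOCK_IN_DANGER:
        # lease is close to expiring: wrap up and release quickly
        finish_up_and_release(lock)

app_callback(LOCK_IN_DANGER, "my_key")
app_callback(LOCK_STOLEN, "my_key")
print(events)  # [('release', 'my_key'), ('rollback', 'my_key')]
```

Since the callback runs on a ThreadPoolExecutor thread, it should be quick and thread-safe - typically just setting a flag that the worker thread checks.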
Client Close
~~~~~~~~~~~~

By default, lock_client.close() will NOT release all the locks - releasing the locks prematurely, while the
application is still making changes under the assumption that it holds the lock, can be dangerous. As soon
as a lock is released by this client, some other client may pick it up, and the associated app may start
processing the underlying business entity in parallel.

It is highly recommended that the application manage its shutdown-lifecycle such that all the worker threads
operating under these locks are first terminated (committed or rolled-back), the corresponding locks released
(one at a time - by each worker thread), and then the lock_client.close() method is called. Alternatively, consider
letting the process die without releasing all the locks - they will be auto-released when their leases run out
after a while.

That said, if the caller does wish to release all locks when closing the lock_client, it can pass in a
release_locks=True argument when invoking the close() method. Please note that all the locks are then released
in best_effort mode - i.e. all the errors will be logged and swallowed.

Process Termination
~~~~~~~~~~~~~~~~~~~

A sudden process termination would leave the locks frozen with the values as of their last heartbeat. These locks
will go through one of the following scenarios:

* **Eventual expiry - as per the TTL attribute**

  * Each lock has a TTL attribute (named 'expiry_time' by default) - which stores the timestamp (as epoch) after
    which it is eligible for auto-deletion by DynamoDB. This deletion does not have a fixed SLA - but will likely
    happen over the next 24 hours after the lock expires.

* **Some other client tries to acquire the lock**

  * The client will treat the lock as an active lock - and will wait for a period equal to its lease_duration from
    the point it first sees the lock. This requires the acquire_lock call to be made with a retry_timeout larger
    than the lease_duration of the lock - otherwise, the acquire_lock call will time out before the lease expires.

Throughput Provisioning
-----------------------

Whenever using DynamoDB, you have to think about how much read and write throughput you need to provision for your
table. The DynamoDBLockClient makes the following calls to DynamoDB:

* **acquire_lock**

  * ``get_item``: at least once per lock, and more often if there is lock contention and the lock_client needs to
    retry multiple times before acquiring the lock.
  * ``put_item``: typically once per lock - whenever the lock becomes available.
  * ``update_item``: should be fairly rare - only needed when this client needs to take over an abandoned lock.
  * So, the write throughput should be directly proportional to the application's need to acquire locks, but the
    read throughput is a little harder to predict - it can be more sensitive to the lock contention at runtime.

* **release_lock**

  * ``delete_item``: once per lock.
  * So, assuming that every lock that is acquired will be released, this is also directly proportional to the
    application's lock-acquisition TPS.

* **send_heartbeat**

  * ``update_item``: the lock client supports a deterministic model where the caller can pass in a TPS value, and
    the client will honor the same when making the heartbeat calls. Alternatively, the client also supports an
    "adaptive" mode (the default), where it will take all the active locks at the beginning of each heartbeat_period
    and spread their individual heartbeat calls evenly across the whole period.

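The adaptive spreading comes down to simple arithmetic (the sketch below is illustrative, not the library's code): with N active locks and a heartbeat_period of P seconds, each lock's heartbeat is offset P/N seconds from the previous one, capping the heartbeat write TPS at roughly N/P.

```python
def heartbeat_schedule(lock_keys, heartbeat_period_seconds):
    """Offsets (seconds into the period) at which each lock gets its heartbeat."""
    spacing = heartbeat_period_seconds / len(lock_keys)
    return {key: round(i * spacing, 3) for i, key in enumerate(lock_keys)}

# 4 active locks over a 20-second heartbeat period -> one update every 5 seconds
print(heartbeat_schedule(["a", "b", "c", "d"], 20))
# {'a': 0.0, 'b': 5.0, 'c': 10.0, 'd': 15.0}
```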
Differences from the Java implementation
----------------------------------------

As indicated before, this library derives most of its design from the
`dynamodb-lock-client <https://github.com/awslabs/dynamodb-lock-client>`_ (Java) module. This section goes over
a few details where this library goes a slightly different way:

* **Added support for the DynamoDB TTL attribute**

  * Since Feb 2017, DynamoDB has supported designating one of a table's attributes as a TTL attribute -
    containing an epoch timestamp value. Once the current time goes past that value, the row becomes eligible
    for automated deletion by DynamoDB. These deletes do not incur any additional costs and help keep the table
    clean of old/stale entries.

* **Dropped support for lock retention after release**

  * The Java library supports an additional lock-attribute called "deleteOnRelease" - which allows the caller to
    control whether the lock, on its release, should be deleted or just marked as released. This Python module
    drops that flexibility, and always deletes the lock on release. The idea is to not treat the lock
    table as a general purpose data-store, but as a persistent representation of the "currently active locks".

* **Dropped support for the BLOB data field**

  * The Java library supports a byte[] field called 'data' in addition to supporting arbitrary named fields to
    be stored along with any lock. This Python module drops that additional data field - with the understanding
    that any additional data that the app wishes to store can be passed in as part of the additional_attributes
    map/dict that is already supported.

* **Separate lock classes to represent local vs remote locks**

  * The Java library uses the same LockItem class to represent both the locks created/acquired by this client as
    well as the locks loaded from the database (currently held by other clients). This results in confusing
    overloading of fields - e.g. "lookupTime" is overloaded to store the "lastUpdatedTime" for the locks owned
    by this client, and the "lastLookupTime" for the locks owned by other clients.

* **Added support for explicit and adaptive heartbeat TPS**

  * The Java library would fire off the heartbeat updates for all the active locks one after another - as fast as
    it can, then wait till the end of the heartbeat_period, and then do the same thing over. This can result
    in significant write TPS if the application has a lot of (say ~100) active locks. This Python module allows the
    caller to specify an explicit TPS value, or use an adaptive mode - where the heartbeats are evenly spread
    over the whole heartbeat_period.

* **Different callback model**

  * The Java library creates a different thread for each lock that wishes to support "session-monitors". This
    Python module uses a single thread (separate from the one used to send heartbeats) to periodically check that
    the locks are being "heartbeat"-ed, and if needed, uses a ThreadPoolExecutor to invoke the app_callbacks.

* **Uses retry_period/retry_timeout arguments instead of refreshPeriod/additionalTimeToWait**

  * Though the logic is pretty much the same, the names are a little clearer about the intent - the "retry_period"
    controls how long the client waits before retrying a previously failed lock acquisition, and "retry_timeout"
    controls how long the client keeps retrying before giving up and raising an error.

* **Simplified sort-key handling**

  * The Java library goes to great lengths to support the caller's ability to use a simple hash-partitioned table
    as well as a hash-and-range partitioned table. This Python module drops the support for hash-partitioned
    tables, and instead chooses to use a default sort-key of '-' to simplify the implementation.

* **Lock release best_effort mode**

  * The Java library defaults to best_effort == False, whereas this Python module defaults to True - i.e. trying
    to release a lock without choosing an explicit "best_effort" setting could result in exceptions being
    thrown in Java, but would be silently logged and swallowed in Python.

* **Releasing all locks on client close**

  * The Java library will always try to release all locks when closing the lock_client. This Python module
    defaults to NOT releasing the locks on lock_client closure - but does support an optional argument called
    "release_locks" that allows the caller to request lock releases. The idea behind this is that it is not
    a safe operation to release the locks without considering the application threads that could continue to
    process under the assumption that they hold a lock on the underlying business entity. Making the caller
    request the lock-release explicitly is meant to encourage them to wind up the application processing
    and release the locks first, before closing the lock_client.

* **Dropped/Missing support for AWS RequestMetricCollector**

  * The Java library has pervasive support for collecting AWS request metrics. This Python module does not
    (yet) support this capability.

