This is a general-purpose distributed locking library built on top of DynamoDB. It is heavily
"inspired" by the Java-based AmazonDynamoDBLockClient library, and supports both coarse-grained
and fine-grained locking.
+
+ Note that while the lock itself can offer fairly strong consistency guarantees, it does NOT
+ participate in any kind of distributed transaction.
+
+ For example, you may wish to acquire a lock for some customer-id "xyz", then make some changes
+ to the corresponding database entry for that customer-id, and then release the lock - thereby
+ guaranteeing that only one process changes any given customer-id at a time.
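+
+ In code, the happy path might look like this (a minimal sketch: the lock_client instance and
+ the update_customer_record() helper are hypothetical, and the lock object's release() method
+ is assumed here; acquire_lock() and its partition_key/sort_key arguments are documented on
+ the DynamoDBLockClient class below):
+
+     lock = lock_client.acquire_lock('customer', sort_key='xyz')
+     try:
+         update_customer_record('xyz')  # the application's own database change
+     finally:
+         lock.release()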
+
+ While the happy path looks okay, consider a case where the application changes take a long time,
+ and some errors/gc-pauses prevent the heartbeat from updating the lock. Then some other client
+ may assume that the lock has been abandoned, and start processing the same customer in parallel. The original
+ lock-client will recognize that its lock has been "stolen" and will let the app know through a callback
+ event, but the app may have already committed its changes to the database. This can only be solved by
+ having the application changes and the lock-release be part of a single distributed transaction - which,
+ as indicated earlier, is NOT supported.
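+
+ As a sketch, such a callback might look like this (the event names follow the send_heartbeat()
+ documentation below, but the exact values passed to the callback, and the two helper functions,
+ are assumptions for illustration):
+
+     def app_callback(event, lock):
+         if event == 'LOCK_STOLEN':
+             abort_and_roll_back(lock)   # hypothetical: undo any uncommitted changes
+         elif event == 'LOCK_IN_DANGER':
+             expedite_processing(lock)   # hypothetical: finish up (or abort) quickly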
+
+ That said, in most cases, where the heartbeat is not expected to get delayed beyond the lock's lease
+ duration, the implementation should work just fine.
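+
+ A hedged configuration sketch (the constructor signature here is an assumption; the intent is
+ simply to keep the heartbeat_period much smaller than the lease_duration, so that several
+ heartbeats can fail before the lease actually expires):
+
+     import datetime
+     import boto3
+
+     lock_client = DynamoDBLockClient(
+         boto3.resource('dynamodb'),
+         heartbeat_period=datetime.timedelta(seconds=5),
+         lease_duration=datetime.timedelta(seconds=30),
+     )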
725"""
826
927from botocore .exceptions import ClientError
class DynamoDBLockClient:
    """
    Provides distributed locks using DynamoDB's support for conditional reads/writes.
-
-     Note that while the lock itself can offer fairly strong consistency guarantees, it does NOT
-     participate in any kind of distributed transaction. For example, you may wish to acquire a lock
-     for some customer-id "xyz", and then make some changes to the corresponding database entry for this
-     customer-id, and then release the lock - thereby guaranteeing that only one process changes any
-     given customer-id at a time. While the happy path looks okay, consider a case where the application
-     changes take a long time, and some errors/gc-pauses prevent the heartbeat from updating the lock -
-     then, some other client can assume the lock to be abandoned, and start processing the same customer
-     in parallel. The original lock-client will recognize that its lock has been "stolen" and will let
-     the app know through a callback event, but the app may have already commited its changes to the
-     database. This can only be solved by having the application changes and the lock-release be part
-     of a single distributed transaction - which, as indicated earlier, is NOT supported.
-
-     That said, in most cases, where the heartbeat is not expected to get delayed beyond the lock's lease
-     duration, the implementation should work just fine.
    """

    # default values for class properties
@@ -134,7 +137,7 @@ def run(self):
        Keeps renewing the leases for the locks owned by this client - till the client is closed.

        The method has a while loop that wakes up on a periodic basis (as defined by the heartbeat_period)
-         and invokes the _send_heartbeat() method on each lock.
+         and invokes the send_heartbeat() method on each lock.
        """
        # e.g. 5 TPS => each loop should take an average of 0.2 seconds (200ms)
        avg_loop_time = 1.0 / self.heartbeat_tps
@@ -146,7 +149,7 @@ def run(self):

            for uid, lock in self._locks.copy().items():
                count += 1
-                 self._send_heartbeat(lock)
+                 self.send_heartbeat(lock)
                # After each lock, sleep a little (if needed) to honor the heartbeat_tps
                curr_loop_end_time = time.time()
                next_loop_start_time = start_time + count * avg_loop_time
@@ -161,7 +164,7 @@ def run(self):
            time.sleep(next_start_time - end_time)


-     def _send_heartbeat(self, lock):
+     def send_heartbeat(self, lock):
        """
        Renews the lease for the given lock.

@@ -174,12 +177,14 @@ def _send_heartbeat(self, lock):
        (lock requestor) app know when there are significant events in the lock lifecycle. There
        are two such events:

-         1) LOCK_STOLEN: When the heartbeat process finds that someone else has taken over the lock,
+         1) LOCK_STOLEN
+             When the heartbeat process finds that someone else has taken over the lock,
        or it has been released/deleted without the lock-client's knowledge. In this case, the
        app_callback should just try to abort its processing and roll back any changes it had
        made with the assumption that it owned the lock. This is not a normal occurrence and
        should only happen if someone manually changes/deletes the data in DynamoDB.
-         2) LOCK_IN_DANGER: When the heartbeat for a given lock has failed multiple times, and it is
+         2) LOCK_IN_DANGER
+             When the heartbeat for a given lock has failed multiple times, and it is
        now in danger of going past its lease-duration without a successful heartbeat - at which
        point, any other client waiting to acquire the lock will consider it abandoned and take
        over. In this case, the app_callback should try to expedite the processing, either
@@ -270,21 +275,24 @@ def acquire_lock(self,
        If the lock is currently held by a different client, then this client will keep retrying on
        a periodic basis. In that case, a few different things can happen:

-         1) The other client releases the lock, which would basically delete it from the database -
-         allowing this client to try and insert its own record instead.
-         2) The other client dies, and the lock stops getting updated by the heartbeat thread. While
-         waiting for a lock, this client keeps track of the local-time whenever it sees the lock's
+         1) The other client releases the lock - basically deleting it from the database
+             This would allow this client to try and insert its own record instead.
+         2) The other client dies, and the lock stops getting updated by the heartbeat thread
+             While waiting for a lock, this client keeps track of the local-time whenever it sees the lock's
        record-version-number change. From that point-in-time, it needs to wait for a period of time
        equal to the lock's lease duration before concluding that the lock has been abandoned and trying
        to overwrite the database entry with its own lock.
-         3) While waiting for the other client to release the lock (or for the lock's lease to expire), this
-         client may go over the max-retry-period (i.e. the retry_timeout) allowed by the caller - in
-         which case, a DynamoDBLockError with code == ACQUIRE_TIMEOUT will be thrown.
-         4) Whenever the "old" lock is released (or expires), there may be multiple "new" clients trying
+         3) This client goes over the max retry-timeout period
+             While waiting for the other client to release the lock (or for the lock's lease to expire), this
+             client may go over the retry_timeout period (as provided by the caller) - in which case, a
+             DynamoDBLockError with code == ACQUIRE_TIMEOUT will be thrown (see the sketch after this list).
+         4) Race-condition amongst multiple lock-clients waiting to acquire the lock
+             Whenever the "old" lock is released (or expires), there may be multiple "new" clients trying
        to grab the lock - in which case, one of those would succeed, and the rest of them would get
        a "conditional-update-exception". This is just logged and swallowed internally - and the
        client moves on to another sleep-retry cycle.
-         5) Any other error/exception - wrapped inside a DynamoDBLockError and raised to the caller.
+         5) Any other error/exception
+             Such errors are wrapped inside a DynamoDBLockError and raised to the caller.
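+
+         A hedged sketch of handling the timeout in scenario 3 above (retry_timeout is referenced
+         above but its type is assumed to be a datetime.timedelta here; comparing the error's code
+         attribute against the 'ACQUIRE_TIMEOUT' string, and the handle_busy_customer() helper,
+         are assumptions for illustration):
+
+             try:
+                 lock = lock_client.acquire_lock('customer', sort_key='xyz',
+                                                 retry_timeout=datetime.timedelta(minutes=1))
+             except DynamoDBLockError as e:
+                 if e.code == 'ACQUIRE_TIMEOUT':
+                     handle_busy_customer('xyz')  # hypothetical: back off and retry later
+                 else:
+                     raise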

        :param str partition_key: The primary lock identifier
        :param str sort_key: Forms a "composite identifier" along with the partition_key. Defaults to '-'