=====
Usage
=====

To use Python DynamoDB Lock in a project::

    from python_dynamodb_lock.python_dynamodb_lock import *


Basic Usage
-----------

You would typically create the DynamoDBLockClient at application startup, and close it at
application shutdown::

    import boto3

    # get a reference to the DynamoDB resource
    dynamodb_resource = boto3.resource('dynamodb')

    # create the lock-client
    lock_client = DynamoDBLockClient(dynamodb_resource)

    ...

    # close the lock_client
    lock_client.close()


Then, you would wrap the lock acquisition and release around the code-block that needs to be
protected by a mutex::

    # acquire the lock
    lock = lock_client.acquire_lock('my_key')

    # ... app logic that requires the lock ...

    # release the lock after you are done
    lock.release()


Both the lock_client constructor and the acquire_lock method support numerous arguments to help
control/customize the behavior. Please look at the :doc:`API documentation <./python_dynamodb_lock>`
for more details.


Context Management
------------------

The DynamoDBLock class implements the context-management interface, so you can have the lock
auto-released like this::

    with lock_client.acquire_lock('my_key'):
        # ... app logic that requires the lock ...
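Under the hood, the ``with`` form simply releases the lock when the block exits, even if the
block raises an exception. A minimal sketch of that pattern, using a hypothetical ``DemoLock``
stand-in (not part of the library) rather than a real DynamoDBLock:

```python
# DemoLock is a hypothetical stand-in for DynamoDBLock, used only to
# illustrate how the context-management protocol auto-releases the lock.
class DemoLock:
    def __init__(self):
        self.released = False

    def release(self):
        self.released = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # auto-release even if the protected block raised
        self.release()
        return False  # do not suppress any exception

lock = DemoLock()
with lock:
    pass  # ... app logic that requires the lock ...

print(lock.released)  # True
```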


Table Creation
--------------

The DynamoDBLockClient provides a helper class-method to create the table in DynamoDB::

    # get a reference to the DynamoDB client
    ddb_client = boto3.client('dynamodb')

    # create the table
    DynamoDBLockClient.create_dynamodb_table(ddb_client)

The above code snippet will create a table with the default name, partition/sort-key column-names,
and read/write throughput, but the method supports optional parameters to configure all of these.

That said, you can always create the table offline (e.g. using the AWS console) and use whatever
table and column names you wish. Please do remember to set up the TTL attribute to enable
auto-deletion of old/abandoned locks.
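The TTL attribute is just an epoch timestamp after which DynamoDB may delete the item. A small
sketch of how such an expiry value can be computed - the ``expiry_time`` attribute name matches
the client's default, but the lease value is purely illustrative:

```python
import time

# Compute an epoch-seconds expiry for a lock record. DynamoDB's TTL feature
# deletes the item some time after the current time passes this value
# (deletion is eventual, not immediate).
lease_duration_seconds = 30 * 60  # illustrative: a 30-minute lease

now = int(time.time())
item = {
    'lock_key': 'my_key',
    'expiry_time': now + lease_duration_seconds,  # default TTL attribute name
}

assert item['expiry_time'] - now == 1800
```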


Error-Handling
--------------

There are a lot of things that can go wrong when dealing with distributed systems - the library
tries to strike the right balance between hiding these errors, and allowing the application to
handle specific kinds of errors as needed. Let's go through the different use-cases one at a time.


Lock Acquisition
~~~~~~~~~~~~~~~~

This is a synchronous use-case where the caller waits till it receives the lock. In this case,
most of the errors are wrapped inside a DynamoDBLockError and raised up to the caller. The key
error scenarios are the following:

* **Some other client holds the lock**
    * This is not treated as a real error scenario. This client would just wait for a configurable
      retry_period, and then try to acquire the lock again.
* **Race-condition amongst multiple lock-clients waiting to acquire the lock**
    * Whenever the "old" lock is released (or expires), there may be multiple "new" clients trying
      to grab the lock - in which case, one of them would succeed, and the rest would get
      DynamoDB's ConditionalCheckFailedException. This is also not treated as a real error scenario,
      and the client would just wait for the retry_period and then try again.
* **This client goes over the configurable retry_timeout period**
    * After repeated retry attempts, this client might eventually go over the retry_timeout period
      (as provided by the caller) - in that case, a DynamoDBLockError with code == ACQUIRE_TIMEOUT
      will be raised.
* **Any other error/exception**
    * Any other error would be wrapped inside a DynamoDBLockError with code == UNKNOWN_ERROR and
      raised to the caller.
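The retry behavior described above amounts to a simple polling loop. A simplified sketch (not
the library's actual implementation - ``try_acquire`` is a hypothetical helper standing in for
the conditional write against DynamoDB):

```python
import time

class AcquireTimeoutError(Exception):
    """Stands in for DynamoDBLockError with code == ACQUIRE_TIMEOUT."""

def acquire_with_retry(try_acquire, retry_period, retry_timeout, clock=time.monotonic):
    # Keep retrying until try_acquire succeeds, or the overall retry_timeout
    # budget is exhausted; sleep retry_period between attempts.
    deadline = clock() + retry_timeout
    while True:
        lock = try_acquire()
        if lock is not None:
            return lock
        if clock() >= deadline:
            raise AcquireTimeoutError('could not acquire the lock in time')
        time.sleep(retry_period)

# Usage: a hypothetical stub that fails twice, then succeeds.
attempts = []

def stub_try_acquire():
    attempts.append(1)
    return 'my-lock' if len(attempts) >= 3 else None

result = acquire_with_retry(stub_try_acquire, retry_period=0.01, retry_timeout=1.0)
print(result)  # my-lock
```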


Lock Release
~~~~~~~~~~~~

While this is also a synchronous use-case, in most cases, by the time this method is called, the caller
would have already committed its application-data changes, and would not have any real rollback options.
Therefore, this method defaults to the best_effort mode, where it will try to release the lock properly,
but will log and swallow any exceptions encountered in the process. Callers that are interested in being
notified of the errors can pass in best_effort=False and have all the errors wrapped inside a
DynamoDBLockError and raised up to them. The specific error scenarios could be one of the following:

* **This client does not own the lock**
    * This can happen if the caller tries to use this client to release a lock owned by some other client.
      The client will raise a DynamoDBLockError with code == LOCK_NOT_OWNED.
* **The lock was stolen by some other client**
    * This should typically not happen unless someone messes with the back-end DynamoDB table directly. The
      client will raise a DynamoDBLockError with code == LOCK_STOLEN.
* **Any other error/exception**
    * Any other error would be wrapped inside a DynamoDBLockError with code == UNKNOWN_ERROR and raised
      to the caller.
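The best_effort mode follows the familiar log-and-swallow pattern. A minimal sketch, with
``release_impl`` as a hypothetical stand-in for the actual DynamoDB delete call:

```python
import logging

logger = logging.getLogger(__name__)

def release(release_impl, best_effort=True):
    # best_effort=True (the default): log and swallow any error.
    # best_effort=False: let the caller see the failure.
    try:
        release_impl()
        return True
    except Exception:
        if best_effort:
            logger.exception('error while releasing the lock - ignoring')
            return False
        raise

def failing_release():
    raise RuntimeError('simulated DynamoDB failure')

ok = release(failing_release)
print(ok)  # False - the error was logged and swallowed
```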


Lock Heartbeat
~~~~~~~~~~~~~~

This is an asynchronous use-case, where the caller is not directly available to handle any errors. To handle
any error scenarios encountered while sending a heartbeat for a given lock, the client allows the caller to
pass in an app_callback function at the time of acquiring the lock.

* **The lock was stolen by some other client**
    * This should typically not happen unless someone messes with the back-end DynamoDB table directly. The
      client will call the app_callback with code == LOCK_STOLEN. The callback is expected to terminate the
      related application processing and roll back any changes made under this lock's protection.
* **The lock has entered the danger zone**
    * If the send_heartbeat call for a given lock fails multiple times, the lock could go over the configurable
      safe_period. The client will call the app_callback with code == LOCK_IN_DANGER. The callback is expected
      to complete/terminate the related application processing, and call lock.release() as soon as possible.

Note that the client spins up two separate threads - one to send out the heartbeats, and another to check the
lock-statuses. If, for whatever reason, the send_heartbeat calls start hanging or taking too long, the other
thread will still allow the client to notify the app about the locks getting into the danger-zone. The actual
app_callbacks are executed on a dedicated ThreadPoolExecutor.
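An app_callback is just a function that receives the error code and the affected lock. A sketch
of what such a callback might look like - the code names match the scenarios above, but the
bookkeeping is purely illustrative:

```python
# Error codes as plain strings for illustration; the library exposes the
# actual codes on its error class.
LOCK_STOLEN = 'LOCK_STOLEN'
LOCK_IN_DANGER = 'LOCK_IN_DANGER'

handled = []

def app_callback(code, lock_key):
    # Invoked on the client's ThreadPoolExecutor, not the app's main thread.
    if code == LOCK_STOLEN:
        # terminate processing and roll back changes made under the lock
        handled.append(('rollback', lock_key))
    elif code == LOCK_IN_DANGER:
        # wind down processing and release the lock as soon as possible
        handled.append(('wind_down', lock_key))

app_callback(LOCK_IN_DANGER, 'my_key')
print(handled)  # [('wind_down', 'my_key')]
```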


Client Close
~~~~~~~~~~~~

By default, lock_client.close() will NOT release all the locks - releasing the locks prematurely, while the
application is still making changes under the assumption that it holds the lock, can be dangerous. As soon as
a lock is released by this client, some other client may pick it up, and the associated app may start
processing the underlying business entity in parallel.

It is highly recommended that the application manage its shutdown-lifecycle such that all the worker threads
operating under these locks are first terminated (committed or rolled back), the corresponding locks released
(one at a time - by each worker thread), and only then the lock_client.close() method called. Alternatively,
consider letting the process die without releasing all the locks - they will be auto-released when their
leases run out after a while.

That said, if the caller does wish to release all locks when closing the lock_client, it can pass in the
release_locks=True argument when invoking the close() method. Please note that all the locks are released in
the best_effort mode - i.e. all the errors will be logged and swallowed.
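The recommended shutdown ordering can be sketched with hypothetical stand-in classes (none of
these names are part of the library's API):

```python
# Illustrative shutdown ordering: stop the workers (commit or roll back),
# have each worker release its own lock, and only then close the client.
shutdown_log = []

class FakeWorker:
    def __init__(self, name):
        self.name = name

    def stop_and_release(self):
        shutdown_log.append('stop:' + self.name)     # finish/abort app work
        shutdown_log.append('release:' + self.name)  # release this worker's lock

class FakeLockClient:
    def close(self):
        shutdown_log.append('client_closed')

workers = [FakeWorker('w1'), FakeWorker('w2')]
client = FakeLockClient()

for w in workers:   # 1. terminate workers and release their locks
    w.stop_and_release()
client.close()      # 2. only then close the lock client

print(shutdown_log[-1])  # client_closed
```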


Process Termination
~~~~~~~~~~~~~~~~~~~

A sudden process termination would leave the locks frozen with the values as of their last heartbeat. These locks
will go through one of the following scenarios:

* **Eventual expiry - as per the TTL attribute**
    * Each lock has a TTL attribute (named 'expiry_time' by default) - which stores the timestamp (as epoch) after
      which it is eligible for auto-deletion by DynamoDB. This deletion does not have a fixed SLA - but will likely
      happen over the next 24 hours after the lock expires.
* **Some other client tries to acquire the lock**
    * The client will treat the lock as an active lock - and will wait for a period equal to its lease_duration from
      the point it first sees the lock. This does require the acquire_lock call to be made with a retry_timeout larger
      than the lease_duration of the lock - otherwise, the acquire_lock call will time out before the lease expires.
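The timing constraint in the last bullet comes down to simple arithmetic - the caller must keep
retrying for longer than the abandoned lock's lease. All values below are illustrative:

```python
# Illustrative timing check for taking over an abandoned lock: in the worst
# case, takeover only becomes possible lease_duration seconds after the lock
# is first seen, so retry_timeout must cover that whole window.
lease_duration = 30.0   # seconds the abandoned lock stays "active"
retry_period = 2.0      # wait between retry attempts
retry_timeout = 45.0    # total time the caller keeps retrying

can_take_over = retry_timeout > lease_duration
max_attempts = int(retry_timeout // retry_period)

print(can_take_over, max_attempts)  # True 22
```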


Throughput Provisioning
-----------------------

Whenever using DynamoDB, you have to think about how much read and write throughput you need to provision for your
table. The DynamoDBLockClient makes the following calls to DynamoDB:

* **acquire_lock**
    * ``get_item``: at least once per lock, and more often if there is lock contention and the lock_client needs to
      retry multiple times before acquiring the lock.
    * ``put_item``: typically once per lock - whenever the lock becomes available.
    * ``update_item``: should be fairly rare - only needed when this client needs to take over an abandoned lock.
    * So, the write throughput should be directly proportional to the application's need to acquire locks, but the
      read throughput is a little harder to predict - it can be more sensitive to the lock contention at runtime.
* **release_lock**
    * ``delete_item``: once per lock
    * So, assuming that every lock that is acquired will be released, this is also directly proportional to the
      application's lock acquisition TPS.
* **send_heartbeat**
    * ``update_item``: the lock client supports a deterministic model where the caller can pass in a TPS value, and
      the client will honor the same when making the heartbeat calls. Alternatively, the client also supports an
      "adaptive" mode (the default), where it will take all the active locks at the beginning of each heartbeat_period
      and spread their individual heartbeat calls evenly across the whole period.
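The adaptive mode's even spreading can be sketched as follows - with N active locks and a
heartbeat_period of P seconds, one heartbeat goes out every P/N seconds (illustrative values,
not library code):

```python
# Sketch of the "adaptive" heartbeat-spreading idea: spread one update_item
# call per lock evenly across the heartbeat_period, so the write load is
# steady instead of a burst at the start of each period.
heartbeat_period = 60.0                    # seconds
active_locks = ['a', 'b', 'c', 'd']        # locks held at the period start

interval = heartbeat_period / len(active_locks)
schedule = [(i * interval, key) for i, key in enumerate(active_locks)]

print(interval)   # 15.0
print(schedule)   # [(0.0, 'a'), (15.0, 'b'), (30.0, 'c'), (45.0, 'd')]
```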


Differences from the Java implementation
----------------------------------------

As indicated before, this library derives most of its design from the
`dynamodb-lock-client <https://github.com/awslabs/dynamodb-lock-client>`_ (Java) module. This section goes over
a few details where this library goes a slightly different way:

* **Added support for the DynamoDB TTL attribute**
    * Since Feb 2017, DynamoDB supports having the tables designate one of the attributes as a TTL attribute -
      containing an epoch timestamp value. Once the current time goes past that value, that row becomes eligible
      for automated deletion by DynamoDB. These deletes do not incur any additional costs and help keep the table
      clean of old/stale entries.
* **Dropped support for lock retention after release**
    * The Java library supports an additional lock-attribute called "deleteOnRelease" - which allows the caller to
      control whether the lock, on its release, should be deleted or just marked as released. This Python module
      drops that flexibility, and always deletes the lock on release. The idea is to not treat the lock table as a
      general-purpose data-store, but as a persistent representation of the "currently active locks".
* **Dropped support for the BLOB data field**
    * The Java library supports a byte[] field called 'data' in addition to supporting arbitrary named fields to
      be stored along with any lock. This Python module drops that additional data field - with the understanding
      that any additional data that the app wishes to store can be passed in as part of the additional_attributes
      map/dict that is already supported.
* **Separate lock classes to represent local vs remote locks**
    * The Java library uses the same LockItem class to represent both the locks created/acquired by this client as
      well as the locks loaded from the database (currently held by other clients). This results in confusing
      overloading of fields - e.g. the "lookupTime" is overloaded to store the "lastUpdatedTime" for the locks owned
      by this client, and the "lastLookupTime" for the locks owned by other clients.
* **Added support for explicit and adaptive heartbeat TPS**
    * The Java library would fire off the heartbeat updates for all the active locks one-after-another - as fast as
      it can, then wait till the end of the heartbeat_period, and then do the same thing over. This can result in
      significant write TPS if the application has a lot (say ~100) of active locks. This Python module allows the
      caller to specify an explicit TPS value, or use an adaptive mode - where the heartbeats are evenly spread
      over the whole heartbeat_period.
* **Different callback model**
    * The Java library creates a different thread for each lock that wishes to support "session-monitors". This
      Python module uses a single thread (separate from the one used to send heartbeats) to periodically check that
      the locks are being "heartbeat"-ed and, if needed, uses a ThreadPoolExecutor to invoke the app_callbacks.
* **Uses retry_period/retry_timeout arguments instead of refreshPeriod/additionalTimeToWait**
    * Though the logic is pretty much the same, the names are a little clearer about the intent - the "retry_period"
      controls how long the client waits before retrying a previously failed lock acquisition, and "retry_timeout"
      controls how long the client keeps retrying before giving up and raising an error.
* **Simplified sort-key handling**
    * The Java library goes to great lengths to support the caller's ability to use a simple hash-partitioned table
      as well as a hash-and-range partitioned table. This Python module drops the support for hash-partitioned
      tables, and instead chooses to use a default sort-key of '-' to simplify the implementation.
* **Lock release best_effort mode**
    * The Java library defaults to best_effort == False, whereas this Python module defaults to True - i.e. trying
      to release a lock without choosing an explicit "best_effort" setting could result in exceptions being thrown
      in Java, but would be silently logged and swallowed in Python.
* **Releasing all locks on client close**
    * The Java library will always try to release all locks when closing the lock_client. This Python module
      defaults to NOT releasing the locks on lock_client closure - but does support an optional argument called
      "release_locks" that allows the caller to request lock releases. The idea behind this is that it is not
      a safe operation to release the locks without considering the application threads that could continue to
      process under the assumption that they hold a lock on the underlying business entity. Making the caller
      request the lock-release explicitly is meant to encourage them to wind up the application processing and
      release the locks first, before trying to close the lock_client.
* **Dropped/Missing support for the AWS RequestMetricCollector**
    * The Java library has pervasive support for collecting the AWS request metrics. This Python module does not
      (yet) support this capability.