DataONEorg · datadavev · Jun 1, 2026 · Dec 4, 2025 · Dec 4, 2025 · Dec 4, 2025
diff --git a/.github/workflows/poetry-package-test.yml b/.github/workflows/poetry-package-test.yml
diff --git a/.github/workflows/uv-package-test.yml b/.github/workflows/uv-package-test.yml
@@ -0,0 +1,28 @@
+name: Python CI with uv and pytest
+on:
+  workflow_dispatch:
+  push:
+    branches: ["main", "develop"]
+  pull_request:
+    branches: ["main", "develop"]
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.9", "3.10", "3.11", "3.12", "3.13", "3.14"]
+    steps:
+      - uses: actions/checkout@v5
+
+      - name: Setup uv
+        uses: astral-sh/setup-uv@v7
+        with:
+          version: "0.9.15"
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install the project
+        run: uv sync --all-extras --dev
+
+      - name: Run tests with pytest
+        run: uv run pytest tests
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -0,0 +1,32 @@
+ci:
+  autoupdate_commit_msg: "chore: update pre-commit hooks"
+  autofix_commit_msg: "style: pre-commit fixes"
+
+exclude: "^(tests/testdata/)"
+
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: "v6.0.0"
+    hooks:
+      - id: check-added-large-files
+      - id: check-case-conflict
+      - id: check-merge-conflict
+      - id: check-symlinks
+      - id: check-yaml
+      - id: debug-statements
+      - id: end-of-file-fixer
+      - id: mixed-line-ending
+      - id: name-tests-test
+        args: ["--pytest-test-first"]
+      - id: requirements-txt-fixer
+      - id: trailing-whitespace
+
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: "v0.14.13"
+    hooks:
+      # first, lint + autofix
+      - id: ruff
+        types_or: [python, pyi, jupyter]
+        args: ["--fix", "--show-fixes"]
+      # then, format
+      - id: ruff-format
diff --git a/.vscode/settings.json b/.vscode/settings.json
@@ -9,4 +9,4 @@
     "[python]": {
         "editor.defaultFormatter": "ms-python.black-formatter"
     }
-}
+}
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -45,21 +45,21 @@ In short:
 
 ## 🔀 Development Workflow
 
-Development is managed through the git repository at https://github.com/DataONEorg/hashstore.  The repository is organized into several branches, each with a specific purpose.  
+Development is managed through the git repository at https://github.com/DataONEorg/hashstore.  The repository is organized into several branches, each with a specific purpose.
 
 **main**. The `main` branch represents the stable branch that is constantly maintained with the current release.  It should generally be safe to install and use the `main` branch the same way as binary releases. The version number in all configuration files and the README on the `main` branch follows [semantic versioning](https://semver.org/) and should always be set to the current stable release, for example `2.8.5`.
 
 **develop**. Development takes place on a single branch for integrated development and testing of the set of features
 targeting the next release. Commits should only be pushed to this branch once they are ready to be deployed to
 production immediately after being pushed. This keeps the `develop` branch in a state of readiness for the next release.
-Any unreleased code changes on the `develop` branch represent changes that have been tested and staged for the next 
-release. 
+Any unreleased code changes on the `develop` branch represent changes that have been tested and staged for the next
+release.
 The tip of the `develop` branch always represents the set of features that are awaiting the next release. The develop
 branch represents the opportunity to integrate changes from multiple features for integrated testing before release.
 
 Version numbers on the `develop` branch represent either the planned next release number (e.g., `2.9.0`), or the planned next release number with a `beta` designator or release candidate `rc` designator appended as appropriate.  For example, `2.8.6-beta1` or `2.9.0-rc1`.
 
-**feature**. To isolate development on a specific set of capabilities, especially if it may be disruptive to other 
+**feature**. To isolate development on a specific set of capabilities, especially if it may be disruptive to other
 developers working on the `develop` branch, feature branches should be created.
 
 Feature branches are named as `feature-` + `{issue}` +  `-{short-description}`, with `{issue}` being the GitHub issue number related to that new feature. e.g. `feature-23-refactor-storage`.
@@ -73,11 +73,11 @@ been tested and are awaiting release.  Thus, each `feature-*` branch can be test
 ### Development flow overview
 
 ```mermaid
-%%{init: {  'theme': 'base', 
+%%{init: {  'theme': 'base',
             'gitGraph': {
                 'rotateCommitLabel': false,
                 'showCommitLabel': false
-            },            
+            },
             'themeVariables': {
               'commitLabelColor': '#ffffffff',
               'commitLabelBackground': '#000000'
@@ -110,8 +110,8 @@ gitGraph
 changes that are desired in a release are merged into the `develop` branch, we run
 the full set of tests on a clean checkout of the `develop` branch.
 2. After testing, the `develop` branch is merged to main, and the `main` branch is tagged with
-the new version number (e.g. `2.11.2`). At this point, the tip of the `main` branch will 
-reflect the new release and the `develop` branch can be fast-forwarded to sync with `main` to 
+the new version number (e.g. `2.11.2`). At this point, the tip of the `main` branch will
+reflect the new release and the `develop` branch can be fast-forwarded to sync with `main` to
 start work on the next release.
 3. Releases can be downloaded from the [GitHub releases page](https://github.com/DataONEorg/hashstore/releases).
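As an illustration of the feature-branch naming convention this file describes (`feature-` + `{issue}` + `-{short-description}`), here is a minimal Python sketch of a name check. The function name `parse_feature_branch` and the restriction of the description to lowercase letters, digits, and hyphens are assumptions made for the example; they are not part of the repository.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: "feature-", a numeric GitHub issue number, then a
# hyphen-separated lowercase description (the character set is an assumption).
FEATURE_BRANCH = re.compile(
    r"feature-(?P<issue>\d+)-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)"
)


def parse_feature_branch(name: str) -> Optional[Tuple[int, str]]:
    """Return (issue_number, description) for a conforming name, else None."""
    match = FEATURE_BRANCH.fullmatch(name)
    if match is None:
        return None
    return int(match["issue"]), match["description"]


if __name__ == "__main__":
    print(parse_feature_branch("feature-23-refactor-storage"))
    print(parse_feature_branch("develop"))
```

A check like this could run in a local hook before pushing; whether the project wants to enforce the convention automatically is left open here.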
 

diff --git a/README.md b/README.md
@@ -16,18 +16,18 @@ Version: 1.1.0
 
 Cite this software as:
 
-> Dou Mok, Matthew Brooke, Jing Tao, Jeanette Clarke, Ian Nesbitt, Matthew B. Jones. 2024. 
+> Dou Mok, Matthew Brooke, Jing Tao, Jeanette Clarke, Ian Nesbitt, Matthew B. Jones. 2024.
 > HashStore: hash-based object storage for DataONE data packages. Arctic Data Center.
 > [doi:10.18739/A2ZG6G87Q](https://doi.org/10.18739/A2ZG6G87Q)
 
 ## Introduction
 
-HashStore is a server-side python package that implements a hash-based object storage file system 
-for storing and accessing data and metadata for DataONE services. The package is used in DataONE 
-system components that need direct, filesystem-based access to data objects, their system 
-metadata, and extended metadata about the objects. This package is a core component of the 
-[DataONE federation](https://dataone.org), and supports large-scale object storage for a variety 
-of repositories, including the [KNB Data Repository](http://knb.ecoinformatics.org), the [NSF 
+HashStore is a server-side python package that implements a hash-based object storage file system
+for storing and accessing data and metadata for DataONE services. The package is used in DataONE
+system components that need direct, filesystem-based access to data objects, their system
+metadata, and extended metadata about the objects. This package is a core component of the
+[DataONE federation](https://dataone.org), and supports large-scale object storage for a variety
+of repositories, including the [KNB Data Repository](http://knb.ecoinformatics.org), the [NSF
 Arctic Data Center](https://arcticdata.io/catalog/), the [DataONE search service](https://search.dataone.org), and other repositories.
 
 DataONE in general, and HashStore in particular, are open source, community projects.
@@ -38,17 +38,17 @@ contributions with us.
 
 ## Documentation
 
-The documentation around HashStore's initial design phase can be found here in the [Metacat 
+The documentation around HashStore's initial design phase can be found here in the [Metacat
 repository](https://github.com/NCEAS/metacat/blob/feature-1436-storage-and-indexing/docs/user/metacat/source/storage-subsystem.rst#physical-file-layout)
 as part of the storage re-design planning. Future updates will include documentation here as the
 package matures.
 
 ## HashStore Overview
 
-HashStore is a hash-based object storage system that provides persistent file-based storage using 
-content hashes to de-duplicate data. The system stores data objects, references (refs) and 
-metadata in its respective directories and utilizes an identifier-based API for interacting 
-with the store. HashStore storage classes (like `filehashstore`) must implement the HashStore 
+HashStore is a hash-based object storage system that provides persistent file-based storage using
+content hashes to de-duplicate data. The system stores data objects, references (refs) and
+metadata in its respective directories and utilizes an identifier-based API for interacting
+with the store. HashStore storage classes (like `filehashstore`) must implement the HashStore
 interface to ensure the consistent and expected usage of HashStore.
 
 ### Public API Methods
@@ -160,11 +160,11 @@ metadata_cid_two = hashstore.store_metadata(pid, metadata, format_id)
 
 ### Working with objects (store, retrieve, delete)
 
-In HashStore, data objects begin as temporary files while their content identifiers are 
+In HashStore, data objects begin as temporary files while their content identifiers are
 calculated. Once the default hash algorithm list and their hashes are generated, objects are stored
-in their permanent locations using the hash value of the store's configured algorithm, and 
-then divided accordingly based on the configured width and depth. Lastly, objects are 'tagged' 
-with a given identifier (ex. persistent identifier (pid)). This process produces reference 
+in their permanent locations using the hash value of the store's configured algorithm, and
+then divided accordingly based on the configured width and depth. Lastly, objects are 'tagged'
+with a given identifier (ex. persistent identifier (pid)). This process produces reference
 files, which allow objects to be found and retrieved with a given identifier.
 
 - Note 1: An identifier can only be used once
@@ -176,9 +176,9 @@ files, which allow objects to be found and retrieved with a given identifier.
 By calling the various interface methods for  `store_object`, the calling app/client can validate,
 store and tag an object simultaneously if the relevant data is available. In the absence of an
 identifier (ex. persistent identifier (pid)), `store_object` can be called to solely store an
-object. The client is then expected to call `delete_if_invalid_object` when the relevant 
+object. The client is then expected to call `delete_if_invalid_object` when the relevant
 metadata is available to confirm that the object is what is expected. And to finalize the data-only
-storage process (to make the object discoverable), the client calls `tagObject``. In summary, there 
+storage process (to make the object discoverable), the client calls `tag_object`. In summary, there
 are two expected paths to store an object:
 
 ```py
@@ -263,16 +263,16 @@ ex. `store_metadata(stream, pid, format_id)`).
 
 ### What are HashStore reference files?
 
-HashStore assumes that every data object is referenced by its a respective identifier. This 
-identifier is then used when storing, retrieving and deleting an object. In order to facilitate 
+HashStore assumes that every data object is referenced by its respective identifier. This
+identifier is then used when storing, retrieving and deleting an object. In order to facilitate
 this process, we create two types of reference files:
 
 - pid (persistent identifier) reference files
 - cid (content identifier) reference files
 
 These reference files are implemented in HashStore underneath the hood with no expectation for
 modification from the calling app/client. The one and only exception to this process is when the
-calling client/app does not have an identifier available (i.e. they receive the stream to store 
+calling client/app does not have an identifier available (i.e. they receive the stream to store
 the data object first without any metadata, thus calling `store_object(stream)`).
 
 **'pid' Reference Files**
@@ -282,7 +282,7 @@ the data object first without any metadata, thus calling `store_object(stream)`)
 - If an identifier is not available at the time of storing an object, the calling app/client must
   create this association between a pid and the object it represents by calling `tag_object`
   separately.
-- Each pid reference file contains a single string that represents the content identifier of the 
+- Each pid reference file contains a single string that represents the content identifier of the
   object it references
 - Like how objects are stored once and only once, there is also only one pid reference file for each
   data object.
@@ -297,10 +297,10 @@ the data object first without any metadata, thus calling `store_object(stream)`)
 
 ## Concurrency in HashStore
 
-HashStore is both threading and multiprocessing safe, and by default synchronizes calls to store & 
-delete objects/metadata with Python's threading module. If you wish to use multiprocessing to 
-parallelize your application, please declare a global environment variable `USE_MULTIPROCESSING` 
-as `True` before initializing Hashstore. This will direct the relevant Public API calls to 
+HashStore is both threading and multiprocessing safe, and by default synchronizes calls to store &
+delete objects/metadata with Python's threading module. If you wish to use multiprocessing to
+parallelize your application, please declare a global environment variable `USE_MULTIPROCESSING`
+as `True` before initializing HashStore. This will direct the relevant Public API calls to
 synchronize using the Python `multiprocessing` module's locks and conditions.
 Please see below for example:
 
@@ -316,13 +316,23 @@ use_multiprocessing = os.getenv("USE_MULTIPROCESSING", "False") == "True"
 
 ## Development build
 
-HashStore is a python package, and built using the [Python Poetry](https://python-poetry.org)
-build tool.
-
-To install `hashstore` locally, create a virtual environment for python 3.9+,
-install poetry, and then install or build the package with `poetry install` or `poetry build`,
-respectively. Note, installing `hashstore` with poetry will also make the `hashstore` command 
-available through the command line terminal (see `HashStore Client` section below for details).
+HashStore is a Python package. We recommend installing it with `uv`; see [these instructions](https://gist.github.com/datadavev/3975f244e5db500ba0328ef771ca74dd) for installing and setting up `uv`.
+
+Friendly Notes:
+  - You may run into `command not found: compdef` after adding the completion code to your `.zshrc` file. This can be resolved by changing the code to:
+    ```sh
+    # .zshrc
+    autoload -Uz compinit
+    compinit
+    eval "$(uv generate-shell-completion zsh)"
+    eval "$(uvx --generate-shell-completion zsh)"
+    ```
+  - When downloading the script `uv-python-symlink`, an extension may be added to it, for example: `uv-python-symlink.txt`. It may also not have an executable status. You can execute the following to adjust it:
+    ```sh
+    mv uv-python-symlink uv-python-symlink.sh
+    chmod +x uv-python-symlink.sh
+    ```
+  - After following the steps and navigating to the Python project, `uv` may not have sufficient permissions to run. Follow the given prompts and execute `direnv allow`.
 
 To run tests, navigate to the root directory and run `pytest`. The test suite contains tests that
 take a longer time to run (relating to the storage of large files) - to execute all tests, run
@@ -404,5 +414,3 @@ California.
 [![DataONE_footer](https://user-images.githubusercontent.com/6643222/162324180-b5cf0f5f-ae7a-4ca6-87c3-9733a2590634.png)](https://dataone.org)
 
 [![nceas_footer](https://www.nceas.ucsb.edu/sites/default/files/2020-03/NCEAS-full%20logo-4C.png)](https://www.nceas.ucsb.edu)
-
-
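For readers of the README hunk on storing objects: the phrase "divided accordingly based on the configured width and depth" can be illustrated with a small Python sketch. This is not HashStore's implementation; `shard_path`, `content_path`, and the defaults (`depth=3`, `width=2`, `sha256`) are assumptions chosen for the example.

```python
import hashlib
from pathlib import Path


def shard_path(hex_digest: str, depth: int = 3, width: int = 2) -> Path:
    """Split a hex digest into `depth` directory names of `width` characters.

    The rest of the digest becomes the file name.
    """
    if len(hex_digest) <= depth * width:
        raise ValueError("digest too short for the requested depth and width")
    tokens = [hex_digest[i * width:(i + 1) * width] for i in range(depth)]
    remainder = hex_digest[depth * width:]
    return Path(*tokens, remainder)


def content_path(data: bytes, algorithm: str = "sha256") -> Path:
    """Hash the bytes and return the sharded relative path for that digest."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return shard_path(digest)


if __name__ == "__main__":
    # Identical content always hashes to the same path, which is what
    # allows a content-addressed store to de-duplicate objects.
    print(content_path(b"example data").as_posix())
    print(content_path(b"example data") == content_path(b"example data"))
```

With these assumed defaults, a digest beginning `e3b0c44298` maps to a relative path beginning `e3/b0/c4/4298`. Spreading objects over nested directories keeps any single directory from growing very large.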
diff --git a/hashstore.code-workspace b/hashstore.code-workspace
@@ -5,4 +5,4 @@
 		}
 	],
 	"settings": {}
-}
+}