IQSS · landreev · Dec 9, 2025 · Nov 24, 2025
diff --git a/doc/release-notes/11639-db-opts-idempotency.md b/doc/release-notes/11639-db-opts-idempotency.md
@@ -43,3 +43,13 @@ The following database settings are were added to the official list within the c
 - `:LDNAnnounceRequiredFields`
 - `:LDNTarget`
 - `:WorkflowsAdminIpWhitelist` - formerly `WorkflowsAdmin#IP_WHITELIST_KEY`
+- `:PrePublishDatasetWorkflowId` - formerly `WorkflowServiceBean.WorkflowId:PrePublishDataset`
+- `:PostPublishDatasetWorkflowId` - formerly `WorkflowServiceBean.WorkflowId:PostPublishDataset`
+
+### Important Considerations During Upgrade Of Your Installation
+
+1. Running a customized fork? Make sure to add any custom settings to the `SettingsServiceBean.Key` enum before deploying!
+2. Any database settings not contained in the `SettingsServiceBean.Key` enum will be removed from your database during each deployment cycle.
+3. As always when upgrading, make sure to back up your database beforehand!
+   You can also use the existing API endpoint `/api/admin/settings` to retrieve all settings as JSONish data for a quick backup before upgrading.
+
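
Reviewer note: the quick backup mentioned in item 3 could be scripted roughly as follows. This is a minimal sketch, not part of the PR. It assumes an installation reachable at `http://localhost:8080` (as in the curl examples in the guides); the class name `SettingsBackup` and the output file name are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class SettingsBackup {

    // Builds a GET request for the settings listing endpoint named in the release note.
    static HttpRequest buildRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/admin/settings"))
                .GET()
                .build();
    }

    // Streams the response body into the target file and returns the HTTP status code.
    static int backupTo(String baseUrl, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<Path> response =
                client.send(buildRequest(baseUrl), HttpResponse.BodyHandlers.ofFile(target));
        return response.statusCode();
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            // Assumption: local installation on port 8080; the file name is arbitrary.
            int status = backupTo("http://localhost:8080", Path.of("settings-backup.json"));
            System.out.println("HTTP status: " + status);
        } catch (IOException e) {
            System.err.println("Could not reach the installation: " + e.getMessage());
        }
    }
}
```

The body is written to the file whatever the status code is, so it is worth checking for a 200 before relying on the file as a backup.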
diff --git a/doc/sphinx-guides/source/developers/workflows.rst b/doc/sphinx-guides/source/developers/workflows.rst
@@ -27,6 +27,8 @@ If a step in a workflow fails, the Dataverse installation makes an effort to rol
   provider offers two steps for sending and receiving customizable HTTP requests.
   *http/sr* and *http/authExt*, detailed below, with the latter able to use the API to make changes to the dataset being processed. (Both lock the dataset to prevent other processes from changing the dataset between the time the step is launched to when the external process responds to the Dataverse instance.)
 
+.. _workflow_admin:
+
 Administration
 ~~~~~~~~~~~~~~
 
@@ -36,6 +38,8 @@ At the moment, defining a workflow for each trigger is done for the entire insta
 
 In order to prevent unauthorized resuming of workflows, the Dataverse installation maintains a "white list" of IP addresses from which resume requests are honored. This list is maintained using the ``/api/admin/workflows/ip-whitelist`` endpoint of the :doc:`/api/native-api`. By default, the Dataverse installation honors resume requests from localhost only (``127.0.0.1;::1``), so set-ups that use a single server work with no additional configuration.
 
+Note: these settings can also be viewed and managed via the Settings API.
+See :ref:`:WorkflowsAdminIpWhitelist`, :ref:`:PrePublishDatasetWorkflowId` and :ref:`:PostPublishDatasetWorkflowId`.
 
 Available Steps
 ~~~~~~~~~~~~~~~

diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst
@@ -2396,6 +2396,9 @@ The workflow id returned in this call (or available by doing a GET of /api/admin
 
 Once these steps are taken, new publication requests will automatically trigger submission of an archival copy to the specified archiver, Chronopolis' DuraCloud component in this example. For Chronopolis, as when using the API, it is currently the admin's responsibility to snap-shot the DuraCloud space and monitor the result. Failure of the workflow, (e.g. if DuraCloud is unavailable, the configuration is wrong, or the space for this dataset already exists due to a prior publication action or use of the API), will create a failure message but will not affect publication itself.
 
+Note: the default workflow can also be set via the Settings API.
+See :ref:`:WorkflowsAdminIpWhitelist`, :ref:`:PrePublishDatasetWorkflowId` and :ref:`:PostPublishDatasetWorkflowId`.
+
 .. _bag-info.txt:
 
 Configuring bag-info.txt
@@ -4515,7 +4518,11 @@ Using a JSON-based setting, you can set a global default and per-format limits f
 
 (In previous releases of Dataverse, a colon-separated form was used to specify per-format limits, such as ``:TabularIngestSizeLimit:Rdata``, but this is no longer supported. Now JSON is used.)
 
-The expected JSON is an object with key/value pairs like the following. Format names are case-insensitive, and all fields are optional. The size limits must be strings with double quotes around them (e.g. ``"10"``) rather than numbers (e.g. ``10``).
+The expected JSON is an object with key/value pairs like the following.
+Format names are case-insensitive, and all fields are optional (an empty JSON object means no limits apply).
+The size limits must be whole numbers, either presented as strings with double quotes around them (e.g. ``"10"``) or numeric values (e.g. ``10`` or ``10.0``).
+Note that decimal numbers like ``10.5`` are invalid.
+Any invalid setting will temporarily disable tabular ingest until corrected.
 
 .. code:: json
 
@@ -5134,6 +5141,43 @@ Number of errors to display to the user when creating DataFiles from a file uplo
 
 ``curl -X PUT -d '1' http://localhost:8080/api/admin/settings/:CreateDataFilesMaxErrorsToDisplay``
 
+.. _:WorkflowsAdminIpWhitelist:
+
+:WorkflowsAdminIpWhitelist
+++++++++++++++++++++++++++
+
+A semicolon-separated list of IP addresses from which workflow resume requests are honored.
+By default, the Dataverse installation honors resume requests from localhost only (``127.0.0.1;::1``).
+Restricting this list prevents workflows from being resumed by unauthorized hosts.
+
+``curl -X PUT -d '127.0.0.1;::1;192.168.0.1' http://localhost:8080/api/admin/settings/:WorkflowsAdminIpWhitelist``
+
+See :ref:`Workflow Admin section <workflow_admin>` for more details and context.
+
+.. _:PrePublishDatasetWorkflowId:
+
+:PrePublishDatasetWorkflowId
+++++++++++++++++++++++++++++
+
+The identifier of the workflow to be executed prior to dataset publication.
+This pre-publish workflow is useful for preparing a dataset for public access (e.g., moving files, checking metadata) or starting an approval process.
+
+``curl -X PUT -d '1' http://localhost:8080/api/admin/settings/:PrePublishDatasetWorkflowId``
+
+See :ref:`Workflow Admin section <workflow_admin>` for more details and context.
+
+.. _:PostPublishDatasetWorkflowId:
+
+:PostPublishDatasetWorkflowId
++++++++++++++++++++++++++++++
+
+The identifier of the workflow to be executed after a dataset has been successfully published.
+This post-publish workflow is useful for actions such as sending notifications about the newly published dataset or archiving.
+
+``curl -X PUT -d '2' http://localhost:8080/api/admin/settings/:PostPublishDatasetWorkflowId``
+
+See :ref:`Workflow Admin section <workflow_admin>` for more details and context.
+
 .. _:BagItHandlerEnabled:
 
 :BagItHandlerEnabled
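
Reviewer note: the semicolon-separated format documented for `:WorkflowsAdminIpWhitelist` can be illustrated with a short sketch. This is not the Dataverse implementation; the class and method names are hypothetical, and addresses are compared as plain strings, which is an assumption (equivalent IPv6 spellings would not match each other here).

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

public class IpWhitelistSketch {

    // Default taken from the documentation: localhost over IPv4 and IPv6.
    static final String DEFAULT_WHITELIST = "127.0.0.1;::1";

    // Splits the setting value on semicolons; falls back to the default when unset.
    static Set<String> parse(String settingValue) {
        String raw = (settingValue == null || settingValue.isBlank())
                ? DEFAULT_WHITELIST
                : settingValue;
        return Arrays.stream(raw.split(";"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    // Plain string membership test (assumption: no address normalization).
    static boolean isAllowed(String settingValue, String remoteAddress) {
        return parse(settingValue).contains(remoteAddress);
    }
}
```

With the value from the curl example, `isAllowed("127.0.0.1;::1;192.168.0.1", "192.168.0.1")` is true, while the same address is rejected when the setting is unset.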

diff --git a/modules/dataverse-parent/pom.xml b/modules/dataverse-parent/pom.xml
@@ -152,6 +152,7 @@
         <payara.version>6.2025.10</payara.version>
         <postgresql.version>42.7.7</postgresql.version>
         <solr.version>9.8.0</solr.version>
+        <postgresql.server.version>16</postgresql.server.version>
         <aws.version>2.33.0</aws.version>
         <google.library.version>26.30.0</google.library.version>
 
@@ -168,7 +169,7 @@
         <gdcc.xoai.version>5.3.0</gdcc.xoai.version>
 
         <!-- Testing dependencies -->
-        <testcontainers.version>1.19.7</testcontainers.version>
+        <testcontainers.version>2.0.2</testcontainers.version>
         <smallrye-mpconfig.version>3.7.1</smallrye-mpconfig.version>
         <junit.jupiter.version>5.10.2</junit.jupiter.version>
         <mockito.version>5.11.0</mockito.version>

diff --git a/pom.xml b/pom.xml
@@ -20,7 +20,7 @@
     <properties>
         <skipUnitTests>false</skipUnitTests>
         <skipIntegrationTests>false</skipIntegrationTests>
-        <it.groups>integration</it.groups>
+        <it.groups>integration,migration</it.groups>
         <!-- Provide a fallback value that won't break things if JaCoCo prepare-agent steps don't set it. -->
         <!-- Note: you must use @{} style late variable binding in argLine, otherwise JaCoCo cannot inject the right settings! -->
         <surefire.jacoco.args>-Ddummy.jacoco.property=true</surefire.jacoco.args>
@@ -748,36 +748,36 @@
                 </exclusion>
             </exclusions>
         </dependency>
+        <dependency>
+            <groupId>org.dbunit</groupId>
+            <artifactId>dbunit</artifactId>
+            <version>3.0.0</version>
+            <scope>test</scope>
+        </dependency>
         <dependency>
             <groupId>org.testcontainers</groupId>
             <artifactId>testcontainers</artifactId>
             <scope>test</scope>
-            <exclusions>
-                <exclusion>
-                    <groupId>junit</groupId>
-                    <artifactId>junit</artifactId>
-                </exclusion>
-            </exclusions>
         </dependency>
         <dependency>
             <groupId>org.testcontainers</groupId>
-            <artifactId>junit-jupiter</artifactId>
+            <artifactId>testcontainers-junit-jupiter</artifactId>
             <scope>test</scope>
         </dependency>
         <dependency>
             <groupId>org.testcontainers</groupId>
-            <artifactId>postgresql</artifactId>
+            <artifactId>testcontainers-postgresql</artifactId>
             <scope>test</scope>
         </dependency>
         <dependency>
             <groupId>com.github.dasniko</groupId>
             <artifactId>testcontainers-keycloak</artifactId>
-            <version>3.6.0</version>
+            <version>4.0.0</version>
             <scope>test</scope>
         </dependency>
         <dependency>
             <groupId>org.testcontainers</groupId>
-            <artifactId>localstack</artifactId>
+            <artifactId>testcontainers-localstack</artifactId>
             <scope>test</scope>
         </dependency>
         <!--
@@ -1070,6 +1070,9 @@
                     -->
                     <argLine>@{failsafe.jacoco.args} ${argLine}</argLine>
                     <skip>${skipIntegrationTests}</skip>
+                    <systemPropertyVariables>
+                        <postgresql.server.version>${postgresql.server.version}</postgresql.server.version>
+                    </systemPropertyVariables>
                 </configuration>
                 <executions>
                     <execution>
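
Reviewer note: the `systemPropertyVariables` entry above makes `postgresql.server.version` visible to integration tests as a JVM system property. How the tests consume it is not shown in this diff; the sketch below is one possible reading, with a hypothetical class name and the assumption that the value is used as a tag of the official `postgres` container image. The fallback `16` mirrors the value in the parent POM.

```java
public class PostgresImageName {

    // Builds an image reference such as "postgres:16"; falls back to 16 when the property is unset.
    static String imageFor(String version) {
        String v = (version == null || version.isBlank()) ? "16" : version.trim();
        return "postgres:" + v;
    }

    public static void main(String[] args) {
        // Reads the property that the failsafe configuration passes to the test JVM.
        System.out.println(imageFor(System.getProperty("postgresql.server.version")));
    }
}
```

A test could then hand the resulting string to its container setup, so that changing the server version only requires touching the POM property.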

diff --git a/src/main/java/edu/harvard/iq/dataverse/flyway/SettingsCleanupCallback.java b/src/main/java/edu/harvard/iq/dataverse/flyway/SettingsCleanupCallback.java
@@ -11,7 +11,9 @@
 import java.sql.ResultSet;
 import java.sql.SQLException;
 import java.util.ArrayList;
+import java.util.HashMap;
 import java.util.List;
+import java.util.Map;
 import java.util.logging.Level;
 import java.util.logging.Logger;
 
@@ -40,6 +42,7 @@ public boolean canHandleInTransaction(Event event, Context context) {
 
     @Override
     public void handle(Event event, Context context) {
+        // Failsafe - we only run _after_ all migrations are done.
         if (event != Event.AFTER_MIGRATE) {
             return;
         }
@@ -61,10 +64,19 @@ public String getCallbackName() {
         return "SettingsCleanup";
     }
 
+    /**
+     * Cleans up invalid settings from the database by identifying and removing
+     * rows in the `setting` table where the `name` attribute does not correspond
+     * to a valid SettingsServiceBean.Key.
+     *
+     * @param connection the database connection to use for querying and updating the `setting` table
+     * @throws SQLException if a database access error occurs or the query fails
+     */
     private void cleanupInvalidSettings(Connection connection) throws SQLException {
-        // Collect IDs of rows to delete
-        List<Long> idsToDelete = new ArrayList<>();
+        // Collect IDs of rows to delete, together with the setting's "name" attribute.
+        Map<Long, String> entriesToDelete = new HashMap<>();
 
+        // IMPORTANT: as we cannot use JPQL mid-Flyway, this query needs to be carefully aligned with the Setting class!
         String selectSql = "SELECT id, name FROM setting";
         try (PreparedStatement ps = connection.prepareStatement(selectSql);
              ResultSet rs = ps.executeQuery()) {
@@ -77,24 +89,25 @@ private void cleanupInvalidSettings(Connection connection) throws SQLException {
                 // to a SettingsServiceBean.Key is considered invalid and will be removed.
                 SettingsServiceBean.Key key = SettingsServiceBean.Key.parse(name);
                 if (key == null) {
-                    idsToDelete.add(id);
+                    entriesToDelete.put(id, name);
                 }
             }
         }
 
-        if (idsToDelete.isEmpty()) {
+        if (entriesToDelete.isEmpty()) {
             logger.fine("Settings cleanup: no invalid settings found");
             return;
         }
 
-        logger.info(() -> "Settings cleanup: found " + idsToDelete.size()
-                + " invalid settings; deleting them");
+        logger.info(() -> "Settings cleanup: found " + entriesToDelete.size()
+                + " invalid/obsolete settings; deleting them.");
 
         String deleteSql = "DELETE FROM setting WHERE id = ?";
         try (PreparedStatement delete = connection.prepareStatement(deleteSql)) {
-            for (Long id : idsToDelete) {
-                delete.setLong(1, id);
+            for (Map.Entry<Long, String> entry : entriesToDelete.entrySet()) {
+                delete.setLong(1, entry.getKey());
                 delete.addBatch();
+                logger.info("Settings cleanup: deleting \"" + entry.getValue() + "\"");
             }
             int[] counts = delete.executeBatch();
             logger.info(() -> "Settings cleanup: deleted " + counts.length + " rows with invalid keys");
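
Reviewer note: the selection step in `cleanupInvalidSettings` can be tried out without a database. The sketch below is a simplified model, not the PR's code: the nested `Key` enum is a hypothetical three-entry stand-in for `SettingsServiceBean.Key`, and its `parse` method assumes stored names carry a leading colon, as the documented setting names do.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class SettingsCleanupSketch {

    // Hypothetical stand-in for SettingsServiceBean.Key with only three entries.
    enum Key {
        WorkflowsAdminIpWhitelist,
        PrePublishDatasetWorkflowId,
        PostPublishDatasetWorkflowId;

        // Assumption: valid names look like ":Name"; anything else yields null.
        static Key parse(String name) {
            if (name == null || !name.startsWith(":")) {
                return null;
            }
            String bare = name.substring(1);
            return Arrays.stream(values())
                    .filter(k -> k.name().equals(bare))
                    .findFirst()
                    .orElse(null);
        }
    }

    // Mirrors the diff: keep id and name of every row whose name does not parse to a Key.
    static Map<Long, String> findInvalid(Map<Long, String> rows) {
        Map<Long, String> invalid = new LinkedHashMap<>();
        rows.forEach((id, name) -> {
            if (Key.parse(name) == null) {
                invalid.put(id, name);
            }
        });
        return invalid;
    }
}
```

Keeping the name next to the id is what allows the per-row log message added in this hunk, so operators can see exactly which settings were dropped.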

diff --git a/src/main/java/edu/harvard/iq/dataverse/settings/SettingsServiceBean.java b/src/main/java/edu/harvard/iq/dataverse/settings/SettingsServiceBean.java
@@ -177,6 +177,18 @@ public enum Key {
          */
         WorkflowsAdminIpWhitelist,
 
+        /**
+         * Represents the workflow identifier for the "pre-publish dataset" operation.
+         * This identifier is used to manage and define the specific workflow
+         * triggered before a dataset is published within the application.
+         */
+        PrePublishDatasetWorkflowId,
+        /**
+         * Represents the configuration key for specifying the workflow identifier that
+         * will be executed after a dataset has been published.
+         */
+        PostPublishDatasetWorkflowId,
+
         /**
          * A special secret that, if set, needs to be given when trying to manage internal users.
          * This key was formerly known as "BuiltinUsers.KEY", which never was a setting name aligning with the others.
@@ -291,13 +303,14 @@ public enum Key {
          */
         @Deprecated(since = "6.2", forRemoval = true)
         SystemEmail, 
-        /* size limit for Tabular data file ingests */
-        /* (can be set separately for specific ingestable formats; in which 
-        case the actual stored option will be TabularIngestSizeLimit:{FORMAT_NAME}
-        where {FORMAT_NAME} is the format identification tag returned by the 
-        getFormatName() method in the format-specific plugin; "sav" for the 
-        SPSS/sav format, "RData" for R, etc.
-        for example: :TabularIngestSizeLimit:RData */
+
+        /**
+        <p>Size limit (in bytes) for tabular file ingest. Accepts either a single numeric value or JSON for per-format control.</p>
+        <p>Values: -1 (or absent) = no limit, 0 = disable ingest, &gt;0 = byte threshold, or JSON object.</p>
+        <p>JSON object allows setting a "default" (same as single byte value) and override limits per-format for: CSV, DTA, POR, Rdata, SAV, XLSX.
+        Example: <code>{"default": "536870912", "CSV": "0", "Rdata": "1000000"}</code></p>
+        <p>Format names are case-insensitive. Invalid settings disable ingest until corrected.</p>
+        */
         TabularIngestSizeLimit,
         /* Validate physical files in the dataset when publishing, if the dataset size less than the threshold limit */
         DatasetChecksumValidationSizeLimit,
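
Reviewer note: the whole-number rule documented for `:TabularIngestSizeLimit` values (`"10"`, `10` and `10.0` are accepted, `10.5` is not) can be checked with `BigDecimal`. This is a minimal sketch with hypothetical names, not the parser used by Dataverse; it also accepts string forms such as `"10.0"`, which is an assumption beyond what the documentation states.

```java
import java.math.BigDecimal;
import java.util.OptionalLong;

public class IngestLimitValueSketch {

    // Returns the limit in bytes, or empty when the value is not a whole number.
    static OptionalLong parseLimit(Object value) {
        if (!(value instanceof String) && !(value instanceof Number)) {
            return OptionalLong.empty();
        }
        try {
            BigDecimal parsed = new BigDecimal(value.toString().trim());
            // longValueExact() throws if there is a nonzero fractional part or a long overflow.
            return OptionalLong.of(parsed.longValueExact());
        } catch (NumberFormatException | ArithmeticException e) {
            return OptionalLong.empty();
        }
    }
}
```

An empty result corresponds to the "invalid setting" case, which according to the documentation disables tabular ingest until the setting is corrected.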