Merged
23 commits
8e1da69
chore(deps): update Testcontainers testing dependencies (breaking cha…
poikilotherm Nov 24, 2025
4d47ef0
test,chore(db): add PostgreSQL server version property to Maven confi…
poikilotherm Nov 24, 2025
11984e8
test(migrations): introducing migration tests to the codebase
poikilotherm Nov 24, 2025
83f3d37
test(migration): introduce `SharedPostgresContainer` singleton for mi…
poikilotherm Nov 24, 2025
664b294
test(migration): add integration tests for `V6_8_0_1__SettingsDataMig…
poikilotherm Nov 24, 2025
9c258bd
fix(db): align ON CONFLICT clause with existing functional index
poikilotherm Nov 25, 2025
b18716c
test,fix(auth): avoid flapping test results due to logout when Keyclo…
poikilotherm Nov 26, 2025
c63e695
refactor(migration): centralize DBUnit helpers and reuse shared Postg…
poikilotherm Nov 27, 2025
85e94ee
fix(settings,workflow): migrate workflow keys to structured enums and…
poikilotherm Nov 27, 2025
58a1237
docs(workflow): add details on workflow configuration via Settings API
poikilotherm Nov 27, 2025
a05fcc8
test(migration): update tests to account for additional workflow sett…
poikilotherm Nov 27, 2025
a818596
refactor(migration): simplify setting name migrations with direct `UP…
poikilotherm Nov 27, 2025
2340a75
Merge branch 'develop' into 11996-fix-settings
poikilotherm Nov 28, 2025
c385abb
refactor(settings): enhance cleanup logic with detailed logging of ke…
poikilotherm Dec 3, 2025
984cd27
docs: expand release notes with workflow keys and upgrade considerati…
poikilotherm Dec 3, 2025
586cac3
Merge branch 'develop' into 11996-fix-settings
qqmyers Dec 3, 2025
0ca703a
fix(config): support numeric values for TabularIngestSizeLimit, impro…
poikilotherm Dec 4, 2025
270d039
refactor(config): improve JSON parsing and logging in getTabularInges…
poikilotherm Dec 4, 2025
37df103
style: remove unused (& illegal) Spring `@Value` import from `SystemC…
poikilotherm Dec 5, 2025
28d90a7
docs(settings): clarify Javadocs on TabularIngestSizeLimit usage #11639
poikilotherm Dec 8, 2025
2f662ec
test(files): refactor tabular ingest size limit tests with parameteri…
poikilotherm Dec 8, 2025
cd5f33d
test(files): expand TabularIngestSizeLimit tests with additional nume…
poikilotherm Dec 8, 2025
8ddd5b8
Merge branch 'develop' into 11996-fix-settings
landreev Dec 9, 2025

10 changes: 10 additions & 0 deletions doc/release-notes/11639-db-opts-idempotency.md
@@ -43,3 +43,13 @@ The following database settings are were added to the official list within the c
- `:LDNAnnounceRequiredFields`
- `:LDNTarget`
- `:WorkflowsAdminIpWhitelist` - formerly `WorkflowsAdmin#IP_WHITELIST_KEY`
- `:PrePublishDatasetWorkflowId` - formerly `WorkflowServiceBean.WorkflowId:PrePublishDataset`
- `:PostPublishDatasetWorkflowId` - formerly `WorkflowServiceBean.WorkflowId:PostPublishDataset`

### Important Considerations During Upgrade Of Your Installation

1. Running a customized fork? Make sure to add any custom settings to the `SettingsServiceBean.Key` enum before deploying!
2. Any database settings not contained in the `SettingsServiceBean.Key` enum will be removed from your database during each deployment cycle.
3. As always when upgrading, make sure to back up your database beforehand!
   You can also use the existing API endpoint `/api/admin/settings` to retrieve all settings as JSON for a quick backup before upgrading.
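A minimal Java sketch of the quick backup mentioned in item 3, using only the JDK's `java.net.http` client. It assumes the admin API is reachable at `http://localhost:8080` from the machine running it; the class name, the default file name `settings-backup.json` and the choice to refuse overwriting an existing file are illustrative assumptions, not part of Dataverse.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: saves the JSON returned by GET /api/admin/settings to a file.
class SettingsBackup {

    // Assumption: the admin API of the installation is reachable at this address.
    static final URI SETTINGS_URI = URI.create("http://localhost:8080/api/admin/settings");

    // Builds a plain GET request listing all database settings.
    static HttpRequest buildRequest(URI uri) {
        return HttpRequest.newBuilder(uri).GET().build();
    }

    // Writes the body to a new file; CREATE_NEW fails if the file already exists,
    // so an earlier backup is never overwritten by accident.
    static Path saveBackup(String body, Path target) throws IOException {
        return Files.writeString(target, body, StandardOpenOption.CREATE_NEW);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(SETTINGS_URI), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected HTTP status " + response.statusCode());
        }
        Path target = Path.of(args.length > 0 ? args[0] : "settings-backup.json");
        System.out.println("Settings saved to " + saveBackup(response.body(), target));
    }
}
```

Keeping the raw response body (rather than re-serializing it) means the backup contains exactly what the API returned.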

4 changes: 4 additions & 0 deletions doc/sphinx-guides/source/developers/workflows.rst
@@ -27,6 +27,8 @@ If a step in a workflow fails, the Dataverse installation makes an effort to rol
provider offers two steps for sending and receiving customizable HTTP requests.
*http/sr* and *http/authExt*, detailed below, with the latter able to use the API to make changes to the dataset being processed. (Both lock the dataset to prevent other processes from changing the dataset between the time the step is launched to when the external process responds to the Dataverse instance.)

.. _workflow_admin:

Administration
~~~~~~~~~~~~~~

@@ -36,6 +38,8 @@ At the moment, defining a workflow for each trigger is done for the entire insta

In order to prevent unauthorized resuming of workflows, the Dataverse installation maintains a "white list" of IP addresses from which resume requests are honored. This list is maintained using the ``/api/admin/workflows/ip-whitelist`` endpoint of the :doc:`/api/native-api`. By default, the Dataverse installation honors resume requests from localhost only (``127.0.0.1;::1``), so set-ups that use a single server work with no additional configuration.

Note: the IP whitelist and the default workflow for each trigger can also be managed via the Settings API.
See :ref:`:WorkflowsAdminIpWhitelist`, :ref:`:PrePublishDatasetWorkflowId` and :ref:`:PostPublishDatasetWorkflowId`.
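Because these are ordinary database settings, they can also be scripted. The sketch below builds the same requests as the ``curl -X PUT`` examples in the configuration guide, plus a ``DELETE`` that removes a setting again, using the JDK's ``java.net.http`` client. The base URL ``http://localhost:8080`` and the class and method names are assumptions for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client for /api/admin/settings/{name}; only builds and sends plain HTTP requests.
class WorkflowSettingsClient {
    private final URI base;

    WorkflowSettingsClient(URI base) {
        this.base = base;
    }

    // e.g. http://localhost:8080/api/admin/settings/:PrePublishDatasetWorkflowId
    URI settingUri(String name) {
        return base.resolve("/api/admin/settings/" + name);
    }

    // PUT sets or replaces the value, like `curl -X PUT -d VALUE .../settings/NAME`.
    HttpRequest put(String name, String value) {
        return HttpRequest.newBuilder(settingUri(name))
                .PUT(HttpRequest.BodyPublishers.ofString(value))
                .build();
    }

    // DELETE removes the setting from the database.
    HttpRequest delete(String name) {
        return HttpRequest.newBuilder(settingUri(name)).DELETE().build();
    }

    public static void main(String[] args) throws Exception {
        WorkflowSettingsClient api = new WorkflowSettingsClient(URI.create("http://localhost:8080"));
        HttpClient http = HttpClient.newHttpClient();
        // Illustrative value: run workflow 1 before each publication.
        HttpResponse<String> response = http.send(
                api.put(":PrePublishDatasetWorkflowId", "1"), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```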

Available Steps
~~~~~~~~~~~~~~~
46 changes: 45 additions & 1 deletion doc/sphinx-guides/source/installation/config.rst
@@ -2396,6 +2396,9 @@ The workflow id returned in this call (or available by doing a GET of /api/admin

Once these steps are taken, new publication requests will automatically trigger submission of an archival copy to the specified archiver, Chronopolis' DuraCloud component in this example. For Chronopolis, as when using the API, it is currently the admin's responsibility to snap-shot the DuraCloud space and monitor the result. Failure of the workflow, (e.g. if DuraCloud is unavailable, the configuration is wrong, or the space for this dataset already exists due to a prior publication action or use of the API), will create a failure message but will not affect publication itself.

Note: the default workflow can also be set via the Settings API.
See :ref:`:WorkflowsAdminIpWhitelist`, :ref:`:PrePublishDatasetWorkflowId` and :ref:`:PostPublishDatasetWorkflowId`.

.. _bag-info.txt:

Configuring bag-info.txt
@@ -4515,7 +4518,11 @@ Using a JSON-based setting, you can set a global default and per-format limits f

(In previous releases of Dataverse, a colon-separated form was used to specify per-format limits, such as ``:TabularIngestSizeLimit:Rdata``, but this is no longer supported. Now JSON is used.)

The expected JSON is an object with key/value pairs like the following. Format names are case-insensitive, and all fields are optional. The size limits must be strings with double quotes around them (e.g. ``"10"``) rather than numbers (e.g. ``10``).
The expected JSON is an object with key/value pairs like the following.
Format names are case-insensitive, and all fields are optional (an empty JSON object means no limit).
The size limits must be whole numbers, given either as strings in double quotes (e.g. ``"10"``) or as numeric values (e.g. ``10`` or ``10.0``).
Numbers with a fractional part, like ``10.5``, are invalid.
Any invalid setting disables tabular ingest until it is corrected.

.. code:: json

@@ -5134,6 +5141,43 @@ Number of errors to display to the user when creating DataFiles from a file uplo

``curl -X PUT -d '1' http://localhost:8080/api/admin/settings/:CreateDataFilesMaxErrorsToDisplay``

.. _:WorkflowsAdminIpWhitelist:

:WorkflowsAdminIpWhitelist
++++++++++++++++++++++++++

A semicolon-separated list of IP addresses from which workflow resume requests are honored.
By default, the Dataverse installation honors resume requests from localhost only (``127.0.0.1;::1``).
Restricting this list prevents unauthorized parties from resuming workflows.

``curl -X PUT -d '127.0.0.1;::1;192.168.0.1' http://localhost:8080/api/admin/settings/:WorkflowsAdminIpWhitelist``

See :ref:`Workflow Admin section <workflow_admin>` for more details and context.
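To make the semicolon-separated format concrete, here is a hypothetical helper that splits the setting value and checks a caller's address against it. It falls back to the documented default ``127.0.0.1;::1`` when the setting is empty, and it compares strings exactly: no CIDR ranges and no normalization of IPv6 spellings such as ``0:0:0:0:0:0:0:1``. Dataverse's own check may behave differently.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper illustrating the :WorkflowsAdminIpWhitelist format; not Dataverse code.
class ResumeWhitelist {
    // Documented default: localhost over IPv4 and IPv6.
    static final String DEFAULT = "127.0.0.1;::1";

    // Splits the semicolon-separated value into a set, skipping blanks; empty setting -> default.
    static Set<String> parse(String setting) {
        String value = (setting == null || setting.isBlank()) ? DEFAULT : setting;
        return Arrays.stream(value.split(";"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    // Exact string match only: no CIDR ranges, no normalization of IPv6 spellings.
    static boolean isResumeAllowed(String setting, String remoteAddress) {
        return parse(setting).contains(remoteAddress.trim());
    }

    public static void main(String[] args) {
        String setting = "127.0.0.1;::1;192.168.0.1";
        System.out.println(isResumeAllowed(setting, "192.168.0.1")); // true
        System.out.println(isResumeAllowed(setting, "10.0.0.5"));    // false
    }
}
```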

.. _:PrePublishDatasetWorkflowId:

:PrePublishDatasetWorkflowId
++++++++++++++++++++++++++++

The identifier of the workflow to be executed prior to dataset publication.
This pre-publish workflow is useful for preparing a dataset for public access (e.g., moving files, checking metadata) or starting an approval process.

``curl -X PUT -d '1' http://localhost:8080/api/admin/settings/:PrePublishDatasetWorkflowId``

See :ref:`Workflow Admin section <workflow_admin>` for more details and context.

.. _:PostPublishDatasetWorkflowId:

:PostPublishDatasetWorkflowId
+++++++++++++++++++++++++++++

The identifier of the workflow to be executed after a dataset has been successfully published.
This post-publish workflow is useful for actions such as sending notifications about the newly published dataset or archiving.

``curl -X PUT -d '2' http://localhost:8080/api/admin/settings/:PostPublishDatasetWorkflowId``

See :ref:`Workflow Admin section <workflow_admin>` for more details and context.
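Each of the two workflow settings corresponds to one publication trigger. A hypothetical lookup, with a plain ``Map`` standing in for the ``setting`` table, could resolve the configured id as follows. The ``Trigger`` enum is illustrative, and treating a blank or non-numeric value as "no default workflow" is an assumption of this sketch, not documented behavior.

```java
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical sketch: resolve the default workflow id for a trigger from settings.
class DefaultWorkflowLookup {

    // Illustrative mapping of each trigger to its database setting name.
    enum Trigger {
        PrePublishDataset(":PrePublishDatasetWorkflowId"),
        PostPublishDataset(":PostPublishDatasetWorkflowId");

        final String settingName;

        Trigger(String settingName) {
            this.settingName = settingName;
        }
    }

    // Returns the configured id, or empty if the setting is absent, blank or not a number.
    static OptionalLong workflowId(Map<String, String> settings, Trigger trigger) {
        String raw = settings.get(trigger.settingName);
        if (raw == null || raw.isBlank()) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(raw.trim()));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        Map<String, String> settings = Map.of(":PostPublishDatasetWorkflowId", "2");
        System.out.println(workflowId(settings, Trigger.PrePublishDataset));  // OptionalLong.empty
        System.out.println(workflowId(settings, Trigger.PostPublishDataset)); // OptionalLong[2]
    }
}
```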

.. _:BagItHandlerEnabled:

:BagItHandlerEnabled
3 changes: 2 additions & 1 deletion modules/dataverse-parent/pom.xml
@@ -152,6 +152,7 @@
<payara.version>6.2025.10</payara.version>
<postgresql.version>42.7.7</postgresql.version>
<solr.version>9.8.0</solr.version>
<postgresql.server.version>16</postgresql.server.version>
<aws.version>2.33.0</aws.version>
<google.library.version>26.30.0</google.library.version>

@@ -168,7 +169,7 @@
<gdcc.xoai.version>5.3.0</gdcc.xoai.version>

<!-- Testing dependencies -->
<testcontainers.version>1.19.7</testcontainers.version>
<testcontainers.version>2.0.2</testcontainers.version>
<smallrye-mpconfig.version>3.7.1</smallrye-mpconfig.version>
<junit.jupiter.version>5.10.2</junit.jupiter.version>
<mockito.version>5.11.0</mockito.version>
25 changes: 14 additions & 11 deletions pom.xml
@@ -20,7 +20,7 @@
<properties>
<skipUnitTests>false</skipUnitTests>
<skipIntegrationTests>false</skipIntegrationTests>
<it.groups>integration</it.groups>
<it.groups>integration,migration</it.groups>
<!-- Provide a fallback value that won't break things if JaCoCo prepare-agent steps don't set it. -->
<!-- Note: you must use @{} style late variable binding in argLine, otherwise JaCoCo cannot inject the right settings! -->
<surefire.jacoco.args>-Ddummy.jacoco.property=true</surefire.jacoco.args>
@@ -748,36 +748,36 @@
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.dbunit</groupId>
<artifactId>dbunit</artifactId>
<version>3.0.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.testcontainers</groupId>
<artifactId>testcontainers</artifactId>
<scope>test</scope>
<exclusions>
<exclusion>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.testcontainers</groupId>
<artifactId>junit-jupiter</artifactId>
<artifactId>testcontainers-junit-jupiter</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.testcontainers</groupId>
<artifactId>postgresql</artifactId>
<artifactId>testcontainers-postgresql</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.github.dasniko</groupId>
<artifactId>testcontainers-keycloak</artifactId>
<version>3.6.0</version>
<version>4.0.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.testcontainers</groupId>
<artifactId>localstack</artifactId>
<artifactId>testcontainers-localstack</artifactId>
<scope>test</scope>
</dependency>
<!--
@@ -1070,6 +1070,9 @@
-->
<argLine>@{failsafe.jacoco.args} ${argLine}</argLine>
<skip>${skipIntegrationTests}</skip>
<systemPropertyVariables>
<postgresql.server.version>${postgresql.server.version}</postgresql.server.version>
</systemPropertyVariables>
</configuration>
<executions>
<execution>
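The new `postgresql.server.version` property reaches the integration tests as a JVM system property through the Failsafe `systemPropertyVariables` block above. A sketch of how a test helper might turn it into a Docker image reference, assuming the official `postgres` image and falling back to `16` (the pom default) when tests run outside Maven, e.g. from an IDE. The class and constant names are hypothetical; the resulting string would then be handed to the Testcontainers PostgreSQL container.

```java
// Hypothetical helper: derive the PostgreSQL image for migration tests from a system property.
class PostgresImage {
    static final String PROPERTY = "postgresql.server.version";
    // Mirrors the default in modules/dataverse-parent/pom.xml; used when the property is missing.
    static final String FALLBACK_VERSION = "16";

    // Returns e.g. "postgres:16" (assumes the official "postgres" image on Docker Hub).
    static String imageName() {
        String version = System.getProperty(PROPERTY, FALLBACK_VERSION).trim();
        if (version.isEmpty()) {
            version = FALLBACK_VERSION;
        }
        return "postgres:" + version;
    }

    public static void main(String[] args) {
        System.out.println(imageName());
    }
}
```

Reading the version from a single Maven property keeps the test database in step with the server version the project targets, instead of hard-coding it in each test class.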
@@ -11,7 +11,9 @@
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

@@ -40,6 +42,7 @@ public boolean canHandleInTransaction(Event event, Context context) {

@Override
public void handle(Event event, Context context) {
// Failsafe - we only run _after_ all migrations are done.
if (event != Event.AFTER_MIGRATE) {
return;
}
@@ -61,10 +64,19 @@ public String getCallbackName() {
return "SettingsCleanup";
}

/**
* Cleans up invalid settings from the database by identifying and removing
* rows in the `setting` table where the `name` attribute does not correspond
* to a valid SettingsServiceBean.Key.
*
* @param connection the database connection to use for querying and updating the `setting` table
* @throws SQLException if a database access error occurs or the query fails
*/
private void cleanupInvalidSettings(Connection connection) throws SQLException {
// Collect IDs of rows to delete
List<Long> idsToDelete = new ArrayList<>();
// Collect IDs of rows to delete, together with the setting's "name" attribute.
Map<Long, String> entriesToDelete = new HashMap<>();

// IMPORTANT: as we cannot use JPQL mid-Flyway, this query needs to be carefully aligned with the Setting class!
String selectSql = "SELECT id, name FROM setting";
try (PreparedStatement ps = connection.prepareStatement(selectSql);
ResultSet rs = ps.executeQuery()) {
@@ -77,24 +89,25 @@ private void cleanupInvalidSettings(Connection connection) throws SQLException {
// to a SettingsServiceBean.Key is considered invalid and will be removed.
SettingsServiceBean.Key key = SettingsServiceBean.Key.parse(name);
if (key == null) {
idsToDelete.add(id);
entriesToDelete.put(id, name);
}
}
}

if (idsToDelete.isEmpty()) {
if (entriesToDelete.isEmpty()) {
logger.fine("Settings cleanup: no invalid settings found");
return;
}

logger.info(() -> "Settings cleanup: found " + idsToDelete.size()
+ " invalid settings; deleting them");
logger.info(() -> "Settings cleanup: found " + entriesToDelete.size()
+ " invalid/obsolete settings; deleting them.");

String deleteSql = "DELETE FROM setting WHERE id = ?";
try (PreparedStatement delete = connection.prepareStatement(deleteSql)) {
for (Long id : idsToDelete) {
delete.setLong(1, id);
for (Map.Entry<Long, String> entry : entriesToDelete.entrySet()) {
delete.setLong(1, entry.getKey());
delete.addBatch();
logger.info("Settings cleanup: deleting \"" + entry.getValue() + "\"");
}
int[] counts = delete.executeBatch();
logger.info(() -> "Settings cleanup: deleted " + counts.length + " rows with invalid keys");
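Stripped of JDBC, the cleanup above comes down to this: collect every `id -> name` pair whose name does not parse to a known key, log each name, then delete those rows. The self-contained sketch below reproduces that flow on an in-memory map. `KnownKey` and `parse()` are hypothetical stand-ins for `SettingsServiceBean.Key` and `Key.parse()`, and the leading-colon naming convention is assumed from documented names such as `:WorkflowsAdminIpWhitelist`. `:MyForkCustomSetting` plays the role of a setting that a customized fork forgot to add to the enum.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical, in-memory model of the settings cleanup; not the Flyway callback itself.
class SettingsCleanupSketch {
    private static final Logger logger = Logger.getLogger(SettingsCleanupSketch.class.getName());

    // Stand-in for SettingsServiceBean.Key (only a few constants, for illustration).
    enum KnownKey { WorkflowsAdminIpWhitelist, PrePublishDatasetWorkflowId, PostPublishDatasetWorkflowId }

    // Stand-in for Key.parse(): ":Name" -> constant, anything else -> null (assumed convention).
    static KnownKey parse(String name) {
        if (name == null || !name.startsWith(":")) {
            return null;
        }
        String bare = name.substring(1);
        for (KnownKey key : KnownKey.values()) {
            if (key.name().equals(bare)) {
                return key;
            }
        }
        return null;
    }

    // Removes rows with unknown names from settingRows (id -> name) and returns the removed entries.
    static Map<Long, String> cleanup(Map<Long, String> settingRows) {
        // Keep the name next to the id so every deletion can be logged by name.
        Map<Long, String> entriesToDelete = new LinkedHashMap<>();
        settingRows.forEach((id, name) -> {
            if (parse(name) == null) {
                entriesToDelete.put(id, name);
            }
        });
        if (entriesToDelete.isEmpty()) {
            logger.fine("Settings cleanup: no invalid settings found");
            return entriesToDelete;
        }
        for (Map.Entry<Long, String> entry : entriesToDelete.entrySet()) {
            logger.info("Settings cleanup: deleting \"" + entry.getValue() + "\"");
            settingRows.remove(entry.getKey());
        }
        return entriesToDelete;
    }

    public static void main(String[] args) {
        Map<Long, String> rows = new LinkedHashMap<>();
        rows.put(1L, ":WorkflowsAdminIpWhitelist");
        rows.put(2L, ":MyForkCustomSetting"); // hypothetical setting missing from the enum
        rows.put(3L, ":PostPublishDatasetWorkflowId");
        System.out.println(cleanup(rows));   // {2=:MyForkCustomSetting}
        System.out.println(rows.keySet());   // [1, 3]
    }
}
```

Collecting into a `Map` instead of a `List<Long>` is what makes the per-row log line possible, so administrators can see exactly which setting names were removed.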
@@ -177,6 +177,18 @@ public enum Key {
*/
WorkflowsAdminIpWhitelist,

/**
* Represents the workflow identifier for the "pre-publish dataset" operation.
* This identifier is used to manage and define the specific workflow
* triggered before a dataset is published within the application.
*/
PrePublishDatasetWorkflowId,
/**
* Represents the configuration key for specifying the workflow identifier that
* will be executed after a dataset has been published.
*/
PostPublishDatasetWorkflowId,

/**
* A special secret that, if set, needs to be given when trying to manage internal users.
* This key was formerly known as "BuiltinUsers.KEY", which never was a setting name aligning with the others.
@@ -291,13 +303,14 @@ public enum Key {
*/
@Deprecated(since = "6.2", forRemoval = true)
SystemEmail,
/* size limit for Tabular data file ingests */
/* (can be set separately for specific ingestable formats; in which
case the actual stored option will be TabularIngestSizeLimit:{FORMAT_NAME}
where {FORMAT_NAME} is the format identification tag returned by the
getFormatName() method in the format-specific plugin; "sav" for the
SPSS/sav format, "RData" for R, etc.
for example: :TabularIngestSizeLimit:RData */

/**
<p>Size limit (in bytes) for tabular file ingest. Accepts either a single numeric value or JSON for per-format control.</p>
<p>Values: -1 (or absent) = no limit, 0 = disable ingest, {@code >0} = byte threshold, or a JSON object.</p>
<p>A JSON object allows setting a "default" (same as a single byte value) and overriding the limit per format for: CSV, DTA, POR, Rdata, SAV, XLSX.
Example: <code>{"default": "536870912", "CSV": "0", "Rdata": "1000000"}</code></p>
<p>Format names are case-insensitive. Invalid settings disable ingest until corrected.</p>
*/
TabularIngestSizeLimit,
/* Validate physical files in the dataset when publishing, if the dataset size less than the threshold limit */
DatasetChecksumValidationSizeLimit,
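A sketch of the value rules documented for `TabularIngestSizeLimit`: whole numbers are accepted whether written as `"10"`, `10` or `10.0`, `10.5` is rejected, per-format keys match case-insensitively, and `default` applies otherwise. This is not Dataverse's implementation. JSON parsing is assumed to have happened already (numeric JSON values are passed in as their text form), and "any invalid setting disables ingest" is modeled, as an assumption, by returning `0` for every format. `BigDecimal.longValueExact()` performs the whole-number check because it throws `ArithmeticException` on a non-zero fractional part.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.OptionalLong;
import java.util.TreeMap;

// Hypothetical sketch of the documented TabularIngestSizeLimit value rules.
class TabularLimitSketch {

    // "10", "10.0" -> 10; "10.5" or "abc" -> empty. Exponent forms such as "1E3" are also
    // accepted here (-> 1000), which the real parser may not do.
    static OptionalLong parseWholeNumber(String raw) {
        try {
            // longValueExact() throws ArithmeticException on a non-zero fractional part
            // or when the value does not fit into a long.
            return OptionalLong.of(new BigDecimal(raw.trim()).longValueExact());
        } catch (NumberFormatException | ArithmeticException e) {
            return OptionalLong.empty();
        }
    }

    // Resolves the limit for one format from already-parsed JSON (values as text):
    // case-insensitive per-format key first, then "default", else -1 (no limit).
    static long limitFor(Map<String, String> json, String format) {
        Map<String, Long> limits = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (Map.Entry<String, String> e : json.entrySet()) {
            OptionalLong parsed = parseWholeNumber(e.getValue());
            if (parsed.isEmpty()) {
                return 0; // any invalid entry disables ingest until it is corrected
            }
            limits.put(e.getKey(), parsed.getAsLong());
        }
        Long limit = limits.containsKey(format) ? limits.get(format) : limits.get("default");
        if (limit == null) {
            return -1;
        }
        return limit;
    }

    public static void main(String[] args) {
        Map<String, String> cfg = Map.of("default", "536870912", "CSV", "0", "Rdata", "1000000");
        System.out.println(limitFor(cfg, "rdata")); // 1000000
        System.out.println(limitFor(cfg, "dta"));   // 536870912
    }
}
```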