Migrate parameter estimation to Quarkus REST with database-backed job tracking #1659
Open
Design document for migrating optimization endpoints from legacy vcell-api (/api/v0/) to Quarkus vcell-rest (/api/v1/) with database-backed job tracking, ActiveMQ messaging, and filesystem polling. Includes desktop client migration and decommissioning plan. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add vc_optjob table to init.sql for database-backed job tracking
- Add OptJobStatus enum (SUBMITTED, QUEUED, RUNNING, COMPLETE, FAILED, STOPPED)
- Add OptimizationJobStatus response DTO with progress and results fields
- Add OptimizationRestService with submit, status polling, and update methods:
  - Writes OptProblem to NFS filesystem
  - Reads progress/results from filesystem via CopasiUtils
  - JDBC operations for vc_optjob table
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
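A minimal sketch of the OptJobStatus enum, assuming only the six state names listed above come from the PR. The isActive()/isTerminal() helpers, and the grouping of SUBMITTED, QUEUED and RUNNING as "active", are illustrative assumptions rather than the PR's actual definition.

```java
import java.util.EnumSet;
import java.util.Set;

/** Job lifecycle states stored in vc_optjob (state names taken from the PR). */
enum OptJobStatus {
    SUBMITTED, QUEUED, RUNNING, COMPLETE, FAILED, STOPPED;

    // Hypothetical grouping: states in which the job may still make progress.
    private static final Set<OptJobStatus> ACTIVE = EnumSet.of(SUBMITTED, QUEUED, RUNNING);

    /** True while the job is waiting or running, so a client would keep polling. */
    boolean isActive() {
        return ACTIVE.contains(this);
    }

    /** True for COMPLETE, FAILED and STOPPED, after which polling can end. */
    boolean isTerminal() {
        return !isActive();
    }
}
```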
- POST /api/v1/optimization — submit optimization job
- GET /api/v1/optimization/{optId} — get status, progress, or results
- POST /api/v1/optimization/{optId}/stop — stop a running job
All endpoints require authenticated user. Status endpoint returns
OptimizationJobStatus with typed fields for progress and results.
ActiveMQ dispatch to vcell-submit is marked TODO for commit 3.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use VCell convention: bigint primary key from newSeq database sequence, consistent with all other VCell tables. Uses KeyValue type and KeyFactory.getNewKey() infrastructure throughout. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Returns lightweight job metadata (id, status, htcJobId, statusMessage) without the heavy progressReport/results fields. Most recent first. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add OptimizationMQ with producer (opt-request) and consumer (opt-status)
- Producer sends submit/stop commands to vcell-submit via AMQP
- Consumer receives status updates (QUEUED/htcJobId, FAILED/error) and updates the database accordingly
- Wire messaging into OptimizationResource submit and stop endpoints
- Add AMQP channel configuration to application.properties (test profile)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OptimizationBatchServer.initOptimizationQueue() creates a JMS consumer on the "opt-request" queue (activemqint broker). Cross-protocol with vcell-rest's SmallRye AMQP producer — ActiveMQ bridges AMQP 1.0 and OpenWire transparently on the same queue name. On "submit": reads OptProblem from NFS, submits SLURM job via SlurmProxy, sends QUEUED status back on "opt-status" with htcJobId. On "stop": parses htcJobId and calls killJobSafe() for scancel. Message format: plain JSON text matching OptimizationMQ records in vcell-rest. Uses mutable POJOs (not records) for Jackson compatibility with the vcell-server Java 17 codebase. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move OptJobStatus, OptRequestMessage, and OptStatusMessage from duplicate definitions in vcell-rest and vcell-server into org.vcell.optimization in vcell-core. Both modules now share a single source of truth for the cross-protocol messaging contract. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests cover: submit and verify status, list jobs, progress report from mock report file, auto-transition to COMPLETE on output file, stop running job, unauthorized access (different user), and unauthenticated access (401). Uses testcontainers for PostgreSQL and Keycloak (same infrastructure as existing Quarkus tests). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…endpoints
- Add tools/openapi-clients.sh: single script for spec generation and
client generation, with --update-spec flag to optionally rebuild
vcell-rest first. Replaces tools/generate.sh and
tools/compile-and-build-clients.sh.
- Regenerate OpenAPI spec with new optimization endpoints:
GET/POST /api/v1/optimization, GET /api/v1/optimization/{optId},
POST /api/v1/optimization/{optId}/stop
- Regenerate Java (vcell-restclient), Python (python-restclient),
and TypeScript-Angular (webapp-ng) clients
- Update CLAUDE.md with new script name and usage
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add 3 tests using the auto-generated OptimizationResourceApi from vcell-restclient. These exercise the same client library the desktop client will use, validating serialization round-trips:
- Submit and get status via generated client
- List jobs via generated client
- Stop a running job via generated client
Also serves as usage documentation for the generated API.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrite CopasiOptimizationSolverRemote.solveRemoteApi() to use the
auto-generated OptimizationResourceApi (vcell-restclient) instead of
the legacy VCellApiClient.submitOptimization/getOptRunJson methods.
Key improvements:
- Typed OptimizationJobStatus response with explicit status enum,
progressReport, and results fields
- Replaces error-prone string-prefix parsing ("QUEUED:", "RUNNING:")
- Separate stop endpoint (POST /{id}/stop) replaces bStop query param
- Clean switch-based status handling
Add getOptimizationApi() accessor to VCellApiClient.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
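The switch-based status handling described above can be sketched as follows. JobStatusView is a hypothetical stand-in for the generated OptimizationJobStatus DTO, and keepPolling() and its log messages are illustrative; only the status names come from the PR.

```java
import java.util.List;

// Status names mirror the OptJobStatus values in this PR.
enum OptJobStatus { SUBMITTED, QUEUED, RUNNING, COMPLETE, FAILED, STOPPED }

// Hypothetical stand-in for the generated OptimizationJobStatus DTO.
record JobStatusView(OptJobStatus status, String progressReport, String results, String statusMessage) {}

final class StatusHandler {
    private StatusHandler() {}

    /** Handles one poll result; returns true while the caller should keep polling. */
    static boolean keepPolling(JobStatusView s, List<String> log) {
        return switch (s.status()) {
            case SUBMITTED, QUEUED -> {
                log.add("waiting in state " + s.status());
                yield true;
            }
            case RUNNING -> {
                log.add("progress: " + s.progressReport());
                yield true;
            }
            case COMPLETE -> {
                log.add("results: " + s.results());
                yield false;
            }
            case FAILED -> {
                log.add("failed: " + s.statusMessage());
                yield false;
            }
            case STOPPED -> {
                log.add("stopped");
                yield false;
            }
        };
    }
}
```

Because this is a switch expression over an enum with no default branch, the compiler rejects it if a new status constant is added without a matching case, something "QUEUED:"-style string-prefix parsing cannot offer.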
The optimization messaging uses Artemis (shared with vcell-rest's SmallRye AMQP), not activemqint. Add PropertyLoader properties for Artemis host/port (vcell.jms.artemis.host.internal, vcell.jms.artemis.port.internal) and use them in HtcSimulationWorker.init() for the optimization queue listener. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The optimization endpoint had a hardcoded /simdata/parest_data path which doesn't exist in CI. Make it configurable via vcell.optimization.parest-data-dir property, defaulting to /simdata/parest_data in production. Test profile uses java.io.tmpdir. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
vcell-server/src/main/java/cbit/vcell/message/server/batch/opt/OptimizationBatchServer.java
Fixed
Validate that file paths from JMS messages are under the expected parest_data directory using canonical path comparison. Also validate that jobId is numeric to prevent injection in file names constructed from it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
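A minimal sketch of the jobId check, assuming job ids are non-negative decimal integers. JobIdValidator and the job_&lt;id&gt;_optProblem.json naming scheme are hypothetical; the point is that a value such as ../../etc/passwd is rejected before it can reach a file name.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: validates jobId before it is used in a file name. */
final class JobIdValidator {
    // By default \d matches ASCII digits 0-9 only.
    private static final Pattern NUMERIC = Pattern.compile("\\d+");

    private JobIdValidator() {}

    static String requireNumeric(String jobId) {
        if (jobId == null || !NUMERIC.matcher(jobId).matches()) {
            throw new IllegalArgumentException("jobId must be numeric: " + jobId);
        }
        return jobId;
    }

    /** Builds a per-job file name; the naming scheme is illustrative only. */
    static String optProblemFileName(String jobId) {
        return "job_" + requireNumeric(jobId) + "_optProblem.json";
    }
}
```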
Refactor CopasiOptimizationSolverRemote to extract a testable overload that accepts OptimizationResourceApi directly and a pluggable progress dispatcher (SwingUtilities::invokeLater in GUI, Runnable::run in tests).
Add OptimizationE2ETest that exercises the same client code path as the desktop client against a live Quarkus instance with testcontainers:
- testOptimizationE2E_submitPollComplete: submit, mock vcell-submit processing (QUEUED → RUNNING with progress → COMPLETE with results), poll and verify results match
- testOptimizationE2E_submitAndStop: submit, transition to RUNNING with progress, stop, verify progress survives stop
The mock vcell-submit consumer runs in-process, updating DB status and writing result files to the filesystem, following the same contract as the real vcell-submit service.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
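A sketch of the pluggable-dispatcher idea using java.util.concurrent.Executor, which both SwingUtilities::invokeLater and Runnable::run satisfy as method references. ProgressPoller, Poll and pollUntilDone() are hypothetical names; the real overload takes an OptimizationResourceApi rather than a Supplier.

```java
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class ProgressPoller {
    private ProgressPoller() {}

    /** One poll result: whether the job has finished, plus any progress text read so far. */
    record Poll(boolean done, String progress) {}

    /**
     * Polls until the job reports done, handing each progress update to the UI
     * through the dispatcher. Returns the number of polls performed.
     */
    static int pollUntilDone(Supplier<Poll> fetch, Consumer<String> onProgress, Executor dispatcher) {
        int polls = 0;
        while (true) {
            Poll p = fetch.get();
            polls++;
            if (p.progress() != null) {
                String progress = p.progress();
                // UI updates go through the dispatcher instead of being called directly.
                dispatcher.execute(() -> onProgress.accept(progress));
            }
            if (p.done()) {
                return polls;
            }
            // A real client would sleep between polls here.
        }
    }
}
```

In the desktop client the dispatcher would be SwingUtilities::invokeLater so updates land on the event dispatch thread; a test passes Runnable::run so each update runs synchronously and can be asserted immediately.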
Document the three-tier database architecture, Table class hierarchy, CRUD operation patterns, connection management, access control, and schema management utilities (AdminCli db-create-script, db-compare-schema). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Create OptJobTable (extends Table) with field declarations, SQL generation methods, and ResultSet mapping following VCell's established patterns. Register in SQLCreateAllTables.getVCellTables() so the table participates in db-create-script and db-compare-schema tooling. Refactor OptimizationRestService to use OptJobTable instead of inline SQL strings. Regenerate init.sql DDL from db-create-script. Update database design patterns doc with corrected SQLDataType table and init.sql structure. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Map existing K8s configmap/secret env vars (jmshost_artemis_internal, jmsport_artemis_internal, AMQP_USER, AMQP_PASSWORD) to SmallRye AMQP connection properties so the REST pod can connect to Artemis for optimization job messaging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace Executors.newFixedThreadPool with Quarkus ManagedExecutor in ExportRequestListenerMQ so async export jobs run on threads with CDI context, fixing PropertyLoader access failures. Apply same fix in ExportServerTest. Scope AMQP connection properties to %prod profile so DevServices handles test AMQP configuration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split the single sequential build job into parallel matrix jobs:
- maven-build: compiles Java once, uploads JARs as artifacts
- docker-build: 17 parallel jobs, one per image (api, rest, exporter, 5 webapp variants, db, sched, submit, data, mongo, batch, opt, clientgen, admin)
- tag-and-push: tags all images with friendly version and latest
Installer secrets for clientgen are fetched directly via SSH in that matrix job rather than uploaded as artifacts (security: artifacts are downloadable on public repos).
Previously all 13+ Docker builds ran sequentially in one job taking 6+ hours. With matrix parallelization, total wall time should be limited by the slowest single image build.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
c40fbd4 to 1e9c088
- Upload all **/target/ directories and localsolvers/ as artifacts so all Docker matrix jobs get the complete Maven build output, including transitive dependencies
- rest/exporter jobs run a full mvn install dependency:copy-dependencies with their respective -Dvcell.exporter flag
- localsolvers/ contains solver binaries downloaded by Maven profiles
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add production AMQP channel config for opt-request/opt-status queues — without these, SmallRye was sending to the channel name instead of the queue address, so messages never reached vcell-submit. Fix Statement and ResultSet leaks in getOptJobRecord() and listOptimizationJobs() by wrapping in try-with-resources. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pass vcell.jms.artemis.host.internal and vcell.jms.artemis.port.internal as Java system properties so the optimization queue listener can connect to the Artemis broker. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…routing
Add capabilities=queue to both production and test AMQP channel configs so SmallRye attaches as ANYCAST consumer/producer. Without this, Artemis creates MULTICAST subscriptions that miss messages from OpenWire JMS producers on the same queue. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Test the full round-trip through Artemis: vcell-rest publishes via AMQP 1.0, an OpenWire JMS stub (mimicking vcell-submit) consumes and sends status back, and vcell-rest consumes the response. This catches address-mapping and ANYCAST/MULTICAST routing bugs that the existing E2E test misses because it bypasses messaging.
- ArtemisTestResource: testcontainer with both AMQP and OpenWire ports
- OpenWireOptSubmitStub: mirrors OptimizationBatchServer.handleSubmitRequest()
- OptimizationCrossProtocolTest: submit via REST, poll until COMPLETE
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Stale import of org.apache.activemq.ActiveMQConnectionFactory fails compile when activemq-client is only in test scope. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
vcell-rest/src/test/java/org/vcell/restq/testresources/OpenWireOptSubmitStub.java
Fixed
Server now reads the progress report file for SUBMITTED/QUEUED states (not just RUNNING), and auto-promotes to RUNNING when progress appears on disk. Client now dispatches progress to the UI for all active states, so the objective function graph and best parameter values update as soon as the SLURM solver starts writing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
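The filesystem-driven promotion can be sketched as a pure function of the current status and two files. The file paths, and the rule that a non-empty report file counts as progress, are assumptions; the COMPLETE-on-output-file transition follows the behavior listed in the earlier test commit. StatusPromoter is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

enum OptJobStatus { SUBMITTED, QUEUED, RUNNING, COMPLETE, FAILED, STOPPED }

/** Hypothetical sketch: derive a job's next status from files on shared storage. */
final class StatusPromoter {
    private StatusPromoter() {}

    static OptJobStatus promote(OptJobStatus current, Path reportFile, Path outputFile) {
        boolean active = current == OptJobStatus.SUBMITTED
                || current == OptJobStatus.QUEUED
                || current == OptJobStatus.RUNNING;
        if (!active) {
            return current; // COMPLETE, FAILED and STOPPED are left as-is
        }
        if (Files.exists(outputFile)) {
            return OptJobStatus.COMPLETE; // results file written by the solver
        }
        // Progress on disk means the solver has started, even if the DB still says SUBMITTED/QUEUED.
        return hasProgress(reportFile) ? OptJobStatus.RUNNING : current;
    }

    private static boolean hasProgress(Path reportFile) {
        try {
            return Files.exists(reportFile) && Files.size(reportFile) > 0;
        } catch (IOException e) {
            return false; // unreadable report: keep the current status and retry on the next poll
        }
    }
}
```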
Rewrite architecture diagram to show Artemis cross-protocol flow and filesystem-driven status promotion. Add sections on cross-protocol messaging pitfalls, real-time progress reporting, and message types. Replace implementation plan with completed work and remaining items. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Set confirm_overwrite=False in basico.assign_report() so COPASI flushes progress lines incrementally during execution. The default (True) caused COPASI to buffer the entire report until task completion, preventing real-time progress updates in the client.
- Remove redundant mkdir on external NFS path in SlurmProxy.createOptJobScript(): vcell-rest already creates the parest_data directory, and the external path is not accessible from inside the vcell-submit container.
- Add test_incremental_report_writing using multiprocessing to verify COPASI writes progress to the report file during execution (not just at the end).
- Add debug/info logging to CopasiOptimizationSolverRemote polling loop.
- Add .gitignore entries for vcell-opt .venv and test artifacts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Upgrade copasi-basico 0.40 → 0.86, python-copasi 4.37.264 → 4.46.300
- Set minimum Python to 3.10 (matches Dockerfile and COPASI wheel availability)
- Fix report format: use separator='\t' parameter instead of inline '\\\t' body items, which new basico writes as literal backslashes
- Upgrade Dockerfile base from bullseye (EOL) to bookworm (Debian 12)
- Add gcc and python3-dev to Dockerfile for psutil compilation
- Fix deprecated poetry.dev-dependencies → poetry.group.dev.dependencies
- Regenerate poetry.lock
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…points
Delete the legacy parameter estimation code path that used direct TCP socket connections (port 8877) between vcell-api and vcell-submit. This is replaced by the new Quarkus /api/v1/optimization endpoints with database-backed job tracking and AMQP messaging via Artemis.
Removed:
- OptimizationRunServerResource.java, OptimizationRunResource.java (vcell-api)
- Optimization route registration in VCellApiApplication.java
- OptMessage.java socket protocol classes (vcell-core)
- Socket server (initOptimizationSocket, OptCommunicationThread) from OptimizationBatchServer
- Legacy submitOptProblem (random IDs), optServerStopJob, optServerGetJobStatus
- submitOptimization(), getOptRunJson() from VCellApiClient
- VCellOptClient.java (unused standalone client)
- Port 8877 from docker-compose.yml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This property was only used by the deleted OptimizationRunServerResource to find the submit service for socket connections on port 8877. The corresponding submit_service_host config in vcell-fluxcd api.env files and port 8877 in submit.yaml should also be removed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove migration-specific content (commit history, decommissioning plan, implementation status tracking). Legacy code has been removed — the doc now describes the current architecture for ongoing maintenance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- SlurmProxy.submitOptimizationJob: validate sub_file_internal is under htcLogDir and use canonical path for writeString
- OpenWireOptSubmitStub: add validatePath() and use canonical paths for all file operations (matches real OptimizationBatchServer.validateParestPath)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
vcell-server/src/main/java/cbit/vcell/message/server/htc/slurm/SlurmProxy.java
Fixed
getCanonicalPath().startsWith(String) is a plain string-prefix check with no trailing separator, so /data/parest_data would incorrectly match /data/parest_data_evil. Switch to Path.startsWith(Path), which compares whole path segments. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
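The difference between the two checks is easy to demonstrate. This sketch uses toAbsolutePath().normalize() so it runs without the files existing; the actual fix works on canonical paths, which additionally resolve symlinks. ParestPathCheck is a hypothetical name.

```java
import java.nio.file.Path;

/** Hypothetical helper contrasting a string-prefix check with a path-segment check. */
final class ParestPathCheck {
    private ParestPathCheck() {}

    /** Buggy: "/data/parest_data_evil" starts with the string "/data/parest_data". */
    static boolean isUnderByString(String baseDir, String candidate) {
        return candidate.startsWith(baseDir);
    }

    /** Segment-aware: compares whole path names after normalizing "." and "..". */
    static boolean isUnder(Path baseDir, Path candidate) {
        Path base = baseDir.toAbsolutePath().normalize();
        Path target = candidate.toAbsolutePath().normalize();
        return target.startsWith(base);
    }
}
```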
Summary
Migrates parameter estimation (optimization) from the legacy vcell-api to Quarkus vcell-rest with database-backed job tracking, replacing the fragile in-memory TCP socket protocol. Fixes #1653.
Architecture
Changes
Deployment notes
Design document
See docs/parameter-estimation-service.md for architecture, configuration, and maintenance reference.
Test plan