Merged
Changes from 12 commits
46 commits
3a016be
fix: submit cluster jobs as the authenticated user's euid
allison-truhlar Mar 19, 2026
543d9cb
fix: resolve job work directory under the user's home, not root's
allison-truhlar Mar 19, 2026
95742f4
fix: remove any existing repo_link symlink before creating it
allison-truhlar Mar 19, 2026
db6f047
test: resolve bsub to full path
allison-truhlar Mar 19, 2026
99fda28
fix: submit with username because bsub doesn't use geteuid internally
allison-truhlar Mar 19, 2026
e2affc7
chore: bump alpha version for a pre-release to test changes
allison-truhlar Mar 19, 2026
95b301f
feat: add cluster.extra_paths setting to configure scheduler PATH
allison-truhlar Mar 20, 2026
eff3c09
chore: new alpha version to create a release testing more changes to …
allison-truhlar Mar 20, 2026
c85bade
feat: add cluster.extra_env config for setting scheduler environment …
allison-truhlar Mar 20, 2026
fc2a569
chore: new alpha version to test changes for using apps while running…
allison-truhlar Mar 20, 2026
ab33f34
test: take out previously added -U username
allison-truhlar Mar 20, 2026
6ebf98b
chore: bump alpha version for another test pre-release
allison-truhlar Mar 20, 2026
8b57085
refactor: move extra_paths, extra_env out of cluster settings to top …
allison-truhlar Mar 23, 2026
be57371
feat: add env_source_script setting to source shell env at startup
allison-truhlar Mar 23, 2026
a276bb3
feat: add pre_run to pixi task entry points for PATH setup
allison-truhlar Mar 23, 2026
045ff9c
chore: bump alpha version for test release
allison-truhlar Mar 23, 2026
52d6c84
fix: use per-user repo cache and submit LSF jobs as authenticated user
allison-truhlar Mar 23, 2026
4638fd5
chore: bump alpha version for test release
allison-truhlar Mar 23, 2026
a9b6c89
test: remove -U; this is only for advance submissions
allison-truhlar Mar 23, 2026
ce1c0d8
chore: bump alpha version
allison-truhlar Mar 23, 2026
422170f
cleanup: remove now unused top level extra_env and extra_paths env vars
allison-truhlar Mar 23, 2026
486489f
run lsf operations in separate worker with setuid
krokicki Mar 23, 2026
bb301a8
wrap other file operations in user contexts
krokicki Mar 23, 2026
e83338c
fix error serializing cached_repo_dir
krokicki Mar 23, 2026
1c69ef2
only switch identity when running as root
krokicki Mar 23, 2026
69a0303
Merge remote-tracking branch 'refs/remotes/origin/further-fixes-to-jo…
krokicki Mar 23, 2026
0bc19b7
chore: bump alpha verison for test release
allison-truhlar Mar 24, 2026
198ef8e
updated to py_cluster_api 0.4.0
krokicki Mar 25, 2026
594cadc
move bjobs monitoring into worker process
krokicki Mar 25, 2026
e93c7fc
chore: bump alpha version for test release
allison-truhlar Mar 26, 2026
a8f5dba
fix: if cluster rejects a submission, clean up job entry in db
allison-truhlar Mar 26, 2026
06c0ccd
refactor: show job submission errs on form, not as toast
allison-truhlar Mar 26, 2026
d507051
fix: run git/manifest operations in worker subprocess instead of Effe…
allison-truhlar Mar 26, 2026
e1a58e2
fix: add user context to submit_job response and validate_paths endpoint
allison-truhlar Mar 26, 2026
97fc166
fix: seed poll stubs with current DB status to prevent status toggling
allison-truhlar Mar 26, 2026
5e571d1
chore: add debug logging for effective user identity in apps/jobs flow
allison-truhlar Mar 26, 2026
804295c
chore: bump alpha version for test release
allison-truhlar Mar 26, 2026
df1c5d0
fix: use file lock so only one uvicorn worker polls bjobs per cycle
allison-truhlar Mar 26, 2026
07c129c
test: add tests for poll lock election and status-update logic
allison-truhlar Mar 26, 2026
5105d12
chore: new alpha version for test release
allison-truhlar Mar 26, 2026
134a760
fix: hold poll lock through sleep interval and fix misleading log
allison-truhlar Mar 26, 2026
391f838
fix(tests): hold the file lock for longer to prevent other worker fro…
allison-truhlar Mar 27, 2026
312fe46
fix: only run job polling if there are active jobs in the user's db
allison-truhlar Mar 27, 2026
c1e4bc7
chore: new alpha version for test release
allison-truhlar Apr 6, 2026
8c89296
Merge branch 'main' into further-fixes-to-job-submission
allison-truhlar Apr 8, 2026
e631cb9
Merge branch 'main' into further-fixes-to-job-submission
allison-truhlar Apr 8, 2026
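Commits df1c5d0 and 134a760 describe electing a single uvicorn worker to poll bjobs by means of a file lock. The diff for those commits is not part of this view, so the following is only a minimal sketch of the general technique, assuming POSIX `fcntl.flock`. The function names and the lock path are hypothetical and are not taken from the PR.

```python
import fcntl
import os
from typing import Optional


def try_acquire_poll_lock(lock_path: str) -> Optional[int]:
    """Try to take an exclusive, non-blocking lock; return the fd or None.

    Hypothetical helper: the name and signature are illustrative only.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Another worker already holds the lock for this cycle.
        os.close(fd)
        return None
    return fd


def release_poll_lock(fd: int) -> None:
    """Release the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

A worker that receives `None` would skip polling for that cycle. Per commit 134a760, the winning worker would keep the descriptor open through its sleep interval before calling `release_poll_lock`.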
6 changes: 6 additions & 0 deletions docs/config.yaml.template
@@ -138,6 +138,12 @@ session_cookie_secure: true
 #
 cluster:
   executor: local               # "local" or "lsf"
+  # extra_paths:                # Directories prepended to the server's PATH at
+  #   - /opt/lsf/bin            #   startup so scheduler commands (bsub, bjobs,
+  #                             #   bkill) are findable. Useful when running as a
+  #                             #   systemd service with a minimal default PATH.
+  # extra_env:                  # Extra environment variables set at startup.
+  #   LSF_ENVDIR: /misc/lsf/conf  # Required for LSF to find lsf.conf.
   # job_name_prefix: fg         # Prefix for cluster job names. REQUIRED for job
   #                             #   reconnection after server restarts. Without this,
   #                             #   active jobs will not be re-tracked and will
55 changes: 33 additions & 22 deletions fileglancer/apps/core.py
@@ -6,6 +6,7 @@
 import shlex
 import shutil
 import subprocess
+from contextlib import nullcontext
 from pathlib import Path
 from datetime import datetime, UTC
 from typing import Optional
@@ -584,6 +585,8 @@
     # extra_args are handled via ResourceSpec in _build_resource_spec
     # to avoid double-application (config + per-job merge in py-cluster-api)
     config.pop("extra_args", None)
+    config.pop("extra_paths", None)
+    config.pop("extra_env", None)
     _executor = create_executor(**config)
     return _executor

@@ -788,12 +791,18 @@


 def _build_work_dir(job_id: int, app_name: str, entry_point_id: str,
-                    job_name_prefix: Optional[str] = None) -> Path:
-    """Build a working directory path under ~/.fileglancer/jobs/."""
+                    job_name_prefix: Optional[str] = None,
+                    username: Optional[str] = None) -> Path:
+    """Build a working directory path under ~/.fileglancer/jobs/.
+
+    When username is provided, expands ~username to the user's home directory
+    instead of the server process's home (which is typically root).
+    """
     safe_app = _sanitize_for_path(app_name)
     safe_ep = _sanitize_for_path(entry_point_id)
     prefix = f"{_sanitize_for_path(job_name_prefix)}-" if job_name_prefix else ""
-    return Path(os.path.expanduser(f"~/.fileglancer/jobs/{prefix}{job_id}-{safe_app}-{safe_ep}"))
+    home = os.path.expanduser(f"~{username}") if username else os.path.expanduser("~")
+    return Path(f"{home}/.fileglancer/jobs/{prefix}{job_id}-{safe_app}-{safe_ep}")


 async def submit_job(
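One behavior worth keeping in mind for the `username` branch of `_build_work_dir`: on POSIX, `os.path.expanduser` returns its argument unchanged when the user cannot be resolved, so an unknown username would produce a relative path beginning with `~`. Below is a minimal sketch of a guard, assuming a hypothetical helper named `resolve_user_home` that is not part of the PR.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_user_home(username: Optional[str] = None) -> Path:
    """Return the home directory for `username`, or the process's own home.

    Hypothetical helper, not from the PR. Raises ValueError if the named
    user cannot be resolved, instead of silently returning '~username'.
    """
    if not username:
        return Path(os.path.expanduser("~"))
    home = os.path.expanduser(f"~{username}")
    if home.startswith("~"):
        # expanduser leaves the path unchanged when the lookup fails.
        raise ValueError(f"cannot resolve home directory for user {username!r}")
    return Path(home)
```

Whether the PR needs such a guard depends on how `username` is validated upstream, which this diff does not show.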
@@ -810,6 +819,7 @@
     post_run: Optional[str] = None,
     container: Optional[str] = None,
     container_args: Optional[str] = None,
+    user_context=None,
 ) -> db.JobDB:
     """Submit a new job to the cluster.

@@ -891,25 +901,19 @@

     # Compute and persist work_dir now that we have the job ID
     work_dir = _build_work_dir(job_id, manifest.name, entry_point.id,
-                               job_name_prefix=settings.cluster.job_name_prefix)
+                               job_name_prefix=settings.cluster.job_name_prefix,
+                               username=username)
     db_job.work_dir = str(work_dir)
     session.commit()

-    # Create work directory on disk
-    work_dir.mkdir(parents=True, exist_ok=True)
-
-    # Determine which repo to symlink and where to cd
+    # Pre-fetch repo into the shared server-owned cache. This must run as the
+    # server/root user because the cache lives under the server's home directory
+    # (~/.fileglancer/apps), which the authenticated user cannot write to.
     if manifest.repo_url:
-        # Tool code lives in a separate repo — clone it and cd to its root
-        tool_repo_dir = await _ensure_repo_cache(manifest.repo_url, pull=pull_latest)
-        repo_link = work_dir / "repo"
-        repo_link.symlink_to(tool_repo_dir)
+        cached_repo_dir = await _ensure_repo_cache(manifest.repo_url, pull=pull_latest)
         cd_suffix = "repo"
     else:
-        # Tool code is in the discovery repo — cd into manifest's subdirectory
-        repo_dir = await _ensure_repo_cache(app_url, pull=pull_latest)
-        repo_link = work_dir / "repo"
-        repo_link.symlink_to(repo_dir)
+        cached_repo_dir = await _ensure_repo_cache(app_url, pull=pull_latest)
         cd_suffix = f"repo/{manifest_path}" if manifest_path else "repo"

     # Build environment variable export lines
@@ -984,14 +988,21 @@
     resource_spec.stdout_path = str(work_dir / "stdout.log")
     resource_spec.stderr_path = str(work_dir / "stderr.log")

-    # Submit to executor
+    # Create work directory, symlink the cached repo, and submit to the cluster —
+    # all as the authenticated user so the job runs with correct ownership.
     executor = await get_executor()
     job_name = f"{manifest.name}-{entry_point.id}"
-    cluster_job = await executor.submit(
-        command=full_command,
-        name=job_name,
-        resources=resource_spec,
-    )
+    with user_context if user_context is not None else nullcontext():
+        work_dir.mkdir(parents=True, exist_ok=True)
+        repo_link = work_dir / "repo"
+        if repo_link.is_symlink() or repo_link.exists():
+            repo_link.unlink()
+        repo_link.symlink_to(cached_repo_dir)
+        cluster_job = await executor.submit(
+            command=full_command,
+            name=job_name,
+            resources=resource_spec,
+        )

     # Register callback to update DB when job reaches terminal state
     cluster_job.on_exit(_on_job_exit)
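The last `fileglancer/apps/core.py` hunk wraps directory creation, the `repo` symlink, and submission in `user_context`, falling back to `nullcontext()` when none is given. `_get_user_context` itself is not shown in this diff, so the sketch below is an assumption about what such a context could look like. `effective_user` is a hypothetical context manager that switches the effective UID/GID only when the process runs as root (as commit 1c69ef2 describes), and `link_cached_repo` is an illustrative stand-in for the filesystem steps. Supplementary groups are ignored for brevity.

```python
import os
import pwd
from contextlib import contextmanager, nullcontext
from pathlib import Path


@contextmanager
def effective_user(username: str):
    """Temporarily act as `username` (hypothetical; not the PR's implementation)."""
    if os.geteuid() != 0:
        # Only root can switch identity; otherwise run as the current user.
        yield
        return
    pw = pwd.getpwnam(username)
    saved_uid, saved_gid = os.geteuid(), os.getegid()
    os.setegid(pw.pw_gid)  # change group first, while still privileged
    os.seteuid(pw.pw_uid)
    try:
        yield
    finally:
        os.seteuid(saved_uid)  # regain privileges before restoring the group
        os.setegid(saved_gid)


def link_cached_repo(work_dir: Path, cached_repo_dir: Path, user_context=None) -> Path:
    """Create work_dir and (re)point work_dir/repo at the cached repo."""
    with user_context if user_context is not None else nullcontext():
        work_dir.mkdir(parents=True, exist_ok=True)
        repo_link = work_dir / "repo"
        if repo_link.is_symlink() or repo_link.exists():
            repo_link.unlink()  # stale link from an earlier attempt
        repo_link.symlink_to(cached_repo_dir)
    return repo_link
```

Two caveats apply to this pattern. As in the hunk, `unlink()` would fail if `repo` were a real directory rather than a symlink or file. The effective UID is also process-wide, so any other coroutine that runs while the context is active shares the switched identity; commit 486489f moves LSF operations into a separate worker with setuid.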
48 changes: 32 additions & 16 deletions fileglancer/server.py
@@ -266,6 +266,22 @@ def mask_password(url: str) -> str:
         logger.debug(f" external_proxy_url: {settings.external_proxy_url}")
         logger.debug(f" atlassian_url: {settings.atlassian_url}")

+        # Prepend cluster extra_paths to PATH so scheduler commands
+        # (bsub, bjobs, bkill) are findable without relying on the
+        # system service's default PATH.
+        if settings.cluster.extra_paths:
+            extra = os.pathsep.join(settings.cluster.extra_paths)
+            os.environ["PATH"] = extra + os.pathsep + os.environ.get("PATH", "")
+            logger.debug(f" cluster.extra_paths prepended to PATH: {extra}")
+
+        # Set extra environment variables needed by scheduler commands
+        # (e.g., LSF_ENVDIR for bsub to find lsf.conf). Pixi strips
+        # inherited env vars, so they must be set inside the process.
+        if settings.cluster.extra_env:
+            for key, value in settings.cluster.extra_env.items():
+                os.environ[key] = value
+                logger.debug(f" cluster.extra_env set: {key}={value}")
+
         # Initialize database (run migrations once at startup)
         db.initialize_database(settings.db_url)

@@ -1687,22 +1703,22 @@ async def submit_job(body: JobSubmitRequest,
             if body.resources:
                 resources_dict = body.resources.model_dump(exclude_none=True)

-            with _get_user_context(username):
-                db_job = await apps_module.submit_job(
-                    username=username,
-                    app_url=body.app_url,
-                    entry_point_id=body.entry_point_id,
-                    parameters=body.parameters,
-                    resources=resources_dict,
-                    extra_args=body.extra_args,
-                    pull_latest=body.pull_latest,
-                    manifest_path=body.manifest_path,
-                    env=body.env,
-                    pre_run=body.pre_run,
-                    post_run=body.post_run,
-                    container=body.container,
-                    container_args=body.container_args,
-                )
+            db_job = await apps_module.submit_job(
+                username=username,
+                app_url=body.app_url,
+                entry_point_id=body.entry_point_id,
+                parameters=body.parameters,
+                resources=resources_dict,
+                extra_args=body.extra_args,
+                pull_latest=body.pull_latest,
+                manifest_path=body.manifest_path,
+                env=body.env,
+                pre_run=body.pre_run,
+                post_run=body.post_run,
+                container=body.container,
+                container_args=body.container_args,
+                user_context=_get_user_context(username),
+            )
             return _convert_job(db_job)
         except ValueError as e:
             raise HTTPException(status_code=400, detail=str(e))
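The startup hunk in `fileglancer/server.py` mutates `os.environ` directly. Below is a minimal sketch of the same logic in a form that can be exercised in isolation, assuming a hypothetical helper `apply_scheduler_env` (not part of the PR) that accepts the mapping to modify. Unlike the hunk, it avoids leaving a trailing separator when `PATH` is initially empty; on POSIX systems an empty `PATH` entry is treated as the current directory.

```python
import os
from typing import Dict, List, MutableMapping, Optional


def apply_scheduler_env(
    extra_paths: List[str],
    extra_env: Dict[str, str],
    environ: Optional[MutableMapping[str, str]] = None,
) -> MutableMapping[str, str]:
    """Prepend extra_paths to PATH and set extra_env in `environ`.

    Hypothetical helper for illustration; defaults to os.environ.
    """
    env = os.environ if environ is None else environ
    if extra_paths:
        extra = os.pathsep.join(extra_paths)
        current = env.get("PATH", "")
        # Avoid a trailing separator (an empty PATH entry) when PATH is unset.
        env["PATH"] = extra + os.pathsep + current if current else extra
    for key, value in extra_env.items():
        env[key] = value
    return env
```

At startup this could be called as `apply_scheduler_env(settings.cluster.extra_paths, settings.cluster.extra_env)`, while tests can pass a plain dict instead of touching the real environment.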
4 changes: 3 additions & 1 deletion fileglancer/settings.py
@@ -1,4 +1,4 @@
-from typing import List, Optional
+from typing import Dict, List, Optional
 from functools import cache
 import sys

@@ -14,6 +14,8 @@
 class ClusterSettings(BaseModel):
     """Cluster configuration matching py-cluster-api's ClusterConfig."""
     executor: str = 'local'
+    extra_paths: List[str] = []
+    extra_env: Dict[str, str] = {}
     cpus: Optional[int] = None
     gpus: Optional[int] = None
     memory: Optional[str] = None
4 changes: 2 additions & 2 deletions frontend/package-lock.json

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion frontend/package.json
@@ -1,7 +1,7 @@
 {
   "name": "fileglancer",
   "type": "module",
-  "version": "2.7.0-a4",
+  "version": "2.7.0-a9",
   "description": "Browse, share, and publish files on the Janelia file system",
   "keywords": [
     "ngff",