Description of the issue
For Hi, I encountered a condition where I was unable to reprocess a PSET for a certain date. The batch starter reported that the job was in progress or had already succeeded. I tracked the problem down to the job with that version number failing to upload its file (see logs below). Despite the failed upload, the entry in processing_job_table was updated with status SUCCEEDED, but the file was never written to S3. This causes a deadlock: the batch starter assigns the same version as the one recorded in the "successful" job, sees that job as already succeeded, and therefore never reruns it.
processing_job_table entry:
44124,SUCCEEDED,hi,l1c,45sensor-pset,2026-01-29 00:00:00.000000,v031,141,arn:aws:batch:us-west-2:593025701104:job-definition/ProcessingJob-hi:5,ProcessingJob-hi/default/47db5803a2374fe49045022e8fcff3a2,593025701104.dkr.ecr.us-west-2.amazonaws.com/hi-repo:latest,--instrument hi --data-level l1c --descriptor 45sensor-pset --start-date 20260129 --version v031 --dependency imap_hi_l1c_45sensor-pset-1faa47ed_20260129-repoint00141_v031.json --upload-to-sdc --repointing repoint00141,2026-03-07 19:45:27.479000 +00:00,2026-03-07 19:46:31.516000 +00:00
Job log snippet:
2026-03-07 19:46:07 - INFO:imap_processing.cli:Uploading file: /app/data/imap/hi/l1c/2026/01/imap_hi_l1c_45sensor-pset_20260129-repoint00141_v031.cdf
2026-03-07 19:46:07 - INFO:imap_data_access.io:Uploading file /app/data/imap/hi/l1c/2026/01/imap_hi_l1c_45sensor-pset_20260129-repoint00141_v031.cdf to https://api.imap-mission.com/api-key/upload/imap_hi_l1c_45sensor-pset_20260129-repoint00141_v031.cdf
2026-03-07 19:46:07 - ERROR:imap_processing.cli:Upload failed with error: 503 Service Unavailable: <title>503 Slow Down</title>
503 Slow Down
- Code: SlowDown
- Message: Please reduce your request rate.
- RequestId: 89894346131FD2DD
- HostId: eEYibtJJhTXz0rnMj7d6w/XilO6F5Fp982m7BLUZ9ADu2W6sWG1cAiPpRSzF/PtdWn4ycG7bTugUKBBN2q6bopZQtWh52E5AppynDy8mUhaP9HHg/IJafDjdl8z4KPuf
2026-03-07 19:46:07 - INFO:imap_processing.cli:Clearing furnished SPICE kernels
2026-03-07 19:46:07 - INFO:imap_processing.cli:Processing complete
Steps to reproduce the issue
No response
Expected vs Actual behavior
No response
Code Snippet (If applicable)
Additional notes, affected areas, and suggested fixes
The suggested fix is to check the result of the upload attempt and raise an error if the upload fails, so that the job ends with a failure status instead of being recorded as SUCCEEDED.
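A minimal sketch of that fix, assuming the CLI currently catches the upload exception, logs "Upload failed with error: ...", and carries on to "Processing complete". The names `upload_products`, `UploadError`, and the `upload_fn` parameter are hypothetical; `upload_fn` stands in for whatever call currently wraps the imap_data_access upload. The key change is that failures are re-raised after logging rather than swallowed.

```python
import logging

logger = logging.getLogger(__name__)


class UploadError(RuntimeError):
    """Hypothetical error raised when one or more products fail to upload."""


def upload_products(paths, upload_fn):
    """Upload every path, then raise if any upload failed.

    upload_fn is a placeholder for the real upload call; it is expected to
    raise an exception (e.g. on a 503 SlowDown response) when an upload fails.
    """
    failures = []
    for path in paths:
        logger.info("Uploading file: %s", path)
        try:
            upload_fn(path)
        except Exception as exc:
            # Keep the existing log line, but remember the failure instead of
            # silently continuing to "Processing complete".
            logger.error("Upload failed with error: %s", exc)
            failures.append((path, exc))

    if failures:
        # Raising here lets the process exit non-zero, so the job is not
        # reported as successful when its output never reached S3.
        raise UploadError(
            f"{len(failures)} of {len(paths)} file(s) failed to upload: "
            + ", ".join(str(path) for path, _ in failures)
        )
```

This version attempts every file before raising, so a single failure does not hide others; failing fast on the first error would also work. If the exception is allowed to propagate out of the CLI entry point, the container exits with a non-zero code and AWS Batch marks the job FAILED. Whether processing_job_table then records FAILED depends on how that table is updated from the Batch job status, which is assumed here rather than confirmed.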