
Handle grobid connection failures#43

Merged
lfoppiano merged 9 commits into main from Sanakhamassi-fix/Display-an-error-message-when-grobid-is-not-responding
Jun 7, 2026

Conversation

@lfoppiano

Collaborator

Revised #35 from @Sanakhamassi

@lfoppiano lfoppiano force-pushed the Sanakhamassi-fix/Display-an-error-message-when-grobid-is-not-responding branch from d8131bf to 7f65f7c Compare June 7, 2026 18:04
Sanakhamassi and others added 8 commits June 7, 2026 19:26
Finalizes PR #35 (issue #11). Resolves the outstanding Copilot and
reviewer remarks and adds handling for Grobid responses that are empty
or truncated (HTTP 200 with no usable body), which were previously
undetected.

grobid_processors.process_structure (single validation chokepoint):
- Catch only requests.exceptions.RequestException instead of bare
  Exception, so local/usage errors (bad path, parser bugs) keep their
  real traceback instead of being mislabelled "Grobid did not respond".
- Raise GrobidServiceError for empty body, malformed/truncated XML, and
  well-formed XML with no extractable text.
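
A minimal sketch of the validation chokepoint described above, not the code merged in this PR. The helper name `validate_grobid_response`, the `fetch` callable, the error messages, and the use of `xml.etree.ElementTree` are all assumptions made for illustration; only `requests.exceptions.RequestException` and the `GrobidServiceError` name come from the description.

```python
import xml.etree.ElementTree as ET

import requests


class GrobidServiceError(Exception):
    """Raised when Grobid is unreachable or returns an unusable response."""


def validate_grobid_response(fetch):
    """Call fetch() and validate the body it returns.

    `fetch` is a hypothetical zero-argument callable that performs the
    HTTP request and returns the response body as a string.
    """
    try:
        body = fetch()
    except requests.exceptions.RequestException as exc:
        # Only transport-level failures are relabelled; any other
        # exception (bad path, parser bug) propagates unchanged.
        raise GrobidServiceError("Grobid did not respond") from exc

    # HTTP 200 with an empty or whitespace-only body.
    if not body or not body.strip():
        raise GrobidServiceError("Grobid returned an empty body")

    # Malformed or truncated XML.
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        raise GrobidServiceError("Grobid returned malformed or truncated XML") from exc

    # Well-formed XML that carries no text at all.
    text = "".join(root.itertext()).strip()
    if not text:
        raise GrobidServiceError("Grobid returned XML with no extractable text")
    return text
```

Because the `except` clause names only `RequestException`, a `FileNotFoundError` raised inside `fetch` keeps its original traceback instead of surfacing as a Grobid failure.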

document_qa_engine:
- ping_grobid_server now defaults to True (fail-fast for library users);
  Streamlit passes ping_grobid_server=False to degrade gracefully.
- Remove the now-dead "if not structure" guard.
- Fix return-type hints (query_document now consistently returns a
  3-tuple; query_storage -> tuple[List[str], list]; _run_query ->
  tuple[Any, list]); drop unused Tuple import; fix verbose hash print.
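
The fail-fast default can be pictured with the following sketch. It assumes a constructor that takes the flag and a hypothetical `is_alive` callable standing in for the real health check; `EngineSketch` and `GrobidUnavailableError` are illustrative names, not the classes in `document_qa_engine`.

```python
class GrobidUnavailableError(RuntimeError):
    """Hypothetical error raised when the startup ping fails."""


class EngineSketch:
    def __init__(self, is_alive, ping_grobid_server: bool = True):
        # With the default (True), an unreachable server fails construction
        # immediately, which suits library users who want an early error.
        if ping_grobid_server and not is_alive():
            raise GrobidUnavailableError("Grobid is not responding")
        # With False, the ping is skipped and any failure surfaces later,
        # at processing time, where a UI can report it.
        self.pinged = ping_grobid_server


# Library-style use: fails fast when the server is down.
try:
    EngineSketch(is_alive=lambda: False)
except GrobidUnavailableError as exc:
    print(f"startup aborted: {exc}")

# UI-style use: construction succeeds even though the server is down.
engine = EngineSketch(is_alive=lambda: False, ping_grobid_server=False)
```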

streamlit_app:
- Drop the `hash` alias that shadowed the builtin.
- Clean the error message (no duplicated status, correct punctuation).

tests:
- Use requests exceptions; assert local errors are not masked; add empty
  / malformed / no-extractable-text cases. 19 passed.
@lfoppiano lfoppiano force-pushed the Sanakhamassi-fix/Display-an-error-message-when-grobid-is-not-responding branch from 7f65f7c to abcd1f4 Compare June 7, 2026 18:26
@lfoppiano lfoppiano merged commit 0d0843f into main Jun 7, 2026
@lfoppiano lfoppiano deleted the Sanakhamassi-fix/Display-an-error-message-when-grobid-is-not-responding branch June 7, 2026 18:34

2 participants