Direct llm call by puja-trivedi · Pull Request #131 · sensein/structsense

puja-trivedi · 2026-07-03T02:30:50Z

No description provided.


gemini-code-assist

Code Review

This pull request adds scripts and prompts for neuroscience-wide named-entity recognition (NER) with OpenAI and OpenRouter models, including support for parsing PDFs with a local Grobid instance. The feedback highlights three improvements: resolving a configuration conflict in pyproject.toml, where dependencies are listed statically even though the field is marked as dynamic; safely handling null values in the LLM's JSON output to prevent TypeError crashes; and deleting uploaded files from OpenAI's servers in a try...finally block so they do not accumulate against the storage quota.


gemini-code-assist · 2026-07-03T02:33:01Z

+dependencies = [
+    "requests>=2.32.5",
+]


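Because dependencies is listed as dynamic under [project] (i.e., dynamic = ["version", "dependencies"]), it must not also be given a static value in the same table. The pyproject metadata specification requires build backends to raise an error when a field is both listed in dynamic and specified statically, so the static dependencies array under [project] can be ignored or rejected and cause build/locking issues with Poetry. Instead, add the requests dependency to [tool.poetry.dependencies].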

gemini-code-assist · 2026-07-03T02:33:01Z

+        n_entities = len(payload.get("entities", [])) if isinstance(payload, dict) else 0
+        n_terms = len(payload.get("key_terms", [])) if isinstance(payload, dict) else 0


If the LLM returns null for "entities" or "key_terms" (which can happen when no entities are found or the generation is incomplete), payload.get("entities", []) returns None, because the default is used only when the key is missing, not when it maps to null. Calling len(None) then raises a TypeError and crashes the script. Check the type of the returned value before calling len().

Suggested change

Before:
n_entities = len(payload.get("entities", [])) if isinstance(payload, dict) else 0
n_terms = len(payload.get("key_terms", [])) if isinstance(payload, dict) else 0

After:
entities_list = payload.get("entities") if isinstance(payload, dict) else None
n_entities = len(entities_list) if isinstance(entities_list, list) else 0
terms_list = payload.get("key_terms") if isinstance(payload, dict) else None
n_terms = len(terms_list) if isinstance(terms_list, list) else 0
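The same guard can be factored into one helper so the two counts cannot drift apart. This is a minimal sketch; count_list_field is a hypothetical name, and the sample payload (including the "term" key) is invented for illustration.

```python
from typing import Any


def count_list_field(payload: Any, key: str) -> int:
    """Return len(payload[key]) when payload is a dict and that value is a list, else 0."""
    if not isinstance(payload, dict):
        return 0
    value = payload.get(key)
    return len(value) if isinstance(value, list) else 0


# dict.get only falls back to its default when the key is absent, not when it maps to None.
payload = {"entities": None, "key_terms": [{"term": "hippocampus"}]}
print(payload.get("entities", []))                 # None
print(count_list_field(payload, "entities"))       # 0
print(count_list_field(payload, "key_terms"))      # 1
print(count_list_field("not a dict", "entities"))  # 0
```

With this helper, the two statistics lines reduce to n_entities = count_list_field(payload, "entities") and n_terms = count_list_field(payload, "key_terms").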

gemini-code-assist · 2026-07-03T02:33:01Z

+        n_entities = len(payload.get("entities", [])) if isinstance(payload, dict) else 0
+        n_terms = len(payload.get("key_terms", [])) if isinstance(payload, dict) else 0


If the LLM returns null for "entities" or "key_terms", payload.get("entities", []) will return None. Calling len(None) will then raise a TypeError and crash the script. It is safer to check the type of the returned value before calling len().

Suggested change

Before:
n_entities = len(payload.get("entities", [])) if isinstance(payload, dict) else 0
n_terms = len(payload.get("key_terms", [])) if isinstance(payload, dict) else 0

After:
entities_list = payload.get("entities") if isinstance(payload, dict) else None
n_entities = len(entities_list) if isinstance(entities_list, list) else 0
terms_list = payload.get("key_terms") if isinstance(payload, dict) else None
n_terms = len(terms_list) if isinstance(terms_list, list) else 0

gemini-code-assist · 2026-07-03T02:33:02Z

+    vprint(f"Uploading file: {args.file} ...")
+    with open(args.file, "rb") as fh:
+        uploaded = client.files.create(file=fh, purpose="user_data")
+    vprint(f"  uploaded (file_id={uploaded.id})")
+
+    vprint(f"Sending request to model '{args.model}' (streaming; this may take a while)...")
+    request_input = [
+        {
+            "role": "user",
+            "content": [
+                {"type": "input_file", "file_id": uploaded.id},
+                {
+                    "type": "input_text",
+                    "text": USER_PROMPT.format(metadata_json=json.dumps(metadata)),
+                },
+            ],
+        }
+    ]
+
+    # Only send sampling controls when explicitly requested; some reasoning
+    # models reject temperature/seed outright.
+    request_kwargs = {}
+    if args.temperature is not None:
+        request_kwargs["temperature"] = args.temperature
+    if args.seed is not None:
+        request_kwargs["seed"] = args.seed
+    if request_kwargs:
+        vprint(f"  sampling controls: {request_kwargs}")
+
+    chunks = []
+    chars = 0
+    next_report = 2000  # print a progress line every ~2000 chars
+    with client.responses.stream(
+        model=args.model,
+        instructions=system_prompt,
+        input=request_input,
+        **request_kwargs,
+    ) as stream:
+        for event in stream:
+            if event.type == "response.output_text.delta":
+                chunks.append(event.delta)
+                chars += len(event.delta)
+                if chars >= next_report:
+                    vprint(f"  ...streaming, {chars} chars received so far")
+                    next_report += 2000
+            elif event.type == "error":
+                print(f"  stream error: {event.error}")
+        final_response = stream.get_final_response()  # surfaces any terminal API error


The uploaded file persists on OpenAI's servers until it is explicitly deleted, so repeated runs accumulate files and can eventually exhaust the user's file storage quota. Wrap the file upload and streaming request in a try...finally block so the uploaded file is deleted after the request completes, even if the request raises.

uploaded = None
try:
    vprint(f"Uploading file: {args.file} ...")
    with open(args.file, "rb") as fh:
        uploaded = client.files.create(file=fh, purpose="user_data")
    vprint(f"  uploaded (file_id={uploaded.id})")

    vprint(f"Sending request to model '{args.model}' (streaming; this may take a while)...")
    request_input = [
        {
            "role": "user",
            "content": [
                {"type": "input_file", "file_id": uploaded.id},
                {
                    "type": "input_text",
                    "text": USER_PROMPT.format(metadata_json=json.dumps(metadata)),
                },
            ],
        }
    ]

    # Only send sampling controls when explicitly requested; some reasoning
    # models reject temperature/seed outright.
    request_kwargs = {}
    if args.temperature is not None:
        request_kwargs["temperature"] = args.temperature
    if args.seed is not None:
        request_kwargs["seed"] = args.seed
    if request_kwargs:
        vprint(f"  sampling controls: {request_kwargs}")

    chunks = []
    chars = 0
    next_report = 2000  # print a progress line every ~2000 chars
    with client.responses.stream(
        model=args.model,
        instructions=system_prompt,
        input=request_input,
        **request_kwargs,
    ) as stream:
        for event in stream:
            if event.type == "response.output_text.delta":
                chunks.append(event.delta)
                chars += len(event.delta)
                if chars >= next_report:
                    vprint(f"  ...streaming, {chars} chars received so far")
                    next_report += 2000
            elif event.type == "error":
                print(f"  stream error: {event.error}")
        final_response = stream.get_final_response()  # surfaces any terminal API error
finally:
    if uploaded is not None:
        vprint(f"Deleting uploaded file {uploaded.id} ...")
        try:
            client.files.delete(uploaded.id)
        except Exception as exc:
            print(f"WARNING: Failed to delete uploaded file {uploaded.id}: {exc}")
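An alternative to an inline try...finally is a small context manager that owns the upload's lifetime, so the streaming code stays at its current indentation inside a with block. This is a sketch under assumptions: temporary_upload is a hypothetical helper, it relies only on the client.files.create and client.files.delete calls already used in the suggestion above, and the fake client exists solely to exercise the cleanup path without network access.

```python
import os
import tempfile
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_upload(client, path, purpose="user_data"):
    """Upload `path`, yield the file object, and always try to delete it afterwards."""
    uploaded = None
    try:
        with open(path, "rb") as fh:
            uploaded = client.files.create(file=fh, purpose=purpose)
        yield uploaded
    finally:
        # Runs on normal exit and when the with-body raises (e.g. a stream error).
        if uploaded is not None:
            try:
                client.files.delete(uploaded.id)
            except Exception as exc:
                print(f"WARNING: Failed to delete uploaded file {uploaded.id}: {exc}")


class _FakeFiles:
    """Hypothetical in-memory stand-in for client.files, for demonstration only."""

    def __init__(self):
        self.stored = set()

    def create(self, file, purpose):
        file_id = f"file-{len(self.stored) + 1}"
        self.stored.add(file_id)
        return SimpleNamespace(id=file_id)

    def delete(self, file_id):
        self.stored.discard(file_id)


fake_client = SimpleNamespace(files=_FakeFiles())

with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.4 placeholder")
try:
    try:
        with temporary_upload(fake_client, tmp.name) as up:
            assert up.id in fake_client.files.stored
            raise RuntimeError("simulated stream failure")
    except RuntimeError:
        pass
    print(fake_client.files.stored)  # set()
finally:
    os.remove(tmp.name)
```

In the script, the request construction and the client.responses.stream(...) loop would move inside the with temporary_upload(client, args.file) as uploaded: block; the exception still propagates after cleanup, so existing error reporting is unchanged.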

gemini-code-assist · 2026-07-03T02:33:02Z

+        vprint(f"  stamped source_metadata: {metadata}")
+
+        # Compute extraction statistics and stamp them into the metadata.
+        entities = payload.get("entities", []) or []


Similarly, payload.get("entities", []) or [] guards against None (None or [] evaluates to []), but a truthy non-list value, such as a string or a dict, still passes through and breaks the later iteration. Explicitly validating that entities is a list prevents these runtime errors.

Suggested change

Before:
entities = payload.get("entities", []) or []

After:
entities = payload.get("entities")
if not isinstance(entities, list):
    entities = []
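The difference between the two patterns only shows up for truthy non-list values. A minimal comparison, using hypothetical payloads (the "none found" string and the "text" key are invented examples of malformed and normal output):

```python
def naive_entities(payload: dict) -> object:
    # Pattern from the PR: falls back to [] only when the value is falsy.
    return payload.get("entities", []) or []


def validated_entities(payload: dict) -> list:
    # Suggested pattern: anything that is not a list becomes [].
    entities = payload.get("entities")
    return entities if isinstance(entities, list) else []


cases = [
    {},
    {"entities": None},
    {"entities": "none found"},
    {"entities": [{"text": "CA1"}]},
]
for case in cases:
    print(repr(naive_entities(case)), repr(validated_entities(case)))
# [] []
# [] []
# 'none found' []
# [{'text': 'CA1'}] [{'text': 'CA1'}]
```

Only the third case differs: the naive version would hand a string to code that expects a list of entity objects.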

gemini-code-assist · 2026-07-03T02:33:02Z

+
+        # Compute extraction statistics and stamp them into the metadata.
+        entities = payload.get("entities", []) or []
+        label_counts = {}


Similarly, payload.get("entities", []) or [] guards against None (None or [] evaluates to []), but a truthy non-list value, such as a string or a dict, still passes through and breaks the later iteration. Explicitly validating that entities is a list prevents these runtime errors.

entities = payload.get("entities")
if not isinstance(entities, list):
    entities = []
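Once entities is guaranteed to be a list, the per-label statistics (the label_counts dict in the diff) can also skip malformed items instead of crashing on them. This sketch assumes each entity is a dict with a string "label" key; that schema, the count_labels name, and the sample labels are hypothetical, since the PR's entity format is not shown here.

```python
from collections import Counter
from typing import Any


def count_labels(payload: Any) -> dict[str, int]:
    """Count entities per label, assuming each entity is a dict with a "label" key."""
    entities = payload.get("entities") if isinstance(payload, dict) else None
    if not isinstance(entities, list):
        entities = []
    # Skip items that are not dicts or whose label is missing or not a string.
    counts = Counter(
        entity["label"]
        for entity in entities
        if isinstance(entity, dict) and isinstance(entity.get("label"), str)
    )
    return dict(counts)


payload = {
    "entities": [
        {"text": "CA1", "label": "BRAIN_REGION"},
        {"text": "dentate gyrus", "label": "BRAIN_REGION"},
        {"text": "parvalbumin", "label": "GENE"},
        "stray string",
        {"text": "no label here"},
    ]
}
print(count_labels(payload))               # {'BRAIN_REGION': 2, 'GENE': 1}
print(count_labels({"entities": None}))    # {}
```

Counter keeps first-seen order, so the resulting dict lists labels in the order they first appear in the model output.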

puja-trivedi added 8 commits June 28, 2026 23:02

feat: add CLI for neuroscience NER via OpenAI file upload

70e341f

feat: save NER output to timestamped, model-named JSON file

38a842e

feat: stream output, externalize prompt, stamp metadata + stats, add verbose flag

2686160

feat: record prompt_file path in output statistics

3d26489

feat: add temperature/seed flags and truncation detection

b7cb216

feat: add OpenRouter NER script with PDF/inline/grobid input modes

1120b2a

added 'requests' library

7d71673

added output from direct llm calls

71a515f

puja-trivedi marked this pull request as draft July 3, 2026 02:31

puja-trivedi wants to merge 8 commits into main from direct_llm_call
