LLM simplifies citations for filenames with special characters, breaking document links #2950

@pamelafox

Description

When uploading documents via the user upload feature (with USE_USER_UPLOAD=true and AZURE_ENFORCE_ACCESS_CONTROL=true), the sourcefile field in the Azure AI Search index gets a "- " prefix that shouldn't be there.

Steps to Reproduce

  1. Enable user upload and access control:
    azd env set USE_USER_UPLOAD true
    azd env set AZURE_ENFORCE_ACCESS_CONTROL true
    azd provision && azd deploy
    
  2. Log in and upload a document (e.g., PyCon US 2025.pdf)
  3. Query the search index to inspect the indexed document
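For step 3, one way to inspect the indexed values is the Azure AI Search REST API. The sketch below only builds the request and parses a response body. The service name, index name, and API key are placeholders, the helper names are hypothetical, and the api-version is an assumption (any GA version should behave the same for this query).

```python
import json
import urllib.request

# Assumption: a GA api-version; adjust to whatever your service supports.
API_VERSION = "2023-11-01"


def build_sourcefile_query(
    service_name: str, index_name: str, api_key: str, top: int = 20
) -> urllib.request.Request:
    """Build (but do not send) a search request that lists sourcefile values."""
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/docs/search?api-version={API_VERSION}"
    )
    # "*" matches every document; only the sourcefile field is returned.
    body = {"search": "*", "select": "sourcefile", "top": top}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def sourcefile_values(response_body: bytes) -> list[str]:
    """Extract sourcefile values from a search response body."""
    payload = json.loads(response_body)
    return [doc["sourcefile"] for doc in payload.get("value", []) if "sourcefile" in doc]
```

To run it against a real service, pass the request to `urllib.request.urlopen()` and feed the response bytes to `sourcefile_values()`. Any value starting with "- " reproduces the bug.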

Expected Behavior

The sourcefile field should match the original filename:

sourcefile: "PyCon US 2025.pdf"

Actual Behavior

The sourcefile field has a "- " prefix:

sourcefile: "- PyCon US 2025.pdf"

Impact

As a result, the /content/<path> route returns 403 Forbidden when a user tries to view the document: check_path_auth() filters on sourcefile eq 'PyCon US 2025.pdf', but the indexed value is '- PyCon US 2025.pdf', so the filter matches no documents.
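To make the mismatch concrete, here is a minimal sketch of the comparison. It is not the actual check_path_auth() implementation: `sourcefile_filter` and `matches` are hypothetical helpers that only mimic an OData `eq` filter, which is an exact, case-sensitive string comparison.

```python
def sourcefile_filter(path: str) -> str:
    """Build an OData equality filter; single quotes are escaped by doubling."""
    escaped = path.replace("'", "''")
    return f"sourcefile eq '{escaped}'"


def matches(indexed_value: str, requested_path: str) -> bool:
    """Mimic `eq` semantics: the strings must be identical."""
    return indexed_value == requested_path


requested = "PyCon US 2025.pdf"   # path taken from the /content/<path> URL
indexed = "- PyCon US 2025.pdf"   # value actually stored in the index

print(sourcefile_filter(requested))  # sourcefile eq 'PyCon US 2025.pdf'
print(matches(indexed, requested))   # False
```

Because the filter is an equality test rather than a substring match, the extra "- " is enough for the lookup to find nothing.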

Environment

  • User upload enabled with ADLS Gen2
  • Access control enforcement enabled
  • Local development (may also affect deployed environments)

Metadata

    Labels

    bug: A bug in the code that should be fixed
