Updated the sanitizer function to improve input sanitization by removing style tags, normalizing pronouns, and increasing the maximum length limit.
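Below is a minimal Python sketch of two of the three steps named above: removing `<style>` blocks and enforcing a maximum length. It assumes the sanitizer works on plain strings. `sanitize_input` and `MAX_INPUT_LENGTH` are hypothetical names, and the limit value is a placeholder, because the PR does not state the new limit. Pronoun normalization is left out because the PR does not spell out its rules.

```python
import re
from typing import Optional

# Hypothetical limit: the PR raises the maximum length, but the new value is not given here.
MAX_INPUT_LENGTH = 2000

# Matches a complete <style ...>...</style> element, case-insensitively and across newlines.
# The non-greedy .*? stops at the first closing tag, so separate style blocks are removed separately.
_STYLE_BLOCK_RE = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.IGNORECASE | re.DOTALL)


def sanitize_input(text: Optional[str], max_length: int = MAX_INPUT_LENGTH) -> str:
    """Hypothetical sanitizer: drop <style> blocks, trim whitespace, cap the length.

    Pronoun normalization is deliberately omitted; its rules are not described in the PR.
    """
    if not text:
        return ""
    without_styles = _STYLE_BLOCK_RE.sub("", text)
    # Trim first so leading/trailing whitespace does not count against the limit.
    return without_styles.strip()[:max_length]
```

A regex is adequate for stripping `<style>` blocks from short user input. For arbitrary HTML, a real parser or an allow-list sanitizer is the safer choice.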
Changelog
- Enhance input sanitization and normalize pronouns @AkhilRB0204
- Fix: duplicate healthcheck key in docker-compose db service [#462] @amahuli03
- fix: changed donate button link to direct to balancer github page [#465] @amahuli03
- Fix 401 by using adminApi instead of raw axios [#468] @amahuli03
- Fixed error 1 (openAI title sanitization) and added unit tests [#466] @amahuli03
- Fix: File downloads and opens used wrong API URL [#471] @amahuli03
- update site links on README [#479] @amahuli03
- Generate API docs [#472] @amahuli03
- Update files that Git should ignore [#457] @sahilds1
- fix: openAI fallback crash in title generation [#474] @amahuli03
- refactor: file upload uses font size and more lenient regex to extract titles [#475] @amahuli03
…-healthchecks-docker-compose
…docker-compose Fix: duplicate healthcheck key in docker-compose db service
fix: changed donate button link to direct to balancer github page
Fix 401 by using adminApi instead of raw axios
…cular missed with default settings
…sed, not where it's defined
Fixed error 1 (openAI title sanitization) and added unit tests
Fix: File downloads and opens used wrong API URL
The old "scan first couple pages" logic used get_text("blocks") and picked the first
block matching a title regex, which frequently selected preambles,
journal names, and article headers instead of the actual title.
The new approach uses get_text("dict") to find the largest font size
across the first few pages and collects contiguous runs of text at
that size, since research paper titles are typically the
largest font.
…marks, apostrophes, and non-breaking spaces in titles.
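The font-size approach described above can be sketched in Python against the structure that `page.get_text("dict")` returns. This assumes the PDF library is PyMuPDF, whose dict output nests blocks → lines → spans, with each span carrying `size` and `text`. `extract_title`, the three thresholds, and `TITLE_RE` are hypothetical; the PR's actual regex and constants are not shown here. The sketch follows the steps in the commit message. It finds the largest font size among non-trivial spans on the first few pages, joins contiguous runs of spans at that size, and accepts the first run that a lenient pattern allows. That pattern covers curly quotes, apostrophes, and non-breaking spaces.

```python
import re
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical tuning values; the PR's actual constants are not given in the description.
MAX_PAGES = 3          # only scan the first few pages
MIN_SPAN_CHARS = 3     # ignore tiny spans (drop caps, page numbers) when finding the max size
SIZE_TOLERANCE = 0.5   # font sizes within this many points count as "the same size"

# Hypothetical lenient title pattern: word characters, whitespace (\s also matches NBSP),
# common punctuation, straight and curly quotes, and apostrophes.
TITLE_RE = re.compile(r"^[\w\s.,:;!?()&/'\u2018\u2019\u201c\u201d\"-]{3,300}$")


def _spans(page_dict: dict) -> Iterator[Tuple[float, str]]:
    """Yield (font size, stripped text) for each span of one get_text("dict") page."""
    for block in page_dict.get("blocks", []):
        for line in block.get("lines", []):  # image blocks carry no "lines" key
            for span in line.get("spans", []):
                yield span["size"], span["text"].strip()


def _accept(run: List[str]) -> Optional[str]:
    """Join a run of same-size spans, collapse whitespace, and return it if it looks like a title."""
    candidate = re.sub(r"\s+", " ", " ".join(run)).strip()
    return candidate if TITLE_RE.match(candidate) else None


def extract_title(page_dicts: Iterable[dict], max_pages: int = MAX_PAGES) -> Optional[str]:
    pages = list(page_dicts)[:max_pages]

    # Pass 1: find the largest font size used by a non-trivial span across the scanned pages.
    sizes = [s for p in pages for s, t in _spans(p) if len(t) >= MIN_SPAN_CHARS]
    if not sizes:
        return None
    largest = max(sizes)

    # Pass 2: collect contiguous runs of spans at that size; the first acceptable run wins.
    for page in pages:
        run: List[str] = []
        for size, text in _spans(page):
            if not text:
                continue  # whitespace-only spans neither extend nor break a run
            if abs(size - largest) <= SIZE_TOLERANCE:
                run.append(text)
                continue
            if run:  # a smaller/larger span ends the current run
                title = _accept(run)
                if title:
                    return title
                run = []
        if run:  # a run that reaches the bottom of the page
            title = _accept(run)
            if title:
                return title
    return None
```

With PyMuPDF, the input could be built as `[page.get_text("dict") for page in fitz.open(path)]`. Taking plain dicts instead of page objects keeps the function easy to test with hand-built fixtures. In this sketch, runs never cross page boundaries, so a title split across pages would not be joined.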
Refactor test helpers to use get_text("dict") structure instead of
get_text("blocks"). Add tests for multi-span joining, short span
filtering, regex rejection, and multi-page title detection.
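For the refactored tests, a hand-built fixture in the same shape as `get_text("dict")` output is sufficient. The sketch below assumes tests only need `size` and `text` on each span. `FakePage` and `make_page` are hypothetical names, not the helpers in the PR. The fake page rejects any option other than `"dict"`, so a test would fail loudly if the code under test went back to `get_text("blocks")`.

```python
from typing import Sequence, Tuple

Span = Tuple[float, str]  # (font size, text)


class FakePage:
    """Hypothetical stand-in for a PyMuPDF page in unit tests.

    Only get_text("dict") is supported, so a regression to get_text("blocks")
    surfaces as a test failure instead of silently returning different data.
    """

    def __init__(self, page_dict: dict) -> None:
        self._page_dict = page_dict

    def get_text(self, option: str = "text") -> dict:
        if option != "dict":
            # AssertionError so the test runner reports a failure rather than an error.
            raise AssertionError(f"unexpected get_text({option!r}); expected 'dict'")
        return self._page_dict


def make_page(*blocks: Sequence[Sequence[Span]]) -> FakePage:
    """Build a fake page from blocks; each block is a list of lines and
    each line is a list of (size, text) spans."""
    page_dict = {
        "blocks": [
            {
                "type": 0,  # 0 marks a text block in the dict layout (1 would be an image)
                "lines": [
                    {"spans": [{"size": size, "text": text} for size, text in line]}
                    for line in block
                ],
            }
            for block in blocks
        ]
    }
    return FakePage(page_dict)
```

For example, `make_page([[(18.0, "Lithium and")], [(18.0, "Mood Stability")]], [[(10.0, "A. Author")]])` describes a two-line title block followed by an author block. Multi-span lines are written as several tuples in one inner list.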
The links pointed to the old site and needed updating.
update site links on README
Generate API docs
Update files that Git should ignore
…tribute-error fix: openAI fallback crash in title generation
refactor: file upload uses font size and more lenient regex to extract titles
Labels: Improvements, Technical