Fix: locale-aware (words/characters) minimum content length across AI content experiments #581
hbhalodia wants to merge 24 commits into
…aracters and words
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #581 +/- ##
=============================================
- Coverage 73.18% 73.15% -0.03%
Complexity 1731 1731
=============================================
Files 85 85
Lines 7473 7476 +3
=============================================
Hits 5469 5469
- Misses 2004 2007 +3
dkotter left a comment:
Left a few comments but overall this is looking good.
The only other thing I'd flag is that this standardizes things for Content Classification, Content Resizing and Summarization but none of the other Features. I still think it would be ideal to standardize everything, which likely means closing out #479 and #545 and handling those in this PR, as well as looking at all other Features and ensuring that, where needed, they follow the same approach.
 * @param int $min_content_length The minimum number of characters required. Default 100.
 */
- $min_content_length = (int) apply_filters( 'wpai_summarization_min_content_length', 100 );
+ $min_content_length = (int) apply_filters( 'wpai_summarization_min_content_length', get_min_content_length( 'summarization', 100 ) );
I know 100 was the last value, but that was when we looked at characters only. Now we'll look at words if the locale supports it, so is 100 words the right length? That seems like it may be too long.
I am not sure what number would be best here. If we go lower, the threshold also drops for locales that count characters, where it would then be too low. So maybe we can change it to 50?
'enabled' => $this->is_enabled(),
'strategy' => $this->get_strategy(),
'maxSuggestions' => $this->get_max_suggestions(),
'minContentLength' => get_min_content_length( 'content-classification', 150 ),
Is 150 the right length here or should we lower that?
I think it's a good thing to require more here. For locales that count words, the more words there are, the better the suggestions the AI can provide. The same holds for locales that count characters.
Regarding the comment here as well, #581 (comment): I guess we can stick with 100, because for word-based locales more words should give a better summarization. Does that sound right?
Thanks @dkotter, I am checking those and will update the PR as needed.
… characters and words
Summary
Fixes Content Resizing's "Shorten" action not detecting the length of Japanese (and other character-based) text, and generalizes the fix into a unified, locale-aware minimum content length system shared by every AI content experiment.
The root problem: content-length checks counted words (whitespace-delimited), which is meaningless for CJK languages where there are no spaces. This PR introduces locale-aware counting (words vs. characters, driven by WordPress core's word-count strategy) and a single, filterable source of truth for the minimum length, then applies it consistently across all content experiments, disabling the relevant action and showing a clear, locale-correct message when there isn't enough content.
Problem / root cause
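To illustrate the root cause, here is a minimal TypeScript sketch of locale-aware counting. It is not the PR's implementation: `countContent` and `countCodePoints` are hypothetical names, and the three strategy values are the ones WordPress core uses for its translatable word-count type. Real counting would also strip HTML, shortcodes and punctuation, which this sketch ignores.

```typescript
// The three word-count strategies a WordPress locale can select.
type WordCountType =
  | 'words'
  | 'characters_excluding_spaces'
  | 'characters_including_spaces';

// Count Unicode code points instead of UTF-16 units by collapsing
// each surrogate pair into a single placeholder character.
function countCodePoints(text: string): number {
  return text.replace(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g, '_').length;
}

// Hypothetical helper: measure content the way the locale expects.
function countContent(text: string, type: WordCountType): number {
  const trimmed = text.trim();
  if (trimmed === '') {
    return 0;
  }
  if (type === 'words') {
    // Whitespace-delimited tokens; unsuitable for text without spaces.
    return trimmed.split(/\s+/).length;
  }
  if (type === 'characters_excluding_spaces') {
    return countCodePoints(trimmed.replace(/\s+/g, ''));
  }
  return countCodePoints(trimmed);
}

const japanese = '日本語の文章です';
console.log(countContent(japanese, 'words')); // 1
console.log(countContent(japanese, 'characters_excluding_spaces')); // 8
```

Counted as words, the Japanese sentence registers as a single token, so any word-based minimum treats it as too short regardless of how much text it contains. Counted as characters, its real length is visible.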
Solution
New shared infrastructure
Per-experiment changes
All seven content experiments now localize a
PHP (each adds the
Frontend (types + components/hooks reading
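As a rough sketch of how a frontend component might consume the localized threshold, assuming it arrives as `minContentLength` (the key shown in the review diff above) alongside the locale's count type: the helper name `checkMinContentLength`, the `LengthCheck` shape and the message wording are all hypothetical, and a real component would pass the message through the i18n functions rather than concatenating strings.

```typescript
type WordCountType =
  | 'words'
  | 'characters_excluding_spaces'
  | 'characters_including_spaces';

// Hypothetical result shape: whether to disable the action and why.
interface LengthCheck {
  disabled: boolean;
  message: string;
}

// Hypothetical helper: compare an already-computed count against the
// localized minimum and pick a unit that matches the locale's count type.
function checkMinContentLength(
  count: number,
  minContentLength: number,
  type: WordCountType
): LengthCheck {
  if (count >= minContentLength) {
    return { disabled: false, message: '' };
  }
  const unit = type === 'words' ? 'words' : 'characters';
  return {
    disabled: true,
    // Illustrative wording only; real strings would be translatable.
    message:
      'Add at least ' + minContentLength + ' ' + unit + ' of content to use this action.',
  };
}

const result = checkMinContentLength(8, 100, 'characters_excluding_spaces');
console.log(result.disabled, result.message);
```

Keeping the comparison separate from the counting means the same check works whether the count came from words or characters.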
Deprecation / backward compatibility
Behavior
When content is below the threshold the action button is
Because of
Testing
Unit / Integration (PHPUnit): 7 experiment test files
Each adds localization assertions:
Files under
npm run wp-env:test start
npm run test:php tests/Integration/Includes/Experiments/Title_Generation/Title_GenerationTest.php
# …repeat per experiment, or run the whole suite:
npm run test:php
End-to-end (Playwright): 7 experiment spec files
Coverage added across
npm run test:e2e -- specs/experiments/content-resizing.spec.js \
specs/experiments/content-classification.spec.js \
specs/experiments/title-generation.spec.js \
specs/experiments/excerpt-generation.spec.js \
specs/experiments/meta-description.spec.js
Configuration
// Adjust the minimum per feature (defaults shown in the table above).
add_filter( 'wpai_min_content_length', function ( $length, $feature_id ) {
// $feature_id: 'content-resizing' | 'content-classification' | 'summarization'
// | 'editorial-notes' | 'excerpt-generation' | 'meta-description'
// | 'title-generation'
return $length;
}, 10, 2 );
Notes / follow-ups
🤖 Generated with Claude Code
Hi @dkotter, below are some visuals for the updates:
Title Generation: Screen.Recording.2026-06-09.at.4.57.45.PM.mov
Excerpt Generation: Screen.Recording.2026-06-09.at.4.58.19.PM.mov
Meta Description Generation: Screen.Recording.2026-06-09.at.4.58.53.PM.mov
Content Classification: Screen.Recording.2026-06-09.at.5.00.01.PM.mov
Thanks,
What?
Closes #578, #391, #390
Why?
wordCountType for counting words or characters based on the user's locale, standardized across all the experiments.
How?
_x() language pack.
Use of AI Tools
Testing Instructions
あああああああああああ
Screenshots or screencast
Screen.Recording.2026-05-19.at.3.52.39.PM.mov
Changelog Entry