🔝Block API | HTML issues | Refactors | ↑ Broader Roadmap | ← Plans for 6.7
HTML rule changes.
Trac tickets.
Core improvements in handling HTML
Improvements to the HTML API
Performance
Feature-set
- Potentially some movement towards inner_html functionality.
Block Scanner
The rest of this section is likely not going to make it into 6.9, but it is retained for continuity and tracking purposes.
Lingering work from 6.7
⚠️ The tasks in this section likely won’t make it into 6.9 because Core work was paused in early 2025. They could still land, but other priorities may take precedence as the roadmap is revisited.
- Speed speed speed. Make the HTML Processor 10x faster.
- Can we defer parsing and deduplicating attribute names while parsing tags and only start doing that when reading attributes?
- Potentially around a 3% speed improvement in scanning tokens with the Tag Processor when not interacting with attributes.
- Remove all if statements that don't execute anything (they have a comment as their body).
- If 6.7 includes full support for all HTML tags, measure the impact of reordering the case statements in each insertion mode. Test against 100s of 1000s of websites based on web popularity.
- Profile the parsing of 100s of 1000s of websites and see if anything surprising pops up in the results.
- Replace '#text' === $token_type with ::STATE_TEXT_NODE === $this->parser_state
- Eagerly set token name and type in step() where all nodes are real. Reference these values instead of calling ->get_token_name() etc…
- Remove after_element_push() since these are all instigated from within the HTML Processor, unlike pop with pop_until() (unless we made pop_until() return a generator and we could foreach ( $state->pop_until( 'TAG' ) as $popped )).
- Flagification
- Replace as many repetitive if checks with flags that are set on events, as is done with has_p_in_button_scope.
- Indicate once in next_token() if a text node is only whitespace.
- Following the change to push/pop, immediately pop elements off of the stack of elements as instructed in the parsing rules, vs. letting step() perform the check and pop.
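The idea of deferring attribute parsing until attributes are actually read could look roughly like the sketch below. It is only an illustration: Sketch_Lazy_Tag, its parse_count counter, and the simplified regular expression (double-quoted or valueless attributes only) are hypothetical and are not the Tag Processor's real internals.

```php
<?php
// Hypothetical illustration of lazy attribute parsing; not Core code.
final class Sketch_Lazy_Tag {
	/** @var string Raw attribute text captured while scanning the tag. */
	private $raw_attributes;

	/** @var array|null Parsed attributes, or null until first requested. */
	private $attributes = null;

	/** @var int How many times the attribute text was parsed. */
	public $parse_count = 0;

	public function __construct( string $raw_attributes ) {
		// Scanning only records the span; no names are parsed or deduplicated yet.
		$this->raw_attributes = $raw_attributes;
	}

	public function get_attribute( string $name ) {
		if ( null === $this->attributes ) {
			$this->attributes = $this->parse_attributes();
		}

		return $this->attributes[ strtolower( $name ) ] ?? null;
	}

	private function parse_attributes(): array {
		++$this->parse_count;
		$attributes = array();

		// Simplified assumption: values are double-quoted or absent.
		preg_match_all( '/([^\s=]+)(?:="([^"]*)")?/', $this->raw_attributes, $matches, PREG_SET_ORDER );
		foreach ( $matches as $match ) {
			$name = strtolower( $match[1] );
			// Deduplicate: the first occurrence of a name wins.
			if ( ! array_key_exists( $name, $attributes ) ) {
				$attributes[ $name ] = $match[2] ?? true;
			}
		}

		return $attributes;
	}
}

$tag = new Sketch_Lazy_Tag( 'class="a" ID="b" CLASS="dup" hidden' );
// No parsing has happened yet: $tag->parse_count is 0.
$class = $tag->get_attribute( 'class' ); // 'a'
$id    = $tag->get_attribute( 'id' );    // 'b'
// Both reads shared a single parse: $tag->parse_count is 1.
```

A caller that only walks tokens never pays the parsing cost, which is the case the estimated 3% improvement refers to.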
With some initial explorations I've found 16% - 40% speed improvement with some of these ideas. That's not good enough, but it's a start.
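As a rough sketch of the generator idea mentioned in the list above, pop_until() could yield each element as it is popped so that callers react in a plain foreach instead of through a push/pop callback. Sketch_Open_Elements is a hypothetical stand-in that stores bare tag names; the real stack of open elements in Core holds richer tokens and has a different interface.

```php
<?php
// Hypothetical stand-in for the stack of open elements; not the Core class.
final class Sketch_Open_Elements {
	/** @var string[] Tag names, oldest first. */
	private $stack = array();

	public function push( string $tag_name ): void {
		$this->stack[] = $tag_name;
	}

	/**
	 * Pops elements until the named tag has been popped,
	 * yielding each popped element to the caller.
	 */
	public function pop_until( string $tag_name ): Generator {
		while ( count( $this->stack ) > 0 ) {
			$popped = array_pop( $this->stack );
			yield $popped;

			if ( $tag_name === $popped ) {
				return;
			}
		}
	}

	public function count(): int {
		return count( $this->stack );
	}
}

$state = new Sketch_Open_Elements();
foreach ( array( 'HTML', 'BODY', 'DIV', 'P', 'EM' ) as $tag_name ) {
	$state->push( $tag_name );
}

$popped_names = array();
foreach ( $state->pop_until( 'DIV' ) as $popped ) {
	$popped_names[] = $popped;
}
// $popped_names is array( 'EM', 'P', 'DIV' ); HTML and BODY remain on the stack.
```

One caveat with this design: generators are lazy, so if the caller breaks out of the loop early the remaining elements are never popped. Any real implementation would need to decide whether that is acceptable.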
Lingering support edge-cases.
New Features and Interfaces
Blocks
HTML rule changes.
- <select> parser whatwg/html#10557
Trac tickets.
Core improvements in handling HTML
- wp_kses_hair() wordpress-develop#9248
- wp_html_split(). wordpress-develop#9270
- wp_strip_all_tags(). wordpress-develop#9271
- wp_strip_all_tags() and CSS content, which is not HTML and actually will be corrupted if treated as such.
- get_url_in_content() wordpress-develop#9272
Improvements to the HTML API
Performance
Feature-set
- inner_html functionality.
Block Scanner
- WP_Block_Processor for efficiently parsing blocks. wordpress-develop#9105
Lingering work from 6.7
Lingering support edge-cases.
- seek() calls in the HTML Processor to ensure reliability. (Core-????) May be covered by HTML API: Ensure that full processor can seek to earlier bookmarks.
New Features and Interfaces
- set_inner_html() to HTML Processor wordpress-develop#7326
Blocks