Bug in wp_html_split with unclosed PHP tag

  • Hi, I have already asked a similar question, but a couple of weeks ago I had no time to dig deeper into this problem.

    https://wordpress.org/support/topic/wordpress-php-code-in-post-content-bug/

    But today I traced the code flow in search of the problem, and I have finally found it.

    The symptom shows up in shortcodes.php, but the actual culprit is the function wp_html_split() in formatting.php.

    This function does not split the content correctly, which is why part of my output is missing.

    Let's start from the beginning. Consider the following post content:

    Some amount of useless text <!--more-->
    
    [code-highlight line-numbers="table" linenostart="53" highlight-lines="1,3,8" style="native" lang="html+php" pyg-id="1" ]
    <?php
    //This callback registers our plug-in
    function wpse72394_register_tinymce_plugin($plugin_array) {
        $plugin_array['wpse72394_button'] = 'path/to/shortcode.js';
        return $plugin_array;
    }
    
    //This callback adds our button to the toolbar
    function wpse72394_add_tinymce_button($buttons) {
                //Add the button ID to the $button array
        $buttons[] = "wpse72394_button";
        return $buttons;
    }
    ?
    [/code-highlight]
    
    Some amount of useless text <strong>checkstyle</strong>
    
    [code-highlight style="native" lang="perl" pyg-id="2" ]
    (?:s+)(?:(/*([^*]|[rn]|(*+([^*/]|[rn])))**+/)|(//(?!.*(CHECKSTYLE)).*))
    [/code-highlight]

    Here is the result the visitor sees:

    [Screenshot: Result]

    Let's find out why.

    The cause is the regular expression that wp_html_split() uses to split the content.

    Here is a dump taken right after this line in shortcodes.php:

    $textarr = wp_html_split( $content );
    var_dump( $textarr );
    exit;

    array(25) {
      [0]=>
      string(0) ""
      [1]=>
      string(3) "<p>"
      [2]=>
      string(28) "Some amount of useless text "
      [3]=>
      string(11) "<!--more-->"
      [4]=>
      string(0) ""
      [5]=>
      string(4) "</p>"
      [6]=>
      string(1) "
    "
      [7]=>
      string(3) "<p>"
      [8]=>
      string(121) "[code-highlight line-numbers="table" linenostart="53" highlight-lines="1,3,8" style="native" lang="html+php" pyg-id="1" ]"
      [9]=>
      string(6) "<br />"
      [10]=>
      string(1) "
    "
      [11]=>
      string(464) "<?php
    //This callback registers our plug-in
    function wpse72394_register_tinymce_plugin($plugin_array) {
        $plugin_array['wpse72394_button'] = 'path/to/shortcode.js';
        return $plugin_array;
    }
    
    //This callback adds our button to the toolbar
    function wpse72394_add_tinymce_button($buttons) {
                //Add the button ID to the $button array
        $buttons[] = "wpse72394_button";
        return $buttons;
    }
    ?
    [/code-highlight]
    
    Some amount of useless text <strong>"
      [12]=>
      string(10) "checkstyle"
      [13]=>
      string(9) "</strong>"
      [14]=>
      string(0) ""
      [15]=>
      string(4) "</p>"
      [16]=>
      string(56) "
    [code-highlight style="native" lang="perl" pyg-id="2" ]"
      [17]=>
      string(6) "<br />"
      [18]=>
      string(72) "
    (?:s+)(?:(/*([^*]|[rn]|(*+([^*/]|[rn])))**+/)|(//(?!.*(CHECKSTYLE)).*))"
      [19]=>
      string(6) "<br />"
      [20]=>
      string(19) "
    [/code-highlight]
    "
      [21]=>
      string(3) "<p>"
      [22]=>
      string(15) "Some Text Again"
      [23]=>
      string(4) "</p>"
      [24]=>
      string(1) "
    "
    }

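    As you can see, the first shortcode was not split: element [11] starts with <?php and runs on until the <strong> tag, so it swallows the closing [/code-highlight]. That is the problem.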
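
    The boundaries of element [11] follow from the "normal element" branch of the regex shown below, <[^>]*>?, which takes everything from a < up to and including the next >. Since the PHP sample ends with a bare ? instead of a complete closing tag, the next > after <?php is the one in <strong>. A minimal check with plain string functions, on a shortened, hypothetical excerpt of the post (not the exact content from the dump):

```php
<?php
// Shortened, hypothetical excerpt of the post: like the original, the PHP sample
// ends with a bare "?" and no ">", followed by the closing shortcode tag.
$content = "<?php\nfunction f() { return 1; }\n?\n[/code-highlight]\n\nSome amount of useless text <strong>checkstyle</strong>";

// Mimic the <[^>]*>? branch: start at "<" and stop at the first ">" after it.
$start = strpos( $content, '<' );
$end   = strpos( $content, '>', $start );
$token = substr( $content, $start, $end - $start + 1 );

echo $token, "\n";

// The closing [/code-highlight] tag ends up inside the "element" token.
var_dump( strpos( $token, '[/code-highlight]' ) !== false ); // bool(true)
```
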

    Here is the function that provides the problematic regex:

    function get_html_split_regex() {
    	static $regex;
    
    	if ( ! isset( $regex ) ) {
    		$comments =
    			  '!'           // Start of comment, after the <.
    			. '(?:'         // Unroll the loop: Consume everything until --> is found.
    			.     '-(?!->)' // Dash not followed by end of comment.
    			.     '[^\-]*+' // Consume non-dashes.
    			. ')*+'         // Loop possessively.
    			. '(?:-->)?';   // End of comment. If not found, match all input.
    
    		$cdata =
    			  '!\[CDATA\['  // Start of comment, after the <.
    			. '[^\]]*+'     // Consume non-].
    			. '(?:'         // Unroll the loop: Consume everything until ]]> is found.
    			.     '](?!]>)' // One ] not followed by end of comment.
    			.     '[^\]]*+' // Consume non-].
    			. ')*+'         // Loop possessively.
    			. '(?:]]>)?';   // End of comment. If not found, match all input.
    
    		$escaped =
    			  '(?='           // Is the element escaped?
    			.    '!--'
    			. '|'
    			.    '!\[CDATA\['
    			. ')'
    			. '(?(?=!-)'      // If yes, which type?
    			.     $comments
    			. '|'
    			.     $cdata
    			. ')';
    
    		$regex =
    			  '/('              // Capture the entire match.
    			.     '<'           // Find start of element.
    			.     '(?'          // Conditional expression follows.
    			.         $escaped  // Find end of escaped element.
    			.     '|'           // ... else ...
    			.         '[^>]*>?' // Find end of normal element.
    			.     ')'
    			. ')/';
    	}
    
    	return $regex;
    }
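
    To reproduce this outside WordPress, the function above can be run as-is with preg_split(). The sketch below assumes that wp_html_split() is a thin wrapper calling preg_split() with this regex and PREG_SPLIT_DELIM_CAPTURE (worth checking against formatting.php in your version); the two input strings are made up for illustration. It contrasts an unclosed PHP sample (bare ?) with a properly closed one:

```php
<?php
// get_html_split_regex() copied from formatting.php, as quoted above.
function get_html_split_regex() {
    static $regex;

    if ( ! isset( $regex ) ) {
        $comments =
              '!'           // Start of comment, after the <.
            . '(?:'         // Unroll the loop: Consume everything until --> is found.
            .     '-(?!->)' // Dash not followed by end of comment.
            .     '[^\-]*+' // Consume non-dashes.
            . ')*+'         // Loop possessively.
            . '(?:-->)?';   // End of comment. If not found, match all input.

        $cdata =
              '!\[CDATA\['  // Start of comment, after the <.
            . '[^\]]*+'     // Consume non-].
            . '(?:'         // Unroll the loop: Consume everything until ]]> is found.
            .     '](?!]>)' // One ] not followed by end of comment.
            .     '[^\]]*+' // Consume non-].
            . ')*+'         // Loop possessively.
            . '(?:]]>)?';   // End of comment. If not found, match all input.

        $escaped =
              '(?='           // Is the element escaped?
            .    '!--'
            . '|'
            .    '!\[CDATA\['
            . ')'
            . '(?(?=!-)'      // If yes, which type?
            .     $comments
            . '|'
            .     $cdata
            . ')';

        $regex =
              '/('              // Capture the entire match.
            .     '<'           // Find start of element.
            .     '(?'          // Conditional expression follows.
            .         $escaped  // Find end of escaped element.
            .     '|'           // ... else ...
            .         '[^>]*>?' // Find end of normal element.
            .     ')'
            . ')/';
    }

    return $regex;
}

// Assumed equivalent of wp_html_split(), re-declared so the sketch runs standalone.
function wp_html_split( $input ) {
    return preg_split( get_html_split_regex(), $input, -1, PREG_SPLIT_DELIM_CAPTURE );
}

// Unclosed PHP sample: the "element" runs on until the ">" of <strong>.
$unclosed_parts = wp_html_split( "A <?php echo 1; ?\n[/code] B <strong>x</strong>" );
var_dump( $unclosed_parts );

// Closed PHP sample: the same regex stops at the closing tag's ">".
$closed_parts = wp_html_split( "A <?php echo 1; ?> B <strong>x</strong>" );
var_dump( $closed_parts );
```

    In both cases <?php is treated as the start of an element; the only difference is where that element ends, which is why an unclosed tag drags the following text (and the closing shortcode) into one piece.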

    So I am asking for help fixing this bug. I have some regex skills and will try to fix it myself, but it would be better to discuss the problem with the community.
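
    A possible interim workaround, sketched under assumptions rather than as a fix for the regex itself: escape < inside the [code-highlight] body before any filter that calls wp_html_split() sees it, so <?php can no longer be taken for the start of an element. The helper name escape_code_highlight_bodies() is hypothetical, the hook priority in the comment is an assumption to verify, and the shortcode handler would have to decode the body again (for example with html_entity_decode()) before highlighting it.

```php
<?php
// Hypothetical helper (not part of WordPress): HTML-escape "<" inside
// [code-highlight]...[/code-highlight] bodies so that a split regex such as
// <[^>]*>? can no longer treat "<?php" as the start of an element.
function escape_code_highlight_bodies( $content ) {
    return preg_replace_callback(
        '/(\[code-highlight[^\]]*\])(.*?)(\[\/code-highlight\])/s',
        function ( $m ) {
            // $m[1] = opening tag, $m[2] = body, $m[3] = closing tag; only "<" in the body is escaped.
            return $m[1] . str_replace( '<', '&lt;', $m[2] ) . $m[3];
        },
        $content
    );
}

// Assumption: hooked before the filters that call wp_html_split(), e.g.
// add_filter( 'the_content', 'escape_code_highlight_bodies', 9 );
// The shortcode handler must then decode the body (e.g. html_entity_decode()).

$out = escape_code_highlight_bodies(
    "[code-highlight lang=\"html+php\"]\n<?php echo 1; ?\n[/code-highlight]\nText <strong>x</strong>"
);
echo $out, "\n";
```
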

    I hope you can help.

Viewing 4 replies - 1 through 4 (of 4 total)
  • The topic ‘Bug in wp_html_split with unclosed PHP tag’ is closed to new replies.