Support » Fixing WordPress » wptexturize inconsistent behavior

  • I previously opened this as a bug but was referred here.
    Given post content:

    [show_custom_field field="image_media" before="<img src='" after="'>"]
    [show_custom_field field="image_media" after="'>" before="<img src='"]

    wptexturize returns:

    [show_custom_field field="image_media" before="<img src='" after="'>"]
    [show_custom_field field=”image_media” after=”‘>” before=”<img src='”]

    The two shortcodes are essentially identical only the order of the attributes are different but wptexturize only texturizes the quote marks in the second shortcode. First, I don’t think wptexturize should process attribute values but in any case, it should do this consistently.

Viewing 1 replies (of 1 total)
  • To show this is really a bug I did a new install of WordPress and disabled all plugins.
    I created a new post with content:

    [audio src="anything" before="<img src='" after="'>"]
    [audio src="anything" after="'>" before="<img src='"]

    I add error_log() to the beginning and end of wptexturize() to show the input and output of wptexturize() which was:

    [02-Oct-2015 07:37:29 UTC] enter:wptexturize():$text=[audio src="anything" before="<img src='" after="'>"]
    [audio src="anything" after="'>" before="<img src='"]
    #####
    [02-Oct-2015 07:37:29 UTC] exit:wptexturize():$text=[audio src="anything" before="<img src='" after="'>"]
    [audio src=”anything” after=”‘>” before=”<img src='”]
    #####

    Note that wptexturize() only texturized the quote marks in the second shortcode. Also, the shortcode must be valid otherwise wptexturize() texturizes uniformly.

    Here is the function wptexturize() that I executed:

    function wptexturize( $text, $reset = false ) {
    	global $wp_cockneyreplace, $shortcode_tags;
    	static $static_characters = null,
    		$static_replacements = null,
    		$dynamic_characters = null,
    		$dynamic_replacements = null,
    		$default_no_texturize_tags = null,
    		$default_no_texturize_shortcodes = null,
    		$run_texturize = true,
    		$apos = null,
    		$prime = null,
    		$double_prime = null,
    		$opening_quote = null,
    		$closing_quote = null,
    		$opening_single_quote = null,
    		$closing_single_quote = null,
    		$open_q_flag = '<!--oq-->',
    		$open_sq_flag = '<!--osq-->',
    		$apos_flag = '<!--apos-->';
        error_log( 'enter:wptexturize():$text=' . $text . "\n#####" );
    	// If there's nothing to do, just stop.
    	if ( empty( $text ) || false === $run_texturize ) {
    		return $text;
    	}
    
    	// Set up static variables. Run once only.
    	if ( $reset || ! isset( $static_characters ) ) {
    		/**
    		 * Filter whether to skip running wptexturize().
    		 *
    		 * Passing false to the filter will effectively short-circuit wptexturize().
    		 * returning the original text passed to the function instead.
    		 *
    		 * The filter runs only once, the first time wptexturize() is called.
    		 *
    		 * @since 4.0.0
    		 *
    		 * @see wptexturize()
    		 *
    		 * @param bool $run_texturize Whether to short-circuit wptexturize().
    		 */
    		$run_texturize = apply_filters( 'run_wptexturize', $run_texturize );
    		if ( false === $run_texturize ) {
    			return $text;
    		}
    
    		/* translators: opening curly double quote */
    		$opening_quote = _x( '“', 'opening curly double quote' );
    		/* translators: closing curly double quote */
    		$closing_quote = _x( '”', 'closing curly double quote' );
    
    		/* translators: apostrophe, for example in 'cause or can't */
    		$apos = _x( '’', 'apostrophe' );
    
    		/* translators: prime, for example in 9' (nine feet) */
    		$prime = _x( '′', 'prime' );
    		/* translators: double prime, for example in 9" (nine inches) */
    		$double_prime = _x( '″', 'double prime' );
    
    		/* translators: opening curly single quote */
    		$opening_single_quote = _x( '‘', 'opening curly single quote' );
    		/* translators: closing curly single quote */
    		$closing_single_quote = _x( '’', 'closing curly single quote' );
    
    		/* translators: en dash */
    		$en_dash = _x( '–', 'en dash' );
    		/* translators: em dash */
    		$em_dash = _x( '—', 'em dash' );
    
    		$default_no_texturize_tags = array('pre', 'code', 'kbd', 'style', 'script', 'tt');
    		$default_no_texturize_shortcodes = array('code');
    
    		// if a plugin has provided an autocorrect array, use it
    		if ( isset($wp_cockneyreplace) ) {
    			$cockney = array_keys( $wp_cockneyreplace );
    			$cockneyreplace = array_values( $wp_cockneyreplace );
    		} else {
    			/* translators: This is a comma-separated list of words that defy the syntax of quotations in normal use,
    			 * for example...  'We do not have enough words yet' ... is a typical quoted phrase.  But when we write
    			 * lines of code 'til we have enough of 'em, then we need to insert apostrophes instead of quotes.
    			 */
    			$cockney = explode( ',', _x( "'tain't,'twere,'twas,'tis,'twill,'til,'bout,'nuff,'round,'cause,'em",
    				'Comma-separated list of words to texturize in your language' ) );
    
    			$cockneyreplace = explode( ',', _x( '’tain’t,’twere,’twas,’tis,’twill,’til,’bout,’nuff,’round,’cause,’em',
    				'Comma-separated list of replacement words in your language' ) );
    		}
    
    		$static_characters = array_merge( array( '...', '<code></code>', '\'\'', ' (tm)' ), $cockney );
    		$static_replacements = array_merge( array( '…', $opening_quote, $closing_quote, ' ™' ), $cockneyreplace );
    
    		// Pattern-based replacements of characters.
    		// Sort the remaining patterns into several arrays for performance tuning.
    		$dynamic_characters = array( 'apos' => array(), 'quote' => array(), 'dash' => array() );
    		$dynamic_replacements = array( 'apos' => array(), 'quote' => array(), 'dash' => array() );
    		$dynamic = array();
    		$spaces = wp_spaces_regexp();
    
    		// '99' and '99" are ambiguous among other patterns; assume it's an abbreviated year at the end of a quotation.
    		if ( "'" !== $apos || "'" !== $closing_single_quote ) {
    			$dynamic[ '/\'(\d\d)\'(?=\Z|[.,:;!?)}\-\]]|>|' . $spaces . ')/' ] = $apos_flag . '$1' . $closing_single_quote;
    		}
    		if ( "'" !== $apos || '"' !== $closing_quote ) {
    			$dynamic[ '/\'(\d\d)"(?=\Z|[.,:;!?)}\-\]]|>|' . $spaces . ')/' ] = $apos_flag . '$1' . $closing_quote;
    		}
    
    		// '99 '99s '99's (apostrophe)  But never '9 or '99% or '999 or '99.0.
    		if ( "'" !== $apos ) {
    			$dynamic[ '/\'(?=\d\d(?:\Z|(?![%\d]|[.,]\d)))/' ] = $apos_flag;
    		}
    
    		// Quoted Numbers like '0.42'
    		if ( "'" !== $opening_single_quote && "'" !== $closing_single_quote ) {
    			$dynamic[ '/(?<=\A|' . $spaces . ')\'(\d[.,\d]*)\'/' ] = $open_sq_flag . '$1' . $closing_single_quote;
    		}
    
    		// Single quote at start, or preceded by (, {, <, [, ", -, or spaces.
    		if ( "'" !== $opening_single_quote ) {
    			$dynamic[ '/(?<=\A|[([{"\-]|<|' . $spaces . ')\'/' ] = $open_sq_flag;
    		}
    
    		// Apostrophe in a word.  No spaces, double apostrophes, or other punctuation.
    		if ( "'" !== $apos ) {
    			$dynamic[ '/(?<!' . $spaces . ')\'(?!\Z|[.,:;!?"\'(){}[\]\-]|&[lg]t;|' . $spaces . ')/' ] = $apos_flag;
    		}
    
    		$dynamic_characters['apos'] = array_keys( $dynamic );
    		$dynamic_replacements['apos'] = array_values( $dynamic );
    		$dynamic = array();
    
    		// Quoted Numbers like "42"
    		if ( '"' !== $opening_quote && '"' !== $closing_quote ) {
    			$dynamic[ '/(?<=\A|' . $spaces . ')"(\d[.,\d]*)"/' ] = $open_q_flag . '$1' . $closing_quote;
    		}
    
    		// Double quote at start, or preceded by (, {, <, [, -, or spaces, and not followed by spaces.
    		if ( '"' !== $opening_quote ) {
    			$dynamic[ '/(?<=\A|[([{\-]|<|' . $spaces . ')"(?!' . $spaces . ')/' ] = $open_q_flag;
    		}
    
    		$dynamic_characters['quote'] = array_keys( $dynamic );
    		$dynamic_replacements['quote'] = array_values( $dynamic );
    		$dynamic = array();
    
    		// Dashes and spaces
    		$dynamic[ '/---/' ] = $em_dash;
    		$dynamic[ '/(?<=^|' . $spaces . ')--(?=$|' . $spaces . ')/' ] = $em_dash;
    		$dynamic[ '/(?<!xn)--/' ] = $en_dash;
    		$dynamic[ '/(?<=^|' . $spaces . ')-(?=$|' . $spaces . ')/' ] = $en_dash;
    
    		$dynamic_characters['dash'] = array_keys( $dynamic );
    		$dynamic_replacements['dash'] = array_values( $dynamic );
    	}
    
    	// Must do this every time in case plugins use these filters in a context sensitive manner
    	/**
    	 * Filter the list of HTML elements not to texturize.
    	 *
    	 * @since 2.8.0
    	 *
    	 * @param array $default_no_texturize_tags An array of HTML element names.
    	 */
    	$no_texturize_tags = apply_filters( 'no_texturize_tags', $default_no_texturize_tags );
    	/**
    	 * Filter the list of shortcodes not to texturize.
    	 *
    	 * @since 2.8.0
    	 *
    	 * @param array $default_no_texturize_shortcodes An array of shortcode names.
    	 */
    	$no_texturize_shortcodes = apply_filters( 'no_texturize_shortcodes', $default_no_texturize_shortcodes );
    
    	$no_texturize_tags_stack = array();
    	$no_texturize_shortcodes_stack = array();
    
    	// Look for shortcodes and HTML elements.
    
    	$tagnames = array_keys( $shortcode_tags );
    	$tagregexp = join( '|', array_map( 'preg_quote', $tagnames ) );
    	$tagregexp = "(?:$tagregexp)(?![\\w-])"; // Excerpt of get_shortcode_regex().
    
    	$comment_regex =
    		  '!'           // Start of comment, after the <.
    		. '(?:'         // Unroll the loop: Consume everything until --> is found.
    		.     '-(?!->)' // Dash not followed by end of comment.
    		.     '[^\-]*+' // Consume non-dashes.
    		. ')*+'         // Loop possessively.
    		. '(?:-->)?';   // End of comment. If not found, match all input.
    
    	$shortcode_regex =
    		  '\['              // Find start of shortcode.
    		. '[\/\[]?'         // Shortcodes may begin with [/ or [[
    		. $tagregexp        // Only match registered shortcodes, because performance.
    		. '(?:'
    		.     '[^\[\]<>]+'  // Shortcodes do not contain other shortcodes. Quantifier critical.
    		. '|'
    		.     '<[^\[\]>]*>' // HTML elements permitted. Prevents matching ] before >.
    		. ')*+'             // Possessive critical.
    		. '\]'              // Find end of shortcode.
    		. '\]?';            // Shortcodes may end with ]]
    
    	$regex =
    		  '/('                   // Capture the entire match.
    		.     '<'                // Find start of element.
    		.     '(?(?=!--)'        // Is this a comment?
    		.         $comment_regex // Find end of comment.
    		.     '|'
    		.         '[^>]*>'       // Find end of element.
    		.     ')'
    		. '|'
    		.     $shortcode_regex   // Find shortcodes.
    		. ')/s';
    
    	$textarr = preg_split( $regex, $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY );
    
    	foreach ( $textarr as &$curl ) {
    		// Only call _wptexturize_pushpop_element if $curl is a delimiter.
    		$first = $curl[0];
    		if ( '<' === $first && '<!--' === substr( $curl, 0, 4 ) ) {
    			// This is an HTML comment delimeter.
    
    			continue;
    
    		} elseif ( '<' === $first && '>' === substr( $curl, -1 ) ) {
    			// This is an HTML element delimiter.
    
    			_wptexturize_pushpop_element( $curl, $no_texturize_tags_stack, $no_texturize_tags );
    
    		} elseif ( '' === trim( $curl ) ) {
    			// This is a newline between delimiters.  Performance improves when we check this.
    
    			continue;
    
    		} elseif ( '[' === $first && 1 === preg_match( '/^' . $shortcode_regex . '$/', $curl ) ) {
    			// This is a shortcode delimiter.
    
    			if ( '[[' !== substr( $curl, 0, 2 ) && ']]' !== substr( $curl, -2 ) ) {
    				// Looks like a normal shortcode.
    				_wptexturize_pushpop_element( $curl, $no_texturize_shortcodes_stack, $no_texturize_shortcodes );
    			} else {
    				// Looks like an escaped shortcode.
    				continue;
    			}
    
    		} elseif ( empty( $no_texturize_shortcodes_stack ) && empty( $no_texturize_tags_stack ) ) {
    			// This is neither a delimiter, nor is this content inside of no_texturize pairs.  Do texturize.
    
    			$curl = str_replace( $static_characters, $static_replacements, $curl );
    
    			if ( false !== strpos( $curl, "'" ) ) {
    				$curl = preg_replace( $dynamic_characters['apos'], $dynamic_replacements['apos'], $curl );
    				$curl = wptexturize_primes( $curl, "'", $prime, $open_sq_flag, $closing_single_quote );
    				$curl = str_replace( $apos_flag, $apos, $curl );
    				$curl = str_replace( $open_sq_flag, $opening_single_quote, $curl );
    			}
    			if ( false !== strpos( $curl, '"' ) ) {
    				$curl = preg_replace( $dynamic_characters['quote'], $dynamic_replacements['quote'], $curl );
    				$curl = wptexturize_primes( $curl, '"', $double_prime, $open_q_flag, $closing_quote );
    				$curl = str_replace( $open_q_flag, $opening_quote, $curl );
    			}
    			if ( false !== strpos( $curl, '-' ) ) {
    				$curl = preg_replace( $dynamic_characters['dash'], $dynamic_replacements['dash'], $curl );
    			}
    
    			// 9x9 (times), but never 0x9999
    			if ( 1 === preg_match( '/(?<=\d)x\d/', $curl ) ) {
    				// Searching for a digit is 10 times more expensive than for the x, so we avoid doing this one!
    				$curl = preg_replace( '/\b(\d(?(?<=0)[\d\.,]+|[\d\.,]*))x(\d[\d\.,]*)\b/', '$1×$2', $curl );
    			}
    		}
    	}
    	$text = implode( '', $textarr );
        error_log( 'exit:wptexturize():$text=' . $text  . "\n#####" );
    	// Replace each & with & unless it already looks like an entity.
    	return preg_replace( '/&(?!#(?:\d+|x[a-f0-9]+);|[a-z1-4]{1,8};)/i', '&', $text );
    }

Viewing 1 replies (of 1 total)
  • The topic ‘wptexturize inconsistent behavior’ is closed to new replies.