[Plugin: HTML Import 2] Importing preformatted text (pre tag)

  • Are there known issues with importing preformatted text?

    Importing the test file

    <pre>
    This is the first line
    This is the second line
    </pre>

    with <pre> listed under import settings -> content -> allowed HTML results in the single unbroken line

    <pre> This is the first line This is the second line </pre>

    Is there a configuration of this plugin that will respect the line breaks within the <pre></pre> tags?


Viewing 3 replies - 1 through 3 (of 3 total)
  • I propose adding the following function to the HTML_Import class defined in html-importer.php:

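    function strip_insignificant_html_whitespace($string) {
      $pre_start = "<pre(?:>|\\s[^>]*>)";
      $pre_end   = "</pre(?:>|\\s[^>]*>)";
      // Split on <pre> and </pre> tags, keeping the tags themselves as parts.
      $old_parts = preg_split(";($pre_start|$pre_end);i",$string,0,PREG_SPLIT_DELIM_CAPTURE);
      $new_parts = array();
      $strip = true;
      foreach ($old_parts as $part) {
        if (preg_match(";$pre_start;i",$part)) {
          // Opening <pre>: tidy the tag itself, then stop collapsing whitespace.
          $tmp = preg_replace(";\s+;"," ",$part);
          $new_parts[] = preg_replace("; +>;",">",$tmp);
          $strip = false;
        } elseif (preg_match(";$pre_end;i",$part)) {
          // Closing </pre>: tidy the tag itself, then resume collapsing.
          $tmp = preg_replace(";\s+;"," ",$part);
          $new_parts[] = preg_replace("; +>;",">",$tmp);
          $strip = true;
        } elseif ($strip) {
          // Outside <pre>: collapse runs of whitespace to a single space.
          $new_parts[] = preg_replace(";\s+;"," ",$part);
        } else {
          // Inside <pre>: keep the text, line breaks included, untouched.
          $new_parts[] = $part;
        }
      }
      return implode("",$new_parts);
    }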
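
    To see the effect on the test file from the question, here is a minimal standalone sketch of the same idea. The function name collapse_whitespace_outside_pre is hypothetical and is not part of the plugin; it is a compact variant for trying the approach outside the HTML_Import class, not the method proposed above.

    ```php
    <?php
    // Hypothetical standalone variant, for illustration only (not plugin code).
    function collapse_whitespace_outside_pre(string $html): string {
        // Split on <pre ...> and </pre> tags; the capture group keeps the tags.
        $parts = preg_split('~(</?pre(?:\s[^>]*)?>)~i', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
        $inside_pre = false;
        $out = '';
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                // Odd indexes hold the captured <pre> / </pre> tags.
                $inside_pre = ($part[1] !== '/');
                $out .= $part;
            } elseif ($inside_pre) {
                // Inside <pre>: keep line breaks verbatim.
                $out .= $part;
            } else {
                // Outside <pre>: collapse whitespace runs to one space.
                $out .= preg_replace('~\s+~', ' ', $part);
            }
        }
        return $out;
    }

    $input = "<p>Intro\n   text</p>\n<pre>\nThis is the first line\nThis is the second line\n</pre>\n";
    echo collapse_whitespace_outside_pre($input);
    ```

    With this input, the paragraph is flattened to "<p>Intro text</p>" while the two lines inside <pre> stay on separate lines.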

    In clean_html, replace

      $string = str_replace( '\n', ' ', $string );

    with

      $string = $this->strip_insignificant_html_whitespace($string);

    In get_post, inside the !empty($my_post['post_content']) branch, replace

      $my_post['post_content'] = ereg_replace("[\n\r]", " ", $my_post['post_content']);

    with

      $my_post['post_content'] = $this->strip_insignificant_html_whitespace($my_post['post_content']);
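
    For reference, the str_replace and ereg_replace lines quoted above do not behave the same way. The sketch below uses a made-up sample string to show that '\n' in single quotes is a literal backslash followed by "n", so it does not match real line breaks, and that preg_replace can stand in for ereg_replace, which was removed in PHP 7.0.

    ```php
    <?php
    // Sample text is an assumption for illustration, not taken from the plugin.
    $text = "first\nsecond\r\nthird";

    // Single quotes: '\n' is two characters (backslash, n), so nothing matches.
    $unchanged = str_replace('\n', ' ', $text);

    // Double quotes: "\n" is a real line feed.
    $replaced = str_replace("\n", ' ', $text);

    // preg_replace equivalent of ereg_replace("[\n\r]", " ", ...).
    $flattened = preg_replace('/[\n\r]/', ' ', $text);

    var_dump($unchanged === $text);          // bool(true)
    var_dump($replaced === "first second\r third");
    var_dump($flattened === "first second  third");
    ```

    Either flattening call, applied before the pre-aware function, would remove the line breaks that function is meant to keep.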

    It would also be nice to strip the contents of CDATA blocks and <script>...</script> blocks cleanly. I find examples like

    <div id="googleAds">
      <!-- b e g i n   g o o g l e  a d s  -->
      <script type="text/javascript">
        google_ad_client = "...";
        google_ad_slot = "...";
        google_ad_width = ...;
        google_ad_height = ...;
      </script>
      <script type="text/javascript" src="/data/../pagead2.googlesyndication.com/pagead/show_ads.js">
      </script> <!-- e n d   g o o g l e  a d s  -->
    </div>

    that are not stripped cleanly by the plugin's call to the PHP strip_tags function.
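
    A short sketch of the problem, using a shortened stand-in for the ad markup (the string below is an assumption, not the real page): strip_tags removes the tags and the HTML comment, but the text between <script> and </script> is left in the result.

    ```php
    <?php
    // Shortened, made-up stand-in for the ad block quoted above.
    $html = '<div id="ads"><!-- begin ads --><script type="text/javascript">'
          . 'google_ad_client = "x";</script></div><p>Body text</p>';

    // Keep only <p>; every other tag and the comment are removed,
    // but the JavaScript source between the script tags survives as text.
    echo strip_tags($html, '<p>');
    // prints: google_ad_client = "x";<p>Body text</p>
    ```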

    To strip the CDATA, script, and style blocks, I think it is sufficient to add the functions

    function allowed_tag($tag,$allowedtags=NULL) {
      // True only if an allowed-tags string was given and it contains $tag.
      return
        !is_null($allowedtags) &&
        stripos($allowedtags,$tag) !== false;
    }

    function strip_cdata_block($string,$allowedtags=NULL) {
      if ($this->allowed_tag('<cdata>',$allowedtags)) return $string;
      $delim = "@";
      $cdata_start = preg_quote('<![CDATA[',$delim);
      $cdata_end = preg_quote(']]>',$delim);
      $block = "$cdata_start.*?$cdata_end";
      // "s" lets "." match newlines so multi-line blocks are removed.
      return preg_replace("{$delim}{$block}{$delim}s","",$string);
    }

    function strip_tag_block($tag,$string,$allowedtags=NULL) {
      if ($this->allowed_tag($tag,$allowedtags)) return $string;
      // Extract the bare tag name, e.g. "script" from "<script>".
      if (!preg_match(":<(.*?)>:",$tag,$match)) return $string;
      $delim = "@";
      $tag_str = preg_quote($match[1],$delim);
      $tag_start = "<$tag_str(?:>|\\s[^>]*>)";
      $tag_end   = "</$tag_str(?:>|\\s[^>]*>)";
      $block = "$tag_start.*?$tag_end";
      return preg_replace("{$delim}{$block}{$delim}is","",$string);
    }

    function strip_comment_block($string) {
      $delim = "@";
      $comment_start = preg_quote('<!--',$delim);
      $comment_end = preg_quote('-->',$delim);
      $block = "$comment_start.*?$comment_end";
      return preg_replace("{$delim}{$block}{$delim}s","",$string);
    }

    and add the following calls before strip_tags at the head of clean_html:

    $string = $this->strip_cdata_block($string,$allowtags);
    $string = $this->strip_tag_block('<script>',$string,$allowtags);
    $string = $this->strip_tag_block('<style>',$string,$allowtags);
    $string = $this->strip_comment_block($string);
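
    To try the ordering of these calls outside the plugin, here is a rough standalone sketch. The helper name strip_blocks and the sample markup are hypothetical, and unlike the methods above it ignores the allowed-tags setting; it only shows that removing whole blocks before strip_tags leaves no script or style text behind.

    ```php
    <?php
    // Hypothetical helper, not plugin code: drop whole blocks, contents included.
    function strip_blocks(string $html): string {
        $patterns = array(
            '@<!\[CDATA\[.*?\]\]>@s',                    // CDATA sections
            '@<script(?:\s[^>]*)?>.*?</script\s*>@is',   // <script>...</script>
            '@<style(?:\s[^>]*)?>.*?</style\s*>@is',     // <style>...</style>
            '@<!--.*?-->@s',                             // HTML comments
        );
        // Patterns are applied in order, matching the call order above.
        return preg_replace($patterns, '', $html);
    }

    // Made-up sample input for illustration.
    $html = "<style>p { color: red; }</style><p>Keep me</p>"
          . "<script>\n//<![CDATA[\nvar x = 1;\n//]]>\n</script><!-- note -->";

    echo strip_tags(strip_blocks($html), '<p>');
    // prints: <p>Keep me</p>
    ```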

  • Plugin Author Stephanie Leary

    Thanks, Mark! I’ll try to incorporate this into the next version.
