Preg_match the_content() to pull out images?

  • oguruma

    (@oguruma)


    Suppose I wanted to parse all of the HTML inside the_content() and pull out anything that was an img tag.

    Then I'd pass those images along as an array. Is there any way to find those images in the media library just by their URL?

    I like using the RoyalSlider JavaScript plugin to build galleries, but it performs best when you pass it the actual dimensions of the images. Of course, this is easy when you add the images with something like the Gallery field of Advanced Custom Fields, but is there a way to do it just by matching the URLs of the images I've plucked out of the_content()?

    Suppose you made a post that had 10 images throughout the post. I’d like to be able to pull in those images and pass them to a slider plugin like RoyalSlider to create a slider of all of those images at the top of the post.

    • This topic was modified 9 months ago by oguruma.
Viewing 15 replies - 1 through 15 (of 15 total)
  • I’d use DOMDocument() and extract the image nodes from the $content variable. You can even target them by class.

    So if you have already uploaded the images, you basically need to get each image's ID. Some themes add that ID to the class attribute, some don’t. You can either add those IDs to the img classes or, even better, store them in a data attribute, and push everything you find into an array.

    If you don’t have the images uploaded and they live on a separate URL, you should import them into WP with wp_insert_attachment().

    Long story short: once you have the attachment ID, you can get the URL and the width and height simply by passing the ID to wp_get_attachment_image_src().

    P.S. DOMDocument can throw some notices for HTML5 elements such as header, footer, figure…
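A minimal sketch of the DOMDocument approach described above, written in plain PHP so it runs outside WordPress. It assumes the images were inserted through the editor, which usually adds a `wp-image-{ID}` class to each `<img>`; the function name `collect_content_images()` is hypothetical. Inside WordPress you would then pass each non-null `id` to `wp_get_attachment_image_src($id, 'full')`, which returns the URL, width and height (or `false`). For images without that class, `attachment_url_to_postid()` can look up the ID from the original upload URL.

```php
<?php
// Hypothetical helper: collect src, attachment ID (from a wp-image-{ID} class)
// and any width/height attributes from every <img> in an HTML string.
function collect_content_images(string $html): array
{
    $images = [];
    if (trim($html) === '') {
        return $images; // nothing to parse
    }

    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them
    // (HTML5 tags such as <figure> trigger "Tag ... invalid" warnings).
    $previous = libxml_use_internal_errors(true);
    // The XML prolog tells libxml that the input is UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//img[@src]') as $img) {
        $id = null;
        if (preg_match('/\bwp-image-(\d+)\b/', $img->getAttribute('class'), $m)) {
            $id = (int) $m[1];
        }
        $images[] = [
            'src'    => $img->getAttribute('src'),
            'id'     => $id,
            'width'  => $img->hasAttribute('width') ? (int) $img->getAttribute('width') : null,
            'height' => $img->hasAttribute('height') ? (int) $img->getAttribute('height') : null,
        ];
    }
    return $images;
}
```

Each entry keeps the `src` even when no ID is found, so the slider can still fall back to the plain URL.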

    Thread Starter oguruma

    (@oguruma)

    @stefanue I’m not sure I understand. Where in the single template do I call DOMDocument()? Or are you saying that I should do it when the post is updated?

    Moderator bcworkz

    (@bcworkz)

    As long as your code runs before the slider HTML is output, it can reside anywhere considered “proper” (theme or plugin) within WordPress.

    Doesn’t RoyalSlider use pre-defined slides to do its thing? It doesn’t create slider content completely anew for each request, does it? If so, I’d think it would be better to extract the images and create the slider definitions when the post is saved or updated. Doing it on every request, which is what would happen with template code, seems very inefficient.
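A sketch of the "extract once when the post is saved" idea, under stated assumptions. In WordPress the save step would typically run in a `save_post` callback that stores the list with `update_post_meta()`, and the template would read it back with `get_post_meta()`; here a plain array passed by reference stands in for post meta so the example runs on its own. All function names are hypothetical, and the simple regex is only illustrative.

```php
<?php
// Hypothetical helper: unique image URLs from double-quoted src attributes.
function extract_image_urls(string $content): array
{
    preg_match_all('/<img[^>]*src="([^"]+)"/i', $content, $m);
    return array_values(array_unique($m[1]));
}

// Runs once, when the post is saved or updated (save_post in WordPress).
// $store stands in for post meta.
function on_post_saved(int $post_id, string $content, array &$store): void
{
    $store[$post_id] = extract_image_urls($content);
}

// Runs on every page view: only a lookup, no HTML parsing.
function slider_urls_for(int $post_id, array $store): array
{
    return $store[$post_id] ?? [];
}
```

The parsing cost is paid once per edit instead of once per page view; the trade-off is that the stored list must be refreshed whenever the content changes.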

    @oguruma If you’re on the single template, you can access post_content from the global $post object.

    For example:

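    global $post;
    $collected_images = array();
    $post_content = $post->post_content;

    if (!empty($post_content)) {
        $dom = new DOMDocument();
        // Keep libxml warnings about HTML5 tags (header, footer, figure...) off the page.
        libxml_use_internal_errors(true);
        // The XML prolog tells libxml the string is UTF-8
        // (mb_convert_encoding() with 'HTML-ENTITIES' is deprecated as of PHP 8.2).
        $dom->loadHTML('<?xml encoding="UTF-8">' . $post_content); // load the HTML
        libxml_clear_errors();

        $xpath = new DOMXPath($dom);
        $nodes = $xpath->query('//img');
        foreach ($nodes as $node) {
            $img_src = $node->getAttribute('src');
            if ($img_src) {
                $collected_images[] = $img_src;
            }
        }
    }
    
    echo '<pre>';
    var_dump($collected_images);
    echo '</pre>';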

    The URLs from each image's src attribute are now collected in $collected_images.

    I hope this helps.

    MK

    (@mkarimzada)

    I like what Stefan suggested, but I think parsing the whole post into a DOM is too slow. Instead, I would match the images with a regex in a late-running the_content filter.

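    function images_in_the_content($content)
    {
        // $matches[1] holds the captured src value of every <img> tag.
        if (preg_match_all('/<img[^>]*src="([^"]+)"/i', $content, $matches)) {
            foreach ($matches[1] as $value) {
                echo '<pre>';
                print_r($value);
                echo '</pre>';
            }
        }
    
        // Return the content unchanged; this filter only inspects it.
        return $content;
    }
    
    // A large priority number makes this run after most other the_content filters.
    add_filter('the_content', 'images_in_the_content', 15000);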
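The pattern above only matches double-quoted `src` attributes, and its unanchored `src="` also matches the tail of attributes such as `data-src` (used by some lazy-loading plugins). A slightly more tolerant variant might look like this; it is still a regex, so unusual markup can still defeat it, and `image_srcs()` is a hypothetical name.

```php
<?php
// Hypothetical helper: src values of all <img> tags, accepting single or
// double quotes and optional spaces around "=". Requiring whitespace before
// "src" skips data-src; requiring "=" right after it skips srcset.
// The backreference \1 makes the closing quote match the opening one.
function image_srcs(string $content): array
{
    if (!preg_match_all('/<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1/i', $content, $matches)) {
        return [];
    }
    return $matches[2];
}
```

On markup containing a single-quoted `src` and a `data-src` placed before the real `src`, this returns the real `src` values in document order.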

    I hope this helps.

    @mkarimzada

    Hello, if I don’t want to run this in the_content filter but in a single template instead, how would I write it? Or how can I call the_content() on its own without hooking into it? Looking forward to your reply. Thank you.

    @mkarimzada Something similar to this:

    function hui_get_thumbnail( $single=true, $must=true ) {
        global $post;
        $html = '';
        if ( has_post_thumbnail() ) {
            $domsxe = simplexml_load_string(get_the_post_thumbnail());
            $src = (string) $domsxe->attributes()->src;
            $src_array = wp_get_attachment_image_src(hui_get_attachment_id_from_src($src), 'thumbnail');
            // Quote the URL inside url() with single quotes so the style attribute stays valid.
            $html = sprintf('<div class="swiper-slide" style="background-image: url(\'%s\')"></div>', esc_url($src_array[0]));
        } else {
            $content = $post->post_content;
            preg_match_all('/<img.*?(?: |\\t|\\r|\\n)?src=[\'"]?(.+?)[\'"]?(?:(?: |\\t|\\r|\\n)+.*?)?>/sim', $content, $strResult, PREG_PATTERN_ORDER);
            $images = $strResult[1];
            $counter = count($images);
            $i = 0;
            foreach($images as $src){
                $i++;
                // Prefer the registered thumbnail size; keep the original URL if none is found
                // (wp_get_attachment_image_src() returns false in that case).
                $src2 = wp_get_attachment_image_src(hui_get_attachment_id_from_src($src), 'thumbnail');
                if( $src2 ){
                    $src = $src2[0];
                }
                $item = sprintf('<div class="swiper-slide"><img src="%s" /></div>', esc_url($src));
                if( $single ){
                    return $item; // only the first image is wanted
                }
                $html .= $item;
                // Cap the slider at 20 images (20-39 found) or 40 images (40 or more found).
                if(
                    ($counter >= 20 && $counter < 40 && $i >= 20) ||
                    ($counter >= 40 && $i >= 40)
                ){
                    break;
                }
            }
        }
        return $html;
    }

    Called like this:
    <?php echo hui_get_thumbnail(false,true);?>

    The method above no longer finds the images. I’d like to use your method, but without the_content(). How should I write the code? I’m a novice, thank you.

    MK

    (@mkarimzada)

    @huanhuanhuan You can use conditional tags inside the_content hook to run your function on a specific page, or page template. https://developer.wordpress.org/themes/basics/conditional-tags/#is-a-page-template

    if (is_page_template('custom_template.php') && preg_match_all('/<img[^>]*src="([^"]+)"/i', $content, $matches)) { 
    // Do something with images (the template check runs first, so the regex only runs where needed)
    }

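    The issue with your function may be that $post->post_content holds the raw content as stored in the database, not the output of the the_content filters, so anything those filters add (shortcode output such as galleries, for example) is missing from it. If you are using Gutenberg, you can use do_blocks() or parse_blocks() to get post_content rendered with its HTML.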

    if ( has_blocks( $post->post_content ) ) {
        // Parse blocks
        $blocks = parse_blocks( $post->post_content );
    
        foreach( $blocks as $block ) {
            // For example to render the block. You can look for images here as well.
            echo render_block( $block );
        }
    }
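To look for images in the parsed blocks without rendering them, you could walk the array recursively. This sketch assumes the shape `parse_blocks()` returns (each block has `blockName`, `attrs` and `innerBlocks`) and that image blocks store their attachment ID in `attrs['id']`; it uses a hand-built array so it runs outside WordPress, and the function name is hypothetical. Posts written before Gutenberg come back from `parse_blocks()` as a block whose `blockName` is `null`, so for those the regex or DOMDocument route is still needed.

```php
<?php
// Hypothetical helper: attachment IDs of all core/image blocks, including
// ones nested in groups, columns or galleries (found in 'innerBlocks').
function image_ids_from_blocks(array $blocks): array
{
    $ids = [];
    foreach ($blocks as $block) {
        if (($block['blockName'] ?? null) === 'core/image' && !empty($block['attrs']['id'])) {
            $ids[] = (int) $block['attrs']['id'];
        }
        if (!empty($block['innerBlocks'])) {
            $ids = array_merge($ids, image_ids_from_blocks($block['innerBlocks']));
        }
    }
    return $ids;
}
```

Each ID can then go to `wp_get_attachment_image_src()` to get the URL, width and height the slider needs.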

    In your case I would stick with the the_content hook; performance-wise it should be faster. Just check for the correct page or page template.

    I hope this helps.

    @mkarimzada
    Oh my god! You are great! I think using the_content is actually faster in terms of performance!

    I only used your first snippet, to check whether it was a single post. I didn’t use the second one because my articles were written long ago, before the Gutenberg editor existed.

    I still have two problems to solve. I hope you can help me, thank you very much!
    1. I also extract the article images to build the gallery, and I want the gallery to load in front of the article title, not inside the article. So I was wondering if there is a way to make it appear before the title instead of in the content block.

    2. I built the gallery with swiper.js, which needs the images twice. Based on your code, I changed it to the form below. It does what I want, but is it wrong? And does running foreach() twice affect loading speed?

     
    function images_in_the_content($content)
    {
        if (is_single() && preg_match_all('/<img[^>]*src="([^"]+)"/i', $content, $matches)){
            // Build the markup in $html with .= (print_r() echoes and returns true,
            // so $html never held the markup).
            $html = '<div class="swiper-main">
        <div thumbsSlider="" class="swiper mySwiper" >
          <div class="swiper-wrapper">';
            foreach ($matches[1] as $value) {
                $html .= '<div class="swiper-slide"><img src="' . esc_url($value) . '"/> </div>';
            }
             
            $html .= '</div></div>
            <div class="hh_swiper"><div style="--swiper-navigation-color: #666; --swiper-pagination-color: #f4f5f9" class="swiper mySwiper2">
            <div class="swiper-wrapper">';
            foreach ($matches[1] as $value) {
                $html .= '<div class="swiper-slide"><img src="' . esc_url($value) . '"/> </div>';
            }
            $html .= '</div>
          <div class="swiper-button-next"></div>
          <div class="swiper-button-prev"></div>
        </div>
        </div>
    </div>';

            // Put the gallery in front of the post content.
            $content = $html . $content;
        }
        return $content;
    }
    
    add_filter('the_content', 'images_in_the_content', 15000);

    Looking forward to your reply, thank you again!

    • This reply was modified 5 months, 2 weeks ago by huanhuanhuan.

    @mkarimzada
    I loop over the images twice with foreach() because I want a gallery with thumbnails: one large slider and one small thumbnail strip, like the one below.

    So I don’t know whether I’m writing it right or wrong.

    [Screenshot of the desired gallery: 20220111154322.png]

    • This reply was modified 5 months, 2 weeks ago by huanhuanhuan.

    @mkarimzada
    Hello, I just solved the first problem. Could you please help me check whether there is anything wrong with my code for the second one? Thank you very much!

    MK

    (@mkarimzada)

    @huanhuanhuan If you plan to have both a thumbnail strip and a main slider (assuming the markup for both is the same), let’s build the slides once in a $sliders variable and then print them twice with sprintf(). You could write a more dynamic solution, but the easiest method would be:

    add_action('the_post', 'pull_images_in_the_content_before_title', 10, 2);
    
    function pull_images_in_the_content_before_title($post, $query) {
      // the_post fires for every post set up by a loop, so bail out early
      // unless this is a single post view.
      if (!is_single()) {
          return;
      }

      $content = get_the_content();
      $sliders = '';
    
      if (preg_match_all('/<img[^>]*src="([^"]+)"/i', $content, $matches)) {
          // Build the slide markup once...
          foreach ($matches[1] as $value) {
              $sliders .= '<div class="swiper-slide"><img src="' . esc_url($value) . '"/> </div>';
          }
          
          // ...and print it into both sliders: %1$s appears twice in the format string.
          printf('<div class="swiper-main"><div thumbsSlider="" class="swiper mySwiper"><div class="swiper-wrapper">%1$s</div></div><div class="hh_swiper"><div style="--swiper-navigation-color: #666; --swiper-pagination-color: #f4f5f9" class="swiper mySwiper2">
            <div class="swiper-wrapper">%1$s</div><div class="swiper-button-next"></div><div class="swiper-button-prev"></div></div></div></div>',
            $sliders
          );
      }
    }

    I’ve switched from the the_content hook to the the_post hook, so your gallery should now appear before the title.
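If you would rather call the gallery from a template such as single.php than from a hook, a variant that returns the markup instead of echoing it lets the template decide where it goes (for example, before the title). This is a sketch, not the code above: `build_swiper_gallery()` is a hypothetical name, `htmlspecialchars()` stands in for WordPress's `esc_url()` so it runs outside WordPress, and the class names are the ones used in this thread.

```php
<?php
// Hypothetical helper: two-row Swiper markup (thumbnails + main slider)
// built from a list of image URLs. Returns '' when there are no images.
function build_swiper_gallery(array $urls): string
{
    if ($urls === []) {
        return '';
    }
    $slides = '';
    foreach ($urls as $url) {
        $slides .= sprintf(
            '<div class="swiper-slide"><img src="%s" /></div>',
            htmlspecialchars($url, ENT_QUOTES, 'UTF-8') // stand-in for esc_url()
        );
    }
    // The slides are built once and placed in both sliders via %1$s.
    return sprintf(
        '<div class="swiper-main">'
        . '<div thumbsSlider="" class="swiper mySwiper"><div class="swiper-wrapper">%1$s</div></div>'
        . '<div class="hh_swiper"><div style="--swiper-navigation-color: #666; --swiper-pagination-color: #f4f5f9" class="swiper mySwiper2">'
        . '<div class="swiper-wrapper">%1$s</div>'
        . '<div class="swiper-button-next"></div><div class="swiper-button-prev"></div>'
        . '</div></div></div>',
        $slides
    );
}
```

In single.php you might run the same `preg_match_all()` on `get_the_content()` and then `echo build_swiper_gallery($matches[1]);` just before the title.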

    I hope this helps.

    @mkarimzada
    Wow! Dear Mr. Mkarimzada, you are such a good Samaritan!

    I was so surprised to see your reply. I’ve already tried your code.

    The code you gave really solved my two problems. Thank you again!

    However, some problems remain.

    For example, some widgets on my site are tied to the current post and contain galleries, like the ones in the picture below. My theme’s widgets probably also use the the_post hook, and I don’t have a solution for that, so I’m still using the the_content hook.

    [Screenshot of the widget galleries: image.png]

    Whether the problem is solved or not, I am very grateful to you!

    @mkarimzada
    I solved the problem with the code you sent me: I just call the function in single.php instead of using the the_post hook! Hee hee, I’m really a beginner; I only just realized it. Thank you very much!

    MK

    (@mkarimzada)

    @huanhuanhuan You’re welcome. I’m glad that it worked.
