Title: Regex bug in UrlRequest.php
Last modified: August 30, 2018

---

# Regex bug in UrlRequest.php

 *  Resolved [shareasale-wp](https://wordpress.org/support/users/shareasale-wp/)
 * (@shareasale-wp)
 * [7 years, 10 months ago](https://wordpress.org/support/topic/regex-bug-in-urlrequest-php/)
 * In the function `extractAllUrls()`, the `preg_match_all` call should also exclude
   parentheses and semicolons from matched URLs, not just hash signs (fragment anchors)
   and question marks (query strings).
 * Line 1225 of UrlRequest.php:
 *     ```
       preg_match_all(
                       '/' . str_replace('/', '\/', $baseUrl) . '[^"\'#\? ]+/i', // find this
                       $this->_response['body'], // in this
                       $matches // save matches into this array
                   )
       ```
   
 * Otherwise HTML like this will be crawled:
 * `style="background-image: url(http://www.example.com/wp-content/uploads/2018/08/image.jpg);"`
 * … and return `http://www.example.com/wp-content/uploads/2018/08/image.jpg);`,
   including the closing parenthesis and semicolon. This causes 404 errors in
   the static HTML output. Fortunately, it’s a simple fix in the regex pattern:
 * `'/' . str_replace('/', '\/', $baseUrl) . '[^"\'#\?); ]+/i'`
 * Thanks!
    -  This topic was modified 7 years, 10 months ago by [shareasale-wp](https://wordpress.org/support/users/shareasale-wp/).
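
To make the difference between the two patterns concrete, here is a minimal standalone sketch that runs both against the `style` attribute from the post. The `$baseUrl` and `$body` values are hypothetical stand-ins for the plugin's `$baseUrl` and `$this->_response['body']`; this is not the plugin's actual code path.

```php
<?php
// Hypothetical stand-ins for the plugin's $baseUrl and $this->_response['body'].
$baseUrl = 'http://www.example.com';
$body = '<div style="background-image: '
    . 'url(http://www.example.com/wp-content/uploads/2018/08/image.jpg);"></div>';

// Same escaping as the quoted snippet: only forward slashes are escaped.
$escapedBase = str_replace('/', '\/', $baseUrl);

// Original character class: a match stops at quotes, '#', '?' and spaces.
$originalPattern = '/' . $escapedBase . '[^"\'#\? ]+/i';

// Proposed character class: a match additionally stops at ')' and ';'.
$fixedPattern = '/' . $escapedBase . '[^"\'#\?); ]+/i';

preg_match_all($originalPattern, $body, $before);
preg_match_all($fixedPattern, $body, $after);

echo $before[0][0], PHP_EOL; // URL with the trailing ");" still attached
echo $after[0][0], PHP_EOL;  // URL ending at "image.jpg"
```

With the original pattern the match only ends at the closing double quote, so the `);` from the CSS `url()` call is captured as part of the URL. Adding `)` and `;` to the negated character class ends the match one step earlier.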

Viewing 1 replies (of 1 total)

 *  Plugin Author [Leon Stafford](https://wordpress.org/support/users/leonstafford/)
 * (@leonstafford)
 * [7 years, 10 months ago](https://wordpress.org/support/topic/regex-bug-in-urlrequest-php/#post-10641936)
 * Thanks again, ShareASale,
 * I’ve added that in:
 * [https://github.com/leonstafford/wordpress-static-html-plugin/commit/bfe48c890f26718492ecb3570bfc401daffb2f61](https://github.com/leonstafford/wordpress-static-html-plugin/commit/bfe48c890f26718492ecb3570bfc401daffb2f61)
 * There was a band-aid solution made previously with:
 * [https://github.com/leonstafford/wordpress-static-html-plugin/commit/daa21a842d33e14bc1c7680759c9357c03595e77](https://github.com/leonstafford/wordpress-static-html-plugin/commit/daa21a842d33e14bc1c7680759c9357c03595e77)
 * I will add better tests around these cases and look to switch/offer an option
   to use DOMDocument parsing/rewriting instead of regex, depending on the performance
   hit.
 * I hope you’re on 5.5.1 of the plugin and that’s giving you a speed boost to deployment
   times.
 * There’s a Slack community here, and I’d love to chat more with you about your
   usage and how you’d like to see the plugin improve:
 * [https://wp2static.com/community/](https://wp2static.com/community/)
 * Cheers,
 * Leon
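
As a rough illustration of the DOMDocument approach mentioned in the reply, the sketch below collects `href` and `src` values with PHP's built-in DOM extension instead of a regex. It is only an assumption about what such an option could look like, not the plugin's implementation; `$baseUrl` and `$html` are hypothetical sample inputs.

```php
<?php
// Hypothetical inputs; in the plugin these would come from the crawl settings
// and the fetched response body.
$baseUrl = 'http://www.example.com';
$html = '<html><body>'
    . '<a href="http://www.example.com/about/#team">About</a>'
    . '<img src="http://www.example.com/wp-content/uploads/2018/08/image.jpg">'
    . '<a href="https://other.example.org/">External</a>'
    . '</body></html>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of emitting them for imperfect markup.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$urls = [];
$xpath = new DOMXPath($doc);

// Visit every href and src attribute node in document order.
foreach ($xpath->query('//@href | //@src') as $attr) {
    $value = $attr->nodeValue;
    // Keep only URLs that start with the site's base URL (simple prefix check).
    if (strpos($value, $baseUrl) === 0) {
        // Drop the fragment and query string, as the regex's character class does.
        $urls[] = preg_replace('/[#?].*$/', '', $value);
    }
}

$urls = array_values(array_unique($urls));
print_r($urls);
```

Because the parser hands back whole attribute values, stray delimiters such as quotes never leak into the result. Note that this sketch only reads `href` and `src`; URLs inside inline `style` attributes, like the `url(...)` case reported above, would still need separate handling.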


The topic ‘Regex bug in UrlRequest.php’ is closed to new replies.

