Support » Plugin: Pardot » Incorrect regex for HTTP to HTTPS?

  • Resolved cjhaas

    (@chrisvendiadvertisingcom)


    In the file pardot-plugin-class.php on line 635 (version 1.4) there’s a regex for transforming non-secure URLs into secure ones but it is breaking for us. The last pattern is (\/\S*)? which is an optional group that’s a forward slash followed by zero or more non-space characters but since we’re parsing HTML I think we’d want single or double-quote terminator in there, too. We changed it to [^\"']* which is zero or more characters that aren’t one of the two quotes. (Technically you’d want to balance quotes but I just needed to get this working.)

    Our full line is (we also disabled capturing on the first group)L

    $reg_exUrl = "/(?:http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}[^\"']*/";

    I think the bigger problem stems from our HTML including some fully-qualified domains for CSS/JS and the original code wasn’t intended for this.

    https://wordpress.org/plugins/pardot/

Viewing 13 replies - 1 through 13 (of 13 total)
  • Yeah, we are seeing this too. Thanks for the fix.

    However, your proposed regex also doesn’t work for domains that don’t end in 2 or 3 characters. I’d propose the following:

    /(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+(\/\S*)?/

    cjhaas

    (@chrisvendiadvertisingcom)

    Well, we ended up going down a much more complicated but safer path. The regex never worked on our iFrame and that was always ugly to begin with. So we built a dedicated parser that actually walks the HTML and converts it. It also exposes four WordPress filters for developers to override specific parts of the logic. Maybe overkill but it works. You can see the full class here:

    https://www.vendiadvertising.com/cjhaas/pardot/view-pardot.php

    To use it

    1. Create a new file called pardot-http-to-https.php in the pardot includes directory and copy the contents of the above URL into it.
    2. In pardot.php add require_once PARDOT_PLUGIN_DIR . '/includes/pardot-http-to-https.php'; before the other require statements.
    3. Remove the contents of the if statement on line 631 (of version 1.4) and replace them with $form_html = http_to_https::convertHttpToHttps( $form_html );

    I don’t have time to perform a pull request right now (someone else, please feel free!) but might be able to this weekend or next week.

    Plugin Author Cliff Seal

    (@cliffseal)

    I’m hoping to have a release out for this today or tomorrow.

    To others reading this thread: please don’t add a new file and edit the plugin itself in this manner. It may be overridden when you upgrade, and may leave an orphaned file in your directory.

    Plugin Author Cliff Seal

    (@cliffseal)

    I made a couple of tweaks I’d like to get tested before release.

    Please download the 1.4.1 branch (also PR #11) to see if things work for you. I’ll release officially soon if so. Thanks!

    Cliff,

    Your RegEx only works on domains that have 3 components.

    e.g. it’ll work on http://www.xyz.com but not xyz.com

    because your RegEx explicitly looks for <string>.<string>.<string>

    I don’t know why you are doing this when you don’t care what the domain is. All you care about is capturing the path that comes after the domain.

    So I recommend the following regex:

    /(http|https)\:\/\/[a-zA-Z0-9\-\.]+(\/\S*)?/

    This matches any domain, regardless of what it is and how many dotted components it has, and captures the path that comes after the domain.

    Plugin Author Cliff Seal

    (@cliffseal)

    Locked the other thread and moved it here to keep the conversation in one place. 🙂

    @ellimerino Right. Pardot supports tracker subdomains, so that should be all that it’s looking for. Is there a case in which a Pardot form is being hosted in a single example.com domain?

    Also, we’re not only “capturing the path that comes after”. We’re looking in the embed code for only the domain that’s hosting the form, and we’re replacing it with https://go.pardot.com, then appending whatever was already there.

    The RegEx does *not* look for only the domain that hosts the form. It matches *any domain*, and the domain matching RegEx does not match all valid tracker domains, only domains of the form x.y.z. So it will break on perfectly valid tracker domains where there are more than 3 components to the domain name, like tracker.company.co.uk or tracker.department.university.edu. There are also likely corner cases where you have 2 component domain names (e.g. tracker.local) for test servers too.

    I would suggest one of the following:
    1) Match on any domain (what the code currently tries to do). Modify the RegEx to allow any valid domain (an arbitrary string containing a-z, A-Z, 0-9, – and .)
    2) Match on only the exact tracker domain. Get the tracker domain from Pardot and drop it into the regex after escaping any special characters like . and -.

    Plugin Author Cliff Seal

    (@cliffseal)

    @ellimerino Have you tested the branch? If it’s not matching properly, please send me the tracker domain in use and what the output of the regex function is (if it’s matching). Thanks!

    Seriously? anyone can inspect the code and tell you the conditions where it will fail and you ask that?

    Plugin Author Cliff Seal

    (@cliffseal)

    The regex should be looking for a minimum of 3 components, but should not fail on more than that. So yes, it would be most helpful if you could check the code and let me know the results.

    Plugin Author Cliff Seal

    (@cliffseal)

    FYI, I’ve tested the proposed changes repeatedly and all possible iterations seem to be working (tested 3-6 components of a domain).

    Feel free to test out the RC for v1.4.1 prior to release to give me feedback: https://github.com/Pardot/pardot-for-wordpress/archive/1.4.1.zip (use what’s in the “trunk” folder)

    Plugin Author Cliff Seal

    (@cliffseal)

    Hi! I’ve released 1.4.1, which solves this issue in all my tests. Please let me know if you continue to have trouble.

Viewing 13 replies - 1 through 13 (of 13 total)
  • The topic ‘Incorrect regex for HTTP to HTTPS?’ is closed to new replies.