• I’m planning to file a ticket at https://core.trac.wordpress.org/newticket, but I wanted to check on the forums first.

    I am trying to put this HTML on a WordPress page:

    
    < a href="git+https://github.com/user/repo@1.0.0#egg=package-1.0.0" > package-1.0.0 < /a >
    

    I have tried on two different WordPress installations, using both the frontend page HTML editor and the REST API v2. Every time, the href is corrupted when the page is saved. The git+https: scheme is stripped, leaving:

    
    < a href="//github.com/user/repo@1.0.0#egg=package-1.0.0" > package-1.0.0 < /a >
    

    I’m guessing it will do the exact same thing if I paste the unescaped HTML into this box:

    package-1.0.0

    I know little about WordPress, but I would guess that it is doing some kind of parsing/introspection on the links (it must, to know where to add class="broken_link"). It appears that WordPress does not keep the original link, so any link other than the kind it’s expecting gets corrupted. But I really don’t know what’s going on here.

    Assuming this happens to everyone else too, I’m planning to file a ticket. (I’m certainly guessing that it’s not just me, since I already tried on two different WordPress installations.)

Viewing 8 replies - 1 through 8 (of 8 total)
  • I’m not sure what “the frontend page HTML editor” is.
    I tried your link in a paragraph block (by editing the HTML) in the block editor, and the Preview had the correct link. However, clicking it gives an unknown-protocol error, because the browser does not recognize that protocol. That format only works inside of git. It should not be in a link on a web page.

    Thread Starter nononsensetraveler

    (@nononsensetraveler)

    Thank you! Can you post the code that worked? (These forums are powered by WordPress, as I understand it, so anything that works here I can just copy over.) There’s probably something obvious that I’m missing that I can just copy from yours.

    Just to clarify, did you actually try to save the page?
    Maybe I wasn’t clear. When I edit the HTML in the block editor and switch from HTML to the visual view without publishing, the link is always correct. It is only when I actually save the page that the link gets changed.

    [Screenshot: link in block editor]

    • This reply was modified 6 years, 11 months ago by nononsensetraveler. Reason: trying to insert screenshot

    The Preview does a draft save, and that’s what I used.
    I just copied your link from here and removed the extra spaces.

    Thread Starter nononsensetraveler

    (@nononsensetraveler)

    Huh. That’s very strange. That’s very valuable to know.
    I’ve tried it on WordPress 5.1.1 and WordPress 5.2.1 and in both cases the href was changed. As you can see, it was also changed when I posted here, and according to the meta tag on https://wordpress.org/support/topic/corruption-of-anchor-href/, this site is running WordPress 5.3-alpha-45383.

    
    <meta name="generator" content="WordPress 5.3-alpha-45383" />
    
    
    < a href="git+https://github.com/user/repo@1.0.0#egg=package-1.0.0">package-1.0.0< /a>
    

    (Sorry about the extra spaces. Even with the &lt; entity, the forum converts the code block into an actual link unless the extra spaces are there.)

    package-1.0.0

    Can I ask what version of WordPress you’re running?

    • This reply was modified 6 years, 11 months ago by nononsensetraveler. Reason: fix code block

    I tried your link on 5.2.1 in the block editor.

    Perhaps what you are looking at is from
    https://developer.wordpress.org/reference/hooks/kses_allowed_protocols/

    You could have a plugin that is changing something.

    Thread Starter nononsensetraveler

    (@nononsensetraveler)

    Thank you so much! That looks like it exactly.

    https://developer.wordpress.org/reference/hooks/kses_allowed_protocols/

    Filters the list of protocols allowed in HTML attributes.

    https://core.trac.wordpress.org/browser/tags/5.2/src/wp-includes/functions.php#L5811

    
    $protocols = array_unique( (array) apply_filters( 'kses_allowed_protocols', $protocols ) );
    

    From reading https://developer.wordpress.org/reference/functions/apply_filters/, the filter called ‘kses_allowed_protocols’ is being applied to the array of $protocols…makes sense.

    
     * @return string[] Array of allowed protocols. Defaults to an array containing 'http', 'https',
     *                  'ftp', 'ftps', 'mailto', 'news', 'irc', 'gopher', 'nntp', 'feed', 'telnet',
     *                  'mms', 'rtsp', 'svn', 'tel', 'fax', 'xmpp', 'webcal', and 'urn'. This covers
     *                  all common link protocols, except for 'javascript' which should not be
     *                  allowed for untrusted users.
    

    The comment matches the list of values given for $protocols before the filter is applied.

    
    $protocols = array( 'http', 'https', 'ftp', 'ftps', 'mailto', 'news', 'irc', 'gopher', 'nntp', 'feed', 'telnet', 'mms', 'rtsp', 'svn', 'tel', 'fax', 'xmpp', 'webcal', 'urn' );
    ...
    $protocols = array_unique( (array) apply_filters( 'kses_allowed_protocols', $protocols ) );
    

    In any case, this list includes svn, but not git. I just tried an svn link instead, and that works perfectly. Bingo, I think we’ve got it.
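
The behaviour described here (a scheme that is not on the allow-list gets dropped, leaving the rest of the URL) can be sketched in a few lines. This is a simplified illustration only, not WordPress's actual kses code; the function name example_strip_disallowed_scheme() and the shortened allow-list are assumptions made for the example.

```php
<?php
// Hypothetical helper for illustration; NOT a WordPress function.
// Drops the scheme of a URL unless it appears in the allow-list.
function example_strip_disallowed_scheme( string $url, array $allowed ): string {
    $parts = explode( ':', $url, 2 );
    if ( count( $parts ) < 2 ) {
        return $url; // No colon, so there is no scheme to check.
    }
    list( $scheme, $rest ) = $parts;
    // A "/" or "?" before the first colon means the colon is not a scheme separator.
    if ( preg_match( '%[/?]%', $scheme ) ) {
        return $url;
    }
    if ( in_array( strtolower( $scheme ), $allowed, true ) ) {
        return $url; // Allowed scheme: keep the URL untouched.
    }
    return $rest; // Disallowed scheme: keep only what follows the colon.
}

// Shortened allow-list, assumed for the example.
$allowed = array( 'http', 'https', 'svn' );

echo example_strip_disallowed_scheme( 'git+https://github.com/user/repo@1.0.0#egg=package-1.0.0', $allowed ), PHP_EOL;
// prints: //github.com/user/repo@1.0.0#egg=package-1.0.0
echo example_strip_disallowed_scheme( 'svn://example.org/repo', $allowed ), PHP_EOL;
// prints: svn://example.org/repo
```

The first result matches the corrupted href from the original post, which is consistent with the allow-list explanation.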

    Hmm, still don’t know why git works for you and how to duplicate whatever’s going right for you. For git to be allowed just like svn, something would need to actually add it to the $protocols array, right? And clearly something is adding it to the $protocols array, on your install.

    The $protocols array is a static variable of the wp_allowed_protocols() function. I know almost nothing about PHP, but as I understand it, that means the $protocols array persists between calls to that function, although only code running inside the function (including any filter it applies) can change it.

    
    static $protocols = array();
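
A note on what static means here, as a minimal sketch: in PHP, a static local variable keeps its value between calls, but it is only visible inside the function that declares it. The function example_allowed_protocols() below is hypothetical and only loosely modelled on the pattern quoted above.

```php
<?php
// Hypothetical function for illustration; not the WordPress implementation.
function example_allowed_protocols(): array {
    static $protocols = array(); // Initialised once, then kept between calls.

    if ( empty( $protocols ) ) {
        $protocols = array( 'http', 'https', 'svn' );
    }

    return $protocols; // Arrays are returned by value, so callers get a copy.
}

$list   = example_allowed_protocols();
$list[] = 'git'; // Changes only the caller's copy, not the static array.

var_dump( in_array( 'git', example_allowed_protocols(), true ) );
// prints: bool(false)
```

So outside code cannot reach the static array directly. In a function shaped like this, the only way in would be a hook that the function itself calls, such as a filter.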
    

    …but as you alluded to, the most likely thing to change the $protocols array is a kses_allowed_protocols filter.
    I was tripped up at first by the word “filter”, which sounds like it can only remove allowed protocols, not add them. But I just searched and apparently the word “filter” is also used for adding protocols, e.g.

    
    // Whitelist the steam:// and ts3server:// protocols for links
    add_filter( 'kses_allowed_protocols', function( $protocols ) {
        $protocols[] = 'steam';
        $protocols[] = 'ts3server';
        return $protocols;
    } );
    

    (Thanks to you pointing me to kses_allowed_protocols, I’m now finding a ton of information.)
    (PHP syntax is confusing to me, but I gather that $protocols[] = 'blah' appends to the end of the array.)
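
To make the "filter" idea concrete, here is a rough, self-contained sketch of the general pattern: each registered callback receives the current value and returns a (possibly changed) value, so a callback can add entries as easily as remove them. The ExampleFilters class is a made-up stand-in for illustration and is not how add_filter() or apply_filters() are implemented in WordPress.

```php
<?php
// Hypothetical stand-in for a filter registry; NOT WordPress's implementation.
final class ExampleFilters {
    /** @var callable[] Registered callbacks, run in order. */
    private static array $callbacks = array();

    public static function add( callable $callback ): void {
        self::$callbacks[] = $callback;
    }

    public static function apply( array $value ): array {
        foreach ( self::$callbacks as $callback ) {
            $value = $callback( $value ); // Each callback returns the new value.
        }
        return $value;
    }
}

// A callback that appends an entry ($array[] = ... adds to the end).
ExampleFilters::add( function ( array $protocols ): array {
    $protocols[] = 'git+https';
    return $protocols;
} );

// A callback that removes an entry.
ExampleFilters::add( function ( array $protocols ): array {
    return array_values( array_diff( $protocols, array( 'ftp' ) ) );
} );

print_r( ExampleFilters::apply( array( 'http', 'ftp' ) ) );
// The resulting array contains 'http' and 'git+https'.
```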

    There are a lot of places in the WordPress source code where the protocols array is touched, e.g. in wp-includes/kses.php by the name $allowed_protocols.
    But there are very few places in the WordPress source code where the string “git” appears.
    I just went through all twelve appearances of that three-character string in the WordPress source code, and none of them look like they could possibly end up on the $protocols array, even indirectly. (9/12 appearances are actually in comments, and the others are part of larger strings that could not be part of a URI, such as the string ‘.git’ in a list of file extensions.)

    In view of that, uh, not to contradict you when you clearly know way more about this stuff than I do, but it seems more likely that you have a plugin that is adding git to the $protocols whitelist, rather than me having a plugin that is removing it.

    In any case, I still don’t really understand what’s going on. My working hypothesis is that you somehow have a plugin or a custom theme functions.php that’s adding git to the whitelist without you knowing about it, which obviously sounds really strange. Either way, it sounds like my next step should be to try to make a plugin. I’ve never done that, but even without knowing what I’m doing, I can sort of adapt what other people have written.

    
    // Whitelist the git protocols for links
    add_filter( 'kses_allowed_protocols', function( $protocols ) {
        $protocols[] = 'git+ssh';
        $protocols[] = 'git+https';
        return $protocols;
    } );
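
One way the snippet above might be packaged is as a single-file plugin. This is a sketch under assumptions: the plugin name and the function name example_allow_git_protocols() are made up for illustration, and the function_exists() guard is only there so the file can also be run outside WordPress for a quick check.

```php
<?php
/**
 * Plugin Name: Allow git Link Protocols (hypothetical example)
 * Description: Adds git+ssh and git+https to the protocols allowed in HTML attributes.
 */

// Named callback so it can be tested on its own; the name is an assumption.
function example_allow_git_protocols( array $protocols ): array {
    $protocols[] = 'git+ssh';
    $protocols[] = 'git+https';
    return $protocols;
}

// add_filter() only exists when WordPress has loaded this file.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'kses_allowed_protocols', 'example_allow_git_protocols' );
}
```

Saved under wp-content/plugins (for example as allow-git-link-protocols.php, a hypothetical file name) and activated from the Plugins screen, this should have the same effect as the anonymous-function version.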
    

    I think I might still file a ticket, as a change request rather than a bug. Since svn is on the default $protocols whitelist, it seems reasonable to have git there too.

    Just to make sure, I saved the post with your link, and it still had the entire link as copied. Of course, it still doesn’t work as a link because it’s not a valid protocol in the browser.

    But no, I do not have a plugin or theme adding the protocol. It’s a test site, with 5.2.1, no plugins, and my own theme (so I know it has no filter for that).

    Thread Starter nononsensetraveler

    (@nononsensetraveler)

    Huh…baffling. Makes me wonder if making and adding a plugin is worth pursuing, but it’s the only lead I have at the moment…since a whitelist for protocols exists, it seems like it must be related somehow…


The topic ‘Corruption of anchor href’ is closed to new replies.