WordPress.org

WordPress Importer
ID="stuffhere" Stripped During Import (4 posts)

  1. SEO Dave
    Member
    Posted 1 year ago #

    Got a strange one.

    All four WordPress installs (2 local installs, one single site and one multisite/domain mapped) mentioned below are running the latest WordPress version.

    Running a domain mapped setup at http://www.musicred.com/ (that's a mapped domain not the main domain).

    Moved a single site install to a domain mapped install using the MU Domain Mapping Plugin http://wordpress.org/extend/plugins/wordpress-mu-domain-mapping/. I don't know if domain mapping is part of the issue; I suspect not, as that site is working as expected other than the issue below.

    The current site was created from two export files: the posts from the old site (just 6 posts), and around 150 lyrics posts (new content) from a localhost (single site) install. Both were imported into the new setup.

    With both xml files the import process hasn't worked correctly: what's imported isn't identical to the content within the xml files. In the localhost generated xml file, each post has a song lyric and an Amazon affiliate link at the bottom with this sort of code:

    <div><span class="testing" title="tests" id="http://www.amazon.com/Bon-Jovi-Greatest-Hits/dp/B0045EH4SS%3FSubscriptionId%3DAKIAI52UZ4Z3MZYXPEIQ%26tag%3Dclassiclitera-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3DB0045EH4SS"><img style="float:left;margin: 0 20px 10px 0;" src="http://localhost/wprobot365/wp-content/uploads/a2672_Bon_Jovi_51xz-ab-XRL._SL75_.jpg" alt="Bon Jovi - Greatest Hits" /></span><span class="testing" title="tests" id="http://www.amazon.com/Bon-Jovi-Greatest-Hits/dp/B0045EH4SS%3FSubscriptionId%3DAKIAI52UZ4Z3MZYXPEIQ%26tag%3Dclassiclitera-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3DB0045EH4SS">Bon Jovi - Greatest Hits</span><br />Dvd is Ntsc Rc-0. 17 track collection of live performances taken from various tours/locations (Madison Square Garden, London, 02, ... <br /><div style="clear:both;"></div></div>

    The theme I use converts the above span tag code into a clickable link using javascript. What's within the id is the Amazon affiliate URL. The above code generates a clickable image link and a text link with a short description; if javascript is disabled (or another theme is active) it's a non-clickable image and just text. I'm locked in to using id="url".

    Loading the xml file in a text editor shows the above code (the above is copied from the xml file). After importing into the new setup the code is changed to:

    <div><span class="testing" title="tests"><img style="float:left;margin: 0 20px 10px 0" src="http://www.musicred.com/files/music-lyrics/a2672_Bon_Jovi_51xz-ab-XRL._SL75_.jpg" alt="Bon Jovi - Greatest Hits" /></span><span class="testing" title="tests">Bon Jovi - Greatest Hits</span><br />Dvd is Ntsc Rc-0. 17 track collection of live performances taken from various tours/locations (Madison Square Garden, London, 02, ... <br /><div style="clear:both"></div></div>

    As you can see, both instances of id="Amazon Affiliate Link" are gone.

    Have tried a different format (class="testing" id="affiliateurl" title="tests") with the same result, the id="" is gone.

    Had to get around this by removing the id=" part before import (so class="testing" title="affiliateurl") and, after import, using an SQL search and replace to add it back (tests" id=") so it works. Looks like the importer or WordPress core is stripping id="stuffhere".

    I've had a similar problem with a plugin that changed the date of posts to the current date (reposter plugin): any posts that had the date changed had the id="" part stripped. Didn't figure out the cause, used a different plugin instead.

    During the import process no plugins for that domain were active (other than the import plugin and domain mapping). Have deleted all the posts (including emptying trash), activated TwentyEleven and reimported the file with the same result, so have ruled out a theme issue.

    Imported the posts into another localhost multisite install that doesn't have domain mapping with the same result, that suggests it's either the importer plugin or WordPress core.

    Also had problems with the original 6 posts as well. Youtube video code with format:

    <object width="425" height="355"><param name="movie" value="http://www.youtube.com/v/NYcnSriCpwk&rel=1"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/NYcnSriCpwk&rel=1" type="application/x-shockwave-flash" wmode="transparent" width="425" height="355"></embed></object>

    Was deleted completely

    Under Settings >> Writing

    Convert emoticons like :-) and :-P to graphics on display is ticked
    WordPress should correct invalidly nested XHTML automatically is unticked

    Couldn't think of anything else to check that might delete code.

    Some non-standard characters are not the same after import, will look into that at another time (all installs are UTF-8).

    Any ideas?

    David

    http://wordpress.org/extend/plugins/wordpress-importer/

  2. SEO Dave
    Member
    Posted 1 year ago #

    Doing a little testing, interesting results.

    <p id="123">p test</p>
    <div id="123">div test</div>
    <span id="123">span test</span>
    <h1 id="123">h1 test</h1>
    <h2 id="123">h2 test</h2>
    <h3 id="123">h3 test</h3>
    <b id="123">b test<b>
    <i id="123">i test<i>
    <em id="123">em test<em>

    id="123" is deleted for some tags, this is the output after import:

    <p>p test</p>
    <div>div test</div>
    <span>span test</span>
    <h1 id="123">h1 test</h1>
    <h2 id="123">h2 test</h2>
    <h3 id="123">h3 test</h3>
    <b>b test<b>
    <i>i test<i>
    <em>em test<em>

    So p, div, span, b, i and em tags have their id attribute stripped, but heading tags (h1, h2, h3) don't.

    Could someone confirm the above test results to rule out an issue with my setup?

    David

  3. SEO Dave
    Member
    Posted 1 year ago #

    Figured out how the id="" is being stripped.

    /wp-includes/kses.php

    around line 307 is a list of allowed attributes:

    'span' => array (
    			'class' => true,
    			'dir' => true,
    			'align' => true,
    			'lang' => true,
    			'style' => true,
    			'title' => true,
    			'xml:lang' => true,
    		),

    doesn't include id.

    a core hack adding 'id' => true, to the list works.

    Interestingly, around line 140 id is also missing from div. That has to be a mistake, WordPress shouldn't be stripping id from divs.

    Not looked into why the importer is using this code and messing up my content.

    David

  4. SEO Dave
    Member
    Posted 1 year ago #

    Got a solution for others who find the importer is using the $allowedposttags code and is stripping attributes you don't want stripped.

    I added this to the WordPress Importer plugin in the file wordpress-importer.php just below the GPL info.

    $allowedposttags["span"] = array(
     "id" => array(),
     "class" => array(),
     "dir" => array(),
     "align" => array(),
     "lang" => array(),
     "style" => array(),
     "title" => array(),
     "xml:lang" => array()
    );

    This will also work if added to your theme's functions.php file, but I added it to the plugin because I deactivate the importer after using it and I don't run into this issue on a daily basis, so I don't need the code loaded all the time.
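
The span code above replaces the whole span entry, so any attribute left off the list would be stripped. A minimal sketch of an alternative that only adds to what is already allowed, assuming a WordPress version that provides the wp_kses_allowed_html filter (3.5 or later as far as I know, so it may not exist in the version discussed in this thread). allow_extra_attributes is a hypothetical helper name, not a WordPress function, and span, div and p are only example tags:

```php
<?php
/**
 * Return a copy of an allowed-tags array with extra attributes permitted
 * on the given tags, keeping whatever was already allowed.
 * Hypothetical helper for illustration; not part of WordPress.
 */
function allow_extra_attributes( array $allowed, array $tags, array $attributes ) {
    foreach ( $tags as $tag ) {
        // Create the tag entry if the list does not know the tag yet.
        if ( ! isset( $allowed[ $tag ] ) || ! is_array( $allowed[ $tag ] ) ) {
            $allowed[ $tag ] = array();
        }
        foreach ( $attributes as $attribute ) {
            $allowed[ $tag ][ $attribute ] = true;
        }
    }
    return $allowed;
}

// Only hook in when running inside WordPress, so the file also loads standalone.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wp_kses_allowed_html', function ( $allowed, $context ) {
        // 'post' is the context used for post content.
        if ( 'post' === $context ) {
            $allowed = allow_extra_attributes( $allowed, array( 'span', 'div', 'p' ), array( 'id' ) );
        }
        return $allowed;
    }, 10, 2 );
}
```

Because the existing entries are kept, class, style, title and the rest stay allowed without having to be listed again.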

    Also added this code for allowing old YouTube embed code to not be stripped at import:

    $allowedposttags["object"] = array(
     "width" => array(),
     "height" => array()
    );
    
    $allowedposttags["param"] = array(
     "name" => array(),
     "value" => array()
    );
    
    $allowedposttags["embed"] = array(
     "src" => array(),
     "type" => array(),
     "wmode" => array(),
     "width" => array(),
     "height" => array()
    );

    So got a solution, but could miss some code if it's not using a precise format.

    Seems a bit over the top that the importer would even use the functions that strip HTML tags etc., when you consider it's going to be the site owner importing posts. Take the span code with the id attribute: I've been using that code for well over a year and have never had WordPress delete the id when added to a post or a comment directly (as I mentioned, I had an issue with another plugin stripping the code). So the importer strips more than WordPress core normally does when creating a post.

    If anyone knows a way to completely disable the code in the /wp-includes/kses.php file when the importer is running please post a solution.
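
One approach that might do this, as a rough sketch only: it assumes the stripping happens through the standard kses save filters, and that the installed importer fires the import_start and import_end actions (both are assumptions, not checked against the importer version used in this thread). kses_remove_filters() and kses_init() are WordPress core functions; the two importer_* function names are hypothetical:

```php
<?php
/**
 * Turn off the kses content filters. Returns false when WordPress
 * is not loaded, so the file can be included standalone.
 * Hypothetical helper name.
 */
function importer_disable_kses() {
    if ( function_exists( 'kses_remove_filters' ) ) {
        kses_remove_filters();
        return true;
    }
    return false;
}

/**
 * Put the kses filters back according to the current user's capabilities.
 * Hypothetical helper name.
 */
function importer_restore_kses() {
    if ( function_exists( 'kses_init' ) ) {
        kses_init();
        return true;
    }
    return false;
}

// Assumption: the importer fires these two actions around the import run.
if ( function_exists( 'add_action' ) ) {
    add_action( 'import_start', 'importer_disable_kses' );
    add_action( 'import_end', 'importer_restore_kses' );
}
```

With the filters off, nothing in the xml file is sanitized, so this only makes sense for export files you created yourself.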

    David

Topic Closed

This topic has been closed to new replies.