WordPress.org

HTML Import 2
[resolved] Don't want to import selected tag or H1 heading (7 posts)

  1. mikal42
    Member
    Posted 2 years ago #

    Everything works fine except the following:
    All of the content I am importing is surrounded by a div with the ID "content".
    I therefore select this in the Content settings, but instead of importing only the content BETWEEN the tags, it imports the tags as well. Since that ID is already in use, this breaks the page layout! This seems rather dumb...
    Also, each page has an h1 heading (as you would expect), but instead of being used only in the WordPress h1 class="entry-title", it gets repeated underneath it. This also seems rather dumb.

    Does anyone know a simple way around this? Am I missing something somewhere?

    I have around 1600 pages to import and it is only these two items that are causing any problem. Surely there is a way around them!

    http://wordpress.org/extend/plugins/import-html-pages/

  2. Mark Tuttle
    Member
    Posted 2 years ago #

    I'm not the plugin author, but

    1. I think selecting the entire node <div id="content">...</div> is a reasonable design decision. Can you say why this is a problem? Do you now have two document nodes with the same id "content"? Is this as simple as modifying your theme to omit the extra <div id="content"> </div> wrapper?

    2. For the duplicate title, I suspect this is not a problem with the plugin. I suspect the problem is that your static pages (as mine did) contain both <title>title string</title> and <h1>title string</h1> and most WordPress themes repeat the title at the top of the body with a line like

    <h1 class="entry-title"><?php the_title(); ?></h1>

    So one quick solution is just to delete this line from the theme files. It is also possible to write a small script to iterate over the pages in the database to strip the initial <h1>...</h1> element from $page['post_content'] for each $page in the database.
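
    A minimal sketch of the second approach, written in Python rather than against the WordPress database API. The function name strip_leading_h1 and the sample pages list are hypothetical stand-ins for rows read from the database; only the string handling is shown.

```python
import re

# Matches an <h1>...</h1> element only when it is the very first thing
# in the content (leading whitespace allowed).
LEADING_H1 = re.compile(r"\A\s*<h1\b[^>]*>.*?</h1>\s*", re.IGNORECASE | re.DOTALL)


def strip_leading_h1(post_content: str) -> str:
    """Return post_content without its initial <h1> element, if it has one."""
    return LEADING_H1.sub("", post_content, count=1)


# Hypothetical stand-in for the pages stored in the database.
pages = [
    {"post_title": "About", "post_content": "<h1>About</h1>\n<p>Hello.</p>"},
    {"post_title": "Contact", "post_content": "<p>No heading here.</p>"},
]

for page in pages:
    page["post_content"] = strip_leading_h1(page["post_content"])
```

    Content that does not start with an h1 is left unchanged. If the imported content still begins with a wrapper such as <div id="content">, the pattern will not match until that wrapper is removed.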

  3. mikal42
    Member
    Posted 2 years ago #

    I've found some solutions now. First, I used a text replacement tool to rename id="content" to id="pagecontent" (it could have been anything, really). A similar plugin for Drupal takes just the content within the div and not the div itself, which does seem a more sensible idea.
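
    For anyone without a text replacement tool, the rename can be sketched in a few lines of Python. The function names, the *.html glob, and the UTF-8 encoding are assumptions for illustration; run it on a copy of the static files.

```python
import re
from pathlib import Path

# id="content" or id='content' as a whole attribute; the lookbehind avoids
# touching attributes such as data-id="content".
ID_CONTENT = re.compile(r"(?<![\w-])id=([\"'])content\1")


def rename_content_id(markup: str, new_id: str = "pagecontent") -> str:
    """Rename id="content" to the given id, keeping the original quote style."""
    return ID_CONTENT.sub(lambda m: f"id={m.group(1)}{new_id}{m.group(1)}", markup)


def rename_in_tree(root: str) -> None:
    """Rewrite every .html file under root in place (assumes UTF-8 files)."""
    for path in Path(root).rglob("*.html"):
        original = path.read_text(encoding="utf-8")
        updated = rename_content_id(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
```

    After the rename, "pagecontent" is the ID to select in the plugin's Content settings.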

    I'm not sure about writing a script to get rid of the <h1>...</h1> element as I'm not really up on PHP.

    What I did do, though, was to set it in the CSS as display: none; and then used another declaration to override it for the h1 entry title. Messy, but it works. Thanks anyway...

  4. mikal42
    Member
    Posted 2 years ago #

    Renaming id="content" to id="pagecontent" was, of course, done in the original static files.

    Hopefully these comments will help anyone else with a similar problem...

  5. Stephanie Leary
    Member
    Plugin Author

    Posted 2 years ago #

    I'm not really happy with the fact that PHP's XML class (which is what the plugin uses to read your HTML file) includes the surrounding tag when selecting a node. I will try to remove it in a future version of the plugin.
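
    To illustrate the difference between selecting a node and selecting only its contents, here is a minimal sketch using Python's xml.etree.ElementTree. It is not the PHP class the plugin uses, the function name inner_html is hypothetical, and it assumes the page is well-formed XML (XHTML).

```python
import html
import xml.etree.ElementTree as ET


def inner_html(markup: str, element_id: str):
    """Return the serialized children of the element with the given id,
    without the element's own opening and closing tags."""
    root = ET.fromstring(markup)
    for elem in root.iter():
        if elem.get("id") == element_id:
            # Text that appears before the first child element.
            parts = [html.escape(elem.text or "", quote=False)]
            # Each child is serialized together with the text that follows it.
            for child in elem:
                parts.append(ET.tostring(child, encoding="unicode", method="html"))
            return "".join(parts)
    return None  # no element with that id


page = (
    '<html><body><div id="content">'
    "<h1>Title</h1><p>Hello <b>world</b>.</p>"
    "</div></body></html>"
)
content = inner_html(page, "content")
```

    Serializing the matched element itself would include the surrounding <div id="content"> tags, which is the behavior described in this thread.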

  6. Mark Tuttle
    Member
    Posted 1 year ago #

    I did not appreciate the consequences of this problem until I tried importing files this month using the tag 'body' to select the whole web page for importing. Now the body tag appears in post_content in the database, and WordPress is generating invalid HTML with one 'body' tag nested inside the WordPress 'body' tag.

  7. Stephanie Leary
    Member
    Plugin Author

    Posted 1 year ago #

    Yes, that's the most common situation where this becomes a huge problem. I'm working on a solution.

Topic Closed

This topic has been closed to new replies.
