HTML Import 2
[resolved] Problem importing pages with images (4 posts)

  1. Miquel
    Member
    Posted 1 year ago #

    Hello, I'm trying to convert my website into a WordPress site and found your HTML Import 2 plugin really promising. However, I'm having problems with the import process: after tuning the importer settings as much as I could, it imports the HTML and some of the images referenced by absolute path correctly, but not the images referenced by relative paths.

    So if the original page is:
    http://nostramar.org/marenostrum/acercade/colaboradores/aunland/index.htm

    And I import it into WordPress using your plugin, it renders:
    http://nostramar.org/acerca-de/colaboradores/aunland/

    It seems to skip importing all the images except the old header and footer (it was a FrontPage website), which were referenced by absolute path and are still linked to that path. If I block images from other domains (with a Firefox setting), the result is a perfect HTML page with no images.

    Checking the original HTML to be imported, the images are referenced by relative path:

    <div align="left">
            <table border="0" cellpadding="0" cellspacing="0" width="350" align="left">
              <tr>
                <td>
                  <p align="center"><img border="0" src="Foto1_Desierto-Tunez.jpg" width="350" height="231"><br>
                  <font face="Verdana" size="2">En el desierto de Túnez</font></td>
              </tr>
            </table>
          </div>
          <p class="MsoNormal"><span lang="FR" style="font-family: Verdana">
          <font size="2">Hola,</font><o:p><font size="2"> </font>
          </o:p>
          </span></p>
          <p class="MsoNormal"><span style="font-family: Verdana" lang="FR">
          <font size="2">Me llamo...

    Please advise on how to import images into the media library automatically. I'm still running tests, but this is a relatively big website with over 1000 pages, so getting the images as well as the HTML would be great.

    Thanks in advance for your help, and for this unique plugin.

    http://wordpress.org/extend/plugins/import-html-pages/

  2. Miquel
    Member
    Posted 1 year ago #

    Oops, I forgot to include the equivalent imported code in my previous post:

    <div id="post-23147" class="post-23147 page type-page status-publish hentry">
    											<h1 class="entry-title">Annie Unland</h1>
    
    					<div class="entry-content">
    						<p><img src="http://marenostrum.org/imagenes/barrasuperior.gif" alt="M@re Nostrum"/></p>
    <p>  		  			  		          Annie Unland
    <p><img src="../../../imagenes/bandamed.gif"/>      </p>
    <p><img src="Foto1_Desierto-Tunez.jpg"/>En el desierto de Túnez</p>
    <p>        Hola,
    </p>
    <p>        Me llamo...

    As you can see, the relative image paths have not been rewritten. After studying the case further, I see that other pages were imported properly, images and all. Why the difference? The pages that imported correctly are very similar to those that did not.
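
    For illustration, the path translation that did not happen here can be sketched in Python. This is not the plugin's code (HTML Import 2 is a WordPress plugin), only a rough model of how the relative `src` values in the imported markup resolve against the original page URL using the standard library's `urljoin`. The names `ImgSrcCollector` and `resolve_image_urls` are invented for this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img .../> are routed here as well.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def resolve_image_urls(page_url, html):
    """Return each image reference as an absolute URL, relative to page_url."""
    collector = ImgSrcCollector()
    collector.feed(html)
    return [urljoin(page_url, src) for src in collector.sources]


# Original page URL and two image references taken from the posts above.
PAGE_URL = "http://nostramar.org/marenostrum/acercade/colaboradores/aunland/index.htm"
SAMPLE = (
    '<p><img src="../../../imagenes/bandamed.gif"/></p>'
    '<p><img src="Foto1_Desierto-Tunez.jpg"/>En el desierto de Túnez</p>'
)

resolved = resolve_image_urls(PAGE_URL, SAMPLE)
for url in resolved:
    print(url)
```

    An importer that resolves the paths this way has an absolute address it can fetch each image from; a relative path copied as-is into the new page points at a location that no longer exists.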

    I have also noticed that the original pages have a folder hierarchy, but it is not always respected during the import (most of the time it is). Some pages are missing a level of the hierarchy.
    A page that after importing should be at
    http://nostramar.org/acerca-de/colaboradores/aunland/
    was instead imported at
    http://nostramar.org/colaboradores/aunland/
    so it must be adjusted manually. Not a real problem.

    Thanks a lot (and excuse my English, which is not as good as it should be).

  3. Stephanie Leary
    Member
    Plugin Author

    Posted 1 year ago #

    This was a bug in 2.3. It should be fixed in 2.4. Let me know if you still have issues with images with absolute URLs!

  4. Miquel
    Member
    Posted 1 year ago #

    The images with absolute URLs now import just fine, thank you.

    I still have an issue with certain pages whose URL looks like this:
    http://nostramar.org/marenostrum/buceo/canarias/fuerteventura/index.htm
    They were linked by this URL in the text, but linked differently in the breadcrumbs, like this:
    http://nostramar.org/marenostrum/buceo/canarias/fuerteventura/

    The result is lots of errors during the import and a page with no images at all:
    http://nostramar.org/buceo-2/canarias/fuerteventura/

    The source code of this imported page shows one properly converted absolute URL and many others left relative to the original HTML page (like bandamed.gif or e.gif):

    <div class="entry-content">
    <p><img src="http://nostramar.org/wp-content/uploads/2013/02/barrasuperior.gif" alt="M@re Nostrum"/></p>
    <p>Fuerteventura, jable y salitre<br/>por José Barrera Artiles<br/>Fotos de Rafael Herrero Massieu<br/>Publicado en la revista SCUBA Nº30, Enero 1997<p/>
    <img src="../../../imagenes/bandamed.gif"/>
    <p><img src="../../../imagenes/lletres/e.gif"/>l jable, las grandes extensiones de arena blanca, cubre cada uno...

    I don't know how HTML Import processes links to a given page, but I guess there is a conflict with pages that can be accessed by name (with "index.htm") and also by path (without "index.htm"). To the original web server they were the same page, but to HTML Import this seems to be a problem.

    When the imported page is not named "index.htm" on the original server, all these images are imported flawlessly.
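
    To make the two addressing styles concrete, here is a minimal Python sketch. How HTML Import actually tracks pages is not known from this thread; the sketch only shows that the "by name" and "by path" URLs can be reduced to one key, and that relative image paths resolve identically from both. The function `canonical_page_url` and the list of index file names are assumptions made for the example.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# Assumption: these are the file names the original server treats as
# directory indexes. Only "index.htm" is mentioned in the thread.
INDEX_NAMES = ("index.htm", "index.html")


def canonical_page_url(url):
    """Strip a trailing index file name so both URL styles compare equal."""
    parts = urlsplit(url)
    path = parts.path
    for name in INDEX_NAMES:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # keep the trailing slash
            break
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


by_name = "http://nostramar.org/marenostrum/buceo/canarias/fuerteventura/index.htm"
by_path = "http://nostramar.org/marenostrum/buceo/canarias/fuerteventura/"

# Both URLs reduce to the same key, so they can be treated as one page.
same_page = canonical_page_url(by_name) == canonical_page_url(by_path)

# A relative image reference resolves to the same absolute URL from either.
image = "../../../imagenes/lletres/e.gif"
resolved_by_name = urljoin(by_name, image)
resolved_by_path = urljoin(by_path, image)

print(same_page)
print(resolved_by_name)
print(resolved_by_path)
```

    Note that the trailing slash matters: resolved against ".../fuerteventura" without the slash, the same relative path would climb one directory too far.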

    Do you think I must disable the breadcrumbs on the original site so that these pages can be imported properly? Or is this a coding problem that could be solved?

    Thanks again for your help and support.

Topic Closed

This topic has been closed to new replies.
