Problem with some images after import from other MS

  • prebennor

    (@prebennor)


    Hello,

    I work on a multisite installation where we’re in the process of importing a new blogger. The blogger has a lot of content, so the WXR export is quite large and is split into several smaller files.

    We have noticed an issue with some of the images that we’re unsure how to solve: some images still point to the old domain for the uploads. For example, in the export of the attachments, this is the guid and attachment_url given for the attachment post (I have anonymized the URLs):

    <wp:post_id>10051</wp:post_id>
    <guid isPermaLink="false">https://previously.com/uploads/sites/<prev-ms-blogid>/2020/11/imagename.jpg</guid>
    <wp:attachment_url>https://previously.com/uploads/sites/<prev-ms-blogid>/2020/11/imagename.jpg</wp:attachment_url>
    <wp:post_parent>10050</wp:post_parent>

    The real URL (before anonymization) serves the image from the old domain. So far so good.

    The parent post (ID 10050, the post that includes the image) has this HTML for the image in its export:

    <img class="alignnone size-full wp-image-10051" 
    src="https://previously.com/uploads/sites/<prev-ms-blogid>/2020/11/imagename.jpg" 
    alt="" width="7348" height="5472" />

    So it’s the exact same URL.

    After importing, we would expect the image to be available at
    https://cdn.ourdomain.com/content/uploads/sites/<new-ms-blogid>/2020/11/imagename.jpg but it’s not.

    For some images the post seems to use our copy, but in many cases the image src in the post on our domain still refers to the old URL.

    When I review the folder /content/uploads/sites/<new-ms-blogid>/2020/11/ on our S3, it contains a lot of numbered folders, but I’m really unsure what the numbers refer to. For example “02063247/”, “02063306/”, “02063323/”, etc.

    Inside those folders are images, and in one of them (04123237), I eventually found imagename.jpg.

    So we can’t simply search and replace https://previously.com/uploads/sites/<prev-ms-blogid> with https://cdn.ourdomain.com/content/uploads/sites/<new-ms-blogid> in the posts either, since the images are placed within a “randomly” numbered folder on our installation.
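To illustrate why a plain prefix replace falls short, here is a minimal Python sketch of the per-file mapping that would be needed instead. It assumes the list of S3 object keys has already been fetched by some other means, that a new URL is simply the CDN host plus the object key, and that year, month and filename together identify an image. The function names and the blog IDs used in the usage note are hypothetical placeholders.

```python
import re
from collections import defaultdict

# Matches ".../YYYY/MM/[numbered-folder/]filename" at the end of a URL or key.
TAIL = re.compile(r"/(\d{4})/(\d{2})/(?:(\d+)/)?([^/]+)$")


def index_new_keys(new_keys):
    """Group object keys by (year, month, filename), ignoring the numbered folder."""
    index = defaultdict(list)
    for key in new_keys:
        match = TAIL.search("/" + key)
        if match:
            year, month, _folder, name = match.groups()
            index[(year, month, name)].append(key)
    return index


def map_old_to_new(old_urls, new_keys, new_base):
    """Return (mapping, unresolved).

    mapping: old URL -> new URL, only where exactly one candidate key exists.
    unresolved: old URLs with zero or several candidates (needs manual review).
    """
    index = index_new_keys(new_keys)
    mapping, unresolved = {}, []
    for url in old_urls:
        match = TAIL.search(url)
        candidates = []
        if match:
            candidates = index.get((match.group(1), match.group(2), match.group(4)), [])
        if len(candidates) == 1:
            mapping[url] = new_base.rstrip("/") + "/" + candidates[0]
        else:
            unresolved.append(url)
    return mapping, unresolved
```

Calling `map_old_to_new(old_urls, keys, "https://cdn.ourdomain.com")` with made-up blog IDs (say 7 for the old site and 2 for the new one) would pair `.../sites/7/2020/11/imagename.jpg` with the key ending in `2020/11/04123237/imagename.jpg`. Any URL that matches no key, or more than one, ends up in `unresolved` rather than being guessed.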

    Any knowledge bombs, tips or guidance anyone here can provide on how to resolve this situation? Some other questions:

    1) Why do a lot of the image urls in the posts still point to the old domain and not ours?

    2) How can we more efficiently test/debug this with a smaller sample of posts?

    3) Why have the images on our domain been put in a different folder structure?

    4) How can we debug this issue further and/or what further information can we provide in order to get better help here?
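As a starting point for questions 2 and 4, a rough Python sketch like the one below could list every attachment in one WXR file together with its parent post ID and attachment URL, so a handful of attachment/parent pairs can be picked for a small test import. It assumes the usual WXR layout (RSS `item` elements with `wp:post_id`, `wp:post_parent`, `wp:post_type` and `wp:attachment_url` children) and matches tags by local name so the exact export namespace version does not matter. `list_attachments` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def list_attachments(wxr_text):
    """Return (post_id, post_parent, attachment_url) for each attachment item."""
    root = ET.fromstring(wxr_text)
    rows = []
    for item in root.iter("item"):
        # Only the direct children of <item> are inspected.
        fields = {local(child.tag): (child.text or "").strip() for child in item}
        if fields.get("post_type") == "attachment":
            rows.append(
                (
                    fields.get("post_id"),
                    fields.get("post_parent"),
                    fields.get("attachment_url"),
                )
            )
    return rows
```

For the attachment shown earlier, this would yield the post ID 10051 with parent 10050 and the old-domain URL, which can then be compared against what the imported post contains.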

    Best regards, Preben
