• Dear all,
    Maybe this question will seem "dumb" to some, but I am totally stuck and can no longer see the forest for the trees.

    The problem is handling links.
    However, it is a bit more complicated than it looks at first sight (tl;dr further down).

    Short summary of the project:
    Relaunching a *very* old website. The old version is the kind that used framesets and was ‘designed’ with MS FrontPage nearly two decades ago.
    Underneath is a folder structure containing PDF files that are linked from within the framesets.
    So far so good.
    Now comes the part where I have literally no idea anymore:
    Keeping the old HTML files is paramount, so it’s not as simple as creating a new custom post type or using articles.

    What I did:
    I created a page template that reads the content of the HTML files (via file_get_contents), which works for now.
    The issue, however, is that the links in the HTML files are relative, so the browser resolves them against the URL of the WordPress page and the effective URLs change.
    Instead of
    https://domain.com/reports/filename.pdf
    they will, of course, be
    https://domain.com/page-name/reports/filename.pdf
    As already mentioned: I can’t edit the links within the WordPress page, as the content has to be read from external HTML files.

    Changing the (hundreds of) links manually or by script in the source files would be possible, if it weren’t for the need to copy and paste the tables from the original HTML files as well (someone’s using a WYSIWYG editor here, and no, it’s not me). That means editing the original files to put ../ in front of e.g. reports/filename.pdf would not work, as the original links need to stay the way they are now. This makes me wonder whether there is *any* solution to this issue at all.

    tl;dr:
    – I am forced to read the content from external HTML files (via file_get_contents)
    – Because the WordPress page’s name is part of its URL, the original relative links in the HTML files (e.g. reports/filename.pdf) resolve to https://domainname.com/page-name/reports/filename.pdf instead of https://domainname.com/reports/filename.pdf when the WordPress page is viewed
    – Changing the relative links in the original HTML files from e.g. reports/filename.pdf to ../reports/filename.pdf is not an option, as the original tables still need to be copied and pasted from the original HTML files (via WYSIWYG editor) into e.g. Excel.

    I have literally run out of ideas for how to approach this and solve the link issue under these circumstances. Maybe one of you has an idea or a helpful point of view, as I am afraid that, well, I can’t see the forest for the trees right now.

    Thanks in advance for your help.

    • This topic was modified 3 years, 11 months ago by Jan Dembowski.
Viewing 1 replies (of 1 total)
  • Moderator bcworkz

    (@bcworkz)

    It’s feasible to parse the text being output with PHP, look for relative links, and prepend a specific domain/path to each one to create a correct, full absolute URL.
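
    As a rough illustration of that idea, here is a minimal sketch that leaves the original files untouched and rewrites the links only in the string returned by file_get_contents. The function name legacy_absolutize_links and the $base_url parameter are made up for this example. The regular expression only handles quoted href/src attributes, so unquoted attributes (which old FrontPage markup may contain) would need extra handling, and a DOM parser may turn out to be more robust.

```php
<?php
/**
 * Prefix relative href/src values with a base URL.
 * Hypothetical helper: the name and the $base_url parameter are assumptions.
 */
function legacy_absolutize_links(string $html, string $base_url): string
{
    // Make sure the base ends with exactly one slash.
    $base = rtrim($base_url, '/') . '/';

    $result = preg_replace_callback(
        '/\b(href|src)\s*=\s*(["\'])(.*?)\2/i',
        function (array $m) use ($base): string {
            $url = $m[3];
            // Leave empty values, scheme URLs (https:, mailto:, ...),
            // protocol-relative URLs, root-relative paths and fragments alone.
            if ($url === '' || preg_match('~^(?:[a-z][a-z0-9+.\-]*:|//|/|#)~i', $url)) {
                return $m[0];
            }
            // Rebuild the attribute with the same quote character.
            return $m[1] . '=' . $m[2] . $base . $url . $m[2];
        },
        $html
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $html;
}

// Example with the link from the question:
echo legacy_absolutize_links(
    '<a href="reports/filename.pdf">Report</a>',
    'https://domain.com'
);
// prints: <a href="https://domain.com/reports/filename.pdf">Report</a>
```

    In the page template this could wrap the existing file_get_contents call, with the base URL coming from something like home_url('/'), assuming the old folders sit directly under the site root.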

    Getting file content for every page requested isn’t all that speedy compared to fetching content from a database. You might consider importing these files into WP. If that is done, fixing the relative paths can be done during import so that the text doesn’t need to be parsed over and over on every request.
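
    A one-time import along those lines might look roughly like the sketch below. It only reads the old files, so the originals stay exactly as they are. The function name legacy_collect_pages is made up, $rewrite stands for whatever link-fixing callable you end up using, and the choice of draft pages titled after the file name is just an assumption for illustration.

```php
<?php
/**
 * Read every .html/.htm file in $dir once and build post data with the
 * links already rewritten. Hypothetical helper; the source files are
 * only read, never modified.
 */
function legacy_collect_pages(string $dir, callable $rewrite): array
{
    $pages = [];

    foreach (scandir($dir) ?: [] as $name) {
        // Only pick up HTML files, whatever the case of the extension.
        $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
        if ($ext !== 'html' && $ext !== 'htm') {
            continue;
        }

        $html = file_get_contents(rtrim($dir, '/') . '/' . $name);
        if ($html === false) {
            continue; // unreadable file, skip it
        }

        // Array keys follow what wp_insert_post() accepts.
        $pages[] = [
            'post_title'   => pathinfo($name, PATHINFO_FILENAME),
            'post_content' => $rewrite($html), // links fixed once, at import
            'post_type'    => 'page',
            'post_status'  => 'draft',
        ];
    }

    return $pages;
}
```

    Inside WordPress, each returned array could then be handed to wp_insert_post(). You would probably want to strip the old head and frameset markup first and keep only the body content, and the import would need to be repeated whenever the source files change.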

    It should also make managing the content easier in the future if it’s stored as WP data instead of as files.

  • The topic ‘Manipulating links of content being read out off external html files.’ is closed to new replies.