On some pages of some of our blogs, the link URLs to PDF files come through in the form of:
and other times in the form of:
The latter is wrong, of course, and it also creates a problem: some of these blogs are meant to be confidential, and the blogs.dir form lets those files be fetched directly over HTTP and leak into Google, where people without proper access can find them.
Sometimes both versions of the links appear on the very same page.
How and why would these URL formats be revealed to a blog user in the first place? And what's to be done?
My working plan right now is to use a regex tool to find every instance of the blogs.dir-style URLs and rewrite them in the right format, then lock down the blogs.dir directory with an .htaccess file so the files can't be requested directly.
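
To make the find-and-replace step concrete, here is a rough Python sketch of what I have in mind. It is only a sketch under assumptions: I'm assuming the bad links look like `http://<host>/wp-content/blogs.dir/<site id>/files/<path>` and the right form is `http://<host>/files/<path>` (adjust the pattern if your install differs). The function name `rewrite_blogs_dir_urls` and the optional `site_paths` mapping (site id to a path prefix, for subdirectory-style installs) are made up for illustration and are not part of WordPress.

```python
import re

# ASSUMPTION: unwanted links have the shape
#   http(s)://<host>/wp-content/blogs.dir/<site id>/files/<path>
# and the wanted shape is
#   http(s)://<host>[<site prefix>]/files/<path>
BLOGS_DIR_URL = re.compile(
    r"(?P<origin>https?://[^/\s\"'<>]+)"              # scheme and host
    r"/wp-content/blogs\.dir/(?P<site_id>\d+)/files/"  # blogs.dir segment
    r"(?P<path>[^\s\"'<>]+)"                          # rest of the file path
)


def rewrite_blogs_dir_urls(html, site_paths=None):
    """Return (new_html, count) with blogs.dir-style URLs rewritten.

    site_paths is a hypothetical mapping of site id -> path prefix
    (e.g. {7: "/teamblog"}); ids not in the mapping get no prefix.
    """
    site_paths = site_paths or {}

    def _replace(match):
        prefix = site_paths.get(int(match.group("site_id")), "")
        return f"{match.group('origin')}{prefix}/files/{match.group('path')}"

    return BLOGS_DIR_URL.subn(_replace, html)


if __name__ == "__main__":
    sample = (
        '<a href="http://example.com/wp-content/blogs.dir/7/files/'
        '2011/03/report.pdf">Report</a>'
    )
    fixed, count = rewrite_blogs_dir_urls(sample)
    print(count, fixed)
```

I would run something like this against an export of the post content first and review the replacement count before touching the live database.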
A couple of mysteries. One, why does this formatting happen in the first place;