Media Cleaner review: Tricky to use, but does its job

  • So here was my challenge: I had a feed aggregator with over 100,000 syndicated posts and around 300 GB of media files, many of which were unattached, broken, or inaccessible, mostly because of changes made by plugins (e.g. adding thumbnails automatically) or theme changes (which often require regenerating thumbnails and leave ‘old’ images around). Some plugins also attempted to ‘optimize’ the images, but often failed and left garbage behind. The situation became critical when I could no longer sync the server with a secondary server, and my automatic backup mechanism failed because there were simply too many big files to back up…

    Media Cleaner to the rescue! 😸

    Now, Media Cleaner is not a magical tool. It does a lot of work automatically, trying to fix most of the problems by searching the database over and over again, seeing which attachments are in sync with posts (and which are not) and vice-versa, and so forth. Its memory requirements and overall speed depend on how messed up your installation is; you can tweak a few settings (essentially trading processing speed for fewer timeouts), but it’s far from trivial to figure out what works best in your specific scenario. I spent a lot of time tweaking not only Media Cleaner but also the underlying stack (WordPress, web server, PHP, etc.) to deal with the additional memory demand and to avoid timeouts.

    It still wasn’t an easy task. To be honest: it took 5 days or so until Media Cleaner, out of those 100,000 broken images & attachments, managed to show me some 1,600 requiring manual intervention. Does that mean that Media Cleaner did all the rest by itself? Not even close: I had to do basically everything on my own. That meant manually logging in to the server and deleting files, running complex queries directly on the database to delete a lot of unnecessary things, and writing a few scripts to automate parts of the process when doing things manually simply took way too long. (I tried the free version of the plugin; allegedly, the pro version is able to delete files by itself, but only if it can actually find them… the vast majority of the files I had would not be detectable by Media Cleaner anyway, because they were completely orphaned.)
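
    To illustrate what such a script might look like (this is a minimal sketch, not one of the scripts actually used here), finding completely orphaned files boils down to comparing what is on disk with what the database still references. The sketch assumes you have already exported the referenced paths, relative to the uploads directory, into a list; in WordPress the main file of each attachment is stored in the `_wp_attached_file` post meta, and generated thumbnail sizes would have to be added to that list too, or they will be reported as orphans. The function name and parameters are made up for this example.

```python
from pathlib import Path


def find_orphaned_files(upload_root, referenced_paths):
    """Return files under upload_root that are not listed in referenced_paths.

    referenced_paths holds paths relative to upload_root, e.g. '2019/05/cat.jpg'
    (assumption: exported from the database beforehand).
    """
    root = Path(upload_root)
    # Normalise to forward slashes so the comparison works on any OS.
    referenced = {Path(p).as_posix() for p in referenced_paths}
    orphans = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            if relative not in referenced:
                orphans.append(relative)
    return orphans
```

    The function only reports candidates; it deliberately deletes nothing, so the list can be reviewed (and backed up) before anything is removed.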

    Media Cleaner has a very annoying issue: dealing with timeouts. It is a properly coded plugin, which does not rely on a ‘page refresh’ to do a lot of work (which would invariably fail with a timeout). Instead, like other plugins such as Regenerate Thumbnails and Better Search and Replace, it runs ‘background’ tasks while giving visual feedback without requiring page refreshes, which is the appropriate way to deal with long-running tasks in this day and age (one wonders why WordPress itself does not do things that way…). However, there is a catch-22: it assumes that things will ‘always work’ after tweaking the performance settings, but you cannot tell which settings are correct before actually performing the first scan.

    Jordy Meow did at least address this issue, but the solution is sub-optimal (at least for the version tested in early 2020!). When ‘something’ fails (either due to a timeout or an underlying error which Media Cleaner is unable to fix automatically), a nice dialogue box pops up, giving you the option either to ‘retry’ or to ‘ignore’ the entry it was currently processing.

    This has two drawbacks. Firstly, you have no idea which entry is currently being processed, and, in my case, I had no clue about what was causing the errors (Media Cleaner can also write its own debugging logs, but… mine were always empty). Secondly, ‘retry’ will only work if, say, you’re able to make a change in the underlying system before continuing (for example, imagine that you suddenly ran out of disk space; you may be able to launch a terminal, delete some superfluous files elsewhere, and attempt a ‘retry’). ‘Ignore’ is the way to go if you have an utterly broken entry, and you will not lose the scanning work done so far by Media Cleaner. In that respect, this is a rather robust solution: it works across the many processing stages, and you can pick up from where you were, even if you happen not to be at your desk when the dialogue box pops up.

    With perversely garbled websites like my own, Media Cleaner will stop many times, unable to process an entry. This can happen at ‘any’ time, so, unless I spend all the time glued to the screen, ready to click the ‘ignore’ button, it will stop and wait for the user to make a choice. This means that the whole procedure, which already takes several hours on a large site, will take even longer if you’re not always looking at the Media Cleaner admin page. A far better option would have been a persistent retry/ignore setting (as in many applications, where you get an option to ‘remember the last choice’) so that you can safely abandon your workstation and do other things while you patiently wait until Media Cleaner is finished…
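
    The behaviour asked for here can be sketched in a few lines. What follows is a hypothetical illustration in Python, not how Media Cleaner works internally: `process_unattended`, `handler`, `on_error` and `max_retries` are names invented for this example. The remembered choice is passed in once, failing entries are either skipped at once or retried a limited number of times, and everything that was skipped is returned together with the error message so that a human can look at it later.

```python
def process_unattended(entries, handler, on_error="ignore", max_retries=2):
    """Process every entry without prompting; return (processed, skipped).

    on_error="ignore": skip a failing entry immediately.
    on_error="retry":  retry a failing entry up to max_retries more times,
                       then skip it.
    """
    if on_error not in ("ignore", "retry"):
        raise ValueError("on_error must be 'ignore' or 'retry'")
    processed, skipped = [], []
    for entry in entries:
        attempts_left = max_retries if on_error == "retry" else 0
        while True:
            try:
                handler(entry)
                processed.append(entry)
                break
            except Exception as error:
                if attempts_left > 0:
                    attempts_left -= 1
                    continue
                # Record what failed and why, instead of waiting for a click.
                skipped.append((entry, str(error)))
                break
    return processed, skipped
```

    Writing the two returned lists to a file at the end of a run would also provide the missing feedback: which entries were handled, and which were left for manual review.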

    Allegedly, you can schedule these tasks with the pro version, but I can imagine that if a scheduled task ‘fails’, it will simply stop, to be run the next time — and not really ‘fix’ anything.

    One could argue that the timeouts can be tuned away so that Media Cleaner never encounters any; but while this may be true for smaller sites, the problem with sites with 100,000 entries or more is that there is no way to set up ‘enough’ memory and a sufficiently long timeout. At some point, you will hit physical limits which cannot be changed; in such cases, the only option is somehow instructing Media Cleaner to keep going and do whatever it can, skipping everything it cannot fix, but writing ‘somewhere’ what it could not fix automatically so that a human being can address these issues manually. If out of 100,000 entries just a few dozen are left out… well, that is manageable by a human. In my case, I have absolutely no idea what Media Cleaner actually did. I know that, at the end of those 5 days, the scan finished, and I had 1,600 images to review manually (none of which were actually correctly flagged, but that’s not Media Cleaner’s fault, just the way a certain plugin juggles images around, thus confusing Media Cleaner). I have no idea what happened to the rest of the images, nor how many had actually been ‘fixed’. There is simply no feedback whatsoever on what Media Cleaner actually does (at least not in the free version!).

    At the end of the day, as said, I had to do all the deleting myself, from a terminal, working with scripts or sometimes simply with direct database queries. Media Cleaner wasn’t especially useful in my specific case, and it cost me some five days of work.

    Then again, as said, the particular setup of this website (running for over a decade!) meant that everything was garbled beyond the ability to automatically fix things. To give you an idea of how bad it was: besides dealing with media files, I also had to get rid of actual blog posts, which required a different plugin. (Media Cleaner, as the name implies, only deals with media, even though, from the point of view of WordPress, ‘media’, i.e. an attachment, is just a ‘different’ kind of post; technically one would assume that a tool that works for one post type could work with any other post type…) I actually tried two different ones, both of which had a really tough time deleting all the useless posts, and neither had any performance-tweaking options like Media Cleaner has. The only thing one could do was to delete, say, 1,000 entries at a time, which takes impossibly long if you have hundreds of thousands of entries. So, compared to those plugins, Media Cleaner was much better at handling the issue; just because I had no success in ‘fixing’ my ‘impossible’ situation, it doesn’t mean that it cannot fix yours. As you can see from the reviews, a lot of people routinely have success with Media Cleaner!
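
    Deleting ‘1,000 entries at a time’ is at least easy to automate once you leave the admin pages. The following is only a sketch under stated assumptions: `in_batches` and `delete_in_batches` are hypothetical helpers written for this example, and `delete_batch` stands in for whatever actually removes the posts (a database query, a command-line call, and so on), which is not shown.

```python
def in_batches(ids, batch_size=1000):
    """Yield consecutive slices of ids, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def delete_in_batches(ids, delete_batch, batch_size=1000):
    """Call delete_batch once per slice of ids; return how many ids were handed over."""
    deleted = 0
    for batch in in_batches(ids, batch_size):
        delete_batch(batch)  # hypothetical callback doing the real deletion
        deleted += len(batch)
    return deleted
```

    Keeping each batch small bounds the memory and time needed per step, which is the same trade-off as the performance settings discussed above: more, shorter steps instead of one step that runs into a timeout.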

    In conclusion: from my point of view, there are some rough edges on Media Cleaner which are annoying. I can imagine that a few of these issues are solved in the pro version, but it’s never a good idea to be forced to pay for a ‘solution’ to what appears to be a UI design failure. Not having tested the pro version, I cannot say whether it would fix my problem; I can just say that I seriously suspect it wouldn’t. Media Cleaner pro is not cheap for something that doesn’t guarantee any success. I have a few ideas on how Jordy could improve his business model so that people using the free version of the plugin feel confident that the pro version is really worth the price, but I’m not going into that…

    Still, the overall interface feels very polished, it’s clearly a very complex plugin which took a lot of time to write and debug, and just because it didn’t work in my case, it would be most unfair to rate it badly. I gave it four stars because I think that a few things could be more polished when dealing with the timeout issue I mentioned (e.g. adding an option to ‘always ignore’ or ‘always retry’ so that you can do unattended recoveries) and a lot more feedback ought to be given (what’s the point in having a ‘special’ log file if nothing interesting is actually written to it…?). At the very least, a list of media attachments that were processed and a list of those which were not would be a requirement to get five stars. In other words, one ought not to spend five days trying to see what Media Cleaner can do with a garbled WordPress installation and have no clue, at the end of the day, what exactly has been fixed and what hasn’t.

    Oh, last but definitely not least, I really adore the cartoons used by Jordy Meow! 😸
