media library sort by image size, duplicates

  • Resolved todindiana

    (@todindiana)


    I’m trying to clean up my media image library before installing a plugin that will create virtual folders so I can bring some order to things. I also plan on using an app that will reduce the image size to w=650 before uploading images.

    I use an iPad Air as my primary device so screen space is limited, as are multiple windows.

    What I’d like to do is sort the images by size – showing me large files that I need to remove.

    The other request is to sort by similarity so I can easily find duplicates.

    I know that I can root through various folders via FTP but that might lead to other problems.

    Any suggestions for finding a way of sorting by size and (hopefully) by similarity?

    Thank you,

    Tod

Viewing 6 replies - 1 through 6 (of 6 total)
  • Here are some plugins I found useful:
    Admin Cols or Admin Columns Pro.
    Media Deduper or Media Deduper Pro.

    (If your library is as big a mess as mine, you’ll want:
    Admin Cols Pro
    Media Deduper or Media Deduper Pro
    WP Sheet Editor – Media Library (no free version available)
    )

    Admin Cols PRO lets you sort by size or ANY field, and lets you see fields the Library normally does not show you.
    – But, Media Deduper Pro may be all you need.
    The free version will index your entire media library and list any items that are exact copies. You can then select the ones you want to delete and be done.
    HOWEVER, it will not stop you from deleting the version that is attached to a post. So if that “attached” status is useful to you (it is not to me), you will need to be extra careful.
    Their PRO version ensures that you do not delete the version that is “attached”, or it can update the post so the version you keep becomes the newly registered “attached” image.

    However, there may be other info associated with an image that does not show up in the Media Deduper “Manage Duplicates” page.

    So I recommend also using the Admin Columns plugin.

    First try Admin Columns to see how it works.
    Basically you can add additional custom columns to Media, Posts, Pages, etc.
    For example, if you have All In One SEO plugin, you can add columns showing the SEO spiel you wrote for that image, and you can edit the SEO blurb right from the media page!

    ALTERNATIVE 1 (cheaper option, but a bit more limited):
    Just use the Admin Columns (not Pro version) to scan info.
    1. Use Media Deduper Pro plugin to locate/list the duplicate files.

    (unfortunately, this plugin does not show info from as many fields as Admin Cols, and Admin Cols cannot customize the Media Deduper “Manage Duplicates” page under the Library. Hence I recommend checking those other fields via Admin Cols. Note that even Admin Cols may not show EVERY possible bit of info another plugin has about the image, but it is comprehensive for most use cases).

    2. Copy the Title or filename of one of the duplicated files (just the part of the filename/title that is similar to all duplicated files).

    3. Open Library page and search for the text you copied. This will limit the files the library shows you to those duplicates (plus any similarly named/titled files).

    4. Scan all the columns (of the duplicate files you’re interested in) to ensure you will not lose any data.
    Copy over any missing data into the file you want to keep.

    5. Then go back to Media Deduper Pro and delete the duplicated files from there.

    ALTERNATIVE 2 (uses Admin Cols PRO):
    Admin Columns Pro allows you to SORT on ANY column.
    You can also inline edit or Bulk Edit most columns.
    It also has advanced filtering to help out.

    If you have Media Deduper installed and activated, you can look at the hash value and image size MEDIA DEDUPER calculated for each image when it ran its index.
    To get that you add 2 columns: Custom -> mmd_hash and Custom -> mmd_size.

    So…with Media Deduper or Media Deduper Pro AND Admin Columns Pro,
    you can SORT on mmd_hash to group all duplicates together (or filter on filename or mmd_hash to limit your list)

    Then scan ALL the columns of items with the SAME mmd_hash (or mmd_size) to see if there are any differences.
    You can then inline edit fields on the image you want to keep, to insert data from the duplicates that you are going to delete, before deleting them.
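
    The group-then-compare step described above can be pictured with a small Python sketch. It is only an illustration: the rows and every field name other than mmd_hash (title, alt_text, description) are made-up stand-ins for whatever columns you have added, and nothing here talks to WordPress or to either plugin.

```python
from collections import defaultdict

# Hypothetical rows, as if copied by hand from the Media Library list view.
rows = [
    {"id": 11, "mmd_hash": "a1f3", "title": "Beach", "alt_text": "Kids at the beach", "description": ""},
    {"id": 27, "mmd_hash": "a1f3", "title": "Beach", "alt_text": "", "description": "Summer trip"},
    {"id": 31, "mmd_hash": "9c0e", "title": "Garden", "alt_text": "Roses", "description": ""},
]

def group_by_hash(rows):
    """Return only the hash values that occur on more than one row."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["mmd_hash"]].append(row)
    return {h: g for h, g in groups.items() if len(g) > 1}

def differing_fields(group, ignore=("id", "mmd_hash")):
    """List the fields whose values are not identical across the group."""
    fields = [k for k in group[0] if k not in ignore]
    return [f for f in fields if len({row[f] for row in group}) > 1]

duplicates = group_by_hash(rows)
for h, group in duplicates.items():
    # Fields printed here are the ones to copy over before deleting a duplicate.
    print(h, [r["id"] for r in group], differing_fields(group))
```

    Any field the sketch reports as differing is one where a duplicate may hold data that the copy you keep is missing.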

    You can also bulk edit fields for similar, but different images. Say you have a different crop, or a slightly different angle, but you want the “Description” or “AIOSEO Title” for all the images to be the same. You can bulk edit right there.

    • This reply was modified 4 months, 1 week ago by SherylHohman. Reason: formatting, probably some text edits too, but do not remember what
    Thread Starter todindiana

    (@todindiana)

    @sherylhohman
    Thank you for taking the time to write such an extremely detailed set of solutions.

    I bought the pro version of Media Deduper, and installed the free version of Admin Cols. The first thing I noted was that none of my images were listed as attached to any of my sliders or posts!

    I was about to buy the pro version for $90 (!!) and add the two custom columns you suggest when suddenly my site started displaying 508 error (Resource Limit Reached), so I’m locked out for now.

    I’m going to chat with my host’s tech support later to get that sorted out.

    I’ll update this when I’ve added the columns and looked to see where images are attached.

    Thanks again! You are tonight’s hero!

    Tod

    Thread Starter todindiana

    (@todindiana)

    @sherylhohman Later…

    You wrote

    If you have Media Deduper installed and activated, you can look at the hash value and image size MEDIA DEDUPER calculated for each image when it ran its index.
    To get that you add 2 columns: Custom -> mmd_hash and Custom -> mmd_size.

    I am stuck here. Where do I find these two column types? I have the Admin Cols Settings screen open and see the way to add columns at the bottom of the list. However from there I select Add Column, then click on the Type, which drops down a list of field types, but these two aren’t listed.

    Can you provide some deeper details?

    NB: I’m using an iPad Air to do all this.

    Thanks,

    Tod

    Thread Starter todindiana

    (@todindiana)

    @sherylhohman

    I’ve been in contact with the dev of Admin Cols, who guided me through adding mmd_hash and mmd_size columns.

    He was extremely helpful and I’ve now got things set up as you suggested above. Your and the dev’s help have done a lot in bringing order to my media chaos.

    Thank you for your help.

    Tod

    SherylHohman

    (@sherylhohman)

    Fantastic! I just now saw your string of updates to this thread. Incredibly happy I was able to help in some way.

    Below is interesting info, for geeks. It may or may not be useful or interesting to most people.

    Note that I have since learned that Media Deduper MAY be using the MD5 algorithm for its hash function. MD5 is the oldest and weakest of the standard hash functions, in the sense that people have managed to deliberately construct different files with the same hash. It is also one of the fastest.
    In theory, two different files can end up with the same hash result (a “collision”). For ordinary files that nobody crafted on purpose, the odds of that are astronomically small, and a change as small as one byte normally gives a completely different hash. So matching hashes are good enough for most use cases, and I would not expect 2 files with a noticeable difference to be marked as “the same”.
    I presume the plugin developers made the most useful/reasonable choice when they wrote this software.

    As an aside, by default WP recompresses the resized copies it generates from every JPEG at upload (the default quality was 90, and has been 82 since WordPress 4.5), and it’s noticeable on most of my images. Photographers usually compress their images manually, and not knowing that WP further compresses their carefully tuned images results in puzzling dissatisfaction and confusion.
    On top of that, image compression plugins may use LOSSY (visual changes) or lossless (zero degradation of the compressed image) compression techniques AFTER WP performs its LOSSY degrading at upload.

    If it’s important to detect the BEST quality images from slightly degraded images, MD5 may give some false results that you might care about. I do not know if visually different images can result in the same MD5 hash or not. Only that technically different bit-by-bit files can.
    Better algorithms are SHA-1, much better is SHA-256. Complete accuracy is Bit-by-bit.
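
    To make the comparison concrete, here is a small Python sketch using the standard hashlib module. The byte strings are made-up stand-ins for the contents of image files, and this is not taken from the plugin's code. It computes the three hash functions mentioned above and also does a direct byte-for-byte comparison.

```python
import hashlib

def digests(data: bytes) -> dict:
    """Hex digests of the same bytes under MD5, SHA-1 and SHA-256."""
    return {name: hashlib.new(name, data).hexdigest()
            for name in ("md5", "sha1", "sha256")}

original = bytes(range(256)) * 4     # stand-in for an image file's bytes
copy = bytes(original)               # exact duplicate
tweaked = original[:-1] + b"\x00"    # same length, last byte changed

for label, data in (("original", original), ("copy", copy), ("tweaked", tweaked)):
    print(label, digests(data)["md5"], "bit-identical:", data == original)
```

    The exact copy gets the same digest under all three functions, while the one-byte change gives a different digest under each of them. The functions differ mainly in digest length (32, 40 and 64 hex characters) and in how hard it is to construct a collision on purpose.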

    I tested a LOSSLESS image compression plugin I hadn’t tried before on a small duplicate gif I was about to delete. It had a transparent background. The plugin squeezed (sqooze?) around 4% savings (again, it was a small file to start with). But the resulting image thumbnail now had a black background, instead of white/transparent. TBF, the ACTUAL image still looked the same; just the WP Media Library thumbnail image was affected.
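    However, the point is that the MD5 hash shown was the same for both files. The actual file size was not. Two files of different sizes will practically never share a real MD5 hash, so my best guess is that the stored hash had simply not been recalculated after the compression. Given the choice, I’d rather not keep the one showing a background (from the Media Library thumbnail image), unless it really did have a black opaque background.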
    Other files may have less apparent differences from the thumb image, yet have a noticeable difference in the actual full size image.

    Whether it could have a noticeable/visible difference (to a discerning photographer), yet have the same MD5 hash, I do not know.
    I’m just putting this here as thorough information, for anyone who may find it insightful (and for my future reference/notes on the topic), whether for theoretical or practical use.

    Also note that I have not yet looked into Deduper’s plugin code to confirm for sure which algorithm they use (I assume MD5 because of the field name, and because my experience suggests it’s at least not a bit-by-bit comparison).

    Again, it’s unlikely to matter to most people. The trade-off is that the stronger the hash function is (as a proxy for EXACT duplicates), the longer it takes to run on each file, and a true bit-by-bit comparison of every pair of files takes longer still. No fixed-length hash can rule out collisions completely, so for large libraries it’d likely be unreasonable to insist on a method with zero chance of a collision.
    In rare cases where one might want to discern exact duplicates, you’d need to use different software, and maybe want to run it on a small subset of images at a time, not a whole library.
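
    For anyone who wants to try that on a local copy of a few folders, here is a minimal Python sketch of one way to find exact duplicates. It is an assumption-laden illustration, not how Media Deduper or any other tool mentioned here works: it groups files by size, hashes only the files that share a size with SHA-256, then confirms each match byte-for-byte. The folder name in the usage lines is hypothetical.

```python
import filecmp
import hashlib
import os
from collections import defaultdict

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_exact_duplicates(folder):
    """Return sorted groups of paths whose contents are byte-for-byte identical."""
    by_size = defaultdict(list)
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            by_size[os.path.getsize(path)].append(path)

    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) > 1:               # a unique size cannot have a duplicate
            for path in paths:
                by_hash[(size, sha256_of(path))].append(path)

    groups = []
    for paths in by_hash.values():
        first = paths[0]
        # Final byte-for-byte check, so a hash collision could not slip through.
        same = [first] + [p for p in paths[1:] if filecmp.cmp(first, p, shallow=False)]
        if len(same) > 1:
            groups.append(sorted(same))
    return sorted(groups)

if __name__ == "__main__":
    # "uploads-copy" is a hypothetical local folder holding a copy of some images.
    for group in find_exact_duplicates("uploads-copy"):
        print(group)
```

    Grouping by size first keeps the slow part (reading and hashing files) limited to files that could possibly match, which is why running it on a subset at a time stays quick.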

    Unix/Linux OS has built-in tools for that. On Windows, there is a FANTASTIC file explorer called XYplorer. It’s so amazing, IMHO, it ought to be standard EVERYWHERE. In fact I once decided against switching to Apple/Mac as my main computer, in part because XYplorer is ONLY available for Windows…
    Anyway, it has a duplicate file finder, which allows you to select 1 of about a half dozen hash functions to use for image comparisons. It also lets you choose how close of a match you care about (90%, 100%, etc), among other things.

    One other trivia tidbit about hashes and IMAGES: there is a separate family called perceptual hashes, built so that the more similar the images, the more similar the resulting hashes. Sorting by that kind of hash actually sorts by image similarity.
    MD5 and the SHA functions do not behave that way. Changing a single pixel in an image, or a single word in a text file, can result in a wildly different hash of the file, so sorting by one of those only puts EXACT duplicates next to each other.
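
    Hashes that track image similarity are usually called perceptual hashes. Below is a toy Python sketch of one simple variant, an "average hash", next to MD5 for contrast. Everything in it is an assumption for illustration: the 4x4 grids of grayscale values are made up, real tools first shrink the actual image to a small grid, and nothing here is taken from Media Deduper.

```python
import hashlib

def average_hash(pixels):
    """pixels: 2D list of grayscale values (0-255) -> string of 0/1 bits."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    # One bit per pixel: is it brighter than the image's average?
    return "".join("1" if v > mean else "0" for v in flat)

def hamming(a, b):
    """Number of bit positions where two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))

def md5_of(pixels):
    return hashlib.md5(bytes(v for row in pixels for v in row)).hexdigest()

# Made-up 4x4 "images": dark left half, bright right half.
image = [[10, 20, 200, 210], [15, 25, 205, 215],
         [12, 22, 202, 212], [18, 28, 208, 218]]
brighter = [[v + 5 for v in row] for row in image]     # slightly brightened copy
inverted = [[255 - v for v in row] for row in image]   # very different image

print(hamming(average_hash(image), average_hash(brighter)))  # small distance
print(hamming(average_hash(image), average_hash(inverted)))  # large distance
print(md5_of(image) == md5_of(brighter))                     # MD5 sees them as different
```

    A small Hamming distance between two average hashes suggests similar-looking images, whereas MD5 only answers whether the bytes are identical.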

    Again, this post is for informational purposes – I wish it was specified in the about for the plugin, but that’s because I’m a bit of a geek, and value info. I suspect it likely doesn’t matter to most. Also, I do not even know for certain what function they use. It’s just good to know there are different functions, and that the tradeoffs are speed vs accuracy (false matches).
    I do not know if the difference in accuracy has any impact (or even what the collision frequency is). I expect the accuracy impact is likely rather low, and the speed savings is likely significant.

    All the Best!

    Thread Starter todindiana

    (@todindiana)

    @sherylhohman

    Hey, thank you very much for the interesting and geeky details. I understood most concepts you explained, but the finer details are above my level.

    It was interesting to read about the details of uploading an image to WP. Do you have any thoughts about uploading by a third party app? I have one where I can embed an image directly into the text, and the app will publish the whole post – text and image. I’ve also used a couple of apps that batch resize images from their original 3k x 4k px to 550 x 700 px, and upload them. I assume that this process is lossy as the apps are targeted at us iPad amateurs (i.e. cheap!)

    My sites don’t need the highest quality images as they are there for family viewing, so a certain level of loss or degradation is acceptable.

    I’d give you a couple links to my sites but not through this public forum. Is there any way to do PMs or exchange email addy’s? That way you could take a peek at what I’ve done imagewise.

    Slightly OT: do you know of a plugin or other tool that will reveal where an image is used? The standard WP column “Uploaded To” is totally useless. If I could browse through my images and know if they are used by a gallery or slider, that would be a huge help in culling images. The dev at Admin Cols says Admin Cols doesn’t do this.

    Thanks again for all your details and advice.

    Tod
