WordPress.org

Forums

Deleting Post Revisions: do NOT use the a,b,c JOIN code you see everywhere (17 posts)

  1. kitchin
    Member
    Posted 1 year ago #

    Post revisions bother some people, especially on larger sites because they add lots of rows to wp_posts. A common bit of MySQL code posted in these forums, plugins and elsewhere goes like this:

    ### do not use: ###
    # DELETE a,b,c
    # FROM wp_posts a
    # LEFT JOIN wp_term_relationships b ON (a.ID = b.object_id)
    # LEFT JOIN wp_postmeta c ON (a.ID = c.post_id)
    # WHERE a.post_type = 'revision';

    The problem here is that wp_term_relationships is used for both posts and links, two different tables. Sometimes the object_id refers to wp_links.link_id, not wp_posts.ID. And object_id's are not unique in wp_term_relationships. So there is a chance, especially with lower-numbered wp_post.ID's, that you will delete a relationship needed to make Dashboard / Links work. The DELETE will remove two rows that match object_id, one for a post and one for a link.

    One plugin with this bug is
    http://wordpress.org/plugins/better-delete-revision/

    Another plugin just deletes revisions and does not clean up meta and terms at all, which is not ideal:
    http://wordpress.org/plugins/delete-revision/

    These plugins try to do things the right way, by using WP's own term functions:
    http://wordpress.org/plugins/rvg-optimize-database/
    http://wordpress.org/plugins/revision-control/

    Here is a proposal to put a revision deletion function in WP core:
    http://wordpress.org/ideas/topic/native-function-to-delete-post-revisions

  2. kitchin
    Member
    Posted 1 year ago #

    If you want to use MySQL to cleanup tables instead of a plugin, here are some queries that will work better than the a,b,c code. Be careful, WP can be extended in all kinds of ways, and remember, you are deleting stuff! Also, if you are using a caching plugin, you will need to clear that in the plugin, due to: http://codex.wordpress.org/Class_Reference/WP_Object_Cache#Persistent_Caching

    The strategy is to delete revisions and then cleanup stray meta and terms in separate queries.

    DELETE
    FROM wp_posts
    WHERE post_type = 'revision';

    ########################
    First, wp_postmeta.
    ########################

    The good plugin above (rvg-optimize-database) uses

    DELETE FROM $wpdb->postmeta
    WHERE post_id NOT IN (SELECT ID FROM $wpdb->posts)

    That seems pretty good. You'll have the sub the actual tables names if you're doping this in PHPMyAdmin or elsewhere outside of WP. I had:

    SELECT *
    FROM wp_postmeta pm
    LEFT JOIN wp_posts p ON pm.post_id = p.ID
    WHERE p.ID IS NULL;
    # WP does a good job bookeeping postmeta,
    # so you probably will have zero results here, even after deleting revisions.
    # I tested a wp_postmeta with 2.5 million rows.
    # If you do find strays, you know how to "DELETE pm" them.

    ############################
    Now, wp_term_relationships.
    ############################

    The rvg plugin uses a WP function that handles caching and other issues. Here's a a shortened version:

    $tags = get_terms('post_tag', array('hide_empty' => 0));
    for($i=0; $i<count($tags); $i++) {
      if($tags[$i]->count < 1) {
        wp_delete_term($tags[$i]->term_id,'post_tag');
      }
    }

    Looks like rvg's plugin does not handle categories and custom taxonomies though.

    Now here's a full MySQL bleary-eyed technique.

    If you want to be bold and assume "link_category" is the only non-post
    use of wp_term_relationships, then do this (use PHPMyAdmin so you get paging):

    SELECT *
    FROM wp_term_relationships tr
    LEFT JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id
    LEFT JOIN wp_terms t ON tt.term_id = t.term_id
    LEFT JOIN wp_posts p ON p.ID = tr.object_id
    WHERE p.ID IS NULL AND tt.taxonomy != 'link_category';
    # If you do find strays, you know how to "DELETE tr" them.

    Under the same assumption, here are stray terms with no posts, such as empty categories:

    SELECT *
    FROM wp_terms t
    LEFT JOIN wp_term_taxonomy tt ON tt.term_id = t.term_id
    LEFT JOIN wp_term_relationships tr ON tr.term_taxonomy_id = tt.term_taxonomy_id
    WHERE tr.object_id IS NULL;
  3. kitchin
    Member
    Posted 1 year ago #

    And here's the careful way. I should mention, none of my code handles comments and commentmeta's. Those tables are handled in the rvg plugin.

    The more careful way is to look at each taxonomy, and don't assume "not a link_category" means "post".

    Look at wp_term_relationships. All it has is numbers, so you need to JOIN to find out which rows describe wp_links and which are for wp_posts. The key idea here is that wp_term_relationships is only used for taxonomies (we hope), so first get a list of those:

    SELECT taxonomy, COUNT(taxonomy)
    FROM wp_term_taxonomy
    GROUP BY taxonomy;
    ## here are my results ##
    # category 		10
    # link_category 	3
    # post_tag 		14705
    # foo_menu		...
    # bar_taxonomy		...

    "foo_menu": If you have a custom menu it will be a taxonomy, and the menu items will be posts, with a custom post_type. For example "foo_menu" will be in wp_term_taxonomy.taxonomy, and there will be a corresponding "foo_menu_item" in wp_posts.post_type. That is, one row in wp_term_taxonomy, and multiple rows in wp_posts.

    "bar_taxonomy": Easier. If you have a custom taxonomy, it will usually act just like the built-in "post_tag" and "category" taxonomies. A wp_posts row can be assigned to a "bar_taxonomy" item, and the post_type could be "post", "page", "revision", etc.

    Recall post_type's:

    SELECT post_type, COUNT(post_type)
    FROM wp_posts
    GROUP BY post_type;
    # post 17061
    # page 9
    # attachment 6128
    # revision 27185
    # foo_menu_item ...

    You could also have custom post_type's not related to menus.

    OK, now let's start looking for strays.

    Use the "bold" query above as a model, but limit it to known taxonomies.

    SELECT *
    FROM wp_term_relationships tr
    LEFT JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id
    LEFT JOIN wp_terms t ON tt.term_id = t.term_id
    LEFT JOIN wp_posts p ON p.ID = tr.object_id
    WHERE p.ID IS NULL AND tt.taxonomy IN ('category', 'post_tag');

    Then there is

    SELECT *
    FROM wp_term_relationships tr
    LEFT JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id
    LEFT JOIN wp_terms t ON tt.term_id = t.term_id
    LEFT JOIN wp_posts p ON p.ID = tr.object_id
    WHERE p.ID IS NULL AND tt.taxonomy IN ('foo_menu', 'bar_taxonomy');

    If you're sure those taxonomies only connect to wp_posts, you can "DELETE tr" any results.

    If you find strays and delete them you could check for dead terms too, see the "bold" query above.

    How about "link_category"'s? Those JOIN to a different table:

    SELECT *
    FROM wp_term_relationships tr
    LEFT JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id
    LEFT JOIN wp_terms t ON tt.term_id = t.term_id
    LEFT JOIN wp_links k ON k.link_id = tr.object_id
    WHERE tt.taxonomy = 'link_category';

    And there's nothing to delete because wp_links does not have a concept of revisions.

  4. kitchin
    Member
    Posted 1 year ago #

    Ah, one more thing. The other good plugin above uses this nice code to clean up all taxonomies, including tags, categories and custom.

    http://wordpress.org/plugins/revision-control/

    function delete_terms($revision_id, $rev) {
      if ( ! $post = get_post($rev->post_parent) )
        return;
      // Delete the parent posts taxonomies from the revision.
      wp_delete_object_term_relationships($revision_id, get_object_taxonomies($post->post_type) );
    }

    Assuming it works, that is great. Taxonomies know what post_type's they are allowed in.

  5. Manuel Razzari
    Member
    Posted 1 year ago #

    You're right that the "a,b,c" approach is very naïve, and it's shocking to see how many plugins replicate this bug. I was just looking a "wp-clean-up" plugin which simply does a DELETE WHERE post_type = 'revision', not caring about any of these details.

    The delete_terms function in revision-control is odd as well, as wp_delete_post_revision is already calling wp_delete_object_term_relationships, so I'm not sure it's needed at all.

    That said, it's more likely that simply deleting from the posts table is enough, as usually revisions don't touch other tables. But in case they do, we need to do it the right way!

    So, armed with the knowledge from your post, I think all we need is to let WP take care of this the native way, which will also trigger deletion hooks which some plugins may use:

    // Get all revisions in this blog - but not autosaves, which may include valuable content.<br />
    $revision_ids = $wpdb->get_col($wpdb->prepare( "SELECT ID FROM $wpdb->posts WHERE post_type = 'revision' AND post_name NOT LIKE '%-autosave%'") );<br />
    foreach ($revision_ids as $revision_id){<br />
       // This will invoke wp_delete_post, which takes care of related meta, taxonomy terms, etc.<br />
       wp_delete_post_revision($revision_id);<br />
    }

    I've wrapped this up in this snippet which deletes all revisions, including all blogs if it's a multisite network).

  6. kitchin
    Member
    Posted 1 year ago #

    Yes! Using the WP functions is better than calling MySQL DELETE directly. Let WP try the various cleanup and consistency things it does in wp-includes/post.php. For my purposes, the WHERE clause is

    WHERE post_type = 'revision' AND post_date < NOW() - INTERVAL 1 YEAR

    instead of the thing about autosaves.

  7. kitchin
    Member
    Posted 1 year ago #

    Actually your snippet *only* works on multisite. Otherwise $wpdb->blogs is not defined and you get a bad query. I have a fix but you probably have better one.

  8. Manuel Razzari
    Member
    Posted 1 year ago #

    You're right, I hadn't bothered to test the snippet on a single-blog install :$

    I've now updated the snippet with support for both single site and multisite.

  9. gWorldz
    Member
    Posted 1 year ago #

    How difficult would it be to turn this into a plugin and add to tools menu?

    If it was a plugin what needs to be considered about the code. Are there any issues timeouts?

    Is there any chance it could display all teh revisions that it removes during the *wait* period?

    Sorry for resurrecting an old post but this was an elegant solution to my problem and having it packaged up in a nice little plugin I could recommend to clients would be awesome.

    How hard would it be to get it to leave a set number of revisions for each post?

  10. kitchin
    Member
    Posted 1 year ago #

    The proper way to do it is with Ajax calls so you watch as it works and the page does not have to reload. You've pointed out the right issue, that's for sure. You don't know scale of the task when you're designing the logic.

    You see the Ajaxy thing in Dashboard, for instance if you update a bunch of plugins or themes all at once.

  11. Name.ly
    Member
    Posted 1 year ago #

    If you are using multisite installation, then this code might be useful to disable revisions only on some blogs:

    // Disable revisions in some particular theme /wp-includes/revision.php
    add_filter ( 'wp_revisions_to_keep', 'my_theme_wp_revisions_to_keep', 999, 2 );
    function my_theme_wp_revisions_to_keep ( $num, $post ) {
    	return 0;
    }
  12. sootsnoot
    Member
    Posted 12 months ago #

    @Manuel Razzari I notice the current snippet calls switch_to_blog() to access the revisions in each particular blog of a multi-site installation. But it doesn't call restore_current_blog() to switch back after each one.

    According to the codex (and things I've read on the web and seen in the code), although it might work for what you're doing, it's really not correct usage of the API, and could have undesirable consequences.

    Just before your return from function delete_revisions_for_blog, you really should add:

    if (is_multisite()){
    		restore_current_blog();
    	}
  13. Manuel Razzari
    Member
    Posted 12 months ago #

    @Sootsnoot,

    Please note that this snippet is not a plugin, instead it's meant to be called as a standalone script... Like http://example.com/tmp/multisite_delete_revisions.php.

    So there's really no point in "restoring" anything, as the script will just die in a blank "Done!" page after succeeding.

  14. sootsnoot
    Member
    Posted 12 months ago #

    I agree that there's no practical point to restoring resources before exit.

    However, at least in principle the switch_to_blog() api is specified to work correctly only one level deep. By not calling restore_current_blog() before another call to switch_to_blog(), the subsequent calls to switch_to_blog() are "allowed to fail". Plus posting fragile code that others might copy for use in a different context isn't a very good idea.

    That's how this thread got started with the "a, b, c JOIN" code that worked okay in a very limited context, and then spread like wildfire to contexts where it wouldn't work. Only three extra source lines to code it in exemplary form :-)

  15. Manuel Razzari
    Member
    Posted 12 months ago #

    What you're saying makes total sense in the context of a plugin.

    I'd think that if someone puts this code into a context where this matters —ie, a plugin— they'll be careful enough to restore the blog, as this code will noticeably break their site.

    Switch_to_blog is an expensive (slow) function call, adding restore_current_blog would duplicate that effort. This can add up pretty quickly, as we're potentially looping through a large list of blogs here.

    Maybe what I should be doing is adding your suggestion, but in a comment...?

  16. kitchin
    Member
    Posted 12 months ago #

    And just to be clear "a, b, c JOIN" is bad code even without multi-user.

    There's a grand plan to rewrite the WP taxoxonomy code. I wouldn't predict any real API changes til WP 5.0 at the earliest! Look how grand it is:
    http://make.wordpress.org/core/2013/07/28/potential-roadmap-for-taxonomy-meta-and-post-relationships/
    Summary: the reason for two term tables never led to anything (taxonomies to share items). And it causes a bug when you rename tax items because item identities are not fixed.

    One use of the complicated taxonomy tables did get implemented: WP Links. That feature is now available only if you are upgrading an old WP, or get the official legacy plugin. I have one customer who really likes WP Links.

  17. sootsnoot
    Member
    Posted 12 months ago #

    @Manuel Razzari Yes, a comment would make me happy :-)

    In my particular use case, I only want to purge revisions for a small fixed set of "system" blogs. The potentially large number of user-created blogs have revisions limited to 20, and I never plan to purge them. So the time needed to run the script is not a major concern.

    BTW, my multi-site is based on subdirectories rather than subdomains, so I had to select path as well as blog_id and domain in the get_results call on the $wp_>blogs table, and display it along with the domain in the output, in order to make the output meaningful.

Topic Closed

This topic has been closed to new replies.

About this Topic