WordPress.org

Ready to get started?Download WordPress

Forums

Deleting Post Revisions: do NOT use the a,b,c JOIN code you see everywhere (11 posts)

  1. kitchin
    Member
    Posted 11 months ago #

    Post revisions bother some people, especially on larger sites because they add lots of rows to wp_posts. A common bit of MySQL code posted in these forums, plugins and elsewhere goes like this:

    ### do not use: ###
    # DELETE a,b,c
    # FROM wp_posts a
    # LEFT JOIN wp_term_relationships b ON (a.ID = b.object_id)
    # LEFT JOIN wp_postmeta c ON (a.ID = c.post_id)
    # WHERE a.post_type = 'revision';

    The problem here is that wp_term_relationships is used for both posts and links, two different tables. Sometimes the object_id refers to wp_links.link_id, not wp_posts.ID. And object_id's are not unique in wp_term_relationships. So there is a chance, especially with lower-numbered wp_post.ID's, that you will delete a relationship needed to make Dashboard / Links work. The DELETE will remove two rows that match object_id, one for a post and one for a link.

    One plugin with this bug is
    http://wordpress.org/plugins/better-delete-revision/

    Another plugin just deletes revisions and does not clean up meta and terms at all, which is not ideal:
    http://wordpress.org/plugins/delete-revision/

    These plugins try to do things the right way, by using WP's own term functions:
    http://wordpress.org/plugins/rvg-optimize-database/
    http://wordpress.org/plugins/revision-control/

    Here is a proposal to put a revision deletion function in WP core:
    http://wordpress.org/ideas/topic/native-function-to-delete-post-revisions

  2. kitchin
    Member
    Posted 11 months ago #

    If you want to use MySQL to cleanup tables instead of a plugin, here are some queries that will work better than the a,b,c code. Be careful, WP can be extended in all kinds of ways, and remember, you are deleting stuff! Also, if you are using a caching plugin, you will need to clear that in the plugin, due to: http://codex.wordpress.org/Class_Reference/WP_Object_Cache#Persistent_Caching

    The strategy is to delete revisions and then cleanup stray meta and terms in separate queries.

    DELETE
    FROM wp_posts
    WHERE post_type = 'revision';

    ########################
    First, wp_postmeta.
    ########################

    The good plugin above (rvg-optimize-database) uses

    DELETE FROM $wpdb->postmeta
    WHERE post_id NOT IN (SELECT ID FROM $wpdb->posts)

    That seems pretty good. You'll have the sub the actual tables names if you're doping this in PHPMyAdmin or elsewhere outside of WP. I had:

    SELECT *
    FROM wp_postmeta pm
    LEFT JOIN wp_posts p ON pm.post_id = p.ID
    WHERE p.ID IS NULL;
    # WP does a good job bookeeping postmeta,
    # so you probably will have zero results here, even after deleting revisions.
    # I tested a wp_postmeta with 2.5 million rows.
    # If you do find strays, you know how to "DELETE pm" them.

    ############################
    Now, wp_term_relationships.
    ############################

    The rvg plugin uses a WP function that handles caching and other issues. Here's a a shortened version:

    $tags = get_terms('post_tag', array('hide_empty' => 0));
    for($i=0; $i<count($tags); $i++) {
      if($tags[$i]->count < 1) {
        wp_delete_term($tags[$i]->term_id,'post_tag');
      }
    }

    Looks like rvg's plugin does not handle categories and custom taxonomies though.

    Now here's a full MySQL bleary-eyed technique.

    If you want to be bold and assume "link_category" is the only non-post
    use of wp_term_relationships, then do this (use PHPMyAdmin so you get paging):

    SELECT *
    FROM wp_term_relationships tr
    LEFT JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id
    LEFT JOIN wp_terms t ON tt.term_id = t.term_id
    LEFT JOIN wp_posts p ON p.ID = tr.object_id
    WHERE p.ID IS NULL AND tt.taxonomy != 'link_category';
    # If you do find strays, you know how to "DELETE tr" them.

    Under the same assumption, here are stray terms with no posts, such as empty categories:

    SELECT *
    FROM wp_terms t
    LEFT JOIN wp_term_taxonomy tt ON tt.term_id = t.term_id
    LEFT JOIN wp_term_relationships tr ON tr.term_taxonomy_id = tt.term_taxonomy_id
    WHERE tr.object_id IS NULL;
  3. kitchin
    Member
    Posted 11 months ago #

    And here's the careful way. I should mention, none of my code handles comments and commentmeta's. Those tables are handled in the rvg plugin.

    The more careful way is to look at each taxonomy, and don't assume "not a link_category" means "post".

    Look at wp_term_relationships. All it has is numbers, so you need to JOIN to find out which rows describe wp_links and which are for wp_posts. The key idea here is that wp_term_relationships is only used for taxonomies (we hope), so first get a list of those:

    SELECT taxonomy, COUNT(taxonomy)
    FROM wp_term_taxonomy
    GROUP BY taxonomy;
    ## here are my results ##
    # category 		10
    # link_category 	3
    # post_tag 		14705
    # foo_menu		...
    # bar_taxonomy		...

    "foo_menu": If you have a custom menu it will be a taxonomy, and the menu items will be posts, with a custom post_type. For example "foo_menu" will be in wp_term_taxonomy.taxonomy, and there will be a corresponding "foo_menu_item" in wp_posts.post_type. That is, one row in wp_term_taxonomy, and multiple rows in wp_posts.

    "bar_taxonomy": Easier. If you have a custom taxonomy, it will usually act just like the built-in "post_tag" and "category" taxonomies. A wp_posts row can be assigned to a "bar_taxonomy" item, and the post_type could be "post", "page", "revision", etc.

    Recall post_type's:

    SELECT post_type, COUNT(post_type)
    FROM wp_posts
    GROUP BY post_type;
    # post 17061
    # page 9
    # attachment 6128
    # revision 27185
    # foo_menu_item ...

    You could also have custom post_type's not related to menus.

    OK, now let's start looking for strays.

    Use the "bold" query above as a model, but limit it to known taxonomies.

    SELECT *
    FROM wp_term_relationships tr
    LEFT JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id
    LEFT JOIN wp_terms t ON tt.term_id = t.term_id
    LEFT JOIN wp_posts p ON p.ID = tr.object_id
    WHERE p.ID IS NULL AND tt.taxonomy IN ('category', 'post_tag');

    Then there is

    SELECT *
    FROM wp_term_relationships tr
    LEFT JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id
    LEFT JOIN wp_terms t ON tt.term_id = t.term_id
    LEFT JOIN wp_posts p ON p.ID = tr.object_id
    WHERE p.ID IS NULL AND tt.taxonomy IN ('foo_menu', 'bar_taxonomy');

    If you're sure those taxonomies only connect to wp_posts, you can "DELETE tr" any results.

    If you find strays and delete them you could check for dead terms too, see the "bold" query above.

    How about "link_category"'s? Those JOIN to a different table:

    SELECT *
    FROM wp_term_relationships tr
    LEFT JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id
    LEFT JOIN wp_terms t ON tt.term_id = t.term_id
    LEFT JOIN wp_links k ON k.link_id = tr.object_id
    WHERE tt.taxonomy = 'link_category';

    And there's nothing to delete because wp_links does not have a concept of revisions.

  4. kitchin
    Member
    Posted 11 months ago #

    Ah, one more thing. The other good plugin above uses this nice code to clean up all taxonomies, including tags, categories and custom.

    http://wordpress.org/plugins/revision-control/

    function delete_terms($revision_id, $rev) {
      if ( ! $post = get_post($rev->post_parent) )
        return;
      // Delete the parent posts taxonomies from the revision.
      wp_delete_object_term_relationships($revision_id, get_object_taxonomies($post->post_type) );
    }

    Assuming it works, that is great. Taxonomies know what post_type's they are allowed in.

  5. Manuel Razzari
    Member
    Posted 5 months ago #

    You're right that the "a,b,c" approach is very naïve, and it's shocking to see how many plugins replicate this bug. I was just looking a "wp-clean-up" plugin which simply does a DELETE WHERE post_type = 'revision', not caring about any of these details.

    The delete_terms function in revision-control is odd as well, as wp_delete_post_revision is already calling wp_delete_object_term_relationships, so I'm not sure it's needed at all.

    That said, it's more likely that simply deleting from the posts table is enough, as usually revisions don't touch other tables. But in case they do, we need to do it the right way!

    So, armed with the knowledge from your post, I think all we need is to let WP take care of this the native way, which will also trigger deletion hooks which some plugins may use:

    // Get all revisions in this blog - but not autosaves, which may include valuable content.<br />
    $revision_ids = $wpdb->get_col($wpdb->prepare( "SELECT ID FROM $wpdb->posts WHERE post_type = 'revision' AND post_name NOT LIKE '%-autosave%'") );<br />
    foreach ($revision_ids as $revision_id){<br />
       // This will invoke wp_delete_post, which takes care of related meta, taxonomy terms, etc.<br />
       wp_delete_post_revision($revision_id);<br />
    }

    I've wrapped this up in this snippet which deletes all revisions, including all blogs if it's a multisite network).

  6. kitchin
    Member
    Posted 4 months ago #

    Yes! Using the WP functions is better than calling MySQL DELETE directly. Let WP try the various cleanup and consistency things it does in wp-includes/post.php. For my purposes, the WHERE clause is

    WHERE post_type = 'revision' AND post_date < NOW() - INTERVAL 1 YEAR

    instead of the thing about autosaves.

  7. kitchin
    Member
    Posted 4 months ago #

    Actually your snippet *only* works on multisite. Otherwise $wpdb->blogs is not defined and you get a bad query. I have a fix but you probably have better one.

  8. Manuel Razzari
    Member
    Posted 4 months ago #

    You're right, I hadn't bothered to test the snippet on a single-blog install :$

    I've now updated the snippet with support for both single site and multisite.

  9. gWorldz
    Member
    Posted 2 months ago #

    How difficult would it be to turn this into a plugin and add to tools menu?

    If it was a plugin what needs to be considered about the code. Are there any issues timeouts?

    Is there any chance it could display all teh revisions that it removes during the *wait* period?

    Sorry for resurrecting an old post but this was an elegant solution to my problem and having it packaged up in a nice little plugin I could recommend to clients would be awesome.

    How hard would it be to get it to leave a set number of revisions for each post?

  10. kitchin
    Member
    Posted 2 months ago #

    The proper way to do it is with Ajax calls so you watch as it works and the page does not have to reload. You've pointed out the right issue, that's for sure. You don't know scale of the task when you're designing the logic.

    You see the Ajaxy thing in Dashboard, for instance if you update a bunch of plugins or themes all at once.

  11. Name.ly
    Member
    Posted 1 day ago #

    If you are using multisite installation, then this code might be useful to disable revisions only on some blogs:

    // Disable revisions in some particular theme /wp-includes/revision.php
    add_filter ( 'wp_revisions_to_keep', 'my_theme_wp_revisions_to_keep', 999, 2 );
    function my_theme_wp_revisions_to_keep ( $num, $post ) {
    	return 0;
    }

Reply

You must log in to post.

About this Topic