Support » Fixing WordPress » Deleting Post Revisions: do NOT use the a,b,c JOIN code you see everywhere
Deleting Post Revisions: do NOT use the a,b,c JOIN code you see everywhere
-
Post revisions bother some people, especially on larger sites because they add lots of rows to
wp_posts
. A common bit of MySQL code posted in these forums, plugins and elsewhere goes like this:### do not use: ### # DELETE a,b,c # FROM wp_posts a # LEFT JOIN wp_term_relationships b ON (a.ID = b.object_id) # LEFT JOIN wp_postmeta c ON (a.ID = c.post_id) # WHERE a.post_type = 'revision';
The problem here is that
wp_term_relationships
is used for both posts and links, two different tables. Sometimes theobject_id
refers towp_links.link_id
, notwp_posts.ID
. Andobject_id
‘s are not unique inwp_term_relationships
. So there is a chance, especially with lower-numberedwp_post.ID
‘s, that you will delete a relationship needed to make Dashboard / Links work. The DELETE will remove two rows that matchobject_id
, one for a post and one for a link.One plugin with this bug is
http://wordpress.org/plugins/better-delete-revision/Another plugin just deletes revisions and does not clean up meta and terms at all, which is not ideal:
http://wordpress.org/plugins/delete-revision/These plugins try to do things the right way, by using WP’s own term functions:
http://wordpress.org/plugins/rvg-optimize-database/
http://wordpress.org/plugins/revision-control/Here is a proposal to put a revision deletion function in WP core:
http://wordpress.org/ideas/topic/native-function-to-delete-post-revisions
-
If you want to use MySQL to cleanup tables instead of a plugin, here are some queries that will work better than the
a,b,c
code. Be careful, WP can be extended in all kinds of ways, and remember, you are deleting stuff! Also, if you are using a caching plugin, you will need to clear that in the plugin, due to: http://codex.wordpress.org/Class_Reference/WP_Object_Cache#Persistent_CachingThe strategy is to delete revisions and then cleanup stray meta and terms in separate queries.
DELETE FROM wp_posts WHERE post_type = 'revision';
########################
First, wp_postmeta.
########################The good plugin above (rvg-optimize-database) uses
DELETE FROM $wpdb->postmeta WHERE post_id NOT IN (SELECT ID FROM $wpdb->posts)
That seems pretty good. You’ll have the sub the actual tables names if you’re doping this in PHPMyAdmin or elsewhere outside of WP. I had:
SELECT * FROM wp_postmeta pm LEFT JOIN wp_posts p ON pm.post_id = p.ID WHERE p.ID IS NULL; # WP does a good job bookeeping postmeta, # so you probably will have zero results here, even after deleting revisions. # I tested a wp_postmeta with 2.5 million rows. # If you do find strays, you know how to "DELETE pm" them.
############################
Now, wp_term_relationships.
############################The rvg plugin uses a WP function that handles caching and other issues. Here’s a a shortened version:
$tags = get_terms('post_tag', array('hide_empty' => 0)); for($i=0; $i<count($tags); $i++) { if($tags[$i]->count < 1) { wp_delete_term($tags[$i]->term_id,'post_tag'); } }
Looks like rvg’s plugin does not handle categories and custom taxonomies though.
Now here’s a full MySQL bleary-eyed technique.
If you want to be bold and assume “link_category” is the only non-post
use ofwp_term_relationships
, then do this (use PHPMyAdmin so you get paging):SELECT * FROM wp_term_relationships tr LEFT JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id LEFT JOIN wp_terms t ON tt.term_id = t.term_id LEFT JOIN wp_posts p ON p.ID = tr.object_id WHERE p.ID IS NULL AND tt.taxonomy != 'link_category'; # If you do find strays, you know how to "DELETE tr" them.
Under the same assumption, here are stray terms with no posts, such as empty categories:
SELECT * FROM wp_terms t LEFT JOIN wp_term_taxonomy tt ON tt.term_id = t.term_id LEFT JOIN wp_term_relationships tr ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tr.object_id IS NULL;
And here’s the careful way. I should mention, none of my code handles
comments
andcommentmeta
‘s. Those tables are handled in the rvg plugin.The more careful way is to look at each taxonomy, and don’t assume “not a link_category” means “post”.
Look at
wp_term_relationships
. All it has is numbers, so you need to JOIN to find out which rows describewp_links
and which are forwp_posts
. The key idea here is thatwp_term_relationships
is only used for taxonomies (we hope), so first get a list of those:SELECT taxonomy, COUNT(taxonomy) FROM wp_term_taxonomy GROUP BY taxonomy; ## here are my results ## # category 10 # link_category 3 # post_tag 14705 # foo_menu ... # bar_taxonomy ...
“foo_menu”: If you have a custom menu it will be a taxonomy, and the menu items will be posts, with a custom post_type. For example “foo_menu” will be in
wp_term_taxonomy.taxonomy
, and there will be a corresponding “foo_menu_item” inwp_posts.post_type
. That is, one row inwp_term_taxonomy
, and multiple rows inwp_posts
.“bar_taxonomy”: Easier. If you have a custom taxonomy, it will usually act just like the built-in “post_tag” and “category” taxonomies. A
wp_posts
row can be assigned to a “bar_taxonomy” item, and thepost_type
could be “post”, “page”, “revision”, etc.Recall
post_type
‘s:SELECT post_type, COUNT(post_type)
FROM wp_posts
GROUP BY post_type;
# post 17061
# page 9
# attachment 6128
# revision 27185
# foo_menu_item …You could also have custom
post_type
‘s not related to menus.OK, now let’s start looking for strays.
Use the “bold” query above as a model, but limit it to known taxonomies.
SELECT * FROM wp_term_relationships tr LEFT JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id LEFT JOIN wp_terms t ON tt.term_id = t.term_id LEFT JOIN wp_posts p ON p.ID = tr.object_id WHERE p.ID IS NULL AND tt.taxonomy IN ('category', 'post_tag');
Then there is
SELECT * FROM wp_term_relationships tr LEFT JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id LEFT JOIN wp_terms t ON tt.term_id = t.term_id LEFT JOIN wp_posts p ON p.ID = tr.object_id WHERE p.ID IS NULL AND tt.taxonomy IN ('foo_menu', 'bar_taxonomy');
If you’re sure those taxonomies only connect to
wp_posts
, you can “DELETE tr” any results.If you find strays and delete them you could check for dead terms too, see the “bold” query above.
How about “link_category”‘s? Those JOIN to a different table:
SELECT * FROM wp_term_relationships tr LEFT JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id LEFT JOIN wp_terms t ON tt.term_id = t.term_id LEFT JOIN wp_links k ON k.link_id = tr.object_id WHERE tt.taxonomy = 'link_category';
And there’s nothing to delete because
wp_links
does not have a concept of revisions.Ah, one more thing. The other good plugin above uses this nice code to clean up all taxonomies, including tags, categories and custom.
http://wordpress.org/plugins/revision-control/
function delete_terms($revision_id, $rev) { if ( ! $post = get_post($rev->post_parent) ) return; // Delete the parent posts taxonomies from the revision. wp_delete_object_term_relationships($revision_id, get_object_taxonomies($post->post_type) ); }
Assuming it works, that is great. Taxonomies know what
post_type
‘s they are allowed in.You’re right that the “a,b,c” approach is very naïve, and it’s shocking to see how many plugins replicate this bug. I was just looking a “wp-clean-up” plugin which simply does a
DELETE WHERE post_type = 'revision'
, not caring about any of these details.The
delete_terms
function in revision-control is odd as well, aswp_delete_post_revision
is already callingwp_delete_object_term_relationships
, so I’m not sure it’s needed at all.That said, it’s more likely that simply deleting from the posts table is enough, as usually revisions don’t touch other tables. But in case they do, we need to do it the right way!
So, armed with the knowledge from your post, I think all we need is to let WP take care of this the native way, which will also trigger deletion hooks which some plugins may use:
// Get all revisions in this blog - but not autosaves, which may include valuable content.<br /> $revision_ids = $wpdb->get_col($wpdb->prepare( "SELECT ID FROM $wpdb->posts WHERE post_type = 'revision' AND post_name NOT LIKE '%-autosave%'") );<br /> foreach ($revision_ids as $revision_id){<br /> // This will invoke wp_delete_post, which takes care of related meta, taxonomy terms, etc.<br /> wp_delete_post_revision($revision_id);<br /> }
I’ve wrapped this up in this snippet which deletes all revisions, including all blogs if it’s a multisite network).
Yes! Using the WP functions is better than calling MySQL DELETE directly. Let WP try the various cleanup and consistency things it does in wp-includes/post.php. For my purposes, the WHERE clause is
WHERE post_type = 'revision' AND post_date < NOW() - INTERVAL 1 YEAR
instead of the thing about autosaves.
Actually your snippet *only* works on multisite. Otherwise $wpdb->blogs is not defined and you get a bad query. I have a fix but you probably have better one.
You’re right, I hadn’t bothered to test the snippet on a single-blog install :$
I’ve now updated the snippet with support for both single site and multisite.
How difficult would it be to turn this into a plugin and add to tools menu?
If it was a plugin what needs to be considered about the code. Are there any issues timeouts?
Is there any chance it could display all teh revisions that it removes during the *wait* period?
Sorry for resurrecting an old post but this was an elegant solution to my problem and having it packaged up in a nice little plugin I could recommend to clients would be awesome.
How hard would it be to get it to leave a set number of revisions for each post?
The proper way to do it is with Ajax calls so you watch as it works and the page does not have to reload. You’ve pointed out the right issue, that’s for sure. You don’t know scale of the task when you’re designing the logic.
You see the Ajaxy thing in Dashboard, for instance if you update a bunch of plugins or themes all at once.
If you are using multisite installation, then this code might be useful to disable revisions only on some blogs:
// Disable revisions in some particular theme /wp-includes/revision.php add_filter ( 'wp_revisions_to_keep', 'my_theme_wp_revisions_to_keep', 999, 2 ); function my_theme_wp_revisions_to_keep ( $num, $post ) { return 0; }
@manuel Razzari I notice the current snippet calls switch_to_blog() to access the revisions in each particular blog of a multi-site installation. But it doesn’t call restore_current_blog() to switch back after each one.
According to the codex (and things I’ve read on the web and seen in the code), although it might work for what you’re doing, it’s really not correct usage of the API, and could have undesirable consequences.
Just before your return from function delete_revisions_for_blog, you really should add:
if (is_multisite()){ restore_current_blog(); }
Please note that this snippet is not a plugin, instead it’s meant to be called as a standalone script… Like http://example.com/tmp/multisite_delete_revisions.php.
So there’s really no point in “restoring” anything, as the script will just die in a blank “Done!” page after succeeding.
I agree that there’s no practical point to restoring resources before exit.
However, at least in principle the switch_to_blog() api is specified to work correctly only one level deep. By not calling restore_current_blog() before another call to switch_to_blog(), the subsequent calls to switch_to_blog() are “allowed to fail”. Plus posting fragile code that others might copy for use in a different context isn’t a very good idea.
That’s how this thread got started with the “a, b, c JOIN” code that worked okay in a very limited context, and then spread like wildfire to contexts where it wouldn’t work. Only three extra source lines to code it in exemplary form 🙂
What you’re saying makes total sense in the context of a plugin.
I’d think that if someone puts this code into a context where this matters —ie, a plugin— they’ll be careful enough to restore the blog, as this code will noticeably break their site.
Switch_to_blog is an expensive (slow) function call, adding restore_current_blog would duplicate that effort. This can add up pretty quickly, as we’re potentially looping through a large list of blogs here.
Maybe what I should be doing is adding your suggestion, but in a comment…?
And just to be clear “a, b, c JOIN” is bad code even without multi-user.
There’s a grand plan to rewrite the WP taxoxonomy code. I wouldn’t predict any real API changes til WP 5.0 at the earliest! Look how grand it is:
http://make.wordpress.org/core/2013/07/28/potential-roadmap-for-taxonomy-meta-and-post-relationships/
Summary: the reason for two term tables never led to anything (taxonomies to share items). And it causes a bug when you rename tax items because item identities are not fixed.One use of the complicated taxonomy tables did get implemented: WP Links. That feature is now available only if you are upgrading an old WP, or get the official legacy plugin. I have one customer who really likes WP Links.
- The topic ‘Deleting Post Revisions: do NOT use the a,b,c JOIN code you see everywhere’ is closed to new replies.