Database Bloat – Revisions

  • My MySQL database has become stuffed with post and page revisions, to the point that I am losing track of which post is which and which page is which!

    I only have three posts and five pages, but the database has over 180 post records of various revisions. True, we refined our posts and pages by doing a lot of online editing. They ALL appear to be accessible by URL. (?Page_id=**)

    Is there any way to clean out this garbage from the database? We do not need to keep revisions.

    Thank you.

Viewing 15 replies - 1 through 15 (of 20 total)
  • Moderator Samuel Wood (Otto)

    (@otto42) Admin

    First question: Why do you think that it matters how many posts/pages/revisions/whatever are in the database? Databases are fast and efficient, and text compresses easily, you know.

    Second question: Did you bother to search for the answer?

    A very informative thread and I thank you for that link.

    As to creating a monster of a mess in a database, that is uncalled for. No, I don’t have multiple authors working on the same posts. I do have another site with a quarter of a million posts on it though and I cannot imagine turning that into two million posts by storing all revisions. Insane! That’s why Microsoft operating systems have become so bloated.

    I bet you have a messy desk too. 🙂

    On a brighter note, I do appreciate WordPress and it is excellent software. I also realize that all help here is voluntary and that sometimes it must have its trying moments.


    Moderator Samuel Wood (Otto)

    (@otto42) Admin



    See, you think it’s ridiculous to store a lot of data, but you don’t exactly explain why that is the case. I think that storing as much data as is available is better than not storing it. Because when you need that data, you have it.

    I’ll bet you delete old emails too… 😉

    Also, ever looked at the database of any Wiki software? Every revision is stored there too. This is not uncommon practice. I admit that it’s a relatively new idea, but computers are fast and storage is large. There’s no good reason to delete stuff, and plenty of good reasons to keep it.

    No, I don’t erase old emails, except for spam. But then, I don’t store all the edits or revisions that ever went into each of those emails BEFORE they were sent to me. That would be nuts!

    I’ll bet there are two camps to this opinion and I learned that this is a recent change. Someone’s idea gone sour. It only needs a configuration option to satisfy everyone. That and a clean-up tool to deal with the mess.

    I suppose with a Wiki system you would want to keep a record of all the hair-brains who ever came along with an edit. Just realize that on other systems there will always be two opinions on this matter.

    Data redundancy is never a good thing, even if modern databases are able to handle large amounts of data. Plus, not everyone has access to databases with unlimited space, especially on shared hosting.

    The sensible course of action would have been to disable versioning by default and let the users who really need it activate it.

    That way no one would have to deal with clogged databases.

    Hey, I noticed this comment in the 1st post, “They ALL appear to be accessed as a URL. (?Page_id=**)”

    I was wondering, does anyone know if each saved revision counts my Post ID up by 1?

    I’m asking because the Post ID # happens to be used in my pretty permalinks, and I like having sequential URLs. Each post is an episode and they’re numbered in the URL automatically that way. If the new revisions count up the ID #, then soon my URL will have a really long number instead of the sequential Episode 205 that I want to be on.

    But I’m not sure if this is the number that counts up with each post revision. Anyone know? Thanks
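
    A rough illustration of the question above, not WordPress code: revisions are stored as rows in the same posts table as the posts themselves (with a post type of "revision" and the original post as parent), so they draw on the same auto-increment counter. The table below is a simplified stand-in built with Python's sqlite3 module; the name `posts`, the reduced column set, and the helper `insert_row` are assumptions made for the sketch.

```python
import sqlite3

# Simplified stand-in for the WordPress posts table (assumption: the real
# table has many more columns and lives in MySQL, not SQLite).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    " ID INTEGER PRIMARY KEY AUTOINCREMENT,"
    " post_type TEXT, post_parent INTEGER, post_title TEXT)"
)

def insert_row(post_type, title, parent=0):
    """Hypothetical helper: insert one row and return its auto-assigned ID."""
    cur = conn.execute(
        "INSERT INTO posts (post_type, post_parent, post_title) VALUES (?, ?, ?)",
        (post_type, parent, title),
    )
    return cur.lastrowid

episode_1 = insert_row("post", "Episode 1")
# Two edits to Episode 1, each saved as a revision row pointing at its parent.
insert_row("revision", "Episode 1", parent=episode_1)
insert_row("revision", "Episode 1", parent=episode_1)
episode_2 = insert_row("post", "Episode 2")

# The existing post keeps its ID; the next new post skips past the revisions.
print(episode_1, episode_2)
```

    In this sketch the ID of an existing post never changes when it is edited, but the next new post lands after every revision saved in between, so permalinks built on the ID stop being consecutive.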

    @ Otto42:

    You really should stop harping on users who want this stopped or disabled. It’s none of your concern why this request is needed or wanted, really, dude. Just answer the question in a professional manner. Yes, it’s helpful to inform some users as to what’s happening in the back end of things, but being cocky and rude to users is not the way to handle these forum requests. Not everyone has your knowledge.

    Also, if people use cformII or other items with heavy database queries, this post revision feature causes a stall in sites loading. I need revisions to quit and I need to clean up my DB now. SO, there are reasons not to store so much data, as a query hits too much. Just answer the question, and if you can’t answer it in a professional manner, please let someone else provide information. Thank you.

    Now for the last part of that question: how do you clean up the DB for users who do not know how to go into phpMyAdmin and run queries?

    Moderator Samuel Wood (Otto)

    (@otto42) Admin

    @ekborg: Thank you for your opinion, however, I disagree with your initial premise. Helping a person with a technical issue often involves showing them that they were thinking about it wrong in the first place. Simply answering their question without providing any background or explanation just means that they keep coming to you with questions and never learn anything.

    My goal is to educate as well as inform. Also, I’ve been doing support for a long, long time, and I’ll continue to do it in the way I see fit. You may see my manner as “cocky”, but that’s you seeing it, not me putting it there. Remember that text is a non-emotionally expressive medium… I’m not being “cocky”, I’m attempting to help people see how they are wrong. This usually works better than simply flat-out telling them why they are wrong.

    Also, you are wrong about how databases work. Indexed databases do not have the problems that you are describing. Overloaded databases will have that problem, but that’s a function of the number of queries and total CPU load, not a function of the size of the dataset. Database queries don’t scale linearly with the dataset size; they scale as O(log N) if the lookup is on an indexed column. Look up “binary search tree”.
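
    To put rough numbers on the O(log N) claim, here is a minimal sketch that counts the comparisons a plain binary search needs over a sorted run of IDs. It is only an analogy: MySQL indexes are B-trees rather than binary trees, and `binary_search_steps` is a name made up for this example, but the logarithmic growth is the same idea.

```python
def binary_search_steps(sorted_ids, target):
    """Return how many comparisons a binary search makes looking for target."""
    lo, hi = 0, len(sorted_ids) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_ids[mid] == target:
            return steps
        if sorted_ids[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps  # target absent: still only about log2(N) comparisons

# A range object is already sorted and supports indexing, so it stands in
# for an index over N post IDs without allocating a large list.
for n in (250_000, 2_000_000):
    print(n, binary_search_steps(range(1, n + 1), n))
```

    Going from a quarter of a million rows to two million is an eightfold increase, yet the worst case only grows from 18 comparisons to 21.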

    As for the last part of the question, look:

    And seriously, one search with Google (“disable revisions”) gives five different ways to do all this, in the top five results. Please, search before asking what has already been answered over and over again.
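
    For reference, the commonly cited cleanup is a single SQL statement that deletes every row whose post type is "revision". The sketch below runs it against a small in-memory SQLite stand-in so it can be tried safely. The table name `wp_posts` assumes the default prefix, the three-column layout is a simplification, and `purge_revisions` is a hypothetical helper, so treat it as an illustration and back up the real database before running anything similar in phpMyAdmin.

```python
import sqlite3

# In-memory stand-in for a WordPress database (assumption: default "wp_"
# table prefix; the real wp_posts table has many more columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_type TEXT, post_parent INTEGER)"
)
conn.executemany(
    "INSERT INTO wp_posts VALUES (?, ?, ?)",
    [
        (1, "post", 0),
        (2, "revision", 1),
        (3, "revision", 1),
        (4, "page", 0),
        (5, "revision", 4),
    ],
)

def purge_revisions(connection):
    """Hypothetical helper: delete all revision rows, return how many went."""
    cur = connection.execute("DELETE FROM wp_posts WHERE post_type = 'revision'")
    connection.commit()
    return cur.rowcount

deleted = purge_revisions(conn)
remaining = [row[0] for row in conn.execute("SELECT ID FROM wp_posts ORDER BY ID")]
print(deleted, remaining)
```

    This removes existing revisions only. To stop new ones from piling up, the usual suggestion is the `WP_POST_REVISIONS` constant in wp-config.php, or a plugin such as the Revision Control one mentioned further down the thread.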

    So try fixing broken links — which post revision has the bad link? Search and search through many MB of old records just to find 1. Repeat process several hundred times. Dump author posts. Is it fixed? Run site map utility. No. Repeat until fixed.. DAYS later.

    And yes, Google for one does indeed try to index all the revisions. Not only does your site now have a massive number of 404s, but broken links are everywhere the search engines and spiders look. So of course your site gets demoted. I’m sure the corporate entities using WP will simply adore this aspect of this so-called (developer) feature. If other single users were asking for post revisions, I doubt they had this version in mind.

    Then of course, the author (admin) who fixes posts other than his own now owns the revision. If the admin goes away (is deleted) for whatever reason, then so do all the fixes, and the original author’s post reverts to the original, broken links and all.

    My host only allows a small fixed-size MySQL db, as do the hosts of millions of other ‘ordinary’ WP users. Won’t they be surprised when they exceed the host limits! So how many of them are going to hand edit their dbs? Zero. Time to look elsewhere for another system, even if it means going back to hand-coding HTML. Nice feature? Not! Refusing (wontfix) to even entertain the idea of fixing revisions with an on/off switch is the height of arrogance.

    As for the customer service/support, if this board is any example of what to expect from WP developers (and certain other arrogant and abrasive WP bigwigs) going forward, i.e. the aforementioned rudeness and arrogance, it is time for someone very high up to step in and replace the current “support” team with one more customer friendly.

    This whole affair is like Apple telling its customers that the next version of Mac OS X will no longer use a GUI and will move to a Linux-style interface, telling customers to learn to use the Terminal from now on.

    Eventually more and more people will either revert to older versions or find another system: systems which, picking up on the arrogance found at WP, will listen and supply some better product. WP will fade. The price you pay for refusing to listen, for minimizing your customers’ concerns.

    I, for one, am sick of this schtick from WP personnel, and DO plan to eventually move on.

    Yeah, I can hear it now … WP: See ya!

    I’m a newbie, but a valid reason for me to want to keep the database smaller is exporting and importing. My database is over 2 MB in size, and it could be that I’m doing something wrong, but due to my provider or due to PHP limitations, I’m only allowed to import files no bigger than 2048 KB. So if I have to do an import, I have to do it table by table or hack into my .sql files.

    Moderator Ipstenu (Mika Epstein)


    🏳️‍🌈 Halfelf Rogue & Plugin Review Team Rep

    I use Revision Control – I do see the benefit of this, but the revisions are useless for me. (And yes, I run a purge of revisions on a WikiDB too now and then, where it’s less useful to keep the old ones)

    That said, I’ve never had an SEO or Sitemap problem with the revisions, so that rather astounds me.

    Same. Revisions aren’t always needed. Integrating the Revision control plugin would be quite useful.

    I installed Revision Control, but it seems WordPress 2.7.1 bypasses it. Even though all pages created are set to “No revision”, I still get the red warning “A revision of this page exists! etc.”. Any ideas?

    Moderator Samuel Wood (Otto)

    (@otto42) Admin

    The “autosave” creates a single revision for a post. New autosaves delete older ones. That is probably the extra revision you are seeing.

    There’s a bug in the tracker associated with this, in that it can create an autosave with no changes to the post at all. Hopefully this will get fixed before 2.8 or 2.9.

    As for controlling revisions, there are very few edge cases where any sort of control over them is needed, and that’s what plugins are for, as I see it. If one revision control system takes the lead, then it might be integrated into core in some future version, but I would not count on it happening before 3.0.

    Also, I don’t know what anybody is talking about regarding SEO and such, because revisions are not viewable by the general public. Only blog admins and authors and such can see them. They don’t get indexed by search engines.
