Support » Plugins » Hacks » Reasons for bigint datatype in wordpress id columns

  • (Posted this in the wrong sub forum first, sorry. Should be deleted there.)

    I’m still getting into the WordPress framework, developing my first “real” Plugin and while thinking about my table structure couldn’t help to notice that all WordPress core tables use bigint as datatype for any id column.

    I’m wondering if anyone knows the reasoning behind this. For comments or post it may make sense, but for users? If a platform is expected to exceed 4 Billion users I doubt that WordPress will be the software of choice.

    The only possible argument I can imagine would be performance benefits from using uniformly sized primary keys throughout the database. But that’s just a very weak guess, because honestly I have no clue about the inner workings of any particular DBMS.

    Why is it “bothering” me? If the usually sufficient int type would be used the fields would only require 4 Bytes of storage instead of 8. What’s 4 Bytes? Imagine a big WordPress instance with 10 Million comments. Just looking at them that’s id and post id, parent comment and author id (default to 0), per comment. So 10M*4*4B makes 160MB of wasted space.

    It’s still not a big number for current storage solutions but together with the other unnecessarily big fields it accumulates over the entire database. I just find it strange that it’s the default setup for a largely distributed CMS. Seems like a needless waste of resources. Because I would hazard the guess that 99% of all WordPress installations will never accumulate even close to 4 Billion rows in any of their core tables.

    I realize this comes off like a rant, but I’m really interested to know if there are any good reasons for this design decision. If anybody has more insight to it, sharing would be much appreciated!

Viewing 2 replies - 1 through 2 (of 2 total)
  • Probably the same reason why all meta and content columns use longtext, and all dates use datetime. I’m sure there was a reason why such inefficient data types were used, but I have never figured it out. It’s probably the same reason why the postmeta and usermeta tables have such inefficient schemas, and why the DBAL is so inefficient. (Lest you think it has gotten better with time, take a look at how session support was added to WP4.)

    If I had to guess, I’d say that database coding and schema design were the weak points of the original WP developers, and fixing the database inefficiencies has never been a priority. PHP 7.0 handles objects much more efficiently, which will minimize the DBAL issues, and that will help. But it may be a while before PHP 7.0 is part of mainstream OS distros…CentOS 7 comes with PHP 5.4!

    The problem today is that fixing the inefficiencies will break a large number of plugins and themes. One could say that WordPress has been a victim of its own success…

    There are very good reasons for BIGINT.

    For example, suppose you want to make a master-master WordPress system, with eventual-consistency (i.e. AP from the CAP theorem). One of the challenges there is merging records created on both sides. A typical solution is to use AUTOINC, but with a step-size of 2, then one server is “even” and the other is “odd.” Another typical solution is to realize you should plan for new servers joining the group, or more than two servers, and thus use a step of 10 or 20 (as many master-master MySQL systems do today). Suddenly you need 20x the number-space, so 4 billion records becomes 40MM records, which actually isn’t that much if you’re considering enterprise-scale sites.

    Now let’s get even smarter. AUTOINC is also the wrong choice for master-master, especially if you want to be able to merge from many sources, for example merging two unrelated WordPress sites. In that case you want GUID instead of a number. With 64 bits, you can basically do this. Yes it would be better if it were eg 128-bits, but it’s reasonable, and still compatible with int-oriented systems, and perhaps more efficient in the PHP layer.

    Finally, your point about taking up more RAM is true, but not as significant as you’re making it out to be. Using your example of 10M comments, you’re worried about an extra 4 bytes per record. But note that a comment record is already 100x larger than that, especially when you consider the arbitrary-sized text of the comment. So you’re talking about reducing the comment table size by 1%. From that perspective, it’s doesn’t seem that important! 🙂

Viewing 2 replies - 1 through 2 (of 2 total)
  • The topic ‘Reasons for bigint datatype in wordpress id columns’ is closed to new replies.