At our site we have had the Gengo plugin since we launched in October 2007. Gengo redirects any URL that lacks a language parameter in its query string. Some URLs are excluded by default, but not wp-trackback.php and wp-cron.php.
When cron.php spawns wp-cron.php, it does not follow redirects, so our wp-cron.php never got to execute. Result: no trackbacks and no pingbacks, and a queue of jobs hidden in the database.
It wasn't until very recently (February 2008) that we discovered that trackbacks and pingbacks didn't work. Then we found the source of this error, and added wp-cron.php to the Gengo exclusions list.
Our site then exploded in a rush to execute about 2000 cron jobs residing in the options table under the option_name cron. Suddenly blogs all over were getting pingbacks from old posts. So far, so good, and as expected.
But every page request to WordPress checks the cron option to see if there are jobs to do. If there are, wp-cron.php is called as an internal request to get them executed.
wp-cron.php then checks if doing_cron is greater than time() and, if it is, exits. If not, it sets the doing_cron option to time()+30, to prevent parallel cron runs, for those 30 seconds at least.
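The time-based lock can be sketched in isolation. This is only an illustration, not the actual wp-cron.php code: the function name sketch_try_cron_lock() and the plain $options array (standing in for get_option()/update_option()) are made up here so the snippet runs outside WordPress, and it assumes the check is "exit while doing_cron is still in the future".

```php
<?php
// Hypothetical stand-in for the lock in wp-cron.php. $options replaces the
// WordPress options table so the sketch runs without WordPress.
// Returns true if this request may run the queue, false if it must exit.
function sketch_try_cron_lock(array &$options, $now, $lock_seconds = 30) {
    $doing_cron = isset($options['doing_cron']) ? $options['doing_cron'] : 0;
    if ($doing_cron > $now) {
        return false; // lock not expired yet: another run is assumed to be active
    }
    $options['doing_cron'] = $now + $lock_seconds; // take the lock
    return true;
}

$options = array();
$start = 1000; // fixed timestamp instead of time(), to keep the run repeatable
var_dump(sketch_try_cron_lock($options, $start));      // bool(true): first run takes the lock
var_dump(sketch_try_cron_lock($options, $start + 10)); // bool(false): lock still valid
var_dump(sketch_try_cron_lock($options, $start + 31)); // bool(true): lock expired, a second run starts
```

Note that the third call succeeds whether or not the first run has finished, because the lock only knows about elapsed time.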
This is a very bad way to try to prevent this. Because, what if executing all the jobs really takes much more than 30 seconds? A new cron run will get started in parallel. And another one, and another one, endlessly, as long as the first run has not finished and removed the completed jobs from the queue. The processes will all run concurrently and probably interfere with each other. Probably the result is that none of them ever finishes, and the load adds up until the server is on its knees.
Our site was closed down by the web host for more than twelve hours, and we had to beg to get it back up, promising we had found the source of the heavy load.
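To put rough numbers on this, here is a small simulation. It is a sketch under simplifying assumptions that are mine, not measurements from our site: one page request arrives per second, and no run finishes (or shrinks the queue) before the first batch is done. The function name sketch_count_overlapping_runs() is hypothetical.

```php
<?php
// Counts how many cron runs get started while the first batch is still busy,
// given a purely time-based lock of $lock_seconds.
function sketch_count_overlapping_runs($batch_seconds, $lock_seconds) {
    $doing_cron = 0; // simulated doing_cron option
    $runs = 0;
    for ($now = 0; $now < $batch_seconds; $now++) { // one page request per second
        if ($doing_cron > $now) {
            continue; // lock still valid, this request does not start a run
        }
        $doing_cron = $now + $lock_seconds; // lock expired: start another run
        $runs++;
    }
    return $runs;
}

echo sketch_count_overlapping_runs(300, 30), "\n";  // prints 10
echo sketch_count_overlapping_runs(300, 360), "\n"; // prints 1
```

Under these assumptions a batch that needs five minutes ends up with ten parallel runs on a 30 second lock, and each of those slows the others down further.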
The fundamental flaw in the code, as far as I can see, is in the function wp_cron() in cron.php. As I said, this is called on almost every request to WordPress. But it doesn't check whether doing_cron is set to a value higher than time(). If it did, this would have prevented calling wp-cron.php and prevented the overload.
And 30 seconds is purely arbitrary, I guess. Using this number assumes that no series of cron jobs will ever run for more than 30 seconds. This may not always be true, and when it isn't, it may bring down the server. No, it WILL bring the server to its knees quite easily.
My ad hoc fix:
wp-cron.php: Add 360 seconds instead of 30 (even that gives no guarantee)
cron.php, at the beginning of
// Prevent spawning cron if it's already active:
if ( get_option('doing_cron') > time() )
	return;
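The effect of that guard on the caller side can be shown with a stand-alone sketch. sketch_should_spawn() is a hypothetical simplification, not the real function in cron.php (which does more than this); the $options array again stands in for the options table, and the ten requests and 360 second lock are just example values.

```php
<?php
// Returns true if a page request would fire an internal request to wp-cron.php.
function sketch_should_spawn(array $options, $now, $use_guard) {
    // The guard: do nothing while a cron run still holds the lock.
    if ($use_guard && isset($options['doing_cron']) && $options['doing_cron'] > $now) {
        return false;
    }
    // No queued jobs means there is nothing to spawn for.
    return !empty($options['cron']);
}

// A queue with pending jobs and a lock taken at t=1000 for 360 seconds.
$options = array('cron' => array('job-a', 'job-b'), 'doing_cron' => 1000 + 360);
$spawned_without_guard = 0;
$spawned_with_guard = 0;
for ($now = 1000; $now < 1010; $now++) { // ten page requests, one per second
    if (sketch_should_spawn($options, $now, false)) { $spawned_without_guard++; }
    if (sketch_should_spawn($options, $now, true))  { $spawned_with_guard++; }
}
echo $spawned_without_guard, " ", $spawned_with_guard, "\n"; // prints "10 0"
```

Without the guard every page request costs an extra internal hit, even though wp-cron.php will just exit again.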
The most robust approach is probably to let wp-cron.php update doing_cron for each action, so that doing_cron always prohibits cron from getting called for 30 seconds after the last cron job has started.
I guess that the many reports of wp-cron.php overloading people's systems (CPU, hits, database queries) arise when the array of cron jobs has already built up to a huge size, and then something suddenly gets fixed that releases this unchecked chain reaction.
Or nuclear core meltdown, if we were running a power plant.
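A minimal sketch of that idea, refreshing the lock before every job: sketch_run_queue(), the fake clock and the 20 second jobs are all assumptions for illustration, not code from wp-cron.php.

```php
<?php
// Runs each job in turn and pushes doing_cron forward before every one,
// so the lock is measured from the start of the latest job, not of the batch.
function sketch_run_queue(array &$options, array $jobs, callable $clock, $lock_seconds = 30) {
    foreach ($jobs as $job) {
        $options['doing_cron'] = $clock() + $lock_seconds;
        $job();
    }
}

$now = 1000; // fake clock so the example is repeatable
$clock = function () use (&$now) { return $now; };
$options = array('doing_cron' => 0);
$lock_held_after_each_job = array();

$jobs = array();
for ($i = 0; $i < 5; $i++) {
    $jobs[] = function () use (&$now, &$options, &$lock_held_after_each_job) {
        $now += 20; // pretend this job took 20 seconds
        $lock_held_after_each_job[] = $options['doing_cron'] > $now;
    };
}

sketch_run_queue($options, $jobs, $clock);
// The batch took 100 seconds in total, yet the lock never lapsed:
var_dump(in_array(false, $lock_held_after_each_job, true)); // bool(false)
```

This still gives no guarantee if one single job runs longer than the lock period, but the total length of the queue no longer matters.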