Viewing 15 replies - 1 through 15 (of 17 total)
  • Plugin Contributor Frederick Townes

    (@fredericktownes)

    Which modules do you have enabled and with which cache engines? If you use Dbcache and object cache have you tried with them disabled?

    Thread Starter jlomaga

    (@jlomaga)

    I’m using Page, Database, Object and Browser. For all I use Memcached.

    So I tested without Dbcache and Object Cache enabled, and that appears to fix the issue. I also tried with only one or the other enabled, and it causes the missed post issue.

    If I disable Object cache, scheduled posts work.

    archon810

    (@archon810)

    There’s definitely a bug in the way W3TC handles object caching, at least with memcached, where the transient either doesn’t get set right or read right.

    Specifically, in this case, either set_transient( 'doing_cron', $doing_wp_cron ) in cron.php doesn’t work or get_transient('doing_cron') in the same file doesn’t. The doing_cron lock, which usually resides in the database and is correctly parsed out of it, doesn’t actually work, so I’m getting flooded with wp-cron PHP threads all running the same schedules and driving up the load all the time.

    I’m still digging, but Frederick, I’m available to work with your team to finally squash this bug. I’ve seen references to W3TC object caching breaking transients for years now.

    archon810

    (@archon810)

    Confirmed that switching W3TC to APC object caching immediately solves the issue with doing_cron cache. Switching to memcached immediately resurfaces the issue.

    It’s as if something is either wiping this value from cache or prevents it from being extracted from cache properly in cron.php.

    I tried printing some other transients at the same time doing_cron transient is asked from cache by cron.php, and to my surprise, those other transients were fine. It’s as if doing_cron is special, as it’s the only one suffering from this disappearance mystery.

    archon810

    (@archon810)

    Unbelievable, perhaps memcached itself is to blame here. Restarting the memcached server resulted in doing_cron values properly populating and coming back from the cache.

    Now I’m really curious as to what exactly goes wrong – does the memcached server have a bug where it no longer writes certain things after some time or under certain conditions? Or does it erase certain things very quickly at some point? It’s supposed to be an LRU cache, and with 512MB allocated to it, I don’t understand how the values could be lost in a matter of a split second…

    I’m running memcached 1.4.15. I suppose I can update to the latest 1.4.17, but I am not sure what would fix it in the long run. Maybe an hourly memcached restart…

    archon810

    (@archon810)

    I think I’ve pretty much narrowed this down to a lack of available connections. Once the limit is reached and doing_cron starts failing to come back from get_transient, things instantly get worse, because at that point wp-cron.php gets pounded many times a second, which exacerbates the situation.

    The fix was to enable logging on memcached with -v to see in the future if the memcached server is running out of connections, and increase the number of allowed connections as well as a few other options for better performance:

    -c 4096 -t 8 -R 100

    Default values were:
    -c (max connections): 1024
    -t (threads): 4
    -R (maximum number of requests per event): 20

    What a day…
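
After restarting memcached with new flags, it can be worth confirming that the server actually picked them up. The sketch below is only an illustration, not part of the original post: it assumes `nc` is installed and that memcached listens on 127.0.0.1:11211, and the function name `effective_settings` is made up. It filters the output of memcached's `stats settings` command down to the three values that correspond to -c, -t and -R.

```shell
# Filter raw `stats settings` output (read from stdin) down to the values
# that correspond to -c (maxconns), -t (num_threads) and -R (reqs_per_event).
# The function name is hypothetical; the stat names come from memcached itself.
effective_settings() {
  tr -d '\r' | awk '
    $1 == "STAT" && ($2 == "maxconns" || $2 == "num_threads" || $2 == "reqs_per_event") {
      print $2 "=" $3
    }'
}

# Usage against a live server (host and port are assumptions):
#   printf 'stats settings\r\nquit\r\n' | nc 127.0.0.1 11211 | effective_settings
```

If the printed values still show the defaults, the flags most likely never reached the running daemon (for example because an init script overrides them).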

    solid_snake

    (@solid_snake)

    I can confirm this as well. This is affecting plugin update notifications as well. When I disable object cache the plugin update notification works.

    archon810

    (@archon810)

    A quick way to test this is to enable WP_DEBUG and WP_DEBUG_LOG so that errors are written to debug.log, and then add this line to wp-includes/cron.php at the top of spawn_cron():
    error_log( 'DEBUG doing_cron transient: ' . get_transient( 'doing_cron' ) );

    Then grep the logs for this and see if there’s a value printed. If memcached is misbehaving, you won’t see the value at all.
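
The grep step can be turned into a quick summary. This is a rough sketch rather than anything from the original post: it assumes the log lines end with the text printed by the error_log call above, and that the log lives at wp-content/debug.log (the usual WP_DEBUG_LOG location, adjust as needed). The function name `doing_cron_summary` is hypothetical.

```shell
# Count debug lines that carried a doing_cron value versus lines where the
# value came back empty (the symptom described in this thread).
# $1 = path to the log file. The function name is hypothetical.
doing_cron_summary() {
  awk '
    /DEBUG doing_cron transient:/ {
      sub(/\r$/, "")                          # tolerate CRLF line endings
      if ($0 ~ /transient: *$/) empty++       # nothing after the label
      else filled++                           # a lock value was printed
    }
    END { printf "with_value=%d empty=%d\n", filled, empty }' "$1"
}

# Usage (path is an assumption):
#   doing_cron_summary wp-content/debug.log
```

A large `empty` count next to a small `with_value` count would match the behaviour described here, where the lock never comes back from the cache.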

    solid_snake

    (@solid_snake)

    Sorry, I am not a technical person, so I don’t understand what you mean. But I am sure we don’t use memcache.

    Do you think this is a bug in W3tc?

    archon810

    (@archon810)

    I’m not 100% sure, but for me, switching object caching to APC, for example, sorted the issue instantly, and switching back to memcached broke it again. That was until I figured out that the memcached servers themselves were seemingly misbehaving, at least in my case.

    JochenT

    (@jochent)

    I’m using page+object+db cache and preloading with memcached and cannot confirm any of these problems. However, the size and number of connections for memcached should be set properly.

    If the size is too small, the ‘Evicted’ counter in the stats is nonzero and grows with time. You can display stats using the command memcached-tool 127.0.0.1:11211 on the console. See Memcache Telnet Interface for more details.

    Also Memcached Internals for End Users gives a clue how memcached works internally.
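
To see whether the eviction counter is actually growing, two snapshots of the raw `stats` output taken some time apart can be compared. This is a minimal sketch under stated assumptions: the snapshots were saved to files beforehand (for example with `nc`, which must be installed), and the function name `evictions_growth` is made up for illustration. Only the `evictions` stat name comes from memcached.

```shell
# Print how much the `evictions` counter grew between two saved snapshots
# of raw `stats` output. $1 = earlier snapshot, $2 = later snapshot.
# The function name is hypothetical.
evictions_growth() {
  before=$(tr -d '\r' < "$1" | awk '$1 == "STAT" && $2 == "evictions" { print $3 }')
  after=$(tr -d '\r' < "$2" | awk '$1 == "STAT" && $2 == "evictions" { print $3 }')
  # Treat a missing counter as 0 so the arithmetic below cannot fail.
  echo $(( ${after:-0} - ${before:-0} ))
}

# Usage (host, port and file names are assumptions):
#   printf 'stats\r\nquit\r\n' | nc 127.0.0.1 11211 > stats_before.txt
#   sleep 600
#   printf 'stats\r\nquit\r\n' | nc 127.0.0.1 11211 > stats_after.txt
#   evictions_growth stats_before.txt stats_after.txt
```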

    archon810

    (@archon810)

    @jochent The eviction counter is not really an indication of a problem – it simply means older data will be ejected, but data just put in recently shouldn’t be (unless the memory allocated to memcached is really tiny).

    In fact, if the server is running for a really long time, and the data being written doesn’t use the same keys (it all depends on your plugins and theme code), eventually older data will be evicted, and that’s fine in theory most of the time. It does mean, though, that things like transients that are meant to stick around for a long time may disappear before their time, which would not have happened with persistent storage.

    The real issue in my case was the number of allowed connections, which on a busy server was simply outpaced by the number of threads connecting to the memcached server.
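
Connection pressure of this kind can be checked from memcached's own counters. The following is a rough sketch, not the method used in the post: it assumes `nc` is available, the function name `conn_pressure` is hypothetical, and the connection limit is passed in by hand. It reads `curr_connections` and `listen_disabled_num` (which, as far as I know, counts how often the server stopped accepting new connections because the limit was hit) from the raw `stats` output.

```shell
# Summarize connection pressure from raw `stats` output on stdin.
# $1 = configured connection limit (the -c value). Function name is hypothetical.
conn_pressure() {
  tr -d '\r' | awk -v max="$1" '
    $1 == "STAT" && $2 == "curr_connections"    { cur = $3 }
    $1 == "STAT" && $2 == "listen_disabled_num" { refused = $3 }
    END {
      used = (max > 0) ? int(cur * 100 / max) : 0
      printf "curr=%d max=%d used=%d%% listen_disabled=%d\n", cur, max, used, refused
    }'
}

# Usage against a live server (host, port and limit are assumptions):
#   printf 'stats\r\nquit\r\n' | nc 127.0.0.1 11211 | conn_pressure 1024
```

A `used` value near 100% or a nonzero `listen_disabled` figure would point to the limit being too low for the number of PHP processes connecting.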

    JochenT

    (@jochent)

    @archon810 – When enabling preloading for the first time, I had the problem that the slab classes where most of the posts had been cached were much too small due to the previous caching behaviour. So after about 75% of the posts had been preloaded, the first posts already got evicted. Increasing the cache size and restarting memcached solved this problem.

    archon810

    (@archon810)

    Understood.

    In my case, memcached has 3x 512MB servers allocated for a total of 1.5GB, which is quite a bit.

  • The topic ‘WP-Cron Not Running’ is closed to new replies.