• My server CPU load used to spike insanely high whenever I ran wp-o-matic. I investigated the code, and there’s a scaling issue with the isDuplicate function. I’ll post the fix below!

    First, I’ll explain the problem by walking through how wp-o-matic on Your Site processes a feed from Random Site:

    • First, Random Site publishes, say, the 10 most recent posts to its RSS feed
    • Then, wp-o-matic will pull in those 10 posts by sucking in the Random Site RSS feed, as you’ve configured it…
    • Next, wp-o-matic doesn’t know whether those 10 Random Site posts are *new* posts, or have already been published at Your Site. So it has to check this. The way wp-o-matic currently checks this is to compare each of those 10 posts in the RSS feed to the existing posts on Your Site that have been pulled from Random Site.

    That is where the problem arises. The way wp-o-matic checks if new posts are duplicates generates a LOT of queries. Specifically, wp-o-matic queries the database one time for every post that has come from that feed, multiplied by the number of posts in the RSS feed. For instance, I have syndicated the content of one prolific blogger who has written 5,408 posts that are published on my site. The RSS feed has 10 entries. Thus, for this one feed, wp-o-matic generates 54,080 queries!
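
    To make the arithmetic concrete, here is a minimal sketch of the query count described above. The function name estimateDuplicateCheckQueries is made up for illustration and is not part of wp-o-matic; it only multiplies the two numbers.

```php
<?php
// Hypothetical helper (not in wp-o-matic): estimates how many queries the
// per-post duplicate check issues for a single feed.
function estimateDuplicateCheckQueries(int $storedPosts, int $feedItems): int
{
    // One query per stored post, repeated for every item in the RSS feed
    return $storedPosts * $feedItems;
}

echo estimateDuplicateCheckQueries(5408, 10), "\n"; // prints 54080
```

    Fetching the stored rows once per feed would bring that down to a single query per feed.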

    To resolve this, I rewrote the first part of function processFeed in the main wpomatic.php file (copy/paste below, up to the line where it says $count++;). With this code, wp-o-matic will vastly reduce its database load.

    P.S. You can also delete the isDuplicate function if you want, as it’s not called elsewhere.

    function processFeed(&$campaign, &$feed)
    {
        global $wpdb;
        @set_time_limit(0);

        // Log
        $this->log('Processing feed ' . $feed->title . ' (ID: ' . $feed->id . ')');

        // Access the feed
        $simplepie = $this->fetchFeed($feed->url, false, $campaign->max);

        // Get posts (last is first)
        $items = array();
        $count = 0;

        // Get all rows for this campaign and feed in one query, BUT LIMITED to the last 30 posts
        $rows = $wpdb->get_results("SELECT * FROM {$this->db['campaign_post']} "
            . "WHERE campaign_id = {$campaign->id} AND feed_id = {$feed->id} ORDER BY post_id DESC LIMIT 30 ");

        foreach($simplepie->get_items() as $item)
        {
            // Compute the hash once per item instead of once per comparison
            $itemHash = $this->getItemHash($item);

            if($feed->hash == $itemHash)
            {
                if($count == 0) $this->log('No new posts');
                break;
            }

            // Look to see if $item is a duplicate of an existing $rows row
            foreach($rows as $row)
            {
                if($row->hash == $itemHash)
                {
                    $this->log('Filtering duplicate post');
                    // break 2 leaves both loops and stops at the first duplicate;
                    // a plain break would only exit this inner loop, so the
                    // duplicate would still reach $count++ below
                    break 2;
                }
            }

            $count++;

Viewing 13 replies - 1 through 13 (of 13 total)
  • Fantastic! This is just what I wanted to read today. My host suspended me for using too much CPU, and I can only think that it must be WPoMatic.

    Hopefully this will do the trick. Are there any other things I should do to minimise CPU load caused by WP-o-Matic?

    Thread Starter bobbyh

    (@bobbyh)

    mv5869, here are a few other things you can do:

    * Ironically, running the cron job more often can lower load spikes. However, if you do this, you need to stagger your campaigns so they’re not all running at the same time. Click into each campaign and click the “options” tab. Set the Frequency to a number such that all the campaigns aren’t running every time the cron runs. I set the Frequency to a number larger than the frequency of the cron job, so not all the campaigns are checked at the same time.

    * If your server still has trouble with load, you can disable the cron job and run Campaigns manually one at a time by clicking Fetch for each Campaign. Make sure each Campaign finishes before running the next one. If your server is really weak, you might need to do this during off-peak traffic.

    Good luck! Let me know if my code works for you! 🙂

    bobbyh – Many thanks for the advice, and for the great script.

    I spent today updating all my sites with this, and changing the time in the options tab so that few of them run together. Now that I know how the original wpomatic ran, I am sure that that was the problem. I have large feeds and several thousand posts in the database, so it’s not surprising it slowed down so much.

    Now I just need to monitor the CPU for a few days and see if that has made enough difference. If not I expect my host will suspend me anyway!

    Thanks again.

    Anyone see @appletalk around here? Any chance he’ll release a 3.1-tested version of this, that implements your fixes?

    Thread Starter bobbyh

    (@bobbyh)

    @mv5869: Can you confirm that the new code works on your site(s)? 🙂

    Yes, it works perfectly. It has been 2 days now and I have seen no more CPU problems at all since applying your changes.

    I haven’t yet re-enabled the scheduled cron but will do that over the next day or two and monitor it carefully.

    @bobbyh are you the developer? Sorry if I confused something there. Either way, would love to see an updated version that works with 3.1.

    Thread Starter bobbyh

    (@bobbyh)

    @tevya, I am not the original developer. I just fixed a function in his code so the plugin works without pegging the CPU at 100% when there are lots of existing posts…

    Anyway, I can verify the code works fine with 3.1, which is the WP version I am currently using on my wp-o-matic site.

    So where can I get a copy of it that includes your code? I’m not a programmer, and prefer to avoid editing plugins when possible.

    Thread Starter bobbyh

    (@bobbyh)

    Here’s the edited version of the main wpomatic.php file: http://pastebin.com/MQnjfW6k

    The other files are the same.

    Hello,

    I’m from Brazil and I am using a translator. I’m a beginner, and from what I understand of the above, something has to be modified so that countless posts are not downloaded from each feed. I need wp-o-matic to download only the last day’s posts. How do I do that, step by step, explained well for a beginner? Also, my pictures do not appear on the homepage, where I have thumbnails; they only appear on the second page. What should I do?

    Help

    Thanks. I just came here looking for anything to lower the CPU load, and yes, it’s here.

    Computing a hash and just running a hash lookup really is the right thing to do. I was looking into the code of wpematico (specifically the file wpematico_dojob.php), and it seems this never made it into the new code?

    The “isDuplicate” function is still there and it is using the post’s title (apparently not hashed) to “check” if a posting is a duplicate or not.

    It seems, however, that fragments of your patch were tested but are now commented out.

    Overall, hashing is the right approach – it would probably even make sense to persistently store each computed hash alongside each posting (using a custom field), so that the hashes don’t need to be recomputed every time.
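
    As a rough illustration of the hash lookup idea, here is a minimal sketch. The function names buildHashSet and filterNewItems are hypothetical and appear in neither plugin; the sketch only assumes each stored row exposes a hash property, as in the patch above. It builds a set of known hashes once and then tests each incoming hash with isset(), so there is no nested loop over the stored rows.

```php
<?php
// Hypothetical sketch (names are assumptions, not plugin code).
// Builds an array keyed by hash so membership checks are array-key lookups.
function buildHashSet(array $rows): array
{
    $known = array();
    foreach ($rows as $row) {
        $known[$row->hash] = true;
    }
    return $known;
}

// Returns only the hashes that are not already in the known set.
function filterNewItems(array $itemHashes, array $known): array
{
    $fresh = array();
    foreach ($itemHashes as $hash) {
        if (!isset($known[$hash])) {
            $fresh[] = $hash;
        }
    }
    return $fresh;
}

// Example data standing in for rows fetched from the database
$rows = array((object) array('hash' => 'aaa'), (object) array('hash' => 'bbb'));
$known = buildHashSet($rows);
print_r(filterNewItems(array('bbb', 'ccc'), $known)); // only 'ccc' is new
```

    Persisting the hashes (for example in a custom field) would be a separate step and is left out of this sketch.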

    “you need to stagger your campaigns so they’re not all running at the same time.”

    True, this is another important point. It would actually be great if the plugin could support “auto-staggering/load optimization”: you would just specify a frequency (per hour/day/week/month), and the plugin itself would distribute the load dynamically, and possibly also randomly, without the cron jobs having to be specified manually.
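
    One possible way to do the even distribution is sketched below. This is only an assumption about how such a feature could work; staggerOffsets is a hypothetical helper, not something either plugin provides. It gives each campaign a start offset within one scheduling period so that no two campaigns begin at the same minute.

```php
<?php
// Hypothetical helper (not part of wp-o-matic or wpematico).
// Spreads campaigns evenly across one scheduling period.
function staggerOffsets(array $campaignIds, int $periodMinutes): array
{
    $offsets = array();
    $total = count($campaignIds);
    foreach (array_values($campaignIds) as $i => $id) {
        // Campaign number $i starts $i/$total of the way through the period
        $offsets[$id] = intdiv($i * $periodMinutes, $total);
    }
    return $offsets;
}

// Four example campaign IDs spread over a 60-minute period
print_r(staggerOffsets(array(7, 9, 12, 20), 60)); // offsets 0, 15, 30, 45
```

    A random variant could shuffle the campaign IDs before assigning offsets.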

    – woccax

  • The topic ‘Fix for High CPU Load for wp-o-matic’ is closed to new replies.