Support » Developing with WordPress » is_serialized() behavior with serialized objects?

  • whitewinterwolf

    (@whitewinterwolf)


    While the current documentation for is_serialized() doesn’t mention it, the function by design returns false for a subset of valid serialized objects.

    The fact that this behavior is not documented and limited to only a subset of objects seems very odd to me and I wonder what is the actual, expected behavior for this function within the WordPress dev community (personally I’m more an IT Sec guy than a dev)?

    ____________________

    Possibility 1: is_serialized() should return false for all serialized objects

    The fact that is_serialized() returns false for a subset of valid serialized object is not by chance but is enforced through a unit test:

    $this->assertFalse( is_serialized( 'C:16:"Serialized_Class":6:{a:0:{}}' ) );

    This test is linked to the ticket #17375 and has been implemented as an attempt to prevent insecure deserialization exploits, taking the OWASP web pages as reference.

    However, the sad thing is that this attempt doesn’t fix the issue at all.

    Let’s take OWASP’s very first example from the provided reference:

    class Example1
    {
       public $cache_file;
    
       function __construct()
       {
          // some PHP code...
          return true;
       }
    
       function __destruct()
       {
          $file = "/var/www/cache/tmp/{$this->cache_file}";
          if (file_exists($file)) @unlink($file);
       }
    }
    
    var_dump(serialize(new Example1()));
    // Output: string(39) "O:8:"Example1":1:{s:10:"cache_file";N;}"

    The serialized string doesn’t start with the dreaded ‘C’, but with an ‘O’ which is allowed by the is_serialized() function:

    case 'O' :

    Therefore, rejecting ‘C’ serialized strings is not enough.

    If we ignore for now the side-effects would also reject ‘O’ by removing this case statement be enough? No, not even close: relying only on the first character is not sufficient to prevent insecure deserialization exploits.

    @nacin mentions this to motivate and illustrate this filtering:

    – Right now, someone can insert the string C:16:”Serialized_Class”:50:{a:3:{i:0;s:3:”one”;i:1;s:3:”two”;i:2;s:5:”three”;}}. It will not pass is_serialized(), which means it will be treated as a simple scalar that can be inserted into the DB.
    – If we add this patch to a future WordPress version, then any existing C: strings will suddenly be unserializable. See the same is_serialized() in maybe_unserialize(). Thus, we would be turning a harmless string (on original insert) into a sleeping exploit waiting to be unserialized (on a future select).

    Indeed, the raw hypothetical string mentioned will effectively be rejected:

    var_dump(is_serialized('C:16:"Serialized_Class":50:{a:3:{i:0;s:3:"one";i:1;s:3:"two";i:2;s:5:"three";}}'));
    // Output: bool(false)

    However the workaround for an attacker would be trivial:

    var_dump(is_serialized('a:1:{i:0;C:16:"Serialized_Class":50:{a:3:{i:0;s:3:"one";i:1;s:3:"two";i:2;s:5:"three";}}}'));
    // Output: bool(true)

    Here, wrapping the malicious payload inside an array is sufficient to bypass this insecure deserialization protection mechanism, and this is just a trivial example of what can be done (and no, partial-coverage doesn’t have any security benefit: either the is_serialized() function aims to protect against insecure deserialization, is documented as such and flaws are vulnerabilities, or it does not aim to bring such protection).

    Therefore, to be effective, this mechanism within the is_serialized() function should really parse the whole serialized string in search for potentially harmful content. This would be unrealistic, precisely why in PHP 7.0.0 the prototype of the unserialize() has been updated to provide more control over object deserialization.

    Nevertheless, the question remains asked: should is_serialized() return false for all serialized objects to enforce a protection against insecure deserialization?

    ____________________

    Possibility 2: is_serialized() should stick to its documented behavior: “False if not serialized and true if it was.”

    IMHO, a function named is_serialized() should limit itself to checking whether a given is or isn’t a valid serialized string:

    • It should not lie to the caller and return false for certain valid serialized strings following an undocumented behavior.
    • It should not rely on fragile heuristics and return true for strings which will fail when attempting to unserialize them, generating another room for unexpected and potentially vulnerable behavior.

    ____________________

    Possibility 3: is_serialized() should provide an additional parameter, $allow_objects, allowing the caller the select the desired behavior

    This would be the ideal as it would allow the caller to select the desired behavior between the possibilities 1 and 2 (if we assume there is a clean and safe way to implement possibility 1).

    ____________________

    My take on this

    My feeling on this is that is_serialized() should limit itself to what it’s name says: check whether or not a given string is a serialized string in a clean and standard manner.

    Doing so would allow to use proper, dedicated third-party protection mechanism against a larger scope of unserialize-related exploits (not only insecure deserialization but also various memory corruptions), as offered by Snuffleupagus for instance (the heir of Suhosin), which is currently incompatible with WordPress due to this ambiguity on the exact role of the is_serialized() function.

    But as I said above, I’m not a professional WordPress developer, so I may very well miss something and that’s why input from people more knowledgeable than me on the subject would be warmly welcome 🙂 !

Viewing 2 replies - 1 through 2 (of 2 total)
  • Moderator Marius L. J.

    (@clorith)

    Hi there @whitewinterwolf

    I noticed you were already working on some patches here, the support forums are great for making suggestions on features you’d like to see, or enhancements that would be nice, but the actual technical implementation discussion of such should really happen in the ticket it self.

    For others interested in this issue, the topic author has created trac ticket #53295, which is the place to be 🙂

    Thread Starter whitewinterwolf

    (@whitewinterwolf)

    Hi @clorith,

    I was not very sure, as my question here is really about the expected behavior of this function independently of any patch, but the conversation will continue on the ticket.

    Thanks!

Viewing 2 replies - 1 through 2 (of 2 total)
  • You must be logged in to reply to this topic.