WordPress.org

Gravity Forms Salesforce Add-on
[resolved] Cleaning UTF-8 for SOAP submission (6 posts)

  1. gmcinnes
    Member
    Posted 1 year ago #

    So, some characters that are valid UTF-8 are not valid in XML: specifically control characters such as 0x0B (vertical tab), which was what tripped me up, but others as well.

    To fix this I used this snippet: http://www.consil.co.uk/files/2010/02/clean_utf8_xml_string.php_.txt

    Then I wrapped each call to htmlspecialchars() in the create function in clean_utf8_xml_string(), like:

    $merge_vars[$var_tag] = clean_utf8_xml_string(htmlspecialchars($entry[$field_id]));

    http://wordpress.org/extend/plugins/gravity-forms-salesforce/
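
For readers who only need to drop the offending control bytes, here is a minimal sketch of a lighter alternative to the linked function. The helper name strip_xml_invalid_controls() and the sample field names are hypothetical and are not part of the plugin. It removes characters rather than replacing them, and it only covers the C0 controls that XML 1.0 forbids, not every code point the spec excludes.

```php
<?php
/**
 * Hypothetical helper (not part of the plugin): remove the C0 control
 * characters that XML 1.0 forbids. Tab (0x09), line feed (0x0A) and
 * carriage return (0x0D) are allowed and are kept.
 */
function strip_xml_invalid_controls($value)
{
    // Leave non-strings (e.g. nested arrays) untouched.
    if (!is_string($value)) {
        return $value;
    }
    // Byte-level match, no /u flag: these bytes never occur inside a
    // multi-byte UTF-8 sequence, so multi-byte characters are preserved.
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $value);
}

// Sample data standing in for the plugin's $merge_vars array.
$merge_vars = array('FirstName' => "Ann\x0Be", 'City' => "Montr\xC3\xA9al");
$merge_vars = array_map('strip_xml_invalid_controls', $merge_vars);
```

Because the pattern works on raw bytes, it does not depend on PCRE Unicode support, which is the same constraint the linked function was written around.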

  2. gmcinnes
    Member
    Posted 1 year ago #

    By the way, the error message if you get bitten by this looks something like:

    PHP Fatal error: Uncaught SoapFault exception: [soapenv:Client] An invalid XML
    character (Unicode: 0xb) was found in the element content of the document.
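
To find out which field triggers a fault like this before the request is sent, something along these lines may help. It is only a sketch: find_xml_invalid_control() is a hypothetical helper, not a plugin or Salesforce API function, and it checks only the C0 control bytes that XML 1.0 disallows.

```php
<?php
/**
 * Hypothetical helper: return the first XML-invalid C0 control byte found
 * in $value as a hex string (e.g. "0xb"), or null if none is present.
 */
function find_xml_invalid_control($value)
{
    // Same byte range as the error above complains about; tab, LF and CR
    // are valid in XML 1.0 and are therefore not matched.
    if (preg_match('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', $value, $m)) {
        return sprintf('0x%x', ord($m[0]));
    }
    return null;
}

// Sample values standing in for submitted form fields.
$dirty = find_xml_invalid_control("pasted\x0Btext");
$clean = find_xml_invalid_control("plain text\n");
```

Logging the field name alongside the returned code should make it easier to match a SoapFault back to the form entry that caused it.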

  3. gmcinnes
    Member
    Posted 1 year ago #

    Here is a patch for this against 2.1.1:

    ---   salesforce-api.php	2012-11-27 14:37:45.000000000 -0500
    +++ salesforce-api.php	2012-11-27 16:39:48.000000000 -0500
    @@ -1,4 +1,67 @@
     <?php
    +/**
    + * Pinched from: http://www.consil.co.uk/files/2010/02/clean_utf8_xml_string.php_.txt
    + * Name: clean_utf8_xml_string
    + * Purpose: to remove or transform bytes or characters in a UTF-8 stream that
    + * will cause problems when parsed as XML. Not every UTF-8 character is a valid
    + * XML character.
    + * Author: Jason Judge
    + * Licence: GPL V3
    + * Created: 2010-01-07
    + *
    + * Takes a UTF-8 string and replaces any character that is not valid in an XML document.
    + *
    + * Note this does not require PCRE unicode libraries, which are often a problem on hosted
    + * servers.
    + *
    + * Inspiration taken from http://stackoverflow.com/questions/1401317/remove-non-uft8-characters-from-string
    + */
    +
    +function clean_utf8_xml_string($matches)
    +{
    +    // This part handles the callback.
    +    if (is_array($matches)) {
    +        if (isset($matches[1]) && $matches[1] !== '') {
    +            // Compatibility characters.
    +            // Return as-is for now, but could map it to another character.
    +            return $matches[1];
    +        } elseif (isset($matches[2]) && $matches[2] !== '') {
    +            // Valid UTF-8 for XML
    +            return $matches[2];
    +        } elseif (isset($matches[3]) && $matches[3] !== '') {
    +            // Invalid single-byte characters.
    +            // Instead of removing these, we can assume they are another character set and map them.
    +            // Assume they are ISO8859-1 for now, but this could be parameterized.
    +            return iconv('ISO-8859-1', 'UTF-8', $matches[3]);
    +        } elseif (isset($matches[4]) && $matches[4] !== '') {
    +            // Control characters - no mappings - so return a replacement character.
    +            // You may wish to return something different, or nothing at all.
    +            return '?';
    +        }
    +    }
    +
    +    // This part handles the first instance.
    +    if (is_string($matches)) {
    +        return preg_replace_callback('/'
    +            // Ranges recommended to avoid - "compatibility characters".
    +            // See http://www.w3.org/TR/REC-xml/ for the character ranges.
    +            . '([\x7F-\x84]|[\x86-\x9F]|[\xFD][\xD0-\xEF]|[\x1F\x2F\x3F\x4F\x5F\x6F\x7F\x8F\x9F\xAF\xBF\xCF\xDF\xEF\xFF\x10][\xFF][\xFE\xFF])'
    +
    +            // Broad valid UTF-8 multi-byte ranges.
    +            . '|([\x09\x0A\x0D]|[\x20-\x7F]|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF7][\x80-\xBF]{3})'
    +
    +            // Invalid single-byte characters which are likely to be extended ASCII and may be convertable to UTF-8 equivalents.
    +            . '|([\x80-\xBF]|[\xC0-\xFF])'
    +
    +            // Fall-through - whatever is left, which should be single-byte control characters.
    +            . '|(.)'
    +
    +            // If this is used as a static method, then replace __FUNCTION__ with __CLASS__ . '::' . __METHOD__
    +            . '/', __FUNCTION__, $matches
    +        );
    +    }
    +}
    +
     /*
     Plugin Name: Gravity Forms Salesforce API Add-On
     Plugin URI: http://www.seodenver.com/salesforce/
    @@ -1359,16 +1422,17 @@
                     }
                 }
             }
    -
    +        $merge_vars = array_map(function($merge_var) { return clean_utf8_xml_string($merge_var); }, $merge_vars);
    +
             $account = new SObject();
    
             $account->fields = $merge_vars;
    
             // Object type
             $account->type = $feed['meta']['contact_object_name'];
    -
    +
             $result = $api->create(array($account));
    -
    +
             $debug = '';
             if(self::is_debug()) {
                 $debug = '<pre>'.print_r(array(

  4. Zack Katz
    Member
    Plugin Author

    Posted 1 year ago #

    I've implemented this differently, using mb_convert_encoding() instead.

    $merge_vars = array_map(function($merge_var) { return mb_convert_encoding($merge_var, "UTF-8"); }, $merge_vars);

    This will be added to the next version.

  5. gmcinnes
    Member
    Posted 1 year ago #

    Hmm. I'm not sure this will do the job. The point is that the string to be sent *already is* valid UTF-8, but some UTF-8 characters are not valid XML. Specifically, there are control characters (such as 0x0B) that need to be removed.

    I suspect your approach will leave those control characters in place.
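
A quick way to check this concern is sketched below, assuming the mbstring extension is loaded. The sample string is made up, and the source encoding is passed explicitly so the result does not depend on the configured internal encoding.

```php
<?php
// Valid UTF-8 input containing a vertical tab (0x0B), which XML 1.0 forbids.
$input = "Line one\x0BLine two";

// UTF-8 to UTF-8 conversion, as in the proposed mb_convert_encoding() change.
$converted = mb_convert_encoding($input, 'UTF-8', 'UTF-8');
$still_present = strpos($converted, "\x0B") !== false;

// Explicitly stripping the disallowed C0 control bytes for comparison.
$cleaned = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $converted);
$present_after_strip = strpos($cleaned, "\x0B") !== false;

var_dump($still_present, $present_after_strip);
```

If the first value is true and the second is false, the conversion alone is passing the control character through, and a separate stripping step is still needed before the SOAP call.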

  6. Zack Katz
    Member
    Plugin Author

    Posted 1 year ago #

    If it still doesn't work for you, please send your Word doc and XML form output to support@katz.co - I think this works, though.

Topic Closed

This topic has been closed to new replies.
