• Resolved Guido

    (@guido07111975)


    Hi,

    I’m having trouble sanitizing the subject field and text area of my contact form plugin, using the native WP filters.

    Common characters such as double quotes, apostrophes and ampersands are converted to HTML entities. This doesn’t look nice in form submissions.

    For the text field I use sanitize_text_field() and for the text area wp_kses_post().

    Is there a native filter that allows these common HTML characters?
    Or should I somehow convert them back to regular text, before creating/sending the form submission?

    Guido

Viewing 15 replies - 1 through 15 (of 15 total)
  • Moderator bcworkz

    (@bcworkz)

    Heya Guido!

    In my experience those functions do not alter double quotes and apostrophes/single quotes. The ampersand is changed to an HTML entity by wp_kses_post(), but not by sanitize_text_field(). Not that this changes your question. The scope of application might be more limited than you thought. Or there may be additional filtering going on due to your theme or other plugins on your dev/test site.

    Doesn’t look nice in form submissions because they are plain text?

    There is no filter that I’m aware of that allows certain common characters through without converting to HTML entities. However, there is a global array of HTML entity names that are permitted: $allowedentitynames. If you were to remove entities you don’t want converted, it might meet your needs. I’ve not tested this myself. It would be a good idea to restore the original entities when you are done since other code may be relying upon this list for other purposes.

    My inclination is to use “pre_kses” and “sanitize_text_field” filters to seek out undesirable entities and restore the respective characters. Or run the entire thing through html_entity_decode() to wholesale restore all characters. While it may seem inefficient to run through kses or similar only to undo the conversions afterwards, there are other benefits from kses that are good to implement in sanitizing text.

    Of course the other option is to write your own sanitation function that does much of what kses does except for encoding certain characters. There are a number of WP functions you could still use to some benefit that do not do entity conversions, wp_check_invalid_utf8() for example. Thus you don’t have to totally code your own sanitation function from the ground up. Just be sure to call the right functions from within your sanitation function.

    Thread Starter Guido

    (@guido07111975)

    Hi BC,

    Thanks for your response. How are you doing?!

    Did not know about wp_check_invalid_utf8() but it seems to have the same effect as sanitize_text_field() so it doesn’t help much.

    But I’m mistaken regarding the converting to HTML entities… I now notice the ampersand is only converted when using wp_kses_post(). As you’ve mentioned already.

    There’s a backslash \ added before the other characters. So for example " becomes \". But only in the form submissions which are sent to me via mail. I’m using the same filtering for a custom text widget but no abnormal behaviour on my website.

    Any thoughts?

    Guido

    Moderator bcworkz

    (@bcworkz)

    It’s normal for form data coming into PHP via $_POST to be slashed. You simply run data through stripslashes(). If you are placing form data into a plain text email, you don’t need much sanitation since code injection attacks aren’t possible in plain text. It would be a good idea to run data through strip_tags() just in case the user added HTML tags, which have no place in plain text.

    For data destined for the DB, the data does need to be slashed again, but it’s a good idea to first stripslashes() to ensure you are starting from an unslashed state. You do not want to accidentally double slash data. In any case, if using WP functions to add data to the DB, the slashing is either handled for you or is added as part of $wpdb->prepare().

    If the data is destined for a web page or HTML email, there is no problem with the entities conversion. In fact, you need to run data at least through htmlspecialchars(), which converts ampersands, double quotes, less-than and greater-than signs to entities (and apostrophes too when the ENT_QUOTES flag is set). This is necessary to prevent code injection attacks. Because entities display correctly as characters in HTML, there should be no problem with entity conversions.

    What sanitation you do is dependent upon the destination of the data. At least do stripslashes() on incoming form data. Do more as required by the destination. Further validation may be warranted, depending on the nature of the data asked for. For example, if a quantity count is expected, only characters 0-9 need be allowed, which would never be slashed. That is an exception to the “do stripslashes() on everything” rule, though no harm would come by doing stripslashes() anyway.
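
    A minimal sketch of the "depends on the destination" idea above, using only native PHP so it runs outside WordPress. The helper names prepare_for_plain_text() and prepare_for_html() and the sample input are made up for illustration; in a plugin the value would come from $_POST.

```php
<?php
// Hypothetical helper: prepare a slashed form value for a text/plain email.
function prepare_for_plain_text( string $raw ): string {
    $value = stripslashes( $raw ); // undo the slashing on incoming form data
    $value = strip_tags( $value ); // HTML tags have no place in plain text
    return trim( $value );
}

// Hypothetical helper: prepare the same value for a web page or HTML email.
function prepare_for_html( string $raw ): string {
    $value = stripslashes( $raw );
    // ENT_QUOTES also converts apostrophes, not just double quotes.
    return htmlspecialchars( $value, ENT_QUOTES, 'UTF-8' );
}

// addslashes() simulates the slashed state of a value taken from $_POST.
$slashed = addslashes( 'He said "hi" & it\'s <b>fine</b>' );

echo prepare_for_plain_text( $slashed ), "\n";
echo prepare_for_html( $slashed ), "\n";
```

    The plain-text version keeps quotes, apostrophes and ampersands as typed; only the HTML version turns them into entities.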

    Thread Starter Guido

    (@guido07111975)

    Hi BC,

    Using stripslashes() works great indeed, gonna use this for sure. This only solves the backslash issue but it’s a great start. I already use strip_tags() sometimes, maybe I can add some allowable tags, such as line-breaks, so I can use it for my text area as well. Will work on that.

    I hoped WordPress itself had filters I could use, but apparently in this case I have to work with native PHP filters.

    Yes, my form submission is plain text, but currently I’m using the same variables throughout my form, and they’re already sanitized. So I might have to rebuild that part. Add variables which I only use for my form submission.

    Thanks for helping me!

    Guido

    Thread Starter Guido

    (@guido07111975)

    Hi BC / @bcworkz

    As you know my form submission is sent as text/plain.

    I’m not succeeding in properly displaying a textual string from the database in the form submission, because special characters are still being displayed as HTML entities.

    For example, this is how I get the blog name:

    
    $blogname = esc_attr( get_option('blogname') );
    

    (I don’t use get_bloginfo('name') because of its filters)

    Then while building the form submission I use this:

    
    html_entity_decode($blogname)
    

    But this has no effect.

    Any thoughts?

    Guido

    Moderator bcworkz

    (@bcworkz)

    Could it be the text was entity encoded twice? In the following, disregard the underscores I placed in entities to prevent the entities from being rendered as single characters.
    UTF-8: Alla Människor
    Encoded once: Alla M&_auml;nniskor
    Encoded twice: Alla M&_amp;auml;nniskor
    Decoding the last example once will yield the second example, giving the appearance the decode did not work.

    How did the entities get there to start with? There’s no need to encode entities for saving in a UTF-8 DB. While someone might manually type some entities when inputting blog names or whatever, a single entity decode will resolve that. No one is going to manually type in &_amp;auml; double encoding 🙂

    Doing nothing upon saving (WP functions handle the required slashing), then entity decoding once upon retrieval just for good measure should be adequate for all expected inputs.
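
    The double encoding scenario can be reproduced with native PHP. This is only a sketch: htmlentities() stands in for whatever encoded the text in the first place, which the thread has not established, and the file is assumed to be saved as UTF-8.

```php
<?php
$original = 'Alla Människor';

// Encode once, then encode the already encoded text a second time.
$once  = htmlentities( $original, ENT_QUOTES, 'UTF-8' );
$twice = htmlentities( $once, ENT_QUOTES, 'UTF-8' );

// A single decode of the double encoded text only gets back to $once,
// which makes it look as if html_entity_decode() did nothing useful.
$decoded_once  = html_entity_decode( $twice, ENT_QUOTES, 'UTF-8' );
$decoded_twice = html_entity_decode( $decoded_once, ENT_QUOTES, 'UTF-8' );

var_dump( $decoded_once === $once );      // still contains an entity
var_dump( $decoded_twice === $original ); // back to plain UTF-8
```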

    Thread Starter Guido

    (@guido07111975)

    When I take a second look at my example, I fully understand.. bad practice.

    I have worked on it some more, and am now only having trouble with retrieving the blogname from the DB and printing it without HTML entities.

    So Guido's website becomes Guido & # 0 3 9 ; s website in my form submission.

    So I tried $blogname = wp_kses_post( get_option('blogname') ); but again no success.

    Guido

    Moderator bcworkz

    (@bcworkz)

    Ah, yes, apparently WP runs that field through htmlspecialchars() before saving. Oddly, htmlspecialchars_decode() does not seem to undo this. No idea why. This is only an issue with the special chars, other non or extended latin chars are fine. In particular the apostrophe is problematic, the other special chars are much less likely to be used (but one never knows what users will do).

    The only thing I can figure is to do something like
    $blogname = str_replace('&_#039;', "'", $blogname ); // no underscore in 'needle'
    You can expand that to include the other special chars in search and replacement arrays, there aren’t all that many. While this may seem ridiculously crude, it’s not unprecedented. Look at the source for remove_accents() 🙂 I say do whatever works!
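
    A sketch of the expanded version with search and replacement arrays. The function name decode_special_chars() is made up for illustration, and the list assumes the five entities that htmlspecialchars() produces with ENT_QUOTES. The ampersand entity goes last on purpose: str_replace() works through the array in order, so decoding it first could turn a literal "&amp;lt;" into "<".

```php
<?php
// Hypothetical helper: restore the special chars that htmlspecialchars() encodes.
function decode_special_chars( string $text ): string {
    // '&amp;' is handled last so already decoded ampersands are not reused.
    $search  = array( '&#039;', '&quot;', '&lt;', '&gt;', '&amp;' );
    $replace = array( "'", '"', '<', '>', '&' );

    return str_replace( $search, $replace, $text );
}

echo decode_special_chars( 'Guido&#039;s website' ), "\n";
echo decode_special_chars( 'Tom &amp; Jerry' ), "\n";
```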

    Thread Starter Guido

    (@guido07111975)

    Hi BC,

    I already thought I was losing my mind because I wasn’t able to find a fix for this. But fortunately it’s not just me who’s having trouble fixing this 😉

    I’m now using preg_replace to strip backslashes from form submission, because stripslashes() or similar native WP filters stripslashes_deep() and wp_unslash() didn’t work as expected. Both native filters might not even be best practice in this case, I guess.

    
    preg_replace( '/\\\\/', '', $value );
    

    So I have to use preg_replace for the blogname as well… maybe not the cleanest fix but it does the job. I do want it because an apostrophe is used in site names quite often.

    Thanks man 🙂

    Guido

    Moderator bcworkz

    (@bcworkz)

    No problem.

    A reminder: try to use str_replace() instead of preg_replace() if you can. While preg_replace() is really cool and powerful, it’s not very efficient 🙂
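
    For the backslash case the two calls are interchangeable, as this small sketch with a made-up sample value suggests. Note that both remove every backslash, including ones the user typed on purpose, which stripslashes() would handle differently.

```php
<?php
// Sample value containing literal backslashes before the quotes.
$value = 'He said \\"hi\\"';

// Regex version from the thread: the pattern matches one literal backslash.
$with_regex = preg_replace( '/\\\\/', '', $value );

// Plain string version: same result without the regex engine.
$with_str_replace = str_replace( '\\', '', $value );

var_dump( $with_regex === $with_str_replace );
```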

    Thread Starter Guido

    (@guido07111975)

    I still have a LOT to learn 😉

    Besides this, it’s crazy WordPress still adds escaping backslashes to certain characters when using $_POST, but I guess this has something to do with backwards compatibility..

    Guido

    Thread Starter Guido

    (@guido07111975)

    but I guess this has something to do with backwards compatibility..

    Nope, it’s a security thing used for escaping special characters… to help noobs like me… am I right?!

    Guido

    Thread Starter Guido

    (@guido07111975)

    Hi @bcworkz

    Oddly, htmlspecialchars_decode() does not seem to undo this. No idea why.

    I’m now almost certain it does not convert the single quote back because I need to set a flag as well? Will try this asap.

    Last question, do you think there will be a conflict if I call stripslashes() directly inside the escaping function? Example:

    
    <input name="form_name" value="'.esc_attr(stripslashes($value)).'" />
    

    Guido

    Moderator bcworkz

    (@bcworkz)

    TIL about flags for htmlspecialchars_decode() 🙂 Thanks for digging further.
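
    A short sketch of the flag in question, using a made-up stored value. With ENT_COMPAT (the default before PHP 8.1) single quotes are left encoded; ENT_QUOTES decodes them as well.

```php
<?php
// Sample value, encoded the way the blog name showed up in the thread.
$stored = 'Guido&#039;s website';

// ENT_COMPAT only decodes double quotes, so the apostrophe entity survives.
$compat = htmlspecialchars_decode( $stored, ENT_COMPAT );

// ENT_QUOTES decodes both double and single quotes.
$quotes = htmlspecialchars_decode( $stored, ENT_QUOTES );

echo $compat, "\n";
echo $quotes, "\n";
```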

    I don’t see any conflict. It prevents slashes from getting through if their existence is a possibility and stripslashes() has no effect if there are no slashes.

    What I always do is immediately stripslashes when getting a value from $_POST or whatever superglobal. Then sanitize and validate before using WP functions to save the data, which SQL escapes as necessary. Then, when using reciprocal WP functions to get the data for output, which unescapes as necessary, I’m sure the data is in the same unslashed state that I started with and I can simply esc_attr() (or other appropriate output sanitation) without concern for the slashed state.

    I guess it amounts to the same thing as your example, it’s just a matter of how much code happens in between. What’s important is to be consistent and keep slashing and unslashing balanced. (Keep in mind the initial slashing is done for you: these days WordPress itself slashes the superglobals, not PHP.)
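
    A small sketch of what "balanced" means here, with addslashes() standing in for the slashing applied to incoming form data and a made-up sample value.

```php
<?php
$input = 'Guido\'s "site"';

$slashed_once  = addslashes( $input );        // state of a value from $_POST
$slashed_twice = addslashes( $slashed_once ); // accidental double slashing

// Balanced: one slash pass, one strip pass, back to the original.
var_dump( stripslashes( $slashed_once ) === $input );

// Unbalanced: one strip pass after two slash passes still leaves backslashes.
var_dump( stripslashes( $slashed_twice ) === $slashed_once );

// No slashes present: stripslashes() leaves the value alone.
var_dump( stripslashes( 'no slashes here' ) === 'no slashes here' );
```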

    Thread Starter Guido

    (@guido07111975)

    Hi BC,

    What I always do is immediately stripslashes when getting a value from $_POST or whatever superglobal.

    I thought that was bad practice in my case, because I also was using those values for storage in database, using wp_insert_post().

    I started a new thread about the sanitizing that wp_insert_post() uses and Hugh responded and told me it also sanitizes quotes before storage in DB.

    So yes, I now stripslash (almost) directly after $_POST 🙂

    Thanks again, case closed!

    Guido

  • The topic ‘Sanitizing text field and text area’ is closed to new replies.