Sanitizing text field and text area

  • Resolved Guido

    (@guido07111975)



    Hi,

    I’m having trouble sanitizing the subject field and text area of my contact form plugin, using the native WP filters.

    Common characters such as double quotes, apostrophes and ampersands are converted to HTML entities. This doesn’t look nice in form submissions.

    For text field I use sanitize_text_field() and for text area wp_kses_post().

    Is there a native filter that allows these common HTML characters?
    Or should I somehow convert them back to regular text, before creating/sending the form submission?

    Guido

Viewing 7 replies - 1 through 7 (of 7 total)
  • Moderator bcworkz

    (@bcworkz)

    Heya Guido!

    In my experience those functions do not alter double quotes and apostrophes/single quotes. The ampersand is changed to an HTML entity by wp_kses_post(), but not by sanitize_text_field(). Not that this changes your question. The scope of application might be more limited than you thought. Or there may be additional filtering going on due to your theme or other plugins on your dev/test site.

    Doesn’t look nice in form submissions because they are plain text?

    There is no filter that I’m aware of that allows certain common characters through without converting to HTML entities. However, there is a global array of HTML entity names that are permitted: $allowedentitynames. If you were to remove entities you don’t want converted, it might meet your needs. I’ve not tested this myself. It would be a good idea to restore the original entities when you are done since other code may be relying upon this list for other purposes.

    My inclination is to use “pre_kses” and “sanitize_text_field” filters to seek out undesirable entities and restore the respective characters. Or run the entire thing through html_entity_decode() to wholesale restore all characters. While it may seem inefficient to run through kses or similar only to undo the conversions afterwards, there are other benefits from kses that are good to implement in sanitizing text.
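
    A minimal sketch of the first option (seek out specific entities and restore their characters), written in native PHP so it runs outside WordPress. The function name restore_selected_entities() and the choice of entities in the map are assumptions for illustration; in a plugin the input would be whatever the sanitizing function returned.

```php
<?php
// Hypothetical helper: restore only a chosen set of entities, leave the rest encoded.
function restore_selected_entities( string $text ): string {
    // Assumed list of entities to undo; adjust to taste.
    $map = array(
        '&quot;' => '"',
        '&#039;' => "'",
        '&#038;' => '&',
        '&amp;'  => '&',
    );

    // strtr() makes a single pass, so a restored "&" is never re-scanned
    // and "&amp;lt;" becomes "&lt;" rather than "<".
    return strtr( $text, $map );
}

// Sample input resembling sanitized output (an assumption for illustration).
echo restore_selected_entities( 'Fish &amp; chips &lt;b&gt;' ), "\n";
```

    Entities for < and > are deliberately left out of the map, so encoded markup stays encoded.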

    Of course the other option is to write your own sanitization function that does much of what kses does except for encoding certain characters. There are a number of WP functions you could still use to some benefit that do not do entity conversions, wp_check_invalid_utf8() for example. Thus you don’t have to code your own sanitization function from the ground up. Just be sure to call the right functions from within it.

    Guido

    (@guido07111975)

    Hi BC,

    Thanks for your response. How are you doing?!

    Did not know about wp_check_invalid_utf8() but it seems to have the same effect as sanitize_text_field() so it doesn’t help much.

    But I’m mistaken regarding the converting to HTML entities… I now notice the ampersand is only converted when using wp_kses_post(), as you’ve mentioned already.

    There’s a backslash \ added before the other characters. So for example " becomes \". But only in the form submissions which are sent to me via mail. I’m using the same filtering for a custom text widget but there’s no abnormal behaviour on my website.

    Any thoughts?

    Guido

    Moderator bcworkz

    (@bcworkz)

    It’s normal for form data coming into PHP via $_POST to be slashed. You simply run data through stripslashes(). If you are placing form data into a plain text email, you don’t need much sanitization since code injection attacks aren’t possible in plain text. It would be a good idea to run data through strip_tags() just in case the user added HTML tags, which have no place in plain text.
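
    A minimal sketch of that plain text path, using only native PHP. The function name prepare_for_plain_text() and the sample input are assumptions; in the plugin the raw value would come from $_POST.

```php
<?php
// Hypothetical helper for a field headed into a plain text email body.
function prepare_for_plain_text( string $raw ): string {
    $unslashed = stripslashes( $raw );     // undo the slashes added to form data
    $no_tags   = strip_tags( $unslashed ); // drop any HTML tags the user typed

    return trim( $no_tags );
}

// Simulated slashed input (inside single quotes, \\ is one literal backslash).
$posted = 'He said \\"hello\\" <b>loudly</b> & left';

echo prepare_for_plain_text( $posted ), "\n";
```

    Quotes and the ampersand come through as typed, since nothing here does entity conversion.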

    For data destined for the DB, the data does need to be slashed again, but it’s a good idea to first stripslashes() to ensure you are starting from an unslashed state. You do not want to accidentally double slash data. In any case, if using WP functions to add data to the DB, the slashing is either handled for you or is added as part of $wpdb->prepare().

    If the data is destined for a web page or HTML email, there is no problem with the entities conversion. In fact, you need to run data at least through htmlspecialchars(), which converts ampersands, double quotes, and less-than and greater-than signs to entities (apostrophes too when the ENT_QUOTES flag is set, which is the default from PHP 8.1 onward). This is necessary to prevent code injection attacks. Because entities display correctly as characters in HTML, there should be no problem with entity conversions.
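
    For the HTML destination, a sketch along the same lines. The function name prepare_for_html() is hypothetical; passing ENT_QUOTES explicitly is a choice made here so that apostrophes are encoded regardless of PHP version.

```php
<?php
// Hypothetical helper for a value headed into a web page or HTML email.
function prepare_for_html( string $raw ): string {
    // Unslash first, then encode &, ", ', < and > as entities.
    return htmlspecialchars( stripslashes( $raw ), ENT_QUOTES, 'UTF-8' );
}

echo prepare_for_html( '<a href="#">Tom & Jerry</a>' ), "\n";
```

    The encoded string shows up in a browser as the original characters, while any markup the user typed is displayed rather than executed.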

    What sanitization you do depends on the destination of the data. At least do stripslashes() on incoming form data. Do more as required by the destination. Further validation may be warranted, depending on the nature of the data asked for. For example, if a quantity count is expected, only characters 0-9 need be allowed, which would never be slashed. That is an exception to the “do stripslashes() on everything” rule, though no harm would come from doing stripslashes() anyway.
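
    The quantity example could be validated roughly like this. The function name validate_quantity() and the choice to return null for invalid input are assumptions for illustration.

```php
<?php
// Hypothetical validator: returns the quantity as an int, or null when invalid.
function validate_quantity( string $raw ): ?int {
    $value = trim( stripslashes( $raw ) );

    // ctype_digit() is true only for non-empty strings made up of 0-9.
    return ctype_digit( $value ) ? (int) $value : null;
}

var_dump( validate_quantity( '12' ) );
var_dump( validate_quantity( '12 apples' ) );
```

    Rejecting anything that is not purely digits is stricter than cleaning the value up, which suits a field where only a count makes sense.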

    Guido

    (@guido07111975)

    Hi BC,

    Using stripslashes() works great indeed, gonna use this for sure. This only solves the backslash issue but it’s a great start. I already use strip_tags() sometimes, maybe I can add some allowable tags, such as line-breaks, so I can use it for my text area as well. Will work on that.

    I hoped WordPress itself had filters I could use, but apparently in this case I have to work with native PHP filters.

    Yes, my form submission is plain text, but currently I’m using the same variables throughout my form, and they’re already sanitized. So I might have to rebuild that part. Add variables which I only use for my form submission.

    Thanks for helping me!

    Guido

    Guido

    (@guido07111975)

    Hi BC / @bcworkz

    As you know my form submission is sent as text/plain.

    I’m not succeeding in properly displaying a textual string from the database in the form submission, because special characters are still being displayed as HTML entities.

    For example, this is how I get the blog name:

    
    $blogname = esc_attr( get_option('blogname') );
    

    (I don’t use get_bloginfo('name') because of its filters)

    Then while building the form submission I use this:

    
    html_entity_decode($blogname)
    

    But this has no effect.

    Any thoughts?

    Guido

    Moderator bcworkz

    (@bcworkz)

    Could it be the text was entity encoded twice? In the following, disregard the underscores I placed in entities to prevent the entities from being rendered as single characters.
    UTF-8: Alla Människor
    Encoded once: Alla M&_auml;nniskor
    Encoded twice: Alla M&_amp;auml;nniskor
    Decoding the last example once will yield the second example, giving the appearance the decode did not work.
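
    The effect can be reproduced with a short script. This is only a sketch of the double encoding scenario using native PHP; the variable names are made up, and the file is assumed to be saved as UTF-8.

```php
<?php
$original = 'Alla Människor';

// Encode once, then encode the already encoded result again.
$once  = htmlentities( $original, ENT_QUOTES, 'UTF-8' );
$twice = htmlentities( $once, ENT_QUOTES, 'UTF-8' );

// A single decode of the double encoded text only gets back to $once.
$decoded_once  = html_entity_decode( $twice, ENT_QUOTES, 'UTF-8' );
$decoded_twice = html_entity_decode( $decoded_once, ENT_QUOTES, 'UTF-8' );

var_dump( $decoded_once === $once );       // still contains an entity
var_dump( $decoded_twice === $original );  // back to the UTF-8 text
```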

    How did the entities get there to start with? There’s no need to encode entities for saving in a UTF-8 DB. While someone might manually type some entities when inputting blog names or whatever, a single entity decode will resolve that. No one is going to manually type in &_amp;auml; double encoding 🙂

    Doing nothing upon saving (WP functions handle the required slashing), then entity decoding once upon retrieval just for good measure should be adequate for all expected inputs.
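
    A sketch of the single decode on retrieval. The function name decode_for_plain_text() and the sample string are assumptions; in the plugin the input would be the stored option value. One detail worth noting: before PHP 8.1, html_entity_decode() defaults to ENT_COMPAT, which leaves the numeric apostrophe entity untouched, so ENT_QUOTES is passed explicitly here.

```php
<?php
// Hypothetical helper: decode a stored value once for use in plain text.
function decode_for_plain_text( string $stored ): string {
    // ENT_QUOTES decodes both double and single quote entities.
    return html_entity_decode( $stored, ENT_QUOTES, 'UTF-8' );
}

// Sample stored value containing apostrophe and ampersand entities.
echo decode_for_plain_text( 'Tom&#039;s &amp; Jerry&#039;s site' ), "\n";
```

    This is only suitable for plain text output, because decoding also turns encoded angle brackets back into real ones.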

    Guido

    (@guido07111975)

    When I take a second look at my example, I fully understand.. bad practice.

    I have worked on it some more, and am now only having trouble with retrieving the blogname from the DB and printing it without HTML entities.

    So Guido's website becomes Guido & # 0 3 9 ; s website in my form submission.

    So I tried $blogname = wp_kses_post( get_option('blogname') ); but again no success.

    Guido
