sanitize_title() hack for non-english langs

  • The problem: the original sanitize_title() function (in functions-formatting.php) was “eating” letters with tildes, acute accents, etc. For example, the title “Ésto provoca una emoción tremenda” gives “sto-provoca-una-emocin-tremenda”, which may also cost some ranking on search engines.
    The solution: I have made some modifications…
    function sanitize_title($title) {
        global $wpdb, $tableposts, $action; // get some globals for the final check
        $disallow = array('&amp;', '&iexcl;', '&iquest;', '&eth;', '&thorn;', '&szlig;'); // exceptions for our 'magic' regex
        $allow = array('&', '¡', '¿', '', '', 'sz'); // exceptions cleaned
        $title = strtolower($title);
        $title = htmlentities($title); // turn the tildes, acutes, etc. into named entities
        $title = str_replace($disallow, $allow, $title); // kill the exceptions
        $title = preg_replace("/&(\w{1})(\w+);/i", '\\1', $title); // keep the base letter and drop the tilde, acute, etc.
        $title = preg_replace('/&.+?;/', '', $title); // kill remaining entities
        $title = preg_replace('/[^a-z0-9_ -]/', '', $title); // added "_" support
        $title = preg_replace('/\s+/', ' ', $title);
        $title = str_replace('.', '', $title);
        $title = trim($title);
        $title = str_replace(' ', '-', $title);
        if ($action != 'editpost') { // if not editing a post, check whether another post already has the same name; if so, add "_#" at the end (this is a hack)
            $cuenta = $wpdb->get_results("SELECT ID, post_name FROM $tableposts WHERE post_name='$title'");
            if (count($cuenta) > 0) $title = $title . '_' . (count($cuenta) + 1);
        }
        return $title;
    }
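
    To try the string-cleaning steps without a WordPress install, here is a minimal sketch that leaves out the database lookup. The function name slug_from_title_sketch is hypothetical, and the sketch assumes UTF-8 input. It differs from the function above in two ways: the exceptions are replaced with str_ireplace(), and lowercasing happens after the entity step, since strtolower() may leave accented capitals untouched depending on PHP version and locale.

```php
<?php
// Hypothetical helper: only the string-cleaning half of the function above,
// with the database lookup left out. Assumes UTF-8 input.
function slug_from_title_sketch($title) {
    // Entities the "first letter" regex would mangle, and what to put instead.
    $disallow = array('&amp;', '&iexcl;', '&iquest;', '&eth;', '&thorn;', '&szlig;');
    $allow    = array('&', '¡', '¿', '', '', 'sz');

    $title = htmlentities($title, ENT_QUOTES, 'UTF-8');   // "É" becomes "&Eacute;"
    $title = str_ireplace($disallow, $allow, $title);     // handle the exceptions first
    $title = preg_replace('/&(\w)\w+;/', '$1', $title);   // "&Eacute;" becomes "E"
    $title = preg_replace('/&.+?;/', '', $title);         // drop any remaining entities
    $title = strtolower($title);                          // plain ASCII by now
    $title = preg_replace('/[^a-z0-9_ -]/', '', $title);  // keep letters, digits, "_", " ", "-"
    $title = preg_replace('/\s+/', ' ', $title);          // collapse runs of whitespace
    $title = trim($title);
    return str_replace(' ', '-', $title);
}

echo slug_from_title_sketch('Ésto provoca una emoción tremenda'), "\n";
// prints: esto-provoca-una-emocion-tremenda
```

    With the title from the problem description, the accented letters are kept as their base letters instead of being dropped.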

    This also solves one more problem: the final “if” checks whether the sanitized title already exists. If it does, the function adds “_#” (where # is a number) at the end. This helps people who use a permalink structure like /archives/%postname%/
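
    The duplicate check can also be tried without a database. In the sketch below, unique_slug_sketch is a hypothetical helper and the $existing array stands in for the post_name column. It is not the same rule as the posted code: counting exact matches would give “_2” again for a third post with the same title, so this variant keeps increasing the number until the name is free.

```php
<?php
// Hypothetical helper: $existing stands in for the post_name column.
// Unlike the posted code, it probes "_2", "_3", ... until a free name is found.
function unique_slug_sketch($slug, array $existing) {
    if (!in_array($slug, $existing, true)) {
        return $slug; // not taken, use it unchanged
    }
    $n = 2;
    while (in_array($slug . '_' . $n, $existing, true)) {
        $n++;
    }
    return $slug . '_' . $n;
}

$taken = array('hola', 'hola_2');
echo unique_slug_sketch('hola', $taken), "\n";   // prints: hola_3
echo unique_slug_sketch('adios', $taken), "\n";  // prints: adios
```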
    This “hack” adds better support for non-English languages.
    Note: this may not be the best way to solve the two problems, but I’m not a programmer, just a “designer with a nice hobby” 😉
    Greets!

Viewing 2 replies - 1 through 2 (of 2 total)
  • Thread Starter ala_747

    (@ala_747)

    Mmmmmmmmmmmmm…. bad non-English language support on the forums too, I think…
    Could an admin fix the above code, please? Every &amp; comes out as &
    Thanks!

    Thanks for the idea. I have come up with a similar solution, which will be committed to CVS once 1.2 is released (because the fix hasn’t been tested enough yet).
    Consider this fixed, just not for 1.2.
