• Resolved Novice999

    (@novice999)


    Hi, everyone,
    after automatically updating WP from 2.8.4 (English) to 3.4.1 (Portuguese), I googled the search term that puts my site at the top of the results and noticed that Asian-like characters had replaced characters such as “ê”, “ô”, “ç”, and so on, which used to be displayed correctly. However, the problem seems to be restricted to search engines and, in fact, to the very first entry only. The site itself does not show any perceivable change.
    The same happens if I try it with Yahoo and AltaVista, where my site is still the top entry.
    Any hint would be very much appreciated.

Viewing 15 replies - 1 through 15 (of 16 total)
  • Could you show us an example?

    Thread Starter Novice999

    (@novice999)

    Hi Peter,
    if you google the text “fce uerj”, you will get, at the top of the entries listed, something like “Faculdade de CiXXncias EconYYmicas – FCE” where one should read “Faculdade de Ciências Econômicas – FCE”. In the first two lines of the paragraph that sits just below the site address, some other Asian-like characters appear as well.
    As I said, it seems to be restricted to the first entry only.
    If you go to either Yahoo or AltaVista, you will get the same results.
    Interestingly enough, the site itself does not show any strange characters (you know the address already).
    Thank you for the interest.

    Each website has a language attribute in its HTML tag. This lets search engines know what language your site is in.

    You can see this in the source code of your site: (2nd line)
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ru">

    Your site seems to be set to ru, which is Russian. So Google is trying to make sense of special Portuguese characters while thinking the page is in Cyrillic. That’s the problem.

    As for a solution:
    Normally, WordPress gets its language attribute from the installation. The wp-config.php file has this line of code:
    define ('WPLANG', 'ENG');
    Yours should say
    define ('WPLANG', 'pt_BR');
    I think; double-check this, since I only looked at the source code of http://br.wordpress.org/.

    Following this, the header.php file in your theme folder should have this line (or something like it):
    <html <?php language_attributes(); ?>>

    So, please check your wp-config.php and tell me what language it has. If it’s set correctly (pt_BR or just pt) then you’ll probably have to change the header.php file manually. If it’s not correct, let me know. I’m not sure what changing the language in wp-config.php will do. You might need to run a MySQL query to change the database, but let me know and we’ll take it from there.
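
As an illustration of the advice above, here is a minimal sketch of the top of a theme's header.php when the language comes from WordPress instead of being hardcoded. The DOCTYPE and the meta line are assumptions modelled on typical XHTML themes of that period, not copied from the poster's theme; `language_attributes()` and `bloginfo()` are standard WordPress template tags.

```php
<?php
// header.php (theme template): illustrative sketch only; a real theme
// file contains more markup than this.
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!-- language_attributes() prints the dir and lang attributes for the site language -->
<html xmlns="http://www.w3.org/1999/xhtml" <?php language_attributes(); ?>>
<head>
<!-- the charset is read from the WordPress settings instead of being typed by hand -->
<meta http-equiv="Content-Type" content="<?php bloginfo('html_type'); ?>; charset=<?php bloginfo('charset'); ?>" />
```

With the site language set to Brazilian Portuguese, the rendered tag would be expected to carry lang="pt-BR", as on br.wordpress.org.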

    Thread Starter Novice999

    (@novice999)

    Hi, Peter,

    Yes, this is what the site/header source code shows (up to the 2nd line):

    (1) <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    (2) <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ru">,

    whereas the WordPress(Pt-BR) side code reads:

    (1) <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    (2) <html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="pt-BR">

    As you pointed out, there is this lang="ru" instead of lang="pt-BR", besides a few other mismatches in the two lines.

    In the wp-config.php, what I find in regard to languages is:
    define ('WPLANG', '');
    and
    define('DB_CHARSET', 'utf8');
    define('DB_COLLATE', '');
    all of which may be of interest as well, I think.

    So, I am reporting these findings back to you as you told me to.

    Again, thank you so much for being so helpful.

    Thread Starter Novice999

    (@novice999)

    PS: I did not find the
    <html <?php language_attributes(); ?>>
    in the header.php.
    Is it something that should have been there already or something you are going to insert?
    Sorry for not talking about that in my previous post.

    Thread Starter Novice999

    (@novice999)

    I would like to add something else. In a backup that had been made before the update was done, the wp-config.php code shows the line
    define ('WPLANG', 'pt_BR');
    as you said it should.
    Sorry again.

    This probably means that the line <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ru"> is hardcoded inside your template file, header.php. Can you see it?

    Thread Starter Novice999

    (@novice999)

    Yes,
    the code
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ru">,
    without any changes, appears in:
    1. site source code (2nd line) of the current home-page (*);
    2. header.php – old, from backup, v. 2.8.4, English, before update, and
    3. header.php – new, after update, v. 3.4.1, Portuguese.

    * – where a diamond (with a question mark inside it) can be seen replacing each of the special characters used in the Portuguese language, all of which are shown by Google as Asian-like or Cyrillic characters, as you said.

    There is something else. In a previous post, I reported that the wp-config.php file had the line
    define('WPLANG', '').
    I had found that line in a local copy of the site, in which I probably did not change it when editing the file for the localhost (that is, to run locally on XAMPP).
    Actually, the wp-config.php reads
    define ('WPLANG', 'pt=BR')
    in both the backup and the current, online site. Please accept my apologies.

    So, do you think I ought to be editing header.php, replacing “ru” with “pt-BR”, as a first trial?
    Thank you.

    Yes, edit them, not as a trial but as a solution.

    There might be other header template files that look like:
    header-single.php or header-page.php or header-….php
    If so, change them too.

    Once changed, check your source code to see if it says pt-BR.
    (btw, in your config it should say pt_BR, not pt=BR)
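
For reference, a sketch of how the relevant wp-config.php lines might look with the corrected locale. The DB_CHARSET and DB_COLLATE values are the ones reported earlier in the thread; only the WPLANG value changes.

```php
<?php
// wp-config.php (fragment): the locale uses an underscore, not "=" or "-".
define('WPLANG', 'pt_BR');

// Values reported earlier in the thread; left unchanged here.
define('DB_CHARSET', 'utf8');
define('DB_COLLATE', '');
```

WordPress versions of that period load the translation file named after this value (pt_BR.mo in wp-content/languages), so a value such as pt=BR would most likely fail to match any language file.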

    Finally, check your meta description to see if the special chars are displayed correctly. If so, it’ll take Google some time to display them correctly.

    If not, let me know.

    Thread Starter Novice999

    (@novice999)

    Hi, Peter_L,
    Sorry for the waiting period.

    Unfortunately, the suggested editing did not work.

    As for the “header-something” files, the only one regarding languages and explicitly related to the header template that I could find was header_http_style.inc.php, where one can read in lines 34 and 35 (actually one line):

    34 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    35 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

    and 36 (one long line):

    36 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="<?php echo $GLOBALS['available_languages'][$GLOBALS['lang']][2]; ?>" lang="<?php echo $GLOBALS['available_languages'][$GLOBALS['lang']][2]; ?>" dir="<?php echo $GLOBALS['text_dir']; ?>">

    To me, it seems that those lines are sort of parsed/interpreted/compiled before they are written on to the header.php, according to some char table where Portuguese special chars are missing. But, as a Novice, I am probably wrong.

    I would appreciate it if you could give me any other suggestions.

    Thank you so much, once again.

    Hey, sorry for the late response. I didn’t get an email notification.

    Don’t touch the header_http_style.inc.php file. That’s a core file. The template files are the ones in the themes/mythemename/ folder.

    Can you summarize the problem once more? How far along are you in solving it?

    Thread Starter Novice999

    (@novice999)

    Hi, sorry again for not answering earlier.
    The problem is that Google shows Asian-like characters in place of the special characters of the Portuguese language (á, é, ô, ç, â…). If you type “fce uerj” in Google’s search box, you get those characters in the very first line of the first entry. The first occurrence replaces what should be an “ê”; the next replaces an “ô”. You can see it once again at the link “Contato”, where an “ã” and a “º” are replaced (“º” is the Portuguese ordinal sign, equivalent to “st”, “nd”, “rd”, and “th” in English; in that case, 8th floor).
    In fact, at first there was the “pt-BR” thing in the header.php file, as you pointed out. There was a similar problem with the wp-config.php file too. Both files were edited as suggested, but that did not change anything. Even the source code of the loaded site was still showing the strange characters.
    What I did then was edit the lines in header.php that were presenting the problem. Now the source code of the loaded site no longer shows any strange characters. However, Google still shows them. Do you think it is a matter of waiting for them to update the data they have stored? Is there any procedure to get them to update the information they have?
    Thank you for the assistance.

    The source code does seem to be OK now.

    If you type “site:www.fce.uerj.br/” in any search engine, it will show you a list of the pages of the site it has in its index.

    You’ll see that in at least half of the results the special chars are correctly displayed.

    So, I’m thinking, it will be OK. Just give Google some more time to update its index.

    I don’t think you can speed this process up, except perhaps by trying Google Webmaster Tools. There is also a language setting in there.

    Thread Starter Novice999

    (@novice999)

    Hi, Peter_L,
    now it seems that the strange characters have gone. I think that it was a matter of giving Google and other search engines some time to update their indexes, exactly as you said.
    Should we close the topic?
    Once again, thank you very much for your help.

    Cool. Mark as resolved.

  • The topic ‘Version update caused character change only in search engines´ entries.’ is closed to new replies.