WordPress.org

WP Super Cache
still not corrected... bug about unicode-range letters in permalink (55 posts)

  1. qdinar
    Member
    Posted 3 years ago #

    Even before trying [B] and int:escape, I made a prg: custom function this time. I thought it had worked before I started writing these posts, but now that I try it again I see that it does not work; I have only succeeded in making it work starting from empty Apache rules.
    Let me write the configuration:

    RewriteEngine On
    RewriteMap aylandirow2 prg:/var/www/localhost/test/localhost/aylandirow2.php
    RewriteRule ^/(.*)$ /var/www/localhost/test/localhost/${aylandirow2:$1}
    RewriteLogLevel 3
    RewriteLog /var/log/apache2/rewrite.log

    the aylandirow2.php's content:

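    #!/usr/bin/php
    <?php
    // RewriteMap prg: helper. Apache writes one lookup key per line to stdin
    // and reads one line back from stdout for each key.
    set_time_limit(0);
    $fdin=fopen("php://stdin","r");
    $fdout=fopen("php://stdout","w");
    stream_set_write_buffer($fdout,0); // unbuffered, so each answer is sent at once
    while(($i=fgets($fdin))!==false){
    	// encode each path segment separately so the '/' separators survive
    	$i2=explode('/',rtrim($i));
    	$i2o=count($i2);
    	for($j=0;$j<$i2o;$j++){
    		// urlencode() emits uppercase hex digits; strtolower() lowercases the whole segment
    		$i2[$j]=strtolower(urlencode($i2[$j]));
    	}
    	$o=implode('/',$i2)."\n";
    	fputs($fdout,$o);
    }
    ?>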

    I found how to do this on the internet about a year ago and have not searched again now; I just used the code from my computer.
    aylandirow2.php is executable and its owner is www-data.

    I have just added "strtolower(" here. The uppercase hex digits I mentioned were probably about this: this has generatet uppercase hex digits without "strtolower(..)". And by the way, as I remember, I saw in HttpFox in Firefox that Firefox randomly sends requests with uppercase or lowercase hex digits, but that is not important; either way, the rewrite rules and this function receive Unicode.

    I also tried to understand the [NE] flag but could not find any change in behavior from it. I think maybe there really is none, because my PHP runs under fcgid.
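To see what the map script above does to a request path without going through Apache, its loop body can be tried on its own. This is only a sketch: `encode_segments` is a hypothetical wrapper name, not something from the post, and the sample path is made up around the letter that appears as %D3%A9 in the ab command posted later in this thread.

```php
<?php
// Hypothetical wrapper around the loop body of aylandirow2.php,
// so the per-segment encoding can be tried without Apache.
function encode_segments(string $path, bool $lowercase): string
{
    $segments = explode('/', rtrim($path));
    foreach ($segments as $k => $segment) {
        $encoded = urlencode($segment);   // percent-encodes the UTF-8 bytes
        $segments[$k] = $lowercase ? strtolower($encoded) : $encoded;
    }
    return implode('/', $segments);       // the '/' separators are untouched
}

$path = '2011/01/20/ө/';                   // assumed sample path
echo encode_segments($path, false), "\n"; // 2011/01/20/%D3%A9/
echo encode_segments($path, true), "\n";  // 2011/01/20/%d3%a9/
```

The first line of output shows the uppercase hex digits that urlencode() produces by itself; the second shows the effect of the added strtolower().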

  2. Donncha O Caoimh
    Member
    Plugin Author

    Posted 3 years ago #

    Grab the development version from the download page in about 30 minutes. I added an strtolower() around the supercache directory so when comments are left the right directory is deleted.

  3. qdinar
    Member
    Posted 3 years ago #

    I see that I wrote "this has generatet"; now I wonder how I could write that...

    The wrong directory was deleted before? I did not know that; could you give a link to the topic about it? I said in this topic, in post http://wordpress.org/support/topic/plugin-wp-super-cache-still-not-corrected-bug-about-unicode-range-letters-in-permalink?replies=32#post-1685751 , that adding "urldecode" in only one place breaks the blog. But that was only my hack. I thought that adding "urldecode" calls in other places was needed to complete that modification so that it works well. Do you mean that you have added "strtolower" calls in addition to doing something with URL encoding, and that non-Latin slugs are now served by rewrite rules only?

  4. Donncha O Caoimh
    Member
    Plugin Author

    Posted 3 years ago #

    The cached file with the encoded characters in the directory name will be served by PHP, not the rewrite rules. I think it's a reasonable compromise and almost as fast as mod_rewrite.

    See this thread too.

  5. skippybosco
    Member
    Posted 3 years ago #

    @donncha: Is this the recent SVN update I just saw posted (337197)?

    You mention that cached files with encoded characters in the directory name are served by PHP. Is this a new change to the plugin, or has it always been this way?

    Any ideas of the performance implications of this change? I ask as the majority of our URLs have some form of encoded characters. Where we see oddities is when there is a mixed character set (starts with English, ends with Chinese, etc.), which results in multiple directories being created as you describe here. We're trying to decide whether we're better off in the long run switching to all-English URLs and taking the potential SEO hit, or what creative options there are.

  6. Donncha O Caoimh
    Member
    Plugin Author

    Posted 3 years ago #

    Well, the PHP code to serve files has been in there for months so those files would have been served by that method.

    Performance-wise, it's practically as fast as using mod_rewrite. Your site won't be able to handle a deluge of sudden traffic quite as well, but under normal conditions it'll be just fine. The caching mode on the "Easy" settings page enables PHP caching rather than mod_rewrite just because it's easier to set up.

    Don't go changing URLs, you won't notice any speed difference.

  7. skippybosco
    Member
    Posted 3 years ago #

    @donncha: Thanks for the reply.

    I ask as we typically see 3,000 concurrent users online during our peak (which lasts 3-5 hours) so we're always looking for opportunities to reduce any performance impacting processes.

    At those levels can I assume that mod_rewrite would be more efficient or would it still be negligible?

  8. Donncha O Caoimh
    Member
    Plugin Author

    Posted 3 years ago #

    If your server is powerful enough to handle that many users then you won't notice any difference between the two caching methods.

  9. qdinar
    Member
    Posted 3 years ago #

    I have tested and compared with Apache Bench:
    [Log moderated as per the Forum Rules. Please use the pastebin]

    I used the development version of Super Cache and WordPress 3.0.1, with rewrite rules turned on.
    Apache Bench always requests without "last modified" (no If-Modified-Since header), i.e. as for a fresh file. That is OK, because we know that 304 responses are probably fast enough, so whether I turned the 304 option on does not matter; it is as if it were off.

    Compare:
    Time taken for tests: 0.100 vs 1.164 seconds, etc. All other results differ by nearly 10 times. Rewrite rules won.

  10. qdinar
    Member
    Posted 3 years ago #

    Bug report: the 304 option disappears after switching to rewrite rules, but the 304 option value that was set while "serving by PHP" was selected is still used. So somebody can forget that there is a 304 option and not know that it is turned off, and he will not think that to turn it on he needs to switch to serving by PHP. But the 304 option in rewrite-rule mode matters only if the blog has many non-Latin slugs.

  11. Donncha O Caoimh
    Member
    Plugin Author

    Posted 3 years ago #

    True, I should add a warning about that. It's impossible to support 304 headers with mod_rewrite rules alone.

  12. qdinar
    Member
    Posted 3 years ago #

    It's impossible to support 304 headers with mod_rewrite rules alone.

    No! Check with HttpFox! It works.

    My code in the previous post was deleted; the rules say only 10 lines are allowed. So I post the commands I used:

    ab -c 5 -n 50 http://wp.localhost/2010/10/01/ttt/
    ab -c 5 -n 50 http://wp.localhost/2011/01/20/%D3%A9/

  13. qdinar
    Member
    Posted 3 years ago #

    You can even check your test site, http://cuteandinsane.com/2010/12/hello-world/ : it sends 304, and it is generated by Super Cache, as the comment in the source says. I add after several minutes: that alone is not enough, but now I have checked, and the header does not say that it is sent by PHP (after I deleted the browser's local cache and then requested and got a fresh page).

  14. Donncha O Caoimh
    Member
    Plugin Author

    Posted 3 years ago #

    That's good then! The hello world post doesn't have any unusual characters in the URL so it's served using mod_rewrite, but my test post is definitely served by PHP. I just replicated that 304 header on both posts.

    The handling of the 304 header seems to be a bit random when it's Apache itself handling the request. Sometimes it works, sometimes it doesn't.

  15. Donncha O Caoimh
    Member
    Plugin Author

    Posted 3 years ago #

    Oh, and it doesn't matter that the 304 option is removed when mod_rewrite caching is enabled as the plugin has nothing to do with the 304 header then.

  16. qdinar
    Member
    Posted 3 years ago #

    No, only partially so. For example, if all posts have Unicode addresses, the whole blog will be served by PHP and will depend on the 304 option.
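For readers wondering what the 304 option amounts to when PHP serves the cached file: the decision is a comparison between the file's modification time and the If-Modified-Since request header. The sketch below is generic conditional-GET logic under that assumption, not code taken from WP Super Cache; `client_copy_is_fresh` is a hypothetical name and the timestamp is made up.

```php
<?php
// Hypothetical helper, not taken from WP Super Cache.
// Returns true when the client's copy is still current, i.e. the server
// may answer "304 Not Modified" instead of sending the cached file again.
function client_copy_is_fresh(int $fileMtime, ?string $ifModifiedSince): bool
{
    if ($ifModifiedSince === null || $ifModifiedSince === '') {
        return false; // unconditional request: always send the full page
    }
    $clientTime = strtotime($ifModifiedSince);
    if ($clientTime === false) {
        return false; // unparsable date: fall back to a full response
    }
    return $fileMtime <= $clientTime;
}

$mtime  = gmmktime(12, 0, 0, 1, 20, 2011);            // pretend cache file mtime
$header = gmdate('D, d M Y H:i:s', $mtime) . ' GMT';  // what a browser would send back

var_dump(client_copy_is_fresh($mtime, $header));      // same timestamp
var_dump(client_copy_is_fresh($mtime + 60, $header)); // file regenerated afterwards
var_dump(client_copy_is_fresh($mtime, null));         // no header at all
```

Only the first call reports a fresh client copy. When Apache serves the static file through the rewrite rules, Apache makes this decision itself, which is why the plugin's option only matters for the pages that go through PHP.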

  17. Donncha O Caoimh
    Member
    Plugin Author

    Posted 3 years ago #

    Good point, I'll change the option visibility later. Thanks!

  18. qdinar
    Member
    Posted 3 years ago #

    Donncha, please make using unescaped filenames optional, and make it impossible to turn on under Windows, because there the filesystem is probably NTFS, which has many special characters. In ext3 and ext4 there are not many; the only ones I know of are / , . and .. , which are probably already excluded from slugs. If unescaped filenames are used, there is no need to change the rewrite rules; non-Latin slugs will work without PHP. I think it may not work if an ext2(?)/3/4 partition is mounted with incorrect options so that UTF-8 does not work on it. You could also add a test that tries to write and read a file with non-Latin letters. Other pros of unescaped filenames: they become shorter and human-readable.
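The suggested test (write and read back a file with non-Latin letters) could look roughly like this. It is a sketch only: `fs_keeps_utf8_names`, the probe name and the temporary directory are all hypothetical, and nothing like it is known to exist in the plugin.

```php
<?php
// Hypothetical probe, not part of WP Super Cache: checks whether a
// directory with a non-Latin UTF-8 name survives a write/read round trip.
function fs_keeps_utf8_names(string $baseDir, string $name = 'тест-ө'): bool
{
    $dir  = $baseDir . DIRECTORY_SEPARATOR . $name;
    $file = $dir . DIRECTORY_SEPARATOR . 'index.html';
    $ok   = false;
    if (@mkdir($dir) && @file_put_contents($file, 'probe') !== false) {
        // The name must come back byte-for-byte in the directory listing,
        // and the content must be readable through the same path.
        $listed = scandir($baseDir);
        $ok = $listed !== false
            && in_array($name, $listed, true)
            && @file_get_contents($file) === 'probe';
    }
    // Clean up whatever was created.
    if (is_file($file)) { unlink($file); }
    if (is_dir($dir))   { rmdir($dir); }
    return $ok;
}

$tmp = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'utf8probe_' . getmypid();
mkdir($tmp);
var_dump(fs_keeps_utf8_names($tmp)); // result depends on the filesystem
rmdir($tmp);
```

A check like this would run once, against the cache directory, before the option is offered at all.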

  19. Donncha O Caoimh
    Member
    Plugin Author

    Posted 3 years ago #

    I'd rather not just because of all the unknowns. Have you tried the latest dev version? The PHP powered pages should be fast enough.

  20. qdinar
    Member
    Posted 3 years ago #

    You can leave escaped all characters above some value, for example something like all characters above 32767, i.e. those bigger than the biggest 2-byte UCS.
    You can also make it even safer: at first allow only the additional Latin, Cyrillic and other alphabet character ranges that look clear to you to be unescaped, then widen the set of allowed characters with new versions. Escaped slugs will still be served by PHP.

    Do you have other unknowns?

    And you should make it optional and warn: use it at your own risk.

    And if problems appear, they will probably be the OS's UTF-8 encoding bugs, not Super Cache's. Why should you bother about them and not use Unicode?

    I also think WordPress could move to using unescaped strings in the post_name field and treat URL encoding as just a way of encoding that should be used only in HTTP headers. Even in HTML it is not needed: Unicode in an href attribute works; only rarely is it necessary there to encode "#", "?", """, "'". A space probably works unescaped while it is in quotes.

    Maybe I will try to make that again myself, if you do not.

    No, I have not tried the latest dev version. I do not think it has become much faster. Rewrite rules were 10 times faster, as I said.
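The allow-list idea from this post (leave only clearly understood alphabets unescaped and keep everything else escaped) can be sketched as below. This is an illustration under assumptions: both function names are hypothetical, and the allowed set (Latin, Cyrillic, digits, hyphen, underscore) is just the example mentioned above, not a recommendation.

```php
<?php
// Hypothetical policy sketch, not WP Super Cache code: a slug may be stored
// unescaped only if every character is on a conservative allow-list.
function slug_may_stay_unescaped(string $slug): bool
{
    // \p{Latin} and \p{Cyrillic} are PCRE Unicode script properties;
    // the /u modifier makes PCRE treat the subject as UTF-8.
    return preg_match('/^[\p{Latin}\p{Cyrillic}0-9_-]+$/u', $slug) === 1;
}

function cache_dir_name(string $slug): string
{
    return slug_may_stay_unescaped($slug)
        ? $slug                          // human-readable directory name
        : strtolower(urlencode($slug));  // fallback: escaped form
}

echo cache_dir_name('hello-world'), "\n"; // hello-world
echo cache_dir_name('сәлам'), "\n";       // сәлам
echo cache_dir_name('你好'), "\n";         // %e4%bd%a0%e5%a5%bd
```

Widening the allow-list in later versions would then only mean adding more script names to the pattern.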

  21. qdinar
    Member
    Posted 3 years ago #

    I have found a configuration for .htaccess that serves files with URL-encoded paths using rewrite rules, without running WordPress PHP files (for WP Super Cache). But it needs a custom rewrite function in the form of an executable file that is run from the rewrite rules, so the server admin has to create that executable file for WordPress installations and turn it on in the Apache configuration.
    Custom function:

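    #!/usr/bin/php
    <?php
    // RewriteMap prg: helper. Apache writes one lookup key per line to stdin
    // and reads one line back from stdout for each key.
    set_time_limit(0);
    $fdin=fopen("php://stdin","r");
    $fdout=fopen("php://stdout","w");
    stream_set_write_buffer($fdout,0); // unbuffered, so each answer is sent at once
    while(($i=fgets($fdin))!==false){
    	// encode each path segment separately so the '/' separators survive
    	$i2=explode('/',rtrim($i));
    	$i2o=count($i2);
    	for($j=0;$j<$i2o;$j++){
    		// urlencode, lowercase, then escape every '%' itself as '%25'
    		$i2[$j]=str_replace('%','%25',strtolower(urlencode($i2[$j])));
    	}
    	$o=implode('/',$i2)."\n";
    	fputs($fdout,$o);
    }
    ?>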

    Owned by www-data, executable, at /var/www/localhost/test/localhost/aylandirow3.php .
    This file is connected to Apache this way:
    RewriteMap aylandirow3 prg:/var/www/localhost/test/localhost/aylandirow3.php
    This directive cannot be in .htaccess. After creating such a file or modifying it, Apache needs to be restarted, as far as I know.
    The custom function can probably be written in C; then indeed no PHP will be used ) . And I have seen Perl code showing how to do this.

    I have modified .htaccess this way:
    replace

    RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz -f
    RewriteRule ^(.*) "/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz" [L]

    with

    RewriteCond %{REQUEST_URI} !^/wp-content/cache/supercache/[^/]+/[^/]+/index\.html\.gz$
    RewriteRule ^(.*) "/wp-content/cache/supercache/%{HTTP_HOST}/${aylandirow3:$1}/index.html.gz" [L]

    and replace

    RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html -f
    RewriteRule ^(.*) "/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html" [L]

    with

    RewriteCond %{REQUEST_URI} !^/wp-content/cache/supercache/[^/]+/[^/]+/index\.html\.gz$
    RewriteRule ^(.*) "/wp-content/cache/supercache/%{HTTP_HOST}/${aylandirow3:$1}/index.html" [L]

    end of modification.

    I have tested this with ab, and it is a little slower than the old configuration.
    Old configuration:

    Time taken for tests:   0.353 seconds
    Requests per second:    141.45 [#/sec] (mean)
    Time per request:       35.348 [ms] (mean)
    Time per request:       7.070 [ms] (mean, across all concurrent requests)
    Transfer rate:          1843.00 [Kbytes/sec] received
    Percentage of the requests served within a certain time (ms)
      50%     27
    ...
     100%    101 (longest request)

    this configuration:

    Time taken for tests:   0.377 seconds
    Requests per second:    132.59 [#/sec] (mean)
    Time per request:       37.710 [ms] (mean)
    Time per request:       7.542 [ms] (mean, across all concurrent requests)
    Transfer rate:          1727.58 [Kbytes/sec] received
    Percentage of the requests served within a certain time (ms)
      50%     36
    ...
     100%     75 (longest request)

  22. qdinar
    Member
    Posted 3 years ago #

    When I configured a real site this way, the main pages reported that ....index.html.gz was not found; my main pages were not working for 1-3 hours ( .
    I have not found why this happened, though I thoroughly compared the real (production) and test sites.

    But now I have found another configuration that does not cause this problem:
    Add:
    RewriteMap aylandirow2 prg:/var/www/localhost/test/localhost/aylandirow2.php
    (near the previous RewriteMap)
    aylandirow2.php's content is:

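    #!/usr/bin/php
    <?php
    // RewriteMap prg: helper. Apache writes one lookup key per line to stdin
    // and reads one line back from stdout for each key.
    set_time_limit(0);
    $fdin=fopen("php://stdin","r");
    $fdout=fopen("php://stdout","w");
    stream_set_write_buffer($fdout,0); // unbuffered, so each answer is sent at once
    while(($i=fgets($fdin))!==false){
    	// encode each path segment separately so the '/' separators survive
    	$i2=explode('/',rtrim($i));
    	$i2o=count($i2);
    	for($j=0;$j<$i2o;$j++){
    		// urlencode() emits uppercase hex digits; strtolower() lowercases the whole segment
    		$i2[$j]=strtolower(urlencode($i2[$j]));
    	}
    	$o=implode('/',$i2)."\n";
    	fputs($fdout,$o);
    }
    ?>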

    and change rules this way:

    ...
    RewriteCond %{HTTP:Accept-Encoding} gzip
    RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/${aylandirow2:$1}/index.html.gz -f
    RewriteRule ^(.*) "/wp-content/cache/supercache/%{HTTP_HOST}/${aylandirow3:$1}/index.html.gz" [L]
    ...
    RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/${aylandirow2:$1}/index.html -f
    RewriteRule ^(.*) "/wp-content/cache/supercache/%{HTTP_HOST}/${aylandirow3:$1}/index.html" [L]

    that's all.

    I said that the owner of aylandirow2.php and aylandirow3.php is www-data. That is not correct: it does not strictly depend on that. I meant that the files should be accessible by www-data, and maybe, even if they are not, Apache can use them because it can run and access files temporarily as root.
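The rules in this post use aylandirow2 in the RewriteCond file test and aylandirow3 in the RewriteRule target, so it may help to see the two outputs side by side. The wrappers below are hypothetical names around the posted loop bodies, and the sample key is made up. One plausible reading, which is an assumption and not confirmed in the thread, is that the rewritten URL is percent-decoded once more before it is mapped to a file, so the double-encoded form decodes back to the on-disk name.

```php
<?php
// Hypothetical wrappers around the loop bodies of the two posted scripts.
function aylandirow2_map(string $path): string
{
    return implode('/', array_map(
        fn (string $s): string => strtolower(urlencode($s)),
        explode('/', rtrim($path))
    ));
}

function aylandirow3_map(string $path): string
{
    // Same as above, then every '%' is itself escaped as '%25'.
    return implode('/', array_map(
        fn (string $s): string => str_replace('%', '%25', strtolower(urlencode($s))),
        explode('/', rtrim($path))
    ));
}

$key = '2011/01/20/ө';             // assumed sample lookup key
echo aylandirow2_map($key), "\n"; // 2011/01/20/%d3%a9
echo aylandirow3_map($key), "\n"; // 2011/01/20/%25d3%25a9

// Decoding the double-encoded form once gives the single-encoded form.
echo rawurldecode(aylandirow3_map($key)) === aylandirow2_map($key) ? "match\n" : "differ\n";
```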

  23. qdinar
    Member
    Posted 3 years ago #

    This way also does not work... it shows the main page on all pages.

    After nearly 15 minutes:

    No, it works.

    I found the bug: RewriteEngine On was needed before the RewriteMap directives...

  24. Donncha O Caoimh
    Member
    Plugin Author

    Posted 3 years ago #

    If you have to call a PHP script like that from a mod_rewrite rule then you might as well use PHP caching. It's executing PHP code after all and the PHP caching code loads very early on.

    Also, there's no way I'm going to include a script that depends on finding a locally installed php binary. It would be a nightmare to auto-configure.

  25. qdinar
    Member
    Posted 3 years ago #

    You can put the 2 PHP scripts in the Super Cache folder and, just as you wrote instructions for pasting code into .htaccess, you can also write instructions for configuring the vhost with "AllowOverride None" and for serving Unicode addresses faster this way; but only the server admin (root) can install these scripts into the Apache configuration. This way has an advantage over saving cached pages with Unicode filenames: there is no problem with special characters, i.e. on non-Unicode filesystems and Windows.

Topic Closed

This topic has been closed to new replies.
