Purging the FVM Cache and Interaction with AWS CloudFront CDN
-
Hello,
First, my compliments on this plugin. I’ve been very impressed by its design. It has the perfect mix of features to give granular control to the admin. I’m a developer and am using this for a new social media site which will be out in a few months. I’m using this for the css functionality at the moment, as getting this working with our js will take more time.
I have an important question for you regarding the mechanics of the cache and how it will interact in a CDN environment. Our infrastructure is set up on AWS, and we use CloudFront for our CDN. Whenever I do a code deployment, I make sure to invalidate our CDN cache as the final step.
Now that I am using this plugin it adds another caching layer which must be purged during deployments. The question I have pertains to how the cache itself is created. In your documentation you say that the cache is created on the front-end, and that the plugin intercepts the files and then processes them, and then adds the new files to the cache. I do see how the cache itself is created as soon as a page is loaded on the site.
What I am concerned about here is a feedback loop that prevents me from updating the cache files in the plugin. When I purge the FVM cache I would expect it to fetch the original css files from the server itself, and not from the end-user’s browser. So long as it is getting the files from the server, it will pick up all file changes from my deployment. If, however, it was somehow getting the files through the end-user’s browser, this could prevent the cache from updating, as the person’s browser would be getting the files from our CDN. Even if I purge our CDN first, as soon as someone loads the site it would repopulate the CDN, possibly before I could purge the FVM cache.
So long as FVM is getting the original files from the server, and not the person’s browser, there shouldn’t be any issue. I can simply deploy my new files, purge the FVM cache, let FVM create the new cache from the newly deployed files, invalidate my CDN cache, and finally the CDN caches the new files from the FVM cache. This is how I am hoping it works. I need you to confirm this for me.
This can all get a little confusing with regards to two caches and the order in which files are generated and cached. Hopefully this is something you have already thought of and engineered the plugin for.
At this moment our site is not public, and so there are not people loading the pages and thus generating the FVM cache. But once we go live there will be pages being accessed constantly. I need to make sure that I can deploy our code and that new css is properly cached at all levels and served. Hopefully this all makes sense to you.
I look forward to hearing your answer…
All my best,
~ Michael
-
Hi Michael, some interesting points here.
The cache is created server side per “set of requirements” on each url.
It doesn’t matter which browser the user has or if someone is calling the url via cURL or some bot.
First, I advise you to disable FVM during development.
Speed optimization is something you do after deployment (or right before).
How FVM works is simple:
a) Whenever anyone requests a url, the plugin hooks into the header and footer scripts, makes a list of what is going to be loaded in both header and footer, and gives that “collection of files” a unique hash.
b) If your plugins and theme always enqueue the same css and js files on all posts, all posts will share the same hash, and therefore the FVM cached files will be reused for all pages that require the exact same set of js and css files.
c) If, for example, your home page, a search page, a category page, or even a custom post type includes other css or js files, that is considered another hash, so FVM will generate another unique cache for that specific page.
d) Whenever you change the names of, add, or remove any css/js files, it will trigger a different hash, hence new files (and urls) will be generated.
e) Once FVM calculates which files to use, it minifies and merges them into a static file, dequeues the default ones, and enqueues the newly generated file.
f) The generated file name has 2 components: a hash (a unique string derived from the css and js urls enqueued on the current page) and a time, which is zero by default.
g) When you purge the FVM caches, the time string changes and all previous caches generated by FVM are purged. That means FVM will process the CSS and JS files again, and even if the requirements are the same (no changes to the enqueued css or js files), the generated cache file will have a different url.
h) When you hit purge on FVM, you guarantee that the source files will be fetched again, minified and merged from scratch, AND that the resulting enqueued files will have a different url that is not yet cached on any CDN. For that reason, whenever you purge FVM, you do not need to worry about purging the CDN for the FVM generated files, since a new url would not be cached on the cdn and would therefore trigger an origin request.
i) All this occurs server side, before any HTML is generated and sent to the browser. The browser is irrelevant.
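The hash-plus-time scheme in points a) to h) can be illustrated with a small Python sketch. This is not FVM's actual code: the function name, the file-name pattern, the use of SHA-1, and the example URLs are all assumptions made for illustration. It only shows why identical sets of enqueued files share one cache file and why a purge yields a new url.

```python
import hashlib


def cache_file_name(enqueued_urls, purge_time=0, ext="css"):
    """Name a merged cache file after the enqueued URLs and the last purge time."""
    # Hash the exact list of enqueued URLs: same list, same hash.
    joined = "\n".join(enqueued_urls)
    set_hash = hashlib.sha1(joined.encode("utf-8")).hexdigest()[:12]
    # The time component is 0 by default and changes whenever the cache is purged.
    return f"cache-{set_hash}-{purge_time}.min.{ext}"


# Hypothetical stylesheet URLs for two member profiles and the home page.
profile_a = ["/themes/site/style.css", "/plugins/members/profile.css"]
profile_b = ["/themes/site/style.css", "/plugins/members/profile.css"]
home = profile_a + ["/themes/site/home.css"]

print(cache_file_name(profile_a) == cache_file_name(profile_b))  # same set, same file
print(cache_file_name(profile_a) == cache_file_name(home))       # extra file, new hash
print(cache_file_name(profile_a, purge_time=1700000000))         # purge changes the name
```

Because the purge time is part of the name, the CDN has never seen the new url and has to go back to the origin for it.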
—
Caveats:
FVM reads the files either directly from the disk (by absolute file path) OR if it fails (whatever reason), it will try to fetch the files by URL.
If you are behind a reverse proxy such as cloudflare or google cloud cdn… and FVM needs to fetch a css or JS file that it couldn’t read from the disk, it will request the url of that file, which is probably going to be cached on the CDN.
This is however, similar behaviour as if you had no FVM.
If you make changes to a css or js file and that file is cached on the cdn, you won’t see those changes in your browser. The same goes for FVM when it fails to read the files locally… it will act like a browser and try to download the file, meaning that the cdn should be purged before purging FVM.
You can see on the status page the logs of the generated files.
It will tell you exactly whether it opened the file locally or remotely, for each file that was merged.
If you don’t see anything in the log saying it was remote, you can safely skip the cdn purge, but if you see any css or js file being fetched by url, you should purge the CDN first, followed by the FVM cache.
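The rule above (purge the CDN first only when some file was fetched by url) is simple enough to write down as a check. The Python sketch below assumes you have already copied the source of each merged file out of the status page log into a list of strings; the actual log format is not described in this thread, so the input shape and the example paths are hypothetical.

```python
def is_remote(source):
    """True when a merged file was fetched by URL rather than read from disk."""
    return source.strip().lower().startswith(("http://", "https://", "//"))


def purge_order(sources):
    """Return the caches to purge, in order, for the given merged-file sources."""
    if any(is_remote(s) for s in sources):
        # A stale CDN copy could be re-merged, so the CDN has to go first.
        return ["cdn", "fvm"]
    # Everything came from disk: purging FVM alone yields fresh files and new urls.
    return ["fvm"]


# Hypothetical entries copied from the log.
all_local = ["/var/www/html/wp-content/themes/site/style.css"]
mixed = all_local + ["https://cdn.example.com/wp-content/plugins/x/x.css"]

print(purge_order(all_local))
print(purge_order(mixed))
```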
Hi Raul,
Thank you very much for this detailed response. It has answered many of my questions. In fact, I think your answer here is so good you should make it part of your documentation! 😉
I’ve read everything you wrote carefully and have a few remaining comments and questions...
1. Regarding having FVM enabled during development: I actually have a separate development and production environment. I have FVM enabled in both because it is important to run everything as if it were live. Otherwise we have no way of knowing if FVM is working properly and how it is affecting our systems. Our development environment doubles as a staging environment, so it must be absolutely identical to production, including having all plugins active that are active in production. We do our real development in local environments, and for that I simply disable all file processing in FVM so I can easily work on the css.
2. You have clearly answered my question about how the files are accessed prior to processing by the plugin. I checked the logs like you suggested to see if the files were being fetched locally or via URL. Luckily every single file path I see is an absolute path, and not a URL. So I assume this means all files are being fetched directly from the server. As you pointed out this means a CDN purge is technically not necessary since the new cache files will have unique URLs. Obviously all my css changes would also make it into the cache since the files are accessed directly from disk. FYI, I still need to purge our CDN after every deployment regardless. I’ll purge FVM before the CDN so the older cache files are also purged from the CDN (I know they wouldn’t be used either way).
3. While I was looking at the cache logs I noticed something odd. I’m seeing the same list of css files being cached multiple times, in separate cache files. It is the identical list of files. I thought that if a page used the identical files, it would not create a duplicate cache file. You talked about this in point b) of your answer. Do you think I have a potential problem here? I should mention that our site is enormous (a new social media platform), and it would be extremely bad if every page ended up having its own cache. My question below (#4) may be related to what is going on.
4. We’re running a load-balanced environment and I just want to make sure this won’t adversely affect FVM. I knew up-front that the cache itself had to be centralized between all servers (stateless), and have taken care of that. All the servers read/write the cache to a single location. But of course each server is running its own copy of FVM. I’m just wondering about how/when FVM generates that cache. If multiple users are accessing different servers simultaneously, all running FVM, and it is writing the cache to the same location, is it possible this could cause FVM to duplicate cache files? Is this perhaps the reason why I am seeing what looks like duplicate caches?
Just adding this thought: I can obviously monitor the cache here in our development site. It will become evident as I browse the site more (between deployments) whether or not the cache files are truly being unnecessarily duplicated.
5. Some of our css files are already minified. These are files that are using the proper min.css extension. Most of these files belong to plugins and scripts that had the css minified already (it’s not minified by another script or plugin). I haven’t noticed this causing any issues, but wanted to ask you about it. Could this pose a potential problem?
…..
I think that covers everything. I’m glad you’re the kind of developer who is willing to answer technical questions like this. I chose your plugin originally because I could tell it was designed to operate without interfering with existing systems. Frankly I am amazed by how well it is working already. Of course I still need to get this working with the JS, but that is going to be more challenging. I’ll experiment with that soon enough.
~ Michael
Hi Raul,
Please read my above post first…
I just discovered that something as simple as loading different member profile pages is producing duplicate sets of cache files. We use BuddyPress, and so I visited some different member profiles. I noticed that the identical set of cache files were being produced for the separate profiles even though they share the same css. If the plugin is creating duplicate file sets like this we’re in big trouble. Imagine what would happen when we have millions of users.
This is definitely going to have to be looked at more closely. I’m now thinking the duplicate cache files I saw earlier were from visiting different member profiles on the site. Do you have any idea why this is happening?
~ Michael
Hi Michael, You raise some other good points here!
I will try to reply as best I can.
1) I would still disable FVM during development (css and js changes), then simply activate the plugin again and purge caches. All settings are saved in the database (wp_options) between deactivating and activating the plugin.
2) Purge the CDN first, then FVM after that. If you need to do this programmatically, you can use wp-cli:
wp fvm purge
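For a scripted deployment, the order given here (CDN first, then FVM) could be wrapped as in the Python sketch below. The `wp fvm purge` command comes from this reply; the CloudFront invalidation via the AWS CLI, the distribution ID, and the WordPress path are assumptions about Michael's setup, and the sketch defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def purge_commands(distribution_id, wp_path):
    """Build the two purge commands in the recommended order: CDN, then FVM."""
    return [
        ["aws", "cloudfront", "create-invalidation",
         "--distribution-id", distribution_id, "--paths", "/*"],
        ["wp", "fvm", "purge", f"--path={wp_path}"],
    ]


def run_purges(distribution_id, wp_path, dry_run=True):
    """Print the commands, or run them in order and stop on the first failure."""
    for cmd in purge_commands(distribution_id, wp_path):
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Placeholder distribution ID and install path; replace with real values.
run_purges("EXAMPLEDISTID", "/var/www/html", dry_run=True)
```

Note that a CloudFront invalidation is asynchronous, so a stricter script would wait for it to complete (for example with `aws cloudfront wait invalidation-completed`) before purging FVM.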
3) (and your second post)
If every member page is creating its own css cache file, you must have some dynamic css going on somewhere and you need to exclude it via the ignore list (or make it static).
You need to audit what is really going on with your css code when FVM is disabled.
a) Disable FVM
b) Open the profile page in your browser
c) Inspect your inlined css code, copy and paste it into a txt file, and take note of the style id or class.
d) Take note of the urls of your css files, including query strings.
e) Refresh the browser a couple of times and repeat the process on the exact same profile. Are there any changes in your css code or css file urls?
For example, some themes enqueue php files instead of static css files. They include a query string that changes on every page load, in order to bypass cache and cdn.
If you see a css file that doesn’t end in .css or .css?something, chances are it’s PHP (this is bad practice; you should look into other themes or modify it to include a static file instead).
On some themes or plugins, the inlined css code changes per page too.
FVM also merges inlined css code that is attached to a parent css file with wp_add_inline_style, so if that code changes on each pageview or per profile, that makes the cache unique.
Off the top of my head, FVM assigns a unique hash to every piece of inline css that it finds. In the log file, similar to the file path or url, there should be a small hash code when it’s processing some inline code.
If you empty the cache and refresh the same profile page over and over, you should be able to see if each pageview is unique or not (there would be a new cache for every pageview if it is).
If refreshing the same profile doesn’t create new cache files, and if there is no cache plugin going on that prevents it (such as w3 total cache), then you know that each pageview on the same page, doesn’t create a new cache file.
If visiting another profile creates another cache file, then you can do this:
– Deactivate FVM
– Open each profile on your browser and compare for any minimal changes, specifically on inlined css code, or css urls.
– I’m pretty certain that there will be a difference somewhere… so now you do this for a few more profiles and check whether there are differences on every single profile, or only on some profiles simply because they have different info.
You can then add whatever css url differs to the ignore list.
If the code that changes is inlined css (different hash on the logs), add to the ignore list the css file that comes immediately before it.
Sometimes plugins include a css file and then add some inline css code that depends on its handle. By ignoring the parent file, the inlined code will also be ignored.
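The manual comparison described above can be partly automated. The Python sketch below is one possible approach and not part of FVM: with FVM deactivated, save the HTML of two profile pages, then feed both to `css_differences` to list stylesheet urls that appear on only one page and inline style blocks whose contents differ. The class and function names and the sample markup are made up for illustration.

```python
import hashlib
from html.parser import HTMLParser


class CssCollector(HTMLParser):
    """Collect stylesheet urls and a short hash of each inline <style> block."""

    def __init__(self):
        super().__init__()
        self.stylesheets = set()
        self.inline = {}
        self._style_id = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and "stylesheet" in (a.get("rel") or "").lower().split():
            self.stylesheets.add(a.get("href") or "")
        elif tag == "style":
            self._style_id = a.get("id") or f"(no id #{len(self.inline)})"
            self._buf = []

    def handle_data(self, data):
        if self._style_id is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "style" and self._style_id is not None:
            css = "".join(self._buf).encode("utf-8")
            self.inline[self._style_id] = hashlib.md5(css).hexdigest()[:8]
            self._style_id = None


def css_differences(html_a, html_b):
    """Report css urls present on only one page and inline styles that differ."""
    a, b = CssCollector(), CssCollector()
    a.feed(html_a)
    b.feed(html_b)
    ids = a.inline.keys() | b.inline.keys()
    return {
        "stylesheet_urls": sorted(a.stylesheets ^ b.stylesheets),
        "inline_styles": sorted(i for i in ids if a.inline.get(i) != b.inline.get(i)),
    }


# Two hypothetical profile pages that differ only in an inline cover image rule.
head = '<link rel="stylesheet" href="/themes/site/style.css?ver=1">'
page_a = head + '<style id="profile-inline-css">.cover{background:url(/u/1.jpg)}</style>'
page_b = head + '<style id="profile-inline-css">.cover{background:url(/u/2.jpg)}</style>'
print(css_differences(page_a, page_b))
```

Any url reported under `stylesheet_urls`, or the parent file of any id reported under `inline_styles`, is a candidate for the ignore list.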
Is your staging available on the web so I can take a quick look?
—
4) Load balancing will work fine with FVM, however you should consider whether you really need LB. Most of the time, taking a single c5 instance and scaling it up is better and easier.
For example, I have clients pushing 5 Million pageviews a day on WordPress, running on a single c5.4xlarge instance on AWS. All requests have ajax, uncached requests too.
If you must do LB for whatever reason, then make sure you use EFS as a mount point for all machines, so your code stays consistent across instances. Database on Aurora or RDS too.
Here is the whitepaper from AWS for best practices.
https://d1.awsstatic.com/whitepapers/wordpress-best-practices-on-aws.pdf
Note however, EFS will be slower than having a single server… and it will probably be more expensive. But you are trading speed for consistency, and adding high availability (google for CAP theorem for some info).
If you use EFS and RDS/Aurora, FVM will share the same disk space for all machines, regardless of which machine is serving the request via the LB. Behaviour is similar to a single server, just a bit slower due to EFS.
If you have separate installs in different disks, and are uploading your assets to s3 for example… you are still probably using Aurora or RDS. That being the case, FVM will create a cache file for each server whenever needed or requested by the load balancer.
You won’t see duplicate files on the logs on FVM, because what you are seeing is the reply of a single server via the Load balancer (you have sticky sessions enabled, right?)
https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-sticky-sessions.html
If you are synchronizing files on demand between servers via some script, you risk consistency. Use EFS instead, or perhaps glusterfs as a common mount point.
Finally, if you have multiple servers and are lacking sticky sessions, try to set up a hosts file entry on your system so it calls a single server. The wp-admin status page makes an ajax request every few seconds, so if there are no sticky sessions, you may be requesting that list of existing files from multiple servers… hence you could see duplicates.
There should be no way that 2 files with the exact same name exist on the same server and same path. Either the hash is different (contents must differ somewhere) or you’re requesting updates from multiple servers, i.e. round robin or some other system.
Without a common mount point such as EFS, you may not be able to use any minification plugin that relies on static files, but you may still be able to use some other plugin that generates css using PHP code instead (but will be slow).
Hi Raul,
Wow, now that is a response! Thank you again.
I want to respond in detail, but that will have to wait until tomorrow as I am deep in the middle of another project here.
I’m glad I discovered these issues here today. At least now I know about them and can figure out what is going on. I did quickly take care of excluding our dynamic css files. Luckily I know our css files very well, and there were in fact two dynamic files. This is my fault as I should have excluded them earlier. I’m actually planning to eliminate the dynamic css entirely in the future and hard-code it. It’s on my to-do list. Both these dynamic css files are on the EFS.
So I excluded them but unfortunately it did not solve the problem. There is obviously something else going on here that makes the css unique. I looked at the entire list of css files, and everything left is hard-coded. I also don’t see any hash codes that would indicate inline css being cached. I wonder what is causing this.
Btw, about the EFS… you are right about how slow it is. Our old stack used to have the entire application on the EFS and we had to stop using it because it was choking the servers due to slow response time (even if we maintained burst credits). Our new stack is using Docker and ECS, and our entire application runs on the load balanced servers, with the exception of a few files that must be on the EFS to be stateless. You got me wondering about the idea of the EFS being slow and if we’re really benefiting from using FVM. Then again, once the cached files make it onto the CDN they are served from there and not directly from the EFS, so perhaps this is still an improvement in efficiency.
In any case I’ll look into this all further to determine the cause. Thank you for your offer to look at the site, but it is not yet public. I really can’t let anyone in there, even in the development site. I’m confident that with your feedback I’ll figure this out.
I’ll be in touch again tomorrow…
~ Michael
Glad it helped somehow.
If you are seeing 2 files being generated by FVM, but with different names… open them both in your editor and do a diff comparison, or use an online tool to check the differences:
https://www.diffchecker.com/
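If you would rather compare the two generated files locally, Python's standard `difflib` module can do it. The sketch below is only one way to go about it; the function name and sample CSS are illustrative. Since minified css is often a single long line, it splits the text after each closing brace so the diff points at individual rules.

```python
import difflib


def diff_css(text_a, text_b, name_a="cache-a.css", name_b="cache-b.css"):
    """Return unified diff lines between two (possibly minified) css strings."""
    # Put each rule on its own line so a one-line minified file diffs usefully.
    lines_a = text_a.replace("}", "}\n").splitlines()
    lines_b = text_b.replace("}", "}\n").splitlines()
    return list(difflib.unified_diff(lines_a, lines_b, name_a, name_b, lineterm=""))


# Hypothetical contents of two cache files that should have been identical.
css_a = ".a{color:red}.cover{background:url(/u/1.jpg)}"
css_b = ".a{color:red}.cover{background:url(/u/2.jpg)}"

for line in diff_css(css_a, css_b):
    print(line)
```

In practice you would read the two files from disk (for example with `open(path).read()`) and pass their contents in; an empty result means the files are identical.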
Yes, EFS is great for consistency, but it’s slow for uncached requests.
You only get a baseline rate of 50 KB/s per GB of data on the disk.
If you use a page cache plugin, do not use any disk mode… go for Amazon ElastiCache instead.
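To put the 50 KB/s per GB baseline in perspective, here is a small worked calculation in Python. It assumes decimal units (1 MB = 1000 KB), uses made-up volume and file sizes, and ignores burst credits, which allow higher rates for a while.

```python
BASELINE_KB_PER_S_PER_GB = 50  # baseline figure quoted above


def efs_baseline_kb_per_s(stored_gb):
    """Baseline throughput in KB/s for a file system holding stored_gb of data."""
    return stored_gb * BASELINE_KB_PER_S_PER_GB


def seconds_to_read(size_mb, stored_gb):
    """Seconds needed to read size_mb at the baseline rate, without bursting."""
    return (size_mb * 1000) / efs_baseline_kb_per_s(stored_gb)


# Hypothetical numbers: 20 GB stored on the file system, 50 MB of files to read.
print(efs_baseline_kb_per_s(20))
print(seconds_to_read(50, 20))
```

With 20 GB stored, the baseline works out to 1000 KB/s, so reading 50 MB of uncached files would take about 50 seconds once burst credits run out.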
But if speed is your main concern and you trust AWS availability… (i.e., if you can afford downtime when aws goes down) go for a single instance (c5 type only) and scrap all the rest of the structure. Thank me later when you see your aws bill going down and how much faster everything is.
Hi Raul,
Are you suggesting to download the cache zip files and then compare the two sets to see if there are any differences? That’s an interesting suggestion. It would pinpoint any differences. I actually have diffchecker installed, but can also use Notepad++ since we’re not talking about that many files (it has a difference checker as well).
The EFS is honestly a nightmare for speed. It caused us so many problems with our earlier stack, and also cost a lot of time to figure out it was the EFS causing it all. The new Docker/ECS stack we’re using is amazing, and I spent a lot of time and money to have it built. We use CloudFormation here for everything, which I highly recommend for creating reproducible infrastructure.
We can’t use page caching because the entire site is dynamic (we’re a new social media platform). I mean, technically it is possible, but it would be very difficult to implement.
Regarding speed, yes, it is a concern which is why I installed FVM to help streamline our css at least. But we also need absolute uptime reliability, which is why we are load balancing. There’s also the fact we need to be able to scale fast. The good thing is that we can control everything at every level. We can use larger instance sizes if we want, control how many cluster nodes (container instances) we have, and also more granular control at the container level. I’ve spent two years here engineering our stack so that we could scale quickly and reliably.
It is interesting to hear you say how amazing the C5 instance type is. Perhaps once we launch (and have a larger budget) I will consider using those. It can still be a load balanced environment using containers, and also use the C5 type at the cluster node level. I can only imagine how expensive that instance size must be. Right now everything is using t3.small instances, but this is in a pre-launch state so it doesn’t matter.
Anyway, I said I wouldn’t start talking about all this and I did anyway. I’m definitely interested in anything we can do to speed up the site. Our site is media heavy and script heavy, and I’m always looking for ways to improve its speed.
Ok, I’ll talk with you more tomorrow after I’ve had time to study those cache files. Thank you so much for helping me here!
~ Michael
- The topic ‘Purging the FVM Cache and Interaction with AWS CloudFront CDN’ is closed to new replies.