Big increase in postmeta size
-
I noticed a big increase in postmeta size: about 50 MB in a database of 140 MB total, running in Standard mode. Forgetting the checked URLs in the AMP dashboard brings the postmeta size back to normal. Is it safe to forget URLs?
-
Yes, feel free to forget validated URLs. They are there for you to debug validation issues, but if there are no issues that you need to deal with, then you don’t need to keep the validated results.
The reason for the large postmeta is to populate the Stylesheets metabox on that screen. This requires storing the parsed representation of the stylesheets.
That being said, I can see there is an opportunity to de-duplicate this information across multiple URLs that have the same stylesheets. I’ll file an issue for that.
Also, we should probably garbage-collect any validated URLs that don’t have any unreviewed validation errors after a week since they were last checked. So this would be an auto-forget capability which is currently lacking. I’ll also file an issue for that.
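To make the proposed auto-forget rule concrete, here is a minimal sketch of the selection logic in Python. It is only an illustration of the rule described above, not the plugin's implementation: the `ValidatedUrl` record, its field names, and the `urls_to_forget` helper are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ValidatedUrl:
    """Hypothetical record for one validated URL (not the plugin's data model)."""
    url: str
    last_checked: datetime
    unreviewed_errors: int


def urls_to_forget(records, now, max_age=timedelta(days=7)):
    """Return URLs with no unreviewed errors that were last checked over max_age ago."""
    return [
        record.url
        for record in records
        if record.unreviewed_errors == 0 and now - record.last_checked > max_age
    ]


if __name__ == "__main__":
    now = datetime(2020, 5, 26)
    records = [
        # Old and clean: eligible for forgetting.
        ValidatedUrl("https://example.com/a/", datetime(2020, 5, 10), 0),
        # Old but still has unreviewed errors: keep.
        ValidatedUrl("https://example.com/b/", datetime(2020, 5, 10), 2),
        # Clean but checked yesterday: keep.
        ValidatedUrl("https://example.com/c/", datetime(2020, 5, 25), 0),
    ]
    print(urls_to_forget(records, now))
```

Only URLs that are both clean and stale are selected, so anything with validation errors still awaiting review is kept regardless of age.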
OK, thanks Weston. I also noticed a big increase in bandwidth. Could that be related to Google crawling the AMP pages and storing them in its cache? We have about 9,000 articles. Bandwidth has increased by about 3 GB per day since we activated Standard AMP mode on May 7th.
Hmm. Can you look at server logs to determine the user agents responsible for the bandwidth? It could be Googlebot crawling the AMP articles, but I’m not sure. The only other thing I can think of is that when saving posts/pages in WordPress, a loopback request is performed to obtain the validation results from the post/page being rendered on the frontend of the site. But I find it hard to believe this could result in 3 GB of data per day! These validation requests just fetch the HTML of the page and not any images.
Obtaining the user agents from the access logs would be the best way to determine the source of the increased traffic.
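As a rough sketch of how that could be done, the Python script below totals the response bytes per user agent. It assumes the access log is in the common Apache/nginx "combined" format (your server may be configured differently), and the `SAMPLE` lines are made up for illustration. The `bytes_by_user_agent` helper is a hypothetical name, not part of any tool.

```python
import re
from collections import Counter

# Matches the tail of a "combined" format line:
# "request" status bytes "referer" "user agent"
LOG_LINE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def bytes_by_user_agent(lines):
    """Sum the response sizes in bytes for each user agent string."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        size = match.group("size")
        # A size of "-" means no body was sent (e.g. a 304 response).
        totals[match.group("agent")] += 0 if size == "-" else int(size)
    return totals


# Made-up sample lines, for illustration only.
GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER = "Mozilla/5.0 (X11; Linux x86_64)"
SAMPLE = [
    '66.249.66.1 - - [26/May/2020:11:09:14 +0000] "GET /article-1/ HTTP/1.1" 200 52000 "-" "%s"' % GOOGLEBOT,
    '66.249.66.1 - - [26/May/2020:11:09:15 +0000] "GET /article-2/ HTTP/1.1" 200 48000 "-" "%s"' % GOOGLEBOT,
    '203.0.113.7 - - [26/May/2020:11:09:16 +0000] "GET / HTTP/1.1" 304 - "-" "%s"' % BROWSER,
    '203.0.113.7 - - [26/May/2020:11:09:17 +0000] "GET /about/ HTTP/1.1" 200 1500 "https://example.com/" "%s"' % BROWSER,
]

if __name__ == "__main__":
    # To analyse a real log, pass an open file instead of SAMPLE,
    # e.g. bytes_by_user_agent(open("access.log")).
    for agent, total in bytes_by_user_agent(SAMPLE).most_common(10):
        print(f"{total:>12} bytes  {agent}")
```

The user agent is part of the request itself, so grouping on it should still work when the site sits behind a proxy or CDN and the client IP column only shows the proxy's addresses.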
I’ve filed an issue for garbage-collecting the amp_validated_url posts: https://github.com/ampproject/amp-wp/issues/4779
Hi Weston, I’m using the Cloudflare free tier, so my logs show a lot of Cloudflare accesses. But I really don’t know how to check where they really come from.
Maybe this can explain the high bandwidth? https://www.dropbox.com/s/002sqd24i5hcifb/Captura%20de%20pantalla%202020-05-26%20a%20las%2011.09.14.jpg?dl=0
-
This reply was modified 5 years, 11 months ago by Guillermo Carvajal.
Ah yes, it looks like you’ve identified the traffic as being from Googlebot.
The topic ‘Big increase in postmeta size’ is closed to new replies.