Hello!
I’ll try to answer your questions simply:
- The robots.txt file will be used; it has a higher priority, so the plugin settings will have no effect.
- Google and Yandex don’t need a sitemap. Categories aren’t listed because getting that right would require complex processing; but both Google and Yandex can figure this out automatically. For more information, please read this article.
I hope this answers your questions! Good luck with your website 🙂
Thank you very much! I’ll clarify:
1. Just to confirm: we are only talking about settings that overlap between the robots.txt file and the plugin settings? For example, if article indexing is allowed in the robots.txt file, but I have disabled indexing in the plugin settings for a specific article, will that article still be hidden from search engines?
2. Do I need a robots.txt file in principle, or can I completely rely on the plugin?
Here is my file:
User-agent: * # general rules for all robots except Yandex and Google, since their rules are below
Disallow: /cgi-bin # folder on the hosting
Disallow: /? # all query parameters on the main page
Disallow: /wp- # all WP files: /wp-json/, /wp-includes, /wp-content/plugins
Disallow: *?s= # search
Disallow: *&s= # search
Disallow: /search/ # search
Disallow: /author/ # author archive
Disallow: /users/ # authors archive
Disallow: */trackback # trackbacks, comment notifications about an open link to an article
Disallow: */feed # all feeds
Disallow: */rss # RSS feed
Disallow: */embed # all embeds
Disallow: /xmlrpc.php # WordPress API file
Disallow: *utm*= # links with UTM tags
Disallow: *openstat= # links with openstat tags
Disallow: /readme.html # hide the WordPress installation manual (in the site root)
Allow: */uploads # open the uploads folder
User-agent: GoogleBot # rules for Google (comments not duplicated)
Disallow: /cgi-bin
Disallow: /?
Disallow: /wp-
Disallow: *?s=
Disallow: *&s=
Disallow: /search/
Disallow: /author/
Disallow: /users/
Disallow: */trackback
Disallow: */feed
Disallow: */rss
Disallow: */embed
Disallow: /xmlrpc.php
Disallow: *utm*=
Disallow: *openstat=
Disallow: /readme.html
Allow: */uploads
Allow: /*/*.js # open JS scripts inside /wp- (/*/ is for priority)
Allow: /*/*.css # open CSS files inside /wp- (/*/ is for priority)
Allow: /wp-*.png # images in plugins, the cache folder, etc.
Allow: /wp-*.jpg # images in plugins, the cache folder, etc.
Allow: /wp-*.jpeg # images in plugins, the cache folder, etc.
Allow: /wp-*.gif # images in plugins, the cache folder, etc.
Allow: /wp-admin/admin-ajax.php # used by plugins, so JS and CSS aren’t blocked
User-agent: Yandex # rules for Yandex (comments not duplicated)
Disallow: /cgi-bin
Disallow: /?
Disallow: /wp-
Disallow: *?s=
Disallow: *&s=
Disallow: /search/
Disallow: /author/
Disallow: /users/
Disallow: */trackback
Disallow: */feed
Disallow: */rss
Disallow: */embed
Disallow: /xmlrpc.php
Disallow: /readme.html
Allow: */uploads
Allow: /*/*.js
Allow: /*/*.css
Allow: /wp-*.png
Allow: /wp-*.jpg
Allow: /wp-*.jpeg
Allow: /wp-*.gif
Allow: /wp-admin/admin-ajax.php
Clean-Param: utm_source&utm_medium&utm_campaign # Yandex recommends removing tag parameters rather than blocking them from indexing
Clean-Param: openstat # likewise
For example, if I disabled “search” indexing in the plugin settings, will the plugin apply the equivalent of these rules?
Disallow: *?s=
Disallow: *&s=
Disallow: /search/
This reply was modified 3 years, 9 months ago by Korovke.
Hello again!
Again, simple answers:
- Yes, I only talked about the robots.txt settings, not other “robots.” And yes, search engines always obey blocking directives first (like noindex and disallow), whether they come from robots meta tags or robots.txt.
- You can rely on the plugin. Via “The SEO Framework,” we direct search engines via the canonical URL tag, the robots meta tag, and X-Robots-Tag HTTP headers. And yes, the plugin blocks (robots noindex) or redirects (canonical URL) those search queries when that option is enabled.
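For illustration, output along those lines for a page excluded from indexing could look something like this in the page head (the URL is a placeholder; the exact markup depends on your settings):

```html
<!-- Canonical URL tag: tells search engines the preferred URL for this content -->
<link rel="canonical" href="https://example.com/some-page/" />
<!-- Robots meta tag: asks search engines not to index this page -->
<meta name="robots" content="noindex,follow" />
```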
There’s nothing in the robots.txt file you shared that isn’t blocked by “The SEO Framework.” However, you may want to consider blocking xmlrpc.php via .htaccess for both visitors and search engines instead.
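A minimal sketch of such a block, assuming an Apache server that reads .htaccess files (Nginx and other servers need a different configuration):

```
# Deny all access to the XML-RPC endpoint (Apache 2.4+ syntax).
<Files xmlrpc.php>
    Require all denied
</Files>
```

On Apache 2.2, use `Order allow,deny` and `Deny from all` inside the `<Files>` block instead.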
Here’s an example of the recommended robots.txt output: https://theseoframework.com/robots.txt
I hope this helps! 🙂
Hello! Yandex indexes the following pages:
1. /xmlrpc.php?rsd
2. /xmlrpc.php
3. /wp-json/
4. /wp-includes/wlwmanifest.xml
5. /feed/
6. /comments/feed/
7. A lot of pages like /wp-json/oembed/1.0
How do I remove all of this? I thought the plugin could handle it.
Hello!
Low-quality content gets pushed down in search results automatically because no one is looking for it. Even if visitors click on such a link in a search engine, they are likely to bounce back, which hints to search engines that the content should be pushed down even further. Even the robots.txt file itself may appear in a search engine, but no one goes looking for it.
Yandex knows this, and I doubt you’ll get many, if any, clicks on the endpoints you listed.
Google has implemented something called “intent.” If there’s no intent for content to be found, Google will hide the content from their index unless there’s a direct site: query.
Just in case, let’s go over them:
For /xmlrpc.php:
We should add a noindex header here. Thanks for notifying us! I’ve opened an issue for this: https://github.com/sybrew/the-seo-framework/issues/465.
For /feed and /comments/feed:
From TSF v4.0 (released today), the feeds will have an X-Robots-Tag: noindex header. For more information on this header, see https://yandex.ru/support/webmaster/controlling-robot/meta-robots.html.
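For illustration, a feed response carrying that header might look like this (the status line and Content-Type are placeholders; only the X-Robots-Tag line is the point):

```
HTTP/1.1 200 OK
Content-Type: application/rss+xml; charset=UTF-8
X-Robots-Tag: noindex
```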
For /wp-json:
WordPress already adds the noindex header itself.
For static files, like /wp-includes/wlwmanifest.xml:
We have no control over these; neither does WordPress. I doubt these will rank in search engines, however. If you need search engines to stop accidentally indexing some of these files, add a wildcard entry to your robots.txt file, like /wp-includes/*.xml.
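As an illustration, such an entry could look like this in robots.txt (the path pattern here is just the example mentioned above; adjust it to the files you actually want hidden):

```
User-agent: *
Disallow: /wp-includes/*.xml
```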