Plugin: The SEO Framework – Fast, Automated, Effortless — Plugin and robots.txt file. And the archive in the sitemap

  • Korovke

    (@korovke)


    Dear author! My English is terrible, so Google Translate is helping me. Please try to bear with me and answer in plain language.
    1. If I install and configure your plugin and also upload my own robots.txt file to the root folder of the site, which takes priority: the plugin settings or the robots.txt file?
    For example, suppose I block indexing of media files in the plugin settings, while the robots.txt file leaves indexing of media files open. In the end, will media files be blocked or open to search engines?

    2. In the plugin settings, I enabled indexing for ‘Categories’. Why don’t category links show up in the sitemap (sitemap.xml)?

    Thanks in advance for your answers.

  • Plugin Author Sybre Waaijer

    (@cybr)

    Hello!

    I’ll try to answer your questions simply:

    1. The robots.txt file will be used; it has higher priority, so the plugin settings will have no effect.
    2. Google and Yandex don’t need categories in the sitemap. Categories aren’t listed because we would have to perform complex work to get them right, and both Google and Yandex can figure this out automatically anyway. For more information, please read this article.

    I hope this answers your questions! Good luck with your website 🙂

    Thread Starter Korovke

    (@korovke)

    Thank you very much! I’ll clarify:

    1. Right, but am I correct that we are only talking about settings where the robots.txt file and the plugin settings overlap? For example, if robots.txt allows indexing of articles, but I have disabled indexing in the plugin settings for one specific article, will that article still be hidden from search engines?

    2. Do I need a robots.txt file at all, or can I rely entirely on the plugin?
    Here is my file:

    User-agent: * # general rules for all robots except Yandex and Google, since their rules follow below
    Disallow: /cgi-bin # folder on the hosting
    Disallow: /? # all query parameters on the front page
    Disallow: /wp- # all WP files: /wp-json/, /wp-includes, /wp-content/plugins
    Disallow: *?s= # search
    Disallow: *&s= # search
    Disallow: /search/ # search
    Disallow: /author/ # author archive
    Disallow: /users/ # authors archive
    Disallow: */trackback # trackbacks, comment notifications about an open link to an article
    Disallow: */feed # all feeds
    Disallow: */rss # RSS feed
    Disallow: */embed # all embeds
    Disallow: /xmlrpc.php # WordPress API file
    Disallow: *utm*= # links with UTM tags
    Disallow: *openstat= # links with Openstat tags
    Disallow: /readme.html # hide the WordPress installation manual (in the site root)
    Allow: */uploads # open the uploads folder

    User-agent: GoogleBot # rules for Google (comments not duplicated)
    Disallow: /cgi-bin
    Disallow: /?
    Disallow: /wp-
    Disallow: *?s=
    Disallow: *&s=
    Disallow: /search/
    Disallow: /author/
    Disallow: /users/
    Disallow: */trackback
    Disallow: */feed
    Disallow: */rss
    Disallow: */embed
    Disallow: /xmlrpc.php
    Disallow: *utm*=
    Disallow: *openstat=
    Disallow: /readme.html
    Allow: */uploads
    Allow: /*/*.js # open JS scripts inside /wp- (/*/ is for priority)
    Allow: /*/*.css # open CSS files inside /wp- (/*/ is for priority)
    Allow: /wp-*.png # images in plugins, the cache folder, etc.
    Allow: /wp-*.jpg # images in plugins, the cache folder, etc.
    Allow: /wp-*.jpeg # images in plugins, the cache folder, etc.
    Allow: /wp-*.gif # images in plugins, the cache folder, etc.
    Allow: /wp-admin/admin-ajax.php # used by plugins, so as not to block JS and CSS

    User-agent: Yandex # rules for Yandex (comments not duplicated)
    Disallow: /cgi-bin
    Disallow: /?
    Disallow: /wp-
    Disallow: *?s=
    Disallow: *&s=
    Disallow: /search/
    Disallow: /author/
    Disallow: /users/
    Disallow: */trackback
    Disallow: */feed
    Disallow: */rss
    Disallow: */embed
    Disallow: /xmlrpc.php
    Disallow: /readme.html
    Allow: */uploads
    Allow: /*/*.js
    Allow: /*/*.css
    Allow: /wp-*.png
    Allow: /wp-*.jpg
    Allow: /wp-*.jpeg
    Allow: /wp-*.gif
    Allow: /wp-admin/admin-ajax.php
    Clean-Param: utm_source&utm_medium&utm_campaign # Yandex recommends stripping tracking parameters rather than blocking indexing
    Clean-Param: openstat # likewise

    For example, if I disabled indexing of “search” in the plugin settings, will the plugin then carry out these directives?
    Disallow: *?s=
    Disallow: *&s=
    Disallow: /search/

    Plugin Author Sybre Waaijer

    (@cybr)

    Hello again!

    Again, simple answers:

    1. Yes, I was only talking about the robots.txt rules, not other “robots” mechanisms. And yes, search engines always honor blocking directives first (like noindex and disallow), whether they come from robots meta tags or from robots.txt.
    2. You can rely on the plugin. Via “The SEO Framework,” we direct search engines through the canonical URL tag, the robots meta tag, and X-Robots-Tag HTTP headers. And yes, when that option is enabled, the plugin blocks (robots noindex) or redirects (canonical URL) those search-query pages.
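    For illustration, a blocked search-results page handled this way would carry output roughly like the following in its HTML head (the exact markup, and the example.com URL, are placeholders rather than the plugin’s literal output):

    ```html
    <!-- Robots meta tag: tells crawlers not to index this page -->
    <meta name="robots" content="noindex" />
    <!-- Canonical URL: points crawlers away from the parameterized search URL -->
    <link rel="canonical" href="https://example.com/" />
    ```

    The same noindex signal can also be sent outside the HTML, as an `X-Robots-Tag: noindex` HTTP response header.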

    There’s nothing in the robots.txt file you shared that isn’t blocked by “The SEO Framework.” However, you may want to consider blocking xmlrpc.php via .htaccess for both visitors and search engines instead.
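    As a sketch of that .htaccess suggestion, assuming Apache 2.4 syntax (Apache 2.2 uses the older Order/Deny directives instead, and Nginx would need a location block):

    ```apache
    # Block all requests to xmlrpc.php, for visitors and crawlers alike.
    # Only do this if nothing on your site relies on XML-RPC
    # (e.g. some remote-publishing clients and older mobile apps).
    <Files "xmlrpc.php">
        Require all denied
    </Files>
    ```

    This returns a 403 for every request to the file, so search engines will drop it from their index over time.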

    Here’s an example of the recommended robots.txt output: https://theseoframework.com/robots.txt

    I hope this helps! 🙂

    Thread Starter Korovke

    (@korovke)

    Hello! Yandex indexes the following pages:
    1. /xmlrpc.php?rsd
    2. /xmlrpc.php
    3. /wp-json/
    4. /wp-includes/wlwmanifest.xml
    5. /feed/
    6. /comments/feed/
    7. A lot of pages like /wp-json/oembed/1.0

    How do I remove all of this? I thought the plugin could handle it.

    Plugin Author Sybre Waaijer

    (@cybr)

    Hello!

    Low-quality content gets pushed down in search results automatically because no one is looking for it. Even if visitors click such a link in a search engine, they are likely to bounce straight back, which hints to search engines that they should push the content down even further. Even the robots.txt file itself may appear in a search engine, but no one goes looking for it.

    Yandex knows this, and I doubt you’ll get many, if any, clicks on the endpoints you listed.

    Google has implemented something called “intent.” If there’s no intent for content to be found, Google will hide the content from its index unless there’s a direct site: query.

    Just in case, let’s go over them:

    For /xmlrpc.php:
    We should add a noindex header here. Thanks for notifying us! I’ve opened an issue for this: https://github.com/sybrew/the-seo-framework/issues/465.

    For /feed and /comments/feed:
    From TSF v4.0 (released today), the feeds will be served with an X-Robots-Tag: noindex header. For more information on this header, see https://yandex.ru/support/webmaster/controlling-robot/meta-robots.html.
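    For example, once that header is active, a request for the feed should come back with response headers along these lines (the status line and other headers will vary by server; only the X-Robots-Tag line matters here):

    ```
    HTTP/1.1 200 OK
    Content-Type: application/rss+xml; charset=UTF-8
    X-Robots-Tag: noindex
    ```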

    For /wp-json:
    WordPress adds the noindex header already.

    For static files, like wp-includes/wlwmanifest.xml:
    We have no control over these, and neither does WordPress. I doubt these will rank in search engines, however. If you need search engines to stop accidentally indexing some of these files, add a wildcard entry to your robots.txt file, like /wp-includes/*.xml.
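    As a concrete sketch, such a wildcard entry could look like this in robots.txt (the path is an example only; add it under the User-agent group you already use, and match it to the files you actually see indexed):

    ```
    User-agent: *
    Disallow: /wp-includes/*.xml
    ```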

  • The topic ‘Plugin and robots.txt file. And the archive in the sitemap’ is closed to new replies.