When a URL that is blocked by the robots.txt file appears in an XML Sitemap, it poses a conflict for search engines.
Why is this important?
Your XML Sitemap should only include URLs that you want search engines to crawl and index. If you disallow a URL in robots.txt, you're telling search engines not to crawl it, which prevents them from reading and evaluating its contents.
Consequently, placing disallowed URLs in your sitemap sends mixed signals to search engines, which can result in these pages being indexed incorrectly, for example appearing in search results without a description.
What does the Optimization check?
This Optimization triggers when any internal URL that is disallowed by the robots.txt file is found in the XML Sitemap.
Examples that trigger this Optimization:
Suppose you have the URL: https://example.com/pages/page-a in an XML Sitemap submitted to search engines.
This URL would trigger the Optimization if it matched a disallow directive in the robots.txt file, such as:
User-agent: *
Disallow: /pages/page-a
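You can verify this kind of match locally before resubmitting anything. A minimal sketch using Python's standard-library robots.txt parser, with the hypothetical URL and directive from the example above:

```python
from urllib import robotparser

# Hypothetical robots.txt rules from the example above
rules = """User-agent: *
Disallow: /pages/page-a
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# The disallowed sitemap URL is blocked for all user agents...
print(parser.can_fetch("*", "https://example.com/pages/page-a"))  # False
# ...while a URL that matches no disallow directive is not
print(parser.can_fetch("*", "https://example.com/pages/page-b"))  # True
```

Note that this only checks the `*` (all crawlers) group; a real robots.txt may contain per-crawler groups with different rules.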
How do you resolve this issue?
Including disallowed URLs in your sitemap is poor practice and can cause indexing problems. To correct this, do one of the following:
Remove the disallowed URL from all XML Sitemaps, then resubmit the updated sitemap to search engines via tools like Google Search Console.
If the URL should no longer be disallowed, amend or eliminate the relevant disallow directive in the robots.txt file.
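Both fixes start from knowing which sitemap URLs are currently blocked. A small audit sketch that cross-checks a sitemap against robots.txt, assuming both are available as strings (the URLs and rules here are the hypothetical example values, not real files):

```python
from urllib import robotparser
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Hypothetical sitemap and robots.txt contents for illustration
sitemap_xml = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/pages/page-a</loc></url>
  <url><loc>https://example.com/pages/page-b</loc></url>
</urlset>"""

robots_txt = """User-agent: *
Disallow: /pages/page-a
"""

parser = robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# Collect every sitemap <loc> that robots.txt blocks for all crawlers
blocked = [
    loc.text
    for loc in ET.fromstring(sitemap_xml).iter(f"{SITEMAP_NS}loc")
    if not parser.can_fetch("*", loc.text)
]
print(blocked)  # ['https://example.com/pages/page-a']
```

Each URL the audit reports should either be removed from the sitemap or un-blocked in robots.txt, depending on whether you want it indexed.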