An AMP page should not have a canonical link that points to a URL disallowed by robots.txt.
Why is this important?
A canonical link in the AMP HTML document should reference the standard, non-AMP version of the webpage, or the AMP page itself if no non-AMP equivalent exists. Pointing it at a URL that the robots.txt file disallows is a misconfiguration.
Canonical links are required for an AMP page to be valid, and they guide search engines to the original non-AMP page. If that page is blocked by robots.txt, search engines cannot crawl it and may struggle to index it, which can reduce the AMP page's visibility and sends search engines conflicting signals.
What does the Optimization check?
The Optimization triggers when an AMP page's canonical link references a URL that is disallowed in the robots.txt file.
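As a rough illustration of this kind of check (a minimal sketch, not the tool's actual implementation), the Python below extracts the canonical href from an AMP document and tests it against a set of robots.txt rules using the standard library. The names `CanonicalFinder` and `canonical_is_disallowed`, the sample markup, and the `Disallow: /pages/` rule are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class CanonicalFinder(HTMLParser):
    """Hypothetical helper: records the href of the first <link rel="canonical">."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated values; attribute values can be None.
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.canonical = attr.get("href")


def canonical_is_disallowed(amp_html, robots_txt, user_agent="*"):
    """Return True if the page's canonical URL is blocked by the given robots.txt text."""
    finder = CanonicalFinder()
    finder.feed(amp_html)
    if finder.canonical is None:
        return False  # no canonical link, so nothing to check here
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return not rules.can_fetch(user_agent, finder.canonical)


# Sample inputs (assumed for illustration only).
AMP_HTML = """<!doctype html><html amp><head>
<meta charset="utf-8"><title>Example Title</title>
<link rel="canonical" href="https://example.com/pages/page-a/" />
</head><body></body></html>"""

ROBOTS_TXT = """User-agent: *
Disallow: /pages/
"""

print(canonical_is_disallowed(AMP_HTML, ROBOTS_TXT))
```

Note that Python's `urllib.robotparser` applies the first matching rule and does not implement every matching behavior that individual search engines use, so treat the result as an approximation.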
Examples that trigger this Optimization
Consider an AMP Page URL such as https://example.com/amp/page-a/
This would trigger the Optimization if it had the following canonical link:
<!doctype html>
<html amp>
<head>
  <meta charset="utf-8">
  <title>Example Title</title>
  <link rel="canonical" href="https://example.com/pages/page-a/" />
  ...
</head>
...
</html>
Assume this canonical URL is disallowed by a directive in the robots.txt file.
How do you resolve this issue?
If the canonical points to the correct URL but that URL is disallowed, update the robots.txt file so search engines can crawl it.
If the canonical points to an incorrect URL, correct the AMP page's canonical link so it points to the right non-AMP page. That non-AMP page should in turn have a self-referencing canonical.
For example, the page at https://example.com/pages/page-a/ declares a self-referencing canonical and links to its AMP version:
<link rel="amphtml" href="https://example.com/amp/page-a/">
<link rel="canonical" href="https://example.com/pages/page-a/" />
The corresponding AMP page should then have a canonical pointing back to https://example.com/pages/page-a/:
<link rel="canonical" href="https://example.com/pages/page-a/" />
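The pairing described above can be verified with a short script. The following is a minimal sketch, assuming you already have the HTML of both pages as strings. `LinkRelCollector`, `link_rels`, and `is_correctly_paired` are hypothetical names, and URLs are compared as exact strings with no normalization.

```python
from html.parser import HTMLParser


class LinkRelCollector(HTMLParser):
    """Hypothetical helper: maps each rel value on <link> tags to its href."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        for rel in (attr.get("rel") or "").lower().split():
            # Keep the first href seen for each rel value.
            self.links.setdefault(rel, attr.get("href"))


def link_rels(html):
    """Return a dict of rel -> href for the <link> tags in the given HTML."""
    collector = LinkRelCollector()
    collector.feed(html)
    return collector.links


def is_correctly_paired(page_url, page_html, amp_url, amp_html):
    """Check the three links: self-canonical, amphtml, and the AMP canonical."""
    page = link_rels(page_html)
    amp = link_rels(amp_html)
    return (
        page.get("canonical") == page_url    # non-AMP page self-references
        and page.get("amphtml") == amp_url   # non-AMP page links to the AMP version
        and amp.get("canonical") == page_url # AMP page points back to the non-AMP page
    )


PAGE_URL = "https://example.com/pages/page-a/"
AMP_URL = "https://example.com/amp/page-a/"

PAGE_HTML = """<head>
<link rel="amphtml" href="https://example.com/amp/page-a/">
<link rel="canonical" href="https://example.com/pages/page-a/" />
</head>"""

AMP_HTML_FIXED = """<head>
<link rel="canonical" href="https://example.com/pages/page-a/" />
</head>"""

print(is_correctly_paired(PAGE_URL, PAGE_HTML, AMP_URL, AMP_HTML_FIXED))
```

This only confirms that the links reference each other. Whether the canonical URL is crawlable still depends on the robots.txt rules.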