Brent Payne

Disallowed URL received organic search traffic

This Optimization covers a URL that is blocked by the robots.txt file but is still receiving organic search traffic, according to data from the linked Google Analytics and Google Search Console accounts.


Why is this important?

Disallowing a URL in robots.txt tells search engines not to crawl that page, so the usual expectation is that the URL will not appear in search results. If the URL still receives search traffic, however, it has been indexed.


What does the Optimization check?

This Optimization is triggered when an internal URL shows activity, such as clicks in Search Analytics or visits in Google Analytics, while also matching a disallow directive in the robots.txt file.


The data is collected through the Google Search Console and Google Analytics APIs for the connected property/view and the selected date range.
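The check can be sketched in Python as follows. This is a minimal illustration, not the tool's actual implementation: it assumes the Search Console clicks and Google Analytics visits have already been exported into plain dictionaries keyed by URL (the function name, the `gsc_clicks`/`ga_visits` parameters and the `example.com` URLs are all hypothetical), and it evaluates the rules with Python's standard-library `urllib.robotparser`, which does simple prefix matching and does not implement Google's `*` and `$` wildcard extensions.

```python
from urllib.robotparser import RobotFileParser


def find_disallowed_with_traffic(robots_txt, gsc_clicks, ga_visits, user_agent="Googlebot"):
    """Return URLs that received search traffic but are blocked by robots.txt.

    gsc_clicks and ga_visits are hypothetical dicts mapping URL -> count,
    assumed to have been exported from the Search Console and Analytics APIs.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    flagged = {}
    # Consider every URL that appears in either data source.
    for url in set(gsc_clicks) | set(ga_visits):
        clicks = gsc_clicks.get(url, 0)
        visits = ga_visits.get(url, 0)
        # Flag only URLs with some activity that the rules block for this user agent.
        if (clicks > 0 or visits > 0) and not parser.can_fetch(user_agent, url):
            flagged[url] = {"clicks": clicks, "visits": visits}
    return flagged


robots = """User-agent: *
Disallow: /pages/page-a
"""
clicks = {"https://example.com/pages/page-a": 12, "https://example.com/about": 30}
visits = {"https://example.com/pages/page-a": 9}
print(find_disallowed_with_traffic(robots, clicks, visits))
```

With these made-up numbers, only `/pages/page-a` is reported, because `/about` has traffic but no matching disallow rule.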


Examples that trigger this Optimization:

Consider a scenario in which a URL has attracted some search traffic.


This URL would trigger the Optimization if it matches a disallow rule in robots.txt, such as:

User-agent: *
Disallow: /pages/page-a
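Disallow rules match by path prefix, so a rule like this one can block more than a single page. The sketch below uses Python's standard-library `urllib.robotparser` and hypothetical `example.com` URLs to show which paths the rule would match; keep in mind that this parser does not support Google's wildcard syntax, so it is only a rough stand-in for how Googlebot reads robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The example rule, one directive per line.
ROBOTS_TXT = """User-agent: *
Disallow: /pages/page-a
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

candidates = [
    "https://example.com/pages/page-a",
    "https://example.com/pages/page-a-old",    # shares the prefix, so it is also blocked
    "https://example.com/pages/page-a?ref=1",  # a query string does not escape the rule
    "https://example.com/pages/page-b",        # different path, not matched
]
for url in candidates:
    status = "allowed" if parser.can_fetch("Googlebot", url) else "disallowed"
    print(f"{status:10} {url}")
```

The first three URLs are reported as disallowed; only `/pages/page-b` is allowed.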


How do you resolve this issue?

Keep in mind that traffic data reflects past performance, whereas crawling happens in the present. The URL may have been crawlable in the past and only recently disallowed.


Check whether the URL is still in the index by running a Google search for the exact URL enclosed in quotation marks.


If the search result shows the URL without a title or meta description, the disallow directive was probably applied recently and is now preventing Google from fetching that information.


This situation highlights a common mistake: assuming that disallowing a URL in robots.txt removes it from the index. To remove a URL from the index, apply a 'noindex' directive, or use a canonical tag to point to a replacement URL that should be indexed instead. Google can only see either signal on a page it is allowed to crawl, so the disallow directive has to be lifted while this happens.


Only once the URL has dropped out of the index should the disallow directive be reintroduced in robots.txt.
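Before the disallow rule goes back in, it is worth confirming that the page really serves a removal signal. The following is a minimal Python sketch using only the standard library; the class and function names are hypothetical, the HTML and response headers are assumed to have been fetched already, and it covers only `<meta name="robots">` / `<meta name="googlebot">` tags, a plain `X-Robots-Tag: noindex` header and `<link rel="canonical">`.

```python
from html.parser import HTMLParser


class RobotsSignalParser(HTMLParser):
    """Collects robots meta directives and the canonical link href (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.robots_directives = []
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Attributes without a value arrive as None; normalise them to "".
        attrs = {name.lower(): (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() in ("robots", "googlebot"):
            self.robots_directives += [d.strip().lower() for d in attrs.get("content", "").split(",")]
        elif tag == "link" and "canonical" in attrs.get("rel", "").lower().split():
            self.canonical = attrs.get("href")


def removal_signals(html, headers=None):
    """Report whether a page carries noindex (meta tag or X-Robots-Tag) and its canonical URL.

    headers is assumed to be a plain dict keyed by the exact header name; user-agent-specific
    header values such as "googlebot: noindex" are not handled in this sketch.
    """
    parser = RobotsSignalParser()
    parser.feed(html)
    header_value = (headers or {}).get("X-Robots-Tag", "")
    header_directives = [d.strip().lower() for d in header_value.split(",")]
    return {
        "noindex": "noindex" in parser.robots_directives or "noindex" in header_directives,
        "canonical": parser.canonical,
    }


PAGE = (
    '<html><head><meta name="robots" content="noindex, follow">'
    '<link rel="canonical" href="https://example.com/pages/page-b"></head></html>'
)
print(removal_signals(PAGE))
print(removal_signals("<html></html>", {"X-Robots-Tag": "noindex"}))
```

A page that reports neither `noindex` nor a canonical pointing elsewhere has no removal signal yet, so re-adding the disallow rule at that point would likely leave it stuck in the index.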


If the disallow rule is not a recent change, investigate other reasons Google might index a blocked URL, such as:

  • Do other pages reference the disallowed URL as their canonical link?

  • Does any XML Sitemap reference the disallowed URL?
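Both questions come down to checking a list of referenced URLs against robots.txt. Here is a minimal Python sketch under stated assumptions: the sitemap is a standard `urlset` file already downloaded as a string, the canonical links have already been collected into a dict mapping each page to its canonical target (`canonical_links` and all helper names are hypothetical), and `urllib.robotparser` again stands in for Google's own rule matching.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def blocked_urls(urls, robots_txt, user_agent="Googlebot"):
    """Return the URLs from `urls` that robots.txt disallows for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


def sitemap_locs(sitemap_xml):
    """Extract every <loc> value from a <urlset> sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


ROBOTS_TXT = """User-agent: *
Disallow: /pages/page-a
"""
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/pages/page-a</loc></url>
  <url><loc>https://example.com/pages/page-b</loc></url>
</urlset>"""
# Hypothetical crawl result: page URL -> canonical target declared on that page.
canonical_links = {
    "https://example.com/pages/page-a-copy": "https://example.com/pages/page-a",
    "https://example.com/pages/page-c": "https://example.com/pages/page-c",
}

print("In sitemap but disallowed:", blocked_urls(sitemap_locs(SITEMAP), ROBOTS_TXT))

blocked_targets = set(blocked_urls(canonical_links.values(), ROBOTS_TXT))
referrers = sorted(page for page, target in canonical_links.items() if target in blocked_targets)
print("Pages canonicalising to a disallowed URL:", referrers)
```

Any URL in either result is a likely reason Google keeps the blocked page in its index.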
