Crawling of wrong URL problem

#800250
  Resolved
    Anonymous
    Rank Math free

    Hello, since February I have been getting a massive number of errors in Google Search Console (GSC) across all my websites (more than 5,000 URLs on some of them):

    “Alternate page with proper canonical tag”

    Here is an example of a URL with this problem: https://www.laboutiqueducocktail.com/accessoire-cocktail/shaker-cocktail/shaker-cocktail-original/?attribute_modele=Caution

    Question:

    Why is it crawled by the bot and how does the bot find this URL?

    Problem:

    It is spending my entire crawl budget on wrong URLs that shouldn’t be crawled, and new websites are struggling very hard to get their first visitors.

    I tried to disallow the attribute parameter in robots.txt, but that also blocked the Google Shopping bot, so my ads were automatically turned off.

    Thanks in advance for your help!

Viewing 11 replies - 1 through 11 (of 11 total)
    Hello,

    I’m sorry to hear about the crawling issues you are seeing in Google Search Console. Googlebot crawling the wrong URLs suggests that part of your website’s crawl budget is being spent on pages you don’t want crawled.

    One possible reason Googlebot is crawling unwanted URLs is a weak internal linking structure. Pages with no internal links pointing to them, like the one you mentioned, may be treated as orphan pages, yet Google can still discover and crawl them through other sources.

    To prevent Google from spending crawl budget on irrelevant URLs, you should ensure that all important pages have proper internal links pointing to them. This will help Google prioritize crawling the right pages on your site.

    Regarding disallowing the attribute URLs in robots.txt: if blocking them affects your Google Shopping ads, you may need an alternative solution. It’s important to strike a balance between preventing Google from crawling irrelevant URLs and keeping your ads active.

    If you need further assistance or have more questions, please feel free to reply here, and a human agent from our support team will assist you promptly.

    Thank you for reaching out!

    Sources:
    Common robots.txt Issues & How to Fix Them » Rank Math
    Discovered – Currently Not Indexed in Google Search Console » Rank Math
    Why Did My Website Disappear From Google Search? » Rank Math

    Anonymous
    Rank Math free

    Yes, I still need human support, thanks!

    Hello,

    The “Alternate page with proper canonical tag” status typically appears when multiple URLs serve similar or duplicate content; Google simply excludes the duplicate versions and indexes the main (canonical) version of the page.

    This means that Google recognises these canonicalized URLs correctly, and there is nothing you need to do on your part: https://rankmath.com/kb/alternate-page-with-proper-canonical-tag/#num-1-what-does-alternate-page-with-proper-canonical-tag-mean

    Many of the given URLs have parameters like ?attribute_modele=Caution.

    To disallow URLs with the parameter ?attribute_modele=Caution in your robots.txt file, you can add the following line:

    
    Disallow: /*?attribute_modele=Caution
    

    Here’s a guide on how to edit your robots.txt using Rank Math:
    https://rankmath.com/kb/how-to-edit-robots-txt-with-rank-math/
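    In robots.txt only "*" and a trailing "$" act as wildcards; "?" is matched literally. So the rule above only covers the Caution value, while a shorter pattern such as "/*?attribute_modele=" would cover every value (but not URLs where the parameter follows "&" instead of "?"). The Python sketch below is a simplified, hypothetical helper, not Rank Math or Google code: it translates a pattern into a regular expression so you can check which URLs a rule would catch. It ignores percent-encoding and Allow/Disallow precedence, and "Boston" is a made-up second variation value.

```python
import re
from urllib.parse import urlsplit


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regular expression.

    '*' matches any run of characters, a trailing '$' anchors the end of the
    URL, and every other character (including '?') is matched literally,
    starting at the beginning of the path.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + ("$" if anchored else ""))


def path_of(url: str) -> str:
    """Return path + query string, the part of a URL that robots.txt rules see."""
    parts = urlsplit(url)
    return (parts.path or "/") + ("?" + parts.query if parts.query else "")


def matches(pattern: str, url: str) -> bool:
    # re.match anchors at the start of the path, like robots.txt prefix matching
    return pattern_to_regex(pattern).match(path_of(url)) is not None


example = ("https://www.laboutiqueducocktail.com/accessoire-cocktail/"
           "shaker-cocktail/shaker-cocktail-original/?attribute_modele=Caution")
other_value = example.replace("Caution", "Boston")  # hypothetical second variation

print(matches("/*?attribute_modele=Caution", example))      # True
print(matches("/*?attribute_modele=Caution", other_value))  # False
print(matches("/*?attribute_modele=", other_value))         # True
```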

    To figure out how Google is discovering those URLs, use the URL Inspection tool in your GSC account and check the referring page to see where they are coming from.

    We hope that helps, and please don’t hesitate to get in touch if you have any other questions.

    Thank you.

    Anonymous
    Rank Math free

    I’m not very satisfied with your answer. Did you take at least five minutes to read my message?

    – We can’t exclude these URLs in robots.txt because it blocks the Google Ads bots and cuts our ads off.
    – Isn’t it a real problem, since the crawler spends its time on wrong URLs?

    Hello,

    You can specify a user-agent so that only Googlebot is affected by the rule, not AdsBot.

    Here is the rule you can use:

    User-agent: Googlebot
    Disallow: /*?attribute_modele=
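    Keep in mind that each crawler follows only the one group that matches it: once a "User-agent: Googlebot" group exists, Googlebot stops reading the "User-agent: *" group, so any "*" rules it should still obey must be repeated there. The Python sketch below is a simplified, hypothetical helper (not Google's parser) that shows this group selection. It compares user-agent tokens exactly and case-insensitively, which simplifies Google's "most specific matching user agent" rule, and the ignores_wildcard flag models crawlers such as AdsBot-Google, which Google's crawler documentation describes as ignoring the global "*" group.

```python
from __future__ import annotations


def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lower-cased user-agent token to its (directive, pattern) rules.

    Consecutive User-agent lines share the rules that follow them; rules for an
    agent that appears in several groups are merged.
    """
    groups: dict[str, list[tuple[str, str]]] = {}
    agents: list[str] = []
    in_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                groups[agent].append((key, value))
    return groups


def rules_for(groups: dict[str, list[tuple[str, str]]], crawler: str,
              ignores_wildcard: bool = False) -> list[tuple[str, str]]:
    """Return the single group a crawler obeys: its own if named, else '*'."""
    token = crawler.lower()
    if token in groups:
        return groups[token]
    return [] if ignores_wildcard else groups.get("*", [])


robots = """\
User-agent: *
Disallow: /panier/

User-agent: Googlebot
Disallow: /*?attribute_modele=
"""
groups = parse_groups(robots)
print(rules_for(groups, "Googlebot"))                             # [('disallow', '/*?attribute_modele=')]
print(rules_for(groups, "AdsBot-Google", ignores_wildcard=True))  # []
print(rules_for(groups, "Bingbot"))                               # [('disallow', '/panier/')]
```

    In this example "/panier/" no longer applies to Googlebot, which is why the "*" rules would need to be copied into the Googlebot group as well.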

    If Google keeps wasting the crawl budget on incorrect URLs, then it can cause issues with the crawling of the correct URLs. You should use the robots.txt rule to prevent that.

    Hope that helps and please do not hesitate to let us know if you need our assistance with anything else.

    Anonymous
    Rank Math free

    Googlebot and AdsBot are the same.

    Anonymous
    Rank Math free

    Also, what is your solution for the problem that Tristan from Ingenius is also reporting?

    Look at this URL: https://www.satan-shop.com/veste-gothique-homme-victorienne-noire/

    Canonical: https://www.satan-shop.com/vetements-gothique/veste-gothique/veste-gothique-homme-victorienne-noire/

    How does the bot find the first URL, and why does it exist?

    Hello,

    The URLs mentioned by Tristan are product URLs without the category slug. This is default WordPress behavior: when the category slug is included in product URLs, the products can still be accessed without it. Our plugin adds the correct canonical URL to prevent Google from indexing the duplicate URLs.
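    To confirm which canonical URL a duplicate serves, you can read the link rel="canonical" tag from its HTML. The sketch below uses Python's standard html.parser module; page_html is a hypothetical stand-in for the HTML returned by the short product URL, and in practice you would fetch the live page instead.

```python
from __future__ import annotations

from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here via handle_startendtag
        if tag != "link":
            return
        attr_map = dict(attrs)
        rels = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rels and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def canonical_of(html: str) -> list[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals


# Hypothetical HTML for https://www.satan-shop.com/veste-gothique-homme-victorienne-noire/
page_html = """<html><head>
<link rel="canonical" href="https://www.satan-shop.com/vetements-gothique/veste-gothique/veste-gothique-homme-victorienne-noire/" />
</head><body></body></html>"""

print(canonical_of(page_html))
```

    If the list contains the category URL, Google is being pointed at the right version and the short URL is reported as an alternate page.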

    Your URLs are product variations. They are included in the Schema markup because Google requires them for Merchant Listing snippets, so Google may have discovered them there. You can also use the URL Inspection tool in your GSC account to confirm the referring page. https://search.google.com/test/rich-results/result/r%2Fmerchant-listings?id=qDtkMD31ET8ZDhovnc8bBg
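    One way to see this discovery path for yourself is to list the URLs inside a page’s JSON-LD blocks. The sketch below collects every "url" value from script type="application/ld+json" blocks and keeps those containing "?attribute_". The product_page markup is a hypothetical, heavily simplified Product/Offer example; the exact structure Rank Math outputs for variable products may differ.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and (dict(attrs).get("type") or "").lower() == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append rather than assign
        if self._inside:
            self.blocks[-1] += data


def find_urls(node, found):
    """Recursively collect every "url" string in a parsed JSON-LD structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "url" and isinstance(value, str):
                found.append(value)
            else:
                find_urls(value, found)
    elif isinstance(node, list):
        for item in node:
            find_urls(item, found)
    return found


def variation_urls(html, marker="?attribute_"):
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    urls = []
    for block in collector.blocks:
        find_urls(json.loads(block), urls)
    return [u for u in urls if marker in u]


# Hypothetical, simplified schema for a variable product
product_page = """<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Shaker cocktail original",
 "offers": [
   {"@type": "Offer", "url": "https://www.laboutiqueducocktail.com/accessoire-cocktail/shaker-cocktail/shaker-cocktail-original/?attribute_modele=Caution"}
 ]}
</script>"""

print(variation_urls(product_page))
```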

    Googlebot and AdsBot are different crawlers. However, if you don’t want to use the robots.txt solution, you can safely ignore these reports from Google.

    Hope that helps and please do not hesitate to let us know if you need our assistance with anything else.

    Anonymous
    Rank Math free

    Why should I ignore them? They are ruining my crawl budget.

    Anonymous
    Rank Math free

    User-agent: *
    Disallow: /wp-admin/
    Disallow: /wp-admin
    Disallow: /panier/
    Disallow: /contact
    Disallow: /contact/
    Disallow: /suivi-commande/
    Disallow: /livraison/
    Disallow: /politique-de-remboursement/
    Disallow: /support-client/
    Disallow: /mentions-legales/
    Disallow: /mon-compte/
    Disallow: /contact/
    Disallow: /cgv/
    Disallow: /search/
    Disallow: /*?add-to-cart=
    Disallow: /*author/
    Disallow: /non-classe/
    Disallow: /*?filter*
    Disallow: /*?min*
    Disallow: /*?product_cat*
    Disallow: /*?product_tag*
    Disallow: /*?source_id*
    Disallow: /*?orderby*
    Disallow: /*?product_order*
    Disallow: /*?product_count*
    Disallow: /*?page_size*
    Disallow: /?add-to-cart*
    Disallow: /uncategorized/
    Disallow: /*_gl
    Disallow: /*?attribute
    Allow: /*css?*
    Allow: /*js?*
    Allow: /wp-admin/admin-ajax.php

    User-agent: AdsBot-Google
    Disallow: /wp-admin/
    Disallow: /wp-admin
    Disallow: /panier/
    Disallow: /contact
    Disallow: /contact/
    Disallow: /suivi-commande/
    Disallow: /livraison/
    Disallow: /politique-de-remboursement/
    Disallow: /support-client/
    Disallow: /mentions-legales/
    Disallow: /mon-compte/
    Disallow: /contact/
    Disallow: /cgv/
    Disallow: /search/
    Disallow: /*?add-to-cart=
    Disallow: /*author/
    Disallow: /non-classe/
    Disallow: /*?filter*
    Disallow: /*?min*
    Disallow: /*?product_cat*
    Disallow: /*?product_tag*
    Disallow: /*?source_id*
    Disallow: /*?orderby*
    Disallow: /*?product_order*
    Disallow: /*?product_count*
    Disallow: /*?page_size*
    Disallow: /?add-to-cart*
    Disallow: /uncategorized/
    Allow: /*css?*
    Allow: /*js?*
    Allow: /wp-admin/admin-ajax.php
    Allow: /*_gl
    Allow: /*?attribute

    User-agent: Googlebot-Image
    Disallow: /wp-admin/
    Disallow: /wp-admin
    Disallow: /panier/
    Disallow: /contact
    Disallow: /contact/
    Disallow: /suivi-commande/
    Disallow: /livraison/
    Disallow: /politique-de-remboursement/
    Disallow: /support-client/
    Disallow: /mentions-legales/
    Disallow: /mon-compte/
    Disallow: /contact/
    Disallow: /cgv/
    Disallow: /search/
    Disallow: /*?add-to-cart=
    Disallow: /*author/
    Disallow: /non-classe/
    Disallow: /*?filter*
    Disallow: /*?min*
    Disallow: /*?product_cat*
    Disallow: /*?product_tag*
    Disallow: /*?source_id*
    Disallow: /*?orderby*
    Disallow: /*?product_order*
    Disallow: /*?product_count*
    Disallow: /*?page_size*
    Disallow: /?add-to-cart*
    Disallow: /uncategorized/
    Allow: /*css?*
    Allow: /*js?*
    Allow: /wp-admin/admin-ajax.php
    Allow: /*_gl
    Allow: /*?attribute
    Sitemap: https://www.satan-shop.com/sitemap_index.xml

    What do you think about that?

    Hello,

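    These rules block the type of pages you initially shared for crawlers that follow the "User-agent: *" group, such as Googlebot, while still allowing them for AdsBot-Google, which has its own group. This should help allocate more crawl budget to other pages on your website.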

    This comment is based solely on what was discussed in this thread, but it’s important to note that robots.txt rules are very specific to each website, so it’s always a good idea to test them with a tool such as this one: https://technicalseo.com/tools/robots-txt/
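    A rough local check along the same lines can be sketched in Python. This is a simplified, hypothetical evaluator, not Google’s parser: it picks a crawler’s group by exact user-agent token, falls back to "*" unless the crawler ignores it (Google’s crawler documentation lists AdsBot-Google as ignoring the global "*" group), then applies the longest matching rule, with Allow winning a tie, as RFC 9309 describes. The robots.txt string is a condensed excerpt of the file above, and "?attribute_taille=m" is a made-up variation URL.

```python
import re
from urllib.parse import urlsplit


def parse_groups(robots_txt: str) -> dict:
    """Map each lower-cased user-agent token to its list of (directive, pattern) rules."""
    groups: dict = {}
    agents: list = []
    in_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                groups[agent].append((key, value))
    return groups


def rule_matches(pattern: str, path: str) -> bool:
    """'*' matches any characters, a trailing '$' anchors the end; all else is literal."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.match(regex + ("$" if anchored else ""), path) is not None


def is_allowed(robots_txt: str, crawler: str, url: str, ignores_wildcard: bool = False) -> bool:
    groups = parse_groups(robots_txt)
    rules = groups.get(crawler.lower())
    if rules is None:  # no group names this crawler: fall back to '*'
        rules = [] if ignores_wildcard else groups.get("*", [])
    parts = urlsplit(url)
    path = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    best_len, allowed = -1, True  # no matching rule means the URL is allowed
    for directive, pattern in rules:
        if pattern and rule_matches(pattern, path):
            # The longest matching pattern wins; on a tie, Allow beats Disallow
            if len(pattern) > best_len or (len(pattern) == best_len and directive == "allow"):
                best_len, allowed = len(pattern), directive == "allow"
    return allowed


robots = """\
User-agent: *
Disallow: /panier/
Disallow: /*?attribute
Allow: /wp-admin/admin-ajax.php

User-agent: AdsBot-Google
Disallow: /panier/
Allow: /*?attribute
"""
canonical = "https://www.satan-shop.com/vetements-gothique/veste-gothique/veste-gothique-homme-victorienne-noire/"
variation = canonical + "?attribute_taille=m"  # hypothetical variation URL

print(is_allowed(robots, "Googlebot", variation))                             # False
print(is_allowed(robots, "AdsBot-Google", variation, ignores_wildcard=True))  # True
print(is_allowed(robots, "Googlebot", canonical))                             # True
```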

    Don’t hesitate to get in touch if you have any other questions.

    Hello,

    Since we have not heard back from you in 15 days, we assume that you found a solution, so we are closing this support ticket.

    If you still need assistance or any other help, please feel free to open a new support ticket, and we will be more than happy to assist.

    Thank you.


The ticket ‘Crawling of wrong URL problem’ is closed to new replies.