-
Hi, after connecting our website addupp.nl to Rank Math, a huge number of pages have been submitted to Google Search Console:
1.32 million at the moment.
We are checking the site for malware but it seems clean.
Can you help identify why this is happening, and share any resources with us on the subject?
Thanks for your help,
Deshna (on behalf of Milo)
-
Hello,
We appreciate your patience and we’re sorry for any trouble you may have faced with your sitemap.
You’re right, 1.3 million is extraordinarily high. To help us troubleshoot this issue, can you please do the following:
- Share your website URL with us so that we can check your sitemap and see how many URLs it actually contains.
- Share a screenshot of where you saw the 1.3 million number on Google Search Console. This will help us understand what kind of error Google is finding on your sitemap and how to fix it.
You can use a service like https://imgur.com/upload to share your screenshot with us.
Please let us know if you have any questions or issues during this process. We’re here to assist you.
Thank you for using Rank Math as your SEO solution.
Thanks for your reply.
This is the URL of the website: https://addupp.nl
And here is a screenshot from Google Search Console:
https://imgur.com/a/YucgN6h
What do you make of this?
Hello,
I’ve checked your sitemap, and there are barely 25 URLs included that should be indexed.
In your screenshot, please click the error with the highest number of pages reported, such as "Crawled - currently not indexed", so that the affected URLs are displayed. Once done, take a screenshot of the URLs and share it here.
Alternatively, you can record a video screencast using a tool like Loom showing the affected URLs and add the link here.
Looking forward to helping you.
Thank you.
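The sitemap check mentioned in this reply can be reproduced roughly as follows. This is a minimal sketch and not the tool support actually used: it assumes the sitemap follows the standard sitemaps.org XML schema, and the sample XML, the example.com URLs, and the function name list_locs are placeholders for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap and sitemap index files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def list_locs(sitemap_xml: str) -> list[str]:
    """Return every <loc> value found in a sitemap or sitemap index document."""
    root = ET.fromstring(sitemap_xml)
    return [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]


# Placeholder sitemap content; a real check would read the downloaded file instead.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/contact/</loc></url>
</urlset>"""

locs = list_locs(sample)
print(len(locs), "URLs listed in the sitemap")
```

If the count here is small while Search Console reports far more pages, the extra URLs are being discovered somewhere other than the sitemap.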
Hi Reinelle,
Here is a screenshot with a few example pages:
https://imgur.com/a/JOokWX8
What I would like to know:
When Google has indexed the new sitemap, will all the other pages be disregarded or deleted from the search console?
Thanks, Deshna
Hello,
Thank you for getting back to us.
It looks like you have shared the same image you shared previously. To help us troubleshoot this issue, can you please do the following:
- Go to Google Search Console and click on the Crawled – currently not indexed error message that you’re seeing for your sitemap (https://i.rankmath.com/i/2w0gtQ). This will show you more details about the error, such as the affected URLs.
- Take a screenshot of this page showing the affected URLs and share it with us.
- You can also use Loom.com to record a video for us.
Thank you for choosing Rank Math!
Hi Great,
I’m sorry for sending the wrong link.
Here is a new one: https://imgur.com/a/wRPr5xh
I appreciate you taking your time to look into this.
What I would like to know:
Where could the pages come from? The site has no malware.
When Google has indexed the new sitemap, will all the other pages be disregarded or deleted from the search console?
Thanks, Deshna
Hello,
Thank you for providing the correct image.
Well, this makes sense now. The URLs are simply query URLs, as far as we can tell – but the fact that there are over a million of them is weird. If you’re sure your site is malware-free, then there’s a good chance one of your plugins or theme is creating these URLs and Google is discovering and crawling them – taking precious crawl budget away from your relevant pages.
As you can see from your sitemap, Rank Math does not include these query URLs. Google is discovering them from “Google Systems” (https://i.rankmath.com/i/HFnze1), which nobody can really control.
But there are a few ways to move forward:
- Find what is generating these URLs and fix it: This solves the problem directly. It could be a plugin, the theme, or something on the server side. Find the root problem and fix it from there. Your host is a good place to ask for help with this.
- You can exclude these URLs using your robots.txt: This solution only cures the symptom, not the problem. Whatever is generating the query URLs will still be alive and well, but Google will stop crawling the URLs. Also, these will likely not remove the URLs from your Search Console but since they’re not being crawled they would be moved to “Excluded by robots.txt”. They also won’t eat up your crawl budget – which is a good thing, since there are over a million of them.
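For the first option, one way to narrow down the source is to group the affected URLs by the query parameter names they carry, since one plugin or script tends to reuse the same parameters. The following is a minimal sketch under that assumption. The URLs and parameter names are invented for illustration; in practice the input would be the list of affected URLs exported from Search Console.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def query_signature(url: str) -> tuple[str, ...]:
    """Return the sorted, de-duplicated query parameter names of a URL."""
    query = urlsplit(url).query
    names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
    return tuple(sorted(names))


def summarize(urls):
    """Count URLs per parameter-name signature, most common first."""
    counts = Counter()
    for url in urls:
        signature = query_signature(url)
        if signature:  # skip URLs that have no query string
            counts[signature] += 1
    return counts.most_common()


# Hypothetical sample; replace with the exported list of affected URLs.
example_urls = [
    "https://example.com/?filter=a&page=2",
    "https://example.com/shop/?page=9&filter=b",
    "https://example.com/?s=shoes",
    "https://example.com/about/",
]

for signature, count in summarize(example_urls):
    print(count, "&".join(signature))
```

A signature that accounts for most of the URLs is a good starting point when asking the host or a developer which component emits those parameters.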
To exclude query URLs using robots.txt, please navigate to Rank Math > General Settings > Edit robots.txt and add the following line to the disallow list:
Disallow: /*?*
The complete file should look like this (you can copy and paste this):
User-agent: *
Disallow: /wp-admin/
Disallow: /*?*
Allow: /wp-admin/admin-ajax.php
Sitemap: https://addupp.nl/sitemap_index.xml
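To see which paths these rules would block before relying on them, the matching can be approximated locally. The sketch below is a simplified model and not Google's actual parser. It assumes the wildcard behaviour Google documents: * matches any sequence of characters, a trailing $ anchors the end, the longest matching rule wins, and Allow wins a tie. Python's built-in urllib.robotparser is not used here because, to our knowledge, it does not interpret wildcards. The helper names are hypothetical.

```python
import re

# The rules from the robots.txt above, as (kind, pattern) pairs.
RULES = [
    ("disallow", "/wp-admin/"),
    ("disallow", "/*?*"),
    ("allow", "/wp-admin/admin-ajax.php"),
]


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into an anchored regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal parts and let each * match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path_and_query: str) -> bool:
    """Apply longest-match precedence; no matching rule means allowed."""
    best_kind, best_pattern = "allow", ""
    for kind, pattern in RULES:
        if rule_to_regex(pattern).match(path_and_query):
            longer = len(pattern) > len(best_pattern)
            tie_for_allow = len(pattern) == len(best_pattern) and kind == "allow"
            if longer or tie_for_allow:
                best_kind, best_pattern = kind, pattern
    return best_kind == "allow"


for candidate in ["/contact/", "/?s=spam", "/wp-admin/admin-ajax.php"]:
    print(candidate, "allowed" if is_allowed(candidate) else "blocked")
```

Under these assumptions every URL containing a query string is blocked, including legitimate ones such as search or filter pages, so it is worth checking a few real URLs from the site before saving the file.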
After following either of the two options, you can try removing your sitemap and adding it back so Google can visit again.
We hope this helps you understand and hopefully resolve the issue. Let us know if there’s any way we could be of help.
Thank you for choosing Rank Math!
Hello again,
Looking into this further, it does look like your site was hit by the Japanese keyword hack, which typically creates new pages with autogenerated Japanese text in randomly generated directory names on your site.
We’d suggest you reach out to your hosting company or dev team for help with fixing this. You can also learn more about this hack here: https://web.dev/fixing-the-japanese-keyword-hack/
We hope this helps you get to the root of the issue. Let us know if you need help with anything else.
Thank you for choosing Rank Math.
Thank you so much!
We will look into this.
Best, Deshna
You’re very welcome, Deshna. We’re glad we could help you identify the issue. Please let us know how it goes and if you need any further assistance. We’re always here to support you.
Thank you for choosing Rank Math!
Hello,
Since we did not hear back from you for 15 days, we are assuming that you found the solution. We are closing this support ticket.
If you still need assistance or any other help, please feel free to open a new support ticket, and we will be more than happy to assist.
Thank you.
The ticket ‘Huge number of pages in Search Colsole’ is closed to new replies.