This article delves into the details of how we got more than 30,000 pages of a website indexed on Google in 10 days using the Google Indexing API.
We had, of course, worked on indexing throughout 2021. But the standard measures, such as internal linking to distribute page weight, adding SEO tags, and extra repeating linking blocks, did not bring the expected effect. Fortunately, we found a solution, which we share in this article.
This material will be useful for both large and small sites.
We use an online store as an example.
WHAT DO WE KNOW ABOUT INDEXING IN GOOGLE?
When it comes to indexing in Google, the first thing that comes to mind is the crawl budget. This concept covers many factors that affect indexing: internal linking, website size, how often Google’s robots crawl the site, properly configured titles, and much more. You can learn more about the concept of the “crawl budget” here.
So, at a minimum, the following objectives need to be met for the search engine to start indexing, and then ranking, the site:
- Quality content and its optimization (meta tags).
- Structured data (microdata markup).
- Correctly configured Last-Modified headers (a quick check is sketched after this list).
- Internal linking blocks, such as an SEO tile of tags in categories and on product pages.
- Stable website hosting.
- Commercial ranking factors (YMYL).
- Correct robots.txt and sitemaps.
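Several items on this list are easy to verify programmatically. For example, correctly configured Last-Modified headers can be checked by sending a conditional request and confirming that the server answers 304 Not Modified. Below is a minimal sketch, assuming the requests package and a placeholder URL:

```python
import requests

def check_last_modified(url: str) -> None:
    """Verify that a page returns Last-Modified and honors If-Modified-Since."""
    first = requests.get(url, timeout=30)
    last_modified = first.headers.get("Last-Modified")
    if not last_modified:
        print(f"{url}: no Last-Modified header")
        return
    # Repeat the request conditionally; a well-configured server answers 304
    second = requests.get(url, headers={"If-Modified-Since": last_modified}, timeout=30)
    status = "OK" if second.status_code == 304 else f"unexpected {second.status_code}"
    print(f"{url}: Last-Modified = {last_modified}, conditional request -> {status}")

check_last_modified("https://example.com/category/")  # placeholder URL
```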
After completing all the work on this list in mid-2021 (around June-July), we waited patiently for positive results in Google search.
Unfortunately, we did not save a screenshot reflecting how the website was indexed on Google throughout 2021. However, Figure 2 makes it obvious that all the completed work did not affect the number of indexed pages in any way: we stayed on a stable plateau.
Yes, Figure 2 shows about 10,000 pages in the index, but the total website size is 70,000 URLs. Around mid-September (1-2 months after implementing all of the measures), we began looking for new ways to get the website indexed.
GOOGLE INDEXING API: UNDERSTANDING WHAT IT IS AND HOW TO USE IT
The Google Indexing API is a tool that lets you notify Google about new and updated landing pages, as well as request the removal of old, unnecessary junk URLs. The limit for sending data to Google is 200 links per day.
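A single notification is just an authorized POST to the API’s publish endpoint. Here is a minimal sketch (the key file name and test URL are placeholders; it assumes the google-auth and requests packages and a service-account JSON key downloaded from Google Cloud):

```python
import json
from google.oauth2 import service_account
from google.auth.transport.requests import AuthorizedSession

SCOPES = ["https://www.googleapis.com/auth/indexing"]
ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

# Service-account key downloaded from Google Cloud (placeholder path)
credentials = service_account.Credentials.from_service_account_file(
    "service_account.json", scopes=SCOPES
)
session = AuthorizedSession(credentials)

# Tell Google that a URL was added or updated (use "URL_DELETED" to remove it)
payload = {"url": "https://example.com/category/product-1/", "type": "URL_UPDATED"}
response = session.post(ENDPOINT, json=payload)
print(response.status_code, json.dumps(response.json(), indent=2))
```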
In early September, before my vacation, I was sent a link to a post about a Google indexer written in Python. I added the post to my to-do list, which turned out to be my first significant mistake. Still, without digging deeper into the topic, we drew up a task for a programmer to create an indexer. Its significant drawback, however, is the limit of 200 requests per day.
More than 60,000 links needed to be indexed. Divide that number by 200 requests per day and you get 300 days, almost a year minus two months.
PREPARATORY WORK
Before moving on to the Google indexer itself, we needed to determine which links were already indexed and which were not. To solve this problem, SEO specialists can turn to ready-made services and programs, as well as custom solutions. In our case, we used a parser in conjunction with Xevil and a mobile proxy farm.
First off, I’ll explain what these tools are and what they are responsible for:
- A-Parser sends requests to the actual Google search results, processes the responses, and saves all the data. In other words, it is a flexible parser that can be customized for almost any purpose.
- Xevil is a component of XRumer (a tool for link runs on forums and profiles) that solves captchas.
- The mobile proxy farm consists of modems with SIM cards through which the requests to the search results were routed, so as to trigger as few captchas as possible.
At this stage, we got a table listing the links that still need to be indexed (those where the number of pages in the index is 0). An example is shown in Figure 3.
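Purely as an illustration, here is a minimal sketch of the check itself: a site: query against Google search and a look at whether any results come back. In practice, plain requests like this quickly run into captchas and rate limits, which is exactly why A-Parser, Xevil, and mobile proxies were used; the URL list and output file name below are placeholders.

```python
import csv
import time
import requests

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

def looks_indexed(url: str) -> bool:
    """Run a site: query and guess whether Google returns any results.

    Naive illustration only: real checks need proxies and captcha solving,
    otherwise Google starts blocking after a handful of requests.
    """
    resp = requests.get(
        "https://www.google.com/search",
        params={"q": f"site:{url}"},
        headers=HEADERS,
        timeout=30,
    )
    if resp.status_code != 200:  # 429 / captcha page, etc.
        raise RuntimeError(f"Blocked or failed: HTTP {resp.status_code}")
    return "did not match any documents" not in resp.text

urls = ["https://example.com/category/product-1/"]  # placeholder list
with open("index_status.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["url", "indexed"])
    for u in urls:
        writer.writerow([u, looks_indexed(u)])
        time.sleep(5)  # be gentle; this still will not survive large batches
```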
ANALYTICS
For a detailed analysis of the indexer (Google Indexing API), you need to set up analytics for the following items:
- Which links were sent, and when;
- Parsing and visualization of server logs.
TRACKING SENT LINKS
Tracking sent links is needed to record the dates the URLs were submitted to the indexer so that they can be matched against the dates Googlebot crawled them. For this purpose, we prepared a dashboard in Google Data Studio. Its appearance is shown in Figure 4.
A comment on Figure 4: we intended to send 200 links at a time, but thanks to the graph we realized that we were sending duplicates (400 requests at a time). Google still did not process more than 200 of them. The bug was fixed closer to the beginning of January.
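A minimal sketch of how such tracking can be wired in: every submitted URL is appended to a CSV with the date and the account it was sent from, and that file (or a Google Sheet built from it) feeds the Data Studio report. The file and field names here are assumptions, not our exact setup. Keeping the raw log also makes it easy to spot duplicate submissions like the ones mentioned above (just group by date and URL).

```python
import csv
from datetime import date
from pathlib import Path

LOG_FILE = Path("sent_urls.csv")  # placeholder name for the Data Studio source

def log_sent(urls, account, status_codes):
    """Append one row per submitted URL: date, account, URL, API response code."""
    new_file = not LOG_FILE.exists()
    with LOG_FILE.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["date", "account", "url", "status"])
        for url, status in zip(urls, status_codes):
            writer.writerow([date.today().isoformat(), account, url, status])

# Example: record a batch sent from one service account
log_sent(["https://example.com/page-1/"], "account-01", [200])
```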
PARSING AND VISUALIZATION OF SERVER LOGS
The general dashboard in Google Data Studio consists of 4 tabs:
- Indexing API – discussed above;
- Google and Yandex bots – for analyzing website traffic from Google’s robots;
- URL filter – for entering the submitted links and tracking robot visits to them;
- Not bots – a classic dashboard for analyzing all server response codes, for example 4xx, 5xx, and others that are critical to us.
The analytics had only one drawback: I finished the parsing and visualization of the server logs only on January 18, 2022, while the indexer had been sending 200 links a day since the beginning of November 2021. As a result, it was impossible to evaluate Googlebot traffic to the site from the logs before that date. We used Google Search Console as an alternative source of Googlebot crawl statistics.
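For reference, here is a minimal sketch of the kind of log filtering behind the Google and Yandex bots tab: it reads an access log in the common combined format, keeps only lines whose user agent mentions Googlebot, and counts hits per day. The log path and format are assumptions about a typical nginx/Apache setup, not our exact parser.

```python
import re
from collections import Counter

# Combined log format: IP - - [day/Mon/year:time] "request" status size "referer" "user-agent"
LINE_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}):[^\]]+\].*?"[^"]*"\s+\d{3}\s+\S+\s+"[^"]*"\s+"([^"]*)"')

def googlebot_hits_per_day(log_path: str) -> Counter:
    """Count Googlebot requests per day from an access log in combined format."""
    hits = Counter()
    with open(log_path, encoding="utf-8", errors="ignore") as f:
        for line in f:
            match = LINE_RE.search(line)
            if match and "Googlebot" in match.group(2):
                hits[match.group(1)] += 1
    return hits

# Example: print daily Googlebot traffic for a placeholder log file
for day, count in sorted(googlebot_hits_per_day("access.log").items()):
    print(day, count)
```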
An example of the ‘Not bots’ dashboard in Google Data Studio is shown in Figure 5.
INDEXER AND BYPASSING GOOGLE LIMITATIONS
At this stage, the task was to bypass the 200-request limit and send as much data to the Google indexer per day as possible. The idea was to register another Google account and add its JSON keys to the Python script. So we moved on to the following tasks:
1. Registering a new Google account.
2. Writing a new Python script that cycles through all the registered Google accounts.
Let’s take a closer look at the second point. We needed to check whether data could be sent to the indexer from different Google accounts. For this reason, we took code from GitHub as a basis and reshaped it for our needs. The modified code is available here (I am not a programmer, so there are a lot of workarounds).
For this code to work, create a “json_keys” folder in the script’s root directory and place into it all of the JSON key files that Google gave you (you will find the final algorithm of actions at the end of the article). If you have questions about running the script, be sure to watch our practical video, which was specially recorded as an addition to this study of the Google Indexing API.
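The core idea can be sketched roughly as follows: loop over every key file in json_keys, authorize with it, and push up to 200 URLs per account to the publish endpoint. This is a simplified sketch of that idea, not the modified script linked above; the json_keys folder matches the setup described here, while the urls.txt file is a placeholder.

```python
from pathlib import Path

from google.oauth2 import service_account
from google.auth.transport.requests import AuthorizedSession

SCOPES = ["https://www.googleapis.com/auth/indexing"]
ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"
DAILY_QUOTA = 200  # default per-project limit

urls = Path("urls.txt").read_text().split()            # placeholder URL list, one per line
key_files = sorted(Path("json_keys").glob("*.json"))   # one key per Google account

sent = 0
for key_file in key_files:
    credentials = service_account.Credentials.from_service_account_file(
        str(key_file), scopes=SCOPES
    )
    session = AuthorizedSession(credentials)
    # Each service account gets its own slice of up to 200 URLs
    batch, urls = urls[:DAILY_QUOTA], urls[DAILY_QUOTA:]
    for url in batch:
        resp = session.post(ENDPOINT, json={"url": url, "type": "URL_UPDATED"})
        if resp.status_code == 429:
            # Quota for this key is exhausted; in this sketch the rest of the batch is dropped
            print(f"{key_file.name}: quota exceeded, moving to the next key")
            break
        sent += 1
    if not urls:
        break

print(f"Submitted {sent} URLs using {len(key_files)} key files")
```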
After writing the script and registering two test Google accounts, I sent the data to the API. It was amazing to see the limit of 200 links grow to 400. I urgently gave our content manager the task of registering more Google accounts. The next day, instead of 400 links, we managed to send more than 4,000. A few days later, we had 32 Google accounts at our disposal and started submitting 6,000+ links per day to the indexer. For clarity, I have attached a screenshot from Google Data Studio of the links sent to the indexer.
How did Search Console react? The crawl statistics are shown in Figure 7.
In addition, here is a screenshot from the server log parser (Figure 8), which correlates closely with that data.
An explanation of Figures 7 and 8: from January 18 to 23, we sent a test batch of about 35,000 links to the Google Indexing API and then took a break to think and process the data. As it turned out, Googlebot stopped visiting the site as often once we stopped sending data (the dip on January 24-25 in Figures 7 and 8).
You will find the final algorithm of actions at the end of the article. In the meantime, let’s move on to the results, although you have already seen some of them in the screenshots above.
RESULTS
We expected an increase in the number of links in the index, and surprisingly, most of the first (test) batch got indexed. The number of indexed links grew significantly in 10 days, from 23,000 to 47,000 (January 19 to 29, as shown in Figure 9).
Now, let’s talk about the obvious. We use the SEOWORK platform for analytics and for evaluating the effectiveness of our work. In particular, by tracking a large pool of keywords, we recorded an increase in visibility on Google (Figure 10).
FINAL ALGORITHM OF ACTIONS
1. Register a new Google account.
2. Click here: https://console.cloud.google.com/apis/enableflow?apiid=indexing.googleapis.com&credential=client_key&hl=ru&pli=1 and activate the API.
3. Click here: https://console.cloud.google.com/projectselector2/iam-admin/serviceaccounts?supportedpurview=project and click on ‘Create Project’.
4. Create a service account.
5. Fill in all the data, select the ‘Owner’ role, and click Done.
6. Click on the created service account (its email address).
7. Go to the Keys section and add a new key. Select the JSON format and save the key file to the script’s json_keys folder.
8. In Google Search Console, go to ‘Settings’, then ‘Users and Permissions’, and add the service email of the created account as a user.
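Once a key is set up, it is worth verifying that it actually works before putting it into the rotation. One way, sketched below, is to request notification metadata for a test URL through the Indexing API: a 200 or 404 response means the key is authorized, while a 403 usually means the service account was not added in Search Console. The key file path and URL here are placeholders.

```python
from urllib.parse import quote

from google.oauth2 import service_account
from google.auth.transport.requests import AuthorizedSession

SCOPES = ["https://www.googleapis.com/auth/indexing"]
METADATA = "https://indexing.googleapis.com/v3/urlNotifications/metadata?url={}"

# Placeholder key file and URL: use a real key from json_keys and a page on your site
credentials = service_account.Credentials.from_service_account_file(
    "json_keys/account-01.json", scopes=SCOPES
)
session = AuthorizedSession(credentials)
resp = session.get(METADATA.format(quote("https://example.com/", safe="")))

if resp.status_code in (200, 404):   # 404 simply means no notification was sent yet
    print("Key is valid and the service account has access")
elif resp.status_code == 403:
    print("Forbidden: add the service account email in Search Console (step 8)")
else:
    print("Unexpected response:", resp.status_code, resp.text)
```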
We hope this method will also help those who have had problems with indexing on Google and have not previously used the Google Indexing API. We will be happy to answer any questions you may have.
Author: Dmitry Fedoseev (Traffic-Hunters.com).