Google on how it handles search incident disclosure

Google’s latest podcast, Search Off The Record, discusses examples of disruptive incidents that can impact crawling and indexing, and the criteria used to decide whether or not to disclose details of the incident.

Deciding what to disclose is complicated by the fact that SEOs and publishers sometimes report that search is not working properly even though, from Google’s perspective, it is working as intended.

Google Search has high availability

The interesting part of the podcast started with the observation that Google Search (the homepage with the search box) itself has an “extremely” high level of availability and rarely goes down or becomes unreachable. Most of the reported problems were due to network routing issues on the Internet itself rather than errors within Google’s infrastructure.

Gary Illyes commented:

“Yeah. The service that hosts the homepage is the same one that hosts the status dashboard, the Google Search Status Dashboard, and it has an insane uptime number. …the number is like 99.999, whatever.”

John Mueller jokingly answered with the German word “nein” (“no”), which is pronounced like the number nine:

“Nein. It’s never down. Nein.”

The Googlers acknowledged that the rest of Google Search, on the backend, does experience outages, and they explained how those are handled.

Crawling and indexing incidents at Google

Google’s ability to crawl and index web pages is critical for SEO and revenue, and disruptions can be disastrous, especially for time-sensitive content like announcements, news, and sales events (to name a few).

Gary Illyes explained that at Google, there is a team called Site Reliability Engineering (SRE) that is responsible for keeping the public-facing systems running smoothly. There is an entire Google subdomain dedicated to site reliability, which explains that they approach the task of keeping systems up and running in a similar way to how they approach software issues. They monitor services like Google Search, Ads, Gmail, and YouTube.

The SRE page explains the complexity of their mission as follows: it ranges from very small-scale troubleshooting (fixing individual problems) to fixing large-scale problems that impact “continental-scale service capacity” for billions of users.

Gary Illyes explained (at minute 3:18):

“Site Reliability Engineering org publishes a manual on how they handle incidents. And many of the incidents are detected because they are problems with some system. They are detected with automated processes, which means, for example, there are auditors or there are certain rules set for monitoring software that checks numbers.

And when the number exceeds a certain value, an alarm is triggered, which is then recorded by software such as incident management software.”

Indexing problem in February 2024

Next, Gary explains that the February 2024 indexing issue is an example of how Google monitors and responds to incidents that could impact users’ searches. Part of the response is figuring out whether it’s an actual problem or a false positive.

He explains:

“That’s what happened on February 1st too. Basically, some numbers got mixed up and then an incident was automatically triggered internally. Then we have to decide if it’s a false positive or if it’s something we really need to investigate, like we, the SRE people.

And in this case, they decided it was a valid issue. And then they moved the priority of the incident up a notch from whatever it was.

I think it was a minor incident at first and then they upgraded it to medium. And then when it gets to medium, it goes into our inbox. So we have a threshold for medium or higher. Yes.”
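The flow Gary describes (a monitored number crosses a limit, an incident is opened automatically, someone decides whether it is a false positive, and only incidents at or above a certain priority reach the Search Relations inbox) can be pictured with a short Python sketch. This is an illustration only, not Google’s actual tooling: the names `Severity`, `Incident`, `check_metric`, `triage`, and `reaches_inbox`, the metric name, and the threshold value are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    # Hypothetical priority levels; the podcast only mentions "minor" and a higher level.
    MINOR = 1
    MEDIUM = 2
    MAJOR = 3


@dataclass
class Incident:
    metric: str
    value: float
    severity: Severity = Severity.MINOR


ALERT_THRESHOLD = 100.0      # assumed limit for the monitored number
NOTIFY_AT = Severity.MEDIUM  # assumed level at which the incident reaches the inbox


def check_metric(metric: str, value: float) -> Optional[Incident]:
    """Open a minor incident automatically when the number exceeds the limit."""
    if value > ALERT_THRESHOLD:
        return Incident(metric, value)
    return None


def triage(incident: Incident, is_valid: bool, affects_users: bool) -> Optional[Incident]:
    """Discard false positives; raise the priority one notch if users are affected."""
    if not is_valid:
        return None
    if affects_users and incident.severity < Severity.MAJOR:
        incident.severity = Severity(incident.severity + 1)
    return incident


def reaches_inbox(incident: Incident) -> bool:
    """Only incidents at or above the notification level are forwarded."""
    return incident.severity >= NOTIFY_AT
```

In this sketch, an incident opened by `check_metric("indexing_lag", 150.0)` starts as minor and is not forwarded; after `triage(...)` confirms it is valid and affects users, it is raised one level and `reaches_inbox` returns `True`.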

Minor incidents are not disclosed publicly

Gary Illyes then explained that they don’t communicate every small incident because most of the time users won’t even notice it. The most important consideration is whether the incident affects users; if it does, the incident is automatically given a higher priority.

Gary said he doesn’t work in SRE, so he couldn’t comment on exactly how many users need to be affected before Google makes a public announcement.

Gary explained:

“SRE would investigate everything. For example, if they get a probability warning or a warning based on any numbers, they will look into it and try to explain it themselves.

And if it’s something that affects users, then that almost automatically means they have to raise the priority because users are actually affected.”

Incident with missing pictures

Gary described another incident, this time involving images not being shown to users. It was decided that although the user experience was degraded, it was not degraded enough to make Google unusable or to stop users from finding what they were looking for. So what triggers a priority escalation is not just whether users are affected by an incident, but also how badly their experience is affected.

In the case of the images not showing, it was decided not to make a public statement because users would still be able to find the information they needed. Although Gary didn’t mention it, it sounds like an issue recipe bloggers have had in the past when images stopped showing.

He explained:

“For example, there was an incident recently where some images were missing. If I remember correctly, I intervened and said, ‘This is stupid and we shouldn’t publicize this because the impact on users is actually not bad,’ right? Users literally just aren’t getting the images. It’s not like anything is broken. They just won’t see certain images on the search results pages.

And for me, it’s just, well, back to 1990 or back to 2008 or something. It’s like it’s still usable and everything’s still fine except for some pictures.”

Are publishers and SEOs taken into account?

Google’s John Mueller asked Gary if the hurdle for a public announcement was whether the user experience was compromised or whether the experience of publishers and SEOs was also taken into account.

Gary replied (at about the 8-minute mark):

“So from a search perspective, it’s Search Relations, not Site Owner Relations.

But in a broader sense, like the website owners, they would also care about their users. So if we care about their users, they’re the same group of people, right? Or is that too positive?”

Gary apparently sees his role on the Search Relations team primarily in terms of search users in general. This may surprise many in the SEO community, as Google’s own documentation for the Search Off The Record podcast explains the role of the Search Relations team differently:

“As the Search Relations team at Google, we are here to help website owners make their websites successful in Google Search.”

Listening to the entire podcast, it’s clear that Google employees John Mueller and Lizzi Sassman place a high value on engaging with the search community, so perhaps there’s a language issue that’s causing his remark to be interpreted differently than he intended?

What does Search Relations mean?

Google explained that it has a process for deciding what to disclose about disruptions to search, and that is a 100% reasonable approach. But it is important to remember that “relations,” by definition, means a connection between two or more people.

Search is a relationship. It is an ecosystem with two partners: creators (SEOs and website owners), who produce the content, and Google, which makes it available to its users.

Featured image from Shutterstock/Khosro