How Does Google Decide Upon Autocomplete Query Suggestions?
When Google suggests queries based on what is being typed in a search box, how does it come up with those suggestions?
Google was granted an updated, or continuation, patent this week at the US Patent Office, covering the process by which it autocompletes query suggestions. The original version of the patent, Autocompletion using previously submitted query data, was granted on January 6th, 2015.
As a continuation patent, it takes the filing date of the initial version of the patent, and only updates the claims section. So, the claims in the newer version differ from those in the older version, and comparing the two is interesting because they differ in important ways. How do these autocomplete query suggestions work at Google?
The patent description tells us that it involves some mind-reading to give a searcher what they want:
Internet search engines aim to identify documents or other items that are relevant to a user’s needs and to present the documents or items in a manner that is most useful to the user. Such activity often involves a fair amount of mind-reading–inferring from various clues what the user wants. Certain clues may be user-specific. For example, the knowledge that a user is requesting a mobile device, and knowledge of the location of the device, can result in much better search results for such a user.
Google appears to have changed the documents that it will consider in these query predictions to make what may seem to be better guesses. That can be seen by comparing the first claim from the earlier version of the patent with the first claim from the continuation patent.
The Older First Claim
1. A computer-implemented method for processing query information, comprising: receiving query information at a server system, wherein the query information includes a portion of a query from a search requestor, the query information being received prior to receiving data indicating that the search requestor has completed the query and the portion of the query from the search requestor being only a portion of a final query; obtaining a set of predicted queries relevant to the portion of the query from the search requestor based upon the portion of the query from the search requestor and data indicative of search requestor behavior relative to previously submitted queries, wherein the set of predicted queries includes two or more predicted queries, and each predicted query is a prediction of a possible final query of the search requestor and wherein each predicted query includes the portion of the query and is different from each other query; ranking the predicted queries in the set of predicted queries according to a ranking criteria; providing the ranked set of predicted queries for display to the search requestor; determining whether an input is received from the search requestor selecting a predicted query, of the ranked set of predicted queries displayed to the search requestor, within a specified time; in response to a determination that the input from the search requestor selecting a displayed predicted query is not received within the specified time: obtaining a subsequent ranked set of predicted queries for the portion of the query from the search requestor, the predicted queries in the subsequent ranked set of predicted queries being ranked according to different criteria than the predicted queries in the ranked set of predicted queries; and providing the subsequent ranked set of predicted queries for display to the search requestor in response to receiving the query information.
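The flow in that older claim can be sketched roughly as follows. This is only an illustration under my own assumptions, not anything taken from the patent: the query statistics, the two ranking criteria (frequency, then recency), and every function name are hypothetical, the timeout is simulated with a parameter, and prefix matching stands in for the claim's broader "includes the portion of the query."

```python
# Hypothetical per-query statistics; neither the names nor the numbers come from the patent.
QUERY_STATS = {
    "patent search": {"count": 120, "last_seen_days": 9},
    "patent lawyer": {"count": 80, "last_seen_days": 2},
    "patent pending": {"count": 60, "last_seen_days": 5},
    "pasta recipes": {"count": 200, "last_seen_days": 1},
}


def predicted_queries(partial, sort_key):
    """Return logged queries that start with the partial query, ordered by sort_key."""
    matches = [q for q in QUERY_STATS if q.startswith(partial)]
    return sorted(matches, key=sort_key)


def by_frequency(query):
    # Assumed first ranking criterion: most frequently submitted first.
    return -QUERY_STATS[query]["count"]


def by_recency(query):
    # Assumed second, different criterion: most recently submitted first.
    return QUERY_STATS[query]["last_seen_days"]


def suggestion_rounds(partial, selected_in_time):
    """Return the lists of suggestions shown to the searcher, in order.

    selected_in_time stands in for the timeout check: it is the query the
    searcher picked within the specified time, or None if nothing was picked.
    """
    rounds = [predicted_queries(partial, by_frequency)]
    if selected_in_time is None:
        # No selection in time: show a second set ranked by different criteria.
        rounds.append(predicted_queries(partial, by_recency))
    return rounds
```

Calling suggestion_rounds("patent", None) produces two differently ordered lists, while passing a selected query produces only the first one.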
The Newer First Claim
The predicted query suggestions shown as autocomplete query suggestions in the new claims are based on a searcher’s previous searches and documents that they may have looked at and interacted with previously. That is a change from the first version of the patent. This reminded me of another continuation patent I looked at from Google, which I blogged about in a post that I called Personalizing Search Results at Google. That one told us that it might personalize search results by selecting them from the union of two different sets of documents. One of those sets is a set of “high-quality sites” and the other set includes documents that are considered “bias documents” or pages that may have shown up in a person’s search or query history. They may have visited those pages before, or seen them in a set of search results and not clicked through to them.
So, autocomplete query suggestions may end up returning documents that a person may have seen before (a positive bias, maybe), or interacted with in some way such as not selecting them from search results (more of a negative bias). In the claim, the part that talks about previous documents is the clause that begins “wherein at least one of the first criteria or the second criteria.”
1. A method performed by data processing apparatus, the method comprising: receiving, from a user device of a user, query data specifying a portion of a query entered by the user; selecting, based on the portion of the query and first criteria different from query text entered by the user, a first set of predicted queries that each predict a respective final query for the portion of the query; providing, to the user device, data that cause presentation of the first set of predicted queries at the user device; receiving, from the user device, a user request for additional predicted queries, wherein the user request is sent by the user device in response to user-initiated activity; in response to receiving the user request for additional predicted queries, selecting, based on the portion of the query and second criteria that is (i) different from the first criteria and (ii) different from query text entered by the user, a second set of predicted queries that each predict a respective final query for the portion of the query, wherein at least one of the first criteria or the second criteria is based upon a behavior of the user relative to documents provided to the user in response to previous queries received from the user; determining that the second set of predicted queries includes a given predicted query that is included in the first set of predicted queries; removing the given predicted query from the second set of predicted queries; and providing, to the user device, data that cause presentation of the second set of predicted queries at the user device.
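Here is a small sketch of the steps in that newer claim: a first set of predictions, a second set chosen by different criteria when the searcher asks for more, and removal of anything already shown. It is a simplification under assumptions of my own. The popularity scores, the per-user "document behavior" scores, and the function names are all hypothetical, and the patent does not say how either criterion is actually computed.

```python
def select(partial, scores, limit=None):
    """Pick queries that start with the partial query, highest score first."""
    candidates = [q for q in scores if q.startswith(partial)]
    candidates.sort(key=lambda q: scores[q], reverse=True)
    return candidates[:limit]


def first_set(partial, popularity, limit=2):
    # Assumed first criteria: overall popularity of each query.
    return select(partial, popularity, limit)


def second_set(partial, user_doc_behavior, already_shown, limit=2):
    # Assumed second criteria: how this user behaved toward documents
    # returned for their own previous queries (for example, clicks).
    picked = select(partial, user_doc_behavior)
    # Remove any prediction that was already in the first set.
    return [q for q in picked if q not in already_shown][:limit]


# Hypothetical data for illustration only.
POPULARITY = {"seo tools": 90, "seo tips": 70, "seo by the sea": 40, "seattle weather": 95}
USER_DOC_BEHAVIOR = {"seo by the sea": 5, "seo tips": 3, "seo audit": 2}

shown_first = first_set("seo", POPULARITY)
shown_second = second_set("seo", USER_DOC_BEHAVIOR, shown_first)
```

With this data, "seo tips" would qualify for both sets, so it is dropped from the second one, which mirrors the "removing the given predicted query" step in the claim.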
Links from Sites and Click Data
This is an interesting statement about autocomplete query suggestions that appears in both the older and the newer versions of the patent. It seems that if the search results for the query being typed link to pages returned for a suggested query, that may be a sign that the suggested query is something that would interest a searcher. That isn’t part of the change from the older version of this patent to the newer version, but it is an interesting aspect of both of them, and it shows the potential value of linking out to other sites and other pages:
Clues about a user’s needs may also be more general. For example, search results can have elevated importance, or inferred relevance, if several other search results link to them. If the linking results are themselves highly relevant, then the linked-to results may have particularly high relevance. Such an approach to determining relevance may be premised on the assumption that, if authors of web pages felt that another web site was relevant enough to be linked to, then web searchers would also find the site to be particularly relevant. In short, the web authors “vote up” the relevance of the sites.
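To make the "vote up" idea concrete, here is a toy sketch in which each search result passes its own relevance score along to the other results it links to. The scoring rule, the integer scores, and the function name are assumptions made for illustration. The passage describes the general idea only and gives no formula.

```python
def link_boosted_scores(base_relevance, links):
    """Add each linking result's base relevance to the results it links to.

    base_relevance: result id -> base relevance score (hypothetical integers).
    links: result id -> list of result ids that it links to.
    """
    boosted = dict(base_relevance)
    for source, targets in links.items():
        for target in targets:
            # Only count links between results in the set, and ignore self-links.
            if target in boosted and target != source:
                boosted[target] += base_relevance.get(source, 0)
    return boosted


# Hypothetical results: "a" and "b" both link to "c", and "c" links back to "a".
BASE = {"a": 9, "b": 5, "c": 2}
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"]}

boosted = link_boosted_scores(BASE, LINKS)
ranking = sorted(boosted, key=boosted.get, reverse=True)
```

Result "c" starts with the lowest score but ends up ranked first, because the two more relevant results both link to it.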
We’ve been told by Google representatives that click data isn’t a ranking signal, but may be used to test Google’s algorithms. But sometimes what we are told by search engineers differs a little from what is written in a patent, as in this section of the description:
Other various inputs may be used instead of, or in addition to, such techniques for determining and ranking search results. For example, user reactions to particular search results or search result lists may be gauged, so that results on which users often click will receive a higher ranking. The general assumption under such an approach is that searching users are often the best judges of relevance, so that if they select a particular search result, it is likely to be relevant, or at least more relevant than the presented alternatives.
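One simple way to picture "results on which users often click will receive a higher ranking" is to reorder results by click-through rate. The sketch below assumes that click and impression counts are available per result. The data, the use of click-through rate, and the function name are my own illustration, not a method described in the patent.

```python
def rerank_by_clicks(results, clicks, impressions):
    """Order results by click-through rate (clicks / impressions), highest first."""
    def click_through_rate(url):
        shown = impressions.get(url, 0)
        # Results that were never shown get a rate of zero.
        return clicks.get(url, 0) / shown if shown else 0.0

    return sorted(results, key=click_through_rate, reverse=True)


# Hypothetical counts for three results shown 100 times each.
RESULTS = ["x.example", "y.example", "z.example"]
CLICKS = {"x.example": 5, "y.example": 40, "z.example": 10}
IMPRESSIONS = {"x.example": 100, "y.example": 100, "z.example": 100}

reranked = rerank_by_clicks(RESULTS, CLICKS, IMPRESSIONS)
```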
This patent does tell us that click data may be used to determine what predicted queries are shown to searchers:
Particular embodiments of the described subject matter can be implemented to realize one or more of the following advantages. A search assistant receives query information from a search requestor, before the requestor indicating completion of inputting the query. Additionally, information associated with previous user (or users) searches (such as click data associated with search results) is collected. From the received query information and the previous search information, a set of predicted queries is produced and provided to the search requestor for presentation.
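As a rough sketch of how collected click data might feed query predictions, the example below counts clicks per previously submitted query and uses those counts to order predictions for a partial query. The log format, the decision to count only searches that led to a click, and the function names are hypothetical.

```python
from collections import Counter


def build_click_counts(search_log):
    """Count, per query, how many past searches led to a click on a result.

    search_log: iterable of (query, clicked) pairs, where clicked is a bool.
    """
    counts = Counter()
    for query, clicked in search_log:
        if clicked:
            counts[query] += 1
    return counts


def predict_queries(partial, click_counts, limit=3):
    """Return queries starting with the partial query, most-clicked first."""
    matches = [q for q in click_counts if q.startswith(partial)]
    # Ties are broken alphabetically so the order is deterministic.
    matches.sort(key=lambda q: (-click_counts[q], q))
    return matches[:limit]


# Hypothetical search log for illustration only.
SEARCH_LOG = [
    ("google patents", True),
    ("google patents", True),
    ("google maps", True),
    ("google maps", False),
    ("google patents", False),
    ("gmail login", True),
]

click_counts = build_click_counts(SEARCH_LOG)
predictions = predict_queries("goo", click_counts)
```

In this simplified version, a query that never led to a click would not be predicted at all. A fuller approach would presumably combine click data with other signals.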
The newer version of this autocomplete query suggestion patent can be found at:
Autocompletion using previously submitted query data
Inventors: Michael Herscovici, Dan Guez, and Hyung-Jin Kim
Assignee: Google Inc. (Mountain View, CA)
US Patent: 9,740,780
Granted: August 22, 2017
Filed: December 1, 2014
A computer-implemented method for processing query information includes receiving query information at a server system. The query information includes a portion of a query from a search requestor. The method also includes obtaining a set of predicted queries relevant to the portion of the search requestor query based upon the portion of the query from the search requestor and data indicative of search requestor behavior relative to previously submitted queries. The method also includes providing the set of predicted queries to the search requestor.