How Many People Does Mudville Stadium (Where Mighty Casey Struck Out) Hold?
Google was granted a new patent involving query rewriting earlier this week. It does not work the same way that Google’s Hummingbird or Google’s RankBrain does.
We’ve seen patents about rewriting the queries searchers use so that the pages returned meet the situational or informational needs of a searcher. Those are the ideas behind Google’s Hummingbird update and other Google patents on using synonyms to rewrite queries.
In the drawing below from the Hummingbird patent, the word “place” for Chicago Style Pizza can be rewritten to “restaurant”, which is easier for Google to answer as a query.
I wrote about a patent describing a query rewriting approach used by Hummingbird in my post The Google Hummingbird Patent? In that post, I wrote about a Google Patent granted two weeks before Google announced the Hummingbird Update, which shared some examples of query rewriting with the announcement. The patent was Synonym identification based on co-occurring terms.
At the event announcing Hummingbird they made the following statement about the update:
In particular, Google said that Hummingbird is paying more attention to each word in a query, ensuring that the whole query — the whole sentence or conversation or meaning — is taken into account, rather than particular words. The goal is that pages matching the meaning do better, rather than pages matching just a few words.
In the query “What is the best place for Chicago Style Pizza,” this patent tells us that the word “place” could be replaced with the word “restaurant,” which would make the query easier for the search engine to answer.
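To make that substitution idea concrete, here is a minimal sketch in Python. It assumes a hand-written rule table that swaps a term only when a particular co-occurring term also appears in the query. The table, the function name, and the single rule in it are hypothetical illustrations, not anything taken from the patent.

```python
from typing import Dict, Tuple

# Hypothetical rule table: (term, co-occurring term) -> substitute term.
SYNONYM_RULES: Dict[Tuple[str, str], str] = {
    ("place", "pizza"): "restaurant",
}


def substitute_synonyms(query: str, rules: Dict[Tuple[str, str], str] = SYNONYM_RULES) -> str:
    """Swap a term for its substitute only when the co-occurring term is present."""
    words = query.split()
    # Lowercased words with trailing punctuation removed, used for context checks.
    lowered = {w.lower().strip("?,.") for w in words}
    rewritten = []
    for word in words:
        replacement = word
        for (term, context), substitute in rules.items():
            if word.lower() == term and context in lowered:
                replacement = substitute
        rewritten.append(replacement)
    return " ".join(rewritten)


print(substitute_synonyms("What is the best place for Chicago Style Pizza"))
```

With this rule, "place" becomes "restaurant" only because "pizza" is also in the query; a query like "What is the best place to park" is left alone.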
This new patent may look at a searcher’s own words to rewrite a query. It may look at what queries they may have performed before to help them find what they want to find. I am going to provide a summary of the process behind the patent, and then an example of how previous queries may help rewrite a searcher’s query.
Query Rewriting using Previous Queries
These are the steps identified in the newly granted patent that detail the process behind it:
- The search engine receives a query from a searcher
- The search engine may have received several previous queries from the same searcher during the same session
- The search engine may create many candidate query rewrites, based upon the latest search query and the prior search queries from the same searcher
- Those candidate query rewrites are scored based upon determining the quality of the rewrite from an analysis of search results responsive to the candidate query rewrite
- A candidate query rewrite is selected based upon a score that satisfies a threshold value
- Those search results from the selected candidate query rewrite are shown to the searcher
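The steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the function names are made up, the scoring function is a toy stand-in for the patent's analysis of search results, the prior queries are invented for the example, and picking the highest-scoring candidate among those that meet the threshold is my own choice (the patent only says the selected rewrite has a score that satisfies a threshold).

```python
from typing import Callable, List, Optional


def generate_candidates(query: str, prior_queries: List[str]) -> List[str]:
    """Build one candidate rewrite per prior query from the same session."""
    return [f"{query} {prior}" for prior in prior_queries]


def select_rewrite(
    query: str,
    prior_queries: List[str],
    score_fn: Callable[[str], float],
    threshold: float,
) -> Optional[str]:
    """Return the best-scoring candidate that meets the threshold, else None."""
    candidates = generate_candidates(query, prior_queries)
    scored = [(score_fn(candidate), candidate) for candidate in candidates]
    passing = [pair for pair in scored if pair[0] >= threshold]
    if not passing:
        return None  # no rewrite is good enough; keep the original query
    return max(passing, key=lambda pair: pair[0])[1]


def toy_result_quality(candidate: str) -> float:
    """Toy stand-in for result quality: pretend stadium queries return good results."""
    return 0.9 if "stadium" in candidate.lower() else 0.2


# Invented session history, for illustration only.
session = ["where is mudville stadium", "weather tomorrow"]
print(select_rewrite("what is the capacity", session, toy_result_quality, threshold=0.5))
```

If no candidate reaches the threshold, the sketch returns `None`, which a caller could treat as "search with the original query unchanged."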
The patent tells us that there are some optional features associated with this query rewriting approach. I found the use of entities from previous queries to be a particularly interesting aspect of it.
These optional steps flesh out the steps listed above.
- Creating the candidate query rewrites can mean concatenating the original query with each prior search query
- Each prior query has a timestamp, and part of scoring the candidate query rewrites can include weighting each candidate rewrite by how old the prior query behind it is
- Creating the candidate query rewrites can include identifying queries, from a collection of queries from several users, that are similar to the search query
- Scoring candidate query rewrites also includes determining how popular each candidate query rewrite is, based on a collection of queries from many users
- Scoring each of the candidate query rewrites can also include determining whether it includes a referential term of a particular type (likely an entity reference)
- A score for a candidate query rewrite can be increased in response to determining that the candidate query rewrite includes an entity of the particular type
- Scoring each candidate query rewrite also includes determining whether the search query has terms in it that are highly correlated with particular entities
- A score of a candidate query rewrite can be increased in response to determining that the candidate query rewrite includes the entity highly correlated with the query term.
- Determining the quality of each candidate query rewrite based on an analysis of search results responsive to the candidate query rewrite includes obtaining search results responsive to each candidate query rewrite and determining a quality of the search results.
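Here is a rough sketch of how several of those optional signals might be combined into one score. Everything specific in it is an assumption made for illustration: the exponential decay with a five-minute half-life, the log-scaled popularity term, the size of the entity boost, and all of the names. The patent describes the signals, not these formulas.

```python
import math
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class PriorQuery:
    text: str
    age_seconds: float  # how long ago the prior query was issued


def recency_weight(age_seconds: float, half_life: float = 300.0) -> float:
    """Assumed decay: a prior query loses half its weight every half_life seconds."""
    return 0.5 ** (age_seconds / half_life)


def score_candidate(
    candidate: str,
    prior: PriorQuery,
    result_quality: float,
    popularity: Dict[str, int],
    entities: Set[str],
    entity_boost: float = 0.25,
) -> float:
    """Combine result quality, recency, popularity, and an entity boost."""
    # Older prior queries contribute less to the score.
    score = result_quality * recency_weight(prior.age_seconds)
    # Candidates that many users have searched for get a small lift.
    score += 0.1 * math.log1p(popularity.get(candidate, 0))
    # Candidates that mention a known entity get a fixed boost.
    if any(entity in candidate.lower() for entity in entities):
        score += entity_boost
    return score


fresh = PriorQuery("where is mudville stadium", age_seconds=0)
stale = PriorQuery("where is mudville stadium", age_seconds=600)
candidate = "what is the capacity of mudville stadium"
known_entities = {"mudville stadium"}
print(score_candidate(candidate, fresh, 0.6, {}, known_entities))
print(score_candidate(candidate, stale, 0.6, {}, known_entities))
```

The same candidate scores lower when the prior query it was built from is ten minutes old, which is the effect the timestamp weighting is meant to have.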
This query rewriting approach isn’t only about understanding the context of all the words in a query and rewriting them in a way that is more likely to return a result that is useful to a searcher. It expands upon that by having the search engine understand the context of several queries from the same query session, and whether or not they might help provide an answer that satisfies a searcher. The patent describes the advantage of this process as follows:
Using prior user session queries to rewrite queries improves the likelihood of returning search results responsive to the user’s intent.
This Query Rewriting patent is:
Query rewriting using session information
Inventors: Marcin M. Nowak-Przygodzki and Behshad Behzadi
Assignee: Google LLC
US Patent: 10,387,437
Granted: August 20, 2019
Filed: January 13, 2017
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for natural language processing. One of the methods includes receiving a search query from a user during a user session; obtaining a plurality of prior search queries by the user received during the user session; generating a plurality of candidate query rewrites, wherein the candidate query rewrites are derived from the search query and the plurality of prior search queries by the user; scoring each candidate query rewrite, wherein scoring each candidate rewrite includes determining a quality of each candidate query rewrite based on an analysis of search results responsive to the candidate query rewrite; selecting a candidate query rewrite having a score that satisfies a threshold value, and providing search results responsive to the selected candidate query rewrite.
Query Rewriting Testing and Takeaways
I tried out the queries from the example in this patent, and Google didn’t rewrite my query for me. Always test things like this, whether I write about them or you come across a patent on your own that sounds interesting: see if Google is doing what the patent says, or if it has implemented anything that shows it is moving toward it.
I recently wrote the post Quality Visit Scores to Businesses May Influence Rankings in Google Local Search, and it is impossible to tell whether they are using quality visit scores to increase the rankings of local results, but there are other signs that Google may be moving towards such a thing. The first of those is that Google is showing off quality visit information in Google Analytics 360. The second is that a recent post on the Google Webmaster blog told us that Google would be awarding badges in different business categories for businesses that were among the top 5% visited sites in their categories. So look for signs that processes described in patents are in use.
The examples from this patent include searching using the following queries:
The next query would be:
[what is the capacity]
And it would refer to the very first of the prior queries.
A candidate query rewrite based upon concatenating that query with the first prior query would be:
[what is the capacity of Mudville Stadium?]
It also refers to a property of an entity named in that first query (the capacity of the stadium), which matches the optional features that would earn a candidate rewrite a higher score.
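As a small worked example of that rewrite, the sketch below attaches an entity from an earlier query to the incomplete follow-up query. The function name and the "of" connector are my own assumptions, added to produce the readable rewrite shown above. The patent itself describes concatenating the new query with a prior query.

```python
def rewrite_with_entity(query: str, entity: str, connector: str = "of") -> str:
    """Attach an entity from an earlier query to an incomplete follow-up query."""
    base = query.rstrip("?").strip()
    return f"{base} {connector} {entity}?"


print(rewrite_with_entity("what is the capacity", "Mudville Stadium"))
print(rewrite_with_entity("what is the capacity", "Lincoln Financial Field"))
```

The first call produces the rewrite from the patent's example, and the second produces the equivalent query for a real stadium.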
Google is not combining that last query with the first query and returning results that show me the capacity of Mudville Stadium at this time. Then again, I’m not sure that there is a real Mudville Stadium (which is where Casey at the Bat took place), so I tried the same set of queries, replacing Mudville Stadium with Lincoln Financial Field (which is real), and Google did not tell me the capacity of Lincoln Financial Field either. At least not yet. I will be trying again.