Ambiguous Queries and Semantic Interpretations
When someone submits an ambiguous query to Google, how does it go about responding? It may start by trying to understand the intent behind the query: interpreting what the person who entered it might have meant, and finding the right words (a canonical query form) to uncover an answer that might satisfy the searcher. This is the second patent I’ve seen very recently that uses the same example query, “How long is Harry Potter?”
I mentioned this example on Twitter and received many thoughts about how that might be answered:
How would Google Answer the query, "How Long is Harry Potter?" It's the second time I've seen this as an example in a Google Patent, and this second time has a different explanation than the first patent. How would you answer it?
— Bill Slawski ⚓ (@bill_slawski) August 8, 2019
I recently wrote a post, How would Google Answer Vague Questions in Queries?, which also used that example query. Google was granted another patent a week later that covers some of the same territory and provides a richer and more detailed answer.
The patent I am writing about in this post is specifically about evaluating different semantic interpretations of a search query. Here are the steps that it may take in providing a response to an ambiguous query such as “How long is Harry Potter?”:
- A technique would include determining one or more semantic interpretations of the particular search query. Each of those semantic interpretations is associated with at least one canonical query. For each semantic interpretation, a modified search query is generated based on the original search query and the associated canonical query.
- Search results are then obtained for the particular search query and the modified search queries.
- The search results of each modified search query and the search results of the original search query are compared to evaluate the semantic interpretation associated with each modified search query.
- For example, each semantic interpretation can be ranked or validated.
- Different semantic interpretations for the original ambiguous query can be compared, and a semantic interpretation for the original search query may be selected based on the comparison.
Different versions of an ambiguous query may be compared to each other.
To get different versions that may be rewritten, a search is performed on the original query.
The results may be looked at to see whether a semantic interpretation, representing a candidate intent associated with the query, can be determined.
The degree of similarity between the results might then be compared.
This patent addressing ambiguous queries can be found at:
Evaluating semantic interpretations of a search query
Inventors: Ashish Venugopal, Jakob D. Uszkoreit, John Blitzer, and Edward Everett Anderson
Assignee: Google LLC
US Patent: 10,353,964
Granted: July 16, 2019
Filed: March 11, 2015
The present disclosure relates to evaluating different semantic interpretations of a search query. One example method includes obtaining a set of search results for a particular search query submitted to a search engine; obtaining a set of semantic interpretations for the particular search query; obtaining, for each semantic interpretation of the set, a canonical search query; generating a modified search query based at least in part on the particular search query and the canonical search query for the semantic interpretation; obtaining a set of search results for the modified search query for the semantic interpretation; and determining, for each semantic interpretation of the set, a degree of similarity between (i) the set of search results of the modified search query for the semantic interpretation, and (ii) the set of search results for the particular search query.
Identifying Intent behind an Ambiguous Query
We are told that “In order to improve search result quality, the search engine may interpret received search queries to discern a likely intent associated with each query.” This means working out what a searcher is most likely looking for when they type something like “How long is Harry Potter?” into a search box.
Difficulties in Determining Intent with Ambiguous Queries
Another example, which also appeared in the patent I wrote about involving vague queries, is mentioned in this patent too:
For example, a query that recites “Washington’s age” could refer, for example, to President George Washington, actor Denzel Washington, the state of Washington, or Washington D.C. Determining the user intent associated with such ambiguous queries may be challenging.
Google has decided that I am most likely interested in George Washington.
The patent tells us it is about providing “techniques for evaluating different interpretations of a particular search query.”
Semantic Interpretations of an Ambiguous Query
The process behind this patent starts off with a semantic interpretation being associated with at least one canonical query.
1. For each of those semantic interpretations, a modified search query is generated based on the original search query and the associated canonical query.
In the example query “how long is harry potter” the terms “harry potter” are ambiguous, and may refer to one or more particular topics such as:
- Any of the seven books in the Harry Potter franchise
- Any of the film adaptations of the books
- A ride
- Theme park
That query could also refer to the Harry Potter character itself.
Depending on the topic a searcher intended to refer to in the query, a different interpretation can apply, or even several different interpretations.
- Book – A searcher probably wants to know the number of words or pages in the book
- Movie – A searcher probably wants to know the film’s running time
- The fictional character – A searcher may want to know his height
Original ambiguous query: How long is Harry Potter?
Semantic Interpretation: How long is the book Harry Potter?
Semantic Interpretation: How long is the movie Harry Potter?
Semantic Interpretation: How tall is the character Harry Potter?
Semantic Interpretation: How old is the character Harry Potter?
2. Search results are then obtained for the original ambiguous search query and for each of the modified search queries (the interpretations).
3. The search results of each modified search query and the search results of the original search query are compared to evaluate the semantic interpretation associated with each modified search query.
4. For example, each semantic interpretation can be ranked or validated. In this manner, different semantic interpretations for the original search query can be compared to each other. In some cases, a semantic interpretation for the original search query can be selected based on the comparison.
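The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Google implements it: `fake_search` and its toy index are hypothetical stand-ins for a search engine, the result identifiers are made up, and a plain set overlap (Jaccard) is used as a placeholder for whatever similarity measure the patent's system applies.

```python
from typing import Callable, Dict, List, Tuple


def result_overlap(a: List[str], b: List[str]) -> float:
    """Jaccard overlap of two result lists (an assumed, simplified similarity)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def rank_interpretations(
    original_query: str,
    modified_queries: Dict[str, str],
    search: Callable[[str], List[str]],
) -> List[Tuple[str, float]]:
    """Score each interpretation by how closely its results match the original's."""
    baseline = search(original_query)
    scored = [
        (label, result_overlap(baseline, search(query)))
        for label, query in modified_queries.items()
    ]
    # Highest similarity first
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy index standing in for a real search engine.
FAKE_INDEX = {
    "how long is harry potter": ["movie-runtime", "book-pages", "series-length"],
    "how long is the book harry potter": ["book-pages", "series-length", "word-count"],
    "how long is the movie harry potter": ["movie-runtime", "series-length", "book-pages"],
    "how tall is the character harry potter": ["character-height", "actor-height"],
    "how old is the character harry potter": ["character-age", "actor-age"],
}


def fake_search(query: str) -> List[str]:
    return FAKE_INDEX.get(query, [])


ranking = rank_interpretations(
    "how long is harry potter",
    {
        "book": "how long is the book harry potter",
        "movie": "how long is the movie harry potter",
        "height": "how tall is the character harry potter",
        "age": "how old is the character harry potter",
    },
    fake_search,
)
```

With this made-up index, the movie and book interpretations share results with the original query and so rank above the two character interpretations, which share none.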
Advantages of Following the Process from the Patent
1. By using search results to evaluate the different semantic interpretations, other data sources may be taken into account, such as:
- Click-through data
- User-specific data
- others that are utilized when producing the search results
2. By evaluating different semantic interpretations for a query, a user intent may be predicted for the query, thereby lessening the effect of any ambiguity in the query on the quality of the identified search results.
3. A confidence score may be determined for each of the semantic interpretations, representing “a likelihood that the associated interpretation matches the user intent for the particular query.”
A confidence threshold may also be defined to indicate a minimum confidence score necessary for a semantic interpretation to be considered when returning search results to the user. Since Google is interested in returning high-quality results to searchers, even when they type a query that seems ambiguous into a search box, an answer that seems reasonable isn’t bad. The patent tells us that this confidence threshold may be set high:
For example, the confidence threshold may specify that semantic interpretation with confidence scores over 90 for a particular search query should be considered when returning search results.
The confidence score for semantic interpretations may be used to decide which one of the semantic interpretations goes with a particular query. For instance, “how long is Harry Potter” may be scored so that the versions asking how long one of the Harry Potter books is in pages, or how long one of the Harry Potter movies runs, are favored over the versions asking how tall or how old Harry Potter was in one of those books.
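Here is a minimal sketch of how a confidence threshold like that could be applied. The function name `select_interpretation` and the scores are hypothetical, made up for illustration. Only the threshold of 90 comes from the patent's example.

```python
from typing import Dict, Optional


def select_interpretation(
    scores: Dict[str, float], threshold: float = 90.0
) -> Optional[str]:
    """Return the highest-scoring interpretation above the threshold, if any."""
    # Keep only interpretations whose confidence score is over the threshold
    eligible = {name: score for name, score in scores.items() if score > threshold}
    if not eligible:
        return None  # no interpretation is confident enough to use
    return max(eligible, key=eligible.get)


# Illustrative scores only; the patent does not publish real values.
example_scores = {
    "book pages": 94.0,
    "movie running time": 91.5,
    "character height": 12.0,
    "character age": 8.0,
}
chosen = select_interpretation(example_scores)
```

When nothing clears the threshold the function returns `None`, which in this sketch would mean falling back to the results for the original query alone.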
Canonical Queries and Ambiguous Queries
This is the second time I recall seeing a document from Google refer to canonical queries. The first time was in the paper Biperpedia: An Ontology for Search Applications, where it seemed to be telling us that it may save the form in which it saw different query terms in query logs (along with common misspellings). The canonical queries mentioned in this patent are slightly different. I liked this interpretation of what a canonical query is:
The canonical query may be a query that conveys the user intent associated with a particular semantic interpretation
It could be a “structure or template used to generate a modified search query from the original particular query” and could also be “combined with information in the particular query to generate the modified search query.”
We are given some examples of canonical queries being used to provide modified queries in the patent:
For example, the canonical query may be an incomplete query such as “how many pages is the book
Given that template from the canonical query, we see how a modified query might be generated:
A modified search query may be generated from the canonical query using the portions of the particular query. For example, given the previous canonical query and the particular query “how long is harry potter,” the modified search query “how many pages is the book harry potter” may be generated.
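Treating the canonical query as a template with a slot for the entity, that generation step might look something like the sketch below. The `{entity}` placeholder syntax and the function name are my assumptions. The book template follows the patent's example, while the movie template is a hypothetical one added to show a second interpretation.

```python
from typing import Dict


def build_modified_query(canonical_template: str, entity: str) -> str:
    """Fill the entity taken from the original query into a canonical template."""
    return canonical_template.format(entity=entity)


original_query = "how long is harry potter"
entity = "harry potter"  # assumed to have been identified in the original query already

canonical_templates: Dict[str, str] = {
    # From the patent's example
    "book length": "how many pages is the book {entity}",
    # Hypothetical template, not from the patent
    "movie length": "what is the running time of the movie {entity}",
}

modified_queries = {
    interpretation: build_modified_query(template, entity)
    for interpretation, template in canonical_templates.items()
}
```

Each semantic interpretation ends up with its own modified query, which can then be run alongside the original one.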
A modification may involve rewriting the original query to match a canonical query:
For example, a given query “how long is the book harry potter” could be reformatted to “how long is the harry potter book” to match a canonical query “how long is the book
Entities and Rewriting an Ambiguous Query
In some implementations, generating a modified search query for a semantic interpretation includes replacing a substring included in the particular search query identifying a particular entity with an alternate substring identifying the particular entity included in the canonical search query for the semantic interpretation.
For example, the particular query “how long is harry potter” may have a semantic interpretation of asking for the number of pages in a book.
Including an entity in a query seems to be a way of making sure that there is more certainty to a query. The patent tells us that using an entity from the original query is definitely part of this process:
A canonical query associated with this semantic interpretation could be “how long is the
Thus, with “harry potter” identified as a particular entity, a modified search query could be generated by replacing the substring “harry potter” in the particular search query with the substring “harry potter book” derived from the canonical search query.
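That substring replacement is simple enough to sketch directly. The function name is hypothetical, and the sketch assumes the entity has already been recognized in the original query, which is the hard part and is not shown here.

```python
def replace_entity(query: str, entity: str, canonical_entity: str) -> str:
    """Swap the entity substring in a query for the form used by the canonical query."""
    if entity not in query:
        return query  # nothing to rewrite
    # Replace only the first occurrence of the entity
    return query.replace(entity, canonical_entity, 1)


modified = replace_entity(
    "how long is harry potter",  # the particular query
    "harry potter",              # entity identified in that query
    "harry potter book",         # alternate substring derived from the canonical query
)
# modified == "how long is harry potter book"
```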
Using Similarity to Decide if a Modified Query Fits Well with an Ambiguous Query
The similarity used to make this decision might be based upon comparing the search results for the ambiguous query and the modified query to see:
1. The frequency of occurrence of particular keywords associated with the particular search query in the modified search query results, compared with their frequency within the search results for the particular search query.
2. How the order of the modified search query results compares with the order of the search results for the particular search query.
3. Other data that might be considered, such as:
- User click rate
- Site traffic data
- Other data
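The first two signals can be illustrated with a rough sketch. The patent does not give formulas, so the choices here are mine: cosine similarity over keyword counts for the frequency comparison, and the share of result pairs that keep the same relative order for the ordering comparison. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List


def keyword_frequency_similarity(
    keywords: List[str], original_snippets: List[str], modified_snippets: List[str]
) -> float:
    """Cosine similarity between keyword counts in two sets of result snippets."""

    def counts(snippets: List[str]) -> List[int]:
        words = Counter(w for s in snippets for w in s.lower().split())
        return [words[k] for k in keywords]

    a, b = counts(original_snippets), counts(modified_snippets)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def order_similarity(original_urls: List[str], modified_urls: List[str]) -> float:
    """Fraction of shared result pairs that appear in the same relative order."""
    shared = [u for u in original_urls if u in modified_urls]
    if len(shared) < 2:
        return 0.0  # not enough overlap to compare ordering
    position = {u: i for i, u in enumerate(modified_urls)}
    pairs = list(combinations(shared, 2))  # each pair is in original-result order
    agreeing = sum(1 for first, second in pairs if position[first] < position[second])
    return agreeing / len(pairs)
```

A real system would presumably blend signals like these with click and traffic data into a single score, but how they are weighted is not something the patent spells out.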
Google may provide different answers to an ambiguous query such as “how long is Harry Potter?” to see which results people tend to favor over others. It sounds like the decisions to show some results over others might be based upon an initial evaluation of modified queries and a confidence score associated with those. But when someone does a “How long is Harry Potter” type query, it may be more likely that they are asking for the length of a movie or how many pages there might be in one of the Harry Potter books, rather than how tall Harry Potter is or how old Harry Potter is.
We can see how Google attempts to understand the intent behind an ambiguous query, but it is possible that it also attempts to understand the intent behind queries that we may not perceive as being ambiguous, such as a search for “Pizza” around lunchtime, which Google seems to understand is a query for a nearby place to find a slice, rather than a history of pizza.
You can interpret that as an intent to get some lunch on my part with a high degree of confidence.