When someone searches for a particular topic, they may try many different query terms to find the information they are looking for, and may see many of the same pages returned in the results for those queries.
A Google patent granted this week describes a method that may look at the results for sets of queries from the same sessions and measure how much those results overlap. If there's a lot of overlap in the results for those queries, it may consider the queries to be related to each other.
The described method further includes:
- Identifying a given query in several queries; a first and second grouping in the ordered list for the given query; and a first and second grouping in the ordered list for each of the remaining queries in the plurality of queries.
- Determining non-overlap scores between the given query and each of the remaining queries in the plurality of queries.
The non-overlap scores measure the dissimilarities between the search result documents within the first grouping in the ordered list for the given query and the first grouping in the ordered list for each of the remaining queries in the plurality of queries.
- Selecting one or more candidate queries from the remaining queries in the plurality of queries based on the non-overlap scores.
- Determining overlap scores between the given query and each of the candidate queries. The overlap scores measure the similarities between the search result documents within the second grouping in the ordered list for the given query and the second grouping in the ordered list for each of the candidate queries.
- Selecting one or more related queries from the candidate queries based on the overlap scores.
- Associating the related queries with the given query.
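The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: it assumes the "first grouping" is the top-ranked results and the "second grouping" is the lower-ranked results, it uses Jaccard similarity as the overlap measure, and the function names, thresholds, and document IDs are all hypothetical.

```python
from typing import Dict, List


def non_overlap_score(a: List[str], b: List[str], top_k: int) -> float:
    """Dissimilarity of the first groupings (assumed: top_k results), as 1 - Jaccard."""
    top_a, top_b = set(a[:top_k]), set(b[:top_k])
    union = top_a | top_b
    if not union:
        return 0.0
    return 1.0 - len(top_a & top_b) / len(union)


def overlap_score(a: List[str], b: List[str], top_k: int, depth: int) -> float:
    """Similarity of the second groupings (assumed: ranks top_k..depth), as Jaccard."""
    low_a, low_b = set(a[top_k:depth]), set(b[top_k:depth])
    union = low_a | low_b
    if not union:
        return 0.0
    return len(low_a & low_b) / len(union)


def related_queries(
    given: str,
    results: Dict[str, List[str]],
    top_k: int = 3,                # assumed size of the first grouping
    depth: int = 8,                # assumed end of the second grouping
    min_non_overlap: float = 0.5,  # hypothetical threshold
    min_overlap: float = 0.3,      # hypothetical threshold
) -> List[str]:
    given_docs = results[given]
    # Select candidates whose top results differ enough from the given query's.
    candidates = [
        q for q, docs in results.items()
        if q != given and non_overlap_score(given_docs, docs, top_k) >= min_non_overlap
    ]
    # Score candidates by overlap among lower-ranked results, best first.
    scored = [(q, overlap_score(given_docs, results[q], top_k, depth)) for q in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [q for q, score in scored if score >= min_overlap]


# Made-up ordered result lists (document IDs are placeholders).
results = {
    "giant panda": ["a1", "a2", "a3", "s1", "s2", "s3", "s4", "s5"],
    "giant pandas": ["a1", "a2", "a3", "s1", "s2", "s3", "s4", "s5"],
    "panda conservation": ["b1", "b2", "b3", "s1", "s2", "s3", "x1", "x2"],
    "pizza": ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"],
}

related = related_queries("giant panda", results)
print(related)
```

In this toy data, "giant pandas" is dropped at the candidate step because its top results are identical to those of the given query, and "pizza" is dropped at the final step because it shares no lower-ranked results with it.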
While the number of overlapping results can show how similar those queries may be, the number of non-overlapping results may show whether offering those queries as suggestions would give a searcher more opportunities to find something useful and responsive to what they are looking for.
Related queries may be shown as possible query suggestions for received queries.
These related queries can represent information requests that may be relevant to the information need of the user, while also showing search results that may be different from the information already being searched by the user. Subsequent submission of a related query by the user may increase the likelihood of assisting users in obtaining search results containing the information they seek.
The patent is:
Identification of related search queries that represent different information requests
Invented by: Sean Liu and Emily Moxley
Assigned to: Google Inc.
US Patent 9,122,727
Granted September 1, 2015
Filed: March 1, 2013
Methods, systems and apparatus are described herein that include obtaining a respective ordered list of search result documents for each query in a plurality of queries. Non-overlap scores between search result documents within a first grouping in the ordered lists for a given query and remaining queries in the plurality of queries are then calculated.
One or more remaining queries are then selected as candidate queries using the non-overlap scores.
Overlap scores between search result documents within a second grouping in the ordered lists for the given query and the candidate queries are then calculated. One or more of the candidate queries are selected as related queries for the given query using the overlap scores. The related queries are then associated with the given query.
What the patent tells us is that sometimes queries that may “represent related topics or concepts” will have higher-ranked search results that are quite different, while their lower-ranked search results may include several documents in common. If there is an appreciable overlap among their lower-ranked search results in comparison to their higher-ranked search results, that can indicate a likelihood that the two queries represent related information requests, but that they do not represent the same or similar information requests.
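One way to picture that signal is to compare a single pair of queries and ask where their results overlap. The sketch below is a hedged illustration only: the labels, the `top_k` cut-off, and the two thresholds are assumptions made for this example, and Jaccard similarity stands in for whatever measure the patent actually uses.

```python
def jaccard(a, b):
    """Share of documents two result groupings have in common."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compare_queries(results_a, results_b, top_k=3, same_at=0.5, margin=0.25):
    # top_k, same_at, and margin are hypothetical values chosen for illustration.
    top = jaccard(results_a[:top_k], results_b[:top_k])
    lower = jaccard(results_a[top_k:], results_b[top_k:])
    if top >= same_at:
        return "same or similar request"
    if lower - top >= margin:
        return "related but different request"
    return "unrelated"


# Placeholder document IDs, not real search results.
startups = ["s1", "s2", "s3", "c1", "c2", "c3", "c4"]
incubator = ["i1", "i2", "i3", "c1", "c2", "c3", "i4"]
unrelated = ["z1", "z2", "z3", "z4", "z5", "z6", "z7"]

print(compare_queries(startups, incubator))
print(compare_queries(startups, startups))
print(compare_queries(startups, unrelated))
```

Only the pair that differs at the top but shares lower-ranked documents is labeled as related but different, which is the pattern the patent treats as useful for suggestions.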
Because of that, there’s possible value in providing those related queries as query suggestions to the searcher. These related queries could represent information requests that are relevant to the information need of the user, while also showing them search results that differ from the ones already shown to them. That can reduce or eliminate a repeated, prominent presentation of the same search results during a search session. These related query suggestions are seen as meaningful, and could, therefore, increase the likelihood of assisting users in obtaining search results containing the information they seek.
It does make sense that there may be overlap among the lower-ranking results for related queries, because the higher-ranked results for them would be more closely related to the words and terms used in those queries themselves, and the lower-ranked results for both would be pages that aren’t quite as relevant for those exact queries.
The patent provides an example of a search query “giant panda”, with some related queries shown as “related searches” that include “panda conservation”, “red panda”, “beijing zoo”, “panda research centers” and “Toronto zoo”. We see a slightly different set now, if we run that search:
Another example provided is when a searcher submits the query “startups”, and related queries include “incubator”, “ipo”, “silicon valley”, “entrepreneurship” and “tech blogs”. Again, a slightly different set of suggestions for queries:
These additional search suggestions may be taken from other queries that are related to this topic, and may be selected as suggestions based upon how much overlap there is in the pages returned as search results for these queries, after the more specific higher-ranking results.