A continuation patent about Google OneBoxes suggests that user data, such as clicks, will likely play a larger role in determining whether a OneBox appears for a set of search results, including OneBoxes for fact repository queries such as featured snippets.
What We Knew About OneBoxes
In 2007, I wrote about a patent in a Search Engine Land post that I titled Google’s OneBox Patent Application. That patent told us that Google might sometimes add another result at the top of a set of search results, and that this extra result could come from a different vertical search or repository. If the query was something appearing in news search results, a news OneBox might show up. If a lot of people were looking at images of jaguars in image search, and the query was “jaguar,” then the OneBox might contain pictures of jaguars. If I was searching from Jacksonville on a Monday after a Sunday filled with NFL games, including one with the Jacksonville Jaguars, the OneBox might contain a sports story about that Jaguars game.
Changes to OneBoxes
This week, Google was granted an updated version of the patent from that post, and the new version appears to focus on one of the many methods described in the earlier version. The newer version is a continuation patent, which gets the benefit of the original version’s filing date but updates the claims section to reflect how the process it protects is being used.
In the Search Engine Land post, I wrote about how Google might identify and use certain data to understand which repositories to show results from. That patent described at least seven different methods it might use to determine what type of data to show to searchers. One of those methods sounded the most interesting, so I wrote about it in my post. The continuation patent seems to single out this method as the approach used to decide upon a repository:
One variation describes a process in which log data is collected about searchers and searches of repositories. The log data is represented as triples of data (u, q, r), with u being information about the searcher, q information about the query, and r information about the repository from which search results were provided. Labels are created for each of the triples (u, q, r), where a label includes information about whether the user u desired information from the repository r when the user provided the search query q. Instructions are created to train a model on the triples (u, q, r) and their associated labels, to predict whether a particular user desires information from certain repositories when providing a particular search query.
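To make the triples-and-labels idea more concrete, here is a minimal Python sketch that trains a small logistic regression on labeled (u, q, r) triples. The patent doesn’t say what kind of model is trained, so the model type, the feature scheme, the user fields, and the names `LabeledTriple`, `train`, and `predict` are all assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledTriple:
    user: dict        # u: information about the searcher (hypothetical fields)
    query: str        # q: the search query
    repository: str   # r: the repository results came from
    desired: bool     # label: did u want information from r for q?


def features(user, query, repository):
    """Turn a (u, q, r) triple into binary feature names.

    Every feature is crossed with the repository so the model can learn
    per-repository preferences. This scheme is an illustrative assumption.
    """
    feats = {f"repo={repository}", f"query={query}&repo={repository}"}
    for key, value in user.items():
        feats.add(f"{key}={value}&repo={repository}")
    return feats


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, learning_rate=0.1):
    """Fit a logistic regression on labeled triples with stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for ex in examples:
            feats = features(ex.user, ex.query, ex.repository)
            p = sigmoid(bias + sum(weights.get(f, 0.0) for f in feats))
            # (label - p) is the negative gradient of the log-loss w.r.t. the logit.
            error = (1.0 if ex.desired else 0.0) - p
            bias += learning_rate * error
            for f in feats:
                weights[f] = weights.get(f, 0.0) + learning_rate * error
    return weights, bias


def predict(model, user, query, repository):
    """Estimated probability that `user` wants results from `repository` for `query`."""
    weights, bias = model
    feats = features(user, query, repository)
    return sigmoid(bias + sum(weights.get(f, 0.0) for f in feats))


# Toy log: searchers in the US wanted images, not news, for "jaguar".
logs = [
    LabeledTriple({"country": "US"}, "jaguar", "images", True),
    LabeledTriple({"country": "US"}, "jaguar", "news", False),
] * 2
model = train(logs)
us = {"country": "US"}
print(predict(model, us, "jaguar", "images") > predict(model, us, "jaguar", "news"))  # True
```

Crossing every feature with the repository lets the same query favor different repositories for different kinds of searchers, which is the kind of prediction the patent describes.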
Repositories Selected for OneBoxes Based upon User Data, Including Clicks
A repository is a collection of data that focuses upon a certain aspect of search, such as news search, image search, or local search. Each of those types of search has its own ranking factors and its own results. Google has told us about a browseable fact repository, which is where query-answering results such as featured snippets come from. It’s interesting that Google is likely using user data to decide which repository to show results from. So when someone clicks on a certain result, clicks like that one could help determine what other people see when they search with the same query.
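As a toy illustration of how one searcher’s click could change what later searchers see, the sketch below treats click-through rate per repository as the signal. The log format, the function names, and the use of raw click-through rate are hypothetical; the patent doesn’t spell out how clicks are turned into labels or scores.

```python
from collections import defaultdict


def click_through_rates(log, query):
    """Fraction of impressions from each repository that got a click for `query`.

    `log` is a list of (query, repository, clicked) tuples, a hypothetical format.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for q, repository, was_clicked in log:
        if q != query:
            continue
        shown[repository] += 1
        if was_clicked:
            clicked[repository] += 1
    return {repo: clicked[repo] / shown[repo] for repo in shown}


def preferred_repository(log, query):
    """Repository with the highest click-through rate for `query`, or None."""
    rates = click_through_rates(log, query)
    return max(rates, key=rates.get) if rates else None


log = [
    ("jaguar", "images", True),
    ("jaguar", "images", False),
    ("jaguar", "images", False),
    ("jaguar", "news", False),
]
before = preferred_repository(log, "jaguar")  # images: 1/3, news: 0/1
log.append(("jaguar", "news", True))          # one more searcher clicks a news result
after = preferred_repository(log, "jaguar")   # images: 1/3, news: 1/2
print(before, after)  # images news
```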
The claims in the new version of the patent are worth looking at, since they describe the updated, protected version of the process. Comparing the older version of the claims with the newest version, and seeing what has been removed and what has been added, can help show what may have changed. These lines from the new version of the claims stood out to me right away:
4. The method of claim 1, where the information is provided in a search results document, and the method includes: positioning, in the search results document, the information based on a respective score for each repository of the more than one of the plurality of repositories.
5. The method of claim 1, further comprising: generating the model based on information associated with log data, the information associated with the log data being formed in triples.
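Claim 4 describes positioning information in the results document based on a score for each repository. A minimal sketch, assuming the scores are already computed and that “positioning” simply means placing higher-scoring repositories higher on the page (the `RepositoryResults` type and `build_results_document` function are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class RepositoryResults:
    repository: str
    score: float        # the repository's score (how it is computed is not shown here)
    results: list[str]


def build_results_document(blocks):
    """Lay out each repository's results in descending order of repository score."""
    document = []
    for block in sorted(blocks, key=lambda b: b.score, reverse=True):
        document.append(f"[{block.repository}]")  # header line marking the repository
        document.extend(block.results)
    return document


blocks = [
    RepositoryResults("web", 0.40, ["web result 1", "web result 2"]),
    RepositoryResults("images", 0.85, ["jaguar photo"]),
    RepositoryResults("news", 0.10, ["Jaguars game recap"]),
]
for line in build_results_document(blocks):
    print(line)
```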
When I read the patent in 2007, I found the part of its description about query log data interesting: that log data, formed into triples, was used to predict which repositories searchers wanted information from when they searched for a certain query. The newest version of the claims appears to focus on using that user data to predict which repository’s information to show searchers.
This newer version of the patent is at:
Determination of a desired repository for retrieving search results
Inventors: Michael Angelo, David Braginsky, Jeremy Ginsberg, and Simon Tong
US Patent 9,639,579
Granted: May 2, 2017
Filed: July 27, 2015
A system receives a search query from a user and searches a group of repositories, based on the search query, to identify, for each of the repositories, a set of search results. The system also identifies one of the repositories based on a likelihood that the user desires information from the identified repository and presents the set of search results associated with the identified repository.
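The flow in that abstract (search every repository, estimate how likely the user wants each one, present the winning result set) can be sketched as below. The backends, the likelihood values, and the function names are hypothetical stand-ins; in the variation described earlier, the likelihood would come from a model trained on log data.

```python
def search_all(query, repositories):
    """Run the query against every repository; returns {repository name: result list}."""
    return {name: search(query) for name, search in repositories.items()}


def choose_repository(user, query, result_sets, likelihood):
    """Pick the repository the user most likely wants information from."""
    return max(result_sets, key=lambda name: likelihood(user, query, name))


def handle_query(user, query, repositories, likelihood):
    result_sets = search_all(query, repositories)
    best = choose_repository(user, query, result_sets, likelihood)
    return best, result_sets[best]


# Hypothetical stand-ins for vertical search backends.
repositories = {
    "web": lambda q: [f"web page about {q}"],
    "images": lambda q: [f"picture of {q}"],
    "news": lambda q: [f"news story about {q}"],
}

# Hypothetical likelihoods; a trained model would supply these in practice.
toy_likelihoods = {("jaguar", "web"): 0.5, ("jaguar", "images"): 0.7, ("jaguar", "news"): 0.2}


def likelihood(user, query, repository):
    return toy_likelihoods.get((query, repository), 0.0)


print(handle_query({"country": "US"}, "jaguar", repositories, likelihood))
# ('images', ['picture of jaguar'])
```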
With this update to the patent, it seems likely that Google is using the kind of user data, including clicks, that I described in my first write-up of this patent back in 2007. That list started with these items:
- The country in which user u is located,
- The language of the country in which user u is located,
- A cookie identifier associated with user u,
- The language of query q,
- Each term in query q,
- The time of day user u provided query q,
- The documents from repository r that were presented to user u,
This is just the start of the kind of data Google might collect to build a prediction model that decides which repository to show information from at the top of search results.
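Here is one hypothetical way the items listed above might be turned into model features. The `LogEntry` fields, the `extract_features` function, and the encodings (hour of day, one indicator per query term, a tuple of shown document IDs) are assumptions, not details from the patent.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogEntry:
    """One hypothetical search-log record; the fields mirror the items listed above."""
    country: str                 # country in which user u is located
    country_language: str        # language of that country
    cookie_id: str               # cookie identifier associated with user u
    query: str                   # query q
    query_language: str          # language of query q
    timestamp: datetime          # when user u provided query q
    repository: str              # repository r
    documents_shown: list[str]   # documents from r presented to user u


def extract_features(entry):
    """Map a log entry to a flat feature dictionary (encoding choices are assumptions)."""
    features = {
        "country": entry.country,
        "country_language": entry.country_language,
        "cookie_id": entry.cookie_id,
        "query_language": entry.query_language,
        "hour_of_day": entry.timestamp.hour,
        "repository": entry.repository,
        "documents_shown": tuple(entry.documents_shown),
    }
    # One indicator feature per query term.
    for term in entry.query.lower().split():
        features[f"term={term}"] = 1
    return features


entry = LogEntry("US", "en", "cookie-123", "Jacksonville Jaguars score", "en",
                 datetime(2017, 5, 1, 9, 30), "news", ["doc-17", "doc-42"])
feats = extract_features(entry)
print(feats["hour_of_day"], sorted(k for k in feats if k.startswith("term=")))
# 9 ['term=jacksonville', 'term=jaguars', 'term=score']
```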