Will Google Start Explaining Estimates in Search Results?
We’ve been seeing an increase in the number of answers that Google shows for questions that searchers ask in Google’s search box. Google has also made statements about wanting to answer such questions, which I wrote about in When Google Started Showing Direct Answers. That post points to a 2005 Google Blog post and to a Google patent, granted in 2011, about answering fact questions in response to queries.
In a 2017 Alphabet Financial Statement, we were told that Google wanted to start answering questions without requiring that searchers look through a bunch of documents to find answers to those questions:
Instead of just showing ten blue links in our search results, we are increasingly able to provide direct answers — even if you’re speaking your question using Voice Search — which makes it quicker, easier and more natural to find what you’re looking for.
Sometimes the sources used to answer such questions are missing the entities asked about, or contain incorrect facts about some entities. A newly granted Google patent describes a problem with some data sources that can prevent Google from answering questions using those sources:
Relational models of knowledge, such as a graph-based data store, can be used to provide answers to search queries. Such models describe real-world entities (people, places, things) and facts about these entities in the form of graph nodes and edges between the nodes. While such graphs may represent a significant amount of facts, even the largest graphs may be missing tens of millions of entities or may have incorrect facts for some of the entities. For example, dates or other attributes can often be missing for a given entity.
The patent aims to resolve such problems by inferring answers from related facts (providing explainable estimates):
Facts missing from a relational model of knowledge often can be inferred based on other related facts in the graph. For example, a search system may learn that in 70 percent of marriages, the husband and wife are within 5 years of age. Using this distribution, the system can estimate with high confidence that a man whose birthdate is unknown, but whose wife’s birthdate is known, is most likely within 5 years of the age of his wife. While this example uses one piece of supporting evidence (called a feature), the age of the spouse, estimates of missing or incorrect facts are often more complex and can be based on several, even hundreds, of such features. Some implementations provide a search interface that provides an estimate for a missing fact as well as a human-readable explanation of the basis for the estimate. For example, a search system may use the joint distribution of a plurality of features to generate an estimate for information requested by a query that cannot be directly obtained from a data graph. Each feature may represent a fact related to the missing information. The system may apply a set of measures against the features to determine which features and combination of features strongly influence the estimate and select a small quantity of the features for an explanation that is displayed to the query requester. The quantity of features used in the explanation may depend on the strength or the type of the features or its non-linear relation to other features. In one implementation, the system may use templates to provide the human-readable explanation of the estimate.
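To make the spouse example concrete, here is a minimal Python sketch, not Google’s implementation. It assumes a hypothetical list of observed husband-wife age gaps (in years), turns the share of gaps within 5 years into a confidence for a birth-year range, and fills a simple template to produce the explanation. The function name, the sample gaps, and the template wording are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    low: int            # earliest likely birth year
    high: int           # latest likely birth year
    confidence: float   # share of observed marriages that fit the window
    explanation: str    # human-readable basis for the estimate


def estimate_from_spouse(spouse: str, spouse_birth_year: int,
                         observed_age_gaps: list[int], window: int = 5) -> Estimate:
    # Fraction of observed couples whose age gap is within `window` years.
    within = sum(1 for gap in observed_age_gaps if abs(gap) <= window)
    confidence = within / len(observed_age_gaps)
    # A simple (hypothetical) template turns the supporting evidence into an explanation.
    explanation = (f"{spouse} was born in {spouse_birth_year}, and in {confidence:.0%} "
                   f"of marriages the spouses are within {window} years of age.")
    return Estimate(spouse_birth_year - window, spouse_birth_year + window,
                    confidence, explanation)


# Hypothetical sample of age gaps (husband minus wife, in years); 7 of 10 are within 5 years.
gaps = [-8, -3, -1, 0, 2, 2, 4, 6, 9, 1]
result = estimate_from_spouse("Jane Doe", 1960, gaps)
print(result.low, result.high, result.explanation)
```

With this made-up sample, the sketch reproduces the patent’s "70 percent" framing: it returns a 1955 to 1965 range with a 70% confidence and an explanation that names the spouse’s birth year as the supporting feature.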
An estimate for a missing fact would be accompanied by a human-readable explanation of the basis for that estimate. The patent points out the use of a joint probability distribution to estimate some facts. That explanation is built into the process behind this patent, and we are told that it would be included, along with the estimate, as part of a search result for a query. So, when a query asks for the age of a particular man, and his wife’s birthdate is known, we can be given an estimate of his birthdate based upon hers:
In one aspect, a computer system includes at least one processor and memory storing a data graph and instructions that, when executed by the at least one processor, cause the system to receive a query that requests information for a first entity, and generate an estimate for the requested information using known information from the data graph for second entities related to the first entity in the data graph. The instructions may also include instructions that cause the system to generate, from the known information used to determine the estimate, an explanation for the estimate based on the known information deemed influential to the estimate, and provide the explanation and the estimate as part of a search result for the query. For example, when the first entity is a person, a second entity is a spouse of the person, the known information can include an age or a birthdate of the spouse.
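The claim language above can be roughed out in Python, assuming a toy data graph stored as a dictionary (entity name to facts and relations). The entity names, the `birth_year` and `spouse` keys, and the fixed 5-year window are hypothetical; the sketch only shows how a query about a first entity could be answered with an estimate drawn from a related second entity when the direct fact is missing.

```python
from typing import Optional

# Hypothetical toy data graph: entity -> facts and relations (edges) about it.
GRAPH = {
    "John Roe": {"type": "person", "spouse": "Jane Roe"},  # birth year missing
    "Jane Roe": {"type": "person", "spouse": "John Roe", "birth_year": 1960},
}


def answer_birth_year(graph: dict, entity: str, window: int = 5) -> Optional[dict]:
    facts = graph.get(entity, {})
    # Answer directly when the data graph already holds the fact.
    if "birth_year" in facts:
        return {"answer": facts["birth_year"], "is_estimate": False}
    # Otherwise, follow the spouse edge to a related second entity.
    spouse = facts.get("spouse")
    spouse_year = graph.get(spouse, {}).get("birth_year") if spouse else None
    if spouse_year is None:
        return None  # no known information to base an estimate on
    return {
        "answer": (spouse_year - window, spouse_year + window),
        "is_estimate": True,
        "explanation": f"Estimated from the birth year of {spouse}, "
                       f"who was born in {spouse_year}.",
    }


print(answer_birth_year(GRAPH, "John Roe"))
```

Returning the explanation alongside the estimate, rather than the estimate alone, mirrors the patent’s point that both would be provided as part of the search result.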
We are told that when such estimates are provided to searchers, they may sometimes be kept in the data graph, depending on how influential the known information is and on a contribution score for that information.
Advantages of This Approach to Explaining Estimates for Missing Facts
The patent describes what it sees as the advantages of using this process.
One of those is that explaining estimates can give searchers “a sense of understanding about the estimate and a basis to believe, or not believe, the estimate, which enhances the user’s search experience.”
Another is that making estimates based upon “influential features” as well as “features that can be estimated by a joint distribution model,” means that this system is not relying upon “manually entered or maintained lists.”
The patent about explaining estimates about facts is:
Providing an explanation of a missing fact estimate
Inventors: Gal Chechik, Yaniv Leviathan, Ran El Manor, Yoav Tzur, Eyal Segalis, Efrat Farkash and Yossi Matias
Assignee: GOOGLE LLC
US Patent: 10,318,540
Granted: June 11, 2019
Filed: December 29, 2016
Systems and methods are disclosed for providing an explanation of an estimate for information missing from a data graph. An example method may include receiving a query that requests information for a first entity and receiving an estimate for the information, the estimate being based on a plurality of features of a joint distribution model. The method may include determining respective contribution scores for the plurality of features, selecting a quantity of the features with highest contribution scores, generating, using the selected quantity of features, an explanation for the estimate; and providing the explanation and the estimate as part of a search result for the query.
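The steps in that abstract (determine contribution scores, keep the highest-scoring features, generate an explanation from them) can be sketched in a few lines of Python. The patent doesn’t say how contribution scores are computed, so this sketch swaps in a simple assumption: a linear estimator in which a feature’s contribution is the absolute value of its weight times its distance from a baseline. The weights, baselines, and templates are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    template: str    # hypothetical explanation template with a {value} slot
    value: int       # known fact from the data graph, e.g. a year
    weight: float    # assumed weight of this feature in a linear estimator
    baseline: float  # assumed typical value of this feature


def contribution_score(feature: Feature) -> float:
    # Assumption: how far this feature pushes a linear estimate away from its baseline.
    return abs(feature.weight * (feature.value - feature.baseline))


def explain_estimate(estimate: str, features: list[Feature], k: int = 2) -> str:
    # Keep the k features with the highest contribution scores.
    top = sorted(features, key=contribution_score, reverse=True)[:k]
    # Fill each selected feature's template to build a human-readable explanation.
    reasons = "; ".join(f.template.format(value=f.value) for f in top)
    return f"Estimate: {estimate}. Based on: {reasons}."


features = [
    Feature("his spouse was born in {value}", 1960, 0.8, 1970),        # score 8.0
    Feature("his child was born in {value}", 1988, 0.5, 1995),         # score 3.5
    Feature("he graduated from college in {value}", 1982, 0.9, 1990),  # score 7.2
]
print(explain_estimate("about 1960", features))
# prints "Estimate: about 1960. Based on: his spouse was born in 1960; he graduated from college in 1982."
```

With these invented numbers, the spouse’s birth year and the college graduation year outscore the child’s birth year, so only those two reach the explanation. Changing `k` controls how many features are shown, which echoes the patent’s note that the quantity of features in an explanation may vary.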
Other Missing Information
The patent starts with an example about birthdates, but it covers other types of information as well. The system can look through search records to get an idea of what people have requested in the past, and of what might be missing from its data graph.
In some implementations, the modules may include a prediction engine. The prediction engine determines that requested information is missing from the data graph and may provide an estimate for the missing information. The requested information may be requested in a query, or may be determined to [be] of the type of information often requested in queries. For example, the prediction engine may analyze search records to determine what kinds of information query requestors have often requested in the past (e.g., like birthdates, spouses, song or movie release dates, etc.) and use this information to look for these facts in the data graph. Of course, the prediction engine may also include other methods of finding missing facts, for example using an entity type to determine what attributes entities of the entity type have and look for missing attributes for entities of the entity type. For example, a person entity may have a birthdate, so the prediction engine may look for entities that are people that are missing a birthdate, etc.
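The two approaches in that passage (mining search records for commonly requested kinds of facts, and using an entity’s type to decide which attributes it should have) could be combined roughly as follows. This is a hypothetical sketch: the schema, the entity names, and the choice to rank missing facts by how often they are requested are my assumptions, not details from the patent.

```python
from collections import Counter

# Hypothetical schema: attributes that entities of each type are expected to have.
EXPECTED_ATTRIBUTES = {
    "person": ["birthdate", "spouse"],
    "movie": ["release_date"],
    "song": ["release_date"],
}


def find_missing_facts(entities: dict, requested: list[str]) -> list[tuple[str, str]]:
    """Return (entity, attribute) pairs that are missing, most-requested kinds first."""
    demand = Counter(requested)  # how often each kind of fact shows up in search records
    missing = [
        (name, attr)
        for name, facts in entities.items()
        for attr in EXPECTED_ATTRIBUTES.get(facts.get("type"), [])
        if attr not in facts
    ]
    # sorted() is stable, so ties keep their original order.
    return sorted(missing, key=lambda pair: demand[pair[1]], reverse=True)


entities = {
    "Film X": {"type": "movie"},
    "Ann Lee": {"type": "person", "spouse": "Bo Lee"},
    "Bo Lee": {"type": "person", "spouse": "Ann Lee", "birthdate": "1961-04-02"},
}
# Hypothetical search records, already mapped to the kind of fact each query asked for.
requested = ["birthdate", "release_date", "birthdate"]
print(find_missing_facts(entities, requested))
# prints [('Ann Lee', 'birthdate'), ('Film X', 'release_date')]
```

Ranking by demand is one way a prediction engine might decide which missing facts are worth estimating first; the patent itself only says that search records and entity types can both be used to find them.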
Additional Features Used to Predict a Birthdate
The patent lists other kinds of information that could be used in explaining estimates of birthdates. These include:
- Age of the person’s spouse.
- Age of the person’s child.
- College graduation date.
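The patent combines features like these with a joint distribution model, which isn’t spelled out in the passages quoted here. As a much simpler stand-in, the sketch below treats each feature as an independent, roughly normal guess at the birth year and combines them by inverse-variance weighting, so tighter evidence counts for more. The offsets and spreads are invented numbers for illustration only, and this is not the patent’s model.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    label: str          # e.g. "college graduation date"
    known_year: int     # year taken from the data graph
    mean_offset: float  # assumed average of (birth year - known_year)
    std_dev: float      # assumed spread of that offset, in years


def combine(evidence: list[Evidence]) -> tuple[float, dict[str, float]]:
    # Inverse-variance weighting: evidence with a smaller spread gets more weight.
    weights = {e.label: 1.0 / e.std_dev ** 2 for e in evidence}
    total = sum(weights.values())
    estimate = sum(weights[e.label] * (e.known_year + e.mean_offset)
                   for e in evidence) / total
    # Each feature's share of the total weight hints at which ones to cite in an explanation.
    shares = {label: w / total for label, w in weights.items()}
    return estimate, shares


evidence = [
    Evidence("age of spouse", 1958, 0.0, 4.0),
    Evidence("age of child", 1988, -28.0, 8.0),
    Evidence("college graduation date", 1984, -22.0, 2.0),
]
estimate, shares = combine(evidence)
print(round(estimate), max(shares, key=shares.get))
# prints "1961 college graduation date"
```

Under these assumptions the college graduation date dominates because it has the smallest spread, which is the kind of signal an explanation would want to surface first.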
I can’t claim to have seen any estimates with explanations for missing facts when asking a question as a query and expecting a textual answer, but I will be keeping an eye out for them. I expect to see Google explaining estimates for things other than just birthdates.