Answering Questions with Structured Data

Posted Mar 20, 2018


Only the Facts, Fast

Back in 2005, The Official Google Blog published a post called Just the Facts, Fast. Recently, Google has been showing single-result answers to queries, which have captured a lot of attention in posts such as Zero-Result SERPs: Welcome to the Future We Should’ve Known Was Coming.

Showing just one answer to a question, as opposed to a featured snippet at the top of a set of search results, is what seemed to capture that attention.

Question answering at Google has taken the form of fact-based answers at the top of search results, which the SEO industry has been referring to as Direct Answers. We’ve heard from Eric Schmidt as far back as 2011 that Google wants to answer the questions people may ask, as was covered in a post called Eric Schmidt: Google wants to get so smart it can answer your questions without having to link you elsewhere.

I wrote about how Google was finding facts for such questions in the post: How Google was Corroborating Facts for Direct Answers, which may remind some people of the NAP (name-address-phone) consistency that has been known to help local search results.

I also wrote about a slightly different approach that Google might use to answer questions, in which it might crawl pages, collect questions and answers, and create a data store of that information from which it could provide answers, in the post Direct Answers – Natural Language Search Results for Intent Queries.

Structured Data Based

But a question that has been on my mind is whether Google could be using structured data, such as Schema markup, to answer questions. That is a question that really hasn’t been answered by Google, even recently, when Google published a post titled A reintroduction to Google’s featured snippets (one of the first things published by Danny Sullivan after he joined Google).

We do get a slightly different answer from a recently granted Google patent, which focuses on answering questions. It starts off by telling us that searchers often want answers to the questions that they ask:

Users of search systems are often searching for an answer to a specific question, rather than a listing of resources. For example, users may want to know what the weather is in a particular location, a current quote for a stock, the capital of a state, etc. When queries that are in the form of a question are received, some search engines may perform specialized search operations in response to the question format of the query. For example, some search engines may provide information responsive to such queries in the form of an “answer,” such as information provided in the form of a “one box” to a question.

Where this patent seems to veer off from other ones about question-answering is with this next line in the patent’s description:

Some question queries are fact-seeking, and thus are well served by facts that are enumerated in structured data, such as a table of facts.

We’ve seen Schema introduced in the last year for HowTo content, which holds facts about how to accomplish some type of task by following a sequence of steps. It would make a terrific answer for a featured snippet.

So, the summary of this patent goes into detail about the process behind it. It tells us how it is unique in this way:

In general, one innovative aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving a query determined to be a question query that seeks an answer response and data identifying resources determined to be responsive to the query and ordered according to a ranking, the query having query terms; identifying structured content set in a top-ranked subset of the resources, each structured content set being content arranged according to related attributes in one of the resources; for each identified structured content set, determining whether the query matches the structured content set based on terms of the query matching related attributes of the structured content set; selecting one of the structured content sets for which the query is determined to match; generating, from the selected structured content set, a structured fact set from the related attributes that matched the terms of the query; and providing the structured fact set with search results that identify the resources determined to be responsive to the query and being separate and distinct from the search results.
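The steps in that summary can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google’s implementation: the StructuredContentSet class, the function names, the top_n cutoff, and the rule that an attribute "matches" when all of its words appear in the query are hypothetical, and the URLs and fee values are made up.

```python
from dataclasses import dataclass


@dataclass
class StructuredContentSet:
    """Table-like facts found in one resource (hypothetical shape)."""
    source_url: str
    attributes: dict  # attribute name -> value


def matched_attributes(query_terms, content):
    """Toy matching rule: keep attributes whose every word is in the query."""
    return {
        name: value
        for name, value in content.attributes.items()
        if set(name.lower().split()) <= query_terms
    }


def answer_from_structured_content(query, ranked_content_sets, top_n=3):
    """Return (source_url, fact_set) for the best match, or None."""
    query_terms = set(query.lower().split())
    best_url, best_facts = None, {}
    # Only consider structured content from the top-ranked resources.
    for content in ranked_content_sets[:top_n]:
        facts = matched_attributes(query_terms, content)
        # Select the set with the most matching attributes; ties keep
        # the higher-ranked resource because of the strict comparison.
        if len(facts) > len(best_facts):
            best_url, best_facts = content.source_url, facts
    if best_url is None:
        return None
    return best_url, best_facts


# Made-up example data, ordered as if already ranked by the search system.
ranked = [
    StructuredContentSet("https://example.com/news", {"headline": "Fees rise"}),
    StructuredContentSet(
        "https://example.com/fees",
        {"first checked bag": "$25", "second checked bag": "$35"},
    ),
]
print(answer_from_structured_content("first checked bag fee", ranked))
```

The returned fact set would be shown alongside the ordinary search results rather than replacing them, which is the "separate and distinct" part of the summary.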

Advantages of Using Structured Data to Answer Questions

The summary of the patent provides a list of “advantages” behind the use of this process:

1. Multiple quality signals ensure that the search system obtains high precision matches.
2. The system is able to provide unique results from structured sources without the need of human curation.
3. The system can readily provide fact answers for various less-known properties of prominent entities, e.g., [Mongolia population density], [what is India’s highest point], [French population growth rate], [Nebraska quarterback Heisman], etc.
4. Fact sets based on structured data can be provided in response to fact-seeking question queries.
5. By first identifying structured content in resources and processing subsets of the structured content, the processing power required at query time is reduced, which, in turn, improves the technology area of search query processing.
6. This also allows faster access to information most relevant to user searches, and in turn, improves user experiences.
7. Additionally, this improves the likelihood that fact sets that directly answer informational needs are provided to users, and in turn, further improves users’ experiences.

This recent patent is:

Answer facts from structured content
Inventors: Jayant Madhavan, Hongrae Lee, Warren H. Y Shen and Sreeram Viswanath Balakrishnan;
Assignee: Google LLC (Mountain View, CA)
US Patent: 9,916,348
Granted: March 13, 2018
Filed: August 12, 2015

Abstract

In one aspect, a method includes receiving a query determined to be a question query that seeks an answer response and data identifying resources determined to be responsive to the query; identifying structured content set in a top-ranked subset of the resources, each structured content set being content arranged according to related attributes in one of the resources; for each identified structured content set, determining whether the query matches the structured content set based on terms of the query matching related attributes of the structured content set; selecting one of the structured content sets for which the query is determined to match; generating, from the selected structured content set, a structured fact set from the related attributes that matched the terms of the query; and providing the structured fact set with search results that identify the resources determined to be responsive to the query.

Take-Aways

You may remember when Google had doctors and people from the Mayo Clinic update knowledge panels in a human-curated way. That would be difficult to do for a wide range of domains of facts.

Google did come out with a paper a couple of years ago that tried to score the sources of facts found on the Web, called Knowledge Based Trust: Estimating the Trustworthiness of Web Sources. That would be one way of trying to make sure that answers to questions were coming from sources that were known to be correct most of the time (though it is not a guarantee that all answers from those sources are correct).

The patent includes an example of a question query asking about baggage fees for a particular airline. It uses a data table that contains baggage fee information to generate an answer box containing a fact answer from the structured data in response to the query.

The patent tells us that structured data might come from a number of different types of resources that can be identified using processes such as “markup language tag detection, formatting instructions, file identifiers, etc.”
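As a rough idea of what "markup language tag detection" could look like for HTML tables, here is a minimal sketch using Python’s standard html.parser module. The TableExtractor class and extract_tables function are hypothetical names, the sample HTML is made up, and nested tables are not handled; the patent does not say how its detection is actually implemented.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <table> in a page, row by row."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows (lists of cells)
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text pieces of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by bad markup


def extract_tables(html):
    """Return all tables found in an HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


page = (
    "<p>Fees</p><table>"
    "<tr><th>Bag</th><th>Fee</th></tr>"
    "<tr><td>First checked bag</td><td>$25</td></tr>"
    "</table>"
)
print(extract_tables(page))
```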

We also learn about how structured data query templates might be built and indexed to provide answers to question queries.

The description in this patent seems to focus upon data tables as the source of structured data used to answer queries, such as the baggage fees for a certain airline. It provides some detailed examples of how a query template that can answer such queries might be constructed.
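To make the idea of building and indexing query templates more concrete, here is a simplified sketch. It is an assumption about what such an index might look like, not the construction described in the patent: the build_template_index and lookup functions are hypothetical, each template is just the set of words from an entity name, a row label, and a column header, and the airline name and fees are made up.

```python
def build_template_index(entity, header, rows):
    """Index a two-column table as {frozenset of template terms: answer}."""
    value_label = header[1]
    index = {}
    for label, value in rows:
        terms = frozenset(f"{entity} {label} {value_label}".lower().split())
        index[terms] = value
    return index


def lookup(index, query):
    """Return the answer of the most specific template the query satisfies."""
    query_terms = set(query.lower().split())
    # A template matches when every one of its terms appears in the query.
    matches = [
        (len(terms), answer)
        for terms, answer in index.items()
        if terms <= query_terms
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0])[1]


# Built once, ahead of query time, from a (made-up) baggage fee table.
index = build_template_index(
    "Example Air",
    ["Bag", "Fee"],
    [["first checked bag", "$25"], ["second checked bag", "$35"]],
)
print(lookup(index, "what is the example air first checked bag fee"))
```

Doing the table processing ahead of time and leaving only a set comparison for query time is in the spirit of the reduced query-time processing that the patent lists among its advantages.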

We may start seeing Question Answering using Structured Data to answer queries in the future, and it may use different approaches than tables, such as the one in the example from the patent. It could use structured data from sources such as JSON-LD. We may have to explore those further when we see them.
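For a sense of how facts might be read from JSON-LD instead of a table, here is a small sketch that pulls the step text out of Schema.org HowTo markup using Python’s json module. Nothing in the patent says Google does this; the howto_steps function is hypothetical and the recipe-style content is made up. Only the HowTo and HowToStep types and their name, step, and text properties come from Schema.org.

```python
import json


def howto_steps(json_ld_text):
    """Return the step texts of the first HowTo item in a JSON-LD string."""
    data = json.loads(json_ld_text)
    items = data if isinstance(data, list) else [data]
    for item in items:
        if isinstance(item, dict) and item.get("@type") == "HowTo":
            return [
                step.get("text", "")
                for step in item.get("step", [])
                if isinstance(step, dict)
            ]
    return []


markup = """
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to brew tea",
  "step": [
    {"@type": "HowToStep", "text": "Boil the water."},
    {"@type": "HowToStep", "text": "Steep the tea for three minutes."}
  ]
}
"""
print(howto_steps(markup))
```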
