Does Google Use Schema to Write Answer Passages for Featured Snippets?

by Posted @ Jan 24 2019


There hasn’t been an answer to this question about featured snippets until now. A newly granted patent says that an answer passage might be selected based upon a score that reflects whether there is both structured data (such as schema) and unstructured data (such as prose passages) to provide an answer.

Google was granted a patent last week, describing search engine query processing when it comes to question answering.

The patent tells us about what makes question answering results different and unique:

Users of search systems are often searching for an answer to a specific question, rather than a listing of resources. For example, users may want to know what the weather is in a particular location, a current quote for a stock, the capital of a state, etc. When queries that are in the form of a question are received, some search engines may perform specialized search operations in response to the question format of the query. For example, some search engines may provide information responsive to such queries in the form of an “answer,” such as information provided in the form of a “one box” to a question.

I’m reminded of Google dictionary results, which I wrote about back in 2006, in the post Looking at Google Definitions. That reference to a one box type of answer also reminds me of Google’s one box patent, which tells us about the large amount of data that Google might look at when it decides to return a one box result. I wrote about an update to the one box patent in 2017 in the post at Google Updates Their One Box Patent.

What are Candidate Answer Passages?

This new patent about question answering at Google introduces the concept of candidate answer passages, which it defines for us at the start:

Some question queries are better served by explanatory answers, which are also referred to as “long answers” or “answer passages.” For example, for the question query [why is the sky blue], an answer explaining Rayleigh scatter is helpful. Such answer passages can be selected from resources that include text, such as paragraphs, that are relevant to the question and the answer. Sections of the text are scored, and the section with the best score is selected as an answer.

Knowing something about how an answer passage might be scored by Google may improve your chances of creating an answer passage on your page that Google may use to answer something such as [why is the sky blue?].

How are answer passages scored by Google?

First, Google looks at a query received to see what type of response it appears to be looking for. Is it a question query that is looking for an answer response and “data identifying resources determined to be responsive to the query”?

The data resources that an answer comes from may be scored based upon the following factors:

The resource contains a number of passages, with each of those being content that is eligible to be included as an answer.

The passages may be judged based upon “selection criterion” that may look at:

  • Whether there is structured data (such as schema) and unstructured content (such as text on a web page) that responds to the query as an answer.
  • Is the resource separate and distinct from search results that might be included in addition to an answering passage?
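
As a rough illustration of that selection step, here is a minimal Python sketch. It assumes a page has already been split into passage units, each flagged as structured or unstructured. The two criteria and every name in it (PassageUnit, prose_is_complete_sentence, structured_is_short, candidate_passages) are hypothetical stand-ins, not anything the patent spells out; the only idea taken from the patent is that one subset of criteria applies to structured content and another to unstructured content.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PassageUnit:
    text: str
    structured: bool  # assumption: True for structured content, False for prose


Criterion = Callable[[PassageUnit], bool]


def prose_is_complete_sentence(unit: PassageUnit) -> bool:
    # Hypothetical criterion for unstructured content.
    return unit.text.strip().endswith((".", "?", "!"))


def structured_is_short(unit: PassageUnit) -> bool:
    # Hypothetical criterion for structured content.
    return len(unit.text.split()) <= 20


UNSTRUCTURED_CRITERIA: List[Criterion] = [prose_is_complete_sentence]
STRUCTURED_CRITERIA: List[Criterion] = [structured_is_short]


def eligible(unit: PassageUnit) -> bool:
    # Pick the subset of criteria that matches the unit's content type.
    criteria = STRUCTURED_CRITERIA if unit.structured else UNSTRUCTURED_CRITERIA
    return all(criterion(unit) for criterion in criteria)


def candidate_passages(units: List[PassageUnit], max_units: int = 3) -> List[List[PassageUnit]]:
    # Group consecutive eligible units into candidate answer passages.
    candidates: List[List[PassageUnit]] = []
    current: List[PassageUnit] = []
    for unit in units:
        if eligible(unit):
            current.append(unit)
            if len(current) == max_units:
                candidates.append(current)
                current = []
        elif current:
            candidates.append(current)
            current = []
    if current:
        candidates.append(current)
    return candidates


units = [
    PassageUnit("The sky is blue because of Rayleigh scattering.", False),
    PassageUnit("Wavelength of blue light: about 450 nm", True),
    PassageUnit("click here", False),
    PassageUnit("Sunsets look red.", False),
]
candidates = candidate_passages(units)
```

In this toy run, the prose sentence and the fact that follows it end up in the same candidate, which is the kind of prose-plus-facts combination the patent describes.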

Why Does Google Want Structured and Unstructured Content for their Answer Passages?

The patent refers to this requirement as an advantage of the process behind the patent.

By requiring both, Google tells us that the unstructured content allows the searcher to receive “prose-type explanations,” and the structured content enables factual information to be returned. That means an answer can be a combination of prose and facts, which can be very relevant to what the searcher was trying to find.

Addressing Searcher’s Informational Needs with Answer Passages

The patent tells us that when they score candidate answer passages, they look at query dependent and query independent signals.

Query Dependent signals are ones that are based upon how relevant a passage might be for the terms used in the query to find a passage. So, for a question asking whether Rami Malek sang in the movie Bohemian Rhapsody, a passage would score higher on query dependent signals if it mentioned the actor and the movie, and was about him singing.

Query Independent signals are ones that look at things other than relevance to query terms, such as the number of links pointed to the page that the passages are on, or how fresh and timely that page may be if the question involved very timely news (such as Bohemian Rhapsody winning best drama movie at the Golden Globes).
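
To make the distinction concrete, here is a small Python sketch that blends the two kinds of signals into one score. Everything specific in it is an assumption for illustration: term overlap as the query dependent signal, link count and page age as the query independent signals, and the weights and caps. The patent does not give a formula.

```python
import re


def tokenize(text: str) -> set:
    # Lowercase word tokens; a deliberately crude stand-in for real text analysis.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def query_dependent_score(query: str, passage: str) -> float:
    # Fraction of query terms that also appear in the passage.
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokenize(passage)) / len(query_terms)


def query_independent_score(inbound_links: int, days_old: int) -> float:
    # Hypothetical mix of link count (capped at 100) and freshness.
    link_part = min(inbound_links, 100) / 100
    fresh_part = 1 / (1 + days_old / 30)
    return 0.5 * link_part + 0.5 * fresh_part


def answer_score(query: str, passage: str, inbound_links: int,
                 days_old: int, dependent_weight: float = 0.7) -> float:
    # Weighted blend of both signal families; the 0.7 weight is arbitrary.
    dependent = query_dependent_score(query, passage)
    independent = query_independent_score(inbound_links, days_old)
    return dependent_weight * dependent + (1 - dependent_weight) * independent


query = "did rami malek sing in bohemian rhapsody"
on_topic = "Rami Malek did sing parts of Bohemian Rhapsody, blended with other voices."
off_topic = "The film won a Golden Globe."

score_on_topic = answer_score(query, on_topic, inbound_links=40, days_old=10)
score_off_topic = answer_score(query, off_topic, inbound_links=40, days_old=10)
```

With the page-level signals held equal, the passage that mentions the actor, the movie, and the singing comes out ahead, because only the query dependent part differs.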

The patent describes an advantage of scoring based upon both query dependent and query independent signals:

the query dependent signals may be weighted based on the set of most relevant resources, which tends to surface answer passages that are more relevant than passage scored on a larger corpus of resources. This, in turn, reduces processing requirements and readily facilitates a scoring analysis at query time.

Earlier patents that talked about providing answers for queries that contained questions said that they were looking for answers from high authority sites, but didn’t provide as much detail. I wrote about one of those in the post, Direct Answers – Natural Language Search Results for Intent Queries. It’s hard to believe I wrote that five years ago. I’ve been waiting ever since for something that said that Google might be looking at structured data for answers to those questions.

This newly granted patent was originally filed in 2015. By telling us that having structured data on our pages increases the chances of showing in a featured snippet, it provides another good reason to include structured data on your site.
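
For anyone who hasn't added structured data to a page before, here is a minimal Python sketch that builds JSON-LD using the schema.org Question and Answer types. The function name and the sample text are made up for illustration, and nothing in the patent says this particular markup is what Google looks for. It only shows what pairing a prose answer with structured data can look like.

```python
import json


def question_jsonld(question: str, answer: str) -> str:
    # Build schema.org Question markup with an accepted Answer.
    data = {
        "@context": "https://schema.org",
        "@type": "Question",
        "name": question,
        "acceptedAnswer": {
            "@type": "Answer",
            "text": answer,
        },
    }
    return json.dumps(data, indent=2)


markup = question_jsonld(
    "Why is the sky blue?",
    "Rayleigh scattering spreads shorter blue wavelengths of sunlight across the sky.",
)
```

The resulting string would go inside a script element of type application/ld+json on the same page as the prose answer.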

The patent is:

Candidate answer passages
Inventors: Steven D. Baker, Srinivasan Venkatachary, Robert Andrew Brennan, Per Bjornsson, Yi Liu, Nitin Gupta, Diego Federici and Lingkun Chu
Assignee: Google LLC
US Patent: 10,180,964
Granted: January 15, 2019
Filed: August 12, 2015

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating candidate answer passages. In one aspect, a method includes receiving a query determined to be a question query data identifying resources determined to be responsive to the query; for each resource in a top-ranked subset of the resources: identifying a plurality of passage units in the resource; applying a set of passage unit selection criterion to the passage units, each passage unit selection criterion specifying a condition for inclusion of a passage unit in a candidate answer passage, wherein a first subset of passage unit selection criteria applies to structured content and a second subset of passage unit selection criteria applies to unstructured content; and generating, from passage units that satisfy the set of passage unit selection criterion, a set of candidate answer passages.

5 Comments

  1. Bob Gladstein

    January 24th, 2019 at 5:47 pm

    Hi Bill,
    Earlier today, an article was published (https://www.searchenginejournal.com/how-to-rank-featured-snippets/288573/) that suggested that John Mueller had suggested (lots of suggestions) that schema code, or structured data in a more general sense, didn’t directly help a page get returned as a featured snippet.

    You don’t appear to be directly saying the opposite here, but I think it would be really easy to conclude that what you’ve written here disagrees with that article, so I wonder if you could take a shot at clearing up the grey areas between the two.

    • Bill Slawski

      January 24th, 2019 at 6:52 pm

      Hi Bob, I didn’t see John’s post until after I had written this post, or I would have considered linking to it here. John is a Webmaster evangelist for Google, and does know the answers to a lot of questions about what is happening at Google. If you had asked me a couple of weeks ago whether schema made a difference in whether something would be considered for a featured snippet, I would have said it probably isn’t necessary, because it seemed that Google was creating a data store of questions and answers, and that the answers didn’t rely upon the existence of Schema.

      What I thought was really interesting about this patent was that it provides a really good reason for looking at Schema in addition to finding prose-based answers, so that it could also show factual information from that Schema in addition to the text-based answers.

      There are aspects of Schema markup that exist, but haven’t been implemented yet, such as a “knows about” value. So, if you’re writing a page about a plumber, you will be able to indicate that plumber “knows about” drain repair, and has an expertise in it.

      There is a Question schema: https://schema.org/Question, and an Answer schema: https://schema.org/Answer

      There is also a Speakable Schema that Google has said they will start accepting from News Sources, to enable a speaker device to read answers to questions that a site owner has selected: https://developers.google.com/search/docs/data-types/speakable

      There is more about that speakable schema in the pending schema information: https://pending.schema.org/speakable

      This is one of the fastest growing areas of SEO. The SEJ article makes an assumption that, ” If structured data was an important factor for ranking in featured snippets, John Mueller very likely would have known about it.” John’s answer was that “I don’t know. I can’t think of anything offhand.”

      This patent was granted nine days ago. It is possible that it hasn’t been implemented yet. I like the idea behind it that having both structured and unstructured data behind an answer can result in richer answers. I would like to see richer answers.

      I’ve been expecting to see Schema used in Featured Snippets since I wrote about natural language answers taken from data stores back in 2014. Now there is a patent that provides a good reason for Schema to be used. The process for updating Schema is independent of Google since it is a joint venture between Google, Bing, Yahoo, and Yandex. Google has said that they will start accepting speakable markup even though it is in the pending schema extensions.

      It is quite possible that if Google decides to use Schema as something that featured snippets do rely upon, they will announce that officially. I don’t recall seeing anything specifically about it in the official Schema mailing list.

      I could see John not being told about everything that appears in Google patents about Schema. If Google decides to start implementing candidate answer passages as described in the patent, I could see him learning about it.

  2. William F.

    January 24th, 2019 at 6:45 pm

    Hey Bob,

    “increases chances of showing a featured snippet”

    There are thousands of markup classifications; one may be better suited than another depending on the type of content. So there is probably not a particular one that works best.

    He mentions clear structure on a page. Is that not what structured data is? But for the crawler/bot/algorithm.

    Perhaps they go hand in hand? A clearly labeled QA box, marked up with JSON would certainly be doing all of those things.

    “…featured snippets in particular I don’t think it has any type of markup specific to that.”

    “So that’s something where you have clear… structure on the page, that helps us a lot.”

    • Bill Slawski

      January 24th, 2019 at 7:03 pm

      Hi William,

      Good points: there is no specific “featured snippet” markup, and I don’t expect a Google patent to define what any particular schema should look like, since it’s more than just Google building Schema. The score for answer passages would be based upon whether there is a textual passage on a page that answers a question, and schema that might contain facts that could be added to an answer. While there is specific schema for questions and for answers, they don’t quite fit the patent, which could be describing schema that might provide facts that the textual passage doesn’t. That schema wouldn’t necessarily have to contain an answer by itself, or be called a Featured Snippet Schema.
