How Google May Respond to How-To Queries
Google has published a patent on how they might handle how-to queries
Make sure to also read the Google Developer’s page about How to Use HowTo structured data, for information specifically about how Google recommends developers implement How-to Structured Markup. That page defines the purpose of How-to markup for us:
A how-to walks users through a set of steps to successfully complete a task and can feature video, images, and text. For example, “How to tie a tie” or “How to tile a kitchen backsplash”. If each step in your how-to must be read in sequence, it’s a good sign that HowTo structured data could benefit your content. HowTo structured data is appropriate when the how-to is the main focus of the page.
Google has been granted a patent about “How-to” queries, which looked worth sharing because it shows what Google may be thinking about when presenting answers to how-to queries.
As the description of the patent immediately tells us:
This specification is directed generally to providing step-by-step instructions for completing a task based on analysis of many sources.
The developer’s page doesn’t include some information that the patent does: the patent tells us how Google may analyze many sources to provide “step-by-step instructions for completing a task based on analysis of those sources.”
This is the How-to behind the process of responding to a how-to query by Google:
- A how-to query related to performing a task and sources related to the how-to query may be identified
- Steps may be determined that may enable a user to perform the task
- The determination of the steps may be based on analysis of the sources related to the how-to query
- Confidence measures may be determined for sources
- The steps may be associated with the how-to query in a database
- Those steps may be provided to a searcher in response to the how-to query (or a similar query) being submitted by a searcher
- An analysis of the sources related to the how-to query may include comparing components of different sets of steps and identifying common elements to determine a set of steps
In more detail, the steps of the how-to query process include:
- Identifying a how-to query related to performing a task
- Identifying sources responsive to the how-to query
- Determining a confidence measure for one or more of the plurality of identified sources, the confidence measure of a given source indicative of the effectiveness of the given source in providing steps for the task of the how-to query
- Determining steps to perform the task based on the confidence measures for the identified sources
- Associating the steps with the how-to query and storing the steps to be provided in response to the how-to query
Some additional features behind this process could include:
When finding sources, one may be a user manual and that user manual may be considered one of the sources. The set of steps to perform a task may be based on the user manual.
Steps Behind a Response In a How-to Query
The method to respond to a how-to query may further involve:
- Identifying steps from a number of sources
- Identifying the steps from each group of steps
- Determining steps to perform a task based on those steps
- Determining similarity measures between the steps from the sources
- Determining which steps to show based on the similarity measures
Similarity measures can be based on:
- Keyword matching
- Phrase matching
- Parse-tree matching
- Distributional similarity scores
- Edit distance scores
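Two of the measures in the list above can be sketched in a few lines. This is only an illustration of the general techniques, not the patent's implementation: the function names are hypothetical, keyword matching is approximated here with Jaccard overlap of word sets, and edit distance is the standard Levenshtein distance.

```python
def keyword_similarity(step_a: str, step_b: str) -> float:
    """Jaccard overlap of lowercase word sets (an assumed keyword-matching measure)."""
    words_a = set(step_a.lower().split())
    words_b = set(step_b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty steps are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]
```

Steps from two sources that score high on keyword overlap, or low on edit distance, could then be treated as the same step when the sets of steps are compared.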
In some implementations the method may further comprise:
- Determining, for each step in each group of steps, a relevance score indicative of the confidence level of the step
- Identifying the one or more steps from the group of steps based on the relevance scores
How Steps Are Chosen in a How-To Query Response
The steps shown to perform a task may be chosen from a source based on the confidence measure of the source.
A confidence measure for a source may be based on:
- Ranking of the given source
- Frequency of visits to the given source
- Number of links to the given source
- Cohesiveness of the given source
- User feedback related to the given source
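One way to picture how these signals might combine is a weighted average. The sketch below rests on assumptions: the patent names the signals but not how they are combined, so the weights, the field names, and the idea that each signal is already normalized to a value between 0 and 1 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SourceSignals:
    """Signals for one source, each assumed to be normalized to the range 0..1."""
    ranking: float
    visit_frequency: float
    inbound_links: float
    cohesiveness: float
    user_feedback: float


# Hypothetical weights; the patent does not give any.
WEIGHTS = {
    "ranking": 30,
    "visit_frequency": 20,
    "inbound_links": 20,
    "cohesiveness": 15,
    "user_feedback": 15,
}


def confidence_measure(signals: SourceSignals) -> float:
    """Weighted average of the signals, so the result also falls in 0..1."""
    weighted = sum(getattr(signals, name) * weight for name, weight in WEIGHTS.items())
    return weighted / sum(WEIGHTS.values())
```

Under these assumed weights, a user manual with strong signals across the board would score higher than a thin forum thread, and its steps would be more likely to be chosen.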
A quality measure may be determined for the set of steps that may be provided in response to the submitted query.
A query score may also be determined for the submitted query, based on the confidence that a searcher’s query indicates a desire to receive steps for completing the task indicated by the query, and on a decision that showing steps in response to the query will meet the searcher’s need.
The method may further comprise determining one or more of:
- A skill level required to perform the task
- A time duration for performing the task
- A list of tools needed to perform the task
- A list of materials needed to perform the task
This patent about responding to how-to queries can be found at:
Determining a set of steps responsive to a how-to query
Inventors: Kerwell Liao, Nikhil Sharma, LaDawn Risenmay Jentzsch, and Jennifer Ellen Fernquist
Assignee: GOOGLE LLC
US Patent: 10,585,927
Granted: March 10, 2020
Filed: March 2, 2017
Methods and apparatus related to providing steps for completing a task based on an analysis of multiple sources. A how-to query related to performing a task and a plurality of sources related to the how-to query may be identified. A set of steps related to performing the task may be determined based on analysis of the plurality of sources that are related to the how-to query, optionally including determining a confidence measure for the plurality of sources. The set of steps may be associated with the how-to query in a database. The set of steps may be provided to a user in response to the how-to query submitted by the user. In some implementations, the analysis of the plurality of sources that are related to the how-to query may include comparing components of different sets of steps and identifying the common elements to determine a set of steps.
Task Terms and Inquiry Terms in How-to Queries
The patent points out some examples of what they refer to as task terms and inquiry terms:
- “how to remove tar from clothing” is a how-to query that includes task terms (“remove tar from clothing”) which identify the task of removing tar from clothing and include inquiry terms (“how to”) indicative of a desire for information that may be used in removing tar from clothing.
- “how do I change a car tire” is a how-to query that includes task terms (“change a car tire”) which identify the task of changing a car tire and include inquiry terms (“how do I”) indicative of a desire for information that may be used in changing a car tire.
How Google May Identify How-to Queries
1. Use of key terms or key phrases – These may be included in the query. For instance a prefix of the query may be matched to one or more inquiry terms, such as:
- “how to”
- “how do I”
- “how does one”
- “does anyone know”
- “where do I find instructions to”
- “where can I get instructions to”
- “can someone tell me”
- “teach me to”
- “tell me how”
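The prefix matching described above can be sketched as a simple string check. This is a minimal illustration that assumes exact, case-insensitive matching; the function name is hypothetical, and the prefixes are the examples listed above. Whatever follows the matched prefix is treated as the candidate task terms.

```python
from typing import Optional, Tuple

# Inquiry terms taken from the examples listed above.
INQUIRY_PREFIXES = (
    "how to",
    "how do i",
    "how does one",
    "does anyone know",
    "where do i find instructions to",
    "where can i get instructions to",
    "can someone tell me",
    "teach me to",
    "tell me how",
)


def split_how_to_query(query: str) -> Optional[Tuple[str, str]]:
    """Return (inquiry_terms, task_terms) if the query starts with a known prefix."""
    normalized = " ".join(query.lower().split())
    # Try longer prefixes first so the most specific inquiry phrase wins.
    for prefix in sorted(INQUIRY_PREFIXES, key=len, reverse=True):
        if normalized == prefix or normalized.startswith(prefix + " "):
            return prefix, normalized[len(prefix):].strip()
    return None
```

For example, `split_how_to_query("How do I change a car tire")` separates the inquiry terms "how do i" from the task terms "change a car tire", while a query with no matching prefix returns `None`.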
2. Use of a prefix with inquiry terms and additional terms following the prefix, which may involve matching terms of the query to inquiry terms and also matching terms of the query to task terms. For example:
- The query “change flat tire?” may be identified as a how-to query based on matching of the terms “change flat tire” to the task phrase “changing a flat tire” and matching of the term “?” to the inquiry term “?”
- The query “remove tar instructions” may be identified as a how-to query based on matching of the terms “remove tar” to the task phrase “removing tar” and matching of the term “instructions” to the inquiry term “instructions”
3. Exact matching and/or soft matching between terms of a query and inquiry terms and/or task terms may be used.
Key terms and key phrases (including task terms and/or inquiry terms) that may be frequently included in how-to queries may be stored in a content database.
Task terms may be identified based on:
- Part-of-speech tagging
- Semantic analysis
- Syntactic analysis
- Other techniques
4. The frequency of inquiry terms and task terms included in a query may be used to determine if a query is a how-to query.
Data related to the frequency of key terms in queries and/or frequency across another corpus of documents may optionally be stored in the content database and used to decide if a query is a how-to query.
- “how do I make a cake from scratch” may only be identified as a how-to query if the task terms “make a cake from scratch” occur with at least a threshold level of frequency in past queries.
5. The frequency of submission of a query may be used to decide if a query is a how-to query.
This frequency may be stored in the content database. A query could be identified as a how-to query if it has been submitted with at least a threshold level of frequency.
- “how do I make a cake from scratch” may be seen as a how-to query if it and variations of it meet a threshold level of queries in a number of past queries.
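The frequency checks in items 4 and 5 amount to counting and comparing against a threshold. Here is a minimal sketch, assuming a hypothetical in-memory list of task terms extracted from past queries in place of the content database; the threshold value is likewise an assumption.

```python
from collections import Counter
from typing import Iterable, Set


def frequent_task_terms(past_task_terms: Iterable[str], threshold: int) -> Set[str]:
    """Return the task terms seen at least `threshold` times in past queries."""
    # Normalize case and whitespace so trivial variations count together.
    counts = Counter(" ".join(terms.lower().split()) for terms in past_task_terms)
    return {terms for terms, count in counts.items() if count >= threshold}


def meets_frequency_threshold(task_terms: str, frequent: Set[str]) -> bool:
    """Check whether a query's task terms are among the frequent ones."""
    return " ".join(task_terms.lower().split()) in frequent
```

With a threshold of three, “make a cake from scratch” would only qualify once it had appeared in at least three past queries.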
6. Similar how-to queries may be associated with each other
The associations between similar how-to queries may be stored in the content database. Similar how-to queries are queries that indicate a desire for information to perform similar tasks, such as:
- “how to remove tar from clothing”
- “how can I remove tar from clothing?”
- “how to remove tar from fabric”
- “remove tar stains”
- “stain buster–tar”
Similar how-to queries can be identified by comparing respective inquiry terms and/or task terms from them.
7. Analysis of search results and/or search result documents related to the how-to query
SERPs responding to a query may be analyzed to determine if the search results include a document providing steps related to performing a task identified by the query.
So, the N highest-ranked search results might be analyzed to see if a threshold number of them include steps about performing a task identified by the query.
A page in those search results that is determined to have the highest selection rate for a query may also be analyzed to see if it provides steps about performing a task identified by the query.
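The top-N check can be sketched as follows. The patent does not say how a document is judged to contain steps, so the regular expression here (lines starting with "Step 1" or "1." style markers) is purely an assumed stand-in, and the function names and default values are hypothetical.

```python
import re
from typing import Sequence

# Assumed heuristic: a line that begins with "Step 3", "3." or "3)" is a step.
STEP_PATTERN = re.compile(r"^[ \t]*(?:step\s+\d+|\d+[.)])", re.IGNORECASE | re.MULTILINE)


def looks_like_steps(text: str, min_steps: int = 2) -> bool:
    """True if the text contains at least `min_steps` step-like lines."""
    return len(STEP_PATTERN.findall(text)) >= min_steps


def results_suggest_how_to(ranked_documents: Sequence[str],
                           top_n: int = 5,
                           min_with_steps: int = 3) -> bool:
    """Check whether enough of the top-N ranked documents contain steps."""
    top = ranked_documents[:top_n]
    return sum(looks_like_steps(doc) for doc in top) >= min_with_steps
```

The input is assumed to be the text of each result document, already ordered by rank.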
8. Similar queries may use some additional methods
These can involve:
- Keyword matching
- Phrase matching
- Contextual similarity matching of phrases
9. Similarity between terms can be determined other ways
These can include:
The semantic distance, or length of path along edges between the terms in an external resource such as a lexical database. The lexical database may include key terms and/or phrases including words, nouns, adjectives, verbs, adverbs, etc. and their conceptual and/or semantic inter-relationships. In some implementations, the key terms and/or phrases may be grouped based on the meaning of the key terms and/or phrases, and/or their syntactic relationships to other key terms and/or phrases.
In some implementations, a database such as content database may include distributionally similar inquiry terms and/or task terms and their corresponding distributional similarity scores. Phrases that typically occur in similar contexts may, for example, be considered to have similar meanings. For example, a first phrase that co-occurs with the same words as that of a second phrase over a collection of documents, such as HTML web pages, may be considered to be distributionally similar to the second phrase.
Identifying two or more queries as similar may be utilized in one or more steps of methods described herein. For example, queries may be grouped together in determining if a collective frequency of query is great enough to constitute identifying such queries as how-to queries and determining a set of steps to perform a task identified by such queries. Also, for example, identification of sources for determining the set of steps and/or any ranking associated with such sources may be based on a ranking of the sources for each of multiple similar queries. Also, for example, associating a set of steps with a how-to query may include associating the set of steps with similar how-to query.
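Distributional similarity is easier to see with a small example. The sketch below is an assumption about how such a score could be computed, not the patent's method: each phrase is represented by counts of the words it co-occurs with across a set of documents, and two phrases are compared with cosine similarity. The function names and the naive substring matching are hypothetical simplifications.

```python
import math
from collections import Counter
from typing import Iterable


def context_vector(phrase: str, documents: Iterable[str]) -> Counter:
    """Count the words that appear in documents containing the phrase."""
    phrase = phrase.lower()
    phrase_words = set(phrase.split())
    counts: Counter = Counter()
    for doc in documents:
        text = doc.lower()
        if phrase in text:  # naive substring match, good enough for a sketch
            counts.update(w for w in text.split() if w not in phrase_words)
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Phrases such as “remove tar” and “get rid of tar” that keep appearing next to the same words would score close to 1, while unrelated phrases would score closer to 0.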
Some Other Types of How-To Queries
The patent describes some other instances where searchers might submit how-to queries. These can include:
- Installing a replacement part on a vehicle
- Installing complex software
- Performing a task related to a search (“how do I find a new house in the City?”)
- A locational query related to a map
Confidence Measures Associated with Sources
A confidence measure for a source may indicate the effectiveness of that source in providing correct steps to complete a specific how-to query.
A confidence measure for a source may be based on:
- Timeliness of the given source (a timestamp indicating the last time the source was updated)
- The number of documents that link to the given source (indicating popularity or authoritative value of a source)
- The number of outgoing links from the given source (indicating the comprehensiveness of the source)
- If based on outgoing links, it may also be based on a selection rate of the outgoing links
- An analysis of the cohesiveness of the given source
- How closely the given source relates to the task identified by the how-to query
- Anchor-text evidence (if a page contains links with anchor text that is similar to information about the task covered on the page.)
- The frequency of visits to the given source
- An analysis of the information-to-noise ratio of the given source. This noise could include things such as HTML tags, white space, unrelated links, sponsored advertisements, or content covering unrelated topics
- The number of steps provided for completing the task of the how-to query (A larger number of steps may indicate comprehensiveness.)
- The author and/or publisher associated with the given source (e.g., if the how-to query relates to a technical task, is the author and/or publisher a recognized authority for such a technical task?)
- Attributes of the author of the source, such as appropriate technical qualifications and/or experience to provide authoritative information related to the how-to query
- Techniques discussed here may be optionally combined
- A how-to query may use an answer found in an instruction manual that describes how to perform the task, and that manual may have the highest confidence measure as a source of the steps for performing the task
- One or more sources (highly ranked sources) may be used to include steps to perform a task with some steps from one source and some from another
- Some additional steps may be included as optional
- Some steps may be shown and labeled as less than ideal
- Information could be shown in a paragraph rather than a set of steps
- Natural language processing techniques may be used to segment a paragraph or other text segment into steps that perform at least a portion of the task
The patent also provides information about considering the similarities of steps that might be included in the completion of tasks in response to a how-to query, and about relevance scores for steps. A relevance score for a step may be based in part on a confidence measure for sources in which that step is identified. So, a step from a highly regarded technical manual may have a higher relevance score.
The relevance score for a group of steps could be based on the number of sources that identify a step corresponding to the group of steps as required to perform the task.
Individual steps to perform a task may be given confidence ratings such as “high confidence,” “medium confidence,” and “low confidence,” based upon how often those steps appear in sources indicating that a step is a required step.
So the relevance of steps may be based upon both the confidence measures from sources and the number of sources that include those steps.
The steps must meet a threshold relevance score to be included in the content database from which answers to how-to queries may come.
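To make the idea concrete, here is a small sketch of relevance scoring with a threshold. The formula is an assumption chosen for illustration (the patent does not give one): a step's relevance is the share of total source confidence held by the sources that include it, which reflects both the confidence measures and the number of sources. The sketch also assumes equivalent steps have already been normalized to identical strings.

```python
from typing import Dict, List, Sequence, Tuple

Source = Tuple[float, Sequence[str]]  # (confidence measure, steps found in the source)


def step_relevance(sources: Sequence[Source]) -> Dict[str, float]:
    """Score each step by the confidence-weighted share of sources that include it."""
    total = sum(confidence for confidence, _ in sources)
    if total == 0:
        return {}
    scores: Dict[str, float] = {}
    for confidence, steps in sources:
        for step in dict.fromkeys(steps):  # count a step once per source, keep order
            scores[step] = scores.get(step, 0.0) + confidence
    return {step: score / total for step, score in scores.items()}


def select_steps(sources: Sequence[Source], threshold: float) -> List[str]:
    """Keep only the steps whose relevance score meets the threshold."""
    return [step for step, score in step_relevance(sources).items() if score >= threshold]
```

A step that shows up only in a single low-confidence source falls below the threshold and is dropped, while a step found in a highly regarded manual can survive even if few other sources mention it.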
Attributes for How-to Query Answers
Attributes associated with a set of steps may be identified and displayed with those steps in response to a how-to query. Examples can include:
- A title for the set of steps (e.g., “How to change a car tire”)
- A skill level (e.g., a person of driving age)
- An estimated time required (e.g., twenty-five minutes)
- Tools required (e.g., a jack and a wrench)
- Materials required to perform the task
- One or more sources (e.g., user manual) associated with the determined set of steps
- One or more cautionary statements (e.g., park car on a level surface, place stoppers behind tires to prevent rolling, apply hand brake).
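The attributes above suggest a simple record that could be stored alongside a set of steps. The class below is only a sketch of such a record; the class name, field names, and types are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HowToAnswer:
    """A set of steps plus the attributes that may be displayed with it."""
    title: str
    steps: List[str]
    skill_level: Optional[str] = None
    estimated_minutes: Optional[int] = None
    tools: List[str] = field(default_factory=list)
    materials: List[str] = field(default_factory=list)
    sources: List[str] = field(default_factory=list)
    cautions: List[str] = field(default_factory=list)


# Example values taken from the list above.
tire_answer = HowToAnswer(
    title="How to change a car tire",
    steps=["Loosen the lug nuts", "Raise the car", "Swap the tire"],
    skill_level="a person of driving age",
    estimated_minutes=25,
    tools=["jack", "wrench"],
    sources=["user manual"],
    cautions=["park car on a level surface", "apply hand brake"],
)
```

Optional fields default to empty so that an answer can still be stored when only the title and steps are known.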
The sources may include sources on which the set of steps is based and/or which are identified as conforming to one or more of the steps.
The patent includes more details about attributes that might be associated with steps and also quality measures for each individual step and each attribute that might be associated with a set of steps.
It also tells us about labels that might be associated with steps, such as “best guess” and “highest confidence,” or “lowest confidence.”
How-to queries takeaways
I’ve included a lot of the different aspects of this patent, but it has a lot of details, and I didn’t capture everything.
Read the patent itself for the full details, and hopefully, this post will make it easier to go through the patent.
One of the things I found very interesting in the process behind the patent was how much effort is taken in comparing different sources of information about tasks, and the steps to fulfill those tasks.
I think it is helpful in understanding why some answers may be better than others for how-to queries.