Google Ranking Search Results

by Posted @ Apr 14 2021

How Google Responds to Queries when Ranking Search Results

A patent from Google filed in 2019 and granted in 2021 goes over aspects of Google ranking search results (SERPs).

This patent is assigned to Google, which started as the search engine BackRub, built by Lawrence Page and Sergey Brin when they were graduate students at Stanford. Page filed a patent on PageRank that was assigned to Stanford and later licensed to Google. PageRank is named after him, and the name also describes what it does: it is the query independent part of a ranking system, scoring pages based on the links pointing to them.

The search engine ranks pages in search results using page-level scores and can also apply site-level scores based on quality.

Google does not apply a Domain Authority score such as the Moz DA metric because some sites on the Web are subdomains on sites such as WordPress.com. Google does not use a domain authority score for all of the sites that make up WordPress.com.

Google can apply scores such as site quality scores based on sites, but it does not do so on a domain level. Google does not rank Pages in SERPs using DA scores, and they never have.

This patent points out that the search engine can return public documents and privately accessible documents that only people logged on to the system or granted access to those private pages can access.

This more modern patent about rankings at Google also refers to Entities several times in the patent, which had become part of Google after 2012 when it introduced the Knowledge Graph and started to incorporate information about real-world entities into search results.

This patent was worth spending time with as a review of how a modern search engine works and the role that it may play in many people’s lives.

How a Search Engine Indexes Content and Ranks Search Results

The patent starts out by telling us that search engines provide information about various documents such as:

  • Web pages
  • Images
  • Text documents
  • Multimedia content
  • Other electronic communications

Google responds to a query by returning one or more pages or other types of results.

Those results may be ranked based on their relevance to the query and on other ranking signals.

This patent tells us about page features (characteristics of a page that is responsive to a query) and query features (characteristics of the query itself). Both can be used to decide how to show the search result for that page, which the patent describes as presenting the result with a specific presentation characteristic.

This patent uses the phrase “presentation characteristics” frequently without really defining what it means. For example, Google has developed many unique ways to present information when ranking search results to respond to queries. The patent doesn’t really define most of the differences between an organic search result with a title, a URL, and a snippet acting as a summary, and other ways of presenting search results, such as map pack results, definitions, featured snippets, related questions, knowledge panels, sitemaps, and universal search results. We have written about many of these different presentations even though they aren’t covered in this patent.

Google Ranks Search Results for Access Restricted Documents

One of the topics this patent covers is what it refers to as “access restricted documents.” For example, Google allows a searcher to search through content associated with their Google log-in, such as Gmail, or content associated with Google My Business, such as data about a business that someone has been assigned to manage under their log-in. In addition, a person may see search annotations associated with public or private data tied to their use of the search engine and to access restricted documents.

This patent focuses more on accessing restricted documents than on other content they might see because they may be logged into the search engine. The focus of the patent is on the formatting of search results regardless of whether they are public or private documents, and so it covers both. It starts by focusing on private documents, but it covers search results primarily.

Some of the pages that Google ranks in response to a query could be access restricted documents. These would be accessible only to the searcher who submitted the query and, optionally, to other users designated by that searcher.

The patent notes that measures associated with page features or query features can be used to determine a presentation characteristic.

Such a measure is based on past interactions by users with other pages that share page features with the given page, where many of the other pages are each different from the given page (and each different from one another).

Looking at such measures enables past interactions with other pages to be leveraged in determining interaction-based relevance of the given page, without reference to any query-based past interactions directed to the given page.

The other pages can include, or be restricted to, pages that are themselves access restricted.

How Google is Ranking Search Results Using Query Dependent and Query Independent Features

This patent also refers to “Query Dependent” and “Query Independent” aspects of ranking signals that can play a part in how pages may be ranked when a searcher enters a query into a search box.

When a page has had many other pages linked to it, it is considered a more important page under the PageRank approach. A link to a page is like a footnote in a paper that points to the source of information used on that page. The more citations like that, and the better the citation source, the more important that page is considered to be. This effect of links pointing to a page is considered to be query-independent because it is used to create a score that stays the same whatever query a searcher enters, regardless of the relevance of the link (and its anchor text) to the page being linked to. This query-independent PageRank score is combined with a relevance score to rank a page in response to a query.
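
The citation idea can be illustrated with a simplified, textbook-style PageRank iteration. This is only a sketch: the tiny link graph, the damping factor of 0.85, and the function name `pagerank` are assumptions made for illustration, and nothing here is taken from the patent or from how Google computes scores in production.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: pages a, b, and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
most_important = max(ranks, key=ranks.get)
```

In this toy graph, page c collects the most links and ends up with the highest score, and that score does not change with the query.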

A query-dependent feature for a page is usually based on an information retrieval score that looks at the relevance (and meaning) of words on a page and the anchor text of links pointing to that page. You will see that this new patent points to query features and document features in a search system and references query dependent and query independent features. A determination about whether relevance (or meaning) makes a difference is important for those.

Back to that ranking search results patent…

To determine a presentation characteristic of a search result that corresponds to whether a given page is responsive to a query, a query dependent measure for the given page is generated and used to determine the presentation characteristic.

A query-dependent measure may determine a score for the given page. Then, that score can be used to rank the given page relative to other responsive pages for the query (e.g., based on their corresponding scores, which may also be based on corresponding query dependent measures).

A query-dependent measure may change an initial score for the given page (e.g., a score based on a degree of matching between the query and the given page). The modified score is used to rank the given page relative to other responsive pages for the query.
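
As a rough sketch of that adjustment, the snippet below boosts an initial match score by a query-dependent measure and then sorts by the result. The multiplicative formula, the `weight` parameter, and the sample numbers are assumptions for illustration; the patent does not say how the two values are combined.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    initial_score: float            # e.g. degree of matching with the query
    query_dependent_measure: float  # interaction-based measure, assumed in [0, 1]


def adjusted_score(candidate: Candidate, weight: float = 0.5) -> float:
    # Hypothetical combination: boost the initial score in proportion to the measure.
    return candidate.initial_score * (1.0 + weight * candidate.query_dependent_measure)


def rank(candidates):
    # Highest adjusted score first.
    return sorted(candidates, key=adjusted_score, reverse=True)


candidates = [
    Candidate("a", 0.80, 0.10),
    Candidate("b", 0.70, 0.90),
    Candidate("c", 0.75, 0.00),
]
ranked = [c.doc_id for c in rank(candidates)]
```

Here page b has the weakest initial score but moves to the top because pages sharing its features were interacted with most often for similar queries.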

The ranking may determine which responsive pages are used in providing corresponding search results for presentation in response to the query and/or to determine a presentation order (or other display prominences) for the search results.

The query dependent measure for a given page responsive to a query can be decided based on past interactions between query features of the query and page features of the given page.

Each of the measures may be based on several past interactions, by corresponding users, with other pages having the page features when the other pages were presented in response to corresponding queries having one or more of the query features.

Various past interactions between query features and page features may be used to determine measures such as:

  • Search results selections for other pages in response to the corresponding queries (e.g., a clicked to observed fraction)
  • Page access counts
  • Cursor tracking
  • Touch gestures
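
The first item in the list, the clicked to observed fraction, can be sketched as a simple count over an interaction log. The log format and the feature labels below are made up for illustration, and a real system would likely smooth or weight these counts in ways this sketch does not.

```python
from collections import defaultdict

# Hypothetical log records: (query feature, page feature, was the result clicked?)
log = [
    ("order number", "from:exampleurl.com", True),
    ("order number", "from:exampleurl.com", False),
    ("order number", "from:exampleurl.com", True),
    ("book order", "from:exampleurl.com", False),
]


def clicked_to_observed(records):
    """Return clicks divided by impressions for each (query feature, page feature) pair."""
    observed = defaultdict(int)
    clicked = defaultdict(int)
    for query_feature, page_feature, was_clicked in records:
        key = (query_feature, page_feature)
        observed[key] += 1
        if was_clicked:
            clicked[key] += 1
    return {key: clicked[key] / observed[key] for key in observed}


fractions = clicked_to_observed(log)
```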

The other pages may include, or be restricted to, access restricted documents, such as non-accessible pages that are each personal to one of the other searchers and not accessible to the searcher who submitted the query.

How Google Ranks Search Results Using Query Independent Ranking Signals

A query independent measure for a page can be generated and used to determine the presentation characteristic. It will do so regardless of whether it is relevant to the query.

The query independent measure can be based on past interactions, by other searchers, with other pages that share page features with the given page, when those other pages were shown in response to queries, including queries that have none of the query features.

That query independent measure may indicate the popularity of pages with the same page features, whatever the query. The query dependent measure, by contrast, indicates the popularity of pages having those page features in response to queries having the query features.
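
The difference between the two measures can be sketched with one function that either filters on a query feature or ignores the query entirely. This is a simplification under stated assumptions: the log format, the `airline.example` sender, and pooling every query for the query independent case are illustrative choices, not details from the patent.

```python
# Hypothetical log records: (query features, page features, was the result clicked?)
log = [
    ({"order number"}, {"from:exampleurl.com"}, True),
    ({"book order"}, {"from:exampleurl.com"}, True),
    ({"flight status"}, {"from:exampleurl.com"}, False),
    ({"flight status"}, {"from:airline.example"}, True),
]


def measure(records, page_feature, query_feature=None):
    """Clicked-to-observed fraction for pages having `page_feature`.

    With `query_feature=None` the query is ignored (query independent);
    otherwise only queries having that feature are counted (query dependent).
    """
    shown = 0
    clicked = 0
    for query_features, page_features, was_clicked in records:
        if page_feature not in page_features:
            continue
        if query_feature is not None and query_feature not in query_features:
            continue
        shown += 1
        clicked += int(was_clicked)
    return clicked / shown if shown else 0.0


independent = measure(log, "from:exampleurl.com")
dependent = measure(log, "from:exampleurl.com", "order number")
```

In this toy log the sender's emails are clicked two times out of three overall, but every time when the query contains “order number.”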

A method may include receiving a query entered by a searcher via a user interface input device of the searcher’s computing device.

The method may further include identifying responsive pages responsive to the query, including an email sent to an email address of the searcher.

The method further includes identifying one or more page features for the email.

The page features include at least one email feature based on at least one of:

  • From content, based on its presence in a “From field” of the email
  • Subject content, based on its presence in a “Subject field” of the email

The method further includes identifying query features for the query and generating a query dependent measure for the email based on measures of past interactions between the query features and the page features, where each of the measures is based on past interactions, by corresponding users, with other pages having the page features when the other pages were presented in response to corresponding queries having one or more of the query features.

The ranking search results method further includes:

  • Using the query dependent measure for the email to determine a presentation characteristic for presenting an email search result that corresponds to the email
  • Providing, in response to the query, the email search result for presentation with the presentation characteristic

This method and other technology implementations disclosed herein may include one or more of the following features.

The email feature is based on both the From content in the From field and the Subject content in the Subject field.

The email feature may be a co-occurrence of the From content in the From field and the Subject content in the Subject field.

The From content may include a domain name of a sender email address of the email, and/or the Subject content may include a template that includes one or more terms and one or more placeholders.

At least one email feature is based on the Subject content in the Subject field, and the Subject content includes a template that includes one or more terms and one or more placeholders.
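
A minimal sketch of how such email features might be derived is shown below. The rule that any token containing a digit becomes a `[#]` placeholder, the `from:` and `subject:` labels, and the function names are all assumptions for illustration; the patent does not describe how templates are extracted.

```python
import re


def from_domain(from_field: str) -> str:
    # Keep only the domain name of the sender address.
    return from_field.rsplit("@", 1)[-1].strip(" >").lower()


def subject_template(subject: str) -> str:
    # Assumed rule: any token containing a digit becomes a placeholder.
    tokens = subject.split()
    return " ".join("[#]" if re.search(r"\d", token) else token for token in tokens)


def email_features(from_field: str, subject: str) -> set:
    domain = from_domain(from_field)
    template = subject_template(subject)
    return {
        f"from:{domain}",
        f"subject:{template}",
        # Co-occurrence of the From content and the Subject content.
        f"from:{domain}|subject:{template}",
    }


features = email_features("store@exampleurl.com", "Confirmation of Order 1A2B3C")
```

Because the order number is replaced by a placeholder, every order confirmation from the same sender maps to the same features, which is what lets interactions with other people's emails say something about this one.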

In some implementations, the other documents on which the measures are based exclude the email.

The method further includes:

  • Generating a query independent measure for the email based on additional measures of additional past interactions with the document features in response to additional queries not having any of the query features
  • Using the query independent measure for the email to determine the presentation characteristic for presenting the email search result that corresponds to the email

Using the query dependent measure for the email to determine the presentation characteristic comprises:

  • Determining a score for the email based on the query dependent measure
  • Determining additional scores for other of the responsive documents
  • Ranking the email relative to the other of the responsive documents based on the score and the additional scores
  • Determining the presentation characteristic based on the ranking.
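
Those four steps can be sketched as follows. The patent leaves “presentation characteristic” largely undefined, so the `position` and `highlighted` fields here are purely hypothetical stand-ins, as are the scores.

```python
def presentation_characteristics(scores: dict, top_n: int = 1) -> dict:
    """Rank documents by score and derive a hypothetical presentation characteristic."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    result = {}
    for position, doc_id in enumerate(ordered, start=1):
        result[doc_id] = {
            "position": position,              # presentation order
            "highlighted": position <= top_n,  # assumed display prominence flag
        }
    return result


# Made-up scores for the email and the other responsive documents.
scores = {"email-1": 0.42, "email-2": 0.91, "email-3": 0.17}
layout = presentation_characteristics(scores)
```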

The document features may further include a category of the email. The method may include using a machine learning model to determine the category of the email.

Past interactions with other documents having the document features can include selections of the other documents.

A method includes receiving a query entered by a user via a user interface input device of a computing device and identifying responsive documents responsive to the query.

The ranking search results method further involves identifying query features for the query and, for each of many of the access restricted documents:

  • Identifying document features for the access restricted document
  • Generating a query dependent measure for the access restricted document based on measures of past interactions between the query features and the document features, where each of the measures is based on a number of the past interactions, by corresponding users, with other documents having the document features when the other documents were presented in response to corresponding queries having query features, and where the other documents may include many non-accessible documents that are not accessible to the user

The method further includes using the query dependent measures for the access restricted documents to determine a presentation order for the responsive documents and providing, in response to the query, one or more of the responsive documents for presentation based on the presentation order.

This method and other technology implementations disclosed herein may include one or more of the following features.

The document features for the access restricted document may comprise a template included in a particular field of the access restricted document.

The other documents may exclude one or more of the access restricted documents.

The other documents on which a given measure of the measures is based may consist of non-accessible documents that are not accessible to the user.

The method further includes: for each of the access restricted documents, generating a query independent measure for the access restricted document based on additional measures of additional past interactions with the document features in response to additional queries not having any of the query features; and further using the query independent measures for the access restricted documents to determine the presentation order for the responsive documents.

A method may include:

  • Receiving a query entered by a user via a user interface input device of a computing device of the user
  • Identifying responsive documents that are responsive to the query
  • Identifying query features for the query

The method further includes, for each of a number of the documents:

  • Identifying document features for the document
  • Generating a query dependent measure for the document based on measures of past interactions between the query features and the document features, where each of the measures is based on a quantity of the past interactions, by corresponding users, with other documents having document features when the other documents were presented in response to corresponding queries having the query features, and where the other documents include many documents in addition to the document

The method further includes using the query dependent measures for the documents to determine a presentation order for the responsive documents and providing, in response to the query, one or more of the responsive documents for presentation based on the presentation order.

This ranking search results method may include:

  • Selecting several document features and several query features
  • Selecting each document feature based on its occurrence in access restricted documents of at least a threshold quantity of users
  • Selecting each query feature based on its occurrence in access restricted queries of at least a threshold quantity of users
  • Treating as access restricted queries those for which at least one of the access restricted documents was provided in response
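
The threshold idea in that list can be sketched by counting distinct users per feature. The input format, the feature labels, and the threshold of three users are assumptions for illustration only.

```python
from collections import defaultdict


def select_features(occurrences, min_users: int) -> set:
    """Keep features that occur for at least `min_users` distinct users.

    `occurrences` is an iterable of (user_id, feature) pairs.
    """
    users_by_feature = defaultdict(set)
    for user_id, feature in occurrences:
        users_by_feature[feature].add(user_id)
    return {
        feature
        for feature, users in users_by_feature.items()
        if len(users) >= min_users
    }


occurrences = [
    ("u1", "from:exampleurl.com"),
    ("u2", "from:exampleurl.com"),
    ("u3", "from:exampleurl.com"),
    ("u1", "from:friend.example"),
    ("u1", "from:friend.example"),  # repeated for the same user, counted once
]
selected = select_features(occurrences, min_users=3)
```

A feature seen in only one person's mailbox is dropped, so only features shared widely enough across users contribute to the stored measures.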

The method further includes, for each of several (query feature, document feature) tuples that each include at least one of the query features and at least one of the document features, generating a past interaction measure between the query features and the document features of that tuple.

Generating the past interaction measure is based on many past interactions with corresponding documents of the access restricted documents when the corresponding documents were presented in response to corresponding queries of the access restricted queries, where the corresponding documents have the document features of the query feature, document feature tuple, and where the corresponding queries have the query feature of the query feature, document feature tuple.

The method further includes storing, in one or more computer-readable media, each of the past interaction measures associated with a corresponding query feature, document feature tuple.
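
A minimal sketch of generating and storing those measures is shown below, using an in-memory dictionary as the stand-in for the computer-readable media. It assumes each tuple is a simple pair of one query feature and one document feature and uses the fraction of selections as the measure; the patent allows tuples with more features and other kinds of interactions.

```python
from collections import defaultdict
from itertools import product


def build_measures(interactions) -> dict:
    """Map each (query feature, document feature) pair to a selection fraction.

    `interactions` is an iterable of (query_features, doc_features, was_selected).
    """
    shown = defaultdict(int)
    selected = defaultdict(int)
    for query_features, doc_features, was_selected in interactions:
        # Credit every pairing of a query feature with a document feature.
        for key in product(sorted(query_features), sorted(doc_features)):
            shown[key] += 1
            selected[key] += int(was_selected)
    return {key: selected[key] / shown[key] for key in shown}


# Hypothetical past interactions.
interactions = [
    ({"book order", "order number"}, {"from:exampleurl.com"}, True),
    ({"order number"}, {"from:exampleurl.com"}, False),
]
store = build_measures(interactions)
```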

This method and other technology implementations disclosed herein may include one or more of the following features.

The method further includes:

  • Identifying a new document that is responsive to a new query of a given user and that includes a new query group of the document features
  • Generating a measure for the new document based on a group of the past interaction measures

The group of the past interaction measures may be based on the past interaction measures of the group in association with query feature, document feature tuples that each include at least one of the document features of the new query group.

The method further includes providing the new document in response to the new query based on the measure.

The group of the past interaction measures is further selected based on past interaction measures of the group in association with query feature, document feature tuples that each include at least one query feature of the new query.

The new document may be omitted from the access restricted documents used in generating the past interaction measures.
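
Scoring a new document from the stored measures can be sketched as a lookup followed by an aggregation. Averaging the selected group is an assumption made for illustration, as are the stored values; the patent does not specify how the group of measures is combined.

```python
def measure_for_new_document(store: dict, query_features: set, doc_features: set) -> float:
    """Average the stored measures whose tuple matches both the query and the document."""
    group = [
        value
        for (query_feature, doc_feature), value in store.items()
        if query_feature in query_features and doc_feature in doc_features
    ]
    return sum(group) / len(group) if group else 0.0


# Hypothetical stored past interaction measures.
store = {
    ("book order", "from:exampleurl.com"): 0.5,
    ("order number", "from:exampleurl.com"): 0.75,
    ("flight status", "from:exampleurl.com"): 0.25,
}
score = measure_for_new_document(
    store,
    query_features={"book order", "order number"},
    doc_features={"from:exampleurl.com"},
)
```

The new document itself never appears in the store. It gets a score only because it shares features with documents that were interacted with before.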

A method can include selecting several document features and selecting a plurality of query features.

The method further includes, for each of many query feature, document feature tuples that each include at least one of the query features and at least one of the document features: generating a past interaction measure between the query features and the document features of the query feature, document feature tuple, where: generating the past interaction measure is based on several past interactions with corresponding documents when the corresponding documents were presented in response to corresponding queries; the corresponding documents have the document features of the query feature, document feature tuple; and the corresponding queries have the query feature of the query feature, document feature tuple.

The method further includes storing, in one or more computer-readable media, each of the past interaction measures associated with a corresponding query feature, document feature tuple.

Other implementations may include one or more non-transitory computer-readable storage media storing instructions executable by one or more processors to perform a method such as one or more of the methods described herein.

Yet another implementation may include a system including memory and one or more processors operable to execute instructions stored in the memory, to perform a method such as one or more of the methods described herein.

All combinations of the foregoing concepts and additional concepts described on ranking search results are considered part of the subject matter disclosed herein.

In some implementations, the ranking search results method further includes:

  • Identifying a new document that is responsive to a new query of a given user and that includes a new query group of the document features
  • Generating a measure for the new document based on a group of the past interaction measures

The group of the past interaction measures may be selected based on the past interaction measures of the group being stored in association with query feature, document feature tuples that each include at least one of the document features of the new query group.

The method further includes providing the new document in response to the new query based on the measure.

The group of the past interaction measures may further be selected based on the past interaction measures of the group being stored in association with query feature, document feature tuples that each include at least one query feature of the new query.

The new document can be omitted from the access restricted documents used in generating the past interaction measures.

A method can be provided, including selecting a plurality of document features and selecting a plurality of query features.

The method may further include, for each of many query feature, document feature tuples that each include at least one of the query features and at least one of the document features: generating a past interaction measure between the query features and the document features of the query feature, document feature tuple, where: generating the past interaction measure is based on many past interactions with corresponding documents when the corresponding documents were presented in response to corresponding queries; the corresponding documents have the document features of the query feature, document feature tuple; and the corresponding queries have the query feature of the query feature, document feature tuple.

The method would further include storing, in one or more computer-readable media, each of the past interaction measures in association with a corresponding query feature, document feature tuple.

All combinations of the foregoing concepts and additional concepts described in greater detail herein are considered part of the subject matter disclosed herein.
For example, all combinations of claimed subject matter appearing at the end of this disclosure are considered part of the subject matter disclosed herein.

This patent on how Google ranks search results is found at:

Ranking search result documents
Inventors: Mike Bendersky, Marc Alexander Najork, Donald Metzler, and Xuanhui Wang
Assignee: GOOGLE LLC
US Patent: 10,970,293
Granted: April 6, 2021
Filed: August 26, 2019

Abstract

Methods and apparatus related to using document feature(s) of a document that is responsive to a query, and optionally query feature(s) of the query, to determine a presentation characteristic for presenting a search result that corresponds to the document.

Measures associated with the document feature(s) and/or query feature(s) may be used to determine the presentation characteristic.

The measures may be based on past interactions, by corresponding users, with other documents that share one or more of the document’s features. A plurality of the other documents are different from the document (and optionally, each different from one another).

In some implementations, the document and/or the other documents include, or are restricted to, documents that are access restricted.

Ranking Search Results of Access Restricted Documents

The patent tells us that some processes to rank search results from the patent may be applicable to access restricted documents. This would mean documents that a searcher may have been able to search, such as his or her own emails.

We are also told that “access restricted documents” can be contrasted with publicly accessible documents, which are freely accessible to the public via the World Wide Web. Access restricted documents are electronic documents accessible only to a restricted group of users.

Access to access restricted documents may be limited to a restricted group of users based on the login credentials of that group, based on the documents being accessible only via a private network that is open to that group, and/or based on other techniques.

These “access restricted documents of a user” are access restricted and accessible to only the user and optionally to a restricted group of one or more other users that can be designated or otherwise controlled by the user.

An access restricted document of a user may be accessible to only the user as a function of:

  • Being stored locally on a computing device controlled by the user
  • Being accessible via one or more computer applications via appropriate login credentials of the user, etc.

For instance, the user’s emails may be access restricted documents of the user that are accessible to only the user via appropriate login credentials of the user.

Also, heterogeneous documents of a user stored in a cloud-based storage system may be access restricted documents of the user that are accessible to only the user via appropriate login credentials of the user.

Optionally, one or more of the heterogeneous documents may also be accessible to a restricted group of other users based on an explicit authorization by the user via one or more computer applications (for instance, shared documents in a program such as Google Docs).

We are also told that various documents stored locally on a mobile phone, tablet, desktop, and/or other computing devices of a user may be access restricted documents of the user due to being stored locally on the user’s computing device(s).

User Interaction Data Used in Ranking Search Results

The patent also tells us about user interaction data (e.g., click-through rate) used to rank particular publicly accessible search result documents for particular queries.

This user interaction data may indicate that for a particular search query, a particular publicly accessible search result document responsive to the particular search query has a click-through rate for that particular search query that far exceeds that of any other publicly accessible search result documents that are responsive to the particular search query.

Based on that indication, a search result corresponding to the particular publicly accessible search result document may be ranked more prominently (e.g., provided for presentation more prominently), for the particular search query, than search results for the other responsive publicly accessible search result documents.

Some User Interaction Data May Not be used in Ranking Search Results for Limited Access Data

The patent then tells us that some techniques related to using user interaction data to rank publicly accessible search results for particular queries may not apply to various documents and/or may not provide desired performance.

Various techniques may not apply to various access restricted documents (e.g., access restricted documents of a user submitting a query) and/or to various publicly accessible documents (e.g., publicly accessible documents with no and/or relatively few interactions in response to queries).

Assume someone submits a search query to search their personal email and that several responsive emails (that are access restricted documents of the user) are returned as responsive to the search query (e.g., the emails include one or more terms that match one or more terms of the search query).

One or more of the responsive emails may have never been presented and/or interacted with in response to prior searches of other users and/or of the user.

A particular email may have been sent only to the user and may be one with which the user has never previously interacted in response to a prior search query.

So, there may not be any user interaction data associated with the particular email, rendering various techniques related to using user interaction data to rank publicly accessible search results ineffective in ranking the particular email.

As another example, assume a searcher submits a query to search a corpus of access restricted documents accessible to a restricted group of users. Again, a plurality of responsive documents is identified as responsive to the search query.

One or more of the responsive documents may have never been presented and/or interacted with in response to prior submissions of the search query and/or may have been presented and/or interacted with only a de minimis amount in response to prior submissions of the search query.

There may not be sufficient searcher interaction data associated with such pages in response to the search query, rendering various techniques related to using user interaction data to rank publicly accessible search results ineffective in ranking such documents.

As yet another example, assume someone submits a search query to search a corpus of publicly accessible documents. Again, a plurality of responsive documents is identified as responsive to the search query.

The responsive documents may have never been presented and/or interacted with in response to prior submissions of the search query and/or may have been presented and/or interacted with only a de minimis amount in response to prior submissions of the search query.

So, there may not be sufficient user interaction data associated with such documents in response to the search query, rendering various techniques related to using user interaction data to rank publicly accessible search results ineffective in ranking such documents.

This patent presents various technical features related to using document features of a document that is responsive to a query, and optionally query features of the query, to determine a presentation characteristic for presenting a search result that corresponds to the document–and, in response to the query, providing the search result for presentation with the presentation characteristic.

Measures that are associated with the document features and/or the query features may be used to determine the presentation characteristic.

Those measures may be based on past interactions, from previous searchers, with other documents sharing document features with the document. Many other documents are different from the document (and optionally, each different from one another).

Using such measures enables those past interactions to be leveraged in determining interaction-based relevance of the access restricted document, optionally without reference to any past interactions that are specifically directed to the access restricted document.

The other documents can include, or be restricted to, documents that are themselves access restricted.

In determining a presentation characteristic of a search result corresponding to a responsive document, a query-dependent measure for the access restricted document is generated and used to determine the presentation characteristic.

The query dependent measure can be based on past interactions between query features of the query and document features of the document.

Each of the measures may be based on a quantity of the past interactions, by previous searchers, with other documents having one or more of the document features when the other documents were presented in response to corresponding queries having one or more of the query features.

For example, assume a searcher uses an email search interface to submit the query “book order number.”

A corpus of the searcher’s emails, which are access restricted documents of the user, may be searched, and a plurality of emails is identified as responsive to the query.

A particular responsive email may be from “store@exampleurl.com,” may include a subject of “Confirmation of Order 1A2B3C”, and may include a body with content that identifies a particular book purchased by the user, along with details of the purchase (e.g., date of purchase, shipping address, delivery date, cost).

The particular responsive email may have never been interacted with by other users in response to queries of the other users (i.e., since it is personal to the user and not accessible to the other users), and may never have been interacted with by the user in response to a query of the user. However, techniques described in the patent may still be used to determine a query-dependent measure for the particular email based on past interactions between query features of the query “book order number” and document features of the particular email.

For example, a first measure of past interactions may be determined based on a quantity of interactions of multiple users with other emails that include “store@exampleurl.com” in a From field and “Confirmation of Order [#]” (where [#] is a placeholder indicating an alpha and/or numeric string) in a Subject field, when those other emails were presented in response to corresponding queries having n-grams of “book order.”

Also, for example, a second measure of past interactions may be determined based on a quantity of interactions of multiple users with other emails that include “store@exampleurl.com” in a From field and “Confirmation of Order [#]” in a Subject field, when those other emails were presented in response to corresponding queries having n-grams of “order number.”

The query dependent measure may be generated based on the first measure, the second measure, and optionally other similarly determined measures.

The query dependent measure may be a sum, average, median, or other statistical combination of the measures.
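As a rough illustration of that combination step (this is not code from the patent), the sketch below combines two per-feature measures for the example email. The interaction counts and the `combine_measures` helper are invented for the example.

```python
from statistics import mean, median

# Hypothetical past-interaction counts for the example email's features,
# keyed by the query n-gram they were observed with. The numbers are invented.
feature_measures = {"book order": 120, "order number": 80}


def combine_measures(measures, how="sum"):
    """Combine per-feature measures into one query dependent measure."""
    values = list(measures)
    if not values:
        return 0.0
    if how == "sum":
        return float(sum(values))
    if how == "mean":
        return float(mean(values))
    if how == "median":
        return float(median(values))
    raise ValueError(f"unknown combination: {how}")


query_dependent_measure = combine_measures(feature_measures.values(), how="mean")
print(query_dependent_measure)
```

Which combination works best is left open by the patent; a sum rewards documents that match many features, while a mean or median is less sensitive to how many features a document has.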

The query-dependent measure may be used to determine a presentation characteristic for the particular responsive email.

The query dependent measure may be utilized to modify an initial score for the particular responsive email (e.g., a score based on a degree of matching between the query and the particular email), and the score utilized to rank the particular email relative to other responsive emails (e.g., based on optionally modified initial scores for those emails).

The ranking may be used to determine which responsive emails are initially used to provide corresponding search results for presentation in response to the query, to determine a presentation order (or other display prominences) for the search results, and/or to determine additional or alternative presentation characteristics for search results.

Query Independent Measures to Rank Search Results

A query independent measure for the document is generated and could be used to determine the presentation characteristic.

The query independent measure could be based on measures of past interactions, by previous searchers, with other documents having one or more of the document features of the document when the other documents were presented in response to corresponding queries, where those queries include, or are restricted to, queries that do not include any of the query features.

The query independent measure may indicate the overall popularity of documents having the document features. The query dependent measure indicates the popularity of documents having the document features in response to queries having the query features.

A query dependent measure or a query independent measure of a document may be generated based on a query feature–document feature model.

The query feature–document feature model may be generated based on a query-document model, a document-feature model, and/or a query-feature model.

The query-document model may be a bipartite graph that models the interactions between queries and documents, as indicated by one or more stored records of past queries and corresponding interactions.

The nodes of the query-document graph may indicate queries and documents.

The edges may be between query and document nodes. They may each represent whether the corresponding document was observed for the corresponding query (e.g., a corresponding search result presented in response to the corresponding query) and/or whether the document was interacted with (e.g., selection of a corresponding search result) for the corresponding query.
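A minimal sketch of such a query-document graph, assuming a simple interaction log of (query, document, clicked) records. The log format, the `Edge` class, and the document IDs are assumptions made for illustration, not details from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Edge:
    observed: int = 0  # times the document was shown for the query
    clicked: int = 0   # times the corresponding result was selected


def build_query_document_graph(log):
    """Build edges from an iterable of (query, document_id, was_clicked) records."""
    edges = defaultdict(Edge)
    for query, doc_id, was_clicked in log:
        edge = edges[(query, doc_id)]
        edge.observed += 1
        if was_clicked:
            edge.clicked += 1
    return dict(edges)


# Hypothetical log records.
log = [
    ("book order number", "email-1", True),
    ("book order number", "email-2", False),
    ("book order number", "email-1", False),
]
graph = build_query_document_graph(log)
print(graph[("book order number", "email-1")])
```

Storing the graph as a dictionary keyed by (query, document) pairs keeps the two node types implicit, which is enough for the feature-level transformations described later.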

The document-feature model may be a bipartite graph that models the relationship between documents and their document features.

Various features may be utilized, such as category features, structural features, and/or n-gram features.

The category features of a document may indicate categories to which the document belongs and may be based on applying features of the document to a classifier or other machine learning model and determining the category features based on the output generated over the machine learning model.

As an example of categories, emails may belong to finance, travel, order confirmation, and/or other categories.

Structural features may indicate templates and/or other contents of particular structural fields of documents.

For emails or other electronic communications, structural features may include:

  • From content included in a From field of the electronic communication (e.g., a domain name of a sender’s email address, a relationship of the sender to the user)
  • Subject content included in a Subject field of the electronic communication (e.g., a particular template to which the Subject field conforms such as “Confirmation of Order [#]”)
  • A co-occurrence of particular From content and particular Subject content (i.e., the From Content and the Subject Content both occurring in their respective fields)

Also, for example, structural features of an access restricted document may include a file type feature that is based on a file extension of the access restricted document.

Other structural features may include content, such as template(s) and/or n-grams that appear in one or more particular additional and/or alternative fields of a document, such as in a title field of a document; in a title, location, and/or notes field of a calendar entry document; etc.
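To make the structural features more concrete, here is a small sketch that derives From, Subject template, co-occurrence, and file type features. The feature string format and the rule that any token containing a digit becomes [#] are my assumptions; the patent does not say how templates are detected.

```python
import re
from pathlib import PurePosixPath


def subject_template(subject):
    """Replace alphanumeric tokens that contain a digit with the [#] placeholder."""
    return re.sub(r"\b(?=\w*\d)\w+\b", "[#]", subject)


def sender_domain(from_field):
    """Return the domain of a bare email address such as user@example.com."""
    return from_field.rsplit("@", 1)[-1].lower()


def structural_features(from_field, subject, filename=None):
    domain = sender_domain(from_field)
    template = subject_template(subject)
    features = {
        f"from:{domain}",
        f"subject:{template}",
        # Co-occurrence of the From content and the Subject content.
        f"from+subject:{domain}|{template}",
    }
    if filename:
        extension = PurePosixPath(filename).suffix.lower()
        if extension:
            features.add(f"filetype:{extension}")
    return features


print(sorted(structural_features("store@exampleurl.com", "Confirmation of Order 1A2B3C")))
```

Because the order number is replaced by a placeholder, the resulting features are shared by every confirmation email from the same sender, which is what lets interactions with other people's emails inform the ranking of this one.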

The query-feature model may be a bipartite graph that models the relationship between queries and their query features.

The query features of a query may include:

  • N-grams appearing in the query (e.g., the longest n-gram appearing in the query)
  • Entities referenced in the query (e.g., a particular person, place, and/or thing)
  • Entity categories referenced in the query (e.g., city, person’s name, location, restaurant)
  • Grammatical features of the query, etc.
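The n-gram features in that list are the easiest to sketch. The helper below is an assumption about how they might be extracted (whitespace tokenization, lowercasing); entity and grammatical features would need an entity recognizer or parser and are left out.

```python
def query_ngrams(query, max_n=2):
    """Return all word n-grams of the query up to length max_n, shortest first."""
    tokens = query.lower().split()
    ngrams = []
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[start:start + n]))
    return ngrams


print(query_ngrams("book order number"))
# The longest n-gram is the last one returned when max_n covers the whole query.
print(query_ngrams("book order number", max_n=3)[-1])
```

For the query “book order number,” this yields the bigrams “book order” and “order number” used in the email example earlier.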

The query feature–document feature model may be a bipartite graph generated using the query-document graph, the document-feature graph, and the query-feature graph.

The query feature–document feature model can model the interactions between document features and query features.

In other words, it can model interactions between document and query features instead of interactions directly between queries and documents.

It is generated based on transforming the query-document model to the “document features” and “query features” space collectively modeled by the document-feature and query-feature models.

Only features (query or document) that are present at least a threshold number of times (in queries or documents) and/or for at least a threshold number of searchers may be used in generating the query-feature, document-feature, and/or the query feature–document feature graphs.

This may ensure features do not include sensitive information by ensuring those features occur at least a threshold number of times and/or for at least a threshold number of users.

The query feature–document feature model may be used to determine, for a given document, a query independent measure and/or query dependent measure for the given document.

For example, to determine a query-dependent measure for a given query having given query features, edges between the given query features and document features of the given document may be determined.

Each of the edges provides a measure of past interactions between a corresponding query feature and a corresponding document feature.

The measures may be combined (e.g., summed and/or other statistical combinations) to determine the query dependent measure.

Also, to determine a query independent measure for the given document, edges between all query features and document features may be determined.

The measures may be combined (e.g., summed and/or other statistical combinations) to determine the query independent measure.
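A minimal sketch of both lookups, assuming the query feature–document feature graph is stored as a dictionary of edge weights and that the measures are combined by summing. The edge values and feature names are invented.

```python
# Hypothetical edge weights: (query feature, document feature) -> past interactions.
edges = {
    ("book order", "subject:Confirmation of Order [#]"): 40,
    ("order number", "subject:Confirmation of Order [#]"): 25,
    ("order number", "from:exampleurl.com"): 10,
    ("flight", "from:exampleurl.com"): 3,
}


def query_dependent_measure(edges, query_features, doc_features):
    """Sum the edges between this query's features and the document's features."""
    return sum(
        edges.get((qf, df), 0) for qf in query_features for df in doc_features
    )


def query_independent_measure(edges, doc_features):
    """Sum the edges between all query features and the document's features."""
    doc_features = set(doc_features)
    return sum(weight for (_, df), weight in edges.items() if df in doc_features)


doc = ["subject:Confirmation of Order [#]", "from:exampleurl.com"]
print(query_dependent_measure(edges, ["book order", "order number"], doc))
print(query_independent_measure(edges, doc))
```

The only difference between the two functions is which query features are considered: the query's own features for the dependent measure, every query feature in the graph for the independent one.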

Where Techniques Disclosed in the Ranking Search Results patent may be used

This example environment can include:

  • A client device
  • A search system
  • A past interaction measures system
  • A document measure system

The example environment also includes the personal corpora of a searcher of the client device.

The personal corpora may be stored on one or more corresponding non-transitory computer-readable media, which may be on the client device and/or remote from the client device (e.g., on one or more remote servers).

The personal corpora may each store one or more access restricted documents of the user such as electronic communications of the user (e.g., emails, SMS messages, chat messages, social networking messages), media files (e.g., audio files, image files, video files), word processing documents, calendar entries, contact entries, etc.

The example environment also includes a query-document model stored on one or more non-transitory computer-readable media.

The query-document model may be a bipartite graph that models the interactions between queries and documents (including, or restricted to, access restricted documents), as indicated by one or more stored records of past queries and corresponding interactions.

For example, the query-document model may be generated based on records of past queries and corresponding interactions provided by the search system and/or other search systems based on interactions with the search system(s) by multiple users via multiple corresponding client devices.

The example environment further includes additional models generated by the past interaction measures system and utilized by the document measure system.

For example, the additional models may include at least a query feature–document feature model.

A searcher of the client device can submit queries to the search system via one or more user interface input devices of the client device.

In response to a query from the client device, the search system searches the personal corpora to identify access restricted documents of the user that are responsive to the search query using conventional and/or other information retrieval techniques.

The personal corpora may include an index that indexes documents thereof based on one or more features, and the search system identifies responsive documents using the index.

The search system may also search corpora that include, or are restricted to, access restricted documents that are not access restricted documents of the user and/or publicly accessible documents.

The search system's ranking engine calculates scores for the documents responsive to a search query using one or more ranking signals.

Each of the ranking signals provides information about the document itself and/or the relationship between the document and the search query.

The ranking signals that the ranking engine uses to calculate a score for a document include a query dependent measure and/or a query independent measure generated by the document measure system according to the patent.

The ranking engine may use additional ranking signals, such as those indicating a degree of matching between the given document and the search query.

Those additional ranking signals for a document may be based on:

  • Whether each of one or more query terms appears in the document
  • Where each of one or more query terms appears in the document
  • The term frequency of each of one or more of the query terms that appear in the document
  • The document frequency of each of one or more of the query terms that appear in the document
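Those four term-based signals can be computed with a few lines of code. The sketch below assumes documents are already tokenized into lowercase words; the signal names and sample documents are invented.

```python
def term_signals(query_terms, doc_tokens, corpus):
    """Compute simple term-matching signals for one document.

    corpus is a list of token lists and is used for document frequency.
    """
    signals = {}
    for term in query_terms:
        positions = [i for i, token in enumerate(doc_tokens) if token == term]
        signals[term] = {
            "present": bool(positions),
            "first_position": positions[0] if positions else None,
            "term_frequency": len(positions),
            "document_frequency": sum(1 for doc in corpus if term in doc),
        }
    return signals


doc = "your book order number is 1a2b3c order shipped".split()
corpus = [doc, "flight booking confirmed".split(), "order received".split()]
print(term_signals(["order", "flight"], doc, corpus))
```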

The ranking engine then ranks the responsive documents using the scores.

The search system uses the responsive documents ranked by the ranking engine to generate search results to provide in response to the query.

The search results include search results corresponding to the documents responsive to the search query.

Each search result can include:

  • A title of a document
  • A link to a document
  • A summary of the document

The summary of the content may include a “snippet” or section of the document that is responsive to the search query.

    For a search result associated with an image document, the search result may include:

    • A reduced size display of the image document
    • A title associated with the image document
    • A link to the image document

    For a search result associated with a video, the search result may include:

    • An image from the video
    • A segment of the video
    • A title of the video
    • A link to the video

    Other Search Results

    Other search results include a summary of information responsive to the search query.

    The summary is generated from documents responsive to the search query and/or from other sources.

    Those search results are provided in a form that enables them to be presented to a searcher using user interface output devices on the client device.

    How Ranking Search Results (SERPs) is Done

    The document measure system may include:

    • A document features engine
    • A query features engine
    • A query dependent measure engine
    • A query independent measure engine

    The patent tells us that various query features may be identified, such as:

    • N-grams appearing in the query
    • Entities referenced in the query
    • Entity categories referenced in the query
    • Grammatical features
    • Etc.

    The query-dependent measure engine generates a query-dependent measure for each of the documents.

    In determining a query dependent measure for a document, the query dependent measure engine determines past interaction measures that are assigned, in the model, to the query features and document features determined by the query features engine and the document features engine.

    For example, assume query features QF1 and QF2 for a query (where QF indicates a query feature) and document features DF1, DF2, and DF3 for an access restricted document responsive to the query (where DF indicates a document feature).

    The query-dependent measure engine may determine a past interaction measure for each of the pairs QF1 and DF1, QF1 and DF2, QF1 and DF3, QF2 and DF1, QF2 and DF2, and QF2 and DF3.

    The query dependent measure engine may generate the query dependent measure for the access restricted document based on a combination of the six separate past interaction measures.

    Each of the past interaction measures utilized by the query dependent measure engine may be based on a quantity of the past interactions, by corresponding users, with other documents having one or more of the document features when the other documents were presented in response to corresponding queries having one or more of the query features.

    The other documents themselves may include, or be restricted to, a plurality of access restricted documents, such as non-accessible documents that are each personal to a corresponding one of the other users and that are not accessible to the user. Additional description of generating past interaction measures is provided herein.

    The query independent measure engine generates a query independent measure for each of the documents.

    In determining a query independent measure for a document, the query independent measure engine determines past interaction measures that are assigned, in the model, to a group of query features and the document features determined by the engine.

    The group of query features includes or is restricted to query features in addition to those determined by the query features engine.

    Accordingly, the group of query features is independent of the query for which the document is responsive. It includes query features that are in addition to query features of the query. As one example, assume the document features DF1, DF2, and DF3 for an access restricted document (where DF indicates a document feature).

    The query independent measure engine may determine:

    • All of the past interaction measures between the group of query features and DF1
    • All of the past interaction measures between the group of query features and DF2
    • All of the past interaction measures between the group of query features and DF3

    For instance, assume the group of query features includes query features QF1-QF1000.

    For DF1, past interaction measures may be determined for QF1 and DF1, QF2 and DF1, QF3 and DF1, . . . , and QF1000 and DF1.

    The query independent measure engine may generate the query independent measure based on those past interaction measures.

    The document measure system provides the query dependent measure and/or the query independent measure for each of the documents to the search system.

    The ranking engine may utilize the query dependent measures and/or the query independent measures in ranking the documents and may use the ranking in determining a presentation order and/or other presentation characteristics for search results for the documents.

    The ranking engine utilizes the query dependent measure and/or the query independent measure to determine a score for the document and uses the score to rank the document.

    For example, the ranking engine may adjust a base score for the document (e.g., a base score based on other ranking signals) based on the query dependent measure and/or the query independent measure to create a modified score.

    For example, assume a base score of $sc_b$ for a document for a query.

    This base score can be based on keyword matching and/or other ranking signals.

    The ranking engine may determine a final score, $sc_f$, based on $f(sc_b, M_d, M_{q,d})$, where $M_d$ represents the query independent measure for the document and $M_{q,d}$ represents the query dependent measure for the document.

    $f(\cdot)$ can optionally be a hand-tuned scoring function or a machine-learned ranking function.

    In some implementations, the ranking engine keeps the base score $sc_b$ fixed and trains an adjustment $\delta(M_d, M_{q,d})$ over the base score.

    The scoring function thus becomes: $f(sc_b, M_d, M_{q,d}) = sc_b + \delta(M_d, M_{q,d})$.

    This adaptive formulation may be beneficial for environments where the base score is already highly optimized and optionally disjoint with the query independent and/or query dependent measures.
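A minimal sketch of that "fixed base score plus adjustment" idea. The patent describes a trained adjustment; here a hand-set, log-dampened linear function stands in for it, and the weights, candidate emails, and numbers are all invented.

```python
import math


def adjustment(query_independent, query_dependent, w_independent=0.1, w_dependent=0.3):
    """Stand-in for the learned adjustment; the weights are invented, not trained."""
    # log1p dampens very large interaction counts.
    return (
        w_independent * math.log1p(query_independent)
        + w_dependent * math.log1p(query_dependent)
    )


def final_score(base_score, query_independent, query_dependent):
    """Leave the base score untouched and add the adjustment on top of it."""
    return base_score + adjustment(query_independent, query_dependent)


# (document id, base score, query independent measure, query dependent measure)
candidates = [("email-1", 1.2, 50, 75), ("email-2", 1.3, 5, 0)]
ranked = sorted(candidates, key=lambda c: final_score(c[1], c[2], c[3]), reverse=True)
print([doc_id for doc_id, *_ in ranked])
```

Because the base score is never modified, a document with no interaction history simply keeps its base score, and the existing ranker does not need to be retrained.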

    The past interaction measures system may include a query-document model engine, a document-feature model engine, a query-feature model engine, and/or a query feature–document feature model engine.

    All or some aspects of these engines may be omitted, combined, and/or implemented separately from the past interaction measures system.

    The query-document model engine generates the query-document model.

    The search system may implement the query-document model engine.

    The query-document model may be, for example, a bipartite graph that models the interactions between queries and documents, as indicated by one or more stored records of past queries and corresponding interactions.

    For example, the nodes of the query-document graph may indicate queries and documents.

    The edges may be between query and document nodes. They may each represent, for example, whether the corresponding document was observed for the corresponding query (e.g., a corresponding search result presented in response to the corresponding query) and/or whether the document was interacted with (e.g., selection of a corresponding search result) for the corresponding query.

    In some implementations, each of the edges may include a binary representation of whether an interaction occurred.

    The edges may be weighted based on the type of interaction.

    A selection of a search result followed by access of the underlying document for at least a threshold time duration may be weighted more heavily than a selection that is followed by access of the underlying document that is not for the threshold time duration, which may be weighted more heavily than a “hover” over the search result without a resulting selection.
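The ordering described above (long click, then short click, then hover) can be sketched as follows. The numeric weights and the 30-second threshold are assumptions; the patent only gives the relative ordering.

```python
# Hypothetical weights: the patent gives the ordering, not numeric values.
INTERACTION_WEIGHTS = {"long_click": 1.0, "short_click": 0.5, "hover": 0.1}


def classify_interaction(selected, dwell_seconds, threshold_seconds=30.0):
    """Label an interaction by whether the result was selected and for how long."""
    if selected and dwell_seconds >= threshold_seconds:
        return "long_click"
    if selected:
        return "short_click"
    return "hover"


def edge_weight(interactions, threshold_seconds=30.0):
    """Sum the weights of all (selected, dwell_seconds) interactions on one edge."""
    return sum(
        INTERACTION_WEIGHTS[classify_interaction(selected, dwell, threshold_seconds)]
        for selected, dwell in interactions
    )


print(edge_weight([(True, 45), (True, 5), (False, 0)]))
```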

    The query-document model may be represented by a triple $(\mathcal{Q}, \mathcal{D}, \varepsilon)$, where $\mathcal{Q}$ is the set of query nodes representing corresponding queries, $\mathcal{D}$ is the set of document nodes representing corresponding documents, and the edge set $\varepsilon$ represents the edges connecting the query nodes and document nodes.

    The edges in the edge set may be parameterized by tuples of the form $e(q, d) = \langle \gamma_o(q, d), \gamma_c(q, d) \rangle$, where $q$ represents a query node connected by the edge, $d$ represents a document node connected by the edge, and parameterization functions $\gamma_o(a, b)$ and $\gamma_c(a, b)$ indicate that entities $a$ and $b$ were observed or clicked in the same search session, respectively.

    In this specification, the term “graph” will be used broadly to refer to any mapping of a plurality of associated information items.

    A graph, or a portion of a graph, need not be present in a single storage device and may include pointers or other indications of information items that may be present on other storage devices.

    For example, a graph may include multiple nodes mapped to one another. Each node includes an identifier of an entity or other information item that may be present in another data structure and/or another storage medium.

    The document-feature model engine generates a document-feature model that may be included in the model(s).

    The document-feature model engine may generate the document-feature model based on documents included in the query-document model.

    For example, for each of the documents of the query-document model, the engine may identify one or more document features and define a relationship between the document and its document features. The document-feature model may be, for example, a bipartite graph that models the relationship between documents and their document features.

    For example, a first node in the model may represent a document feature. That node may be connected, by corresponding edges, to each of a plurality of document nodes that each represent a corresponding document that includes the document feature.

    The edges may indicate whether a corresponding feature is present in a corresponding document, and optionally a weight of the corresponding feature for the corresponding document (e.g., for a category feature, the weight may indicate how strongly the document is associated with the category).

    Various features may be used in ranking search results, such as:

    • Category features
    • Structural features
    • N-gram features

    In some implementations, the document-feature model may be represented by a triple $(\mathcal{D}, A^D, \varepsilon^D)$, where $\mathcal{D}$ is the set of document nodes representing corresponding documents, $A^D$ is the set of document feature nodes representing the set of document features, and the edge set $\varepsilon^D$ represents the edges connecting the document nodes and the document feature nodes.

    The edges in the edge set $\varepsilon^D$ may be parameterized by $e(d, a_{ij}^d)$, where $e(d, a_{ij}^d)$ indicates whether a corresponding feature is present in a corresponding document, and optionally a weight of the corresponding feature for the corresponding document.

    The query-feature model engine generates a query-feature model that may be included in the model(s).

    The query-feature model engine may generate the query-feature model based on queries that are included in the query-document model.

    For example, for each of the queries of the query-document model, the engine may identify one or more query features and define a relationship between the query and its query features.

    The query-feature model may be, for example, a bipartite graph that models the relationship between queries and their query features.

    For example, a first node in the model may represent a query feature. That node may be connected, by corresponding edges, to each of a plurality of query nodes that each represent a corresponding query that includes the query feature.

    The edges may indicate whether a corresponding feature is present in a corresponding query and optionally a weight of the corresponding feature for the corresponding query.

    Various features may be used to rank search results, such as:

    • N-grams appearing in the query
    • Entities referenced in the query
    • Entity categories referenced in the query
    • Grammatical features of the query
    • Etc.

    The query-feature model may be represented by a triple $(\mathcal{Q}, A^Q, \varepsilon^Q)$, where $\mathcal{Q}$ is the set of query nodes representing corresponding queries, $A^Q$ is the set of query feature nodes representing the set of query features, and the edge set $\varepsilon^Q$ represents the edges connecting the query nodes and the query feature nodes.

    The edges in the edge set $\varepsilon^Q$ may be parameterized by $e(q, a_{kl}^q)$, where $e(q, a_{kl}^q)$ indicates whether a corresponding query feature is present in a corresponding query, and optionally a weight of the corresponding feature for the corresponding query.

    The query feature–document feature model engine generates a query feature–document feature model that may be included in the model(s).

    The query feature–document feature model may be, for example, a bipartite graph that is generated using the query-document graph, the document-feature graph, and the query-feature graph.

    The query feature–document feature model models the interactions between document features and query features.

    In other words, it models interactions between document and query features instead of interactions directly between queries and documents.

    It is generated based on transforming the query-document model to the “document features” and “query features” space collectively modeled by the document-feature and query-feature models.

    The query feature–document feature model may be represented by a triple $(A^Q, A^D, \varepsilon^A)$, where $A^Q$ is the set of query feature nodes representing the set of query features, $A^D$ is the set of document feature nodes representing the set of document features, and the edge set $\varepsilon^A$ represents the edges connecting the query feature nodes and the document feature nodes.

    The edges in the edge set $\varepsilon^A$ each have a weight or other measure based on the number of past interactions between the query feature of the corresponding query feature node and the document feature of the corresponding document feature node.

    The edges in the edge set $\varepsilon^A$ may be parameterized by:

    $$e(a_{kl}^q, a_{ij}^d) = \sum_{q,\, d} e(q, a_{kl}^q) \times e(d, a_{ij}^d) \times \langle \gamma_o(q, d), \gamma_c(q, d) \rangle$$

    where the edge functions $e(\cdot)$ are each defined as set forth above.

    As appreciated by viewing the parametrization of the edges set forth above, the parameterization models query-document attribute observed and co-click associations via summation over all the queries and documents that can be associated with their respective attributes.
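One way to read that summation, as a sketch under my own assumptions about the data layout: for every query-document edge, add its observed and click counts to every pair made of one feature of the query and one feature of the document, scaled by the feature weights (1 meaning the feature is present).

```python
from collections import defaultdict


def build_feature_graph(qd_edges, query_features, doc_features):
    """Project query-document edges onto (query feature, document feature) pairs.

    qd_edges:       {(query, doc): (observed, clicked)}
    query_features: {query: {feature: weight}}
    doc_features:   {doc: {feature: weight}}
    Returns {(query_feature, doc_feature): (observed, clicked)}.
    """
    feature_edges = defaultdict(lambda: [0.0, 0.0])
    for (query, doc), (observed, clicked) in qd_edges.items():
        for qf, q_weight in query_features.get(query, {}).items():
            for df, d_weight in doc_features.get(doc, {}).items():
                edge = feature_edges[(qf, df)]
                edge[0] += q_weight * d_weight * observed
                edge[1] += q_weight * d_weight * clicked
    return {pair: tuple(counts) for pair, counts in feature_edges.items()}


# Hypothetical data: two different emails that share the same Subject template.
subject = "subject:Confirmation of Order [#]"
qd_edges = {
    ("book order number", "email-1"): (2, 1),
    ("order number 1a2b3c", "email-2"): (1, 1),
}
query_features = {
    "book order number": {"book order": 1, "order number": 1},
    "order number 1a2b3c": {"order number": 1},
}
doc_features = {"email-1": {subject: 1}, "email-2": {subject: 1}}
print(build_feature_graph(qd_edges, query_features, doc_features))
```

Note how interactions with two different emails end up on the same feature-level edge; the individual queries and documents no longer appear in the result.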

    In many implementations, only features (query or document) that are present at least a threshold number of times (in queries or documents) and/or for at least a threshold number of users may be used in generating the query-feature, document-feature, and/or the query feature–document feature models.

    This may ensure feature nodes do not include sensitive information by ensuring features of those feature nodes occur at least a threshold number of times and/or for at least a threshold number of users.

    This may be achieved by removing, from the document-feature graph, any document feature nodes that do not have at least a threshold number of edges indicating the presence in corresponding documents; and/or by removing, from the query-feature model, any query feature nodes that do not have at least a threshold number of edges indicating the presence in corresponding queries.
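A small sketch of that pruning step for document features. The thresholds, the feature names, and the choice to require both conditions (the patent says "and/or") are assumptions made for the example.

```python
def prune_rare_features(feature_to_docs, doc_owner, min_docs=3, min_users=2):
    """Keep only features seen in enough documents belonging to enough users.

    feature_to_docs: {feature: set of document ids containing the feature}
    doc_owner:       {document id: user id}
    """
    kept = {}
    for feature, docs in feature_to_docs.items():
        users = {doc_owner[doc] for doc in docs}
        if len(docs) >= min_docs and len(users) >= min_users:
            kept[feature] = set(docs)
    return kept


# Hypothetical data: one common template and one feature unique to a single user.
feature_to_docs = {
    "subject:Confirmation of Order [#]": {"e1", "e2", "e3"},
    "subject:Surprise party for Alice": {"e4"},
}
doc_owner = {"e1": "u1", "e2": "u2", "e3": "u3", "e4": "u1"}
print(sorted(prune_rare_features(feature_to_docs, doc_owner)))
```

A feature that appears in only one person's mailbox never makes it into the model, so it cannot influence anyone else's rankings or leak through them.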

    Query feature nodes and/or document feature nodes may be removed from the query feature-document feature model utilizing similar techniques.

    The query feature–document feature model may be utilized to determine, for a given document, a query independent measure and/or query dependent measure for the given document.

    For example, to determine a query dependent measure for a given query having given query feature(s), edges between the given query feature(s) and document features of the document may be determined. Each of the edges provides a measure of past interactions between a corresponding query feature and a corresponding document feature.

    The measures may be combined (e.g., summed and/or other statistical combinations) to determine the query dependent measure.

    Also, for example, to determine a query independent measure for the given document, edges between a group of query features (that includes or is restricted to query features not included in the given query features) and document features of the document may be determined. The measures may then be combined (e.g., summed and/or other statistical combinations) to determine the query independent measure.

    Adding Private Documents to how Google Ranks Search Results

    Documents such as emails do not get linked to as Web pages do. But there are ways to determine which of those documents might be more important than others independent of the query terms used to search for those. Fitting those private documents into how the Google search engine ranks content is an approach that this patent takes.

    There have been many changes to how search works since the Google search engine started, including how private and public search results are returned and looking at entities in queries.
