How Is Personalized Ranking of Search Results Calculated at Google?
I wrote about an earlier version of this patent when it was still a patent application in 2012 in the post Google’s User Profile Personalization and Google Plus. This patent has been filed 4 times by Google and wasn’t granted until the fourth version, which I am writing about today.
The essential part of any patent is the claims section, which the patent office looks at when deciding whether to grant a patent.
The first version of the patent (Personalization of web search) was filed on September 30, 2003. It was initially given a nonfinal rejection based on similarities to patents from Microsoft, Utopy, and NEC USA.
Google abandoned the first three versions of the patent at the patent office, apparently because it could not show that its filing was different enough from patents previously filed by other companies.
They are listed as earlier versions of this latest granted patent, which is a continuation patent that takes the filing dates of the earlier versions. So it is considered an updated version of the patent.
Here are the 3 earlier versions of this patent filing:
- Filed September 30, 2003 – Personalization of web search
- Filed May 12, 2010 – Personalization of Web Search Results Using Term, Category, and Link-Based User Profiles
- Filed November 11, 2011 – Personalization of Web Search Results Using Term, Category, and Link-Based User Profiles
The first claim from the 2003 version of the Personalization of Web Search patent reads like this:
1. A method of personalizing search results of a search engine, comprising:
– accessing a user profile for a user based on information about the user, the user information including information derived from a set of documents, the set of documents comprising a plurality of documents selected from the set consisting of documents identified by search results from the search engine, documents accessed by the user, documents linked to the documents identified by search results from the search engine, and documents linked to the documents accessed by the user;
– receiving a search query from the user;
– identifying a set of search result documents that match the search query;
– assigning a generic score to each of at least a plurality of the search result documents;
– assigning a personalized score to each document of the plurality of search result documents following the generic score assigned to the document and the user profile;
– and ranking the set of search result documents according to their personalized scores.
From a later version of the patent application, filed in 2011, here is the first claim describing how it works:
1. A computer-implemented method, comprising:
– accessing a user profile for a user and a group profile for the user;
– receiving a search query from the user;
– identifying a set of generic search result documents that match the search query;
– assigning a generic score to each document of at least a subset of the set of generic search result documents;
– assigning a personalized score to each document of the subset of search result documents following the generic score assigned to the document, the user profile, and the group profile;
– ranking the subset of search result documents per their respective personalized scores;
– providing, to a client system associated with the user, information identifying a plurality of documents in the ranked subset of search result documents;
– and updating the user profile based on a document selected by the user from the plurality of documents.
Here is the first claim from the version of the patent granted in November 2020:
1. A method of personalizing search results of a search engine, the method comprising:
– accessing a user profile for a user, wherein the user profile is based at least in part on information about the user, the user profile including information derived from a set of documents, the set of documents including documents identified by search results from the search engine, documents accessed by the user, documents linked to documents identified by search results from the search engine, and documents linked to the user accessed documents; receiving a search query from the user;
– identifying a set of documents in response to the search query, each document is associated with a generic score that is independent of the user profile;
– assigning a personalized score to each of at least a subset of the identified set of documents, the personalized score being based, at least in part, on the user profile;
– and determining a final score for each document in the subset of the identified set of documents, the final score being a function of the personalized score for the document, the generic score associated with the document, and a confidence score accounting for one or more of: a quantity of information acquired about the user, how closely the search query matches the user profile, and an age of the user profile;
– and providing, following the final scores, results identifying at least a subset of the identified set of documents to a client system associated with the user, wherein providing the results includes providing at least one result based at least in part on the personalized score for the corresponding document and providing other results of the obtained search results under the generic scores for the documents corresponding to the other results but independent of the user profile.
Confidence Scores and Personalized Ranking of Search Results
The part of the first claim of the 2020 granted patent that stands out most to me is the mention of a “confidence score” in the second half of that claim:
Determining a final score for each document in the subset of the identified set of documents, the final score being a function of the personalized score for the document, the generic score associated with the document, and a confidence score accounting for one or more of: a quantity of information acquired about the user, how closely the search query matches the user profile, and an age of the user profile; and providing, by the final scores, results identifying at least a subset of the identified set of documents to a client system associated with the user, wherein providing the results includes providing at least one result based at least in part on the personalized score for the corresponding document and providing other results of the obtained search results following the generic scores for the documents corresponding to the other results but independent of the user profile.
That confidence score is based on:
- A quantity of information acquired about the user
- How closely the search query matches the user profile
- An age of the user profile
A confidence score wasn’t mentioned in the first version of the patent but was added to the second version.
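To make that concrete, here is a minimal Python sketch of how a confidence score might decide how much weight the personalized score gets. The claim only says the final score is “a function of” these scores; the formulas below (a saturating curve for the amount of information, a half-life decay for profile age treated as staleness, a product of the three factors, and a linear blend) and the names `ProfileStats`, `confidence_score`, and `final_score` are my own assumptions for illustration, not anything the patent spells out.

```python
import math
from dataclasses import dataclass


@dataclass
class ProfileStats:
    """Hypothetical summary of a searcher's profile for one query."""
    num_signals: int    # amount of information acquired (e.g. past queries, clicks)
    query_match: float  # how closely this query matches the profile, 0..1
    age_days: float     # days since the profile was last updated (assumed = staleness)


def confidence_score(stats: ProfileStats,
                     saturation: float = 50.0,
                     half_life_days: float = 90.0) -> float:
    """Combine the claim's three factors into a confidence value in [0, 1].

    Assumed forms: information saturates toward 1, a stale profile loses
    half its confidence every `half_life_days`, and the factors multiply.
    """
    quantity = 1.0 - math.exp(-stats.num_signals / saturation)
    freshness = 0.5 ** (stats.age_days / half_life_days)
    match = min(max(stats.query_match, 0.0), 1.0)
    return quantity * match * freshness


def final_score(generic: float, personalized: float, confidence: float) -> float:
    """Assumed linear blend: zero confidence returns the generic score unchanged."""
    return (1.0 - confidence) * generic + confidence * personalized


# A brand-new searcher with no history gets pure generic ranking.
new_user = ProfileStats(num_signals=0, query_match=0.8, age_days=10)
print(final_score(generic=0.7, personalized=0.2,
                  confidence=confidence_score(new_user)))  # 0.7
```

With no information about a searcher, the confidence is zero and the final score falls back to the generic score, which seems in line with the claim’s mention of results provided under generic scores independent of the user profile.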
Personalized Rankings Using Term, Category, and Link-Based User Profiles
Another noticeable difference among the four filings is that the title changed from “Personalization of Web Search” in the first one to “Personalization of Web Search Results Using Term, Category, and Link-Based User Profiles” in the last three. This is because, although the first patent mentions term, category, and link-based user profiles, they gain more prominence in the last three versions.
As I wrote in my previous post:
Instead of using a single focus for a generated user profile, those profiles might be made up of many sub-profiles, each of which may characterize a searcher’s interest from different perspectives. These could include:
- A term-based profile with several terms, each carrying a weight indicative of its importance relative to other terms.
- A category-based profile using multiple categories, possibly organized into a hierarchical map (like the hierarchy that DMOZ was organized into).
- A link-based profile with several links that might be directly or indirectly related to pages or documents identified in a user’s search history, with each link having a weight indicating the importance of the link (like PageRank).
Why Use Personalization in Search Results?
I like that these patents explain why Google decided that there was value in providing personalization in search results.
In part, they see value in user profiles for searchers, which can help customize the search results returned to a searcher in response to a query submitted to a search engine.
The patent tells us that queries are usually concise (averaging 2-3 words) and that as the number of documents in a search engine’s index grows, the number of results that could be returned grows as well. But they tell us that “not every document matching the query is equally important from the user’s perspective.”
The problem that these personalization patents were intended to solve was to keep searchers from being overwhelmed by many search results that might be returned for a query. The search engine would do this by ordering search results based on their relevance to the user’s query. It would use personalization to provide results that are more relevant to a specific searcher.
One approach to improving the relevance of search results to a search query is to use the link structure of different web pages to compute global “importance” scores that can be used to influence the ranking of search results. (This is how this patent refers to PageRank.)
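For readers who haven’t seen one, here is a textbook PageRank computation by power iteration in Python, as an illustration of a global “importance” score computed only from link structure. It is the classic published formulation with a damping factor of 0.85, not Google’s production system, and the `pagerank` function and the tiny link graph are hypothetical.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Textbook PageRank by power iteration over a small link graph.

    `links` maps each page to the pages it links to. Scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # With probability (1 - damping) the random surfer jumps to any page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Otherwise the surfer follows one of this page's links at random.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outlinks spreads its score over every page.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a", "b"]})
print(max(ranks, key=ranks.get))  # b
```

Every page gets the same score no matter who is searching, which is exactly the “random surfer” assumption the patent pushes back against below.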
Personalization of results is a response to the Random Surfer Model that PageRank follows. The patent tells us that:
In reality, a user like the random surfer never exists.
Every user has his own preferences when he submits a query to a search engine.
The quality of the search results returned by the engine has to be evaluated by its users’ satisfaction.
When the query itself can well define a user’s preferences, or when the user’s preference is similar to the random surfer’s preference concerning a specific query, the user is more likely to be satisfied with the search results.
However, if the user’s preference is significantly biased by some personal factors that are not clearly reflected in a search query itself, or if the user’s preference is quite different from the random user’s preference, the search results from the same search engine may be less useful to the user, if not useless.
I sometimes find myself refining my searches to return results that are a lot more relevant to what I might be looking for.
The patent also addresses refining results like this, telling us that query refinement sometimes requires more knowledge of the subject or even more expertise with search engines than the searcher may possess, requiring more time and effort than a searcher is willing to spend.
Personalization is Based on User Profiles to Customize Search Results
User profiles consist of multiple items that characterize a searcher’s preferences.
These items could be extracted from various information sources:
- Previous search queries submitted by the user
- Links from or to the documents identified by the previous queries
- Sampled content from the identified documents
- Personal information implicitly or explicitly provided by the user
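As a rough sketch of how items from those sources could become a term-based profile, the Python below counts terms from each source and normalizes them into weights that sum to 1. The source labels, the per-source multipliers in `SOURCE_WEIGHTS`, and the tiny stopword list are hypothetical choices of mine; the patent lists the sources but doesn’t say how they are weighted against each other.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical multipliers: how much each information source counts.
SOURCE_WEIGHTS = {"query": 3.0, "sampled_content": 1.0, "explicit": 5.0}
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens with a few common stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_term_profile(sources: Dict[str, Iterable[str]]) -> Dict[str, float]:
    """Turn texts grouped by source into term weights that sum to 1."""
    counts: Counter = Counter()
    for source, texts in sources.items():
        multiplier = SOURCE_WEIGHTS.get(source, 1.0)
        for text in texts:
            for term in tokenize(text):
                counts[term] += multiplier
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()} if total else {}


profile = build_term_profile({
    "query": ["python tutorial"],
    "sampled_content": ["A Python tutorial for beginners"],
})
print(round(profile["python"], 3))  # 0.444
```

Weighting queries above sampled page text reflects a guess that what a searcher types says more about their interests than incidental words on pages they visit.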
How Personalization may work at Google:
- When the search engine receives a query from a searcher, it begins by returning documents matching the query
- Each search result has a generic rank based on PageRank, text associated with the document, and the query
- The searcher’s profile is identified, and that is correlated with each of the identified documents
- That correlation between a document and the user profile produces a profile rank for the document, which indicates the relevance of the document to the user
- The search engine will then combine the document’s generic rank and profile rank into a personalized rank and order the results according to their personalized ranks
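The steps above can be sketched in Python as a re-ranking function. Here the profile rank is treated as a number between 0 and 1 that boosts the generic rank multiplicatively, which is one way to read the abstract’s statement that the user profile “modulates” generic scores; the multiplicative form, the `strength` parameter, and the function name are assumptions, not the patent’s formula.

```python
from typing import Callable, List, Tuple


def personalized_ranking(
    results: List[Tuple[str, float]],
    profile_rank: Callable[[str], float],
    strength: float = 1.0,
) -> List[Tuple[str, float]]:
    """Re-order (doc_id, generic_rank) pairs by a personalized rank.

    profile_rank(doc_id) is assumed to return 0..1, the document's correlation
    with the searcher's profile. `strength` (hypothetical) caps the boost:
    a perfect profile match multiplies the generic rank by 1 + strength.
    """
    personalized = [
        (doc_id, generic * (1.0 + strength * profile_rank(doc_id)))
        for doc_id, generic in results
    ]
    # sorted() is stable, so equal personalized ranks keep their generic order.
    return sorted(personalized, key=lambda pair: pair[1], reverse=True)


generic_results = [("news-site", 0.9), ("python-docs", 0.7)]
interests = {"python-docs": 1.0}
ranked = personalized_ranking(generic_results, lambda d: interests.get(d, 0.0))
print([doc for doc, _ in ranked])  # ['python-docs', 'news-site']
```

A multiplicative boost means a document with a very low generic rank can’t be lifted far by personalization alone, which keeps query relevance in charge.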
The Process of Personalized Ranking of Search Results
Searcher profiles may be based on several sub-profiles, and each sub-profile can characterize the searcher’s interests from a different perspective.
A term-based profile is based on many terms, with each term carrying a weight that indicates its importance relative to other terms. These terms may be found on web pages, and important and unimportant terms may be identified on pages to determine whether those pages are a good match for searchers with term-based profiles, as shown in this drawing from the patent:
The terms used on pages will be weighted:
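One simple way to correlate a term-based profile with the terms on a page is cosine similarity between the profile’s term weights and the page’s term frequencies. The Python sketch below does that; cosine similarity and the bare-bones tokenizer are my assumptions, since the patent describes weighted terms but not this particular measure.

```python
import math
import re
from collections import Counter
from typing import Dict


def term_profile_score(profile: Dict[str, float], document_text: str) -> float:
    """Cosine similarity between a term-based profile and a document's terms.

    `profile` maps each term to a weight reflecting its importance relative to
    the other terms; the document is reduced to raw term frequencies.
    """
    doc = Counter(re.findall(r"[a-z0-9]+", document_text.lower()))
    # Counter returns 0 for terms the document does not contain.
    dot = sum(weight * doc[term] for term, weight in profile.items())
    profile_norm = math.sqrt(sum(w * w for w in profile.values()))
    doc_norm = math.sqrt(sum(c * c for c in doc.values()))
    if profile_norm == 0.0 or doc_norm == 0.0:
        return 0.0
    return dot / (profile_norm * doc_norm)


profile = {"python": 0.6, "tutorial": 0.3, "recipes": 0.1}
print(term_profile_score(profile, "A Python tutorial: learn Python fast")
      > term_profile_score(profile, "Chicken soup recipes"))  # True
```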
A category-based profile is based on multiple categories, which could be organized into a hierarchical map, as shown in the following patent drawing:
And the searcher’s search preferences could be associated with at least some of the multiple categories, each of the categories having an associated weight that indicates the searcher’s interest in documents that may fall into those categories. Different categories could have different weights associated with them:
There may also be multiple category-based profiles for a user.
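Here is a hypothetical Python sketch of scoring a document against a hierarchical category-based profile: a document gets full credit for the searcher’s weight on its own category and a discounted share of the weights on parent categories. The slash-separated category paths (similar in spirit to a DMOZ-style directory) and the 0.5 `ancestor_discount` are assumptions for illustration.

```python
from typing import Dict, List


def category_score(profile: Dict[str, float], doc_categories: List[str],
                   ancestor_discount: float = 0.5) -> float:
    """Score a document against a hierarchical category-based profile.

    Categories are slash-separated paths such as "Computers/Programming".
    A document's own category counts fully; each step up the hierarchy
    counts `ancestor_discount` times as much as the step below it.
    """
    score = 0.0
    for category in doc_categories:
        parts = category.split("/")
        factor = 1.0
        # Walk from the document's category up to the top-level category.
        for depth in range(len(parts), 0, -1):
            score += factor * profile.get("/".join(parts[:depth]), 0.0)
            factor *= ancestor_discount
    return score


interests = {"Computers": 0.2, "Computers/Programming": 0.5,
             "Recreation/Cooking": 0.3}
print(round(category_score(interests, ["Computers/Programming"]), 2))  # 0.6
```

Crediting parent categories lets a searcher interested in programming in general still get a boost for a page filed under a narrower subcategory they have never visited.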
Sub-profiles may also include link-based profiles, which include several links directly or indirectly related to identified documents, with each link having a weight indicating the importance of the link to the searcher. Links in the link-based profile may be further organized by host and domain.
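A link-based profile organized by host might look something like the Python below, where link weights are rolled up per host and a document gets credit both for an exact URL match and for a share of its host’s total weight. The `host_share` value is hypothetical, and this sketch groups only by hostname; grouping by registrable domain would need something like a public suffix list, which it leaves out.

```python
from collections import defaultdict
from typing import Dict
from urllib.parse import urlsplit


def host_weights(link_profile: Dict[str, float]) -> Dict[str, float]:
    """Roll per-URL link weights up into per-host totals."""
    totals: Dict[str, float] = defaultdict(float)
    for url, weight in link_profile.items():
        totals[urlsplit(url).hostname or ""] += weight
    return dict(totals)


def link_score(link_profile: Dict[str, float], doc_url: str,
               host_share: float = 0.25) -> float:
    """Exact-URL weight plus a hypothetical share of the host's total weight."""
    host = urlsplit(doc_url).hostname or ""
    exact = link_profile.get(doc_url, 0.0)
    return exact + host_share * host_weights(link_profile).get(host, 0.0)


links = {
    "https://docs.python.org/3/tutorial/": 0.6,
    "https://docs.python.org/3/library/": 0.2,
    "https://example.com/": 0.2,
}
print(round(link_score(links, "https://docs.python.org/3/howto/"), 2))  # 0.2
```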
Google may collect these Term-Based Profile weights, Category-based Profile weights, and link-based Profile weights and use them to determine which documents should be returned as personalized search results as seen in the patent drawing:
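Combining the three sub-profile scores into a single profile rank could be as simple as a weighted average. In this Python sketch, the relative weights in `DEFAULT_MIX` are made up, and skipping sub-profiles that have no data (then renormalizing the rest) is my assumption about how a sparse profile might be handled.

```python
from typing import Dict, Optional

# Hypothetical relative importance of each sub-profile.
DEFAULT_MIX = {"term": 0.5, "category": 0.3, "link": 0.2}


def profile_rank(sub_scores: Dict[str, float],
                 mix: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of whichever sub-profile scores are available.

    Sub-profiles missing from `sub_scores` (say, no link history yet) are
    skipped, and the remaining weights are renormalized.
    """
    mix = DEFAULT_MIX if mix is None else mix
    used = {name: w for name, w in mix.items() if name in sub_scores}
    total = sum(used.values())
    if total == 0:
        return 0.0
    return sum(w * sub_scores[name] for name, w in used.items()) / total


print(round(profile_rank({"term": 0.9, "category": 0.4}), 2))  # 0.71
```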
The personalized ranking of search results is determined as shown in this flowchart from the patent:
Personalization of web search results using term, category, and link-based user profiles
Inventors: Stephen R. Lawrence;
Assignee: GOOGLE LLC
US Patent: 10,839,029
Granted: November 17, 2020
Filed: March 3, 2016
Abstract
A system and method for creating a user profile and using the user profile to order search results returned by a search engine. The user profile is based on search queries submitted by a user, the user’s specific interaction with the documents identified by the search engine, and personal information provided by the user. The user profile may be selected from the documents accessed by the user by performing paragraph sampling or context analysis. The user profile modulates generic scores associated with the search results to measure their relevance to a user’s preference and interest. The search results are re-ordered accordingly so that the most relevant results appear on the top of the list. User profiles can be created and/or stored on the client-side or server-side of a client-server network environment.
Personalized Ranking of Web Search at Google
If you have been using Google Now to look at personalized news results, you have seen how Google might influence the search results you see by personalizing them.
With Google Now, you are sometimes able to expressly show your interest in specific topics by filling out forms telling Google what you are interested in.
Your selection of stories to click on and read also influences what you may see in the future. Your decisions not to choose some stories may also play a role in what is chosen to show you.
Google Now also has a feature that lets you indicate that you want to see more or fewer articles of a certain type.
If you create content to attract audiences to specific web pages or websites, it may help to understand what those audiences like and dislike, and to have an idea of where they might like to visit on the Web.
This description of the personalized ranking of search results provides more detail about how personalization works at Google than other Google patents I have written about, such as the one in Personalized Search Results at Google, which told us about bias documents that would be added to non-personalized search results.
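Feedback like clicks, skipped stories, and “more” or “less” requests could nudge topic weights in a profile. The Python sketch below is a guess at one such update rule, not a description of how Google Now actually works: the step sizes in `FEEDBACK_STEP`, the decay factor, and clamping weights to [0, 1] are all hypothetical.

```python
from typing import Dict

# Hypothetical step sizes for each kind of feedback.
FEEDBACK_STEP = {"click": 0.1, "skip": -0.02, "more": 0.3, "less": -0.3}


def update_interests(weights: Dict[str, float], topic: str, signal: str,
                     decay: float = 0.99) -> Dict[str, float]:
    """Return new topic weights after one feedback event (input is not modified).

    Every existing weight decays a little so old interests fade, then the
    story's topic moves by the step for `signal`, clamped to [0, 1].
    """
    updated = {t: w * decay for t, w in weights.items()}
    moved = updated.get(topic, 0.0) + FEEDBACK_STEP[signal]
    updated[topic] = min(1.0, max(0.0, moved))
    return updated


interests = update_interests({}, "football", "more")
interests = update_interests(interests, "gardening", "click")
print(interests)
```

Explicit “more” and “less” requests get larger steps than clicks or skips here, on the assumption that a deliberate request says more about a reader’s interests than a single click.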