A couple of months ago, in June, I wrote the post, Click a Panda: High-Quality Search Results based on Repeat Clicks and Visit Duration.
In that post, I pointed out that Google search engineer Navneet Panda, after whom Google’s Panda update was named, had worked on a few patents that focused on high-quality search results, and that I couldn’t help but review new patents that he may have written since the Panda update. The patent I wrote about in June was a continuation patent that added information about repeat clicks on search results and measured durations of visits to those pages. Those repeat clicks and visit durations were interesting enough to make me wonder if I might find something equally worth investigating.
Had he looked at similar signals on other websites?
Website Duration Performance Scoring
A patent dated December 6th, 2016, from Navneet Panda and James A. Kunz, titled Website duration performance based on category durations, looks at the durations of visits to websites and at clicks to sections of sites. The patent description gives us a summary of how this patent works. The first part seems fairly simple:
The index, query logs, and the navigation logs are processed to generate site data. The site data describes websites and includes data that characterizes visits to particular resources of the websites by users and durations of each of those visits. For example, the visit data can identify clicks by users on search results included in search results web pages or direct inputs of URLs, and, for each of the selections and inputs, a measurement of the duration of time that elapsed between the time that the user requested the resource and the time that the user device requested another resource.
The patent uses the word “duration” to talk about the time that might be taken to visit a page:
The obtained data characterizes user visits to resources and the duration of those visits. In this data, the duration of a visit can be measured in any of a variety of ways. For example, the duration of a visit can be measured as the time between the time that a user initiates a request for a resource, e.g., by clicking on a link to the resource or entering a resource locator for the resource into an input field provided by an application program running on the user device or an add-on to the application program, and the time that the user initiates another request for another resource. Alternatively, the duration may be measured as, e.g., the time between the time that a resource is fully rendered by the application program and the time that the user initiates another request for another resource.
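To make the first of those measurements concrete, here is a minimal sketch of how durations might be derived from a navigation log, taking each visit's duration as the time between one request and the next. The `Request` record, its field names, and the sample timestamps are my own assumptions for illustration; the patent does not describe a log format.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical log record; the fields are assumptions, not from the patent.
    timestamp: float  # seconds since the start of the session
    website: str
    resource: str


def visit_durations(requests):
    """Pair each request with the next one and return (website, resource, seconds).

    The final request has no following request, so no duration is produced for it.
    """
    ordered = sorted(requests, key=lambda r: r.timestamp)
    visits = []
    for current, following in zip(ordered, ordered[1:]):
        elapsed = following.timestamp - current.timestamp
        visits.append((current.website, current.resource, elapsed))
    return visits


# Made-up sample log: two pages on site S1, then a request to site S2.
log = [
    Request(0.0, "S1", "R1"),
    Request(40.0, "S1", "R2"),
    Request(100.0, "S2", "R3"),
]
durations = visit_durations(log)
print(durations)
```

The alternative measurement in the quote (starting the clock when the page is fully rendered) would only change which timestamp is recorded as the start of each visit.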
The patent also looks at the weights of categories or topics that may be associated with a resource:
A process external to the search engine may categorize the resources and websites. In some implementations, the resources are individually categorized, and the websites are then categorized based on the resource categories. Each resource and website may also belong to more than one category, and each categorization may be reflected by a category weight that is a measure of the strength of association of the category to the resource and/or website. For example, a resource that includes a news story on a professional athlete launching a chain of restaurants may have category weights that respectively reflect moderate relevancies for the categories of news, sports, and dining. Conversely, a resource that includes a news story regarding an international conflict may have a category weight reflecting a very high relevance for the category of international news.
The patent talks about measuring the length of sessions and weighting durations based upon the different categories a site belongs to. It also lists some interesting uses for duration scores:
The duration performance scores can be used in scoring resources and websites for search operations. The search operations may include scoring resources for search results, prioritizing the indexing of websites, suggesting resources or websites, protecting particular resources or websites from demotions, precluding particular resources or websites from promotions, or other appropriate search operations.
The duration information collected about visits to different parts of a site may be used to tell a search engine more about that site. We are told about a website duration performance score based upon category duration scores:
The process determines, for each of the plurality of categories to which the website belongs, a category duration score based on the duration measurements, each category durations score being proportional to durations of time from the duration measurements (206). In some implementations, to determine the category duration scores, a single website duration score is determined for a website, and then the duration score is used to generate category duration scores for the website. From these category duration scores, the duration performance score for the website is determined.
The process determines, from one or more of the category duration scores, a duration performance score for the website (208). The duration performance score is, in some implementations, proportional to the one or more category duration scores from which the duration performance score is determined (208). For example, the duration performance score may be based on all of the category duration scores for the website. Alternatively, the duration performance score may be based on a proper subset of the category duration scores for the website.
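The two steps described above could be sketched roughly as follows. The patent only says the scores are "proportional" to durations, so the specific formulas here are my assumptions: I multiply a single website duration score by each category weight to get category duration scores, and I average the chosen category scores to get the duration performance score. The function names and sample numbers are hypothetical as well.

```python
def category_duration_scores(website_duration_score, category_weights):
    """Assumed formula: category score = website duration score * category weight."""
    return {
        category: website_duration_score * weight
        for category, weight in category_weights.items()
    }


def duration_performance_score(category_scores, categories=None):
    """Assumed formula: the mean of the selected category duration scores.

    With categories=None, every category score is used; otherwise only the
    named subset is used.
    """
    if categories is None:
        selected = list(category_scores.values())
    else:
        selected = [category_scores[c] for c in categories]
    if not selected:
        raise ValueError("at least one category duration score is required")
    return sum(selected) / len(selected)


# Made-up weights echoing the news / sports / dining example quoted earlier.
weights = {"news": 0.5, "sports": 0.25, "dining": 0.25}
scores = category_duration_scores(200.0, weights)

overall = duration_performance_score(scores)
subset = duration_performance_score(scores, ["news", "sports"])
print(scores, overall, subset)
```

Passing a list of categories corresponds to the "proper subset" option in the quote, while passing none uses all of the category duration scores.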
One issue that people from Google often mention about using user behavior data in rankings is that it tends to be a noisy signal. This patent talks about how noise in such signals might be reduced. It’s interesting to see such a discussion. (I’m not sure I’ve ever seen a list like this from Google about reducing noisy signals.) Here are a few ways in which noise might be reduced:
Filtering out short clicks and reducing other noise factors – In some implementations, a last visited duration time is discounted when determining a duration time for a session. The last visited duration time corresponds to the duration measurement generated in response to a user device requesting a resource from the website and requesting a resource from another, different website. The discounting is done, for example, to filter out “short clicks,” when a search result is selected and then the user navigates back to a search results page, or when a single visit to a website is lengthened due to page loading delays. Another reason for discounting the last visited duration time is that last visited duration times may be susceptible to other noise factors, such as a user leaving a computer with a resource displayed, and then coming back an hour later and immediately navigating to another resource of another website. For example, when discounting the last visited duration time from Table 1 above, a duration time of 525 seconds is computed.
Adding prior visited duration times – In some implementations, a prior visited duration time is added when determining a duration time for a session. The prior visited duration time corresponds to the duration measurement generated on the last resource visited at the first website immediately before selecting the first resource at a second website. For example, in Table 1 above, duration measurement for the resource R0 at website S0 is 160 seconds. This is a prior visited duration for the durations for resources R1-R7. Thus, when discounting the last visited duration time from Table 1 above and taking into account the prior visited duration, a duration time of 685 seconds is computed.
Boosting durations for directly entered addresses – In some implementations, each duration measurement generated in response to a user device requesting a resource in response to direct user input of an address of the resource is boosted. Such direct input is indicative of a positive user assessment of quality, and thus the duration time for that resource is boosted. The boost value may be a fixed value or may be proportional to the frequency or quantity with which the address is directly input by users. For example, assume the boost factor is 1.5, and also assume discounting of the last visited duration time is also used. From Table 1 above, if the first resource is requested in response to direct user input, a website duration time of 585 seconds is computed.
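The three adjustments in the patent excerpts above can be combined in a small sketch. Table 1 from the patent isn't reproduced in this post, so the durations below are made up and are not the patent's figures. I'm also assuming that "discounting" the last visited duration means dropping it entirely, and that the boost multiplies the duration of the first resource when it was reached by direct input; the patent may intend something different on both counts.

```python
def session_duration(
    durations,
    prior_duration=0.0,
    discount_last=True,
    first_is_direct_input=False,
    boost_factor=1.5,
):
    """Combine per-resource durations (seconds, in visit order) for one website.

    Assumptions (mine, not the patent's):
    - discounting the last visited duration means leaving it out entirely;
    - a direct-input boost multiplies the first resource's duration.
    """
    kept = list(durations[:-1]) if discount_last else list(durations)
    if first_is_direct_input and kept:
        kept[0] = kept[0] * boost_factor
    return sum(kept) + prior_duration


# Hypothetical session: four page visits on one site, in seconds.
visits = [100.0, 50.0, 30.0, 20.0]

discounted = session_duration(visits)
with_prior = session_duration(visits, prior_duration=40.0)
boosted = session_duration(visits, first_is_direct_input=True)
print(discounted, with_prior, boosted)
```

Dropping the last duration guards against the walked-away-from-the-computer case, since only that final measurement depends on when the user eventually left for another site.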
It is interesting to see a patent from Google that looks at user behavior data, such as what someone might click on at a site and how much time they might spend there. It’s also intriguing to see a discussion from Google on how noise from user behavior signals might be reduced. When the author of such a discussion is named Panda, that makes it worth revisiting.