A Google patent granted this week explores how Google might recommend TV shows using Web-popularity signals.
The problem this patent attempts to solve is that it can take a lot of money and time to collect comprehensive data about the popularity of shows on TV, and there’s an overwhelming amount of content that could be used to make recommendations.
The patent attempts to address these data-collection issues and to provide recommendations that stay up to date over time. The process may be delivered through a dedicated application that recommends such shows.
This process might use the following steps for content entities (TV shows or movies):
Analyzing text on one or more predefined websites to identify reference web pages
Determining content information and one or more related search queries for each of those reference web pages
Determining popularity rankings for the content entities based on user interactions with the identified reference web pages and the corresponding search queries
Selecting some of the content entities to show to users based upon those popularity rankings
The patent is:
Method and system for ranking content by click count and other web popularity signals
Invented by: Lukasz Fryz, Grzegorz Glowaty, and Gregory Allan Funk
Assigned to: Google
US Patent 9,098,551
Granted August 4, 2015
Filed: October 26, 2012
A computer-implemented method for ranking content entities by their associated web pages and search queries is disclosed.
The method comprises: at a computer system having memory and one or more processors:
Performing a textual analysis on one or more predefined websites to identify a plurality of reference web pages,
further including determining content information and one or more search queries for each reference web page;
for a respective content entity:
Identifying a subset of the reference web pages and the corresponding search queries based on the content information of the reference web pages and query terms of the corresponding search queries; and
Determining a popularity ranking for the respective content entity based on user interactions with the identified subset of reference web pages and the corresponding search queries; and
Selecting at least a subset of the content entities for a display to an end-user in accordance with their respective popularity rankings.
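To make the claim's flow more concrete, here is a minimal Python sketch of its middle and final steps. It assumes the textual analysis has already produced the reference pages. The `ReferencePage` model, the rule that every word of a title must appear in a page's content or its query terms, and the use of a simple interaction count as the popularity measure are all hypothetical simplifications, not details taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ReferencePage:
    """A reference page found by textual analysis (hypothetical model)."""
    url: str
    content_info: str                                  # text extracted from the page
    queries: list[str] = field(default_factory=list)   # search queries tied to the page


def matching_subset(entity: str, pages: list[ReferencePage]) -> list[ReferencePage]:
    """Pick the pages whose content information or query terms match the entity."""
    terms = set(entity.lower().split())
    subset = []
    for page in pages:
        page_terms = set(page.content_info.lower().split())
        query_terms = {t for q in page.queries for t in q.lower().split()}
        # Assumption: every word of the entity title must appear in the page or its queries
        if terms <= page_terms or terms <= query_terms:
            subset.append(page)
    return subset


def rank_and_select(entities, pages, interactions, top_k=2):
    """Score each entity by user interactions with its subset of pages, keep the top_k."""
    scores = {
        entity: sum(interactions.get(p.url, 0) for p in matching_subset(entity, pages))
        for entity in entities
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Splitting matching from scoring mirrors the claim, which identifies a subset of pages per entity before ranking it.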
This patent refers to TV shows as “content entities” and tells us that it will look at the relative popularity of web pages about TV programs (the number of unique user visits over a predefined period) to recommend TV shows. We’re told that:
After determining the popularities of a large number of content entities (e.g., movies, TV shows, web pages, apps, or other media content items) based on other information sources, the system can identify a subset of content entities, including a TV series, a TV show, a movie, etc., to be presented to an end-user through a browser interface for a given query/time/user/location.
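The popularity measure mentioned above, unique user visits over a predefined period, can be sketched as follows. The `(user_id, url, timestamp)` log format and the seven-day default window are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def unique_visitors(visits, url, now, period=timedelta(days=7)):
    """Count distinct users who visited `url` within the last `period`.

    visits: iterable of (user_id, url, timestamp) tuples, a hypothetical log format.
    Repeat visits by the same user are counted once.
    """
    start = now - period
    return len({user for user, page, ts in visits if page == url and start <= ts <= now})
```

Counting distinct users rather than raw hits keeps one heavy visitor from inflating a page's popularity.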
Determining the Popularity of Content Entities
The patent mentions an application or program used to browse and search for TV content, which may receive requests for the most popular content entities. When I read this, I was envisioning a set-top box or a connection to a site that provides streaming content, but the patent seems to focus on connecting to the search engine. Searches for TV shows, the search result selections people make for those queries (which may point to pages about those shows), and query log information about those searches may all be used to provide information about TV shows: how popular they are and what topics they are about.
This search engine-based approach doesn’t include direct visits to web pages about TV shows that don’t involve the search engine.
Search Query Processing
The content ranking and search server processes search queries from searchers to identify the most relevant and popular content entities (TV shows).
After receiving a query that includes search keywords, the query processing module may submit the query to a content entities search index to help decide which content entities are most relevant to the search, and identify those in the content entities database.
A content entity (e.g., a TV show or a movie) may be associated with one or more web pages (referred to as “reference web pages”) and search queries whose text content and/or metadata may provide an accurate description of the content entity. A content entities search index is built upon these reference web pages and related search queries.
For some content entities, there may be tens of thousands of web pages or search queries that may be related to the content entity.
Not every one of the web pages or search queries would qualify as a reference web page or search query for determining the content entity’s popularity.
The content ranking and search server can include a web pages and search queries identification module, which would be used to identify web pages and search queries that could be used for determining a content entity’s popularity.
Those identified reference web pages and search queries would be stored in the reference web page and search queries database.
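A content entities search index built from reference web pages and related search queries could take many forms. Below is a minimal inverted-index sketch, offered as an assumption rather than the patent's actual structure: each term from an entity's reference texts maps to the entities it describes, and a query returns entities ordered by how many of its terms they match.

```python
from collections import defaultdict


def build_index(entity_docs):
    """entity_docs: {entity_id: [text from reference pages and related queries]}.

    Returns a hypothetical inverted index mapping each term to a set of entity ids.
    """
    index = defaultdict(set)
    for entity_id, texts in entity_docs.items():
        for text in texts:
            for term in text.lower().split():
                index[term].add(entity_id)
    return index


def search(index, query):
    """Return entity ids ordered by matched query terms (ties broken by id)."""
    hits = defaultdict(int)
    for term in query.lower().split():
        for entity_id in index.get(term, ()):
            hits[entity_id] += 1
    return sorted(hits, key=lambda e: (-hits[e], e))
```

A real system would add popularity into the ordering. This sketch only shows how reference texts feed the lookup step.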
Reference Pages for Content Entities
A search query might include a word from a movie title, such as “Shawshank,” which might come from someone who wants to find the movie “The Shawshank Redemption.” The search engine could then return a list of search results based on that keyword. One of those results might be a page from Wikipedia, https://en.wikipedia.org/wiki/The_Shawshank_Redemption, which provides rich information about the movie, including “title, release time, running time, director, writer, plot, producer, starring, cinematographer, editing by, music by, studio, country of release, language, budget, box office, etc.”
That website and the content of the web page tell us that the page is reliable, and that the popularity of the page (its search ranking) is likely a good indicator of the associated movie’s popularity. For a query like that, the web pages and search queries identification module can identify, in the search log database, several web pages that searchers are most likely to visit by clicking through the respective document links during a predefined period, which may range from the last hour to the last week or even the last month.
In other words, the click-through rate of a web page hyperlink in the search results associated with a particular set of search queries for a particular content entity may be used to indicate the popularity of the content entity if the web page is deemed to be closely related to the content entity and the web page is associated with a reliable information source.
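The click-through idea above can be sketched as a simple ratio: clicks on a page's link divided by how often it was shown for the entity's queries during the period. The `(query, shown_urls, clicked_url)` log records and the `RELIABLE_DOMAINS` allow-list standing in for “reliable information sources” are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical allow-list standing in for "reliable information sources".
RELIABLE_DOMAINS = {"en.wikipedia.org"}


def click_through_rate(log, queries, url):
    """log: (query, shown_urls, clicked_url) records for the predefined period.

    Returns clicks on `url` divided by impressions of `url` for the given queries.
    """
    impressions = clicks = 0
    for query, shown, clicked in log:
        if query in queries and url in shown:
            impressions += 1
            if clicked == url:
                clicks += 1
    return clicks / impressions if impressions else 0.0


def entity_signal(log, queries, url):
    """Use CTR as a popularity signal only when the page is from a reliable domain."""
    if urlparse(url).netloc not in RELIABLE_DOMAINS:
        return 0.0
    return click_through_rate(log, queries, url)
```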
Predefined Content Patterns to Identify Reference Web Pages
Whether a page may be considered a reference web page for a content entity can depend not only on the ranking of the web page on the Web but, more importantly, on the content of the web page itself.
Therefore, this system would perform a textual analysis of the page to decide whether it has the content needed to provide an objective description of the content entity. That textual analysis may look for a predefined content pattern. The patent tells us that a reference web page for a movie would typically have a telltale pattern: a predefined set of text fields near each other (e.g., title, year, director, stars, genre, box office gross revenue, etc.). In some cases, reference web pages can be identified on predetermined web domains that are considered to have reliable information about particular types of content entities.
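One way to approximate a “predefined set of text fields near each other” is to look for field labels that cluster within a small span of text. In this sketch the label list, the 300-character window, and the four-field minimum are all assumed values, and the substring matching is deliberately naive.

```python
import re

MOVIE_FIELDS = ("title", "year", "director", "stars", "genre", "box office")


def matches_movie_pattern(text, fields=MOVIE_FIELDS, window=300, min_fields=4):
    """Return True if at least `min_fields` distinct field labels occur
    within `window` characters of each other (hypothetical thresholds)."""
    lowered = text.lower()
    positions = sorted(
        (m.start(), name) for name in fields for m in re.finditer(re.escape(name), lowered)
    )
    # Anchor a window at each label and count the distinct labels it covers
    for i, (start, _) in enumerate(positions):
        found = {name for pos, name in positions[i:] if pos - start <= window}
        if len(found) >= min_fields:
            return True
    return False
```

An infobox-style page passes, while a page that mentions only one or two of those words, or scatters them far apart, does not.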
Predefined Associated Web Pages as Reference Web Pages
Reference web pages and relevant search queries may be found through a combination of an automatic process, which chooses candidate web pages based on their click-through rates and number of hits during a predefined period, and predefined sources of pages.
Depending on the type of content entities handled by the content ranking and search server, there may be multiple sources for identifying reference web pages. These include sites well known as objective information providers, and the official websites of professional content providers such as TV broadcasters or cable networks, including their sub-domains and micro-sites (e.g., www.example-TV-show-name.com) dedicated to a specific program.
Other potential sources of reference web pages can include “TV program web pages on media content aggregator sites (such as youtube.com or the like), TV guide websites, official or unofficial web pages on social networking sites dedicated to a particular TV program, and unofficial fan web pages or discussion forums for a TV program.”
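Combining an automatic selection with predefined sources might look like the sketch below. A candidate is kept if it clears both a click-through-rate and a hit-count threshold, or if it comes from a predefined domain. The threshold values and the `PREDEFINED_DOMAINS` set are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Candidate:
    url: str
    ctr: float  # click-through rate during the predefined period
    hits: int   # number of visits during the same period


# Hypothetical examples of predefined sources (reference, aggregator, official sites).
PREDEFINED_DOMAINS = {"en.wikipedia.org", "www.youtube.com"}


def select_reference_pages(candidates, min_ctr=0.05, min_hits=1000):
    """Keep a candidate if it passes both thresholds or comes from a predefined source."""
    selected = []
    for c in candidates:
        automatic = c.ctr >= min_ctr and c.hits >= min_hits
        predefined = urlparse(c.url).netloc in PREDEFINED_DOMAINS
        if automatic or predefined:
            selected.append(c.url)
    return selected
```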
Predefined Web Queries and Reference Web Pages
A different way of measuring the popularity of a content entity is to choose a predefined set of search queries as relevant to that content entity and track how many times those queries are submitted during a predefined period. So, whenever the keyword “Shawshank” appears in a query, it is highly likely that someone is searching for “The Shawshank Redemption”. The number of times that keyword is submitted to the search engine is a reliable indicator of how popular the movie is. For some query terms, those searches might not be counted unless they reach a threshold number of searches.
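Counting predefined queries with a minimum threshold can be sketched like this. The query log is assumed to be a plain list of query strings from the predefined period, a keyword “appears” when it is one of the query's words, and the threshold of 50 is an arbitrary example value.

```python
from collections import Counter


def query_popularity(query_log, entity_queries, min_count=50):
    """query_log: query strings submitted in the predefined period.
    entity_queries: {entity: set of predefined keywords}. Both are hypothetical inputs.
    """
    counts = Counter(q.lower() for q in query_log)
    popularity = {}
    for entity, keywords in entity_queries.items():
        total = 0
        for kw in keywords:
            # Count every query in which the keyword appears as a word
            n = sum(c for q, c in counts.items() if kw in q.split())
            # Skip keywords that fall below the threshold
            if n >= min_count:
                total += n
        popularity[entity] = total
    return popularity
```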
The Ranking of Reference Pages
The ranking of a reference web page may be used for measuring the popularity of a content entity associated with that reference web page. A higher ranking reference web page boosts the popularity of the content entity and vice versa. Rankings of different reference web pages that may be associated with the same content entity can be further weighted by some other measures in their contribution to determining the content entity’s popularity. The patent tells us that the ranking of a reference web page that has a highlighted “+1” viewer recommendation button may be boosted over other web pages that do not have the button.
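The weighting described above could be expressed as a weighted sum over an entity's reference pages, with an extra multiplier for pages showing a “+1” button. The page fields, the per-page weights, and the 1.2 boost factor are assumptions; the patent does not give a formula.

```python
def entity_popularity(pages, plus_one_boost=1.2):
    """pages: dicts with 'rank_score' (higher = better ranked), a 'weight' for the
    page's contribution, and 'has_plus_one'. All fields are hypothetical."""
    total = 0.0
    for page in pages:
        score = page["rank_score"] * page["weight"]
        if page["has_plus_one"]:
            score *= plus_one_boost  # boost pages showing a "+1" recommendation button
        total += score
    return total
```

Because the contributions add up, a higher-ranked reference page raises the entity's popularity and a lower-ranked one contributes less, matching the “and vice versa” in the text.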
The Popularity of Content Entities as Shown by Reference Pages
The popularity of TV shows and movies (content entities) may be inferred from the popularity of the reference pages deemed to be associated with them. Other information about those content entities may also be collected, such as the following metadata:
Start year, end year, original air date, original theatrical release date, season, total season count, episode, total episode count per season, total episode count overall, episode title, description, rating, rating reason, and version information (e.g., Director’s Cut). In some implementations, a content entity can be described as having multiple child entities or a parent entity, depending on the distribution channel, year, rating, etc. Each child entity is likely to have its own unique ID, metadata, and a mapping to a single parent.
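The parent/child arrangement described above maps naturally onto a small data structure. In this sketch, `ContentEntity` and `EntityCatalog` are hypothetical names. Each entity has a unique ID and free-form metadata, and a child stores exactly one parent ID, which must already exist.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContentEntity:
    """Hypothetical record for a content entity; field names are illustrative."""
    entity_id: str                                # unique ID
    metadata: dict = field(default_factory=dict)  # e.g. season, episode, rating
    parent_id: Optional[str] = None               # a child maps to a single parent


class EntityCatalog:
    def __init__(self):
        self.entities = {}

    def add(self, entity):
        # Reject children whose parent has not been registered yet
        if entity.parent_id is not None and entity.parent_id not in self.entities:
            raise KeyError(f"unknown parent {entity.parent_id}")
        self.entities[entity.entity_id] = entity

    def children(self, parent_id):
        return [e for e in self.entities.values() if e.parent_id == parent_id]
```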
Viewing Log Files of Reference Pages
The reference web pages and search queries associated with different content entities may be identified by examining the log entries of related services (including the search engine and the web servers) across a variety of periods. Doing this means that popularity data from a long period can be considered. The geographic regions associated with the reference pages and the queries may also be reviewed, so that their rankings “may be acceptable globally or applicable to a particular country or region.”
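Aggregating log entries by time range and region, so that rankings can be global or country-specific, might be sketched as below. The `(entity_id, region, day)` record format is an assumed simplification of search-engine and web-server logs.

```python
from collections import defaultdict
from datetime import date


def popularity_by_region(log, start, end):
    """log: (entity_id, region, day) records (hypothetical format).

    Returns per-region counts and global counts for entries between start and end.
    """
    regional = defaultdict(lambda: defaultdict(int))
    global_counts = defaultdict(int)
    for entity_id, region, day in log:
        if start <= day <= end:
            regional[region][entity_id] += 1
            global_counts[entity_id] += 1
    return regional, global_counts
```

Widening `start` and `end` brings in popularity data from a longer period, and reading from `regional` instead of `global_counts` gives a country-level ranking.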
Ranking Scores for Content Entities
Once reference web pages and/or search queries are chosen for particular content entities, a content entity ranking module determines a ranking score for each content entity. There are other ways that content entities may be ranked in terms of popularity. These can include:
Direct User Feedback – such as ratings or endorsements
User Clicks on associated search queries or click-throughs on reference pages
Actual visits to Reference Pages
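The patent does not say how these signals are combined, so this is only one plausible sketch: scale each signal by its maximum across entities so that large visit counts do not swamp small rating values, then take a weighted sum. The signal names and weights are hypothetical.

```python
def rank_entities(signals, weights):
    """signals: {entity: {"feedback": ..., "clicks": ..., "visits": ...}}.
    weights: {signal_name: weight}. Both are hypothetical inputs.

    Each signal is divided by its maximum across entities (or 1 if that maximum
    is 0), then the scaled signals are combined with `weights`.
    """
    maxima = {name: max(s.get(name, 0) for s in signals.values()) or 1 for name in weights}
    scores = {
        entity: sum(weights[n] * s.get(n, 0) / maxima[n] for n in weights)
        for entity, s in signals.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```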
This patent sets out to provide a way to use pages about particular television shows and/or movies, and search queries about that content, to help indicate the popularity of those content entities. It looks at how often people search for those queries or click upon and visit those pages to understand how popular those shows might be.
The patent doesn’t say so, but other content could be examined in similar ways, such as information about songs, or even games. I haven’t seen Google making recommendations about movies, but Google did recently add ratings to knowledge panels about movies and TV shows.
Will Google add recommendations for TV and/or movies based upon these web-based popularity scores? It’s possible, and now you know how they might do that.