About Entity Attributes
When we see Google talk about the properties of different entities, they often refer to those as entity attributes and often define them in key/value pairs. For instance, Abraham Lincoln has a height of 6’4″. Alphabet is a holding company with a headquarters in Mountain View, California. Paul Newman has blue eyes. Fortnite is a multiplayer game.
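As a rough illustration of the key/value idea, the examples above could be written as a simple mapping. The attribute names used here (such as "eye color" and "genre") are my own illustrative labels and not a schema that Google has published.

```python
# Hypothetical key/value representation of the entity attributes mentioned above.
# The attribute names are illustrative labels, not an actual Google schema.
entity_attributes = {
    "Abraham Lincoln": {"height": "6'4\""},
    "Alphabet": {
        "type": "holding company",
        "headquarters": "Mountain View, California",
    },
    "Paul Newman": {"eye color": "blue"},
    "Fortnite": {"genre": "multiplayer game"},
}


def get_attribute(entity, attribute):
    """Look up one attribute value for an entity, or None if it is unknown."""
    return entity_attributes.get(entity, {}).get(attribute)
```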
Some online reviews are specifically about entities such as products, product creators, and/or product vendors. These reviews may contain information about entity attributes that searchers may be interested in. A patent granted to Google this past week describes how the search engine may collect information about those attributes from reviews. This is important because, as the patent tells us:
These types of user reviews may include information about entities that may not have been provided or generated, for instance, by the entities themselves.
Information About Entities Goes From Reviews to Google’s Index
This process involving entity attributes may involve:
(1) Identifying, based on a corpus of user queries, one or more categories of observed user interest;
(2) Detecting, in one or more user reviews associated with a product, one or more segments of text related to the one or more categories of observed user interest;
(3) And based on the detecting, indexing, in a searchable database, the product on the one or more categories of observed user interest.
So, the search engine may look at queries about an entity to identify what searchers are interested in, use those queries to find relevant text in reviews, and index that text so that searchers can find answers to their questions about entities such as products, product manufacturers, and product vendors.
Google may count the number of queries asking about certain entity attributes to understand how much interest there is in answers to those queries, or what the user interest may be in them.
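The three steps above can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions: the patent does not say how phrases are matched or how many queries count as enough, so the substring matching, the `candidate_phrases` input, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter, defaultdict


def observed_categories(queries, candidate_phrases, min_count=2):
    """Step 1: keep phrases that appear in at least min_count queries.

    candidate_phrases (lowercase) and min_count are assumptions for this sketch.
    """
    counts = Counter()
    for query in queries:
        q = query.lower()
        for phrase in candidate_phrases:
            if phrase in q:
                counts[phrase] += 1
    return {phrase for phrase, n in counts.items() if n >= min_count}


def detect_segments(review, categories):
    """Step 2: return (category, sentence) pairs for sentences mentioning a category."""
    hits = []
    for sentence in review.split("."):
        s = sentence.strip()
        for category in categories:
            if category in s.lower():
                hits.append((category, s))
    return hits


def build_index(reviews_by_product, categories):
    """Step 3: map each category to the products whose reviews mention it."""
    index = defaultdict(set)
    for product, reviews in reviews_by_product.items():
        for review in reviews:
            for category, _segment in detect_segments(review, categories):
                index[category].add(product)
    return index


# Made-up queries and reviews, purely for illustration.
queries = [
    "camera with great optical zoom",
    "best optical zoom camera",
    "waterproof camera",
]
reviews = {
    "Camera A": ["The optical zoom is superb. Battery is weak."],
    "Camera B": ["Nice colors. Feels sturdy."],
}
categories = observed_categories(queries, ["optical zoom", "waterproof"])
index = build_index(reviews, categories)
```

In this made-up data, only "optical zoom" is asked about often enough to become a category, and only Camera A has a review segment that mentions it, so that is the only product indexed under that category.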
Machine learning may be used to identify whether a query is about the product itself or about the product manufacturer.
The patent in question, granted last week, can be found at:
Analyzing user reviews to determine entity attributes
Inventors: Advay Mengle, Jindong Chen, Charmaine Cynthia Rose D’Silva and Anna Patterson
Assignee: GOOGLE LLC
US Patent: 10,061,767
Granted: August 28, 2018
Filed: June 16, 2017
Methods and apparatus are described herein for classifying user reviews or portions thereof as being related to various entities, and for associating extracted descriptive segments of text contained in those user reviews or portions thereof with entities based on the classifications. In various implementations, one or more categories of observed user interest may be identified based on a corpus of user queries. One or more segments of text related to the one or more categories of observed user interest may be detected in one or more user reviews associated with a product. Based on the detecting, the product may be indexed on the one or more categories of observed user interest in a searchable database. In some implementations, the searchable database may be accessible to one or more remote client devices, and may be searchable by the one or more categories of observed user interest to provide search results to be rendered by the one or more remote client devices.
User reviews, as described in this patent, may be from sources such as:
- social network postings
- articles written for websites or for printed publications such as magazines or newspapers
- postings made to a user review section of an online vendor or marketplace
- user reviews submitted to various existing user review clearinghouses
Those reviews may then be classified based upon the categories of interest that they may cover, and the entities that they may contain information relating to. It’s interesting because this patent tells us about how machine learning may be part of the process involved in taking these steps.
In some implementations, one or more “categories of interests” in entities may be employed to classify user reviews and/or portions thereof as being related to particular entities. Detection of words or phrases in a user review that correspond to these categories (e.g., as sufficiently similar) may be interpreted as signals for classifying the user review or a portion thereof as related to an entity. Categories of interest may come in various forms, such as categories of predicted interest and categories of observed interest. In various implementations, a category engine may maintain an index of categories that may be used by classifier engine to classify user reviews and/or portions thereof.
If you are a fan of online games, you may appreciate that the patent uses a number of game-related examples to describe how the process works:
Categories of observed interests, by contrast, may be determined, e.g., by category engine, based on patterns observed in user activity, such as among a plurality (or corpus) of user queries. For example, and continuing with the online marketplace of apps example, multiple users may search for apps using the same or similar terms or phrases. If sufficient users submit queries containing a particular word or phrase (or similar variations thereof), then category engine may deem those words or phrases to constitute a category of observed interest and may update index accordingly. Thus, if enough users search an online marketplace for “massively multiplayer online role-playing games,” or “MMORPG,” an MMORPG category may be established.
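To make the MMORPG example concrete, here is a minimal sketch of how a category engine might add a category once enough queries contain a phrase or one of its variations. The alias table and the threshold of three queries are hypothetical; the patent only says "sufficient users" and does not give a number or a matching method.

```python
from collections import Counter

# Hypothetical alias table: variations that should count toward one category.
ALIASES = {
    "mmorpg": "MMORPG",
    "massively multiplayer online role-playing games": "MMORPG",
    "puzzle games": "puzzle",
}


def update_category_index(queries, existing=None, threshold=3):
    """Add a category once enough queries contain it or one of its variations.

    The threshold value is an assumption made for this sketch.
    """
    index = set(existing or ())
    counts = Counter()
    for query in queries:
        q = query.lower()
        # Count each category at most once per query.
        matched = {category for phrase, category in ALIASES.items() if phrase in q}
        counts.update(matched)
    index.update(category for category, n in counts.items() if n >= threshold)
    return index


# Made-up queries, purely for illustration.
queries = [
    "best MMORPG 2018",
    "free massively multiplayer online role-playing games",
    "mmorpg for android",
    "puzzle games offline",
]
category_index = update_category_index(queries)
```

Here three queries mention MMORPGs in one form or another, so that category is established, while the single puzzle query is not enough to create one.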
Processes for extracting text from reviews are described as well, including the extraction of comparisons between different entities (such as “Product X is better than product Y.”).
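A comparison like the one quoted above could be pulled out of a review with a simple pattern. This is a deliberately naive sketch: the regular expression and the short list of comparison words are my own assumptions, and the patent does not describe its extraction at this level of detail.

```python
import re

# Hypothetical pattern for simple "X is better than Y" style comparisons.
COMPARISON = re.compile(
    r"(?P<left>[\w ]+?)\s+is\s+(?P<relation>better|worse|faster|cheaper)"
    r"\s+than\s+(?P<right>[\w ]+)",
    re.IGNORECASE,
)


def extract_comparisons(review):
    """Return (left entity, relation, right entity) tuples found in a review."""
    results = []
    for sentence in re.split(r"[.!?]", review):
        match = COMPARISON.search(sentence.strip())
        if match:
            results.append(
                (
                    match["left"].strip(),
                    match["relation"].lower(),
                    match["right"].strip(),
                )
            )
    return results


found = extract_comparisons("Product X is better than product Y. Shipping was slow.")
```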
The patent pays a lot of attention to how information and specific words and phrases might be extracted from reviews so that they are responsive to queries. For example, if there were a lot of queries about “cameras that have great optical zoom capabilities,” information that answers that query for a particular camera might be extracted from user reviews of that camera.
We are provided with a hint of the implications of such extractions:
Graph engine may perform various actions with these newly associated entity attributes and/or scores. For example, in some implementations, graph engine may interface with a search engine (not depicted), and may index one or more entities based on one or more descriptive segments of text that are associated with those one or more entities, e.g., by descriptive text association engine.
Use of Reviews to Learn About Entity Attributes
It is interesting that Google may use user-generated content such as product reviews to learn about those products, their manufacturers, and their distributors.
We’ve seen that Google has had an interest in reviews of products, businesses, and places because searchers have been searching for those things. We’ve also seen Google talk about how they might use reviews to learn about sentiment regarding entities. This patent takes that interest a step further, beyond just making reviews available to searchers or pointing out sentiment-filled sentences. It seeks to learn about the things being reviewed so that it can match up specific interests with information that answers them.
The patent tells us where reviews may be found online, how text from those reviews may be classified and extracted, how interest from searchers’ queries may be crowdsourced to understand what those searchers want to learn, and that a search engine could show searchers information that is responsive to those interests about specific entity attributes.
Perhaps reviews are a good place to learn information about entities that those entities aren’t sharing themselves. What do you think?