How Google May Determine Similar Local Entities

Published: January 29, 2018

Local Search is Filled with Local Entities Having Local Significance

Google’s Local Search is more semantic-based than its organic search. Businesses are often referred to as “local entities,” as in a patent that was granted to Google on January 2, 2018. Local search is more about considering different things (as in “Things and not strings”) than about matching keywords on pages. In part, that is why we see knowledge panels for businesses at Google these days, after it introduced its Knowledge Graph, which shows off entities. The definition of a local entity under that patent is interesting:

Some search systems can obtain or infer a location of a user device from which a search query was received and include local search results that are responsive to the search query. A local search result is a search result that references a document that describes a local entity. A local entity, in turn, is an entity that has been classified as having local significance to a particular location. Local entities are typically physical entities associated with an address or a region, such as a restaurant, a hospital, a landmark, and the like. A search result referencing a document describing a local entity receives a search score “boost” for a query if the location associated with the local entity is near the location of the user device. For example, in response to a search query for “coffee shop,” the search system may provide local search results that reference web pages for coffee shops near the location of the user device. Many users in various geographic regions will likely be satisfied with receiving local results for coffee shops in response to the search query “coffee shop” because it is likely that a user submitting the query “coffee shop” is interested in search results for coffee shops that are local to the user’s location.

Displaying Similar Local Entities is a Goal

This new patent isn’t just about ranking local entities in response to a query that is relevant to those entities. It also tells us that some searches will return results chosen because they are similar to another result, which is interesting as well:

In the context of local entities, for example, search engines may provide search results for local entities that are related to each other in some predetermined way. For example, in the context of restaurants, suggestions for other restaurants that offer similar menu items at similar prices may be made in response to a selection of search result that references the first restaurant, or in response to a search of other restaurants related to a first restaurant.

If you are looking for a place to stop and get some coffee, seeing several coffee houses nearby, including some that are a little more distant, may make it easier to decide which one you want to visit. The questions I had when I started reading the patent were: What might Google use to decide whether different entities are similar? And how does it decide whether a local entity has local significance to a geographic location?

The patent tells us that several aspects of the process it describes are innovative. These include:

1) Accessing data specifying query terms for each local entity in a set of local entities, where each local entity is a physical entity resolved to a geographic location and has local significance to that location, based upon query terms that result in selections of local entities in that location.
2) Determining a similarity measure between an identified local entity and other local entities; entities that are similar enough may be considered related.

The patent is:

Detection of related local entities
Inventors: Kumar Mayur Thakur and Mukund Jha
Assignee: Google Inc.
US Patent 9,858,291
Granted: January 2, 2018
Filed: October 30, 2014

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing local entities. In one aspect a method includes accessing data specifying query terms for each local entity in a set of local entities and for each query term a term value based on a number of instances of queries that include the query term occurring in a query log, and a selection value based on a number of selections of search results that each reference the local entity in response to a query that includes the query term and attributed to the query term; selecting a first local entity from the set of local entities; selecting a subset of second local entities from the set of local entities; and for each second local entity in the subset, determining a measure of similarity of the second local entity to the first local entity.

How Are Similarities Determined for Local Entities?

Local results in Google Maps are ranked based upon such things as distance from a location taken from a searcher’s mobile location history, the relevance of a business’s title to the query looking for it, a location prominence score based upon citations (links and reviews can count as citations), and an authority score for the business’s website.

This new patent tells us that Google may show similar local entities alongside a specific local result, to give a searcher a choice of places to visit. It explains how similarity is determined by looking at words that appear in queries and descriptions:

The similarity measure is based, in part, on query term data for each local entity. A query term, as used in this written description, can be an n-gram that constitutes part of a query, but need not be an entire query. For example, for the query “Restaurant Review Gino’s”, the query terms may be the unigrams “Restaurant,” “Review,” and “Gino’s.” Other n-grams, such as bi-grams, tri-grams, etc., can also be used as query terms.
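The splitting of a query into n-gram query terms can be sketched in a few lines of Python. This is only an illustration of the idea in the passage above: the function name `query_terms` and the `max_n` parameter are my own assumptions and do not come from the patent.

```python
def query_terms(query: str, max_n: int = 2) -> list[str]:
    """Split a query into word n-grams of length 1 through max_n.

    Hypothetical helper: the patent only says that query terms can be
    unigrams, bi-grams, tri-grams, and so on.
    """
    words = query.split()
    terms = []
    for n in range(1, max_n + 1):
        # Slide a window of n words across the query.
        for i in range(len(words) - n + 1):
            terms.append(" ".join(words[i:i + n]))
    return terms


unigrams = query_terms("Restaurant Review Gino's", max_n=1)
with_bigrams = query_terms("Restaurant Review Gino's", max_n=2)
```

With `max_n=1` this yields the three unigrams from the patent’s example; raising `max_n` adds the longer n-grams after them.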

In addition to looking for language similarities, Google may also look for similar local entities within a radius of a certain number of miles. This has me thinking about how many similar businesses might be near a business that I want to appear in local search results, and how similar they might seem to be.

Local Search has a Similar Local Entity Subsystem Which Learns from Query Logs

When we think of search at Google, we usually think of organic search, which uses information retrieval scores to determine the relevance of search results to a searcher’s query, along with authority scores for those results. This patent talks about a similar local entity subsystem for when local results are processed:

When processing local results, the similarity of local entities to other local entities may be used when determining search scores of documents that reference the local entities. Likewise, if the search system is used to search local entities independent of documents (e.g., such as a search for restaurants), the similarity of local entities to other local entities can also be used when determining which local entities to list in response to a local entity query. Accordingly, the search system can include, or be in data communication with, a local entity similarity subsystem. The local entity similarity subsystem determines, for each local entity, a corresponding list of similar local entities that includes a list of local entities ranked according to their similarity to the local entity to which the list corresponds.

Similarities between local entities may be determined in part by looking at how often the query terms associated with a business appear in query logs:

The process accesses data specifying, for each local entity in a set of local entities, term values, and selection values for query terms (202). The term value is proportional to a number of instances of queries that include the query term occurring in a query log. For example, suppose the queries “Restaurants NYC Italian” and “Italian Restaurants Manhattan” each respectively appear N times in a query log. Based on these two queries and their respective instances, the term value for “Restaurants” is proportional to 2N, while the term values for “NYC,” “Italian” and “Manhattan” are proportional to N.
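Here is a minimal sketch of how term values like these could be tallied from a query log. It assumes the log is already aggregated into a mapping from query text to instance count, lowercases terms, and uses only unigrams; the name `term_values` and the value chosen for `N` are mine, not the patent’s.

```python
from collections import Counter


def term_values(query_log: dict[str, int]) -> Counter:
    """Sum, for each unigram, the instances of logged queries that contain it.

    query_log maps query text to the number of times that query appears
    (a hypothetical, pre-aggregated format).
    """
    values = Counter()
    for query, instances in query_log.items():
        # set() credits a term once per query, even if it repeats in it.
        for term in set(query.lower().split()):
            values[term] += instances
    return values


N = 100  # arbitrary stand-in for the patent's "N times"
log = {"Restaurants NYC Italian": N, "Italian Restaurants Manhattan": N}
values = term_values(log)
```

Counted this way, “restaurants” comes out at 2N and “nyc” and “manhattan” at N, as in the patent’s example. The same tally also gives “italian” 2N, since that word appears in both sample queries as quoted.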

Might selections of certain pages that mention entities in response to similar queries help determine which businesses might be similar in a local search? That may depend upon how important that entity may be to that page. The patent seems to describe that happening:

The selection value is proportional to a number of selections of search results that each respectively specifies a local entity in response to a query that includes the query term and attributed to the query term. For example, assume that search results, each referencing a document, are provided in response to a search query. For each selection of a search result referencing a document that, in turn, references a local entity, the selection value for the query terms of the query are increased for that local entity. How much the selection value is increased may depend, in some implementations, on the score that describes how important the entity is to the subject matter of the document. For example, for the first document that lists hundreds of restaurants, described above, and having a relatively low score for each restaurant entity, a query term selection value for a particular query term and local entity would be increased very little in response to a selection of a search result referencing the document. Conversely, for the second document that is highly scored for the local entity, a query term selection value for the particular query term and the local entity would be increased much more than for the selection of the first local document.
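One way to picture that weighting is the sketch below, where each click on a result adds the entity’s importance score to the selection value of every term in the query. The 0-to-1 importance scale, the sample scores, the second restaurant name, and the function name `record_selection` are all assumptions made for illustration; the patent does not give a formula here.

```python
from collections import defaultdict

# selection_values[entity][term] holds accumulated, importance-weighted clicks.
selection_values = defaultdict(lambda: defaultdict(float))


def record_selection(query: str, document_entities: dict[str, float]) -> None:
    """Credit the query's terms for every entity the selected document references.

    document_entities maps an entity name to a score (assumed to be 0..1)
    for how important that entity is to the document.
    """
    for entity, importance in document_entities.items():
        for term in query.lower().split():
            selection_values[entity][term] += importance


# A directory page that lists many restaurants: low score per entity.
directory_page = {"Gino's": 0.01, "Luigi's": 0.01}
# A page mostly about one restaurant: high score for that entity.
ginos_homepage = {"Gino's": 0.9}

record_selection("italian restaurants", directory_page)
record_selection("italian restaurants", ginos_homepage)
```

After one click on each page, the “italian” selection value for Gino’s is dominated by the click on its own page, while the directory click barely moves it.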

Distances of Similar Local Entities May Vary in Importance Based upon the Business Type

Which similar local entities are shown may depend on the type of business involved. How far would you drive to a pizza place? To a gas station?

The patent asks those questions too:

Selecting a proper subset of second local entities from the set of local entities can, for example, involve selecting local entities that have a geographic location within a threshold distance of the geographic location of the first local entity. The threshold distance can be a fixed distance or can vary based on the local entity type. For example, for the first entity of a restaurant type, the distance may be 10 miles; for the first entity of a gas station type, the distance may be three miles; etc.
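A type-dependent distance filter along these lines might look like the following sketch. The 10-mile and three-mile thresholds are the patent’s examples; the great-circle (haversine) distance, the default threshold, the dictionary layout, and the sample coordinates are my assumptions.

```python
import math

EARTH_RADIUS_MILES = 3958.8
# Thresholds from the patent's examples; the default is hypothetical.
THRESHOLD_MILES = {"restaurant": 10.0, "gas_station": 3.0}
DEFAULT_THRESHOLD_MILES = 5.0


def haversine_miles(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def nearby_candidates(first: dict, entities: list[dict]) -> list[dict]:
    """Keep entities within the threshold distance set by the first entity's type."""
    limit = THRESHOLD_MILES.get(first["type"], DEFAULT_THRESHOLD_MILES)
    return [e for e in entities
            if e is not first and haversine_miles(first["loc"], e["loc"]) <= limit]


others = [
    {"name": "near", "loc": (40.72, -74.00)},  # well under a mile away
    {"name": "mid", "loc": (40.80, -73.95)},   # several miles away
    {"name": "far", "loc": (41.00, -74.00)},   # roughly twenty miles away
]
restaurant = {"name": "first", "type": "restaurant", "loc": (40.7128, -74.0060)}
gas_station = {"name": "first", "type": "gas_station", "loc": (40.7128, -74.0060)}

near_restaurant = [e["name"] for e in nearby_candidates(restaurant, others)]
near_gas_station = [e["name"] for e in nearby_candidates(gas_station, others)]
```

From the same starting point, the restaurant keeps two candidates while the gas station keeps only the closest one.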

It also provides some answers regarding those distances:

The distance can also be based on an estimated time of travel. For example, when a first local entity is selected, all other local entities within an estimated 20-minute drive may be selected. Thus, depending on geographic boundaries (e.g., bridges, rivers, etc.), the area from which other local entities are selected may be asymmetric, and not simply circular or rectangular. The time-based distance can be determined from, for example, traffic patterns obtained from systems external to the subset selection stage and pathfinding algorithms.
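To show why a time-based cutoff produces an irregular area, here is a small sketch that runs Dijkstra’s shortest-path algorithm over a made-up road graph whose edge weights are drive times in minutes. The graph, the place names, and the function `reachable_within` are hypothetical; the patent only says that traffic patterns and pathfinding algorithms may be used.

```python
import heapq


def reachable_within(graph: dict, start: str, budget_minutes: float) -> dict:
    """Dijkstra's algorithm: map each node reachable within the budget to its drive time."""
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        minutes, node = heapq.heappop(heap)
        if minutes > best.get(node, float("inf")):
            continue  # stale heap entry; a faster route was already found
        for neighbor, cost in graph.get(node, {}).items():
            total = minutes + cost
            if total <= budget_minutes and total < best.get(neighbor, float("inf")):
                best[neighbor] = total
                heapq.heappush(heap, (total, neighbor))
    return best


# Hypothetical drive times in minutes; cafe_d sits on the far side of a bridge.
roads = {
    "cafe_a": {"bridge": 12, "cafe_b": 8},
    "cafe_b": {"cafe_a": 8, "cafe_c": 10},
    "bridge": {"cafe_a": 12, "cafe_d": 15},
    "cafe_c": {"cafe_b": 10},
    "cafe_d": {"bridge": 15},
}
within_20 = reachable_within(roads, "cafe_a", 20)
```

Here `cafe_c` is kept at 18 minutes, while `cafe_d` is dropped because the route over the bridge takes 27 minutes, no matter how close it may be as the crow flies.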

Query Terms Used to Determine Similarity May Be High Quality and Low Quality and Look at Click Selection Numbers

When comparing the query terms that different places may be found for, some of those terms are considered high-quality terms, for example, terms that indicate categories, such as “food.” Some query terms may be considered lower quality, such as location terms or navigational terms.

A location term such as “NYC” may appear in queries such as “pizza NYC” or “Chinese Food NYC,” but it doesn’t indicate similarity the way a term such as “pizza” does, which points to a restaurant that specifically serves pizza. Other query terms may be navigational, such as a neighborhood name, the name of a shopping center, or a name such as “Lombardi’s.” Local entities in the same shopping center or neighborhood may not be very similar. So when query terms from a query log are compared, the higher-quality terms may be more helpful than the lower-quality ones. Two places that are found for “seafood” are probably more similar than two places that are found for “NYC.”

The patent goes into much more depth on the quality of query terms, and on category terms versus terms that indicate a location or a navigational bias. It seems to favor categories as the type of query term that indicates the similarity of local entities.

But it may pay attention to both high-quality and low-quality query terms, especially when selections for those terms are similar:

A “similarity” of relative selection values means that the distribution of selections for a query term is similar. For example, assume a first entity is a restaurant and a second is a casino. Both entities may have 5,000 clicks from queries with the term “restaurant,” but the restaurant entity has 7,000 total clicks from all queries attributed to it, while the casino has 1,000,000 total clicks from all queries attributed to it. Because the relative distributions are very different, there is very little similarity attributed to the “restaurant” term for these two entities. Conversely, another entity with 4,000 clicks from queries with the term “restaurant,” 6,000 total clicks would be considered to be similar to the restaurant entity for the term “restaurant.” Likewise, yet another entity with 6,000 clicks from queries with the term “restaurant,” and 975,000 total clicks would be considered to be similar to the casino entity for the term “restaurant.”
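The arithmetic behind that example can be worked through in code. The click counts below are the patent’s; the ratio used to compare two shares (smaller share divided by larger share) is only one plausible stand-in of my own, since the excerpt does not say how the distributions are compared.

```python
def term_share(term_clicks: int, total_clicks: int) -> float:
    """Fraction of an entity's total clicks that came from queries with the term."""
    return term_clicks / total_clicks


def share_similarity(share_a: float, share_b: float) -> float:
    """Smaller share divided by larger share, from 0 to 1 (a hypothetical measure)."""
    larger = max(share_a, share_b)
    return min(share_a, share_b) / larger if larger > 0 else 0.0


# Click counts for the term "restaurant" from the patent's example.
restaurant = term_share(5_000, 7_000)          # about 0.714
casino = term_share(5_000, 1_000_000)          # 0.005
other_restaurant = term_share(4_000, 6_000)    # about 0.667
other_casino = term_share(6_000, 975_000)      # about 0.006

restaurant_vs_casino = share_similarity(restaurant, casino)
restaurant_vs_other = share_similarity(restaurant, other_restaurant)
casino_vs_other = share_similarity(casino, other_casino)
```

Although the restaurant and the casino have the same 5,000 raw clicks for “restaurant,” their shares differ by a factor of more than 100, so the comparison scores them as very different. The two restaurant-like entities and the two casino-like entities each score close to one another.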

This use of selections from all query types can point out whether two different places may be similar or very different from each other. This isn’t using click-through information to determine how to rank entities, but rather to decide whether to show that two local entities are similar or not.

About Bill Slawski

With more than 26 years of SEO experience and a Juris Doctor Degree, Bill Slawski is the foremost expert on Google’s patents as related to SEO. Patent Exploration is one of the quickest and most detailed ways to find new information about SEO. Bill is the Editor of SEO by the Sea, a prominent search engine optimization blog, where he is the author of over 1,300 posts. Bill’s experience includes Fortune 500 brands and some of the largest websites in the world. Bill is a contributing author for Moz, Search Engine Land, and Search Engine Journal. From 2014 to 2021, he spoke at industry-leading international conferences about topics including search engine algorithms, universal and blended search, personalization in search, search and social, duplicate content problems, structured data, and schema.
