“The perfect search engine would understand exactly what you mean and give you back exactly what
you want.” Larry Page, How Search Works
Using Knowledge Bases to Answer Queries about Entities
Three years ago, I wrote How Knowledge Base Entities can be Used in Searches about how you could search using a query such as, “What is the movie where Robert Duvall plays a character who says how much he loves the smell of Napalm in the morning.” That search for a movie where a well-known actor says a well-known line is an example of Google using facts that it may learn from knowledge bases so that it may answer queries. It doesn’t answer with a featured snippet. Instead, it shows a couple of videos followed by other documents also answering that question.
A patent granted to Google last month also looks at how information from a knowledge graph, which it may have learned from knowledge bases, can be used to respond to queries. The responses are not featured snippets, but rather search results informed by what Google has learned about the entities in a query and their related properties. This is semantic search that goes beyond understanding synonyms and semantically related words, to knowing some properties of the things involved in a search (remember, the Google Knowledge Graph is about “Things, and not Strings”), so it goes beyond just matching keywords from a query to the same (or related) keywords in a document. The patent introduces the concept of related entity scores as well.
The Knowledge Graph Collects Entity Information to Answer Queries
Yes, the knowledge graph is like an encyclopedia, but that isn’t why it exists. It tries to learn about entities so that it can help answer queries about them in search results.
This new patent tells us how it may use information about specific entities to answer queries:
In some implementations, a computer-implemented method comprises identifying in a knowledge graph, using at least one processor, at least one entity and related entities related to the at least one entity by respective properties. The computer-implemented method comprises, for each respective one of the related entities, determining, using at least one processor, a related entity score associated with a respective property that relates the at least one entity and the respective one of the related entities. The computer-implemented method comprises, for each respective property, generating a property score, using at least one processor, based on related entity scores associated with that respective property. The computer-implemented method comprises generating, using at least one processor, and causing to be stored a data structure of sortable properties based on the generated property scores, wherein the data structure is used to provide sorted search results in response to a query.
If you ask Google a question like “Where was George Washington a Surveyor?” the search engine provides search results which detail where he acted as a teenaged surveyor before he went into the military.
A town called Washington, Virginia (which calls itself the first Washington) commemorates the 17-year-old who surveyed the surrounding area during his younger days.
Related Entity Scores
How does the process of this patent work? This is how the knowledge graph works in helping return search results, using related entity scores:
In some implementations, a system comprises a data structure comprising a knowledge graph and one or more processors. The one or more processors are configured to perform operations comprising identifying in the knowledge graph at least one entity and related entities related to the at least one entity by respective properties. The one or more processors are configured to perform operations comprising, for each respective one of the related entities, determining a related entity score associated with a respective property that relates the at least one entity and the respective one of the related entities. The one or more processors are configured to perform operations comprising, for each respective property, generating a property score based on related entity scores associated with that respective property. The one or more processors are configured to perform operations comprising generating and causing to be stored a data structure of sortable properties based on the generated property scores, wherein the data structure is used to provide sorted search results in response to a query.
The newly granted patent is
Providing search results based on sorted properties
Inventors: Yiming Li and Zhenyu Gu
Assignee: Google LLC
United States Patent 9,875,320
Granted: January 23, 2018
Filed: February 8, 2016
An entity may be related to multiple related entities by one or more properties, and the entity may also be associated with one or more entity types. A system for providing sorted results may include identifying the entity, related entities, and types. The system may also determine related entity scores for each respective related entity, relative to the entity. For each property, the related entity scores of the related entities related to the entity by that property are combined to generate a property score. The properties are then sorted based on their property scores. The sorting may occur for properties associated with an entity type, and sorted search results may be provided as output for one or more entity types of interest.
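The abstract above can be sketched in a few lines of Python. This is only an illustration of the idea, not Google's implementation: the property names, the related entities, and the scores below are all hypothetical, and a plain sum is assumed as the way related entity scores are combined into a property score.

```python
from collections import defaultdict

# Hypothetical (property, related entity, related entity score) triples
# for the entity "George Washington". None of these numbers come from the patent.
related = [
    ("Occupation", "Surveyor", 0.4),
    ("Occupation", "Military Commander", 0.9),
    ("Occupation", "President", 1.0),
    ("Spouse", "Martha Washington", 0.7),
    ("Birthplace", "Westmoreland County", 0.3),
]

def sort_properties(triples):
    """Combine related entity scores per property, then sort properties by score."""
    totals = defaultdict(float)
    for prop, _related_entity, score in triples:
        totals[prop] += score  # assumption: scores are combined by a simple sum
    # Highest-scoring property first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

sorted_properties = sort_properties(related)
for prop, score in sorted_properties:
    print(f"{prop}: {score:.1f}")
```

In this made-up data, “Occupation” sorts first because three related entities contribute to it, which is the kind of ordering the patent says could then be used to provide sorted search results.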
Search Results and Featured Snippets
We have seen answers to some queries that provide a combination of search results and featured snippets as I wrote about in the post Is Google Going to Marry their Knowledge Base with their Search Engine? Google hasn’t shown too much of a preference for answering a query with a search result or a featured snippet or a structured snippet.
Sometimes an answer to a query about a movie may seem to be very appropriate. I like when a question about geography, such as what is the capital of XXXX? shows a map in a featured snippet, because the location of a capital can be useful information.
No Specific Knowledge Bases about How Google Uses its Knowledge Graph to Answer Questions about Entities
There are no knowledge bases yet that tell us how Google uses a knowledge graph. The closest we have are patents like this one, which contains a considerable amount of information. This section is rich in concepts and hints at how Google might treat information about properties:
A particular entity may be associated with several types, and may also be related to multiple other entities by one or more properties. As used herein, an entity is a thing or concept that is singular, unique, well-defined, and distinguishable. For example, an entity may be a person, place, item, idea, topic, abstract concept, concrete element, other suitable thing, or any combination thereof. In some implementations, search results include results in identifying entity references. As used herein, an entity reference is an identifier, e.g., text, or other information that refers to an entity. For example, an entity may be the physical embodiment of George Washington, while an entity reference is an abstract concept that refers to George Washington. Where appropriate, based on context, it will be understood that the term entity as used herein may correspond to an entity reference, and the term entity reference as used herein may correspond to an entity. In some implementations, the search system may identify an entity type associated with an entity reference. The entity type may be a categorization or classification used to identify entity references in the data structure. For example, the entity reference “George Washington” may be associated with the entity types “U. S. President,” “Person,” and “Military Officer.” Properties describe relationships between entities, in other words, how one entity is related to another entity. The most important properties associated with an entity may depend on which of its types are of interest. For example, for entity “Tom Hanks,” a user may want search results to include his movies or other information about his acting. However, for entity “Albert Einstein,” users may want search results to include his theories, technical papers, and other information related to his contributions to physics. 
The disclosed techniques may be used to determine the important attributes, and accordingly provide search results that the user likely wants.
Want more insight into how a knowledge base may be used in semantic search? It may be worth your while to read this patent. Keep in mind that Google considers many sites beyond Wikipedia and Wikidata to be helpful knowledge bases. It may look at sources such as IMDB and Yahoo Finance as helpful sources of facts.
To illustrate this patent, I decided to show George Washington as a surveyor. Not many people know he did that as a teenager! It’s also possible this position played a significant role in positions he held later on like military commander and politician. The following passage from the patent about entity types and understanding information within a graph influenced my illustration of choice:
A node representing organizational data may be included in a knowledge graph. These may be referred to herein as entity type nodes. As used herein, an entity type node may refer to a node in a knowledge graph, while an entity type may refer to the concept represented by an entity type node. An entity type may be a defining characteristic of an entity. For example, entity type node Y may be connected to an entity node X by an “Is A” edge or link, discussed further below, such that the graph represents the information “The Entity X Is Type Y.” For example, the entity node “George Washington” may be connected to the entity type node “President.” An entity node may be connected to multiple entity type nodes, for example, “George Washington” may also be connected to entity type node “Person” and to entity type node “Military Commander.”
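A small sketch may help make the “Is A” edges in that passage concrete. The `KnowledgeGraph` class and its method names are my own invention for illustration, and the “Worked As” edge is a hypothetical property that does not appear in the patent text; only the three entity types come from the quoted passage.

```python
from collections import defaultdict

class KnowledgeGraph:
    """A toy graph: each node maps to a list of (edge label, target node) pairs."""

    def __init__(self):
        self.edges = defaultdict(list)

    def add_edge(self, source, label, target):
        self.edges[source].append((label, target))

    def types_of(self, entity):
        """Return the entity type nodes connected to an entity by an "Is A" edge."""
        return [target for label, target in self.edges.get(entity, [])
                if label == "Is A"]

graph = KnowledgeGraph()
# Entity types named in the patent passage
graph.add_edge("George Washington", "Is A", "President")
graph.add_edge("George Washington", "Is A", "Person")
graph.add_edge("George Washington", "Is A", "Military Commander")
# Hypothetical property edge, added only for this illustration
graph.add_edge("George Washington", "Worked As", "Surveyor")

washington_types = graph.types_of("George Washington")
print(washington_types)
```

The point of separating type edges from property edges is that, as the patent describes, the properties that matter most for an entity may depend on which of its types is of interest.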
I recommend reading through this patent and trying to understand it. It might help you see how a search engine might capture information about properties and their different aspects, and return results for queries about them. This is a much more semantic-based search, considering information about things and how they might be related to one another. It isn’t about matching strings of text from queries to documents. Rather, it focuses on knowing about entities, their types, their properties, and how related they may be to other entities.
I would like to see a movie about a young Washington surveying the hills of Virginia. I asked Google, “is there a movie about George Washington as a surveyor?” It appears that an animated feature started by covering those days: General George Washington
Will We Be Using Related Entity Scores and Property Scores When We Create Content in the Future?
Will related entity scores and property scores be important things to consider in the future? The patent provides hints about how search engines might use them, like this:
Step 606 includes one or more processors generating a property score for each property based on the related entity scores associated with the property. Related entity scores associated with each particular property may be combined for that property. For example, referencing FIG. 4, the related entity scores for related entities “Forrest Gump,” “Big,” and “Saving Private Ryan” may be summed to give a sum for the property “Movies acted in,” e.g., 0.8+0.8+0.8=2.4. In a further example, the related entity scores may be combined as a weighted sum. Any suitable combination of related entity scores may be used to generate the property score. In some implementations, one or more types may be a subtype of another entity type. For example, referencing data structure 550 of FIG. 5, the type “Actor” may be a subtype of the entity type “Person,” which may be referred to as a parent type relative to the subtype. In some such implementations, for the parent type, the property score for each property of each subtype may be summed with the same property of the parent type. For example, referencing data structure 550 of FIG. 5, property “Movies acted in” is included in type “Actor” and “Person,” and accordingly, the property score of 9.0 for the entity type “Actor” may be aggregated to the property score 1.0 for the entity type “Person.” The one or more processors may renormalize, scale, weight, or otherwise alter the scores within the parent type after incorporating the subtype.
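The arithmetic in that passage can be worked through in a short sketch. The movie scores (0.8 each) and the type scores (9.0 for “Actor”, 1.0 for “Person”) come from the patent text; the function names and the example weights are hypothetical, and the patent leaves open both how weights would be chosen and how scores would be renormalized afterward.

```python
import math

def property_score(scores, weights=None):
    """Combine related entity scores as a plain sum, or a weighted sum if weights are given."""
    if weights is None:
        return sum(scores)
    return sum(w * s for w, s in zip(weights, scores))

# Related entity scores from the patent's FIG. 4 example
movies = {"Forrest Gump": 0.8, "Big": 0.8, "Saving Private Ryan": 0.8}
movies_acted_in = property_score(list(movies.values()))
print(round(movies_acted_in, 2))  # the patent gives 0.8+0.8+0.8=2.4

# Weighted variant; these weights are made up for illustration
weighted = property_score(list(movies.values()), weights=[1.0, 0.5, 0.5])

def aggregate_into_parent(parent_scores, subtype_scores):
    """Sum each subtype property score into the same property of the parent type.

    Assumption: a subtype property missing from the parent is simply added to it.
    """
    combined = dict(parent_scores)
    for prop, score in subtype_scores.items():
        combined[prop] = combined.get(prop, 0.0) + score
    return combined

# Type-level property scores from the patent's FIG. 5 example
person = {"Movies acted in": 1.0}
actor = {"Movies acted in": 9.0}
person_after = aggregate_into_parent(person, actor)
print(person_after)

# Floating-point sums are compared with a tolerance rather than ==
assert math.isclose(movies_acted_in, 2.4)
```

Summing the subtype into the parent gives “Person” a “Movies acted in” score of 10.0 before any renormalizing, scaling, or weighting the system might apply afterward.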
The idea of property scores may make a little more sense after reading that passage. Google has been using its knowledge graph to answer search queries with search results for at least three years, and it is getting more sophisticated about it too. This is likely to continue evolving as Google tries new things and experiments more with how it displays search results.