Place Names in Google’s Knowledge Graph
Before Google had a Knowledge Graph, it had built a Fact Repository. Google filed a patent for its Browseable Fact Repository in 2006, and I wrote about that patent in a post titled Google’s Browseable Fact Repository – an Early Knowledge Graph. One of the co-inventors of the Fact Repository was Andrew W. Hogue, who was in charge of a project at Google referred to as the Annotation Framework. He was also involved in Google’s acquisition of Metaweb, which had built the knowledge base known as Freebase, and that acquisition led to the creation of the Knowledge Graph at Google. When you see Fact Repository referenced at Google, think Knowledge Base.
Place names are one of the important types of facts that appear in a collection of documents such as the Web.
Google was granted a patent, originally filed in 2007, titled Determining geographic locations for place names in a fact repository.
As I wrote recently in a post titled Related Entity Scores in Knowledge Based Searches, Google can use information about the properties of entities to provide search results. So when Google extracts data such as a place name for a knowledge base, it is helpful to get that fact right. The knowledge base is richer and more useful when that kind of information is correct.
An Updated Place Names Patent from Google
Google was granted a continuation version of its patent about place names and fact repositories this week. A continuation patent is an updated patent that claims the benefit of the original filing date of the patent being continued. It usually contains the same or extremely similar description text and images, but has updated claims. Comparing the original patent, filed in 2007, with the continuation patent, filed in 2012, we can see that the claims in this newer, recently granted patent have changed.
One concern that seems to have played an important role in the first version of the patent was getting facts about place names correct. The patent describes this concern:
Place names extracted from different sources have a variety of formats and may contain typographical errors, omissions, or unclear language. There may also be ambiguity as to whether a word represents a place name and whether different place names represent the same location. It is useful to have a way to identify the precise location of a place name.
The claims from the first version of the patent reminded me of a post that I wrote called How Google was Corroborating Facts for Direct Answers.
Claims like these from the first version have been removed from the second version of the patent:
2. The method of claim 1, wherein the identifying a first potential place name comprises examining sequences of one or more capitalized words.
3. The method of claim 1, wherein the identifying a first potential place name comprises identifying a second potential place name in the value and examining words surrounding the second potential place name.
4. The method of claim 1, wherein the identifying a first potential place name comprises identifying various representations of the same place name.
5. The method of claim 1, wherein the attribute has been determined to correspond to a place name by comparing facts containing the same attribute
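To make a couple of these removed claims more concrete, here is a minimal Python sketch of two of them: finding potential place names as sequences of capitalized words (claim 2), and reducing different representations of the same place name to a single normalized key (claim 4). The function names and the abbreviation table are hypothetical illustrations, not code from the patent, and a real extractor would need much more than this.

```python
import re

# Hypothetical abbreviation table used to merge variant spellings;
# an illustrative assumption, not taken from the patent.
ABBREVIATIONS = {"st": "saint", "mt": "mount", "ft": "fort"}


def candidate_place_names(text: str) -> list[str]:
    """Return runs of one or more consecutive capitalized words (claim 2).

    Lowercase words and punctuation both end a run, so "Paris, France"
    yields two candidates rather than one.
    """
    # Words (allowing internal apostrophes/hyphens) or single punctuation marks.
    tokens = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*|[^\w\s]", text)
    candidates, run = [], []
    for token in tokens:
        if token[0].isupper():
            run.append(token)
        elif run:
            candidates.append(" ".join(run))
            run = []
    if run:
        candidates.append(" ".join(run))
    return candidates


def normalize_place_name(name: str) -> str:
    """Map different representations of a name to one key (claim 4)."""
    words = re.findall(r"[a-z]+", name.lower())
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


if __name__ == "__main__":
    print(candidate_place_names("He was born in New York City and moved to Paris, France."))
    print(normalize_place_name("St. Louis"), "|", normalize_place_name("Saint Louis"))
```

The crude capitalization rule also picks up "He" at the start of the sentence. That kind of noise is one reason a later step would need to check candidates against known locations before treating them as place names.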
These claims remind me of the NAP (name, address, phone number) consistency that is discussed for mentions of a place in Google local search.
The new version of the patent seems to focus much more on tagging place names, as they are mentioned, with geographic location coordinates such as latitude and longitude (as seen in the illustrations in the patent). Some of the initial claims of the new patent show this new focus:
2. The method of claim 1, wherein storing the first geographic location coordinates includes tagging the first potential place name with the first geographic location coordinates.
3. The method of claim 2, wherein tagging includes converting the first potential place name into a hyperlink to a map view.
4. The method of claim 1, wherein disambiguating between the conflicting possible geographic location coordinates includes examining a source document from the one or more source documents for context.
5. The method of claim 1, wherein determining geographic location coordinates for the first potential place name comprises examining a plurality of place names, wherein each of the plurality of place names has been tagged previously with its respective geographic location coordinates.
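These newer claims describe tagging a place name with its coordinates, converting it into a hyperlink to a map view, and finding coordinates by looking at place names that were tagged previously. A minimal sketch of that idea follows. It assumes a small in-memory gazetteer of previously tagged names with approximate coordinates and a placeholder map URL at example.com; both are hypothetical stand-ins, not anything the patent specifies.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical gazetteer of place names tagged previously with approximate
# (latitude, longitude) coordinates. A name may map to several locations.
GAZETTEER = {
    "eiffel tower": [(48.8584, 2.2945)],
    "springfield": [(39.7817, -89.6501), (42.1015, -72.5898)],  # Illinois, Massachusetts
}

# Placeholder map endpoint; an assumption, not a real map service URL.
MAP_VIEW_URL = "https://maps.example.com/view"


def lookup_coordinates(name: str) -> list[tuple[float, float]]:
    """Return every known coordinate pair for a place name (possibly none)."""
    return GAZETTEER.get(name.strip().lower(), [])


def tag_place_name(name: str) -> str:
    """Turn a place name into a hyperlink to a map view when exactly one
    location is known; otherwise return the HTML-escaped name unchanged."""
    coords = lookup_coordinates(name)
    if len(coords) != 1:
        return escape(name)  # unknown or ambiguous: leave for disambiguation
    lat, lng = coords[0]
    href = f"{MAP_VIEW_URL}?{urlencode({'lat': lat, 'lng': lng})}"
    return f'<a href="{escape(href)}">{escape(name)}</a>'


if __name__ == "__main__":
    print(tag_place_name("Eiffel Tower"))
    print(tag_place_name("Springfield"))
```

Names with no known coordinates, or with more than one possible location, are left as plain text in this sketch; resolving the second case is the disambiguation step that claim 4 describes.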
The new version of the patent is here:
Determining geographic locations for place names in a fact repository
Inventors: David J. Vespe and Andrew Hogue
Assignee: Google LLC
US Patent: 9,892,132
Granted: February 13, 2018
Filed: December 31, 2012
A system and method for tagging place names with geographic location coordinates, the place names associated with a collection of objects in a memory of a computer system. The system and method process a text string within an object stored in memory to identify a first potential place name. The system and method determine whether geographic location coordinates are known for the first potential place name. The system and method identify the first potential place name associated with an object in the memory as a place name. The system and method tag the first identified place name associated with an object in the memory with its geographic location coordinates when the geographic location coordinates for the first identified place name are known. The system and method disambiguate place names when multiple place names are found.
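The abstract ends with disambiguating place names, and claim 4 says this can involve examining a source document for context. The claims quoted here don't spell out a method, so the following is only one plausible heuristic, an assumption rather than the patent's approach: choose the candidate location with the smallest total great-circle distance to other place names in the same document that have already been resolved unambiguously. The function names and sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in kilometres between two (lat, lng) points."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    # Clamp to guard against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def disambiguate(candidates, context_coords):
    """Choose among conflicting coordinates for one place name.

    Hypothetical heuristic: prefer the candidate with the smallest total
    distance to place names in the same source document that were already
    resolved unambiguously. Returns None when there is nothing to go on.
    """
    if len(candidates) == 1:
        return candidates[0]
    if not candidates or not context_coords:
        return None
    return min(candidates,
               key=lambda c: sum(haversine_km(c, ctx) for ctx in context_coords))


if __name__ == "__main__":
    springfield = [(39.7817, -89.6501), (42.1015, -72.5898)]  # IL, MA
    chicago = (41.8781, -87.6298)
    boston = (42.3601, -71.0589)
    print(disambiguate(springfield, [chicago]))
    print(disambiguate(springfield, [boston]))
```

A page that also mentions Chicago pulls "Springfield" toward Illinois, while one that mentions Boston pulls it toward Massachusetts. Other kinds of context, such as a nearby state name, could serve the same purpose.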
Afterthoughts on Place Names
Someone asked me on Twitter whether patents sometimes become useless and stop being used by search engines, or whether search engines are forced to use inventions they have patented as long as those patents still have time left to run. A patent gives its owner the right to exclude others from using an invention; it does not require the owner to use it. Patents also expire. The one-click patent held by Amazon has expired, and so has the original PageRank patent, owned by Stanford University and licensed to Google. And when a process is developed and patented, the process behind the patent may change, and a continuation patent like the one I’ve written about in this post may be filed.
If you keep an eye out for them, continuation patents may provide hints about changes in the approaches a company may be taking. In this place name patent, for instance, the focus appears to shift from corroborating facts based on consistency in spelling and in the facts mentioned about specific entities toward facts that are unlikely to change, such as geographic coordinates. So yes, patents do change, as do the processes behind them. It is interesting to find a continuation patent and try to understand what may have changed.